The relationship between the shift and diff functions in pandas

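This post looks at how the shift and diff functions in pandas are used. shift operates on a DataFrame: when the row index is a date index and freq is set, the row index moves while the column data stays put; when the index is not a date index (or freq is not set), the column data moves instead. diff is closely tied to shift, and the post closes by showing exactly how the two relate.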

View the help text with the ?pandas.DataFrame.shift command

Signature: pandas.DataFrame.shift(self, periods=1, freq=None, axis=0)
Docstring:
Shift index by desired number of periods with an optional time freq
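The main job of this function is to move the data in a DataFrame. When freq=None, the index labels stay fixed and the values move according to axis: up or down along the rows (axis=0), or left or right along the columns (axis=1). When the row index is a time series, freq can be set; the row index is then offset by periods*freq, and the values themselves do not move.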

① When the DataFrame's row index is date-like and freq is given, the row index moves and the column data stays unchanged

In [2]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame(np.arange(24).reshape(6,4),index=pd.date_range(start=
   ...: '20170101',periods=6),columns=['A','B','C','D'])
   ...: df
   ...:
Out[2]:
             A   B   C   D
2017-01-01   0   1   2   3
2017-01-02   4   5   6   7
2017-01-03   8   9  10  11
2017-01-04  12  13  14  15
2017-01-05  16  17  18  19
2017-01-06  20  21  22  23

In [3]: df.shift(2,axis=0,freq='2D')
Out[3]:
             A   B   C   D
2017-01-05   0   1   2   3
2017-01-06   4   5   6   7
2017-01-07   8   9  10  11
2017-01-08  12  13  14  15
2017-01-09  16  17  18  19
2017-01-10  20  21  22  23

In [4]: df.shift(2,axis=1,freq='2D')
Out[4]:
             A   B   C   D
2017-01-05   0   1   2   3
2017-01-06   4   5   6   7
2017-01-07   8   9  10  11
2017-01-08  12  13  14  15
2017-01-09  16  17  18  19
2017-01-10  20  21  22  23

In [5]: df.shift(2,freq='2D')
Out[5]:
             A   B   C   D
2017-01-05   0   1   2   3
2017-01-06   4   5   6   7
2017-01-07   8   9  10  11
2017-01-08  12  13  14  15
2017-01-09  16  17  18  19
2017-01-10  20  21  22  23
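Conclusion: with a time index and freq set, shift moves the time index and leaves the rest of the data as it was; in the outputs above (from the pandas version used for this session), the axis setting made no difference.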
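
To make the periods*freq offset concrete, here is a minimal sketch that rebuilds the result of df.shift(2, freq='2D') by hand. The names manual, same_index and same_values are made up for this illustration, and the manual version simply adds 2 * 2 days = 4 days to every index label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=pd.date_range(start="20170101", periods=6),
    columns=["A", "B", "C", "D"],
)

# With freq set, only the index moves: each label advances by periods * freq.
shifted = df.shift(2, freq="2D")

# Hand-built equivalent (illustrative only): add 4 days to every index label.
manual = df.copy()
manual.index = df.index + pd.Timedelta(days=4)

# The index labels match the manual offset, and the values are untouched.
same_index = bool(shifted.index.equals(manual.index))
same_values = bool((shifted.to_numpy() == df.to_numpy()).all())
print(same_index, same_values)
```

Both checks should hold for the frame above: the first label moves from 2017-01-01 to 2017-01-05, and no NaN values appear because nothing is pushed off the edge.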

② When the DataFrame's row index is not a time series, the row index stays unchanged and the column data moves

In [6]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame(np.arange(24).reshape(6,4),index=['r1','r2','r3','r4'
   ...: ,'r5','r6'],columns=['A','B','C','D'])
   ...: df
   ...:
Out[6]:
     A   B   C   D
r1   0   1   2   3
r2   4   5   6   7
r3   8   9  10  11
r4  12  13  14  15
r5  16  17  18  19
r6  20  21  22  23

In [7]: df.shift(periods=2,axis=0)
Out[7]:
       A     B     C     D
r1   NaN   NaN   NaN   NaN
r2   NaN   NaN   NaN   NaN
r3   0.0   1.0   2.0   3.0
r4   4.0   5.0   6.0   7.0
r5   8.0   9.0  10.0  11.0
r6  12.0  13.0  14.0  15.0

In [8]: df.shift(periods=-2,axis=0)
Out[8]:
       A     B     C     D
r1   8.0   9.0  10.0  11.0
r2  12.0  13.0  14.0  15.0
r3  16.0  17.0  18.0  19.0
r4  20.0  21.0  22.0  23.0
r5   NaN   NaN   NaN   NaN
r6   NaN   NaN   NaN   NaN

In [9]: df.shift(periods=2,axis=1)
Out[9]:
     A   B     C     D
r1 NaN NaN   0.0   1.0
r2 NaN NaN   4.0   5.0
r3 NaN NaN   8.0   9.0
r4 NaN NaN  12.0  13.0
r5 NaN NaN  16.0  17.0
r6 NaN NaN  20.0  21.0

In [10]: df.shift(periods=-2,axis=1)
Out[10]:
       A     B   C   D
r1   2.0   3.0 NaN NaN
r2   6.0   7.0 NaN NaN
r3  10.0  11.0 NaN NaN
r4  14.0  15.0 NaN NaN
r5  18.0  19.0 NaN NaN
r6  22.0  23.0 NaN NaN
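
The outputs above turn into floats because the vacated positions are filled with NaN. As a small sketch, assuming a pandas version whose shift accepts the fill_value argument (it is not in the signature printed earlier; as far as I know it was added in pandas 0.24), the gaps can be filled with a value of your choice instead. The names down and filled are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
    columns=["A", "B", "C", "D"],
)

# Default behaviour: the two vacated rows become NaN, so the frame is upcast to float.
down = df.shift(periods=2, axis=0)

# Assumption: fill_value is available in the installed pandas version.
# The vacated rows are filled with 0 instead of NaN.
filled = df.shift(periods=2, axis=0, fill_value=0)

print(down.dtypes)
print(filled)
```

The row labels r1 to r6 are the same in both results; only the values slide down by two rows.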
View the help text with the ?pandas.DataFrame.diff command; its signature closely resembles that of shift (periods and axis, but no freq)

Signature: pd.DataFrame.diff(self, periods=1, axis=0)
Docstring:
1st discrete difference of object

Now let's look at how the diff function relates to the shift function

In [13]: df.diff(periods=2,axis=0)
Out[13]:
      A    B    C    D
r1  NaN  NaN  NaN  NaN
r2  NaN  NaN  NaN  NaN
r3  8.0  8.0  8.0  8.0
r4  8.0  8.0  8.0  8.0
r5  8.0  8.0  8.0  8.0
r6  8.0  8.0  8.0  8.0

In [14]: df - df.diff(periods=2,axis=0)
Out[14]:
       A     B     C     D
r1   NaN   NaN   NaN   NaN
r2   NaN   NaN   NaN   NaN
r3   0.0   1.0   2.0   3.0
r4   4.0   5.0   6.0   7.0
r5   8.0   9.0  10.0  11.0
r6  12.0  13.0  14.0  15.0

In [15]: df.shift(periods=2,axis=0)
Out[15]:
       A     B     C     D
r1   NaN   NaN   NaN   NaN
r2   NaN   NaN   NaN   NaN
r3   0.0   1.0   2.0   3.0
r4   4.0   5.0   6.0   7.0
r5   8.0   9.0  10.0  11.0
r6  12.0  13.0  14.0  15.0
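
The outputs of In [14] and In [15] are identical, which suggests the relationship df.diff(n) = df - df.shift(n) for this frame. A short sketch that checks it directly (the names n, diff_builtin and diff_manual are chosen here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
    columns=["A", "B", "C", "D"],
)

n = 2

# Built-in difference over n rows.
diff_builtin = df.diff(periods=n, axis=0)

# The same thing spelled out: each value minus the value n rows above it.
diff_manual = df - df.shift(periods=n, axis=0)

# DataFrame.equals treats NaN values in the same positions as equal.
print(diff_builtin.equals(diff_manual))
```

Read the other way round, df - df.diff(n) recovers df.shift(n), which is exactly what the session above shows.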


