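Documentation version: 0.20.3
These examples are written for Python 3.4. Earlier Python versions may need minor adjustments to the code.
Pandas (pd) and Numpy (np) are the only two modules imported by default under an abbreviation. All other modules are imported explicitly so that newer users can see where they come from.
Corrections to any mistranslations are welcome.
The examples in this document are drawn from frequently asked questions on Stack Overflow and GitHub, which the author has distilled and summarized.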
Arithmetic
Performing arithmetic with a MultiIndex that requires broadcasting
In [61]: cols = pd.MultiIndex.from_tuples([ (x,y) for x in ['A','B','C'] for y in ['O','I']])
In [62]: df = pd.DataFrame(np.random.randn(2,6),index=['n','m'],columns=cols); df
Out[62]:
          A                   B                   C
          O         I         O         I         O         I
n  1.920906 -0.388231 -2.314394  0.665508  0.402562  0.399555
m -1.765956  0.850423  0.388054  0.992312  0.744086 -0.739776
In [63]: df = df.div(df['C'],level=1); df
Out[63]:
          A                   B              C
          O         I         O         I    O    I
n  4.771702 -0.971660 -5.749162  1.665625  1.0  1.0
m -2.373321 -1.149568  0.521518 -1.341367  1.0  1.0
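The level-wise broadcast is easier to follow with fixed numbers than with random ones. The sketch below uses made-up values (they are illustrative assumptions, not the data shown above) and divides every column by the 'C' column that shares its second-level label.

```python
import pandas as pd

# Two-level column index: outer level A/B/C, inner level O/I.
cols = pd.MultiIndex.from_product([["A", "B", "C"], ["O", "I"]])

# Hand-picked values (hypothetical) so the quotients are easy to verify.
df = pd.DataFrame(
    [[2.0, 9.0, 4.0, 12.0, 2.0, 3.0],
     [10.0, 8.0, 5.0, 4.0, 5.0, 2.0]],
    index=["n", "m"],
    columns=cols,
)

# df["C"] has plain columns O and I; level=1 aligns them with the
# inner level of df, so every 'O' column is divided by C/O and every
# 'I' column by C/I.
ratio = df.div(df["C"], level=1)
print(ratio)
```

With these inputs the 'C' columns become 1.0 throughout, just as in the random example.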
Slicing
Slicing a MultiIndex with xs
In [64]: coords = [('AA','one'),('AA','six'),('BB','one'),('BB','two'),('BB','six')]
In [65]: index = pd.MultiIndex.from_tuples(coords)
In [66]: df = pd.DataFrame([11,22,33,44,55],index,['MyData']); df
Out[66]:
        MyData
AA one      11
   six      22
BB one      33
   two      44
   six      55
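To take the cross section of the first level along the first axis
In [67]: df.xs('BB',level=0,axis=0) # Note: level and axis are optional here; axis defaults to 0, and with level left unset the key is matched against the first level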
Out[67]:
     MyData
one      33
two      44
six      55
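When only the outermost level is being selected, xs without a level argument should return the same frame as a plain .loc lookup. The following sketch rebuilds the small frame from this section and compares the two results.

```python
import pandas as pd

coords = [("AA", "one"), ("AA", "six"), ("BB", "one"), ("BB", "two"), ("BB", "six")]
index = pd.MultiIndex.from_tuples(coords)
df = pd.DataFrame([11, 22, 33, 44, 55], index, ["MyData"])

# Cross section on the outermost level, without passing level or axis.
via_xs = df.xs("BB")

# Label-based lookup on the same key.
via_loc = df.loc["BB"]

# Both drop the selected level and keep the remaining labels as the index.
same = via_xs.equals(via_loc)
print(via_xs)
print(same)
```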
To take the cross section of the second level along the first axis
In [68]: df.xs('six',level=1,axis=0)
Out[68]:
    MyData
AA      22
BB      55
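Levels can also be addressed by name rather than by position, and drop_level=False keeps the selected level in the result. In the sketch below the level names "group" and "label" are hypothetical; the frame in this section leaves its levels unnamed.

```python
import pandas as pd

coords = [("AA", "one"), ("AA", "six"), ("BB", "one"), ("BB", "two"), ("BB", "six")]

# The level names are assumptions added for illustration only.
index = pd.MultiIndex.from_tuples(coords, names=["group", "label"])
df = pd.DataFrame([11, 22, 33, 44, 55], index, ["MyData"])

# Select by level name; the matched level is dropped from the result.
six_rows = df.xs("six", level="label")

# Same selection, but keep both index levels.
six_rows_full = df.xs("six", level="label", drop_level=False)

print(six_rows)
print(six_rows_full)
```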
Slicing a MultiIndex, method #2 (using .loc with slice objects rather than xs)
In [69]: import itertools; index = list(itertools.product(['Ada','Quinn','Violet'],['Comp','Math','Sci']))
In [70]: headr = list(itertools.product(['Exams','Labs'],['I','II']))
In [71]: indx = pd.MultiIndex.from_tuples(index,names=['Student','Course'])
In [72]: cols = pd.MultiIndex.from_tuples(headr) #Notice these are un-named
In [73]: data = [[70+x+y+(x*y)%3 for x in range(4)] for y in range(9)]
In [74]: df = pd.DataFrame(data,indx,cols); df
Out[74]:
               Exams     Labs
                   I  II    I  II
Student Course
Ada     Comp      70  71   72  73
        Math      71  73   75  74
        Sci       72  75   75  75
Quinn   Comp      73  74   75  76
        Math      74  76   78  77
        Sci       75  78   78  78
Violet  Comp      76  77   78  79
        Math      77  79   81  80
        Sci       78  81   81  81
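As a side note, the row index above is a full Cartesian product, so pd.MultiIndex.from_product can build it directly without going through itertools. This is a sketch of an alternative construction, not the approach used in this section.

```python
import itertools

import pandas as pd

students = ["Ada", "Quinn", "Violet"]
courses = ["Comp", "Math", "Sci"]

# Construction used in this section: explicit tuples from itertools.product.
via_tuples = pd.MultiIndex.from_tuples(
    list(itertools.product(students, courses)), names=["Student", "Course"]
)

# Alternative: let pandas form the Cartesian product itself.
via_product = pd.MultiIndex.from_product(
    [students, courses], names=["Student", "Course"]
)

same_index = via_tuples.equals(via_product)
print(same_index)
```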
In [75]: All = slice(None)
In [76]: df.loc['Violet']
Out[76]:
       Exams     Labs
           I  II    I  II
Course
Comp      76  77   78  79
Math      77  79   81  80
Sci       78  81   81  81
In [77]: df.loc[(All,'Math'),All]
Out[77]:
               Exams     Labs
                   I  II    I  II
Student Course
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77
Violet  Math      77  79   81  80
In [78]: df.loc[(slice('Ada','Quinn'),'Math'),All]
Out[78]:
               Exams     Labs
                   I  II    I  II
Student Course
Ada     Math      71  73   75  74
Quinn   Math      74  76   78  77
In [79]: df.loc[(All,'Math'),('Exams')]
Out[79]:
                 I  II
Student Course
Ada     Math    71  73
Quinn   Math    74  76
Violet  Math    77  79
In [80]: df.loc[(All,'Math'),(All,'II')]
Out[80]:
               Exams Labs
                  II   II
Student Course
Ada     Math      73   74
Quinn   Math      76   77
Violet  Math      79   80
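The slice(None) placeholders can also be written with pd.IndexSlice, which allows the familiar colon syntax inside .loc. The sketch below rebuilds the frame from this section and checks that both spellings select the same cells.

```python
import itertools

import pandas as pd

index = list(itertools.product(["Ada", "Quinn", "Violet"], ["Comp", "Math", "Sci"]))
headr = list(itertools.product(["Exams", "Labs"], ["I", "II"]))
indx = pd.MultiIndex.from_tuples(index, names=["Student", "Course"])
cols = pd.MultiIndex.from_tuples(headr)
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, indx, cols)

# idx[:, "Math"] is shorthand for (slice(None), "Math").
idx = pd.IndexSlice
math_second = df.loc[idx[:, "Math"], idx[:, "II"]]

# The explicit slice(None) spelling used earlier in this section.
All = slice(None)
math_second_explicit = df.loc[(All, "Math"), (All, "II")]

print(math_second)
```

Both forms rely on the row and column indexes being lexicographically sorted, which holds for this frame.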
Sorting
Sort by a specific column of a frame with MultiIndex columns
In [81]: df.sort_values(by=('Labs', 'II'), ascending=False)
Out[81]:
               Exams     Labs
                   I  II    I  II
Student Course
Violet  Sci       78  81   81  81
        Math      77  79   81  80
        Comp      76  77   78  79
Quinn   Sci       75  78   78  78
        Math      74  76   78  77
        Comp      73  74   75  76
Ada     Sci       72  75   75  75
        Math      71  73   75  74
        Comp      70  71   72  73
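Sorting by several columns follows the same pattern: pass a list of column tuples to by, with a matching list for ascending. The sketch below is an illustrative extension (the choice of sort keys is an assumption) and also shows that sort_index restores the original row order.

```python
import itertools

import pandas as pd

index = list(itertools.product(["Ada", "Quinn", "Violet"], ["Comp", "Math", "Sci"]))
headr = list(itertools.product(["Exams", "Labs"], ["I", "II"]))
indx = pd.MultiIndex.from_tuples(index, names=["Student", "Course"])
cols = pd.MultiIndex.from_tuples(headr)
data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
df = pd.DataFrame(data, indx, cols)

# Single key, as in the example above.
by_labs = df.sort_values(by=("Labs", "II"), ascending=False)

# Two keys: Labs/I descending, ties broken by Exams/I ascending.
by_two = df.sort_values(
    by=[("Labs", "I"), ("Exams", "I")], ascending=[False, True]
)

# Sorting on the row index undoes the value sort for this frame,
# because the original rows were already in lexicographic order.
restored = by_labs.sort_index()

print(by_two.head(3))
```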