Joins
Let a1 and a2 be two DataFrames that share a key column. They can be joined in the following ways:
(1) Inner join: pd.merge(a1, a2, on='key')
(2) Left join: pd.merge(a1, a2, on='key', how='left')
(3) Right join: pd.merge(a1, a2, on='key', how='right')
(4) Outer join: pd.merge(a1, a2, on='key', how='outer')
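One way to see the difference between the four modes is to look at which key values survive each join. The sketch below assumes two small frames, `left` and `right` (hypothetical names and data, not taken from the session that follows), and compares the keys in each result against plain set operations.

```python
import pandas as pd

# Hypothetical example frames; the shared key column is named "key".
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20, 30, 40]})

left_keys = set(left["key"])
right_keys = set(right["key"])

# Collect the key values present in the result of each join type.
result_keys = {
    how: set(pd.merge(left, right, on="key", how=how)["key"])
    for how in ("inner", "left", "right", "outer")
}

# inner keeps the intersection, left/right keep one side's keys,
# and outer keeps the union of both sides.
print(result_keys["inner"] == (left_keys & right_keys))
print(result_keys["left"] == left_keys)
print(result_keys["right"] == right_keys)
print(result_keys["outer"] == (left_keys | right_keys))
```

Rows whose key is missing on one side are still kept by the left, right, and outer joins; the columns from the missing side are filled with NaN.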
In [7]: a=pd.DataFrame({"a":[3,2,4],'b':[1,3,5],'c':[9,3,1]})
In [8]: a
Out[8]:
   a  b  c
0  3  1  9
1  2  3  3
2  4  5  1
In [9]: b=pd.DataFrame({"c":[1,3,2],"d":[6,4,1]})
In [10]: b
Out[10]:
   c  d
0  1  6
1  3  4
2  2  1
In [12]: pd.merge(a,b,on="c")  # inner join
Out[12]:
   a  b  c  d
0  2  3  3  4
1  4  5  1  6
In [13]: pd.merge(a,b,on="c",how="left")  # left outer join
Out[13]:
   a  b  c    d
0  3  1  9  NaN
1  2  3  3  4.0
2  4  5  1  6.0
In [14]: pd.merge(a,b,on="c",how="right")  # right outer join
Out[14]:
     a    b  c  d
0  2.0  3.0  3  4
1  4.0  5.0  1  6
2  NaN  NaN  2  1
In [15]: pd.merge(a,b,on="c",how="outer")  # full outer join
Out[15]:
     a    b  c    d
0  3.0  1.0  9  NaN
1  2.0  3.0  3  4.0
2  4.0  5.0  1  6.0
3  NaN  NaN  2  1.0
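To check which side each row of a join came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column holding `left_only`, `right_only`, or `both`. A minimal sketch, rebuilding the frames `a` and `b` from the session above (the names `merged` and `origin` are introduced here for illustration only):

```python
import pandas as pd

a = pd.DataFrame({"a": [3, 2, 4], "b": [1, 3, 5], "c": [9, 3, 1]})
b = pd.DataFrame({"c": [1, 3, 2], "d": [6, 4, 1]})

# indicator=True adds a "_merge" column describing each row's origin.
merged = pd.merge(a, b, on="c", how="outer", indicator=True)

# Map each key value in "c" to its origin label.
origin = dict(zip(merged["c"], merged["_merge"].astype(str)))
print(origin)
```

The row order of a merge result can differ between pandas versions, so the sketch looks rows up by key rather than by position.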
Grouping
In [16]: d=pd.DataFrame({"a":['a','c','b','a'],"b":[1,2,1,2]})
In [17]: d
Out[17]:
   a  b
0  a  1
1  c  2
2  b  1
3  a  2
In [18]: d.groupby('a')  # the by argument may also be a list of column names; the axis parameter can be set as well
Out[18]: <pandas.core.groupby.DataFrameGroupBy object at 0x000001F4EE49FD68>
In [19]: d.groupby('a').size()
Out[19]:
a
a    2
b    1
c    1
dtype: int64
In [20]: for name,group in d.groupby('a'):
    ...:     print("name:",name)
    ...:     print(group)
    ...:
name: a
   a  b
0  a  1
3  a  2
name: b
   a  b
2  b  1
name: c
   a  b
1  c  2
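Because iterating over a groupby object yields (name, sub-frame) pairs, the groups can also be collected into a dictionary, or a single group can be fetched directly with `get_group`. A small sketch using the same data as `d` above (the variable names `groups` and `only_a` are illustrative):

```python
import pandas as pd

d = pd.DataFrame({"a": ["a", "c", "b", "a"], "b": [1, 2, 1, 2]})

# Collect every group into a dict keyed by the group name.
groups = {name: group for name, group in d.groupby("a")}
print(list(groups))

# Fetch one group directly, without looping over all of them.
only_a = d.groupby("a").get_group("a")
print(only_a)
```

Each sub-frame keeps the original row index, which is why the group for `a` contains rows 0 and 3.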
In [21]: d.groupby('a').sum()
Out[21]:
   b
a
a  3
b  1
c  2
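`groupby` also accepts a list of column names, and `agg` can compute several statistics per group in one call. The sketch below is an illustration only: the extra column `k` and the output names `total` and `largest` are made up for the example and do not appear in the session above.

```python
import pandas as pd

# Same data as d above, plus a hypothetical second key column "k".
d = pd.DataFrame({
    "a": ["a", "c", "b", "a"],
    "k": ["x", "y", "x", "x"],
    "b": [1, 2, 1, 2],
})

# Group by two columns and compute named aggregations on column "b".
summary = d.groupby(["a", "k"])["b"].agg(total="sum", largest="max")
print(summary)

# Grouping by the single column "a" gives the same sums as the session above.
sums_by_a = d.groupby("a")["b"].sum().to_dict()
print(sums_by_a)
```

With a list of keys the result is indexed by a MultiIndex, so a single group is addressed with a tuple such as `summary.loc[("a", "x")]`.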