import numpy as np
import pandas as pd
from pandas import Series,DataFrame
df1 = DataFrame({'key':['b','b','a','c','a','a','b'],
'data1':range(7)})
df2 = DataFrame({'key':['a','b','d'],
'data2':range(3)})
print(df1)
data1 key
0 0 b
1 1 b
2 2 a
3 3 c
4 4 a
5 5 a
6 6 b
print(df2)
data2 key
0 0 a
1 1 b
2 2 d
一、默认情况下按相同列名合并
print(pd.merge(df1,df2))
data1 key data2
0 0 b 1
1 1 b 1
2 6 b 1
3 2 a 0
4 4 a 0
5 5 a 0
二、显示指定进行合并的列名
1.双方有公共列名
print(pd.merge(df1,df2,on='key'))
data1 key data2
0 0 b 1
1 1 b 1
2 6 b 1
3 2 a 0
4 4 a 0
5 5 a 0
2.双方无公共列名
df3 = DataFrame({'lkey':['b','b','a','c','a','a','b'],
'data1':range(7)})
df4 = DataFrame({'rkey':['a','b','d'],
'data2':range(3)})
print(pd.merge(df3,df4,left_on='lkey',right_on='rkey'))
data1 lkey data2 rkey
0 0 b 1 b
1 1 b 1 b
2 6 b 1 b
3 2 a 0 a
4 4 a 0 a
5 5 a 0 a
三、根据多个列名进行合并
只有多个列完全相同的行才进行合并
left = DataFrame({'key1':['foo','foo','bar'],
'key2':['one','two','one'],
'lval':[1,2,3]})
right = DataFrame({'key1':['foo','foo','bar','bar'],
'key2':['one','one','one','two'],
'rval':[4,5,6,7]})
print(pd.merge(left,right,on=['key1','key2']))
key1 key2 lval rval
0 foo one 1 4
1 foo one 1 5
2 bar one 3 6
四、指定合并的方式:inner、outer、left、right
print(pd.merge(df1,df2,how='outer'))
data1 key data2
0 0.0 b 1.0
1 1.0 b 1.0
2 6.0 b 1.0
3 2.0 a 0.0
4 4.0 a 0.0
5 5.0 a 0.0
6 3.0 c NaN
7 NaN d 2.0
五、处理重复列名
1.默认情况下
print(pd.merge(left,right,on='key1')) # key2是重复列名
key1 key2_x lval key2_y rval
0 foo one 1 one 4
1 foo one 1 one 5
2 foo two 2 one 4
3 foo two 2 one 5
4 bar one 3 one 6
5 bar one 3 two 7
2.使用shuffixes参数
print(pd.merge(left,right,on='key1',suffixes=['_left','_right']))
key1 key2_left lval key2_right rval
0 foo one 1 one 4
1 foo one 1 one 5
2 foo two 2 one 4
3 foo two 2 one 5
4 bar one 3 one 6
5 bar one 3 two 7