
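In data analysis, "join" is a term that comes up constantly. In Python, the pandas merge function is one of the main ways to perform a join.
merge combines two datasets, a left one and a right one, by matching rows on one or more key columns.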
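1. By default, merge joins on the columns that share the same name and keeps only the rows whose keys appear in both datasets (an inner join).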
import pandas as pd
df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df2 = pd.DataFrame({"name": ["kate", "herz", "sally"],
                    "score": [70, 60, 90]})
# "name" is the only column the two frames share, so it becomes the join key
pd.merge(df1, df2)
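To make the default behavior explicit, the sketch below spells out the key and the join type and checks that the result matches the bare call. The variable names default_result and explicit_result are chosen here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df2 = pd.DataFrame({"name": ["kate", "herz", "sally"],
                    "score": [70, 60, 90]})

# Default call: joins on the shared column "name" with how="inner".
default_result = pd.merge(df1, df2)

# The same join with the key and the join type written out.
explicit_result = pd.merge(df1, df2, on="name", how="inner")

# "catherine" is absent because she does not appear in df2.
print(explicit_result)
print(default_result.equals(explicit_result))
```

Writing on and how explicitly is usually safer in real code, since the implicit key changes silently if the two frames later gain another column with a shared name.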

2. When the key columns have different names on the left and right, use left_on and right_on
df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df2 = pd.DataFrame({"call_name": ["kate", "herz", "sally"],
                    "score": [70, 60, 90]})
pd.merge(df1, df2, left_on="name", right_on="call_name")
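One thing worth checking after a left_on/right_on merge is the column list: both key columns survive in the result. A minimal sketch, reusing the same data (the name merged is just an illustrative variable):

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df2 = pd.DataFrame({"call_name": ["kate", "herz", "sally"],
                    "score": [70, 60, 90]})

merged = pd.merge(df1, df2, left_on="name", right_on="call_name")

# Both key columns are kept, and for an inner join they hold identical values.
print(merged.columns.tolist())
print((merged["name"] == merged["call_name"]).all())
```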

3. After merging, drop the duplicated key column
df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df2 = pd.DataFrame({"call_name": ["kate", "herz", "sally"],
                    "score": [70, 60, 90]})
# "name" and "call_name" hold the same values after the merge, so one of them can go
pd.merge(df1, df2, left_on="name", right_on="call_name").drop(columns="name")
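An alternative to dropping a column afterwards is to rename the right-hand key before merging, so that only one key column is ever produced. This is a sketch of that variation, not the approach used above; df2_renamed and merged are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df2 = pd.DataFrame({"call_name": ["kate", "herz", "sally"],
                    "score": [70, 60, 90]})

# Rename the right-hand key so both frames share the column "name".
df2_renamed = df2.rename(columns={"call_name": "name"})

# With a shared key name, on="name" yields a single key column.
merged = pd.merge(df1, df2_renamed, on="name")
print(merged)
```

rename returns a new DataFrame by default, so df2 itself is left unchanged.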

4. Using the how parameter
- inner: inner join, keeps the intersection of the keys (this is the default)
pd.merge(df1, df2, left_on="name", right_on="call_name", how="inner")

- outer: outer join, keeps the union of the keys and fills the missing values with NaN
df3 = pd.DataFrame({"name": ["kate", "herz", "sally", "cristin"],
                    "score": [70, 60, 90, 30]})
pd.merge(df1, df3, on="name", how="outer")

- left: left join, keeps every row from the left side and only the matching rows from the right
pd.merge(df1, df3, on="name", how="left")

- right: right join, keeps every row from the right side and only the matching rows from the left
pd.merge(df1, df3, on="name", how="right")
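The four join types are easier to compare side by side. The sketch below runs each one on df1 and df3 and records the row counts. It also passes indicator=True, a merge option not covered above, which adds a "_merge" column saying whether each row came from the left frame, the right frame, or both. The names row_counts, outer and origin are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["kate", "herz", "catherine", "sally"],
                    "age": [25, 28, 39, 35]})
df3 = pd.DataFrame({"name": ["kate", "herz", "sally", "cristin"],
                    "score": [70, 60, 90, 30]})

row_counts = {}
for how in ["inner", "outer", "left", "right"]:
    # "_merge" is "left_only", "right_only" or "both" for each row
    result = pd.merge(df1, df3, on="name", how=how, indicator=True)
    row_counts[how] = len(result)
    print(f"how={how!r}: {len(result)} rows")
    print(result, end="\n\n")

# Look up where each name in the outer join came from.
outer = pd.merge(df1, df3, on="name", how="outer", indicator=True)
origin = outer.set_index("name")["_merge"]
print(origin["catherine"], origin["cristin"])
```

Because NaN is a float, the integer columns age and score are converted to float64 in any result that contains missing values, such as the outer join here.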
