1.数据关联的几种方式-inner_join内连接、outer_join外连接、left_join左连接、righr_join右连接。

import pandas as pd
df4=pd.DataFrame(np.random.randint(2,9,(3,2)),index=['a','s','f'],columns=['A','B'])
print(df4)
out:
A B
a 6 6
s 4 4
f 7 8
df5=pd.DataFrame(np.random.randint(2,9,(2,2)),index=['a','b'],columns=['B','D'])
print(df5)
out:
B D
a 2 2
b 7 2
print(df4.join(df5,how='left',lsuffix='_left',rsuffix='_right'))
out:
A B_left B_right D
a 6 6 2.0 2.0
s 4 4 NaN NaN
f 7 8 NaN NaN
print(df4.join(df5,how='right',lsuffix='_left',rsuffix='_right'))
out:
A B_left B_right D
a 6.0 6.0 2 2
b NaN NaN 7 2
print(df4.join(df5,how='inner',lsuffix='_left',rsuffix='_right'))
out:
A B_left B_right D
a 6 6 2 2
print(df4.join(df5,how='outer',lsuffix='_left',rsuffix='_right'))
out:
A B_left B_right D
a 6.0 6.0 2.0 2.0
b NaN NaN 7.0 2.0
f 7.0 8.0 NaN NaN
s 4.0 4.0 NaN NaN
print(df4.append(df5))
out:
A B D
a 6.0 6 NaN
s 3.0 7 NaN
f 2.0 2 NaN
a NaN 7 7.0
b NaN 6 3.0
2.数据合并merge。
merge的参数
on:列名,join用来对齐的那一列的名字,用到这个参数的时候一定要保证左表和右表用来对齐的那一列都有相同的列名。
left_on:左表对齐的列,可以是列名,也可以是和dataframe同样长度的arrays。
right_on:右表对齐的列,可以是列名,也可以是和dataframe同样长度的arrays。
left_index/ right_index: 如果是True的haunted以index作为对齐的key
how:数据融合的方法,默认为inner。
sort:根据dataframe合并的keys按字典顺序排序,默认是,如果置false可以提高表现。
df5=pd.DataFrame({'data':[0,5,8,4,8,7,2],'key1':['b','b','a','c','a','a','b']})
df6=pd.DataFrame({'data':[0,9,7],'key2':['a','b','d']},index=list('cba'))
print(df5)
out:
data key1
0 0 b
1 5 b
2 8 a
3 4 c
4 8 a
5 7 a
6 2 b
print(df6)
out:
data key2
c 0 a
b 9 b
a 7 d
print(pd.merge(df5,df6,left_on='key1',right_on='key2',how='inner'))
out:
data_x key1 data_y key2
0 0 b 9 b
1 5 b 9 b
2 2 b 9 b
3 8 a 0 a
4 8 a 0 a
5 7 a 0 a
print(pd.merge(df5,df6,left_on='key1',right_on='key2',how='outer'))
out:
data_x key1 data_y key2
0 0.0 b 9.0 b
1 5.0 b 9.0 b
2 2.0 b 9.0 b
3 8.0 a 0.0 a
4 8.0 a 0.0 a
5 7.0 a 0.0 a
6 4.0 c NaN NaN
7 NaN NaN 7.0 d
#对key1进行排序。
print(df5.join(df6,on='key1',lsuffix='_left',sort=True))
out:
data_left key1 data key2
2 8 a 7 d
4 8 a 7 d
5 7 a 7 d
0 0 b 9 b
1 5 b 9 b
6 2 b 9 b
3 4 c 0 a
3.轴关联concat。
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
copy=True)
- 轴关联 – concat series
Axis = 0:将不同对象中的行连接起来,用列名关联,因为都只有一列,所以是将行追加的效果。 行索引会重复,可以使用ignore_index ,inner、outer的效果是一致的。
series1=pd.Series(np.arange(5),index=['a','b','c','d','e'])
series2=pd.Series(np.random.randn(5),index=['A','B','C','d','e'])
print(series1)
out:
a 0
b 1
c 2
d 3
e 4
print(series2)
out:
A -0.820341
B -0.969029
C -0.559098
d 0.904361
e 2.160740
dtype: float64
print(pd.concat([series1,series2]))
out:
a 0.000000
b 1.000000
c 2.000000
d 3.000000
e 4.000000
A -0.820341
B -0.969029
C -0.559098
d 0.904361
e 2.160740
dtype: float64
print(pd.concat([series1,series2],ignore_index=True))
out:
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 -0.820341
6 -0.969029
7 -0.559098
8 0.904361
9 2.160740
dtype: float64
Axis = 1:将不同对象的列连接起来,以行索引进行关联,行索引不会重复,inner、outer的效果不一样。
series1=pd.Series(np.arange(5),index=['a','b','c','d','e'])
series2=pd.Series(np.random.randn(5),index=['A','B','C','d','e'])
print(series1)
out:
a 0
b 1
c 2
d 3
e 4
print(series2)
out:
A -0.820341
B -0.969029
C -0.559098
d 0.904361
e 2.160740
dtype: float64
print(pd.concat([series1,series2],axis=1,join='inner'))
out:
0 1
d 3 -1.663519
e 4 0.504052
print(pd.concat([series1,series2],axis=1,join='outer'))
out:
0 1
a 0.0 NaN
b 1.0 NaN
c 2.0 NaN
d 3.0 -1.663519
e 4.0 0.504052
A NaN 0.768003
B NaN 0.546443
C NaN 0.723390
- 轴关联--concat DataFrame
Axis = 0:将不同对象的行连接起来,用列名做join进行关联。
Axis = 1:将不同对象的列连接起来,用行索引进行关联。
import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(9).reshape(3,3),index=list("acb"),columns=['w','m','z',])
df1=pd.DataFrame(np.arange(9).reshape(3,3),index=list("abf"),columns=['w','k','h',])
print(df)
out:
w m z
a 0 1 2
c 3 4 5
b 6 7 8
print(df1)
out:
w k h
a 0 1 2
b 3 4 5
f 6 7 8
print(pd.concat([df,df1],join='inner',axis=0))
out:
w
a 0
c 3
b 6
a 0
b 3
f 6
print(pd.concat([df,df1],join='inner',axis=1))
out:
w m z w k h
a 0 1 2 0 1 2
b 6 7 8 3 4 5
print(pd.concat([df,df1],join='inner',axis=0,ignore_index=True))
out:
w
0 0
1 3
2 6
3 0
4 3
5 6
862

被折叠的 条评论
为什么被折叠?



