pandas.concat

License: CC 4.0 BY-SA

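This article explains how to use the concat function in the pandas library, covering concatenation along different axes (axis), index handling, inner and outer set logic (join), and the creation of hierarchical indexes. Concrete examples show how to concatenate Series and DataFrames and how to deal with duplicate index values, and a link to the official documentation is included.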


Concatenate pandas objects along a particular axis.

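pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)

Note: this is the signature from pandas 0.20, the version of the official documentation linked further down. The join_axes parameter was deprecated in pandas 0.25 and removed in 1.0, so the calls below that use it only run on older versions.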

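axis = 0: concatenate along the index (rows), stacking the objects vertically
axis = 1: concatenate along the columns, placing the objects side by side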
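
To make the two axis values concrete, here is a minimal sketch. The frames `top` and `bottom` are made-up names for illustration and do not appear in the original examples.

```python
import pandas as pd

# Hypothetical frames, used only for illustration.
top = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
bottom = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0 stacks the rows: the index grows, the columns stay the same.
stacked = pd.concat([top, bottom], axis=0)
print(stacked.shape)  # (4, 2)

# axis=1 places the frames side by side: the columns grow, the index stays the same.
side_by_side = pd.concat([top, bottom], axis=1)
print(side_by_side.shape)  # (2, 4)
```

Note that neither call renumbers anything: `stacked` keeps the index labels 0, 1, 0, 1, and `side_by_side` keeps the column labels x, y, x, y.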

Optional set logic (union or intersection) is applied along the other axes, that is, the axes that are not being concatenated.

concat can also add a level of hierarchical indexing on the concatenation axis, which is useful when the labels on the passed axis are the same or partially overlapping.

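join: defaults to 'outer'. Controls how the indexes on the other axes (the ones not being concatenated) are combined
- outer: union
- inner: intersection
join_axes: list of Index objects. Specific indexes to use for the other axes instead of performing the inner/outer set logic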
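
The difference between the two join modes is easiest to see with indexes that only partly overlap. The following is a small sketch with made-up frames (`left` and `right` are illustrative names, not from the original post).

```python
import pandas as pd

# Hypothetical frames whose index labels overlap only on 2 and 3.
left = pd.DataFrame({"b": ["o", "i", "u"]}, index=[1, 2, 3])
right = pd.DataFrame({"g": [9, 8, 7]}, index=[2, 3, 4])

# Concatenating along the columns, so `join` applies to the row index.
outer = pd.concat([left, right], axis=1, join="outer")
inner = pd.concat([left, right], axis=1, join="inner")

print(list(outer.index))  # [1, 2, 3, 4]  (union; missing cells become NaN)
print(list(inner.index))  # [2, 3]        (intersection; no NaN introduced)
```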

Key example

When concatenating, concat aligns the objects on their index and fills in the columns automatically.

In [63]: df1                                                                    
Out[63]: 
   b    c    d
a             
1  o  NaN  NaN
2  i  NaN  NaN
3  u  NaN  NaN
4  y  NaN  NaN
1  t  NaN  NaN
2  r  NaN  NaN
3  e  NaN  NaN
4  w  NaN  NaN

In [64]: df2                                                                    
Out[64]: 
     d    f  g
a             
1  NaN  NaN  0
2  NaN  NaN  9
3  NaN  NaN  8
4  NaN  NaN  7

In [65]: pd.concat([df1, df2], axis=1, join_axes=[df1.index])                   
Out[65]: 
   b    c    d    d    f  g
a                          
1  o  NaN  NaN  NaN  NaN  0
2  i  NaN  NaN  NaN  NaN  9
3  u  NaN  NaN  NaN  NaN  8
4  y  NaN  NaN  NaN  NaN  7
1  t  NaN  NaN  NaN  NaN  0
2  r  NaN  NaN  NaN  NaN  9
3  e  NaN  NaN  NaN  NaN  8
4  w  NaN  NaN  NaN  NaN  7

In [67]: pd.concat([df1, df2], axis=1, join='inner')                            
Out[67]: 
   b    c    d    d    f  g
a                          
1  o  NaN  NaN  NaN  NaN  0
2  i  NaN  NaN  NaN  NaN  9
3  u  NaN  NaN  NaN  NaN  8
4  y  NaN  NaN  NaN  NaN  7
1  t  NaN  NaN  NaN  NaN  0
2  r  NaN  NaN  NaN  NaN  9
3  e  NaN  NaN  NaN  NaN  8
4  w  NaN  NaN  NaN  NaN  7

In [68]: pd.concat([df1, df2], axis=1, join='outer')                            
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

Examples of inner and outer
Official documentation: https://pandas.pydata.org/pandas-docs/version/0.20/merging.html
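
Since join_axes is not accepted by pandas 1.0 and later, a commonly used replacement is to reindex the other objects to the index you want to keep before concatenating. The sketch below assumes made-up frames with unique index labels (`left` and `right` are illustrative names); it is not a rerun of the session above.

```python
import pandas as pd

# Hypothetical frames with unique index labels.
left = pd.DataFrame({"b": ["o", "i", "u", "y"]}, index=[1, 2, 3, 4])
right = pd.DataFrame({"g": [8, 7, 6]}, index=[3, 4, 5])

# Align `right` to the index of `left` first, then concatenate along the columns.
# Labels missing from `right` (1 and 2) become NaN; label 5 is dropped.
result = pd.concat([left, right.reindex(left.index)], axis=1)
print(list(result.index))  # [1, 2, 3, 4]
```

The result keeps exactly the rows of `left`, which is the effect join_axes=[left.index] was used for.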

Examples

Concatenating Series

In [37]: s1 = pd.Series(['a', 'b'])                             

In [38]: s2 = pd.Series(['c', 'd'])                             

In [39]: s1                                                     
Out[39]: 
0    a
1    b
dtype: object

In [40]: s2                                                     
Out[40]: 
0    c
1    d
dtype: object

In [41]: pd.concat([s1, s2])                                    
Out[41]: 
0    a
1    b
0    c
1    d
dtype: object

In [42]: # Ignore the existing index and reset it after concat

In [43]: pd.concat([s1, s2], ignore_index=True)                 
Out[43]: 
0    a
1    b
2    c
3    d
dtype: object
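
Why this matters: without ignore_index the result carries duplicate labels, so a label lookup can return more than one value. A short sketch using the same two Series (the names `kept` and `reset` are introduced here for illustration):

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

kept = pd.concat([s1, s2])                      # labels 0, 1, 0, 1
reset = pd.concat([s1, s2], ignore_index=True)  # labels 0, 1, 2, 3

print(kept.index.is_unique)  # False
print(list(kept.loc[0]))     # ['a', 'c']  (two rows share the label 0)
print(reset.loc[0])          # a
```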

In [44]: # Add a hierarchical index at the outermost level

In [45]: pd.concat([s1, s2], keys=['s1', 's2'])                 
Out[45]: 
s1  0    a
    1    b
s2  0    c
    1    d
dtype: object
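
Once the keys are in place, each original piece can be pulled back out by its key. A minimal sketch (the name `combined` is introduced here for illustration):

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])
combined = pd.concat([s1, s2], keys=["s1", "s2"])

# Selecting on the outer level returns the piece that was passed under that key.
print(combined.loc["s2"].tolist())  # ['c', 'd']

# A (key, label) tuple addresses a single element.
print(combined.loc[("s1", 1)])      # b
```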

In [46]: # Give names to the index levels you created

In [47]: pd.concat([s1, s2], keys=['s1', 's2'],
    ...:         names=['Series name', 'Row ID'])
Out[47]: 
Series name  Row ID
s1           0         a
             1         b
s2           0         c
             1         d
dtype: object

Concatenating DataFrames

In [48]: # Concatenate two DataFrames with identical columns

In [49]: df1 = pd.DataFrame([['a', 1], ['b', 2]],
    ...:                    columns=['letter', 'number'])

In [50]: df1                                                    
Out[50]: 
  letter  number
0      a       1
1      b       2

In [51]: df2 = pd.DataFrame([['c', 3], ['d', 4]],
    ...:                    columns=['letter', 'number'])

In [52]: df2                                                    
Out[52]: 
  letter  number
0      c       3
1      d       4

In [53]: pd.concat([df1, df2])                                  
Out[53]: 
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

In [54]: # Concatenate DataFrames with partially overlapping columns and return all columns.

In [56]: # Columns outside the intersection are filled with NaN

In [57]: df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
    ...:                    columns=['letter', 'number', 'animal'])

In [58]: df3                                                    
Out[58]: 
  letter  number animal
0      c       3    cat
1      d       4    dog

In [59]: df1                                                    
Out[59]: 
  letter  number
0      a       1
1      b       2

In [61]: pd.concat([df1, df3])                                  
Out[61]: 
  animal letter  number
0    NaN      a       1
1    NaN      b       2
0    cat      c       3
1    dog      d       4

In [63]: df2                                                    
Out[63]: 
  letter  number
0      c       3
1      d       4

In [64]: df3                                                    
Out[64]: 
  letter  number animal
0      c       3    cat
1      d       4    dog

In [65]: pd.concat([df2, df3])                                  
Out[65]: 
  animal letter  number
0    NaN      c       3
1    NaN      d       4
0    cat      c       3
1    dog      d       4
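
To keep only the columns that every frame shares, instead of filling the gaps with NaN, join='inner' can be passed. A minimal sketch reusing the data of df1 and df3 (the name `shared` is introduced here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
df3 = pd.DataFrame([["c", 3, "cat"], ["d", 4, "dog"]],
                   columns=["letter", "number", "animal"])

# Rows are stacked (axis=0), so `join` applies to the columns.
shared = pd.concat([df1, df3], join="inner")
print(list(shared.columns))  # ['letter', 'number']
print(shared.shape)          # (4, 2)
```

The column order in the outputs above, with animal first, reflects an older pandas that sorted the non-concatenation axis. Since pandas 1.0 the default is sort=False, so the order may differ on a recent version.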

# Concatenate horizontally along the columns (axis=1)
In [71]: df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']],
    ...:                    columns=['animal', 'name'])

In [72]: df4                                                    
Out[72]: 
   animal    name
0    bird   polly
1  monkey  george

In [73]: df1                                                    
Out[73]: 
  letter  number
0      a       1
1      b       2

In [74]: pd.concat([df1, df4], axis=1)                          
Out[74]: 
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george

In [75]: # Use verify_integrity to refuse concatenating DataFrames whose index values overlap

In [76]: df5 = pd.DataFrame([1], index=['a'])                   

In [77]: df6 = pd.DataFrame([2], index=['a'])                   

In [78]: df5                                                    
Out[78]: 
   0
a  1

In [79]: df6                                                    
Out[79]: 
   0
a  2

In [80]: pd.concat([df5, df6], verify_integrity=True)  

ValueError: Indexes have overlapping values: ['a']
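
In a script, the check is typically wrapped in try/except. The sketch below shows one way to handle it, falling back to ignore_index when the labels carry no meaning. The `raised` flag and the name `renumbered` are illustrative additions, not part of the original session.

```python
import pandas as pd

df5 = pd.DataFrame([1], index=["a"])
df6 = pd.DataFrame([2], index=["a"])

raised = False
try:
    pd.concat([df5, df6], verify_integrity=True)
except ValueError as exc:
    # Both frames use the label 'a', so the integrity check rejects the result.
    raised = True
    print(f"rejected: {exc}")

# If the labels are not meaningful, discarding them avoids the clash.
renumbered = pd.concat([df5, df6], ignore_index=True)
print(list(renumbered.index))  # [0, 1]
```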