笔记——pandas学习3

本文介绍如何使用Python的Pandas库进行数据表的合并与连接,包括concat函数按行或列连接表,以及merge函数实现类似数据库的表联接操作。

合并来自多个表的数据

import pandas as pd
air_quality_no2=pd.read_csv("data/air_quality_long.csv",parse_dates=True)
#parse_dates参数:将csv中的时间字符串转换成日期格式
air_quality_no2=air_quality_no2[["date.utc","location","parameter","value"]]
air_quality_no2.head()
date.utclocationparametervalue
02019-06-18 06:00:00+00:00BETR801pm2518.0
12019-06-17 08:00:00+00:00BETR801pm256.5
22019-06-17 07:00:00+00:00BETR801pm2518.5
32019-06-17 06:00:00+00:00BETR801pm2516.0
42019-06-17 05:00:00+00:00BETR801pm257.5
air_quality_pm25=pd.read_csv("data/air_quality_pm25_long.csv",parse_dates=True)
air_quality_pm25.head()
citycountrydate.utclocationparametervalueunit
0AntwerpenBE2019-06-18 06:00:00+00:00BETR801pm2518.0µg/m³
1AntwerpenBE2019-06-17 08:00:00+00:00BETR801pm256.5µg/m³
2AntwerpenBE2019-06-17 07:00:00+00:00BETR801pm2518.5µg/m³
3AntwerpenBE2019-06-17 06:00:00+00:00BETR801pm2516.0µg/m³
4AntwerpenBE2019-06-17 05:00:00+00:00BETR801pm257.5µg/m³

1. concat()函数:

  1. 沿一个轴(行或列)执行多个表的串联操作
  2. 默认沿行进行拼接
  3. concat()的keys参数:提供了一个附加的(分层的)行索引
    [外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-Ina28uDp-1595754064474)(attachment:image.png)]
air_quality=pd.concat([air_quality_pm25,air_quality_no2])
air_quality.head()
citycountrydate.utclocationparametervalueunit
0AntwerpenBE2019-06-18 06:00:00+00:00BETR801pm2518.0µg/m³
1AntwerpenBE2019-06-17 08:00:00+00:00BETR801pm256.5µg/m³
2AntwerpenBE2019-06-17 07:00:00+00:00BETR801pm2518.5µg/m³
3AntwerpenBE2019-06-17 06:00:00+00:00BETR801pm2516.0µg/m³
4AntwerpenBE2019-06-17 05:00:00+00:00BETR801pm257.5µg/m³
print("Shape of pm25:",air_quality_pm25.shape)
print("Shape of no2:",air_quality_no2.shape)
print("shape of quality:",air_quality.shape)
#3178 = 1110 + 2068行。
Shape of pm25: (1110, 7)
Shape of no2: (5272, 4)
shape of quality: (6382, 7)
air_quality=air_quality.sort_values("date.utc")
air_quality
citycountrydate.utclocationparametervalueunit
3500NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN
176NaNNaN2019-04-09 01:00:00+00:00BETR801pm2576.0NaN
3663NaNNaN2019-04-09 01:00:00+00:00BETR801no222.5NaN
5271NaNNaN2019-04-09 02:00:00+00:00London Westminsterno267.0NaN
1824NaNNaN2019-04-09 02:00:00+00:00London Westminsterpm2542.0NaN
........................
102LondonGB2019-06-20 23:00:00+00:00London Westminsterpm257.0µg/m³
178NaNNaN2019-06-20 23:00:00+00:00London Westminsterpm257.0NaN
177NaNNaN2019-06-21 00:00:00+00:00London Westminsterpm257.0NaN
1825NaNNaN2019-06-21 00:00:00+00:00FR04014no220.0NaN
101LondonGB2019-06-21 00:00:00+00:00London Westminsterpm257.0µg/m³

6382 rows × 7 columns

#concat的keys参数提供了一个方便的解决方案,添加了一个附加的(分层的)行索引
air_quality_=pd.concat([air_quality_pm25,air_quality_no2],keys=["PM25","NO2"])
air_quality_
citycountrydate.utclocationparametervalueunit
PM250AntwerpenBE2019-06-18 06:00:00+00:00BETR801pm2518.0µg/m³
1AntwerpenBE2019-06-17 08:00:00+00:00BETR801pm256.5µg/m³
2AntwerpenBE2019-06-17 07:00:00+00:00BETR801pm2518.5µg/m³
3AntwerpenBE2019-06-17 06:00:00+00:00BETR801pm2516.0µg/m³
4AntwerpenBE2019-06-17 05:00:00+00:00BETR801pm257.5µg/m³
...........................
NO25267NaNNaN2019-04-09 06:00:00+00:00London Westminsterno241.0NaN
5268NaNNaN2019-04-09 05:00:00+00:00London Westminsterno241.0NaN
5269NaNNaN2019-04-09 04:00:00+00:00London Westminsterno241.0NaN
5270NaNNaN2019-04-09 03:00:00+00:00London Westminsterno267.0NaN
5271NaNNaN2019-04-09 02:00:00+00:00London Westminsterno267.0NaN

6382 rows × 7 columns

2. reset_index():将任何级别的索引转换为列

air_quality.reset_index(level=0)
indexcitycountrydate.utclocationparametervalueunit
03500NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN
1176NaNNaN2019-04-09 01:00:00+00:00BETR801pm2576.0NaN
23663NaNNaN2019-04-09 01:00:00+00:00BETR801no222.5NaN
35271NaNNaN2019-04-09 02:00:00+00:00London Westminsterno267.0NaN
41824NaNNaN2019-04-09 02:00:00+00:00London Westminsterpm2542.0NaN
...........................
6377102LondonGB2019-06-20 23:00:00+00:00London Westminsterpm257.0µg/m³
6378178NaNNaN2019-06-20 23:00:00+00:00London Westminsterpm257.0NaN
6379177NaNNaN2019-06-21 00:00:00+00:00London Westminsterpm257.0NaN
63801825NaNNaN2019-06-21 00:00:00+00:00FR04014no220.0NaN
6381101LondonGB2019-06-21 00:00:00+00:00London Westminsterpm257.0µg/m³

6382 rows × 8 columns

merge():加入使用公共标识符表(自然连接)

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-9LXXyMGI-1595754064481)(attachment:image.png)]

station_coord=pd.read_csv("data/air_quality_stations.csv")
station_coord.head()
locationcoordinates.latitudecoordinates.longitude
0BELAL0151.236194.38522
1BELHB2351.170304.34100
2BELLD0151.109985.00486
3BELLD0251.120385.02155
4BELR83351.327664.36226
air_quality.head()
citycountrydate.utclocationparametervalueunit
3500NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN
176NaNNaN2019-04-09 01:00:00+00:00BETR801pm2576.0NaN
3663NaNNaN2019-04-09 01:00:00+00:00BETR801no222.5NaN
5271NaNNaN2019-04-09 02:00:00+00:00London Westminsterno267.0NaN
1824NaNNaN2019-04-09 02:00:00+00:00London Westminsterpm2542.0NaN
air_quality=pd.merge(air_quality,station_coord,how='left',on='location')
air_quality.head()
citycountrydate.utclocationparametervalueunitcoordinates.latitudecoordinates.longitude
0NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN48.837242.39390
1NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN48.837222.39390
2NaNNaN2019-04-09 01:00:00+00:00BETR801pm2576.0NaN51.209664.43182
3NaNNaN2019-04-09 01:00:00+00:00BETR801no222.5NaN51.209664.43182
4NaNNaN2019-04-09 02:00:00+00:00London Westminsterno267.0NaN51.49467-0.13193
air_quality_parameters = pd.read_csv("data/air_quality_parameters.csv")
air_quality_parameters.head()
iddescriptionname
0bcBlack CarbonBC
1coCarbon MonoxideCO
2no2Nitrogen DioxideNO2
3o3OzoneO3
4pm10Particulate matter less than 10 micrometers in...PM10
air_quality=pd.merge(air_quality,air_quality_parameters,how='left',left_on="parameter",right_on="id")
air_quality.head()
#在left_on和right_on 参数都在这里(而不是仅仅使用on),使两个表之间的联系。
citycountrydate.utclocationparametervalueunitcoordinates.latitudecoordinates.longitudeid_xdescription_xname_xid_ydescription_yname_y
0NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN48.837242.39390no2Nitrogen DioxideNO2no2Nitrogen DioxideNO2
1NaNNaN2019-04-09 01:00:00+00:00FR04014no224.4NaN48.837222.39390no2Nitrogen DioxideNO2no2Nitrogen DioxideNO2
2NaNNaN2019-04-09 01:00:00+00:00BETR801pm2576.0NaN51.209664.43182pm25Particulate matter less than 2.5 micrometers i...PM2.5pm25Particulate matter less than 2.5 micrometers i...PM2.5
3NaNNaN2019-04-09 01:00:00+00:00BETR801no222.5NaN51.209664.43182no2Nitrogen DioxideNO2no2Nitrogen DioxideNO2
4NaNNaN2019-04-09 02:00:00+00:00London Westminsterno267.0NaN51.49467-0.13193no2Nitrogen DioxideNO2no2Nitrogen DioxideNO2

小结:

  1. 使用该concat功能,可以将多个表按行连接到两列。

  2. 对于类似数据库的表合并/联接,请使用该merge 函数。


评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值