Pandas Comprehensive Exercises

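This post works through several common Pandas data-processing tasks: analyzing the auction winning rate, splitting a date column, computing summary statistics, setting a multi-level index, filtering data, and tallying cargo-flight payloads. It mainly covers data cleaning, statistical analysis, and data reshaping.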


import pandas as pd
import numpy as np
df = pd.read_csv('data/数据集/2002年-2018年上海机动车拍照拍卖.csv')
df

	Date	Total number of license issued	lowest price	avg price	Total number of applicants
0	2-Jan	1400	13600	14735	3718
1	2-Feb	1800	13100	14057	4590
2	2-Mar	2000	14300	14662	5190
3	2-Apr	2300	16000	16334	4806
4	2-May	2350	17800	18357	4665
...	...	...	...	...	...
198	18-Aug	10402	88300	88365	192755
199	18-Sep	12712	87300	87410	189142
200	18-Oct	10728	88000	88070	181861
201	18-Nov	11766	87300	87374	177355
202	18-Dec	12850	87400	87508	165442

203 rows × 5 columns

(1) Which auction was the first to have a winning rate below 5%?

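# Winning rate = licenses issued / number of applicants
df['rate'] = df['Total number of license issued'] / df['Total number of applicants']

# Rows are in chronological order, so the first match is the earliest auction below 5%
df.loc[lambda x: x['rate'] < 0.05].head(1)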

	Date	Total number of license issued	lowest price	avg price	Total number of applicants	rate
159	15-May	7482	79000	79099	156007	0.047959
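
The filter-then-`head(1)` pattern above relies on the rows already being in chronological order. As a small self-contained sketch of the same idea, the snippet below finds the label of the first row that satisfies a condition with a boolean mask and `idxmax`. The frame `toy` and its numbers are made up for illustration and are not taken from the auction data.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the auction dataset.
toy = pd.DataFrame({
    'Date': ['15-Mar', '15-Apr', '15-May', '15-Jun'],
    'issued': [60, 55, 40, 45],
    'applicants': [1000, 1000, 1000, 1000],
})
toy['rate'] = toy['issued'] / toy['applicants']

below = toy['rate'] < 0.05        # boolean mask, True where the rate is under 5%
if below.any():                   # guard: with no match, idxmax would still return the first label
    first_label = below.idxmax()  # label of the first True value
    first_row = toy.loc[first_label]
else:
    first_label = None
    first_row = None
```

The `any()` check matters because `idxmax` on an all-`False` mask returns the first label rather than signalling that nothing matched.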

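(3) Split the first column (the date) into two columns, one for the year (formatted as 20××) and one for the month (English abbreviation). Add them to the table as the first and second columns, delete the original first column, and shift the remaining columns back accordingly.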

df.loc[:,'Date']

0       2-Jan
1       2-Feb
2       2-Mar
3       2-Apr
4       2-May
        ...  
198    18-Aug
199    18-Sep
200    18-Oct
201    18-Nov
202    18-Dec
Name: Date, Length: 203, dtype: object

# Split each 'Date' value such as '2-Jan' into a four-digit year and a month abbreviation
a = []
b = []
for each in df['Date']:
    year, month = each.split('-')
    a.append(2000 + int(year))
    b.append(month)

df['year'] = a
df['month'] = b

column = df.columns.tolist()

column.remove('Date')

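# 'year' and 'month' sit at the end of the list; move 'month' to the front first,
# then 'year', so that the final order starts with 'year', 'month'
a = column.pop()
column.insert(0, a)
b = column.pop()
column.insert(0, b)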

column

['year',
 'month',
 'Total number of license issued',
 'lowest price ',
 'avg price',
 'Total number of applicants',
 'rate']

df.loc[:,column]

	year	month	Total number of license issued	lowest price	avg price	Total number of applicants	rate
0	2002	Jan	1400	13600	14735	3718	0.376547
1	2002	Feb	1800	13100	14057	4590	0.392157
2	2002	Mar	2000	14300	14662	5190	0.385356
3	2002	Apr	2300	16000	16334	4806	0.478568
4	2002	May	2350	17800	18357	4665	0.503751
...	...	...	...	...	...	...	...
198	2018	Aug	10402	88300	88365	192755	0.053965
199	2018	Sep	12712	87300	87410	189142	0.067209
200	2018	Oct	10728	88000	88070	181861	0.058990
201	2018	Nov	11766	87300	87374	177355	0.066342
202	2018	Dec	12850	87400	87508	165442	0.077671

203 rows × 7 columns
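
The same split and reordering can also be done without explicit loops. The following is a minimal sketch, assuming the date strings always have the form `<two-digit year>-<month abbreviation>`; the frame `toy` and its `value` column are hypothetical stand-ins for the real table.

```python
import pandas as pd

# Toy frame that mimics the 'Date' layout above; the values are made up.
toy = pd.DataFrame({'Date': ['2-Jan', '2-Feb', '18-Dec'], 'value': [1, 2, 3]})

# expand=True returns a DataFrame: column 0 holds the year part, column 1 the month
parts = toy['Date'].str.split('-', expand=True)

toy = toy.drop(columns='Date')                       # remove the original date column
toy.insert(0, 'year', parts[0].astype(int) + 2000)   # four-digit year as the first column
toy.insert(1, 'month', parts[1])                     # month abbreviation as the second column
```

`DataFrame.insert` places each new column at a given position directly, so no separate column-list manipulation is needed.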

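(2) For each year, compute the following statistics of the lowest auction price: maximum, mean, and 0.75 quantile. All of them must be shown in a single table.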
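
If only these three statistics are wanted, one option is to pass them to `agg` on the grouped column. This is a minimal sketch on made-up data: the frame `toy` and the helper `q75` are hypothetical, and the column here is named `'lowest price'` without the trailing space that the real CSV header appears to carry.

```python
import pandas as pd

# Made-up prices for illustration only.
toy = pd.DataFrame({
    'year': [2002, 2002, 2002, 2003, 2003],
    'lowest price': [10, 20, 30, 40, 60],
})

def q75(s):
    """0.75 quantile of a Series (linear interpolation, the pandas default)."""
    return s.quantile(0.75)

# One row per year, one column per statistic; the custom function's name becomes the column label
stats = toy.groupby('year')['lowest price'].agg(['max', 'mean', q75])
```

The result has the columns `max`, `mean`, and `q75`, indexed by year, which keeps the table limited to the statistics the question asks for.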

import pandas as pd
import numpy as np
df = pd.read_csv('data/数据集/2002年-2018年上海机动车拍照拍卖.csv')
df = pd.concat([df['Date'].str.split('-',expand=True),df],axis=1)

df

	0	1	Date	Total number of license issued	lowest price	avg price	Total number of applicants
0	2	Jan	2-Jan	1400	13600	14735	3718
1	2	Feb	2-Feb	1800	13100	14057	4590
2	2	Mar	2-Mar	2000	14300	14662	5190
3	2	Apr	2-Apr	2300	16000	16334	4806
4	2	May	2-May	2350	17800	18357	4665
...	...	...	...	...	...	...	...
198	18	Aug	18-Aug	10402	88300	88365	192755
199	18	Sep	18-Sep	12712	87300	87410	189142
200	18	Oct	18-Oct	10728	88000	88070	181861
201	18	Nov	18-Nov	11766	87300	87374	177355
202	18	Dec	18-Dec	12850	87400	87508	165442

203 rows × 7 columns

df.drop(['Date'],axis=1,inplace=True)

df.rename(columns={0: 'year', 1: 'month'}, inplace=True)

# Convert the two-digit year string to a four-digit integer year
df['year'] = df['year'].apply(lambda x: int(x) + 2000)

df

	year	month	Total number of license issued	lowest price	avg price	Total number of applicants
0	2002	Jan	1400	13600	14735	3718
1	2002	Feb	1800	13100	14057	4590
2	2002	Mar	2000	14300	14662	5190
3	2002	Apr	2300	16000	16334	4806
4	2002	May	2350	17800	18357	4665
...	...	...	...	...	...	...
198	2018	Aug	10402	88300	88365	192755
199	2018	Sep	12712	87300	87410	189142
200	2018	Oct	10728	88000	88070	181861
201	2018	Nov	11766	87300	87374	177355
202	2018	Dec	12850	87400	87508	165442

203 rows × 6 columns

df.groupby('year')['lowest price '].describe()

	count	mean	std