pandas入门教程(4)

本文介绍如何使用 Pandas 进行数据处理,包括数据分类、重命名、排序等操作,并展示了如何利用 Pandas 的 plot 功能进行数据可视化。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

分类

df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'e', 'e']})
df
'''
   id raw_grade
0   1         a
1   2         b
2   3         b
3   4         a
4   5         e
5   6         e
'''

把raw_grade转换为分类类型

df["grade"] = df["raw_grade"].astype("category")
df["grade"]
'''
0    a
1    b
2    b
3    a
4    e
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
'''

重命名类别名为更有意义的名称

df["grade"].cat.categories = ["very good", "good", "very bad"]
df["grade"]
'''
0    very good
1         good
2         good
3    very good
4     very bad
5     very bad
Name: grade, dtype: category
Categories (3, object): [very good, good, very bad]
'''

对分类重新排序,并添加缺失的分类

df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
df["grade"]
'''
0    very good
1         good
2         good
3    very good
4     very bad
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
'''

排序是按照分类的顺序进行的,而不是字典序,也就是按照 [very bad, bad, medium, good, very good]顺序分类

df.sort_values(by="grade")
'''
   id raw_grade      grade
4   5         e   very bad
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
'''

按分类分组时,也会显示空的分类

df.groupby("grade").size()
'''
grade
very bad     2
bad          0
medium       0
good         2
very good    2
dtype: int64
'''

绘图

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()

在这里插入图片描述
对于DataFrame类型,plot()能很方便地画出所有列及其标签

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
plt.figure(); df.plot(); plt.legend(loc='best')

在这里插入图片描述

获取数据的I/O

CSV

写入一个csv文件

df.to_csv('data/foo.csv')

从一个csv文件读入

pd.read_csv('data/foo.csv')
'''

Unnamed: 0	         A	         B	        C	D
0	2000-01-01	-0.547915	0.758504	-0.791553	-0.014276
1	2000-01-02	-1.478652	-1.824863	0.001416	-1.067864
2	2000-01-03	-3.070727	-1.655967	-0.551411	-2.188238
3	2000-01-04	-3.317105	-1.810778	0.354000	-1.995760
4	2000-01-05	-2.822308	-2.757077	1.210057	-1.250592
5	2000-01-06	-2.223865	-3.256931	3.658672	-1.232510
6	2000-01-07	-0.559411	-2.842820	4.860993	-0.418853
7	2000-01-08	-0.524812	-2.748762	6.280814	0.170616
8	2000-01-09	0.204528	-0.865022	5.560456	0.875871
9	2000-01-10	0.688895	-1.704024	5.226062	0.444483
10	2000-01-11	1.272138	-2.478377	2.763524	-1.229797
11	2000-01-12	1.255657	-2.994073	2.869688	0.745504
12	2000-01-13	-0.001803	-2.295823	2.738087	2.640307
13	2000-01-14	0.144209	-0.926810	4.087308	0.164138
14	2000-01-15	1.304811	0.625530	3.542700	-1.499446
15	2000-01-16	0.682026	0.552441	3.201389	-0.155674
16	2000-01-17	-1.065321	-0.339092	3.792634	-1.754946
17	2000-01-18	-1.268418	-3.512773	3.656896	-0.132124
18	2000-01-19	-1.936390	-1.069188	2.554887	0.413930
19	2000-01-20	-1.138678	-2.993870	3.094198	-0.956574
20	2000-01-21	-0.096721	-4.141108	3.341248	-0.436886
21	2000-01-22	-0.678763	-5.012663	0.823576	1.303377
22	2000-01-23	-2.498525	-6.075938	0.302908	-0.294204
23	2000-01-24	-0.937785	-6.768487	-0.078521	-0.165799
24	2000-01-25	-0.252338	-6.882860	0.545684	-1.108641
25	2000-01-26	-0.772653	-7.080129	0.764647	-3.041450
26	2000-01-27	-0.348831	-6.035375	0.507673	-2.522836
27	2000-01-28	1.751721	-5.756738	-0.469014	-2.825118
28	2000-01-29	0.799938	-4.559172	-0.086763	-3.243101
29	2000-01-30	1.149650	-4.433331	1.476689	-2.479956
...	...	...	...	...	...
970	2002-08-28	0.466756	38.780786	-6.743120	-15.206051
971	2002-08-29	1.903199	36.687258	-5.720742	-14.335689
972	2002-08-30	1.432801	37.186065	-6.274877	-14.879621
973	2002-08-31	2.218021	36.200395	-5.638178	-15.625389
974	2002-09-01	3.405318	36.987365	-4.441912	-14.162031
975	2002-09-02	3.854252	38.283545	-4.845949	-15.449302
976	2002-09-03	4.328033	37.123804	-6.006344	-14.158929
977	2002-09-04	5.219103	37.203486	-6.035182	-14.461389
978	2002-09-05	4.884547	38.192836	-6.447570	-13.541204
979	2002-09-06	5.817435	37.669998	-5.983732	-12.238303
980	2002-09-07	7.953334	37.503576	-4.783659	-12.092902
981	2002-09-08	8.575181	37.377622	-3.154240	-12.895208
982	2002-09-09	8.242240	38.155500	-3.285132	-13.483775
983	2002-09-10	9.699693	37.421024	-1.763105	-13.397276
984	2002-09-11	7.228027	38.070563	-1.250151	-14.455654
985	2002-09-12	7.276336	38.503057	-1.110047	-15.362900
986	2002-09-13	6.955453	37.038613	-0.571942	-14.527590
987	2002-09-14	6.232800	37.254276	-1.893050	-14.385401
988	2002-09-15	5.331802	36.671504	-1.782282	-13.645305
989	2002-09-16	5.825638	36.920733	-2.718137	-13.875984
990	2002-09-17	5.972137	36.785577	-3.923763	-13.281418
991	2002-09-18	5.154685	34.789858	-4.426825	-13.805227
992	2002-09-19	4.228035	34.400350	-4.431980	-14.520423
993	2002-09-20	3.047640	36.244469	-5.093584	-15.106412
994	2002-09-21	2.984361	38.095349	-5.560222	-16.584216
995	2002-09-22	3.187098	37.169742	-6.132250	-14.910771
996	2002-09-23	1.772470	36.399488	-6.257600	-13.668261
997	2002-09-24	1.886272	34.543307	-6.644915	-15.213739
998	2002-09-25	2.918438	33.722114	-5.736130	-14.919711
999	2002-09-26	2.608268	33.473087	-5.111581	-15.969800
1000 rows × 5 columns
'''
HDF5

HDFStores的读写

写入一个HDF5 Store

df.to_hdf('data/foo.h5', 'df')

从一个HDF5 Store读入

pd.read_hdf('data/foo.h5', 'df')
Excel

MS Excel的读写

写入一个Excel文件

df.to_excel('data/foo.xlsx', sheet_name='Sheet1')

从一个excel文件读入

pd.read_excel('data/foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值