Pandas Basics, Part 1

Latest recommended article published on 2025-05-19 00:32:53

License: CC 4.0 BY-SA

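This article introduces the basics of the Pandas library: how NumPy and Pandas differ, and how to work with the two Pandas data structures, Series and DataFrame, including creation, indexing, sorting, and data selection.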

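Pandas fundamentals:
How do NumPy and Pandas differ? Compared with Python's built-in list and dict, NumPy is list-like: values are reached by integer position and carry no labels. Pandas is dict-like: every value also has a label (the index) that can be used to look it up.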
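The list/dict analogy can be sketched in a few lines. This is only an illustration; the names `arr` and `ser` and the labels `x`, `y`, `z` are made up for the example and do not come from the original article.

```python
import numpy as np
import pandas as pd

# A NumPy array is addressed by integer position only, like a list.
arr = np.array([10, 20, 30])
second_by_position = arr[1]

# A pandas Series attaches a label to each value, much like dict keys.
ser = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
second_by_label = ser['y']

print(second_by_position, second_by_label)
```

Both lookups return the same value; the difference is only in how the element is addressed.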

Pandas has two core data structures: Series and DataFrame.

1.Series

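#Series
import pandas as pd
import numpy as np
s=pd.Series([1,3,6,np.nan,44,12])
print(s)     #A Series prints with the index on the left and the values on the right. No index was given, so an integer index from 0 to N-1 (N = length) is created automatically.

Output: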

0     1.0
1     3.0
2     6.0
3     NaN
4    44.0
5    12.0
dtype: float64
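The output above shows two things that are easy to miss: the default index is 0 to N-1, and the single `np.nan` turns the whole Series into `float64`. The sketch below checks both and also passes an explicit index; the labels `a`, `b`, `c` and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Default integer index; NaN is a float, so the values are stored as float64.
s_default = pd.Series([1, 3, 6, np.nan, 44, 12])

# Explicit string labels instead of the default 0..N-1 index.
s_labeled = pd.Series([1, 3, 6], index=['a', 'b', 'c'])

print(s_default.dtype)
print(list(s_default.index))
print(s_labeled['b'])            # look up a value by its label
print(s_default.isna().sum())    # number of missing values
```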

2.DataFrame

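#DataFrame
dates=pd.date_range('20180101',periods=6)  #6 consecutive days starting at 2018-01-01
print(dates)
df=pd.DataFrame(np.random.randn(6,4),index=dates,columns=['a','b','c','d'])   #randn: 6x4 matrix of standard-normal random numbers
print(df)
print('Select one column:')
print(df['a'])
print('Select multiple columns:')
#print(df['b':'c'])         #fails: a slice inside [] selects rows, and 'b' cannot be parsed as a date
print(df[['b','c']])        #pass a list of column names instead (or use df.loc[:,'b':'c'])

print('Select rows:')
#print(df['20180102'])      #KeyError: a single string inside [] is looked up as a column name
print(df['20180102':'20180102'])  #a slice inside [] selects rows; this returns one row
print(df['20180101':'20180102'])  #returns the two rows 2018-01-01 and 2018-01-02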

#Another way to build a DataFrame: from a dict of columns
df2=pd.DataFrame({'A':np.array([11,12,13,14]),
                  'B':pd.Timestamp('20180303'),   #a single timestamp, broadcast to every row
                  'C':pd.Series(1,index=list(range(4)),dtype='float32'),
                  'D':np.array([3]*4,dtype='int32'),
                  'E':pd.Categorical(['test','train','test','train'])
                })
print(df2)
print('Data type of each column:')
print(df2.dtypes)  #data type of each column

print('Row index:')
print(df2.index) 

print('Column labels:')
print(df2.columns)  #columns

print('Values:')
print(df2.values)

print('Summary statistics:')
print(df2.describe())      #summary statistics such as mean, min and max

print('Transpose:')
print(df2.T)        #swap rows and columns

print('Columns sorted by label, descending:')
print(df2.sort_index(axis=1,ascending=False))   #axis=1 sorts the columns by label (E, D, C, B, A), not the rows

print('Rows sorted by column A, descending:')
print(df2.sort_values(by='A',ascending=False))                  #sort rows by the values in column A, descending
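What `df[...]` selects depends on what goes inside the brackets: a single label or a list of labels selects columns, while a slice selects rows. The sketch below shows the cases side by side. It uses `np.arange` instead of random numbers so the values are reproducible; that choice and the variable names are assumptions for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180101', periods=6)
# Deterministic values 0..23 so the results are easy to check.
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])

one_col = df['a']                       # single label -> one column, returned as a Series
two_cols = df[['b', 'c']]               # list of labels -> DataFrame with those columns
col_range = df.loc[:, 'b':'c']          # label slice on columns, both ends included
row_range = df['20180101':'20180102']   # slice inside [] -> rows
one_row = df.loc['20180102']            # one row by label, returned as a Series

print(two_cols)
print(row_range)
print(one_row)
```

When the intent is "this row" or "these columns", `loc` makes the axis explicit and avoids the ambiguity of plain brackets.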

3.Selecting data in Pandas
(1) Label-based selection: loc

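#Label-based selection with loc: a single row, all rows, or a subset of rows
print(df)
print(df.loc['20180102'])                   #select a single row
print(df.loc[:,['a','b']])                  #all rows, columns a and b
print(df.loc['20180102',['a','b']])         #columns a and b of row 2018-01-02
print('---------')
print(df.loc['20180101',:])                 #same as df.loc['20180101']
#print(df.loc[['20180101','20180102'],:])    #raised KeyError: "None of [['20180101', '20180102']] are in the [index]" because the strings in the list were not converted to timestamps; use the slice df.loc['20180101':'20180102',:] or a list built with pd.to_datetime

Output: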

                   a         b         c         d
2018-01-01  1.605685  0.958052  1.545515 -1.532078
2018-01-02  0.671020 -0.056588  0.602537  0.957299
2018-01-03 -0.410295  0.303129 -0.566329 -0.487873
2018-01-04 -0.563143  0.672877 -1.670422 -1.079927
2018-01-05  1.041299  0.525498  0.757789 -0.380195
2018-01-06  1.611737  0.669790  0.465596 -0.377440
a    0.671020
b   -0.056588
c    0.602537
d    0.957299
Name: 2018-01-02 00:00:00, dtype: float64
                   a         b
2018-01-01  1.605685  0.958052
2018-01-02  0.671020 -0.056588
2018-01-03 -0.410295  0.303129
2018-01-04 -0.563143  0.672877
2018-01-05  1.041299  0.525498
2018-01-06  1.611737  0.669790
a    0.671020
b   -0.056588
Name: 2018-01-02 00:00:00, dtype: float64
---------
a    1.605685
b    0.958052
c    1.545515
d   -1.532078
Name: 2018-01-01 00:00:00, dtype: float64
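`loc` is not limited to one row or all rows. A subset of rows can be selected with a label slice, a list of timestamps, or a boolean mask. The sketch below shows all three on a deterministic frame; the `np.arange` data and the threshold `8` are assumptions chosen so the results are easy to verify.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])

# Label slice: both endpoints are included, so this returns 3 rows.
rows_slice = df.loc['20180102':'20180104', ['a', 'b']]

# List of rows: convert the strings to timestamps first.
wanted = pd.to_datetime(['20180101', '20180103'])
rows_list = df.loc[wanted, :]

# Boolean mask: keep the rows where column a is greater than 8.
rows_mask = df.loc[df['a'] > 8, :]

print(rows_slice)
print(rows_list)
print(rows_mask)
```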

(2) Position-based selection: iloc

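#Position-based selection with iloc
#   selects by integer position: a single value, a contiguous range, or non-adjacent rows
print(df)
print(df.iloc[3,1])          #row 4, column 2 (positions are 0-based)
print(df.iloc[1:2,2:4])      #1 row and 2 columns; the end of each slice is excluded
print(df.iloc[[1,3,5],1:3])  #non-adjacent rows

Output: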

                   a         b         c         d
2018-01-01  1.605685  0.958052  1.545515 -1.532078
2018-01-02  0.671020 -0.056588  0.602537  0.957299
2018-01-03 -0.410295  0.303129 -0.566329 -0.487873
2018-01-04 -0.563143  0.672877 -1.670422 -1.079927
2018-01-05  1.041299  0.525498  0.757789 -0.380195
2018-01-06  1.611737  0.669790  0.465596 -0.377440
0.672876752828
                   c         d
2018-01-02  0.602537  0.957299
                   b         c
2018-01-02 -0.056588  0.602537
2018-01-04  0.672877 -1.670422
2018-01-06  0.669790  0.465596
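One detail worth checking is how slice endpoints differ: `iloc` follows Python slicing and excludes the end position, while a label slice with `loc` includes the end label. The sketch below compares the two on a deterministic frame (the `np.arange` data is an assumption for the example, not the article's random data).

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])

by_position = df.iloc[1:3]                      # positions 1 and 2; position 3 is excluded
by_label = df.loc['20180102':'20180103']        # both labels are included
single_value = df.iloc[3, 1]                    # row position 3, column position 1
scattered = df.iloc[[1, 3, 5], 1:3]             # non-adjacent rows, columns b and c

print(by_position.equals(by_label))
print(single_value)
print(scattered)
```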

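(3) Mixed position and label selection: ix

#ix mixed positions and labels; it was deprecated in pandas 0.20 and removed in 1.0
#print(df.ix[:3,['a','c']])              #original call; AttributeError on pandas 1.0 and later
print(df.loc[df.index[:3],['a','c']])    #first 3 rows by position, columns a and c by label

Output: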

                   a         c
2018-01-01  1.605685  1.545515
2018-01-02  0.671020  0.602537
2018-01-03 -0.410295 -0.566329
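Because `.ix` no longer exists in pandas 1.0 and later, mixed selection has to be written with `loc` or `iloc`. The sketch below shows three equivalent ways to take the first three rows by position and columns `a` and `c` by label. The deterministic `np.arange` data and the variable names are assumptions for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])

# Option 1: turn the row positions into labels, then use loc.
via_loc = df.loc[df.index[:3], ['a', 'c']]

# Option 2: turn the column labels into positions, then use iloc.
col_positions = df.columns.get_indexer(['a', 'c'])
via_iloc = df.iloc[:3, col_positions]

# Option 3: chain a positional row selection and a column selection (read-only use).
chained = df.iloc[:3][['a', 'c']]

print(via_loc)
```

Options 1 and 2 select in a single step, which is the safer form when the result will be assigned to.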