Pandas Basics 01 (Series: creation, indexing, slicing, attributes, methods, operations)

Latest recommended article published on 2025-07-08 21:37:26

XYX's Blog

Licensed under CC 4.0 BY-SA

Category: Data Analysis and Visualization. Tags: pandas

Article link: https://blog.youkuaiyun.com/XYX_888/article/details/145357787

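Pandas Basics

Pandas is a powerful data analysis and manipulation library, used mainly for working with tabular data (for example data from CSV files, Excel files, or SQL databases). It is built on top of NumPy and provides convenient data structures, chiefly Series and DataFrame.

3.1 The Series data structure

A Series is a one-dimensional, array-like object that holds a sequence of values (integers, floats, and so on) together with an associated set of labels, the index. You can think of a Series as a one-dimensional array with an index.

A Series can also be seen as an ordered dictionary: values can be retrieved by key.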
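
The dictionary analogy can be made concrete with a small sketch. The subject names and scores below are sample data chosen for illustration; the calls used (`in`, `keys()`, `get()`) are standard Series operations.

```python
import pandas as pd

# Sample data: subject names as labels, scores as values
scores = pd.Series({"Chinese": 120, "Math": 140, "English": 130})

has_math = "Math" in scores            # membership tests the index labels, not the values
labels = list(scores.keys())           # keys() returns the index
math_score = scores["Math"]            # look a value up by its key
missing = scores.get("Physics", -1)    # get() returns a default for a missing label

print(has_math, labels, math_score, missing)
```

Unlike a plain dict, the Series also keeps a position for every entry, which is what the implicit indexing later in this article relies on.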

3.1.1 Creating a Series

A Series consists of a set of data (values) and the index labels for that data (index).

From a list or a NumPy array: pd.Series(list/ndarray)

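import numpy as np
import pandas as pd

# The index defaults to the integers 0 to N-1
data = [11, 22, 33, 44]  # renamed from "list" to avoid shadowing the built-in
p = pd.Series(data)
print(p)

n = np.array(data)
p = pd.Series(n)
print(p)

# Both print:
# 0    11
# 1    22
# 2    33
# 3    44
# dtype: int64  (the original run showed int32; the integer width is platform-dependent)

type(p.values)  # numpy.ndarray
print(p.index)  # RangeIndex(start=0, stop=4, step=1)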

From a dictionary

aDict = {
    'a': np.random.randint(0, 10, (2, 3)),
    'b': 123,
    'c': 'hello'
}

p = pd.Series(aDict)
p
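
As a rough illustration of how dictionary keys turn into index labels, the sketch below uses a made-up price table. It also shows that passing `index=` together with a dict selects and reorders entries by label; this behaviour is not covered in the text above and is added here as a side note.

```python
import pandas as pd

prices = {"apple": 3, "banana": 2, "cherry": 9}  # made-up sample data

s = pd.Series(prices)
print(s.index.tolist())   # labels come from the dict keys, in insertion order

# With a dict, index= selects and reorders by label.
# A label that is not a key ("durian") gets NaN, so the dtype becomes float.
t = pd.Series(prices, index=["cherry", "apple", "durian"])
print(t)
```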

The index can be set directly in the Series() constructor

# Set the index labels with the index parameter
n = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
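
Besides the constructor parameter, the labels of an existing Series can be replaced by assigning to its `index` attribute. The following is a minimal sketch; the replacement labels are arbitrary.

```python
import pandas as pd

n = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Replace all labels at once through the index attribute
n.index = ["w", "x", "y", "z"]
labels_after = n.index.tolist()
first_value = n["w"]
print(labels_after, first_value)

# The new index must have the same length as the data
try:
    n.index = ["only", "three", "labels"]
    length_error = False
except ValueError as exc:
    length_error = True
    print("ValueError:", exc)
```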

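3.1.2 Series indexing

Explicit indexing: access values by index label

p = pd.Series({'Chinese':120, 'Math':140, 'English':130})

# Explicit indexing:
# 1. Get a value with p['label'] or p.label
print(p.Chinese)  # Note: attribute access only works when the label is a valid Python identifier; an integer label such as 1 cannot be used this way
print(p['Chinese'])

# Passing a list of labels, p[[...]], still returns a Series
print(p[['Chinese', 'English']])

# 2. Get values with p.loc[]
print(p.loc['Chinese'])
print(p.loc[['Chinese', 'Math']])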
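
Attribute-style access has one more pitfall that is easy to miss: a label that matches an existing Series attribute or method name is shadowed by it. The labels below are deliberately chosen to clash and are purely illustrative.

```python
import pandas as pd

# Labels deliberately chosen to clash with Series.size and Series.count
p = pd.Series({"size": 10, "count": 20})

by_bracket = p["size"]      # bracket access always uses the label -> 10
by_attribute = p.size       # the built-in size attribute wins -> 2 (number of elements)
by_loc = p.loc["count"]     # loc is label-based as well -> 20
print(by_bracket, by_attribute, by_loc)

# A label that does not exist raises KeyError
try:
    p["missing"]
    key_error = False
except KeyError:
    key_error = True
print(key_error)
```

For this reason `p['label']` or `p.loc['label']` is usually the safer habit in scripts, with attribute access kept for interactive exploration.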

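Implicit indexing

p = pd.Series({'Chinese':120, 'Math':140, 'English':130})

# Implicit indexing: access values by integer position
# p[0] and p[[0, 1]] are treated as positions only because the labels here are strings;
# recent pandas versions deprecate this fallback, so iloc is the safer choice
print(p[0])
print(p[[0, 1]])  # still returns a Series
print(p.iloc[0])
print(p.iloc[[0, 2]])  # still returns a Series

# Note: if the index labels are themselves integers, p[...] looks values up by label (explicit index), not by position
p = pd.Series({1:120, 2:140, 3:130})
p[1]
# 120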
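
The ambiguity with integer labels disappears when `loc` and `iloc` are used explicitly. A small sketch, reusing the integer-labelled Series from above:

```python
import pandas as pd

p = pd.Series({1: 120, 2: 140, 3: 130})

by_label = p.loc[1]       # label 1 -> 120
by_position = p.iloc[1]   # second element -> 140
print(by_label, by_position)

# p[0] is a label lookup here, and 0 is not a label, so it raises KeyError
try:
    p[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```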

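3.1.3 Series slicing

Explicit slicing

p = pd.Series({'a':11,'b':22,'c':33,'d':44,'e':55})
# Explicit slicing: slice by index label (both endpoints included)
print(p['a':'d'].values)  # [11 22 33 44]
print(p.loc['b':'d'].values)  # [22 33 44]

Implicit slicing

# Implicit slicing: slice by integer position (start included, stop excluded)
print(p[0::2].values)  # [11 33 55]
print(p.iloc[0:3].values)  # [11 22 33]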
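
The difference between the two endpoint rules is easiest to see when a label slice and a position slice start at the same element. A short sketch on the same data:

```python
import pandas as pd

p = pd.Series({"a": 11, "b": 22, "c": 33, "d": 44, "e": 55})

label_slice = p.loc["b":"d"]   # label slice: the stop label "d" is included
pos_slice = p.iloc[1:3]        # position slice: the stop position 3 is excluded

print(label_slice.index.tolist())
print(pos_slice.index.tolist())
```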

3.1.4 Basic Series attributes

p = pd.Series({'a':11,'b':22,'c':33,'d':44,'e':55})
# shape: returns the shape as a tuple (n,)
print(p.shape)  # (5,)

# size: returns the number of elements
print(p.size)  # 5

# index: returns the index of the Series
print(p.index)  # Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

# values: returns the values of the Series
print(p.values)  # [11 22 33 44 55], a numpy ndarray
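
How these attributes relate to each other can be checked directly. The comparisons below are only a sanity-check sketch, not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

p = pd.Series({"a": 11, "b": 22, "c": 33, "d": 44, "e": 55})

shape_matches_size = p.shape == (p.size,)          # shape is a 1-tuple holding the length
size_matches_len = p.size == len(p)                # size equals len()
values_is_ndarray = isinstance(p.values, np.ndarray)
index_labels = p.index.tolist()                    # index labels as a plain list

print(shape_matches_size, size_matches_len, values_is_ndarray, index_labels)
```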

3.1.5 Basic Series methods

head() and tail()

# By default, return the first 5 and the last 5 elements of the Series
p = pd.Series({'a':11,'b':22,'c':33,'d':44,'e':55, 'f':66, 'g':77, 'h':88, 'i':99})
print(p.head().values)  # [11 22 33 44 55]
print(p.head(3).values)  # [11 22 33]

print(p.tail().values)  # [55 66 77 88 99]
print(p.tail(3).values)  # [77 88 99]
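
Two edge cases of `head()` and `tail()` go beyond what is shown above: asking for more elements than exist, and passing a negative `n`. The sketch below uses a simple 1 to 9 Series as sample data.

```python
import pandas as pd

p = pd.Series(range(1, 10))   # values 1..9

whole = p.head(20)            # n larger than the length returns everything
all_but_last_two = p.head(-2) # negative n: everything except the last 2
all_but_first_two = p.tail(-2)  # negative n: everything except the first 2

print(len(whole))
print(all_but_last_two.tolist())
print(all_but_first_two.tolist())
```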

pd.isnull()/isnull() and pd.notnull()/notnull()

p = pd.Series(['aaa', np.nan, 'ccc', 'ddd'])
# Test for NaN; returns a Series with dtype=bool
print(p.isnull().values)  # [False  True False False]
print(pd.isnull(p))

# Test for non-NaN; returns a Series with dtype=bool
print(p.notnull().values)  # [ True False  True  True]
print(pd.notnull(p))

# Filter out missing values with a boolean mask: elements are kept where the mask is True and dropped where it is False
p[~p.isnull()]  # "~" inverts the mask [False  True False False]
# 0    aaa
# 2    ccc
# 3    ddd
# dtype: object

p[p.notnull()]  # the mask is [ True False  True  True]
# 0    aaa
# 2    ccc
# 3    ddd
# dtype: object
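
The boolean-mask filter can be cross-checked against `dropna()`, a Series method not used in the text above that removes missing values directly. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

p = pd.Series(["aaa", np.nan, "ccc", "ddd"])

n_missing = int(p.isnull().sum())   # True counts as 1, so sum() counts the NaN entries
masked = p[p.notnull()]             # boolean-mask filtering
dropped = p.dropna()                # built-in shortcut for the same filter

print(n_missing)
print(masked.equals(dropped))
print(masked.index.tolist())        # original labels are kept; label 1 is gone
```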

3.1.6 Series operations

Basic arithmetic

s = pd.Series(np.random.randint(1, 10, (3, )))
print(s + 100)
print(s - 100)
print(s * 100)
print(s / 100)
print(s // 2)
print(s ** 2)
print(s % 2)
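
Because the snippet above uses random values, its output changes on every run. A deterministic sketch with fixed sample values makes it easier to see that each operator is applied element by element and that the index is carried over unchanged.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "z"])  # fixed sample values

added = s + 100
halved = s // 2
squared = s ** 2
remainder = s % 2
divided = s / 100          # true division always produces floats

print(added.tolist(), halved.tolist(), squared.tolist(), remainder.tolist())
print(divided.dtype)
print(added.index.tolist())  # labels are preserved
```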

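Operations between two Series

Index labels are aligned automatically during the operation.
Where a label appears in only one of the two Series, the result is NaN.
There is no NumPy-style broadcasting between two Series; elements are matched by label, not by position or shape.

s1 = pd.Series(np.random.randint(1, 10, (3, )))
s2 = pd.Series(np.random.randint(1, 10, (4, )), index=[3, 2, 1, 0])
s1 + s2  # elements with the same label are added; a label present in only one Series gives NaN
# 3    NaN   (row for label 3, which exists only in s2)

# To keep the values at unmatched labels instead of getting NaN, call add() with fill_value
s1.add(s2, fill_value=0)
# 3    8.0   (label 3 keeps its value from s2; the 8 comes from the original random run)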
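
Since the values above are random, here is a deterministic sketch of the same alignment behaviour. The numbers are arbitrary sample values; results are read back by label so the example does not depend on the order of the result index.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])                            # labels 0, 1, 2
s2 = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0]) # same labels in reverse, plus 3

total = s1 + s2
# Matching is by label: label 0 -> 1 + 40, label 1 -> 2 + 30, label 2 -> 3 + 20
print(total.loc[0], total.loc[1], total.loc[2])
print(total.loc[3])            # NaN: label 3 exists only in s2

filled = s1.add(s2, fill_value=0)
print(filled.loc[3])           # 10.0: the missing side is treated as 0
```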