Advanced pandas API

Original article, last modified 2024-12-22 14:20:47

Licensed under CC 4.0 BY-SA

Tags: #pandas #numpy

First published 2024-12-22 14:20:13

First, import pandas and NumPy. Every snippet below assumes these imports.

# Imports
import pandas as pd
import numpy as np

DataFrame attributes in pandas

Creating a DataFrame

# Generate a 10 x 5 matrix of random integers in [40, 100) with NumPy
score_data = np.random.randint(40, 100, size=(10, 5))
score_data

# Create a DataFrame object from the matrix
score_df = pd.DataFrame(score_data)
score_df
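The same step can be sketched with named columns and a seeded generator so the scores are reproducible. The subject names and the seed below are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random scores are reproducible (the seed 42 is arbitrary).
rng = np.random.default_rng(42)
scores = rng.integers(40, 100, size=(10, 5))  # upper bound 100 is exclusive

# Hypothetical subject names used as column labels.
subjects = ['chinese', 'math', 'english', 'physics', 'chemistry']
named_df = pd.DataFrame(scores, columns=subjects)
print(named_df.head())
```

Without the columns argument, pandas labels the columns 0 to 4, which is what score_df above ends up with.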

Inspecting DataFrame attributes

# shape: the dimensions of the DataFrame as (rows, columns)
score_df.shape      # (10, 5)

# index: the row index of the DataFrame
score_df.index      # RangeIndex(start=0, stop=10, step=1)

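# columns: the column labels of the DataFrame
score_df.columns    # RangeIndex(start=0, stop=5, step=1)

# values: the underlying data, returned as a NumPy ndarray
score_df.values

# T: the transpose (rows and columns swapped)
score_df.T

# size: the number of elements, i.e. rows * columns
score_df.size       # 50

# dtypes: the data type of each column
score_df.dtypes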
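To make these attributes concrete, here is a minimal sketch on a small fixed DataFrame, so the results do not depend on random data. The name small_df is only for illustration.

```python
import numpy as np
import pandas as pd

# A fixed 2 x 3 DataFrame holding the integers 0..5.
small_df = pd.DataFrame(np.arange(6).reshape(2, 3))

print(small_df.shape)          # (2, 3)
print(small_df.size)           # 6
print(small_df.index)          # RangeIndex(start=0, stop=2, step=1)
print(list(small_df.columns))  # [0, 1, 2]
print(small_df.T.shape)        # (3, 2)
print(type(small_df.values))   # <class 'numpy.ndarray'>
print(small_df.dtypes)         # one integer dtype per column (width is platform dependent)
```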

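DataFrame methods

# head(): returns the first 5 rows by default
score_df.head()
score_df.head(10)   # pass a number to get that many rows

# tail(): returns the last 5 rows by default
score_df.tail()
score_df.tail(10)   # pass a number to get that many rows

# describe(): summary statistics per column (count, mean, std, min, quartiles, max)
score_df.describe()

# info(): prints a concise summary (index range, column dtypes, non-null counts, memory usage)
score_df.info()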
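A short sketch of what these methods return, using a small hand-written DataFrame (demo_df is an illustrative name, not from the original):

```python
import pandas as pd

demo_df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                        'b': [10, 20, 30, 40, 50, 60]})

print(len(demo_df.head()))    # 5, the default row count
print(len(demo_df.tail(2)))   # 2

# describe() returns a new DataFrame whose rows are the statistics.
stats = demo_df.describe()
print(list(stats.index))       # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc['mean', 'a'])  # 3.5

# info() prints its report directly and returns None.
summary = demo_df.info()
print(summary)                 # None
```

Because describe() returns a DataFrame, individual statistics can be read with loc, while info() is meant for reading on screen only.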

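DataFrame index operations

Modifying the index

A DataFrame's index has to be replaced as a whole. Index objects are immutable, so assigning to a single position (for example df.index[0] = 'x') raises a TypeError.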

# Relabel the index of score_df as stu_0 ... stu_9
score_df.index = ['stu_' + str(i) for i in range(10)]
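The sketch below illustrates that in-place assignment to one index position fails while replacing the whole index works. It also shows rename as one way to change a single label by returning a new DataFrame. The name idx_df and its data are made up for the example.

```python
import pandas as pd

idx_df = pd.DataFrame({'score': [90, 80, 70]})

# Index objects are immutable: item assignment raises TypeError.
try:
    idx_df.index[0] = 'stu_0'
except TypeError as exc:
    print('TypeError:', exc)

# Replacing the whole index at once is allowed.
idx_df.index = [f'stu_{i}' for i in range(len(idx_df))]
print(list(idx_df.index))   # ['stu_0', 'stu_1', 'stu_2']

# rename builds a new DataFrame with one label changed; idx_df is untouched.
renamed = idx_df.rename(index={'stu_0': 'first'})
print(list(renamed.index))  # ['first', 'stu_1', 'stu_2']
```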

Setting an index column

# Prepare sample data; the next step makes month the index
df = pd.DataFrame({
    'month': [1, 4, 7, 10],
    'sale': [55, 40, 84, 31],
    'year': [2024, 2025, 2026, 2027]
})
df

Set month as the index:

# Make month the index, modifying df in place and removing the month column
df.set_index('month', inplace=True, drop=True)

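Here inplace controls whether the original DataFrame is modified (default False, which returns a new DataFrame instead), and drop controls whether the column used as the new index is removed from the columns (default True).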
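As a minimal sketch of these two parameters, the example below rebuilds the same sales data under the illustrative name sales and compares the defaults with drop=False.

```python
import pandas as pd

sales = pd.DataFrame({
    'month': [1, 4, 7, 10],
    'sale': [55, 40, 84, 31],
    'year': [2024, 2025, 2026, 2027],
})

# Defaults (inplace=False, drop=True): a new DataFrame is returned, sales is unchanged.
indexed = sales.set_index('month')
print(list(sales.columns))    # ['month', 'sale', 'year']
print(list(indexed.columns))  # ['sale', 'year']
print(indexed.index.name)     # month

# drop=False keeps month as a regular column as well as the index.
kept = sales.set_index('month', drop=False)
print(list(kept.columns))     # ['month', 'sale', 'year']
```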

Resetting the index

# Restore the default integer index; drop=False moves month back into the columns
df.reset_index(inplace=True, drop=False)
df

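Note that reset_index also defaults to inplace=False, the same as set_index. Only drop differs: it defaults to False in reset_index, meaning the old index is inserted back as a column, whereas drop=True discards the old index entirely.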
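The difference between the two drop settings can be sketched as follows. The DataFrame by_month is an illustrative stand-in for df after month has been made the index.

```python
import pandas as pd

by_month = pd.DataFrame(
    {'sale': [55, 40, 84, 31]},
    index=pd.Index([1, 4, 7, 10], name='month'),
)

# Default drop=False: the old index comes back as a column.
restored = by_month.reset_index()
print(list(restored.columns))   # ['month', 'sale']
print(list(restored.index))     # [0, 1, 2, 3]

# drop=True: the old index is thrown away.
discarded = by_month.reset_index(drop=True)
print(list(discarded.columns))  # ['sale']

# Neither call used inplace=True, so by_month still has month as its index.
print(by_month.index.name)      # month
```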