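Preface
In pandas, the index is one of the most basic yet most important concepts of a DataFrame. It works like a "house number" for the data: it helps us locate and access rows, and it also enables powerful data alignment and grouping operations. The set_index and reset_index methods are the "Transformers" of data reshaping: they let us move data flexibly between columns and the index, so that we can organize and analyze the same data from different perspectives. Whether in data preprocessing, building multi-level indexes, or pivoting data, fluency with these two methods makes analysis work far more efficient.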
I. set_index
1. Basic syntax
DataFrame.set_index(keys, *, drop=True, append=False, inplace=False, verify_integrity=False)
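Description:
- Set the DataFrame index using existing columns.
- Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The new index can replace the existing index or expand on it.
Parameters: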
- keys: label or array-like or list of labels/arrays
  This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, "array" encompasses Series, Index, np.ndarray, and instances of Iterator.
- drop: bool, default True
  Delete columns to be used as the new index.
- append: bool, default False
  Whether to append columns to existing index.
- inplace: bool, default False
  Whether to modify the DataFrame rather than creating a new one.
- verify_integrity: bool, default False
  Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.
In most cases you only need to specify keys; the other parameters can stay at their defaults.
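For the cases where the defaults are not enough, the following is a minimal sketch of what drop, append, and verify_integrity do. It reuses the small sales table from the examples in this section; the variable names kept, appended, and duplicate_error are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# drop=False keeps 'month' as a regular column in addition to using it as the index.
kept = df.set_index('month', drop=False)
print(list(kept.columns))            # ['month', 'year', 'sale']

# append=True adds 'year' as an extra level next to the existing default index.
appended = df.set_index('year', append=True)
print(list(appended.index.names))    # [None, 'year']

# verify_integrity=True raises ValueError here, because 2014 appears twice in 'year'.
try:
    df.set_index('year', verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
print(type(duplicate_error).__name__)  # ValueError

# With the default inplace=False, set_index returns a new object; df itself is unchanged.
print(list(df.columns))              # ['month', 'year', 'sale']
```

Because set_index returns a new DataFrame by default, the same df can be reused across several calls, which is what the examples in this section rely on.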
2. Examples
Create the dataset
>>> import pandas as pd
>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31
(1) Set the 'month' column as the index
>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31
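To illustrate the "house number" idea from the preface, here is a small sketch of looking up a row by its index label after calling set_index. The names by_month and july are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

by_month = df.set_index('month')

# .loc selects by index label, so 7 means "the row whose month is 7",
# not the row at position 7.
july = by_month.loc[7]
print(july['year'])  # 2013
print(july['sale'])  # 84
```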
(2) Set the 'year' and 'month' columns as a multi-level index
>>> df.set_index(['year', 'month'])
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31
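With a multi-level index, rows can be addressed either by a full (year, month) key or by the first level alone. The sketch below assumes the same sales table; by_ym is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

by_ym = df.set_index(['year', 'month'])

# A full key (one label per level) identifies a single row.
print(by_ym.loc[(2014, 10), 'sale'])  # 31

# A key for the first level only returns every row of that year,
# indexed by the remaining 'month' level.
rows_2014 = by_ym.loc[2014]
print(list(rows_2014.index))    # [4, 10]
print(list(rows_2014['sale']))  # [40, 31]
```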
(3) Create a multi-level index from an Index object and a column
>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
         month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31
(4) Create a multi-level index from two Series
>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31
II. reset_index
1. Basic syntax
DataFrame.reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=<no_default>, names=None)
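Description:
- Reset the index, or one level of it.
- Reset the index of the DataFrame and use the default integer index instead. If the DataFrame has a multi-level index, this method can remove one or more levels.
Parameters: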
- level: int, str, tuple, or list, default None
  Only remove the given levels from the index. Removes all levels by default.
- drop: bool, default False
  Do not try to insert index into dataframe columns. This resets the index to the default integer index.
- inplace: bool, default False
  Whether to modify the DataFrame rather than creating a new one.
- col_level: int or str, default 0
  If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
- col_fill: object, default ''
  If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
- allow_duplicates: bool, optional, default lib.no_default
  Allow duplicate column labels to be created.
- names: int, str or 1-dimensional list, default None
  Using the given string, rename the DataFrame column which contains the index data. If the DataFrame has a MultiIndex, this has to be a list or tuple with length equal to the number of levels.
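In most cases all parameters can stay at their defaults; drop=True is occasionally set to discard the old index instead of turning it into a column.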
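The less common parameters are easier to understand from a small example. The sketch below is not taken from the original article: it assumes pandas 1.5 or later (where names and allow_duplicates are available), and the frames df, clash, and wide as well as the labels 'key' and 'meta' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']},
                  index=['w', 'x', 'y', 'z'])

# names= labels the column that receives the old index;
# without it, an unnamed index becomes a column called 'index'.
named = df.reset_index(names='key')
print(list(named.columns))  # ['key', 'A', 'B']

# If the index name collides with an existing column, reset_index raises
# ValueError unless allow_duplicates=True is passed.
clash = df.rename_axis('A')
dup = clash.reset_index(allow_duplicates=True)
print(list(dup.columns))    # ['A', 'A', 'B']

# With multi-level columns, col_level chooses the level that receives the
# index name and col_fill fills the remaining levels.
wide = pd.DataFrame([[1, 2], [3, 4]],
                    index=pd.Index(['r1', 'r2'], name='key'),
                    columns=pd.MultiIndex.from_tuples([('x', 'A'), ('x', 'B')]))
placed = wide.reset_index(col_level=1, col_fill='meta')
print(placed.columns.tolist())  # [('meta', 'key'), ('x', 'A'), ('x', 'B')]
```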
2. Examples
import pandas as pd
# Create a DataFrame with a custom index
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
}, index=['w', 'x', 'y', 'z'])
print("Original data:")
print(df)
# Output:
Original data:
   A  B
w  1  a
x  2  b
y  3  c
z  4  d
(1) Basic index reset
reset_df = df.reset_index()
print("\nAfter resetting the index:")
print(reset_df)
# Output:
After resetting the index:
  index  A  B
0     w  1  a
1     x  2  b
2     y  3  c
3     z  4  d
(2) Use drop=True to discard the original index
reset_drop_df = df.reset_index(drop=True)
print("\nOriginal index dropped:")
print(reset_drop_df)
# Output:
Original index dropped:
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
(3) Working with a multi-level index
# Create a DataFrame with a multi-level index
multi_df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['a', 'b', 'c', 'd']
})
multi_df.index = pd.MultiIndex.from_tuples([
    ('2024', 'Q1'),
    ('2024', 'Q2'),
    ('2025', 'Q1'),
    ('2025', 'Q2')
], names=['year', 'quarter'])
print("\nMulti-level index data:")
print(multi_df)
# Reset every level of the index
reset_all = multi_df.reset_index()
print("\nAll index levels reset:")
print(reset_all)
# Reset only the first level of the index
reset_level_0 = multi_df.reset_index(level=0)
print("\nOnly the first index level reset:")
print(reset_level_0)
The output is as follows:
Multi-level index data:
              A  B
year quarter
2024 Q1       1  a
     Q2       2  b
2025 Q1       3  c
     Q2       4  d
All index levels reset:
   year quarter  A  B
0  2024      Q1  1  a
1  2024      Q2  2  b
2  2025      Q1  3  c
3  2025      Q2  4  d
Only the first index level reset:
         year  A  B
quarter
Q1       2024  1  a
Q2       2024  2  b
Q1       2025  3  c
Q2       2025  4  d
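Since the two methods are inverses of each other in the common case, a quick round trip is a useful sanity check. The sketch below reuses the sales table from section I; round_trip and reordered are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# Single column: 'month' moves into the index and comes back as the first column.
round_trip = df.set_index('month').reset_index()
print(round_trip.equals(df))  # True

# Two columns: the restored levels are placed first, in index-level order,
# so the column order differs from the original although the data is the same.
reordered = df.set_index(['year', 'month']).reset_index()
print(list(reordered.columns))           # ['year', 'month', 'sale']
print(reordered[df.columns].equals(df))  # True
```

The round trip restores the original frame here only because df started with the default integer index; a custom index would be replaced by the column passed to set_index.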