How to reset index in a pandas data frame? [duplicate]

Latest recommended article published on 2025-11-29 16:24:14


License: CC 4.0 BY-SA

Original link: https://oldbug.net/q/1nYsi/How-to-reset-index-in-a-pandas-data-frame-duplicate

Tags:

#python #indexing #pandas #dataframe

This article describes how to reset the index of a pandas data frame, both with reset_index() and by directly assigning a RangeIndex or range, which is faster.

Translated from: How to reset index in a pandas data frame? [duplicate]

This question already has answers here:

How to convert index of a pandas dataframe into a column? (6 answers)

Closed 6 months ago.

I have a data frame from which I remove some rows. As a result, I get a data frame whose index looks something like [1,5,6,10,11], and I would like to reset it to [0,1,2,3,4]. How can I do it?

The following seems to work:

df = df.reset_index()
del df['index']

The following does not work:

df = df.reindex()

Reply #1

Reference: https://stackoom.com/question/1nYsi/如何重置熊猫数据框中的索引-重复

Reply #2

reset_index() is what you're looking for. If you don't want the old index saved as a column, then do:

df = df.reset_index(drop=True)

If you don't want to reassign:

df.reset_index(drop=True, inplace=True)
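
To make the difference between these calls concrete, here is a minimal sketch. The sample frame and the variable names `kept`, `dropped`, and `unchanged` are made up for illustration; the index [1,5,6,10,11] mirrors the one in the question. It also shows why the question's `reindex()` attempt has no visible effect: with no new labels passed, there is nothing to conform the frame to.

```python
import pandas as pd

# Hypothetical frame: 12 rows, then drop some so the index has gaps.
df = pd.DataFrame({"a": range(12)}).drop([0, 2, 3, 4, 7, 8, 9])
print(list(df.index))       # [1, 5, 6, 10, 11]

# Default: the old labels are moved into a new column named "index".
kept = df.reset_index()
print(list(kept.columns))   # ['index', 'a']

# drop=True: the old labels are discarded instead of being kept as a column.
dropped = df.reset_index(drop=True)
print(list(dropped.index))  # [0, 1, 2, 3, 4]

# reindex() without arguments supplies no new labels, so the index is unchanged.
unchanged = df.reindex()
print(list(unchanged.index))  # [1, 5, 6, 10, 11]
```

So `reset_index(drop=True)` does in one step what the question's `reset_index()` followed by `del df['index']` does in two.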

Reply #3

Another solution is to assign a RangeIndex or range directly:

df.index = pd.RangeIndex(len(df.index))

df.index = range(len(df.index))
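
A small sketch of what the two assignments do, using made-up frames `df_a` and `df_b`. One thing worth noting is that, unlike `reset_index(drop=True)`, assigning to `df.index` modifies the existing frame rather than returning a new one.

```python
import pandas as pd

# Two hypothetical frames with the same non-default index.
df_a = pd.DataFrame({"a": [8, 7, 6]}, index=[7, 8, 9])
df_b = df_a.copy()

# Assign a RangeIndex explicitly; df_a itself is changed.
df_a.index = pd.RangeIndex(len(df_a.index))

# Assign a plain range; pandas converts it to an index on assignment.
df_b.index = range(len(df_b.index))

print(list(df_a.index))  # [0, 1, 2]
print(list(df_b.index))  # [0, 1, 2]
```

Both forms leave the frame with the labels 0 through len(df) - 1; the data columns are untouched.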

Direct assignment is faster than reset_index(drop=True) in the timings below:

import pandas as pd

# Build a 20,000-row frame whose index repeats the labels 7 and 8.
df = pd.DataFrame({'a': [8, 7], 'c': [2, 4]}, index=[7, 8])
df = pd.concat([df] * 10000)
print(df.head())

In [298]: %timeit df1 = df.reset_index(drop=True)
The slowest run took 7.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 105 µs per loop

In [299]: %timeit df.index = pd.RangeIndex(len(df.index))
The slowest run took 15.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.84 µs per loop

In [300]: %timeit df.index = range(len(df.index))
The slowest run took 7.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.2 µs per loop
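
The timings above come from an IPython session. For readers who want to rerun the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The names `target`, `assign_range_index`, `t_reset`, and `t_assign` and the loop count are assumptions for illustration, and absolute numbers will vary with pandas version and hardware. Keep in mind that the two operations are not strictly equivalent: `reset_index(drop=True)` builds a new frame, while assignment only swaps the index on an existing one.

```python
import timeit

import pandas as pd

# Same 20,000-row frame as in the benchmark above.
df = pd.DataFrame({"a": [8, 7], "c": [2, 4]}, index=[7, 8])
df = pd.concat([df] * 10000)

n = 1000  # number of repetitions; an arbitrary choice

# reset_index returns a new frame each time and leaves df untouched.
t_reset = timeit.timeit(lambda: df.reset_index(drop=True), number=n)

# Direct assignment mutates its target, so work on a separate copy.
target = df.copy()

def assign_range_index():
    target.index = pd.RangeIndex(len(target.index))

t_assign = timeit.timeit(assign_range_index, number=n)

print(f"reset_index(drop=True): {t_reset / n * 1e6:.2f} µs per loop")
print(f"RangeIndex assignment:  {t_assign / n * 1e6:.2f} µs per loop")
```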