What the Pandas Index Is For

Licensed under CC 4.0 BY-SA

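What the Pandas index is used for:

More convenient data lookups
Better query performance when the index is used
Automatic data alignment
Support for more, and more powerful, data structures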
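
Of the four points above, automatic alignment is the one the rest of this post does not demonstrate, so here is a minimal sketch of it. The two Series and their labels are made up for illustration; the point is only that arithmetic matches values by index label rather than by position.

```python
import pandas as pd

# Two Series whose labels overlap only partly and appear in different orders.
# The labels and values are hypothetical, chosen just for this example.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "d"])

# Addition pairs values by index label, not by position.
# Labels found in only one of the two Series ("a" and "d") produce NaN.
total = s1 + s2
print(total)
```

Here "b" pairs 2 with 20 and "c" pairs 3 with 10, even though the two Series list those labels in different positions.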

0. Reading the data

import pandas as pd

file_path = r'C:\TELCEL_MEXICO_BOT\A\Movie.csv'
df = pd.read_csv(file_path, encoding='utf-8')
print(df.head())  # show the first 5 rows
   userId  movieId  rating  timestamp
0      10      104       6        255
1      12      120       7        294
2      13      136       4        333
3      15      152       7        372
4      16      168       5        411

print(df.count())  # number of non-null values in each column
userId       39
movieId      39
rating       39
timestamp    39
dtype: int64
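
The CSV file above lives on a local path, so it cannot be loaded elsewhere. To follow along without it, one option is to rebuild a small stand-in from the five rows that `df.head()` printed. This is only a sketch: the name `sample` is hypothetical, and it holds 5 rows rather than the full 39.

```python
import io

import pandas as pd

# The five rows shown by df.head() above, written out as CSV text.
csv_text = """userId,movieId,rating,timestamp
10,104,6,255
12,120,7,294
13,136,4,333
15,152,7,372
16,168,5,411
"""

# read_csv accepts a file-like object, so StringIO stands in for the real file.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.head())
print(sample.count())
```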

1. Querying data with the index

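# drop=False keeps userId as a regular column after it also becomes the index

Note that in the printed output below, the data body is the middle block; the leftmost column, labelled userId, is the index.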

df.set_index('userId',inplace=True,drop=False)
print(df.head())
        userId  movieId  rating  timestamp
userId                                    
10          10      104       6        255
12          12      120       7        294
13          13      136       4        333
15          15      152       7        372
16          16      168       5        411

print(df.index)
Int64Index([10, 12, 13, 15, 16, 18, 20, 24, 27, 30, 33, 36, 40, 43, 46, 49, 52,
            56, 59, 62, 65, 68, 72, 75, 78, 81, 84, 88, 91, 94, 97, 10, 11, 14,
            16, 19, 25, 35, 40],
           dtype='int64', name='userId')

print(df.loc[78])  # look up rows by index label
        userId  movieId  rating  timestamp
userId                                    
78          78      784       7       1917
78          78     1008      10       2465
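
The lookup above returns two rows because the label 78 occurs twice in the index. The sketch below uses a few rows copied from the printed output (the timestamp column is left out for brevity) to show how the return type of `df.loc[label]` depends on whether the label is repeated. The names `one` and `many` are just illustrative.

```python
import pandas as pd

# A handful of rows copied from the output above; not the full dataset.
df = pd.DataFrame({
    "userId": [10, 12, 78, 78],
    "movieId": [104, 120, 784, 1008],
    "rating": [6, 7, 7, 10],
})
df = df.set_index("userId", drop=False)

one = df.loc[10]   # label occurs once   -> a single row, returned as a Series
many = df.loc[78]  # label occurs twice  -> all matching rows, as a DataFrame

print(type(one).__name__, type(many).__name__)
print(many)
```

If a DataFrame is wanted regardless of how many rows match, passing a list of labels, as in `df.loc[[10]]`, is one way to get it.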

2. Querying with a condition on a column

print(df.loc[df['userId'] == 78].head())
        userId  movieId  rating  timestamp
userId                                    
78          78      784       7       1917
78          78     1008      10       2465
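
To make clear what the condition query does, the sketch below splits it into two steps on a few rows copied from the output above: first build a boolean mask with one entry per row, then pass that mask to `.loc`. The variable names are illustrative only.

```python
import pandas as pd

# A handful of rows copied from the output above; not the full dataset.
df = pd.DataFrame({
    "userId": [10, 12, 78, 78],
    "movieId": [104, 120, 784, 1008],
    "rating": [6, 7, 7, 10],
}).set_index("userId", drop=False)

# Step 1: compare every value in the column, giving one True/False per row.
mask = df["userId"] == 78
print(mask.tolist())

# Step 2: keep only the rows where the mask is True.
by_condition = df.loc[mask]

# The index lookup from section 1 selects the same rows.
by_index = df.loc[78]
print(by_condition.equals(by_index))
```

Both approaches select the same rows here; the difference is that the mask is computed by comparing every row in the column.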

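3. Using the index improves query performance

Timing the index query:

import timeit

code_to_test = "df.loc[78]"
# Run timeit; number is how many times the statement is executed
execution_time = timeit.timeit(code_to_test, globals=globals(), number=1000)
print(f"Execution time: {execution_time / 1000:.6f} seconds per loop")
Execution time: 0.000039 seconds per loop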
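
A single timing says little without something to compare it against. The sketch below times the index lookup next to the column condition from section 2 on the same DataFrame. The data is synthetic and made up for this purpose (100,000 rows with unique `userId` values), and the number of runs is an arbitrary choice. Absolute timings depend on the machine and the pandas version, so no expected figures are given.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data invented for this comparison: unique, sorted userId values.
n = 100_000
df = pd.DataFrame({"userId": np.arange(n), "rating": np.arange(n) % 10})
df = df.set_index("userId", drop=False)

# How fast a label lookup can be depends on these two index properties.
print(df.index.is_unique, df.index.is_monotonic_increasing)

runs = 200  # arbitrary number of repetitions
t_index = timeit.timeit(lambda: df.loc[78], number=runs)
t_mask = timeit.timeit(lambda: df.loc[df["userId"] == 78], number=runs)

print(f"index lookup: {t_index / runs:.6f} seconds per loop")
print(f"boolean mask: {t_mask / runs:.6f} seconds per loop")
```

As far as I understand, the gap comes from how each query is resolved: a label on a unique index can be found with a hash lookup, while the boolean mask compares every value in the column. With a non-unique, unsorted index like the one in this post, the advantage may be smaller, so it is worth measuring on the real data.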