Indexing and Selection in pandas (.loc, .iloc)

Article link: https://blog.youkuaiyun.com/myDarling_/article/details/127952370

Contents

Indexing Series
Indexing DataFrame
References

import pandas as pd
import numpy as np

Indexing Series

Indexing a Series works much like indexing a NumPy array, except that besides integer positions you can also use the labels in the Series's index:

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
obj
"""
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64
"""

obj['b']
"""
1.0
"""

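# Integer key used as a position. Recent pandas releases deprecate this
# fallback for non-integer indexes; obj.iloc[1] is the explicit form.
obj[1]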
"""
1.0
"""

Fancy indexing, as with NumPy arrays:

obj[['b', 'a', 'd']]
"""
b    1.0
a    0.0
d    3.0
dtype: float64
"""

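You can also slice with index labels. Unlike ordinary Python slicing, the right endpoint is included (this applies when slicing by label rather than by integer position):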

obj['b':'c']
"""
b    1.0
c    2.0
dtype: float64
"""

obj[1:2]
"""
b    1.0
dtype: float64
"""

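The difference between the two slice styles can be checked directly. The following is a minimal sketch that rebuilds the obj Series from above and uses the explicit .loc and .iloc accessors; the names by_label and by_position are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'c' are included.
by_label = obj.loc['b':'c']

# Positional slice: the stop position 3 is excluded, as in plain Python.
by_position = obj.iloc[1:3]

# Both slices select the same two elements.
print(by_label.equals(by_position))  # True
```
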
Indexing DataFrame

Indexing columns

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Beijing', 'Shanghai', 'Guangzhou', 'Xian'],
                  columns=['one', 'two', 'three', 'four'])
print(df)
"""
           one  two  three  four
Beijing      0    1      2     3
Shanghai     4    5      6     7
Guangzhou    8    9     10    11
Xian        12   13     14    15
"""

Select the column two:

df['two']
"""
Beijing       1
Shanghai      5
Guangzhou     9
Xian         13
Name: two, dtype: int32
"""

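# Attribute access is equivalent to df['two'] when the column name is a
# valid Python identifier and does not clash with a DataFrame attribute.
df.two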
"""
Beijing       1
Shanghai      5
Guangzhou     9
Xian         13
Name: two, dtype: int32
"""

Select multiple columns:

print(df[['three', 'one']])
"""
           three  one
Beijing        2    0
Shanghai       6    4
Guangzhou     10    8
Xian          14   12
"""

Selecting rows

print(df[:2])
"""
          one  two  three  four
Beijing     0    1      2     3
Shanghai    4    5      6     7
"""

Using a boolean array:

df['three'] > 5
"""
Beijing      False
Shanghai      True
Guangzhou     True
Xian          True
Name: three, dtype: bool
"""

print(df[df['three'] > 5])
"""
           one  two  three  four
Shanghai     4    5      6     7
Guangzhou    8    9     10    11
Xian        12   13     14    15
"""

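Several conditions can be combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. The following is a minimal sketch that rebuilds the original df so it runs on its own; the name selected is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Beijing', 'Shanghai', 'Guangzhou', 'Xian'],
                  columns=['one', 'two', 'three', 'four'])

# Rows where 'three' is greater than 5 AND 'one' is less than 12.
selected = df[(df['three'] > 5) & (df['one'] < 12)]
print(selected.index.tolist())  # ['Shanghai', 'Guangzhou']
```
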
Indexing using a bool DataFrame

print(df < 5)
"""
             one    two  three   four
Beijing     True   True   True   True
Shanghai    True  False  False  False
Guangzhou  False  False  False  False
Xian       False  False  False  False
"""

df[df < 5] = 0
print(df)
"""
           one  two  three  four
Beijing      0    0      0     0
Shanghai     0    5      6     7
Guangzhou    8    9     10    11
Xian        12   13     14    15
"""

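The assignment above modifies df in place, which is why the later outputs in this article show zeros in the first row. If the original values need to be kept, one option is DataFrame.mask, which returns a new object. This is a sketch of that alternative, not what the article itself does; the name zeroed is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Beijing', 'Shanghai', 'Guangzhou', 'Xian'],
                  columns=['one', 'two', 'three', 'four'])

# mask() replaces values where the condition is True and returns a new
# DataFrame; df itself is left untouched.
zeroed = df.mask(df < 5, 0)

print(zeroed.loc['Shanghai', 'one'])  # 0
print(df.loc['Shanghai', 'one'])      # 4
```
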
Selection with `loc` and `iloc`

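Compared with the indexing operations above, loc and iloc offer more flexible ways to index a DataFrame and select arbitrary combinations of rows and columns.

.loc indexes primarily by label. For example, to select one row and several columns: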

df.loc['Beijing', ['two', 'three']]
"""
two      0
three    0
Name: Beijing, dtype: int32
"""

.iloc indexes primarily by integer position. Again selecting one row and several columns:

df.iloc[0, [3, 0, 1]]
"""
four    0
one     0
two     0
Name: Beijing, dtype: int32
"""

Select the third row:

df.iloc[2]
"""
one       8
two       9
three    10
four     11
Name: Guangzhou, dtype: int32
"""

Arbitrary rows and columns:

print(df.iloc[[1, 2], [3, 0, 1]])
"""
           four  one  two
Shanghai      7    0    5
Guangzhou    11    8    9
"""

Note that in the one-row, multiple-column case above, both .loc and .iloc return a Series. But if we rewrite

df.loc['Beijing', ['two', 'three']]
df.iloc[0, [3, 0, 1]]

as

df.loc[['Beijing'], ['two', 'three']]
df.iloc[[0], [3, 0, 1]]

a DataFrame is returned instead.
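
The return types can be confirmed with a quick check. This is a minimal sketch that rebuilds the original df so it runs on its own; the names row_as_series and row_as_frame are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Beijing', 'Shanghai', 'Guangzhou', 'Xian'],
                  columns=['one', 'two', 'three', 'four'])

# A scalar row label collapses the row dimension and gives a Series.
row_as_series = df.loc['Beijing', ['two', 'three']]

# A one-element list keeps the row dimension and gives a DataFrame.
row_as_frame = df.loc[['Beijing'], ['two', 'three']]

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(row_as_frame.shape)            # (1, 2)
```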

Slices can also be used inside .loc and .iloc, but note that .loc still includes the end label, whereas .iloc is half-open (start included, end excluded):

df.loc[:'Xian', 'two']
"""
Beijing       0
Shanghai      5
Guangzhou     9
Xian         13
Name: two, dtype: int32
"""

print(df.iloc[:, :3][df.three > 5])
"""
           one  two  three
Shanghai     0    5      6
Guangzhou    8    9     10
Xian        12   13     14
"""

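The two slicing conventions are easiest to compare side by side. The sketch below rebuilds the original df and also shows a boolean mask and a column slice combined in a single .loc call, as one possible alternative to chaining two indexing steps. The names by_label, by_position and filtered are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Beijing', 'Shanghai', 'Guangzhou', 'Xian'],
                  columns=['one', 'two', 'three', 'four'])

# .loc slice: the end labels 'Guangzhou' and 'two' are included.
by_label = df.loc['Beijing':'Guangzhou', 'one':'two']

# .iloc slice: the stop position 2 is excluded on both axes.
by_position = df.iloc[0:2, 0:2]

print(by_label.shape)     # (3, 2)
print(by_position.shape)  # (2, 2)

# Row filter and column slice in one .loc call.
filtered = df.loc[df['three'] > 5, 'one':'three']
print(filtered.shape)     # (3, 3)
```
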
Summary

Indexing options for DataFrame:

Type	Description
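`df[val]`	Usually selects one column or several columns by column label; with a slice (or a boolean array) it selects rows instead
`df.loc[val]`	Selects one row or several rows by row label
`df.loc[:, val]`	Selects one column or several columns by column label
`df.loc[val1, val2]`	Selects rows and columns together (by label)
`df.iloc[where]`	Selects one row or several rows by integer position
`df.iloc[:, where]`	Selects one column or several columns by integer position
`df.iloc[where_i, where_j]`	Selects rows and columns together (by integer position)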
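
The label-based and position-based forms in the table are interchangeable when the labels and positions refer to the same rows and columns. This is a minimal sketch on the original df that checks a few such pairs; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Beijing', 'Shanghai', 'Guangzhou', 'Xian'],
                  columns=['one', 'two', 'three', 'four'])

# One row: by label and by position.
same_row = df.loc['Shanghai'].equals(df.iloc[1])

# One column: df[val], .loc and .iloc.
same_col = (df['two'].equals(df.loc[:, 'two'])
            and df['two'].equals(df.iloc[:, 1]))

# Rows and columns together.
same_block = df.loc[['Shanghai', 'Xian'], ['one', 'four']].equals(
    df.iloc[[1, 3], [0, 3]])

print(same_row, same_col, same_block)  # True True True
```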