[Python Data Analysis (8)] Pandas Data Structure DataFrame: Selecting Rows and Columns, Indexing (Slicing)

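This article explains in detail how to select and index data in a Pandas DataFrame, covering df[], df.loc[], df.iloc[], and boolean indexing, and shows how to select data by column, by row, and by condition.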

Notes:
A DataFrame has both a row index and a column index, and can be thought of as a dict of Series that share a single index.
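
To make the "dict of Series" view concrete, here is a minimal sketch. The names a, b, and df_demo are made up for this illustration and are not used elsewhere in the article.

```python
import pandas as pd

# Two Series that share the same index labels
a = pd.Series([1.0, 2.0, 3.0], index=['one', 'two', 'three'])
b = pd.Series([10.0, 20.0, 30.0], index=['one', 'two', 'three'])

# Build a DataFrame from a dict of Series: the dict keys become column labels
df_demo = pd.DataFrame({'a': a, 'b': b})

print(df_demo.index.tolist())        # row labels shared by every column
print(df_demo.columns.tolist())      # column labels taken from the dict keys
print((df_demo['a'] == a).all())     # column 'a' holds the values of Series a
```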

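1. Selecting columns

1.1 df[] is generally used to select columns; it can also select rows, but column selection is the default
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(12).reshape(3,4)*100,
                   index = ['one','two','three'],
                   columns = ['a','b','c','d'])
print(df)

data1 = df['a']          # a single column label returns a Series
data2 = df[['b','c']]    # a list of column labels returns a DataFrame
print(data1)
print(data2)

–> Output: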

               a          b          c          d
one    58.508966  95.955052  21.001119  11.598748
two    39.940444   4.822591  63.117561  24.915640
three  10.141366  42.279737  81.585248  99.513415

one      58.508966
two      39.940444
three    10.141366
Name: a, dtype: float64

               b          c
one    95.955052  21.001119
two     4.822591  63.117561
three  42.279737  81.585248
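
The difference between a single label and a list of labels also holds when the list has only one entry. The sketch below uses a small deterministic frame (df_demo is a hypothetical name for this example) so the shapes are easy to check.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                       index=['one', 'two', 'three'],
                       columns=['a', 'b', 'c', 'd'])

s = df_demo['a']      # single label -> Series
f = df_demo[['a']]    # list with one label -> DataFrame with one column

print(type(s).__name__, s.shape)   # Series (3,)
print(type(f).__name__, f.shape)   # DataFrame (3, 1)
```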
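
1.2 df[] can also select rows by slicing (this works, but is not the usual approach; dedicated row-selection methods are covered below)
1.3 df[] cannot select a single row by its index label: df['one'] looks for a column named 'one' and raises KeyError
data3 = df[:1]
print(data3)
print(type(data3))

–> Output: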

             a          b          c          d
one  58.508966  95.955052  21.001119  11.598748 

<class 'pandas.core.frame.DataFrame'>
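
As a rough sketch of how df[] treats rows: a slice is applied to the rows (positional slices exclude the end, label slices include it), while a single label is looked up among the columns. The frame df_demo and the flag single_label_ok are names invented for this example.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                       index=['one', 'two', 'three'],
                       columns=['a', 'b', 'c', 'd'])

by_position = df_demo[0:1]         # positional slice: end position excluded
by_label = df_demo['one':'two']    # label slice: end label included

try:
    df_demo['one']                 # a single label is searched among the columns
    single_label_ok = True
except KeyError:
    single_label_ok = False

print(by_position.index.tolist())  # ['one']
print(by_label.index.tolist())     # ['one', 'two']
print(single_label_ok)             # False
```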

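2. Selecting rows

2.1 df.loc[] - select rows by index label

df.loc[label] selects rows by index label; it works both with an explicitly specified index and with the default integer index

2.1.1 First, create the DataFrames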
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df1)
print(df2)

–> Output:

               a          b          c          d
one    32.739293  74.631681  57.738041  64.283459
two    49.329576  96.607287  37.576970  21.803517
three  62.766459  49.264659  71.193031  22.111200
four   48.914713  84.778627  49.706254   7.874963
           a          b          c          d
0  79.514782  45.871142  57.086445  11.709671
1   3.236386  61.162491  18.101219  38.525494
2  46.595874  13.619774  15.503499   0.832061
3  52.592679  18.123406  54.248833  59.938835

2.1.2 Single-label indexing (a label name, or an integer label when the index is the default one); returns a Series
data1 = df1.loc['one']
data2 = df2.loc[1]
print(data1)
print(data2)

–> Output: (the name of the Series is the label of the selected row)

a    32.739293
b    74.631681
c    57.738041
d    64.283459
Name: one, dtype: float64
a     3.236386
b    61.162491
c    18.101219
d    38.525494
Name: 1, dtype: float64
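
A small sketch of what the returned row looks like: its index holds the column labels and its name is the row label. The last line additionally passes a column label to .loc, which is a standard pandas form but is not covered in the article itself. df_demo is a hypothetical frame for this example.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(12).reshape(3, 4),
                       index=['one', 'two', 'three'],
                       columns=['a', 'b', 'c', 'd'])

row = df_demo.loc['two']          # one row as a Series
value = df_demo.loc['two', 'b']   # row label, then column label -> scalar

print(row.name)                   # two
print(row.index.tolist())         # ['a', 'b', 'c', 'd']
print(value)                      # 5
```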
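
2.1.3 Multi-label indexing; the rows come back in the order given in the list
Older pandas versions returned an all-NaN row for a label that does not exist. Since pandas 1.0, .loc raises KeyError in that case, so reindex is used below for the list that contains the missing label 'five'.
data3 = df1.reindex(['two','three','five'])   # df1.loc[['two','three','five']] raises KeyError in pandas >= 1.0
data4 = df2.loc[[3,2,1]]
print(data3)
print(data4)

–> Output: (the all-NaN row for 'five' depends on the pandas version when .loc is used; reindex always produces it)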

               a          b          c          d
two    49.329576  96.607287  37.576970  21.803517
three  62.766459  49.264659  71.193031  22.111200
five         NaN        NaN        NaN        NaN
           a          b          c          d
3  52.592679  18.123406  54.248833  59.938835
2  46.595874  13.619774  15.503499   0.832061
1   3.236386  61.162491  18.101219  38.525494
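
For a list that may contain missing labels, the sketch below shows two possible ways to handle it, assuming pandas 1.0 or later: reindex keeps the missing label as an all-NaN row, while Index.intersection restricts the list to labels that exist. The names df_demo, labels, raised, and present are invented for this example.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       index=['one', 'two', 'three', 'four'],
                       columns=['a', 'b', 'c', 'd'])
labels = ['two', 'three', 'five']   # 'five' is not in the index

try:
    df_demo.loc[labels]
    raised = False
except KeyError:                    # expected on pandas >= 1.0
    raised = True

reindexed = df_demo.reindex(labels)             # 'five' becomes an all-NaN row
present = df_demo.index.intersection(labels)    # labels that actually exist
existing_only = df_demo.loc[present]            # safe: every label is present

print(raised)
print(reindexed)
print(existing_only)
```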

2.1.4 Slice indexing; the end label is included
data5 = df1.loc['one':'three']
data6 = df2.loc[1:3]
print(data5)
print(data6)

–> Output:

               a          b          c          d
one    32.739293  74.631681  57.738041  64.283459
two    49.329576  96.607287  37.576970  21.803517
three  62.766459  49.264659  71.193031  22.111200
           a          b          c          d
1   3.236386  61.162491  18.101219  38.525494
2  46.595874  13.619774  15.503499   0.832061
3  52.592679  18.123406  54.248833  59.938835
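
2.2 df.iloc[] - select rows by integer position

Works like list indexing: rows are addressed by their integer position along the axis (from 0 to length-1)

2.2.1 First, create the DataFrame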
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
print(df)

–> Output:

               a          b          c          d
one    64.196153   3.181391  71.407232  66.672682
two    46.100913  51.140302  92.888548  12.207747
three  55.724660  28.906997  21.150581   6.250792
four   80.663114  36.770303  88.255988  21.949060
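
2.2.2 Single-position indexing; a position beyond the number of rows, such as .iloc[4] below, raises IndexError
print(df.iloc[0])
print(df.iloc[-1])
# print(df.iloc[4])   # IndexError: position 4 is out of bounds for 4 rows

–> Output: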

a    64.196153
b     3.181391
c    71.407232
d    66.672682
Name: one, dtype: float64
a    80.663114
b    36.770303
c    88.255988
d    21.949060
Name: four, dtype: float64
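
The out-of-bounds case can be checked with a short sketch. It also shows that a slice running past the end does not raise and is simply cut off at the last row, as with Python lists. df_demo, out_of_bounds, and tail are names made up for this example.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       index=['one', 'two', 'three', 'four'],
                       columns=['a', 'b', 'c', 'd'])

try:
    df_demo.iloc[4]            # only positions 0..3 exist
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

tail = df_demo.iloc[2:10]      # a slice past the end is clipped, no error

print(out_of_bounds)           # True
print(tail.index.tolist())     # ['three', 'four']
```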

2.2.3 Multi-position indexing; the order of the positions can vary
print(df.iloc[[0,2]])
print(df.iloc[[3,2,1]])

–> Output:

               a          b          c          d
one    64.196153   3.181391  71.407232  66.672682
three  55.724660  28.906997  21.150581   6.250792
               a          b          c          d
four   80.663114  36.770303  88.255988  21.949060
three  55.724660  28.906997  21.150581   6.250792
two    46.100913  51.140302  92.888548  12.207747

2.2.4 Slice indexing; the end position is excluded (unlike slicing with df.loc[])
print(df.iloc[1:3])
print(df.iloc[::2])

–> Output:

               a          b          c          d
two    46.100913  51.140302  92.888548  12.207747
three  55.724660  28.906997  21.150581   6.250792
               a          b          c          d
one    64.196153   3.181391  71.407232  66.672682
three  55.724660  28.906997  21.150581   6.250792
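
The difference between the two slicing rules is easiest to see on a frame with the default integer index, where the same slice 1:3 means labels for .loc and positions for .iloc. A minimal sketch, with df_demo as a hypothetical name:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       columns=['a', 'b', 'c', 'd'])   # default index 0..3

by_label = df_demo.loc[1:3]      # labels 1 through 3, end included
by_position = df_demo.iloc[1:3]  # positions 1 and 2, end excluded

print(by_label.index.tolist())     # [1, 2, 3]
print(by_position.index.tolist())  # [1, 2]
```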

3. Boolean indexing

3.1 Prepare the data
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
print(df)

–> Output:

               a          b          c          d
one    38.986549  81.009721  57.779180   6.768009
two    61.818468  24.443819  72.064397  87.910932
three  66.612955  48.643065  36.655897  37.299216
four    3.155591  25.298921   1.175081  49.936492

3.2 Condition on the whole DataFrame; values that fail the condition become NaN
b1 = df < 20
print(b1,type(b1))
print(df[b1])
# can also be written as df[df < 20]

–> Output:

           a      b      c      d
one    False  False  False   True
two    False  False  False  False
three  False  False  False  False
four    True  False   True  False <class 'pandas.core.frame.DataFrame'>

              a   b         c         d
one         NaN NaN       NaN  6.768009
two         NaN NaN       NaN       NaN
three       NaN NaN       NaN       NaN
four   3.155591 NaN  1.175081       NaN
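
Indexing with a boolean DataFrame keeps the original shape, which matches what DataFrame.where does with the same mask. The sketch below checks that and then drops the rows in which nothing matched; df_demo, mask, masked, and dropped are names chosen for this example.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(16).reshape(4, 4),
                       columns=['a', 'b', 'c', 'd'])

mask = df_demo < 5
masked = df_demo[mask]                      # same shape, NaN where mask is False
same = masked.equals(df_demo.where(mask))   # compare with the explicit where call

dropped = masked.dropna(how='all')          # remove rows without any match

print(same)
print(dropped)
```

Because NaN is a float, the integer columns in this example are converted to float in the masked result.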

3.3 Condition on a single column; keeps the rows where the condition is True
b2 = df['a'] > 50
print(b2,type(b2))
print(df[b2])
# can also be written as df[df['a'] > 50]

–> Output:

one      False
two       True
three     True
four     False
Name: a, dtype: bool <class 'pandas.core.series.Series'>

               a          b          c          d
two    61.818468  24.443819  72.064397  87.910932
three  66.612955  48.643065  36.655897  37.299216
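
Several single-column conditions can be combined with & (and) and | (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below rebuilds the frame from section 3.1 with values rounded to two decimals; the name df_demo is hypothetical.

```python
import pandas as pd

# Values from the output in section 3.1, rounded to two decimals
df_demo = pd.DataFrame(
    [[38.99, 81.01, 57.78, 6.77],
     [61.82, 24.44, 72.06, 87.91],
     [66.61, 48.64, 36.66, 37.30],
     [3.16, 25.30, 1.18, 49.94]],
    index=['one', 'two', 'three', 'four'],
    columns=['a', 'b', 'c', 'd'])

both = df_demo[(df_demo['a'] > 50) & (df_demo['d'] > 50)]    # both conditions hold
either = df_demo[(df_demo['a'] > 50) | (df_demo['d'] > 50)]  # at least one holds

print(both.index.tolist())     # ['two']
print(either.index.tolist())   # ['two', 'three']
```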

3.4 Condition on several columns; columns outside the mask come back as all NaN
b3 = df[['a','b']] > 50
print(b3,type(b3))
print(df[b3])
# can also be written as df[df[['a','b']] > 50]

–> Output:

           a      b
one    False   True
two     True  False
three   True  False
four   False  False <class 'pandas.core.frame.DataFrame'>

               a          b   c   d
one          NaN  81.009721 NaN NaN
two    61.818468        NaN NaN NaN
three  66.612955        NaN NaN NaN
four         NaN        NaN NaN NaN
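
If the goal is to keep whole rows rather than individual cells, one option is to reduce the multi-column mask to one boolean per row with any or all. This is a sketch of that idea, not something the article itself does; df_demo and cond are hypothetical names, and the values are those of section 3.1 rounded to two decimals.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [[38.99, 81.01, 57.78, 6.77],
     [61.82, 24.44, 72.06, 87.91],
     [66.61, 48.64, 36.66, 37.30],
     [3.16, 25.30, 1.18, 49.94]],
    index=['one', 'two', 'three', 'four'],
    columns=['a', 'b', 'c', 'd'])

cond = df_demo[['a', 'b']] > 50

any_rows = df_demo[cond.any(axis=1)]   # rows where a or b exceeds 50
all_rows = df_demo[cond.all(axis=1)]   # rows where both a and b exceed 50

print(any_rows.index.tolist())   # ['one', 'two', 'three']
print(all_rows.index.tolist())   # []
```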

3.5 Condition on several rows; rows outside the mask come back as all NaN
b4 = df.loc[['one','three']] < 50
print(b4,type(b4))
print(df[b4])
# can also be written as df[df.loc[['one','three']] < 50]

–> Output:

           a      b      c     d
one     True  False  False  True
three  False   True   True  True <class 'pandas.core.frame.DataFrame'>

               a          b          c          d
one    38.986549        NaN        NaN   6.768009
two          NaN        NaN        NaN        NaN
three        NaN  48.643065  36.655897  37.299216
four         NaN        NaN        NaN        NaN
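
The all-NaN rows appear because the mask is aligned to the full index and rows missing from the mask count as False. If only the selected rows are wanted, a possible alternative is to select them first and then apply the mask. A sketch under that assumption, with hypothetical names and the section 3.1 values rounded to two decimals:

```python
import pandas as pd

df_demo = pd.DataFrame(
    [[38.99, 81.01, 57.78, 6.77],
     [61.82, 24.44, 72.06, 87.91],
     [66.61, 48.64, 36.66, 37.30],
     [3.16, 25.30, 1.18, 49.94]],
    index=['one', 'two', 'three', 'four'],
    columns=['a', 'b', 'c', 'd'])

partial = df_demo.loc[['one', 'three']] < 50   # mask covers only two rows
full_shape = df_demo[partial]                  # rows 'two' and 'four' are all NaN

rows = df_demo.loc[['one', 'three']]           # select the rows first
rows_only = rows[rows < 50]                    # then mask within that selection

print(full_shape)
print(rows_only)
```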