pandas 索引与切片总结

本文将总结Pandas中Series和DataFrame的数据选择和索引方法。介绍Series的基本索引,包括用标签和位置下标的方式,以及切片操作。同时讲解了DataFrame的选择方式,包括按标签和位置的选择,并探讨了结合两者的方法。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Pandas: Indexing and selecting data

1.Introduction

In this article, I will summarize the various indexing methods in Pandas. The primary focus will be on Series and DataFrame.

Series is a one-dimensional labeled array capable of holding any data type. The axis labels are collectively referred to as the index.

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series object.

I will not discuss the details of these two types here. You can get more information about Series and DataFrame in pandas documentation.

Reference Document Links: https://pandas.pydata.org/docs/user_guide

2.Basics

Object TypeSelectionReturn Value Type
Seriesseries[label]scalar value
DataFrameframe[colname]Series (corresponding to colname)

2.1 basic indexing of Series

2.1.1 obj[…]

The basic indexing (obj[…]) of Series is similar to the indexing of ndarrays, except that the index values of Series can be more than just integers.

obj = pd.Series(np.arange(4),index=['a', 'b', 'c', 'd'])
'''
a    0
b    1
c    2
d    3
dtype: int32
'''

obj.index # Index(['a', 'b', 'c', 'd'], dtype='object')

I think the most confusing thing for the beginners is that you can index data in Series with both labels and positional subscripts. These two types of indexing can be respectively called label indexing and location indexing.

obj['b'] # 1

obj[1] #1

obj[['b', 'a', 'd']]
'''
b    1
a    0
d    3
dtype: int32
'''

obj[[1,0,3]]
'''
b    1
a    0
d    3
dtype: int32
'''

Of course you can also set the label to an integer. But this is not recommended, because you will not be able to distinguish between the label indexing and location indexing.

2.1.2 slicing

You can also use slicing to select data in Series.

obj['a':'c'] 
'''
a    0
b    1
c    2
dtype: int32
'''

obj[0:2]
'''
a    0
b    1
dtype: int32
'''

Can you spot the difference between slicing with subscripts and slicing with labels?

You should note that when slicing with labels, both the start bound AND the stop bound are included, if present in the index. BUT when slicing with subscripts, the stop bound are not included!

2.1.3 boolean indexing

Boolean indexing are still vaild in Pandas.

obj[obj<2]
'''
a    0
b    1
dtype: int32
'''

2.2 basic indexing of DataFrame

Select columns:

you can use a single value or sequence to select out one or more columns from DataFrame.

data = pd.DataFrame(np.arange(16).reshape((4,4)), 
                    index=['Ohio','Colorado','Utah','New York'], 
                    columns=['one','two','three','four'])
'''
	     one    two    three    four
Ohio      0      1       2       3
Colorado  4      5       6	     7
Utah      8      9       10	     11
New York  12     13      14	     15
'''

#use a single value
data['two']
'''
the result is a Series
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
'''

#use a sequence
data[['three','one']]
'''
the result is a DataFrame
	     three	one
Ohio       2	 0
Colorado   6	 4
Utah	  10	 8
New York  14	 12
'''

Select rows:

you can use slicing or boolean indexing to select out one or more rows.

#note the difference between slicing with labels and subscripts
data[:2]
'''
	      one two three	four
    Ohio	0	1	2	3
Colorado	4	5	6	7
'''

data[:'Utah']
'''
	      one two three	four
    Ohio	0	1	2	3
Colorado	4	5	6	7
    Utah	8	9	10	11
'''
data[data['three']>5]
'''
          one two three	four
Colorado	4	5	6	7
    Utah	8	9	10	11
New York	12	13	14	15
'''

3. Selection by label (.loc)

pandas allows you to select a subset of rows and columns of an array from a DataFrame in NumPy style syntax.

The .loc attribute is the primary access method. The following are valid inputs:

  • A single label, e.g. 5 or ‘a’ (Note that 5 is interpreted as a label of the index. This use is not an integer position along the index.)
  • A list or array of labels [‘a’,’b’,’c’]
  • A slice object with labels ‘a’:‘f’
  • A boolean array.
#Select a single value
data.loc['Colorado','two'] # 5

#Select single row and multiple columns
data.loc['Colorado',['two','three']]
'''
the result is a Series
two      5
three    6
Name: Colorado, dtype: int32
'''

#Select multiple rows and single column
data.loc[['Colorado','New York'],'two']
'''
the result is a Series
Colorado     5
New York    13
Name: two, dtype: int32
'''

#Select multiple rows and multiple columns
data.loc[['Colorado','New York'],['two','three']]
'''
the result is a DataFrame
	      two three
Colorado	5	6
New York	13	14
'''

#you also can use slicing
data.loc[:'Utah','two']
'''
Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int32
'''

4. Selection by position (.iloc)

pandas provides a suit of methods in order to get pure;y integer based indexing.

The .iloc attribute is the primary access method. The following are valid inputs:

  • An integer e.g. 5.
  • A list or array of integers [4, 3, 0].
  • A slice object with ints 1:7.
  • A boolean array.
#Select a single value
data.iloc[1,1] #5

#Select single row and multiple columns
data.iloc[1,[1,2]]
'''
two      5
three    6
Name: Colorado, dtype: int32
'''

#Select multiple rows and single column
data.iloc[[1,3],1]
'''
Colorado     5
New York    13
Name: two, dtype: int32
'''

#Select multiple rows and multiple columns
data.iloc[[1,3],[1,2]]
'''
	      two three
Colorado	5	6
New York	13	14
'''

#you also can use slicing
data.iloc[:2,1]
'''
Ohio        1
Colorado    5
Name: two, dtype: int32
'''

5. Combining positional and label-based indexing

If you wish to get the 0th and the 2nd elements from the index in the ‘A’ column, you can do:

dfd = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6]},
                   index=list('abc'))
'''
    A	B
a	1	4
b	2	5
c	3	6
'''

dfd.loc[dfd.index[[0, 2]], 'A']
'''
a    1
c    3
Name: A, dtype: int64
'''

This can also be expressed using .iloc, by explicitly getting locations on the indexers, and using positional indexing to select things.

dfd.iloc[[0, 2], dfd.columns.get_loc('A')]
'''
a    1
c    3
Name: A, dtype: int64
'''

For getting multiple indexers, using .get_indexer

dfd.columns.get_indexer(['A', 'B'])
# array([0, 1], dtype=int64)

dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]
'''
   A  B
a  1  4
c  3  6
'''

oc[[0, 2], dfd.columns.get_loc(‘A’)]
‘’’
a 1
c 3
Name: A, dtype: int64
‘’’


For getting *multiple* indexers, using **.get_indexer**

```python
dfd.columns.get_indexer(['A', 'B'])
# array([0, 1], dtype=int64)

dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]
'''
   A  B
a  1  4
c  3  6
'''
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值