Sorting and Statistics in pandas

Latest recommended article published on 2025-06-19 16:24:13

Shingle_

Licensed under CC 4.0 BY-SA

Categories: pandas, data analysis
Tags: python, pandas, data analysis

Article link: https://blog.youkuaiyun.com/shingle_/article/details/71480334

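This article covers the sorting and statistics methods that the pandas library offers for data analysis in Python: sorting by row or column index, sorting Series and DataFrame objects by value, indirect statistics, cumulative statistics, per-column summary statistics, correlation and covariance, and key operations such as unique values, value counts, and membership checks.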

《Python for Data Analysis》

Sorting

`sort_index()`

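Sorts by row or column index. By default it sorts the row labels; pass axis=1 to sort the column labels instead, and ascending=False to reverse the order.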

In [1]: import pandas as pd

In [2]: from pandas import DataFrame, Series

In [3]: obj = Series(range(4), index=['d','a','b','c'])

In [4]: obj
Out[4]:
d    0
a    1
b    2
c    3
dtype: int64

In [5]: obj.sort_index()
Out[5]:
a    1
b    2
c    3
d    0
dtype: int64

In [6]: import numpy as np

In [8]: frame = DataFrame(np.arange(8).reshape((2,4)), index=['three','one'],
   ...:                   columns=['d','a','b','c'])

In [9]: frame
Out[9]:
       d  a  b  c
three  0  1  2  3
one    4  5  6  7

In [10]: frame.sort_index()
Out[10]:
       d  a  b  c
one    4  5  6  7
three  0  1  2  3

In [11]: frame.sort_index(axis=1)
Out[11]:
       a  b  c  d
three  1  2  3  0
one    5  6  7  4

In [12]: frame.sort_index(axis=1, ascending=False)
Out[12]:
       d  c  b  a
three  0  3  2  1
one    4  7  6  5
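
As a short recap in script form, the sketch below rebuilds the same frame and checks that sort_index() returns a new object rather than reordering frame in place. The names by_rows and by_cols are only illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])

# sort_index() returns a sorted copy; frame itself keeps its original order.
by_rows = frame.sort_index()        # sort by row labels (axis=0 is the default)
by_cols = frame.sort_index(axis=1)  # sort by column labels

print(list(frame.index))      # ['three', 'one']
print(list(by_rows.index))    # ['one', 'three']
print(list(by_cols.columns))  # ['a', 'b', 'c', 'd']
```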

`sort_values()`

Sorts a Series by its values. Missing values are placed at the end of the Series by default.

In [18]: obj = Series([4, np.nan, 6, np.nan, -3, 2])

In [19]: obj
Out[19]:
0    4.0
1    NaN
2    6.0
3    NaN
4   -3.0
5    2.0
dtype: float64

In [21]: obj.sort_values()
Out[21]:
4   -3.0
5    2.0
0    4.0
2    6.0
1    NaN
3    NaN
dtype: float64
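
The default placement of missing values can be changed. The sketch below uses the same Series to show two variations that are not in the original session: sorting in descending order, where the NaN entries still end up last, and using the na_position parameter to move them to the front.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 6, np.nan, -3, 2])

# Descending sort: NaN values are still placed last by default.
desc = obj.sort_values(ascending=False)

# na_position='first' moves the NaN values to the top instead.
nan_first = obj.sort_values(na_position='first')

print(desc)
print(nan_first)
```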

On a DataFrame, you can sort by the values in one or more columns. To do so, pass a column name, or a list of column names, to the by option:

In [16]: frame.sort_values(by='b')
Out[16]:
       d  a  b  c
three  0  1  2  3
one    4  5  6  7

Summarizing and Statistics

sum, mean, max

Options for reduction methods

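Option	Description
axis	Axis to reduce over: 0 for the DataFrame's rows, 1 for its columns
skipna	Exclude missing values; defaults to True
level	If the axis is hierarchically indexed (MultiIndex), reduce grouped by level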
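
The options in the table can be tried out with a minimal sketch like the one below. The DataFrame df and the Series s are made-up illustrative data. One assumption to flag: as far as I know, recent pandas releases (2.0 and later) no longer accept the level argument on reduction methods, so the last step uses groupby(level=...), which should give the same grouped result on both older and newer versions.

```python
import numpy as np
import pandas as pd

# Illustrative data with some missing values.
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

col_sums = df.sum()                        # axis=0: one result per column
row_sums = df.sum(axis=1)                  # axis=1: one result per row
strict = df.mean(axis=1, skipna=False)     # any NaN in a row makes the result NaN

# Grouped reduction over one level of a MultiIndex (assumed replacement
# for the `level` option on newer pandas versions).
idx = pd.MultiIndex.from_tuples([('x', 1), ('x', 2), ('y', 1)],
                                names=['key', 'n'])
s = pd.Series([10, 20, 30], index=idx)
per_key = s.groupby(level='key').sum()

print(col_sums)
print(row_sums)
print(strict)
print(per_key)
```

With skipna left at its default of True, a row or column that contains only NaN sums to 0.0. With skipna=False, a single NaN is enough to make the result NaN.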