Solutions for Computing the Mode of a DataFrame

License: CC 4.0 BY-SA

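This post discusses a problem that arises when grouping a DataFrame in Pandas and computing the mode of one column, along with several workarounds: a custom counting function, scipy.stats.mode, value_counts, and pd.Series.mode.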

I ran into the following problem while using Pandas.

Given the DataFrame below, I want the mode of B for each category of A. However, DataFrame.groupby('A').mode() cannot be used; it raises the following error.

>>import pandas as pd
>>df = pd.DataFrame({'A':['a','a','a','a','b','b','b','b','b'],'B':[1,1,2,3,1,2,2,3,3]})
>>df.groupby('A').mode()
Traceback (most recent call last):

  File "<ipython-input-293-3972c0972961>", line 1, in <module>
    df.groupby('A').mode()

  File "d:\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 762, in __getattr__
    return self._make_wrapper(attr)

  File "d:\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 799, in _make_wrapper
    raise AttributeError(msg)

AttributeError: Cannot access callable attribute 'mode' of 'DataFrameGroupBy' objects, try using the 'apply' method
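
The error message itself points to `apply`. As a minimal sketch (not necessarily the route taken in the rest of this post), `pd.Series.mode` can be passed to `apply` on the selected column. This assumes a reasonably recent pandas; the result has one row per mode, so ties stay visible.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
                   'B': [1, 1, 2, 3, 1, 2, 2, 3, 3]})

# Series.mode returns a Series holding every mode of the group,
# so apply() stacks the per-group results under a (group key, position) index.
modes_long = df.groupby('A')['B'].apply(pd.Series.mode)

# Group a contributes one value (1); group b contributes two (2 and 3).
print(modes_long)
```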

First, define the following counting function.

# Count how many times each distinct value occurs
def getlistnum(li):
    li = list(li)
    set1 = set(li)
    dict1 = {}
    for item in set1:
        # map each distinct value to its number of occurrences
        dict1.update({item:li.count(item)})
    return dict1

Check the per-group counts of df:

>>df.groupby('A')['B'].apply(getlistnum)
A   
a  1    2
   2    1
   3    1
b  1    1
   2    2
   3    2
Name: B, dtype: int64
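
The same per-group counts can probably be obtained without a custom function. The sketch below is an alternative to `getlistnum`, not the author's method, and relies on the built-in `value_counts` of a grouped column.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
                   'B': [1, 1, 2, 3, 1, 2, 2, 3, 3]})

# Count occurrences of each B value within each A group.
# The result is a Series indexed by (A, B) pairs.
counts = df.groupby('A')['B'].value_counts()

# Look up a single count by its (group, value) pair.
print(counts.loc[('a', 1)])
print(counts.loc[('b', 3)])
```

Within each group the rows come back ordered by count rather than by value, so the layout differs slightly from the table above even though the numbers agree.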

Since groupby() can call external functions through agg, I tried the following approaches:

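Method 1. Use scipy.stats.mode(): group b in df has two modes (2 and 3), and the smaller one is returned. Note that on SciPy 1.11 and later, stats.mode returns scalars for 1-D input unless keepdims=True is passed, so the [0][0] indexing below applies to older versions.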

>>from scipy import stats
>>df.groupby('A').agg(lambda x: stats.mode(x)[0][0]).reset_index()
   A  B
0  a  1
1  b  2

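Method 2. Use value_counts(): when a group has two or more modes, only one is returned. In this run the larger mode (3) came back for group b, but the order of tied values in value_counts() may differ between pandas versions, so this should not be relied on.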

>>df.groupby('A').agg(lambda x: x.value_counts().index[0]).reset_index()
   A  B
0  a  1
1  b  3
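
If the tie-breaking rule matters, it can be made explicit instead of depending on the order `value_counts()` happens to return. The helper below, `mode_from_counts`, and its `pick` parameter are hypothetical names introduced for illustration only; this is a sketch, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
                   'B': [1, 1, 2, 3, 1, 2, 2, 3, 3]})

def mode_from_counts(s, pick=max):
    """Return one mode of s, resolving ties with `pick` (hypothetical helper)."""
    counts = s.value_counts()
    # Keep every value whose count equals the highest count.
    tied = counts[counts == counts.max()].index
    return pick(tied)

# Largest mode per group: a -> 1, b -> 3
largest = df.groupby('A')['B'].agg(lambda s: mode_from_counts(s, pick=max))

# Smallest mode per group: a -> 1, b -> 2
smallest = df.groupby('A')['B'].agg(lambda s: mode_from_counts(s, pick=min))
```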

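Method 3. Use pd.Series.mode(): this function returns the mode(s) of a Series. When a group has several modes, the resulting cell holds a list-like value containing all of them.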

>>df.groupby('A').agg(pd.Series.mode).reset_index()
   A       B
0  a       1
1  b  [2, 3]
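
The result above mixes scalars and list-like values in one column, which can be awkward downstream. The sketch below shows two ways to get a uniform result, assuming either "always a list" or "always the smallest mode" is acceptable; neither is presented as the author's approach.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
                   'B': [1, 1, 2, 3, 1, 2, 2, 3, 3]})

grouped = df.groupby('A')['B']

# Always a list, even when the group has a single mode.
all_modes = grouped.apply(lambda s: s.mode().tolist())

# Series.mode returns its values in sorted order,
# so iloc[0] is the smallest mode of the group.
smallest_mode = grouped.agg(lambda s: s.mode().iloc[0])
```

Here `all_modes` keeps every tied value, while `smallest_mode` gives the same answer as Method 1 for this data.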