GroupBy objects: DataFrameGroupBy and SeriesGroupBy
import pandas as pd
import numpy as np
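# data1 and data2 are random, so the numbers change on every run;
# the sample outputs in these notes were captured from different runs
dict_obj = {'key1' : ['a', 'b', 'a', 'b',
                      'a', 'b', 'a', 'a'],
            'key2' : ['one', 'one', 'two', 'three',
                      'two', 'two', 'one', 'three'],
            'data1': np.random.randn(8),
            'data2': np.random.randn(8)}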
df = pd.DataFrame(dict_obj)
#print(df)
# The module path in the printed class name varies by pandas version
print(type(df.groupby('key1')))  # e.g. <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
print(type(df['data1'].groupby(df['key1'])))  # e.g. <class 'pandas.core.groupby.generic.SeriesGroupBy'>
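The two object types can also be checked with isinstance instead of reading the printed class name. The sketch below is a minimal illustration, assuming a small fixed frame named demo (hypothetical, not the random data above) and importing the classes from pandas.core.groupby, which may move between pandas versions. It also shows that selecting one column from a DataFrameGroupBy gives a SeriesGroupBy.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, SeriesGroupBy

# Small fixed frame (illustrative values, not the random data above)
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1.0, 2.0, 3.0, 4.0],
    'data2': [10.0, 20.0, 30.0, 40.0],
})

by_frame = demo.groupby('key1')                   # grouping a DataFrame
by_column = demo.groupby('key1')['data1']         # selecting one column afterwards
by_series = demo['data1'].groupby(demo['key1'])   # grouping a Series directly

print(isinstance(by_frame, DataFrameGroupBy))
print(isinstance(by_column, SeriesGroupBy))
print(isinstance(by_series, SeriesGroupBy))
```

Both by_column and by_series group the same values by the same keys, so either spelling can be used for a single-column computation.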
Group-wise computation
group1 = df.groupby('key1')
group2 = df['data1'].groupby(df['key1'])
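# key2 holds strings, so restrict the mean to numeric columns
# (recent pandas versions raise a TypeError otherwise)
print(group1.mean(numeric_only=True))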
"""
         data1     data2
key1
a     0.524696 -0.544060
b    -0.739816  0.023044
"""
print(group2.mean())
"""
key1
a   -0.426066
b   -0.629560
Name: data1, dtype: float64
"""
size() returns the number of elements in each group
print(group1.size())
print(group2.size())
"""
key1
a    5
b    3
dtype: int64
key1
a    5
b    3
Name: data1, dtype: int64
"""
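Note that size() counts rows, including rows with missing values, whereas count() counts only non-missing values. The sketch below illustrates the difference on a hypothetical frame demo that contains NaN entries.

```python
import numpy as np
import pandas as pd

# Illustrative frame with missing values in data1
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b', 'a'],
    'data1': [1.0, np.nan, 3.0, 4.0, np.nan],
})

grouped = demo.groupby('key1')
sizes = grouped.size()              # rows per group, NaN rows included
counts = grouped['data1'].count()   # non-missing data1 values per group

print(sizes)
print(counts)
```

Here group a has three rows but only two non-missing values, so the two results differ.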
Multi-level grouping by several columns
grouped3 = df.groupby(['key1', 'key2'])
print(grouped3.size())
"""
key1  key2
a     one      2
      three    1
      two      2
b     one      1
      three    1
      two      1
dtype: int64
"""
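The result of a multi-key groupby is indexed by a MultiIndex built from the grouping keys. As a rough sketch, assuming a small hypothetical frame demo, the following shows how to look up a single key pair, slice by the outer key, and pivot the inner key into columns with unstack.

```python
import pandas as pd

# Illustrative frame: only the grouping keys matter for size()
demo = pd.DataFrame({
    'key1': ['a', 'a', 'a', 'b', 'b'],
    'key2': ['one', 'one', 'two', 'one', 'three'],
})

sizes = demo.groupby(['key1', 'key2']).size()

print(sizes.loc[('a', 'one')])   # count for one (key1, key2) pair
print(sizes.loc['a'])            # all key2 counts where key1 == 'a'

# Move key2 into the columns; combinations that never occur become 0
table = sizes.unstack(fill_value=0)
print(table)
```

The unstacked table is often easier to read than the nested index when both keys have only a few distinct values.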
GroupBy objects support iteration
# Each iteration yields the group name and the rows belonging to that group
for groupname, groupdata in group1:
    print(groupname)
    print(groupdata)
"""
a
  key1   key2     data1     data2
0    a    one -1.922410  0.188846
2    a    two -0.397336 -0.030794
4    a    two -2.252202 -0.524890
6    a    one  0.496702  0.421412
7    a  three  0.060959  1.270431
b
  key1   key2     data1     data2
1    b    one  0.439111 -1.526786
3    b  three  0.055915  0.841940
5    b    two  1.172161  1.340567
"""
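Iteration works the same way with several grouping keys, except that the group name is then a tuple of key values. The sketch below, using a hypothetical frame demo, also shows two common follow-ups: collecting the groups into a dict and fetching one group directly with get_group.

```python
import pandas as pd

# Illustrative frame with fixed values
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'key2': ['one', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0],
})

# With two keys, each group name is a (key1, key2) tuple
group_sizes = {}
for (k1, k2), rows in demo.groupby(['key1', 'key2']):
    group_sizes[(k1, k2)] = len(rows)
print(group_sizes)

# Collect single-key groups into a dict: group name -> sub-DataFrame
pieces = dict(list(demo.groupby('key1')))
print(pieces['b'])

# Fetch one group without iterating over all of them
b_rows = demo.groupby('key1').get_group('b')
print(b_rows)
```

The dict form is handy when several groups are needed repeatedly, while get_group is enough for a single lookup.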