<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001CCD8450910>
grouped.mean()
key1
a 0.090859
b 0.804812
Name: data1, dtype: float64
means = df['data1'].groupby([df['key1'],df['key2']]).mean()
means
key1 key2
a one -0.037649
two 0.347874
b one 0.399766
two 1.209857
Name: data1, dtype: float64
means.unstack()
key2       one       two
key1
a    -0.037649  0.347874
b     0.399766  1.209857
states = np.array(['Ohio','California','California','Ohio','Ohio'])
years = np.array([2005,2005,2006,2005,2006])
df['data1'].groupby([states,years]).mean()
df.groupby(['key1','key2']).size()
key1 key2
a one 2
two 1
b one 1
two 1
dtype: int64
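The two cells above group by keys that are not columns of the frame and then count group members. A minimal sketch of both, assuming a small hypothetical `df` with fixed values in place of the random data used in this notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values so the results are reproducible
# (the notebook itself fills data1 with random numbers).
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Group keys can be any arrays of the right length, not only columns of df.
states = np.array(["Ohio", "California", "California", "Ohio", "Ohio"])
years = np.array([2005, 2005, 2006, 2005, 2006])
by_arrays = df["data1"].groupby([states, years]).mean()

# size() returns the number of rows in each group.
group_sizes = df.groupby(["key1", "key2"]).size()

print(by_arrays)
print(group_sizes)
```

The arrays are matched to rows by position, so each one must have the same length as the Series being grouped.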
Iterating over groups
for name, group in df.groupby('key1'):
    print(name)
    print(group)
a
key1 key2 data1 data2
0 a one -0.074122 -0.571432
1 a two 0.347874 -0.794645
4 a one -0.001175 0.180895
b
key1 key2 data1 data2
2 b one 0.399766 -0.596056
3 b two 1.209857 -0.266257
for (k1, k2), group in df.groupby(['key1', 'key2']):
    print(k1, k2)
    print(group)
a one
key1 key2 data1 data2
0 a one -0.074122 -0.571432
4 a one -0.001175 0.180895
a two
key1 key2 data1 data2
1 a two 0.347874 -0.794645
b one
key1 key2 data1 data2
2 b one 0.399766 -0.596056
b two
key1 key2 data1 data2
3 b two 1.209857 -0.266257
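Because iterating a GroupBy yields `(name, group)` pairs, the pieces can be collected into a dict for later lookup. A minimal sketch, assuming a hypothetical `df` with fixed values in place of the notebook's random data:

```python
import pandas as pd

# Hypothetical fixed values (the notebook uses random numbers).
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Single key: the group name is a scalar.
pieces = {name: group for name, group in df.groupby("key1")}

# Multiple keys: the group name is a tuple, which can be unpacked in the loop.
pair_sizes = {(k1, k2): len(group)
              for (k1, k2), group in df.groupby(["key1", "key2"])}

print(pieces["b"])
print(pair_sizes)
```

Each value in `pieces` is an ordinary DataFrame that keeps the original row labels.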
{dtype('float64'): data1 data2
0 -0.074122 -0.571432
1 0.347874 -0.794645
2 0.399766 -0.596056
3 1.209857 -0.266257
4 -0.001175 0.180895,
dtype('O'): key1 key2
0 a one
1 a two
2 b one
3 b two
4 a one}
<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001CCD8452DA0>
s_grouped.mean()
key1 key2
a one -0.195268
two -0.794645
b one -0.596056
two -0.266257
Name: data2, dtype: float64
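Indexing a GroupBy object with a column name is shorthand for grouping that column by the same keys. A minimal sketch comparing the two spellings, assuming a hypothetical `df` with fixed values and taking `s_grouped` to be built by column selection, which the output above suggests but does not show:

```python
import pandas as pd

# Hypothetical fixed values (the notebook uses random numbers).
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data2": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Select the column on the grouped frame (gives a SeriesGroupBy).
s_grouped = df.groupby(["key1", "key2"])["data2"]
via_selection = s_grouped.mean()

# Equivalent long form: group the column by the key Series.
via_series = df["data2"].groupby([df["key1"], df["key2"]]).mean()

print(via_selection)
print(via_selection.equals(via_series))
```

Selecting the column first means only `data2` is aggregated, which matters on wide frames.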
Grouping with dicts and Series
people = DataFrame(np.random.randn(5,5),
columns=['a','b','c','d','e'],
index=['Joe','Steve','Wes','Jim','Travis'])
people.iloc[2:3, [1, 2]] = np.nan  # add a few NA values
people