Given the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'group': [1, 1, 2, 3, 3, 3, 4],
'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)
#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a
# 6      4   NaN
Desired result: for each param value, the number of distinct groups it appears in:
# a    2
# b    1
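Method 1
Group by param and count the distinct group values with nunique(). groupby excludes NaN keys by default, so the NaN rows do not appear in the result.
print(df.groupby('param')['group'].nunique())
# param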
# a 2
# b 1
# Name: group, dtype: int64
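To illustrate why nunique() is used here rather than count(), the following minimal sketch compares the two on the same data. The variable names (grouped, row_counts, distinct_counts) are only illustrative and not part of the original note.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

grouped = df.groupby('param')['group']

# count() tallies every row in which the param value occurs
row_counts = grouped.count()

# nunique() tallies each group only once per param value
distinct_counts = grouped.nunique()

print(row_counts)
print(distinct_counts)
```

For 'a', count() counts all four rows, while nunique() counts only the two distinct groups (1 and 3) that contain it.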
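Method 2
- Collect the unique non-null param values of each group with unique()
- Build a new DataFrame from those arrays with DataFrame.from_records()
- Reshape it to a Series with stack()
- Count the occurrences of each value with value_counts()
# Unique non-null param values per group
a = df[df.param.notnull()].groupby('group')['param'].unique()
# One row per group, stacked into a single Series, then counted
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())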
# a 2
# b 1
# dtype: int64
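As a possible third variant, not taken from the original note, the same counts can likely be obtained by dropping duplicate (group, param) pairs first and then counting the param values. This is a sketch under the assumption that only non-null param values matter; the names pairs and counts are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# Keep rows with a param value, then keep one row per (group, param) pair
pairs = df.dropna(subset=['param']).drop_duplicates(['group', 'param'])

# Each remaining row is one distinct group for its param value
counts = pairs['param'].value_counts()

print(counts)
```

This avoids building an intermediate DataFrame and reshaping it with stack(), at the cost of deduplicating explicitly.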