初学python,主要是记录自己的学习经历,也方便日后查找使用。
欢迎在评论区留下你的问题,我们一起探讨,一起成长
第1部分:汇总及简单运算
1.1 python中groupby的用法
df_group = df.groupby(['Month']).agg(user_count = ('phone','nunique') # 统计phone的不同值个数
,id_count = ('orderid','count')
,sum_amount = ('actual_amount','sum')).reset_index()
1.2 python中使用窗口函数,获取最大值、最小值、平均值等
# 最小值
df_min['min_time'] = df.groupby(['phone'])['creat_time'].transform('min')
# 最大值
df_max['max_time'] = df.groupby(['phone'])['creat_time'].transform('max')
# 平均值
df_mean['mean_time'] = df.groupby(['phone'])['creat_time'].transform('mean')
1.3 python中concat的用法
# 两个df文件上下拼接时,axis = 0(可以省略)
df_concat = pd.concat([df1, df2])
# 两个df文件进行左右拼接
df_concat = pd.concat([df1, df2], axis=1) #左右拼接
1.4 把列名和内容拼接后放在df中
df1 = pd.DataFrame({'A': [1], 'B': [3],'C':[4],'good':[1],'bad':[1]})
df2 = pd.DataFrame({'A': [5,6], 'C': [7, 8], 'D': [7, 8],'good':[3,4],'bad':[5,6]})
df3 = pd.DataFrame({'E': [5,6], 'F': [7, 8], 'G': [7, 8],'good':[3,4],'bad':[5,6]})
for i in range(0,3):
df1[df1.columns[i]] = df1.columns[i] + '&'+df1[df1.columns[i]].astype('str')
df2[df2.columns[i]] = df2.columns[i] + '&'+df2[df2.columns[i]].astype('str')
df3[df3.columns[i]] = df3.columns[i] + '&'+df3[df3.columns[i]].astype('str')
df1 = df1.rename(columns={str(df1.columns[i]): 'fea' + str(i+1)})
df2 = df2.rename(columns={str(df2.columns[i]): 'fea' + str(i+1)})
df3= df3.rename(columns={str(df3.columns[i]): 'fea' + str(i+1)})
final = pd.concat([df1,df2,df3])
1.5 python中merge的用法
# 左链接how = 'left',对比SQL中的left join
df_merge = pd.merge(df1, df2, on = ['customerId', 'phone'], how = 'left')
# 右链接how = 'right',对比SQL中的right join
df_merge = pd.merge(df1, df2, on = ['customerId', 'phone'], how = 'right')
# 内链接how = 'inner',对比SQL中的inner join
df_merge = pd.merge(df1, df2, on = ['customerId', 'phone'], how = 'inner')