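Binning discretizes continuous data into a fixed set of intervals. There are two kinds: equal-width binning and equal-frequency binning.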
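1. Equal-width binning
- pd.cut()
- Parameter bins: the number of bins (an int), or a list of explicit bin edges
- Parameter right: True (default) makes each interval left-open and right-closed; False makes it left-closed and right-open
- Parameter labels: the labels assigned to the resulting bins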
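import numpy as np
import pandas as pd
# 5 rows x 3 columns of random integers in [0, 100)
data = np.random.randint(0,100,size=(5,3))
df = pd.DataFrame(data=data,columns=['Python','Pandas','PyTorch'])
print(df)
print('##################################')
# Equal-width: split the observed range of df.Python into 4 bins of the same width
s = pd.cut(df.Python,bins=4)
print(s)
print('##################################')
# Explicit edges with right=False: [0,30) -> D, [30,60) -> C, [60,80) -> B, [80,100) -> A
s = pd.cut(df.Python,bins=[0,30,60,80,100],right=False,labels=['D','C','B','A'])
print(s)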
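Because the snippet above uses random data, its output changes on every run. The following is a minimal sketch with fixed, made-up scores (the `scores` values and variable names are assumptions chosen for illustration) that shows how `bins=4` derives its edges and how `right` changes which bin a boundary value such as 30, 60, or 80 lands in.

```python
import pandas as pd

# Hypothetical fixed scores, chosen so that some values sit exactly on a bin edge
scores = pd.Series([5, 30, 59, 60, 80, 99])

# retbins=True also returns the edges that pd.cut computed for bins=4
binned, edges = pd.cut(scores, bins=4, retbins=True)
print(edges)

edge_list = [0, 30, 60, 80, 100]
grades = ['D', 'C', 'B', 'A']

# right=True (default): intervals are (0,30], (30,60], (60,80], (80,100]
right_closed = pd.cut(scores, bins=edge_list, labels=grades)

# right=False: intervals are [0,30), [30,60), [60,80), [80,100)
left_closed = pd.cut(scores, bins=edge_list, right=False, labels=grades)

print(pd.DataFrame({
    'score': scores,
    'right=True': right_closed,
    'right=False': left_closed,
}))
```

With an integer `bins`, the inner edges are evenly spaced between the minimum and maximum of the data, and the lowest edge is moved down slightly so that the minimum value is included in the first bin. In the second part, the score 30 is graded D when `right=True` and C when `right=False`.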
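2. Equal-frequency binning
- pd.qcut()
- Parameter q: the number of quantile groups; each bin receives roughly the same number of values
- Parameter labels: the labels assigned to the resulting bins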
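import numpy as np
import pandas as pd
# 5 rows x 3 columns of random integers in [0, 100)
data = np.random.randint(0,100,size=(5,3))
df = pd.DataFrame(data=data,columns=['Python','Pandas','PyTorch'])
print(df)
print('##################################')
# Equal-frequency: 3 quantile groups; the lowest third is labeled C, the highest A
# (qcut raises ValueError if repeated values make two bin edges coincide)
s = pd.qcut(df.Python,q=3,labels=['C','B','A'])
print(s)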
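The difference between the two methods is easiest to see on skewed data. The following is a minimal sketch, assuming a made-up series with one large outlier (the `values` data is hypothetical), that counts how many values fall into each bin under `pd.cut` and `pd.qcut`.

```python
import pandas as pd

# Hypothetical skewed data: eight small values and one outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Equal-width: three intervals of the same width across the range 1..100
equal_width = pd.cut(values, bins=3)

# Equal-frequency: three intervals cut at the 1/3 and 2/3 quantiles
equal_freq = pd.qcut(values, q=3)

# Count the values per bin, listed in bin order
width_counts = equal_width.value_counts().sort_index()
freq_counts = equal_freq.value_counts().sort_index()

print(width_counts)
print(freq_counts)
```

Here equal-width binning puts eight of the nine values into the first bin and leaves the middle bin empty, while equal-frequency binning places three values in each bin at the cost of bins with very different widths.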
These notes summarize a course I followed on Bilibili, "千锋教育Pandas数据分析从入门到实战,零基础小白保姆级Python数据分析教程": 001_Pandas_Pandas介绍_哔哩哔哩_bilibili