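Plotting a DataFrame histogram with seaborn
We often need to draw a histogram to visualize the distribution of a feature or variable.
How do we get a useful plot quickly and get the job done? If what we care about is the frequency of each value, seaborn provides a convenient function, countplot(), which draws the chart without us having to tally the data ourselves.
See the example below:
Get the data and draw a count plot: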
%matplotlib inline
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic['class'] = titanic['class'].astype('str')
display(titanic)
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 | 13.0000 | S | Second | man | True | NaN | Southampton | no | True |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 | 30.0000 | S | First | woman | False | B | Southampton | yes | True |
| 888 | 0 | 3 | female | NaN | 1 | 2 | 23.4500 | S | Third | woman | False | NaN | Southampton | no | False |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | First | man | True | C | Cherbourg | yes | True |
| 890 | 0 | 3 | male | 32.0 | 0 | 0 | 7.7500 | Q | Third | man | True | NaN | Queenstown | no | True |
891 rows × 15 columns
sns.set_theme(style="darkgrid")
ax = sns.countplot(x="embark_town", data=titanic)
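To see what countplot() saves us from doing by hand, here is a minimal sketch that tallies the values with pandas and compares them with the bar heights seaborn draws. The small `toy` DataFrame is a hypothetical stand-in for the titanic data (so the snippet runs without downloading anything), and the `Agg` backend is only an assumption so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for titanic["embark_town"]
toy = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Southampton",
                    "Queenstown", "Southampton", "Cherbourg"]
})

# Manual tally: the frequency of each distinct value
manual_counts = toy["embark_town"].value_counts()

# countplot() does the same tally internally and draws one bar per value
ax = sns.countplot(x="embark_town", data=toy)

# Read the bar heights back from the Axes, ignoring any zero-height patches
bar_heights = [int(round(p.get_height())) for p in ax.patches if p.get_height() > 0]

print(manual_counts)
print(bar_heights)
```

The bar heights are the same numbers value_counts() returns, which is why there is no need to aggregate the data before plotting.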
What if the feature has too many distinct values to show all of them in one plot?
# get the distinct values first, then choose the top n values we want to present; here we choose 2 as an example
sub_index = titanic['class'].value_counts().index[:2]
sub_data = titanic[titanic['class'].isin(sub_index)]
sub_data = sub_data.reset_index(drop=True)
ax = sns.countplot(x="class", data=sub_data)
# we can also set the bar order explicitly; reversing the value_counts() index gives ascending order by count
ax = sns.countplot(x="class", data=sub_data, order=sub_index[::-1])
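If this top-n filtering is needed for several columns, the steps can be wrapped in a small function. The sketch below is one possible way to do it; `keep_top_n` is a hypothetical helper name (not part of pandas or seaborn), and the `toy` DataFrame is made-up data used only for illustration.

```python
import pandas as pd

def keep_top_n(df, column, n):
    """Keep only the rows whose value in `column` is among the n most frequent.

    Returns the filtered DataFrame and the kept values, most frequent first.
    """
    # value_counts() sorts by frequency in descending order by default
    top_values = df[column].value_counts().index[:n]
    subset = df[df[column].isin(top_values)].reset_index(drop=True)
    return subset, list(top_values)

# Hypothetical toy data standing in for titanic["class"]
toy = pd.DataFrame({"class": ["Third", "First", "Third", "Second", "Third", "First"]})

subset, order = keep_top_n(toy, "class", 2)
print(order)        # the two most frequent classes, most frequent first
print(len(subset))  # number of rows that survive the filter
```

The returned list can be passed to countplot as `order=order` for descending bars, or `order=order[::-1]` for ascending ones.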
Now, how do we show the value counts for two categorical variables at once? Pass the second variable as hue:
ax = sns.countplot(x="class", hue="who", data=titanic)
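The numbers behind a plot grouped by hue can be checked with a cross-tabulation. This is a minimal sketch using pandas' `crosstab` on a hypothetical `toy` DataFrame (made-up rows, not the real titanic data): each cell of the table corresponds to the height of one bar in the grouped count plot.

```python
import pandas as pd

# Hypothetical toy data standing in for titanic[["class", "who"]]
toy = pd.DataFrame({
    "class": ["First", "Third", "Third", "First", "Third", "Second"],
    "who":   ["woman", "man",   "man",   "man",   "child", "woman"],
})

# Rows are the x variable, columns are the hue variable;
# each cell counts the rows with that (class, who) combination
table = pd.crosstab(toy["class"], toy["who"])
print(table)
```

Combinations that never occur show up as 0 in the table, which corresponds to a missing bar in the plot.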