Plotting DataFrame histograms with seaborn

robot_learner

Article link: https://blog.youkuaiyun.com/robot_learner/article/details/118538488

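We often need to plot a histogram to visualize the distribution of a feature or variable.
How can we get a useful plot quickly and get the job done? If what we care about is the frequency of each value, seaborn provides a convenient function, countplot(), which draws the plot without us having to tally the data ourselves.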
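To make the "no manual counting" point concrete, here is a minimal sketch that compares the counts from pandas with the bar heights drawn by countplot(). The toy DataFrame and its `town` column are hypothetical stand-ins, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for a real feature column
toy = pd.DataFrame({"town": ["A", "B", "A", "C", "A", "B"]})

# Manual route: tally the values ourselves
manual_counts = toy["town"].value_counts()

# countplot route: seaborn does the tallying internally
ax = sns.countplot(x="town", data=toy)

# Read the bar heights back from the axes (skip any zero-height helper patches)
bar_heights = sorted(int(p.get_height()) for p in ax.patches if p.get_height() > 0)
```

The bar heights should match the values returned by value_counts(), which is all countplot() adds on top of a plain bar chart.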

See the examples below:

get the data and do a count plot

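%matplotlib inline
import seaborn as sns

# Load the Titanic example dataset that ships with seaborn (fetched on first use)
titanic = sns.load_dataset("titanic")
# Store 'class' as plain strings instead of its original dtype
titanic['class'] = titanic['class'].astype('str')
display(titanic)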

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
886	0	2	male	27.0	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19.0	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	NaN	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26.0	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32.0	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns
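The post does not say why `class` is converted to strings. One plausible reason (an assumption on my part) is that seaborn loads this column as a pandas categorical, and a categorical keeps all of its categories even after rows are filtered out, which can leave empty slots in later plots. A small sketch with a hypothetical toy Series:

```python
import pandas as pd

# Hypothetical toy column with a categorical dtype
cls = pd.Series(["First", "Second", "Third", "Third"], dtype="category")

# Keep only two of the three classes
subset = cls[cls.isin(["First", "Second"])]

# The categorical still remembers 'Third' even though no row uses it
kept_categories = list(subset.cat.categories)

# Converting to strings leaves only the values that are actually present
as_strings = sorted(subset.astype(str).unique())

# Alternative that keeps the categorical dtype but drops unused categories
trimmed = list(subset.cat.remove_unused_categories().cat.categories)
```

Either astype(str) or remove_unused_categories() would avoid the leftover category; the post uses the former.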

sns.set_theme(style="darkgrid")
ax = sns.countplot(x="embark_town", data=titanic)

[Figure: count plot of passengers per embark_town]
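One thing worth checking before trusting the bars: the displayed table contains NaN entries, and a count plot should only show bars for non-missing values. The sketch below, on a hypothetical toy column, shows how to see the missing entries in pandas with value_counts(dropna=False).

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two missing entries
toy = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", np.nan,
                    "Southampton", "Queenstown", np.nan]
})

# Default behaviour: missing values are left out of the tally
counts = toy["embark_town"].value_counts()

# dropna=False adds a NaN row so the missing entries become visible
counts_with_nan = toy["embark_town"].value_counts(dropna=False)

# Direct count of missing entries
n_missing = int(toy["embark_town"].isna().sum())
```

If the missing share matters, it can be reported next to the plot or filled with an explicit label such as "Unknown" before plotting.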

what if we have too many values for the feature, and we can’t plot all of their distributions in the histogram?

# get the distinct values first, then choose the top n values we want to present; here we choose 2 as an example

sub_index = titanic['class'].value_counts().index[:2]
sub_data = titanic[titanic['class'].isin(sub_index)]
sub_data = sub_data.reset_index(drop=True)

ax = sns.countplot(x="class", data=sub_data)

[Figure: count plot of the two most frequent classes]
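The three filtering steps above can be wrapped in a small helper so that n is easy to change. This is only a sketch: `top_n_subset` is a hypothetical name, not a seaborn or pandas function, and the toy data is made up.

```python
import pandas as pd

def top_n_subset(df, column, n):
    """Return the rows whose `column` value is among the n most frequent values,
    together with those values ordered from most to least frequent."""
    top_values = df[column].value_counts().index[:n]
    subset = df[df[column].isin(top_values)].reset_index(drop=True)
    return subset, list(top_values)

# Hypothetical toy data: 'Third' appears 3 times, 'First' twice, 'Second' once
toy = pd.DataFrame({"class": ["Third", "First", "Third", "Second", "First", "Third"]})

sub_data, top_values = top_n_subset(toy, "class", 2)
```

The returned `sub_data` can be passed to sns.countplot(x="class", data=sub_data) exactly as in the post, and `top_values` can be reused for the `order` argument.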

# value_counts() sorts from most to least frequent, so reversing the index
# gives an explicit ascending order for the bars
ax = sns.countplot(x="class", data=sub_data, order=sub_index[::-1])

[Figure: the same count plot with the bars in ascending order of count]
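Reversing the index works because value_counts() returns counts in descending order. The sketch below, on a hypothetical toy Series, checks that the reversed index matches an explicitly sorted one; the sorted form may read more clearly when the intent is "ascending by count".

```python
import pandas as pd

# Hypothetical toy column: 'Third' x3, 'First' x2, 'Second' x1
classes = pd.Series(["Third"] * 3 + ["First"] * 2 + ["Second"], name="class")

# Approach from the post: take the top 2, then reverse the index
top2 = classes.value_counts().index[:2]
reversed_order = list(top2[::-1])

# Equivalent, more explicit approach: sort the top-2 counts in ascending order
ascending_order = classes.value_counts().head(2).sort_values().index.tolist()
```

Either list can be passed as `order=` to countplot().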

now how to show the value counts for two categorical variables?

ax = sns.countplot(x="class", hue="who", data=titanic)

[Figure: count plot of class, with bars split by who]
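When the exact numbers behind the grouped bars are needed, a cross-tabulation gives the same counts as a table. This is a sketch using pandas' crosstab on hypothetical toy data rather than the Titanic dataset itself.

```python
import pandas as pd

# Hypothetical toy data with the same two column names as in the post
toy = pd.DataFrame({
    "class": ["First", "First", "Third", "Third", "Third", "Second"],
    "who":   ["man", "woman", "man", "man", "child", "woman"],
})

# Rows are classes, columns are 'who' values, cells are counts
table = pd.crosstab(toy["class"], toy["who"])
```

Each cell of `table` corresponds to one bar in a countplot() drawn with x="class" and hue="who"; combinations that never occur show up as 0.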

More articles:
https://datasciencebyexample.com/ (datascience by example)