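DataWhale Team Data Analysis Study – Chapter 2, Section 4: Data Visualization – task04
Main content
This task again covers parts of pandas, and for the first time adds the matplotlib.pyplot module, using common chart types such as bar charts and line charts to present the data.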
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
file_path = os.path.join(os.getcwd(), '..', 'datasets')
file_name = 'result.csv'
file_url = os.path.join(file_path, file_name)
data = pd.read_csv(file_url)
data.head()
| | Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
| 4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
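A side note on the `Unnamed: 0` column in the preview above: a column with that name typically appears when a CSV was written together with its index and then read back without telling pandas which column holds the index. Whether result.csv was produced this way is an assumption; the sketch below only reproduces the effect on a tiny made-up frame held in memory.

```python
import io
import pandas as pd

# A tiny hypothetical frame; to_csv() writes the index as an unnamed first column
original = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
csv_text = original.to_csv()

# Read back without index_col: the unnamed index column shows up as "Unnamed: 0"
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# Read back with index_col=0: the first column becomes the index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())
```

If the extra column is unwanted, passing index_col=0 to pd.read_csv is one way to avoid it.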
sex = data.groupby('Sex')['Survived'].sum()
test_data = pd.DataFrame([[1, 2], [2, 2], [0, 2]], columns=['a', 'b'])
test_data.groupby('b')['a'].sum()
sex.plot.bar()
plt.title('sex survived count')
plt.show()
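groupby is worth revisiting once more; the Series it returns is very well suited to drawing a bar chart.
Use plt.title() to set the chart title.
Use plt.show() to display the figure.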
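To make the groupby step easier to follow, here is a minimal sketch on a small made-up frame (the values are hypothetical and not taken from result.csv). It shows that the grouped sum is a Series indexed by the group key, and that each index label becomes one bar.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data (hypothetical values, not taken from result.csv)
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Survived": [0, 1, 1, 1, 0],
})

# groupby(...)[column].sum() returns a Series indexed by the group keys
survived_by_sex = toy.groupby("Sex")["Survived"].sum()
print(survived_by_sex)

# Each index label (female, male) becomes one bar
ax = survived_by_sex.plot.bar()
plt.title("sex survived count (toy data)")
# plt.show() would then display the figure, as in the cell above
```

Because Survived is coded 0/1, summing it per group is the same as counting the survivors in that group.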
# Show the number of passengers of each sex
test_data1 = data.groupby(['Sex']).sum(numeric_only=True)
print(test_data1)
test_data3 = data.groupby(['Sex']).Sex.count()
print('--------sex count-----------')
print(test_data3)
test_data3.plot(kind='bar')
test_data2 = data.groupby(['Sex', 'Survived']).sum(numeric_only=True)
print('------sex Survived-----------')
print(test_data2)
# Show survivors and non-survivors within each sex as stacked bars
data.groupby(['Sex','Survived'])['Survived'].count().unstack().plot(kind='bar', stacked=True)
| Sex | Unnamed: 0 | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|---|
| female | 71374 | 135343 | 233 | 678 | 7286.00 | 218.0 | 204.0 | 13966.6628 |
| male | 126693 | 262043 | 109 | 1379 | 13919.17 | 248.0 | 136.0 | 14727.2865 |
--------sex count-----------
Sex
female 314
male 577
Name: Sex, dtype: int64
------sex Survived-----------
| Sex | Survived | Unnamed: 0 | PassengerId | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|---|
| female | 0 | 18460 | 35223 | 231 | 1603.00 | 98.0 | 84.0 | 1864.9752 |
| female | 1 | 52914 | 100120 | 447 | 5683.00 | 120.0 | 120.0 | 12101.6876 |
| male | 0 | 103044 | 210189 | 1159 | 11382.50 | 206.0 | 97.0 | 10277.7447 |
| male | 1 | 23649 | 51854 | 220 | 2536.67 | 42.0 | 39.0 | 4449.5418 |
<AxesSubplot:xlabel='Sex'>


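- .plot(kind='bar', stacked=True): any aggregated result produced by sum, count and similar calls can be plotted with this method.
- Get familiar with methods such as .count() and .value_counts().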
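The notes above can be sketched on toy data (hypothetical values, not from result.csv). The example compares count() with value_counts() after a groupby, then uses unstack() to turn the inner index level into columns so that the bars can be stacked. The fill_value=0 argument is an addition here so that missing combinations show as 0 rather than NaN.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female"],
    "Survived": [0, 0, 1, 1, 1],
})

# count() after a two-key groupby gives a Series with a (Sex, Survived) MultiIndex
counts = toy.groupby(["Sex", "Survived"])["Survived"].count()

# value_counts() on the grouped column yields the same numbers
via_value_counts = toy.groupby("Sex")["Survived"].value_counts()

# unstack() moves the inner level (Survived) into columns: one row per sex
table = counts.unstack(fill_value=0)
print(table)

# One bar per sex, with the Survived=0 and Survived=1 counts stacked
ax = table.plot(kind="bar", stacked=True)
```

Note that stacked expects a boolean; a non-empty string such as 'True' only works because Python treats it as truthy.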
# Visualize survival for passengers at different ticket fares
test_data5 = data.groupby(['Fare'])['Survived'].value_counts().sort_values(ascending=False)
print('====== test_data5 =========')
print(test_data5)
fig = plt.figure(figsize=(20, 18))
test_data5.plot(grid=True)
plt.title('Survived of Fare')
plt.legend()
plt.show()
====== test_data5 =========
Fare Survived
8.0500 0 38
7.8958 0 37
13.0000 0 26
7.7500 0 22
13.0000 1 16
..
7.7417 0 1
26.2833 1 1
7.7375 1 1
26.3875 1 1
22.5250 0 1
Name: Survived, Length: 330, dtype: int64
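Because test_data5 is sorted by count, the line plot follows that order along the x-axis rather than the fare itself. One possible alternative, shown here only as a sketch on toy data (hypothetical values), is to unstack Survived into columns and keep the rows ordered by fare, which gives one line per outcome.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data (hypothetical values, not from result.csv)
toy = pd.DataFrame({
    "Fare": [7.25, 7.25, 8.05, 8.05, 8.05, 71.28],
    "Survived": [0, 1, 0, 0, 1, 1],
})

# Count (Fare, Survived) pairs, then spread Survived into columns
by_fare = (
    toy.groupby("Fare")["Survived"]
    .value_counts()
    .unstack(fill_value=0)
    .sort_index()  # order rows by fare, not by count
)
print(by_fare)

# One line per Survived value, with fare on the x-axis
ax = by_fare.plot(grid=True, figsize=(8, 4))
ax.set_title("Survived of Fare (toy data)")
ax.set_xlabel("Fare")
ax.set_ylabel("passenger count")
```

The legend is created automatically from the column labels, so a separate plt.legend() call is not needed in this variant.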