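Review: In the previous chapters we covered the basics of pandas. Chapter 2 moves on to the practical side of data analysis. In Section 1 of Chapter 2 we learned data cleaning, which is essential: only once the data is reasonably clean can the later analysis be convincing. In this section we turn to data restructuring, which still belongs to the data understanding (preparation) stage.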
import numpy as np
import pandas as pd
df = pd.read_csv('DataWhale教程/data/train-left-up.csv')
df.head()
|   | PassengerId | Survived | Pclass | Name |
|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry |
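2 Chapter 2: Data Restructuring
2.4 Merging data
2.4.1 Task 1: Load all the data files in the data folder, compare them with the original data loaded earlier, and observe how they relate to one another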
left_up = pd.read_csv("DataWhale教程/data/train-left-up.csv")
left_down = pd.read_csv("DataWhale教程/data/train-left-down.csv")
right_up = pd.read_csv("DataWhale教程/data/train-right-up.csv")
right_down = pd.read_csv("DataWhale教程/data/train-right-down.csv")
left_up.head()
|   | PassengerId | Survived | Pclass | Name |
|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry |
left_down.head()
|   | PassengerId | Survived | Pclass | Name |
|---|---|---|---|---|
| 0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson |
| 1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) |
| 2 | 442 | 0 | 3 | Hampe, Mr. Leon |
| 3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil |
| 4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion |
right_down.head()
|   | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
| 1 | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
| 2 | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
| 3 | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
| 4 | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
right_up.head()
|   | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
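[Hint] Drawing on the train.csv data we loaded earlier, make a rough guess at what the data above represents.
2.4.2 Task 2: Use the concat method to merge train-left-up.csv and train-right-up.csv horizontally into one table, and save it as result_up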
result_up = pd.concat([left_up,right_up],axis=1)
result_up.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
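A horizontal concat pairs rows by index label, not by position, which is why the two halves above line up cleanly. The sketch below illustrates this on small made-up frames (the names `left`, `right`, and `shifted` and their values are hypothetical, not taken from the tutorial files): when the labels match, the rows pair one to one; when they differ, pandas takes the union of the labels and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical stand-ins for a left half and a right half of a table
left = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})
right = pd.DataFrame({"Sex": ["male", "female", "female"], "Age": [22.0, 38.0, 26.0]})

# Both frames carry the default index 0..2, so rows pair up one to one
aligned = pd.concat([left, right], axis=1)

# Give the right half different labels: concat now takes the union of
# the index labels (an outer join) and fills the missing cells with NaN
shifted = right.copy()
shifted.index = [1, 2, 3]
misaligned = pd.concat([left, shifted], axis=1)

print(aligned.shape)     # three rows, four columns
print(misaligned.shape)  # one extra row from the unmatched labels
```

If a horizontal merge unexpectedly produces extra rows or NaN values, comparing the indexes of the two inputs is a reasonable first check.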
2.4.3 Task 3: Use the concat method to merge train-left-down and train-right-down horizontally into one table and save it as result_down. Then merge result_up and result_down from above vertically into result.
result_down = pd.concat([left_down,right_down],axis=1)
result = pd.concat([result_up,result_down],axis=0)
result.shape
(891, 12)
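The shape confirms that the rows were stacked, but a vertical concat keeps each piece's original index labels. Since the head() outputs above show both the upper and lower pieces starting at label 0, the combined table most likely contains repeated labels. The following sketch uses small made-up frames (`top` and `bottom` are hypothetical names) to show the effect and one way to renumber the rows with `ignore_index=True`.

```python
import pandas as pd

# Hypothetical upper and lower pieces, each with its own default index
top = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1]})
bottom = pd.DataFrame({"PassengerId": [3, 4, 5], "Pclass": [3, 1, 3]})

# Plain vertical concat keeps the original labels: 0, 1, 0, 1, 2
stacked = pd.concat([top, bottom], axis=0)
has_dupes = stacked.index.has_duplicates

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Whether to renumber depends on what comes next: label-based lookups such as `.loc[0]` return several rows when labels repeat, so a fresh index is often the safer choice before further analysis.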
2.4.4 Task 4: Use the DataFrame's own join and append methods to complete Tasks 2 and 3
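result_up = left_up.join(right_up)
result_down = left_down.join(right_down)
# DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat([result_up, result_down])
result = result_up.append(result_down)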
result.shape
(891, 12)
- join merges columns horizontally (side by side)
- append merges rows vertically (top to bottom)
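The two points above can be checked on toy data. This is a minimal sketch with made-up frames (all names and values here are hypothetical): it compares `join` with a horizontal `concat`, and it uses `pd.concat` for the vertical step because `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical quarter tables, each with the default integer index
left_up = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1]})
right_up = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})
left_down = pd.DataFrame({"PassengerId": [3, 4], "Pclass": [3, 1]})
right_down = pd.DataFrame({"Sex": ["female", "female"], "Age": [26.0, 35.0]})

# join aligns on the index and adds the other frame's columns (left join by default)
up = left_up.join(right_up)
down = left_down.join(right_down)

# With identical indexes, join gives the same table as a horizontal concat
same_as_concat = up.equals(pd.concat([left_up, right_up], axis=1))

# Vertical step: pd.concat stands in for the removed DataFrame.append
full = pd.concat([up, down])

print(same_as_concat)
print(full.shape)
```

On pandas versions older than 2.0, `up.append(down)` should give the same stacked table as the `pd.concat` call, though it emits a deprecation warning from 1.4 onward.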
2.4.5 Task 5: Use the pandas merge method and the DataFrame append method to complete Tasks 2 and 3
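One possible approach, sketched on made-up frames rather than the tutorial files: the left and right pieces share no column, so `merge` has no key to match on by default. Passing `left_index=True` and `right_index=True` makes it match on the row index instead. The frame names below mirror the tutorial's variables, but their contents are hypothetical, and the vertical step uses `pd.concat` as a stand-in for `append`, which is unavailable in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical quarter tables, each with the default integer index
left_up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right_up = pd.DataFrame({"Sex": ["male", "female"], "Fare": [7.25, 71.2833]})
left_down = pd.DataFrame({"PassengerId": [3, 4], "Survived": [1, 1]})
right_down = pd.DataFrame({"Sex": ["female", "female"], "Fare": [7.925, 53.1]})

# Match rows on the index of both sides, since there is no shared key column
result_up = pd.merge(left_up, right_up, left_index=True, right_index=True)
result_down = pd.merge(left_down, right_down, left_index=True, right_index=True)

# Stack the two halves; on pandas < 2.0, result_up.append(result_down)
# would be the literal answer to the task
result = pd.concat([result_up, result_down])

print(result.shape)
print(list(result.columns))
```

Note that `merge` performs an inner join by default, so rows whose index label appears on only one side would be dropped rather than filled with NaN.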