Chapter 2, Section 2: Data Restructuring 1 (Course)

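Review: We have already covered the basics of Pandas. In Chapter 2 we move on to the business side of data analysis. In Section 1 of Chapter 2 we learned data cleaning, which is very important: only once the data is reasonably clean can our later analysis be convincing. In this section we turn to data restructuring, which still falls under data understanding (preparation).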

Before we start, import the numpy and pandas packages and load the data.
# Import the basic libraries
import pandas as pd
import numpy as np

# Load train-left-up.csv from the data folder
train_left_up = pd.read_csv(r"./data/train-left-up.csv", encoding="utf_8_sig")
train_left_up
     PassengerId  Survived  Pclass  Name
0    1            0         3       Braund, Mr. Owen Harris
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...
2    3            1         3       Heikkinen, Miss. Laina
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)
4    5            0         3       Allen, Mr. William Henry
...  ...          ...       ...     ...
434  435          0         1       Silvey, Mr. William Baird
435  436          1         1       Carter, Miss. Lucile Polk
436  437          0         3       Ford, Miss. Doolina Margaret "Daisy"
437  438          1         2       Richards, Mrs. Sidney (Emily Hocking)
438  439          0         1       Fortune, Mr. Mark

439 rows × 4 columns
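The loading code passes encoding="utf_8_sig", which is Python's codec name for UTF-8 with a byte-order mark (BOM). A small sketch on made-up CSV bytes (not the real file) of what that codec does: encoding with it prepends the three BOM bytes, and decoding with it strips them again, so the first column header comes out clean.

```python
import io

import pandas as pd

# Hypothetical CSV content; encoding with utf_8_sig prepends the BOM bytes EF BB BF
raw = "PassengerId,Survived\n1,0\n2,1\n".encode("utf_8_sig")
print(raw[:3])  # b'\xef\xbb\xbf'

# Reading with the same codec strips the BOM, so the first header is plain "PassengerId"
df = pd.read_csv(io.BytesIO(raw), encoding="utf_8_sig")
print(list(df.columns))  # ['PassengerId', 'Survived']
```

The same codec is used when the merged table is written out at the end of the section, where the BOM mainly helps spreadsheet programs recognise the file as UTF-8.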

2 Chapter 2: Data Restructuring

2.4 Merging Data

2.4.1 Task 1: Load all the data in the data folder and observe how the files relate to each other
# Write your code here
train_left_down = pd.read_csv(r"./data/train-left-down.csv", encoding="utf_8_sig")
train_right_down = pd.read_csv(r"./data/train-right-down.csv", encoding="utf_8_sig")
train_right_up = pd.read_csv(r"./data/train-right-up.csv", encoding="utf_8_sig")

[Hint] Using the train.csv data we loaded earlier, make a rough guess at what the data above is.

train_right_up
     Sex     Age   SibSp  Parch  Ticket            Fare      Cabin        Embarked
0    male    22.0  1      0      A/5 21171         7.2500    NaN          S
1    female  38.0  1      0      PC 17599          71.2833   C85          C
2    female  26.0  0      0      STON/O2. 3101282  7.9250    NaN          S
3    female  35.0  1      0      113803            53.1000   C123         S
4    male    35.0  0      0      373450            8.0500    NaN          S
...  ...     ...   ...    ...    ...               ...       ...          ...
434  male    50.0  1      0      13507             55.9000   E44          S
435  female  14.0  1      2      113760            120.0000  B96 B98      S
436  female  21.0  2      2      W./C. 6608        34.3750   NaN          S
437  female  24.0  2      3      29106             18.7500   NaN          S
438  male    64.0  1      4      19950             263.0000  C23 C25 C27  S

439 rows × 8 columns
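The outputs so far point towards an answer to the hint: the "left" files hold PassengerId through Name, the "right" files hold Sex through Embarked (4 + 8 = the 12 columns of train.csv), and "up"/"down" appear to split the rows. Assuming that quadrant layout, here is a sketch on a hypothetical four-row frame (made-up values, not the real dataset) showing that cutting a table into four pieces and gluing them back together recovers the original.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (toy values only)
full = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 1],
    "Sex": ["male", "female", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

top, bottom = full.iloc[:2], full.iloc[2:]        # split the rows
left_cols, right_cols = ["PassengerId", "Survived"], ["Sex", "Age"]  # split the columns

# Four pieces, mirroring train-left-up / right-up / left-down / right-down
left_up, right_up = top[left_cols], top[right_cols]
left_down, right_down = bottom[left_cols], bottom[right_cols]

# Reassemble: first side by side (axis=1), then the two halves on top of each other (axis=0)
rebuilt = pd.concat(
    [pd.concat([left_up, right_up], axis=1),
     pd.concat([left_down, right_down], axis=1)],
    axis=0,
)
print(rebuilt.equals(full))  # True
```

One difference from the real files: the slices here keep their original row labels (2 and 3 for the bottom half), whereas a CSV read on its own starts its index at 0 again.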

2.4.2 Task 2: Use the concat method to join train-left-up.csv and train-right-up.csv horizontally into one table, and save that table as result_up

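axis: 0 concatenates along the rows (the tables are stacked vertically); 1 concatenates along the columns (the tables are placed side by side).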
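To make the axis choice concrete, here is a sketch on two hypothetical toy frames with different columns. With axis=1 the columns are added; with axis=0 the rows are added, and since the column names do not match, pandas takes the union of the columns and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical toy frames with disjoint columns and the same default index 0..1
left = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

side_by_side = pd.concat([left, right], axis=1)  # 2 rows, 2 + 2 columns
stacked = pd.concat([left, right], axis=0)       # 2 + 2 rows, union of 4 columns

print(side_by_side.shape)           # (2, 4)
print(stacked.shape)                # (4, 4)
print(stacked.isna().sum().sum())   # 8: each row lacks the other frame's two columns
```

Picking the wrong axis therefore does not raise an error; it silently produces a larger, mostly empty table, which is worth checking with .shape.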

# data1 plays the role of result_up: axis=1 places the two tables side by side
data1 = pd.concat([train_left_up, train_right_up], axis=1)
data1
     PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare      Cabin        Embarked
0    1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500    NaN          S
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833   C85          C
2    3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250    NaN          S
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000   C123         S
4    5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500    NaN          S
...  ...          ...       ...     ...                                                ...     ...   ...    ...    ...               ...       ...          ...
434  435          0         1       Silvey, Mr. William Baird                          male    50.0  1      0      13507             55.9000   E44          S
435  436          1         1       Carter, Miss. Lucile Polk                          female  14.0  1      2      113760            120.0000  B96 B98      S
436  437          0         3       Ford, Miss. Doolina Margaret "Daisy"               female  21.0  2      2      W./C. 6608        34.3750   NaN          S
437  438          1         2       Richards, Mrs. Sidney (Emily Hocking)              female  24.0  2      3      29106             18.7500   NaN          S
438  439          0         1       Fortune, Mr. Mark                                  male    64.0  1      4      19950             263.0000  C23 C25 C27  S

439 rows × 12 columns
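concat with axis=1 lines the two files up by row label, not by position. That works here because both "up" files were read with the default index 0 to 438. A sketch on hypothetical toy frames of what happens when the labels disagree, and one way to restore a positional match:

```python
import pandas as pd

# Hypothetical toy frames: same length, but the right one has a shifted index
left = pd.DataFrame({"PassengerId": [1, 2, 3]})  # index 0, 1, 2
right = pd.DataFrame({"Sex": ["male", "female", "female"]}, index=[1, 2, 3])

# Rows are matched by index label (an outer join by default), not by position
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)  # (4, 2): labels 0..3, with NaN wherever a label is missing

# Discarding the right frame's labels first gives a row-by-row match again
fixed = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(fixed.shape)  # (3, 2)
```

Checking that the combined row count equals the row count of each input is a cheap way to catch this kind of misalignment.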

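2.4.3 Task 3: Use the concat method to join train-left-down and train-right-down horizontally into one table, and save it as result_down. Then concatenate result_up from above and result_down vertically into result.
# Write your code here
# data2 plays the role of result_down: the bottom halves side by side (axis=1)
data2 = pd.concat([train_left_down, train_right_down], axis=1)
# data plays the role of result: the top half stacked on the bottom half (axis=0)
data = pd.concat([data1, data2], axis=0)

2.4.4 Task 4: Use the DataFrame's own join and append methods to complete Tasks 2 and 3
# Write your code here
# join aligns the two frames on their row index, reproducing the horizontal concat
data1 = train_left_up.join(train_right_up)
data2 = train_left_down.join(train_right_down)
# Equivalent: pd.merge(train_left_up, train_right_up, left_index=True, right_index=True)
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# on older versions this line was: data = data1.append(data2)
data = pd.concat([data1, data2], axis=0)
data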
     PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare      Cabin        Embarked
0    1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500    NaN          S
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833   C85          C
2    3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250    NaN          S
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000   C123         S
4    5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500    NaN          S
...  ...          ...       ...     ...                                                ...     ...   ...    ...    ...               ...       ...          ...
447  887          0         2       Montvila, Rev. Juozas                              male    27.0  0      0      211536            13.0000   NaN          S
448  888          1         1       Graham, Miss. Margaret Edith                       female  19.0  0      0      112053            30.0000   B42          S
449  889          0         3       Johnston, Miss. Catherine Helen "Carrie"           female  NaN   1      2      W./C. 6607        23.4500   NaN          S
450  890          1         1       Behr, Mr. Karl Howell                              male    26.0  0      0      111369            30.0000   C148         C
451  891          0         3       Dooley, Mr. Patrick                                male    32.0  0      0      370376            7.7500    NaN          Q

891 rows × 12 columns
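join, pd.merge with left_index=True and right_index=True, and concat with axis=1 all match rows by index label. When both halves share the same index, as the two "up" files do, they produce the same table; they differ in their defaults (join is a left join, merge with these flags an inner join, concat an outer join), which only matters once the indexes disagree. A sketch on hypothetical toy halves:

```python
import pandas as pd

# Hypothetical toy halves sharing the default index 0..1
left = pd.DataFrame({"PassengerId": [1, 2], "Name": ["Braund", "Cumings"]})
right = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

via_concat = pd.concat([left, right], axis=1)          # outer join on the index
via_join = left.join(right)                            # left join on the index
via_merge = pd.merge(left, right, left_index=True, right_index=True)  # inner join

print(via_join.equals(via_concat), via_merge.equals(via_concat))  # True True
```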

2.4.6 Task 6: Save the completed data as result.csv
# Write your code here
data.to_csv(r"./result.csv", encoding="utf_8_sig")
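The merged output ends at row label 451 rather than 890: concat with axis=0 keeps each half's own index (0 to 438, then 0 to 451), so labels repeat, and to_csv writes that index as an extra unnamed first column by default. A sketch on hypothetical toy frames of two standard options, ignore_index=True in concat and index=False in to_csv; it writes to an in-memory buffer instead of a file so the CSV text can be inspected.

```python
import io

import pandas as pd

# Hypothetical halves, each read separately, so both indexes start at 0
up = pd.DataFrame({"PassengerId": [1, 2]})
down = pd.DataFrame({"PassengerId": [3, 4]})

kept = pd.concat([up, down], axis=0)                           # index: 0, 1, 0, 1
renumbered = pd.concat([up, down], axis=0, ignore_index=True)  # index: 0, 1, 2, 3
print(kept.index.is_unique, renumbered.index.is_unique)        # False True

# Compare the header line with and without the index column
with_index = io.StringIO()
kept.to_csv(with_index)
without_index = io.StringIO()
kept.to_csv(without_index, index=False)
print(with_index.getvalue().splitlines()[0])     # ,PassengerId
print(without_index.getvalue().splitlines()[0])  # PassengerId
```

Which option fits depends on whether the original row labels carry any meaning later; for this dataset PassengerId already identifies each row.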

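2.5 Looking at the Data from Another Angle

2.5.1 Task 1: Convert our data into a Series

stack() moves the column labels into the row index, so each row's columns become rows of their own. Note that the result is a Series, which is one-dimensional: every value is addressed by a (row label, column name) pair.

# Write your code here
data.stack()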

0    PassengerId                          1
     Survived                             0
     Pclass                               3
     Name           Braund, Mr. Owen Harris
     Sex                               male
                             ...           
451  SibSp                                0
     Parch                                0
     Ticket                          370376
     Fare                              7.75
     Embarked                             Q
Length: 9826, dtype: object
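The length also shows that stack() skipped the missing values here: 891 rows × 12 columns is 10692 cells, and train.csv has 866 missing ones (177 in Age, 687 in Cabin, 2 in Embarked), leaving 10692 - 866 = 9826 entries. This is consistent with the last passenger above, whose Cabin is NaN and whose entries jump from Fare straight to Embarked. A toy sketch of the round trip (hypothetical frame): stack() to a Series with a two-level index, then unstack() back. The explicit dropna() is an assumption made to keep the result the same across pandas versions, since newer versions may change how stack treats NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with one missing value
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0], "Fare": [7.25, 71.2833, 7.925]})

# Column names become the inner level of a two-level row index; NaN is dropped explicitly
s = df.stack().dropna()
print(len(s), s.index.nlevels)  # 5 2
print(s.loc[(0, "Age")])        # 22.0

# unstack() pivots the inner level back into columns; the dropped cell returns as NaN
restored = s.unstack().reindex(columns=df.columns)
print(restored.equals(df))      # True
```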

