1. Loading the Data and Initial Inspection
1.1 Loading the data
(Omitted here.)
1.1.1 Task 1: Import numpy and pandas
import numpy as np
import pandas as pd
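1.1.2 Task 2: Load the data
(1) Load the data using a relative path
Put the data files in the same folder as the .py script and run the script from that folder. Strictly speaking, a relative path is resolved against the current working directory, not against the script's location.
# Load with relative paths and print the first ten rows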
df1 = pd.read_csv("train.csv")
df2 = pd.read_csv("test.csv")
print(df1.head(10))
# PassengerId Survived Pclass ... Fare Cabin Embarked
# 0 1 0 3 ... 7.2500 NaN S
# 1 2 1 1 ... 71.2833 C85 C
# 2 3 1 3 ... 7.9250 NaN S
# 3 4 1 1 ... 53.1000 C123 S
# 4 5 0 3 ... 8.0500 NaN S
# 5 6 0 3 ... 8.4583 NaN Q
# 6 7 0 1 ... 51.8625 E46 S
# 7 8 0 3 ... 21.0750 NaN S
# 8 9 1 3 ... 11.1333 NaN S
# 9 10 1 2 ... 30.0708 NaN C
#
# [10 rows x 12 columns]
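A minimal sketch of why the bare file name works, using only pandas and the standard library: a relative path like "train.csv" is looked up in the current working directory. The temporary folder and the two-row CSV below are made up for illustration and are not the real Titanic file.

```python
import os
import tempfile

import pandas as pd

# Create a throwaway folder with a tiny CSV so the example is self-contained
# (synthetic contents, not the real train.csv).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "train.csv"), "w") as f:
    f.write("PassengerId,Survived\n1,0\n2,1\n")

# A relative path is resolved against the current working directory,
# so switching into the folder makes the bare file name work.
old_cwd = os.getcwd()
os.chdir(workdir)
try:
    df = pd.read_csv("train.csv")
    print(os.path.abspath("train.csv"))  # the full path that was actually opened
finally:
    os.chdir(old_cwd)  # restore the original working directory

print(df.shape)  # (2, 2)
```

If the script should find its data no matter which folder it is launched from, a common alternative is to build the path from `Path(__file__).parent` with `pathlib`.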
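(2) Load the data using an absolute path
I typed the path here by hand. A raw string (r"...") keeps every backslash literal, so no escaping is needed.
# Load with absolute paths and print the first five rows
df3 = pd.read_csv(r"E:\Python\python操作\datawhale数据分析\train.csv")
df4 = pd.read_csv(r"E:\Python\python操作\datawhale数据分析\test.csv")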
print(df3.head(5))
# PassengerId Survived Pclass ... Fare Cabin Embarked
# 0 1 0 3 ... 7.2500 NaN S
# 1 2 1 1 ... 71.2833 C85 C
# 2 3 1 3 ... 7.9250 NaN S
# 3 4 1 1 ... 53.1000 C123 S
# 4 5 0 3 ... 8.0500 NaN S
#
# [5 rows x 12 columns]
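A short sketch of why backslashes in Windows paths need care in Python string literals. The folder "E:\data" is a hypothetical placeholder; `PureWindowsPath` only manipulates strings, so this runs on any operating system without touching the disk.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal "\t" is an escape sequence for TAB, which is
# why a path ending in "\train.csv" needs its backslash doubled.
escaped = "E:\\data\\train.csv"   # every backslash doubled by hand
raw = r"E:\data\train.csv"        # raw string: backslashes kept literally
print(escaped == raw)             # True

# Forgetting to escape silently puts a TAB character into the path.
broken = "E:\\data\train.csv"
print("\t" in broken)             # True

# pathlib builds the same Windows path from forward slashes and joins parts.
p = PureWindowsPath("E:/data") / "train.csv"
print(str(p) == raw, p.is_absolute())  # True True
```

On Windows, Python's file functions (and therefore `pd.read_csv`) also accept forward slashes, which sidesteps the escaping problem entirely.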
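1.1.3 Task 3: Read the data in chunks of 1000 rows each
【Question】What is chunked reading, and why read data chunk by chunk?
【Answer】chunksize is the parameter used when extracting data chunk by chunk.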
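A minimal sketch of `chunksize` in `pd.read_csv`, assuming pandas 1.2 or later (where the returned reader can be used in a `with` block). The 2,500-row in-memory CSV stands in for a file too large to load at once; the column names and values are synthetic.

```python
import io

import pandas as pd

# Build a 2,500-row CSV in memory as a stand-in for a large file
# (synthetic data, not the Titanic set).
csv_text = "PassengerId,Fare\n" + "".join(f"{i},{i * 0.5}\n" for i in range(1, 2501))

# With chunksize=1000, read_csv returns an iterator (TextFileReader) instead
# of one DataFrame; each iteration yields a DataFrame of at most 1000 rows.
chunk_sizes = []
total_fare = 0.0
with pd.read_csv(io.StringIO(csv_text), chunksize=1000) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))
        total_fare += chunk["Fare"].sum()  # aggregate per chunk, then discard it

print(chunk_sizes)  # [1000, 1000, 500]
```

Only one chunk is held in memory at a time, which is the main reason to read in chunks: results are accumulated incrementally instead of loading the whole file first.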