import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_train = pd.read_csv('kaggle_bike_competition_train.csv', header=0)
df_train.head(10)
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0000 | 3 | 13 | 16 |
1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0000 | 8 | 32 | 40 |
2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0000 | 5 | 27 | 32 |
3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0000 | 3 | 10 | 13 |
4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0000 | 0 | 1 | 1 |
5 | 2011-01-01 05:00:00 | 1 | 0 | 0 | 2 | 9.84 | 12.880 | 75 | 6.0032 | 0 | 1 | 1 |
6 | 2011-01-01 06:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0000 | 2 | 0 | 2 |
7 | 2011-01-01 07:00:00 | 1 | 0 | 0 | 1 | 8.20 | 12.880 | 86 | 0.0000 | 1 | 2 | 3 |
8 | 2011-01-01 08:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0000 | 1 | 7 | 8 |
9 | 2011-01-01 09:00:00 | 1 | 0 | 0 | 1 | 13.12 | 17.425 | 76 | 0.0000 | 8 | 6 | 14 |
Column names and types
df_train.dtypes
datetime       object
season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
count           int64
dtype: object
df_train.shape
(10886, 12)
Check whether any column has missing values. As it turns out, every column is complete: each has all 10886 entries.
df_train.count()
datetime      10886
season        10886
holiday       10886
workingday    10886
weather       10886
temp          10886
atemp         10886
humidity      10886
windspeed     10886
casual        10886
registered    10886
count         10886
dtype: int64
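The check above compares each column's non-null count against the row count from `shape`. An equivalent and slightly more direct way is to count the nulls themselves. The following is a minimal sketch on a small made-up frame (`sample` is a hypothetical stand-in, not the competition data), with one value deliberately set to NaN so the result is visible:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train, with one missing value injected on purpose
sample = pd.DataFrame({
    "temp": [9.84, np.nan, 9.02],
    "humidity": [81, 80, 75],
})

# Number of missing values in each column
missing_per_column = sample.isnull().sum()

# True if at least one column contains a missing value
has_missing = bool(missing_per_column.any())

print(missing_per_column)
print(has_missing)
```

Running the same two calls on `df_train` should report zero for every column, which matches the `count()` output above.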
type(df_train.datetime)
pandas.core.series.Series
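The `dtypes` output lists `datetime` as `object`, so the column holds plain strings rather than timestamps. One common way to work with it is to parse it once with `pd.to_datetime` and then read calendar fields through the `.dt` accessor. The sketch below assumes a tiny hypothetical frame (`sample`) that reuses the timestamp format from the table above; it is an illustration, not the notebook's own method:

```python
import pandas as pd

# Hypothetical stand-in for df_train: timestamps stored as strings
sample = pd.DataFrame({
    "datetime": [
        "2011-01-01 00:00:00",
        "2011-01-01 01:00:00",
        "2011-01-03 09:00:00",
    ]
})

# Parse the strings into datetime64 values once
parsed = pd.to_datetime(sample["datetime"])

# Derive calendar features through the .dt accessor
sample["month"] = parsed.dt.month
sample["dayofweek"] = parsed.dt.dayofweek  # Monday=0 ... Sunday=6
sample["hour"] = parsed.dt.hour

print(sample)
```

Parsing once and reusing `parsed` avoids converting the same strings repeatedly for each derived column.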
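# Extract the month, day of week, and hour into three separate columns
df_train['month'] = pd.DatetimeIndex(df_train.datetime).month  # new column 'month': month of the year (1-12)
df_train['dayofweek'] = pd.DatetimeIndex(df_train.datetime).dayofweek  # day of the week (Monday=0, Sunday=6)
df_train['hour'] = pd.DatetimeIndex(df_train.datetime).hour  # hour of the day (0-23)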
df_train.head(20)