matplotlib must-memorize series:
https://blog.youkuaiyun.com/ssswill/article/details/86419094
sklearn must-memorize series:
Now for the pandas statements you must master:
1. Reading files with pandas
import pandas as pd
df_train = pd.read_csv('../input/train.csv')
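Here is a self-contained sketch of `pd.read_csv`. The file `../input/train.csv` may not exist on your machine, so this version reads an inline string wrapped in `io.StringIO`, because `read_csv` accepts any file-like object. The column names and values are made up for illustration. The sketch also shows `parse_dates`, which turns a date column into `datetime64` while the file is being read.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for '../input/train.csv'
csv_text = """card_id,first_active_month,target
C_1,2017-06-01,-0.82
C_2,2017-01-01,0.39
C_3,2016-08-01,0.69
"""

# read_csv works on any file-like object; parse_dates converts the listed columns
df_train = pd.read_csv(io.StringIO(csv_text), parse_dates=["first_active_month"])

print(df_train.shape)   # (3, 3)
print(df_train.dtypes)  # first_active_month is a datetime64 column
print(df_train.head())
```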
2. Feature engineering / preprocessing with pandas
2.1 Handling missing values in pandas
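# Both DataFrames contain the same columns with missing values
for df in [df_hist_trans, df_new_merchant_trans]:
    # Assigning back avoids chained inplace fillna, which newer pandas warns about
    df['category_2'] = df['category_2'].fillna(1.0)
    df['category_3'] = df['category_3'].fillna('A')
    df['merchant_id'] = df['merchant_id'].fillna('M_ID_00a6ca8a8a')
# Fill a numeric column with its mean
df['Age'] = df['Age'].fillna(df['Age'].mean())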
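Below is a minimal runnable sketch of the same fill strategies on a small made-up DataFrame. The column names mirror the ones above, but the values are invented. Passing a dict to `DataFrame.fillna` fills several columns in a single call. The mean fill relies on `Series.mean()`, which skips NaN by default.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; values are invented for illustration
df = pd.DataFrame({
    "category_2": [1.0, np.nan, 3.0],
    "category_3": ["B", None, "A"],
    "merchant_id": [None, "M_ID_1", "M_ID_2"],
    "Age": [20.0, np.nan, 40.0],
})

# Constant fills for several columns at once: {column: fill value}
df = df.fillna({
    "category_2": 1.0,
    "category_3": "A",
    "merchant_id": "M_ID_00a6ca8a8a",
})

# Mean fill: Series.mean() ignores NaN, so the mean here is (20 + 40) / 2 = 30
df["Age"] = df["Age"].fillna(df["Age"].mean())

print(df)
print(df.isna().sum().sum())  # 0
```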
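2.2 Handling datetime values in pandas
df['purchase_date'] = pd.to_datetime(df['purchase_date'])
# Alternatively, parse date columns while reading the file:
#train = pd.read_csv('../input/train.csv',parse_dates=["first_active_month"])
df['year'] = df['purchase_date'].dt.year
# .dt.weekofyear was removed in pandas 2.0; use isocalendar() instead
df['weekofyear'] = df['purchase_date'].dt.isocalendar().week
df['month'] = df['purchase_date'].dt.month
df['dayofweek'] = df['purchase_date'].dt.dayofweek  # Monday=0, Sunday=6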
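The following is a runnable sketch of the datetime feature extraction above, applied to four made-up timestamps. It uses `dt.isocalendar().week` because `dt.weekofyear` was deprecated in pandas 1.1 and removed in 2.0.

```python
import pandas as pd

# Made-up purchase timestamps
df = pd.DataFrame({"purchase_date": [
    "2018-01-01 10:00:00",
    "2018-03-15 08:30:00",
    "2017-12-31 23:59:59",
    "2018-12-31 12:00:00",
]})

df["purchase_date"] = pd.to_datetime(df["purchase_date"])
df["year"] = df["purchase_date"].dt.year
# ISO week number (weeks start on Monday)
df["weekofyear"] = df["purchase_date"].dt.isocalendar().week
df["month"] = df["purchase_date"].dt.month
df["dayofweek"] = df["purchase_date"].dt.dayofweek  # Monday=0 ... Sunday=6

print(df)
```

The last row shows why ISO weeks need care near year boundaries. 2018-12-31 is a Monday, and it falls in ISO week 1 of 2019. Its `year` feature, however, is still 2018.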
For more on handling datetime values in pandas, see:
https://blog.youkuaiyun.com/ssswill/article/details/86530045
2.3 Type conversion
2.3.1 Converting str-type yes/no values