1.3 Code examples
This section gives examples of reading the data and computing the evaluation metrics.
1.3.1 Reading the data with Pandas
#coding=utf-8
import pandas as pd
import numpy as np
import os
# Path where results are saved
output_path='G:/newjourney/Datawhale/output'
if not os.path.exists(output_path):
    os.makedirs(output_path)
## 1) Load the training set and the test set
path='G:/newjourney/Datawhale/'
Train_data=pd.read_csv(path+'used_car_train_20200313.csv',sep=' ')
Test_data=pd.read_csv(path+'used_car_testB_20200421.csv',sep=' ')
print('Train data shape:',Train_data.shape)
print('TestB data shape',Test_data.shape)
Train data shape: (150000, 31)
TestB data shape (50000, 30)
Train_data.head()
| | SaleID | name | regDate | model | brand | bodyType | fuelType | gearbox | power | kilometer | ... | v_5 | v_6 | v_7 | v_8 | v_9 | v_10 | v_11 | v_12 | v_13 | v_14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 736 | 20040402 | 30.0 | 6 | 1.0 | 0.0 | 0.0 | 60 | 12.5 | ... | 0.235676 | 0.101988 | 0.129549 | 0.022816 | 0.097462 | -2.881803 | 2.804097 | -2.420821 | 0.795292 | 0.914762 |
1 | 1 | 2262 | 20030301 | 40.0 | 1 | 2.0 | 0.0 | 0.0 | 0 | 15.0 | ... | 0.264777 | 0.121004 | 0.135731 | 0.026597 | 0.020582 | -4.900482 | 2.096338 | -1.030483 | -1.722674 | 0.245522 |
2 | 2 | 14874 | 20040403 | 115.0 | 15 | 1.0 | 0.0 | 0.0 | 163 | 12.5 | ... | 0.251410 | 0.114912 | 0.165147 | 0.062173 | 0.027075 | -4.846749 | 1.803559 | 1.565330 | -0.832687 | -0.229963 |
3 | 3 | 71865 | 19960908 | 109.0 | 10 | 0.0 | 0.0 | 1.0 | 193 | 15.0 | ... | 0.274293 | 0.110300 | 0.121964 | 0.033395 | 0.000000 | -4.509599 | 1.285940 | -0.501868 | -2.438353 | -0.478699 |
4 | 4 | 111080 | 20120103 | 110.0 | 5 | 1.0 | 0.0 | 0.0 | 68 | 5.0 | ... | 0.228036 | 0.073205 | 0.091880 | 0.078819 | 0.121534 | -1.896240 | 0.910783 | 0.931110 | 2.834518 | 1.923482 |
5 rows × 31 columns
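The `sep=' '` argument is needed because the competition files are space-delimited rather than comma-delimited. A minimal sketch of the same `pd.read_csv` call on a small in-memory sample, assuming only a subset of the columns and the first two rows shown above (the `sample` and `df` names are illustrative, not part of the original notebook):

```python
import io
import pandas as pd

# Two rows in the same space-separated layout as the competition files.
# Only a few of the 31 columns are reproduced here for illustration.
sample = io.StringIO(
    "SaleID name regDate power kilometer\n"
    "0 736 20040402 60 12.5\n"
    "1 2262 20030301 0 15.0\n"
)

# With the default sep=',' every line would be parsed as one single column.
df = pd.read_csv(sample, sep=' ')

print('Sample shape:', df.shape)
print(df.dtypes)
```

Checking `shape` and `dtypes` right after loading is a cheap way to confirm that the delimiter was interpreted as intended.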
1.3.2 Computing classification metrics
## accuracy
import numpy as np
from sklearn.metrics import accuracy_score
y_pred=[0,1,0,1]
y_true=[0,1,1,1]
print('ACC:',accuracy_score(y_true,y_pred))
ACC: 0.75
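To make clear what `accuracy_score` measures, the same value can be computed by hand as the fraction of predictions that match the labels. This is a sketch in plain Python for illustration, not how scikit-learn implements it; `correct` and `acc` are names chosen here:

```python
y_pred = [0, 1, 0, 1]
y_true = [0, 1, 1, 1]

# Count the positions where the prediction equals the true label.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

# Accuracy = correct predictions / total predictions.
acc = correct / len(y_true)
print('ACC:', acc)
```

Three of the four predictions match, which gives the 0.75 reported above.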
## Precision,Recall,F1-score
from sklearn import metrics
y_pred=[0,1,0,0]
y_true=[0,1,0,1]
print('Precision:',metrics.precision_score(y_true,y_pred))
print('Recall:',metrics.recall_score(y_true,y_pred))
print('F1-score:',metrics.f1_score(y_true,y_pred))
Precision: 1.0
Recall: 0.5
F1-score: 0.6666666666666666
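The three scores above follow from the counts of true positives (TP), false positives (FP), and false negatives (FN) for the positive class `1`. A hand-written sketch, assuming binary labels and at least one predicted and one actual positive (otherwise the divisions below would fail); the variable names are illustrative:

```python
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted 1, truly 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted 1, truly 0
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 0, truly 1

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print('Precision:', precision)
print('Recall:', recall)
print('F1-score:', f1)
```

Here TP = 1, FP = 0, and FN = 1, so precision is 1.0 while recall is only 0.5: the single positive prediction is right, but one of the two actual positives was missed.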
1.3.3 Computing regression metrics
import numpy as np
from sklearn import metrics
# MAPE (Mean Absolute Percentage Error) is implemented by hand here
def mape(y_true,y_pred):
    return np.mean(np.abs((y_pred-y_true)/y_true))
y_true=np.array([1.0,5.0,4.0,3.0,2.0,5.0,-3.0])
y_pred=np.array([1.0,4.5,3.8,3.2,3.0,4.8,-2.2])
#MSE
print('MSE:',metrics.mean_squared_error(y_true,y_pred))
#RMSE
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_true,y_pred)))
#MAE
print('MAE:',metrics.mean_absolute_error(y_true,y_pred))
#MAPE
print('MAPE:',mape(y_true,y_pred))
MSE: 0.2871428571428571
RMSE: 0.5358571238146014
MAE: 0.4142857142857143
MAPE: 0.1461904761904762
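All four metrics are simple reductions of the error vector `y_pred - y_true`, so they can be reproduced with NumPy alone. The sketch below is for illustration and uses names (`err`, `mse`, `rmse`, `mae`, `mape_value`) that are not in the original code:

```python
import numpy as np

y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])

err = y_pred - y_true

mse = np.mean(err ** 2)                     # mean of squared errors
rmse = np.sqrt(mse)                         # same unit as the target
mae = np.mean(np.abs(err))                  # mean of absolute errors
mape_value = np.mean(np.abs(err / y_true))  # errors relative to the true value

print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
print('MAPE:', mape_value)
```

Because MAPE divides by `y_true`, it is undefined when any true value is zero, so it should be used with care on targets that can be 0. Newer scikit-learn releases (0.24 and later) also ship `metrics.mean_absolute_percentage_error`, which may replace the hand-written helper where that version is available.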
##R2-score
from sklearn.metrics import r2_score
y_true=[3,-0.5,2,7]
y_pred=[2.5,0.0,2,8]
print('R2-score:',r2_score(y_true,y_pred))
R2-score: 0.9486081370449679
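The R2 score compares the model's squared error with that of a baseline that always predicts the mean of `y_true`: R2 = 1 - SS_res / SS_tot. A hand-computed sketch with NumPy, using illustrative names (`ss_res`, `ss_tot`, `r2`):

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Residual sum of squares: error of the model's predictions.
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: error of always predicting the mean of y_true.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
print('R2-score:', r2)
```

Here SS_res = 1.5 and SS_tot = 29.1875, which yields roughly 0.9486. A value of 1 means perfect predictions, 0 means no better than the mean baseline, and negative values mean worse than that baseline.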
Reference
[Datawhale, Data Mining from Scratch, Task 1: Understanding the Competition Task. By: AI蜗牛车]
Zhihu: https://www.zhihu.com/people/seu-aigua-niu-che
GitHub: https://github.com/chehongshu
WeChat official account: AI蜗牛车