【Datawhale】[task1] 1.3 Code Examples

1.3 Code Examples

This part gives examples of loading the data and computing evaluation metrics.

1.3.1 Reading the Data with Pandas

#coding=utf-8
import os

import numpy as np
import pandas as pd

# Directory for saving results (created if it does not already exist)
output_path = 'G:/newjourney/Datawhale/output'
os.makedirs(output_path, exist_ok=True)

## 1) Load the training set and test set B (both files are space-separated)
path = 'G:/newjourney/Datawhale/'
Train_data = pd.read_csv(os.path.join(path, 'used_car_train_20200313.csv'), sep=' ')
Test_data = pd.read_csv(os.path.join(path, 'used_car_testB_20200421.csv'), sep=' ')

print('Train data shape:', Train_data.shape)
print('TestB data shape:', Test_data.shape)
Train data shape: (150000, 31)
TestB data shape: (50000, 30)
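Since the competition files are not bundled with this post, here is a minimal sketch of the same read on a tiny in-memory table. The column names and values are hypothetical stand-ins, not the real data. It shows that `sep=' '` splits on spaces, and that a set difference of the column names reveals which column exists only in the training data (with the real files, that is the one extra column behind 31 vs. 30).

```python
import io

import pandas as pd

# Hypothetical miniature train/test files, space-separated like the real ones.
train_text = "SaleID name power price\n0 736 60 1000\n1 2262 0 2000\n"
test_text = "SaleID name power\n150000 11 10\n150001 22 20\n"

train_df = pd.read_csv(io.StringIO(train_text), sep=' ')
test_df = pd.read_csv(io.StringIO(test_text), sep=' ')

print('Train shape:', train_df.shape)
print('Test shape:', test_df.shape)

# Columns present in the training set but missing from the test set.
only_in_train = sorted(set(train_df.columns) - set(test_df.columns))
print('Only in train:', only_in_train)
```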
Train_data.head()
   SaleID    name   regDate  model  brand  bodyType  fuelType  gearbox  power  kilometer  ...        v_5        v_6        v_7        v_8        v_9       v_10       v_11       v_12       v_13       v_14
0       0     736  20040402   30.0      6       1.0       0.0      0.0     60       12.5  ...   0.235676   0.101988   0.129549   0.022816   0.097462  -2.881803   2.804097  -2.420821   0.795292   0.914762
1       1    2262  20030301   40.0      1       2.0       0.0      0.0      0       15.0  ...   0.264777   0.121004   0.135731   0.026597   0.020582  -4.900482   2.096338  -1.030483  -1.722674   0.245522
2       2   14874  20040403  115.0     15       1.0       0.0      0.0    163       12.5  ...   0.251410   0.114912   0.165147   0.062173   0.027075  -4.846749   1.803559   1.565330  -0.832687  -0.229963
3       3   71865  19960908  109.0     10       0.0       0.0      1.0    193       15.0  ...   0.274293   0.110300   0.121964   0.033395   0.000000  -4.509599   1.285940  -0.501868  -2.438353  -0.478699
4       4  111080  20120103  110.0      5       1.0       0.0      0.0     68        5.0  ...   0.228036   0.073205   0.091880   0.078819   0.121534  -1.896240   0.910783   0.931110   2.834518   1.923482

5 rows × 31 columns
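The `...` in the middle of the output is pandas truncating a wide frame for display, not missing data. A minimal sketch of lifting that limit, shown on a hypothetical 31-column frame of zeros rather than the real training set:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same width as Train_data (31 columns).
df = pd.DataFrame(np.zeros((5, 31)), columns=[f'col_{i}' for i in range(31)])

# Setting the option to None removes the column limit, so head() prints every column.
pd.set_option('display.max_columns', None)
print(df.head())
print(df.shape)
```

`pd.set_option` changes the setting for the rest of the session; `pd.option_context('display.max_columns', None)` used in a `with` block limits it to one print instead.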

1.3.2 Example: Computing Classification Metrics

## accuracy
import numpy as np
from sklearn.metrics import accuracy_score
y_pred=[0,1,0,1]
y_true=[0,1,1,1]
print('ACC:',accuracy_score(y_true,y_pred))
ACC: 0.75
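Accuracy is simply the fraction of positions where the prediction equals the label. A NumPy sketch that reproduces the 0.75 above without scikit-learn:

```python
import numpy as np

y_pred = np.array([0, 1, 0, 1])
y_true = np.array([0, 1, 1, 1])

# Element-wise comparison gives a boolean array; its mean is the share of hits.
acc = np.mean(y_pred == y_true)
print('ACC:', acc)
```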
## Precision,Recall,F1-score
from sklearn import metrics
y_pred=[0,1,0,0]
y_true=[0,1,0,1]
print('Precision:',metrics.precision_score(y_true,y_pred))
print('Recall:',metrics.recall_score(y_true,y_pred))
print('F1-score:',metrics.f1_score(y_true,y_pred))
Precision: 1.0
Recall: 0.5
F1-score: 0.6666666666666666
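For the positive class 1, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean 2PR / (P + R). Counting by hand for the arrays above gives TP = 1, FP = 0, FN = 1, which reproduces scikit-learn's numbers. A sketch of that count in NumPy:

```python
import numpy as np

y_pred = np.array([0, 1, 0, 0])
y_true = np.array([0, 1, 0, 1])

# Confusion counts for the positive class (label 1).
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
print('Precision:', precision)
print('Recall:', recall)
print('F1-score:', f1)
```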

1.3.3 Example: Computing Regression Metrics

import numpy as np
from sklearn import metrics

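# MAPE (Mean Absolute Percentage Error) is implemented by hand here
# (scikit-learn only added metrics.mean_absolute_percentage_error in version 0.24).
# Note: it divides by y_true, so it is undefined if any true value is 0.
def mape(y_true, y_pred):
    return np.mean(np.abs((y_pred - y_true) / y_true))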

y_true=np.array([1.0,5.0,4.0,3.0,2.0,5.0,-3.0])
y_pred=np.array([1.0,4.5,3.8,3.2,3.0,4.8,-2.2])

#MSE
print('MSE:',metrics.mean_squared_error(y_true,y_pred))
#RMSE
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_true,y_pred)))
#MAE
print('MAE:',metrics.mean_absolute_error(y_true,y_pred))
#MAPE
print('MAPE:',mape(y_true,y_pred))
MSE: 0.2871428571428571
RMSE: 0.5358571238146014
MAE: 0.4142857142857143
MAPE: 0.1461904761904762
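The three built-in metrics are short enough to write directly in NumPy, which makes their definitions explicit: MSE is the mean of the squared errors, RMSE is its square root (back in the target's units), and MAE is the mean of the absolute errors. A sketch on the same arrays:

```python
import numpy as np

y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])

err = y_pred - y_true
mse = np.mean(err ** 2)       # mean squared error
rmse = np.sqrt(mse)           # root mean squared error, same units as the target
mae = np.mean(np.abs(err))    # mean absolute error
print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
```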
##R2-score
from sklearn.metrics import r2_score
y_true=[3,-0.5,2,7]
y_pred=[2.5,0.0,2,8]
print('R2-score:',r2_score(y_true,y_pred))
R2-score: 0.9486081370449679
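R2 compares the model's squared error with that of always predicting the mean of y_true: R2 = 1 - SS_res / SS_tot. For the example above, SS_res = 0.25 + 0.25 + 0 + 1 = 1.5 and SS_tot = 29.1875 (the mean of y_true is 2.875), so R2 = 1 - 1.5 / 29.1875, about 0.9486. The same computation in NumPy:

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot
print('R2-score:', r2)
```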

Ref.
[Datawhale Data Mining for Beginners (零基础入门数据挖掘), Task 1: Understanding the Competition Problem, by AI蜗牛车]
Zhihu: https://www.zhihu.com/people/seu-aigua-niu-che
GitHub: https://github.com/chehongshu
WeChat official account: AI蜗牛车
