【Datawhale】[task1] 1.3 Code Examples

Latest recommended article published on 2025-05-13 10:15:12

zyq_go

Original link: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12586969.1002.24.1cd8593aw4bbL5&postId=95422&accounttraceid=3d88d863c7e64d4dbc82c720ccb6d376pmtr

1.3 Code Examples

This section gives examples of reading the data and computing evaluation metrics.

1.3.1 Reading Data with Pandas

#coding=utf-8
import pandas as pd 
import numpy as np
import os 

# Directory where results are saved; created if it does not exist yet
output_path='G:/newjourney/Datawhale/output'
os.makedirs(output_path,exist_ok=True)

## 1) Load the training set and the test set (the files are space-separated)
path='G:/newjourney/Datawhale/'
Train_data=pd.read_csv(path+'used_car_train_20200313.csv',sep=' ')
Test_data=pd.read_csv(path+'used_car_testB_20200421.csv',sep=' ')

print('Train data shape:',Train_data.shape)
print('TestB data shape',Test_data.shape)

Train data shape: (150000, 31)
TestB data shape (50000, 30)

Train_data.head()

	SaleID	name	regDate	model	brand	bodyType	gearbox	power	kilometer	...	v_5	v_6	v_7	v_8	v_9	v_10	v_11	v_12	v_13	v_14
0	0	736	20040402	30.0	6	1.0	0.0	60	12.5	...	0.235676	0.101988	0.129549	0.022816	0.097462	-2.881803	2.804097	-2.420821	0.795292	0.914762
1	1	2262	20030301	40.0	1	2.0	0.0	0	15.0	...	0.264777	0.121004	0.135731	0.026597	0.020582	-4.900482	2.096338	-1.030483	-1.722674	0.245522
2	2	14874	20040403	115.0	15	1.0	0.0	163	12.5	...	0.251410	0.114912	0.165147	0.062173	0.027075	-4.846749	1.803559	1.565330	-0.832687	-0.229963
3	3	71865	19960908	109.0	10	0.0	1.0	193	15.0	...	0.274293	0.110300	0.121964	0.033395	0.000000	-4.509599	1.285940	-0.501868	-2.438353	-0.478699
4	4	111080	20120103	110.0	5	1.0	0.0	68	5.0	...	0.228036	0.073205	0.091880	0.078819	0.121534	-1.896240	0.910783	0.931110	2.834518	1.923482

5 rows × 31 columns
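
The loading call can be tried without the competition files. The following is a minimal sketch, assuming a two-row sample held in memory: `io.StringIO` stands in for the file path, and the sample keeps only a few of the columns and values shown in the rows above.

```python
import io
import pandas as pd

# Hypothetical miniature sample in the same space-separated layout;
# only a handful of columns from the first two rows above are kept.
sample = io.StringIO(
    "SaleID name regDate power kilometer\n"
    "0 736 20040402 60 12.5\n"
    "1 2262 20030301 0 15.0\n"
)

# sep=' ' tells pandas that fields are separated by single spaces
df = pd.read_csv(sample, sep=' ')

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by pandas
```

If `sep=' '` is omitted, pandas falls back to the comma delimiter and reads each line as a single column, which is a quick way to spot a wrong separator.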

1.3.2 Classification Metric Examples

## accuracy
import numpy as np
from sklearn.metrics import accuracy_score
y_pred=[0,1,0,1]
y_true=[0,1,1,1]
print('ACC:',accuracy_score(y_true,y_pred))

ACC: 0.75
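
To make the definition explicit, accuracy can also be computed by hand as the share of predictions that match the true labels. This is a minimal sketch on the same two lists; the variable names `correct` and `acc` are illustrative only.

```python
y_pred = [0, 1, 0, 1]
y_true = [0, 1, 1, 1]

# Accuracy = number of matching labels / total number of labels
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
acc = correct / len(y_true)

print('ACC:', acc)
```

Three of the four labels match, so the result agrees with `accuracy_score`.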

## Precision,Recall,F1-score
from sklearn import metrics
y_pred=[0,1,0,0]
y_true=[0,1,0,1]
print('Precision:',metrics.precision_score(y_true,y_pred))
print('Recall:',metrics.recall_score(y_true,y_pred))
print('F1-score:',metrics.f1_score(y_true,y_pred))

Precision: 1.0
Recall: 0.5
F1-score: 0.6666666666666666
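
The same three scores can be derived from the counts of true positives, false positives, and false negatives. This is a minimal sketch for binary labels with 1 as the positive class; it assumes at least one predicted positive and one actual positive, so no division by zero is handled.

```python
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print('Precision:', precision)
print('Recall:', recall)
print('F1-score:', f1)
```

Here tp=1, fp=0, and fn=1, which gives the precision of 1.0 and recall of 0.5 reported above.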

1.3.3 Regression Metric Examples

import numpy as np
from sklearn import metrics

# MAPE (Mean Absolute Percentage Error) is implemented by hand here;
# scikit-learn 0.24+ also provides metrics.mean_absolute_percentage_error
def mape(y_true,y_pred):
    return np.mean(np.abs((y_pred-y_true)/y_true))

y_true=np.array([1.0,5.0,4.0,3.0,2.0,5.0,-3.0])
y_pred=np.array([1.0,4.5,3.8,3.2,3.0,4.8,-2.2])

#MSE
print('MSE:',metrics.mean_squared_error(y_true,y_pred))
#RMSE
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_true,y_pred)))
#MAE
print('MAE:',metrics.mean_absolute_error(y_true,y_pred))
#MAPE
print('MAPE:',mape(y_true,y_pred))

MSE: 0.2871428571428571
RMSE: 0.5358571238146014
MAE: 0.4142857142857143
MAPE: 0.1461904761904762
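
For reference, all four metrics follow directly from the per-sample error. The sketch below recomputes them with plain NumPy on the same arrays; the names `err`, `mse`, `rmse`, `mae`, and `mape_value` are illustrative only. It assumes that no entry of `y_true` is zero, since MAPE divides by the true value.

```python
import numpy as np

y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])

err = y_pred - y_true                      # per-sample error

mse = np.mean(err ** 2)                    # mean squared error
rmse = np.sqrt(mse)                        # root of the MSE, same unit as y
mae = np.mean(np.abs(err))                 # mean absolute error
mape_value = np.mean(np.abs(err / y_true)) # undefined if any y_true is 0

print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
print('MAPE:', mape_value)
```

Because MAPE is relative to the true value, the error of 1.0 on the sample with true value 2.0 contributes 0.5 on its own, far more than the same absolute error would on a larger target.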

##R2-score
from sklearn.metrics import r2_score
y_true=[3,-0.5,2,7]
y_pred=[2.5,0.0,2,8]
print('R2-score:',r2_score(y_true,y_pred))

R2-score: 0.9486081370449679
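
The R2 score compares the residual sum of squares with the total sum of squares around the mean of the true values. This is a minimal sketch of that formula on the same data; `ss_res`, `ss_tot`, and `r2` are illustrative names.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Residual sum of squares: what the predictions fail to explain
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: spread of the true values around their mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
print('R2-score:', r2)
```

A value of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values are possible for predictions worse than that.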

Ref.
[Datawhale, Introduction to Data Mining from Scratch, Task 1: Understanding the Competition Task, by AI蜗牛车]
Zhihu: https://www.zhihu.com/people/seu-aigua-niu-che
GitHub: https://github.com/chehongshu
WeChat official account: AI蜗牛车