Computing accuracy and AUC with random forest, GBDT, XGBoost, and LightGBM

This post compares random forest, GBDT, XGBoost, and LightGBM on the same dataset by computing accuracy and AUC for each model. Based on the reported numbers, GBDT comes out best on both metrics.

  • Modules used
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import metrics
from sklearn.metrics import accuracy_score,roc_auc_score
from xgboost.sklearn import XGBClassifier
  • Load the dataset
data_all = pd.read_csv('/home/infisa/wjht/project/DataWhale/data_all.csv', encoding='gbk')
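A quick sanity check before modelling can confirm that the file loaded and show the class balance of the target (a minimal sketch, assuming the CSV above loaded correctly and that status is the binary label):

print(data_all.shape)                     # rows and columns loaded from the CSV
print(data_all['status'].value_counts())  # class balance of the target column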
  • Split into training and test sets
features = [x for x in data_all.columns if x not in ['status']]
X = data_all[features]
y = data_all['status']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)
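If the status classes are imbalanced, a stratified split keeps the class ratio the same in the training and test sets. A hedged alternative to the split above (same parameters, with stratify added):

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018, stratify=y)  # preserve the class ratio in both splits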
  • Build the models and compute accuracy
forest=RandomForestClassifier(n_estimators=100,random_state=2018) # random forest
forest.fit(X_train,y_train)
forest_y_score=forest.predict_proba(X_test)
# print(forest_y_score[:,1])
forest_score=forest.score(X_test,y_test) # accuracy on the test set
# print('forest_score:',forest_score)
'ranfor_score:0.7820602662929222'

Gbdt=GradientBoostingClassifier(random_state=2018) # GBDT
Gbdt.fit(X_train,y_train)
Gbdt_score=Gbdt.score(X_train,y_train) # accuracy (note: computed on the training set)
# print('Gbdt_score:',Gbdt_score)
'Gbdt_score:0.8623384430417794'
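Note that Gbdt.score(X_train, y_train) above measures accuracy on the training set, so it is not directly comparable with the three test-set scores. A minimal sketch of the test-set version (its value would differ from the 0.862 recorded above):

Gbdt_test_score = Gbdt.score(X_test, y_test)  # test-set accuracy, comparable with the other models
# print('Gbdt_test_score:', Gbdt_test_score)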

Xgbc=XGBClassifier(random_state=2018)  # XGBoost
Xgbc.fit(X_train,y_train)
y_xgbc_pred=Xgbc.predict(X_test)
Xgbc_score=accuracy_score(y_test,y_xgbc_pred) # accuracy on the test set
# print('Xgbc_score:',Xgbc_score)
'Xgbc_score:0.7855641205325858'

gbm=lgb.LGBMClassifier(random_state=2018)  # LightGBM
gbm.fit(X_train,y_train)
y_gbm_pred=gbm.predict(X_test)
gbm_score=accuracy_score(y_test,y_gbm_pred)  # accuracy on the test set
# print('gbm_score:',gbm_score)
'gbm_score:0.7701471618780659'
  • Compute AUC
y_test_hot = label_binarize(y_test,classes =(0, 1)) # binarize the test labels into a matrix
Gbdt_y_score = Gbdt.decision_function(X_test) # decision scores predicted by the GBDT
forest_fpr,forest_tpr,forest_threasholds=metrics.roc_curve(y_test_hot.ravel(),forest_y_score[:,1].ravel()) # compute the ROC curve; forest_threasholds are the thresholds
Gbdt_fpr,Gbdt_tpr,Gbdt_threasholds=metrics.roc_curve(y_test_hot.ravel(),Gbdt_y_score.ravel()) # compute the ROC curve; Gbdt_threasholds are the thresholds

forest_auc=metrics.auc(forest_fpr,forest_tpr) # random forest AUC
# print('forest_auc',forest_auc)
'forest_auc 0.7491366989035293'

Gbdt_auc=metrics.auc(Gbdt_fpr,Gbdt_tpr) # GBDT AUC
# print('Gbdt_auc:',Gbdt_auc)
'Gbdt_auc:0.7633094425839567'
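The two-step roc_curve + auc computation above is equivalent to calling roc_auc_score directly on the continuous scores. A sketch of the one-line version for the two models above (the results should match the AUC values already recorded):

forest_auc_direct = roc_auc_score(y_test, forest_y_score[:, 1])  # positive-class probability from the random forest
Gbdt_auc_direct = roc_auc_score(y_test, Gbdt_y_score)            # decision_function scores from the GBDT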

Xgbc_auc=roc_auc_score(y_test,y_xgbc_pred) # XGBoost AUC (from hard class predictions)
# print('Xgbc_auc:',Xgbc_auc)
'Xgbc_auc:0.6431606209508309'

gbm_auc=roc_auc_score(y_test,y_gbm_pred) # LightGBM AUC (from hard class predictions)
# print('gbm_auc:',gbm_auc)
'gbm_auc:0.6310118097503468'
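The XGBoost and LightGBM AUC values above are computed from hard 0/1 predictions, which generally understates AUC compared with using predicted probabilities (as was done for the random forest and GBDT). A hedged sketch of the probability-based version (its values would differ from those recorded above):

xgbc_proba = Xgbc.predict_proba(X_test)[:, 1]      # positive-class probability from XGBoost
gbm_proba = gbm.predict_proba(X_test)[:, 1]        # positive-class probability from LightGBM
Xgbc_auc_proba = roc_auc_score(y_test, xgbc_proba)
gbm_auc_proba = roc_auc_score(y_test, gbm_proba)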
  • Brief analysis

Comparing the accuracy and AUC values obtained from random forest, GBDT, XGBoost, and LightGBM, GBDT gives the best results here, with a score of 0.8623384430417794 and an AUC of 0.7633094425839567. Keep in mind that the GBDT score was computed on the training set and the XGBoost/LightGBM AUC values were computed from hard class predictions, so the comparison is not fully like-for-like.
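For easier side-by-side reading, the recorded numbers can be collected into a small table (a sketch built from the values reported above, with the caveats noted as comments):

summary = pd.DataFrame({
    'model':    ['RandomForest', 'GBDT', 'XGBoost', 'LightGBM'],
    'accuracy': [0.7821, 0.8623, 0.7856, 0.7701],  # GBDT value is training-set accuracy
    'auc':      [0.7491, 0.7633, 0.6432, 0.6310],  # XGBoost/LightGBM AUC from hard predictions
})
print(summary)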
