[Algorithm Practice] Model Ensembling

This post takes the highest-scoring models from the previous evaluation, GBDT and XGBoost, and combines them via stacking in the hope of improving prediction accuracy. Evaluation shows that stacking does not noticeably improve performance.

Task description:

Take the highest-scoring model from the previous post as the base model, stack it with the other models, and report the final model and its scores.

Model Ensembling

The best-performing models are GBDT and XGBoost (judged by accuracy and AUC).
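
The code below assumes the fitted classifiers clf_gbdt, clf_xgb and clf_lgb, and the model_evaluation helper, all carried over from the previous post. For reference, here is a minimal sketch of what they might look like; the default hyperparameters are placeholders, not the tuned settings actually used:

import xgboost as xgb
import lightgbm as lgb
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Base models from the previous post (hyperparameters assumed)
clf_gbdt = GradientBoostingClassifier().fit(X_train, y_train)
clf_xgb = xgb.XGBClassifier().fit(X_train, y_train)
clf_lgb = lgb.LGBMClassifier().fit(X_train, y_train)

def model_evaluation(y_true, y_pred, y_pred_proba):
    # Print the five metrics reported throughout this post
    print('accuracy:', accuracy_score(y_true, y_pred))
    print('precision:', precision_score(y_true, y_pred))
    print('recall:', recall_score(y_true, y_pred))
    print('f1_score:', f1_score(y_true, y_pred))
    print('roc_auc_score:', roc_auc_score(y_true, y_pred_proba))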

The model-evaluation code is as follows:

# GBDT
# Predicted labels and positive-class probabilities on the training set
train_gbdt_predict = clf_gbdt.predict(X_train)
train_gbdt_predict_pro = clf_gbdt.predict_proba(X_train)[:, 1]
# Predicted labels and positive-class probabilities on the test set
test_gbdt_predict = clf_gbdt.predict(X_test)
test_gbdt_predict_pro = clf_gbdt.predict_proba(X_test)[:, 1]

# Training-set scores
model_evaluation(y_train, train_gbdt_predict, train_gbdt_predict_pro)
# Test-set scores
model_evaluation(y_test, test_gbdt_predict, test_gbdt_predict_pro)

Results:

==================== Training set
accuracy: 0.856026450255
precision: 0.865979381443
recall: 0.503597122302
f1_score: 0.636846095527
roc_auc_score: 0.909330778458
==================== Test set
accuracy: 0.771548703574
precision: 0.577464788732
recall: 0.342618384401
f1_score: 0.43006993007
roc_auc_score: 0.762693916727

# XGBoost
# Predicted labels and positive-class probabilities on the training set
train_xgb_predict = clf_xgb.predict(X_train)
train_xgb_predict_pro = clf_xgb.predict_proba(X_train)[:, 1]
# Predicted labels and positive-class probabilities on the test set
test_xgb_predict = clf_xgb.predict(X_test)
test_xgb_predict_pro = clf_xgb.predict_proba(X_test)[:, 1]

# Training-set scores
model_evaluation(y_train, train_xgb_predict, train_xgb_predict_pro)
# Test-set scores
model_evaluation(y_test, test_xgb_predict, test_xgb_predict_pro)

Results:

==================== Training set
accuracy: 0.848512173129
precision: 0.846638655462
recall: 0.483213429257
f1_score: 0.615267175573
roc_auc_score: 0.905284436711
==================== Test set
accuracy: 0.784162578837
precision: 0.624390243902
recall: 0.356545961003
f1_score: 0.45390070922
roc_auc_score: 0.769253440164

# LightGBM
# Predicted labels and positive-class probabilities on the training set
train_lgb_predict = clf_lgb.predict(X_train)
train_lgb_predict_pro = clf_lgb.predict_proba(X_train)[:, 1]
# Predicted labels and positive-class probabilities on the test set
test_lgb_predict = clf_lgb.predict(X_test)
test_lgb_predict_pro = clf_lgb.predict_proba(X_test)[:, 1]

# Training-set scores
model_evaluation(y_train, train_lgb_predict, train_lgb_predict_pro)
# Test-set scores
model_evaluation(y_test, test_lgb_predict, test_lgb_predict_pro)

Results:

==================== Training set
accuracy: 0.994289149384
precision: 1.0
recall: 0.97721822542
f1_score: 0.988477865373
roc_auc_score: 0.999994709407
==================== Test set
accuracy: 0.768745620182
precision: 0.564444444444
recall: 0.353760445682
f1_score: 0.434931506849
roc_auc_score: 0.749950444952

LightGBM fits the training set almost perfectly (precision 1.0, AUC near 1.0) yet scores lowest on the test set, a clear sign of overfitting, which is why it is not chosen as the base model here.

Stacking

import numpy as np
import xgboost as xgb
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import BayesianRidge

# Stack with XGBoost as the first-level model
folds_stack = RepeatedKFold(n_splits=5, n_repeats=2, random_state=4590)
oof_stack = np.zeros(X_train.shape[0])    # out-of-fold meta-predictions
predictions = np.zeros(X_test.shape[0])   # averaged test-set meta-predictions

# XGBoost as the first-level model
clf_xgb = xgb.XGBClassifier()

for fold_, (trn_idx, val_idx) in enumerate(folds_stack.split(X_train, y_train)):
    print("fold {}".format(fold_))
    trn_data, trn_y = X_train.iloc[trn_idx], y_train.iloc[trn_idx].values
    val_data, val_y = X_train.iloc[val_idx], y_train.iloc[val_idx].values

    clf_xgb.fit(trn_data, trn_y)
    # First-level predictions become the second-level model's train,
    # validation and test inputs; reshape to 2-D because sklearn
    # estimators expect a (n_samples, n_features) matrix
    meta_train_x = clf_xgb.predict(trn_data).reshape(-1, 1)
    meta_val_x = clf_xgb.predict(val_data).reshape(-1, 1)
    meta_test_x = clf_xgb.predict(X_test).reshape(-1, 1)

    # BayesianRidge (a regressor) as the second-level model
    clf_3 = BayesianRidge()
    clf_3.fit(meta_train_x, trn_y)

    oof_stack[val_idx] = clf_3.predict(meta_val_x)
    # Average over the 10 folds (5 splits x 2 repeats)
    predictions += clf_3.predict(meta_test_x) / 10

# Evaluate the stacked model; BayesianRidge outputs continuous scores,
# so threshold at 0.5 to obtain hard labels
model_evaluation(y_train, np.int64(oof_stack > 0.5), oof_stack)
model_evaluation(y_test, np.int64(predictions > 0.5), predictions)

Results:

==================== Training set (cross-validation)
accuracy: 0.793207093478
precision: 0.661504424779
recall: 0.358513189448
f1_score: 0.46500777605
roc_auc_score: 0.64548842274
==================== Test set
accuracy: 0.781359495445
precision: 0.614634146341
recall: 0.350974930362
f1_score: 0.446808510638
roc_auc_score: 0.682931154998

Problem:

Stacking did not improve performance: the test AUC actually drops from 0.769 with XGBoost alone to 0.683. Two likely reasons stand out in the loop above: the meta-features are hard 0/1 labels rather than probabilities, which discards most of the first-level model's information, and the meta-model is trained on predictions the first-level model made on its own training fold, which are optimistically biased. A more conventional out-of-fold setup is sketched below.
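
For comparison, here is a minimal sketch of stacking with out-of-fold probability meta-features, using scikit-learn's StackingClassifier; the hyperparameters and the choice of LogisticRegression as the meta-model are assumptions, not tuned settings:

import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[('gbdt', GradientBoostingClassifier()),
                ('xgb', xgb.XGBClassifier())],
    final_estimator=LogisticRegression(),
    stack_method='predict_proba',  # feed probabilities, not hard labels
    cv=5)                          # meta-features are built out-of-fold
stack.fit(X_train, y_train)

model_evaluation(y_test, stack.predict(X_test),
                 stack.predict_proba(X_test)[:, 1])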
