[Machine Learning: Loan Default Analysis] 2. Building Ensemble Models

License: CC 4.0 BY-SA

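2.0 Task description

Continuing with the financial dataset from the previous post, this time we build four models (random forest, GBDT, XGBoost, and LightGBM) and score each of them, for example by accuracy and AUC.

Ensemble learning falls mainly into two families: bagging and boosting. Of the four models above, random forest is a bagging algorithm and the other three are boosting methods.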
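To make the bagging/boosting distinction concrete, here is a minimal sketch. It assumes a synthetic dataset from `make_classification` as a stand-in for the loan data, and uses scikit-learn's `BaggingClassifier` and `GradientBoostingClassifier` as generic representatives of each family; neither choice comes from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the loan dataset from the post).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Bagging: 50 decision trees (the default base learner), each fit
# independently on its own bootstrap sample, then combined.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
bag_acc = accuracy_score(y_te, bag.predict(X_te))

# Boosting: 50 trees fit one after another, each correcting the current ensemble.
gb = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
# staged_predict yields the ensemble's prediction after each added tree.
staged_acc = [accuracy_score(y_te, p) for p in gb.staged_predict(X_te)]

print("bagging accuracy:", bag_acc)
print("boosting accuracy after 1 tree:", staged_acc[0])
print("boosting accuracy after 50 trees:", staged_acc[-1])
```

The per-stage accuracies only make sense for boosting, because its trees are added sequentially; the bagged trees have no such ordering.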

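2.1 Random forest

A random forest is a classifier that trains many decision trees on the samples and combines their predictions into a single result.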
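The "many trees, one prediction" idea can be checked directly. The sketch below assumes synthetic data (not the loan dataset) and relies on the fitted `estimators_` attribute of scikit-learn's `RandomForestClassifier`; it averages the class probabilities of the individual trees and compares the result with the forest's own `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; not the loan dataset from the post).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Probability estimates from each individual tree,
# stacked to shape (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Averaging over the trees reproduces the forest's own probability estimate.
manual_proba = per_tree.mean(axis=0)
forest_proba = forest.predict_proba(X)
same = np.allclose(manual_proba, forest_proba)
print("forest probabilities equal the mean over trees:", same)
```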

(1) Model training

from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
re = rfc.fit(x_train, y_train)  # fit() returns the estimator itself, so re and rfc are the same object

(2) Model evaluation

The evaluation code is the same as before:

import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix

r = re.score(x_test, y_test)  # for a classifier, score() returns the mean accuracy
print("Accuracy:", r)
y_scores = re.predict_proba(x_test)  # two columns: probability of class 0, probability of class 1
y_scores = y_scores[:, 1]  # keep only the probability of class 1

y_pred = rfc.predict(x_test)  # predicted labels for the test set
cmat = confusion_matrix(y_test, y_pred)  # confusion matrix
print(cmat)
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_scores)
plt.plot(fpr, tpr, marker='o')
plt.show()  # plot the ROC curve
auc = metrics.auc(fpr, tpr)  # compute the AUC

print("AUC:", auc)

[Output in the original post: accuracy, ROC curve, AUC, and confusion matrix]
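Because the same evaluation steps are repeated for every model, they could be wrapped in a small helper. The function name `evaluate` and the synthetic data below are illustrative assumptions, not part of the original post; plotting is left out so the helper only returns numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(model, x_test, y_test):
    """Return accuracy, AUC and confusion matrix for a fitted binary classifier."""
    y_pred = model.predict(x_test)                # hard labels from this model
    y_scores = model.predict_proba(x_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_scores),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }


# Usage on synthetic stand-in data (an assumption; not the post's loan dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(x_train, y_train)
result = evaluate(model, x_test, y_test)
print(result)
```

Computing `y_pred` inside the helper from the model that is passed in avoids accidentally reusing predictions left over from a previously evaluated model.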

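2.2 GBDT (gradient boosted trees)

GBDT (Gradient Boosting Decision Tree) is a boosting algorithm. It is an iterative decision-tree method: the model consists of many trees built one after another, each fit to the errors of the ensemble so far, and the outputs of all trees are summed to give the final answer.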
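The "sum of all trees" idea can be sketched by hand. The example below is a simplified illustration, not scikit-learn's implementation: it assumes a toy regression problem with squared loss (where the negative gradient is simply the residual), and the learning rate, tree depth, and number of rounds are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
baseline = y.mean()                      # initial constant prediction
prediction = np.full_like(y, baseline)
trees = []
for _ in range(50):
    residual = y - prediction            # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add the new tree's output
    trees.append(tree)


def predict(X_new):
    """Final answer = initial value + scaled sum of every tree's output."""
    return baseline + learning_rate * sum(t.predict(X_new) for t in trees)


mse_start = np.mean((y - baseline) ** 2)
mse_end = np.mean((y - predict(X)) ** 2)
print("training MSE before boosting:", mse_start)
print("training MSE after 50 trees:", mse_end)
```

For classification, as in `GradientBoostingClassifier`, the same additive structure applies, but the trees are fit to the gradient of a classification loss and the summed score is mapped to a probability.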

(1) Model training

from sklearn.ensemble import GradientBoostingClassifier
gbdt = GradientBoostingClassifier()
re = gbdt.fit(x_train, y_train)  # fit() returns the estimator itself, so re and gbdt are the same object

(2) Model evaluation

As before, the code is:

import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix

r = re.score(x_test, y_test)  # for a classifier, score() returns the mean accuracy
print("Accuracy:", r)
y_scores = re.predict_proba(x_test)  # two columns: probability of class 0, probability of class 1
y_scores = y_scores[:, 1]  # keep only the probability of class 1

y_pred = gbdt.predict(x_test)  # recompute predictions with the GBDT model, not the random forest
cmat = confusion_matrix(y_test, y_pred)  # confusion matrix
print(cmat)

fpr, tpr, thresholds = metrics.roc_curve(y_test, y_scores)
plt.plot(fpr, tpr, marker='o')
plt.show()  # plot the ROC curve
auc=metrics.auc(fpr,tpr