Model Scoring for Financial Loan Overdue Prediction
1. Data: financial data (not the raw data)
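2. Task: score classification and ensemble models and plot their ROC curves
Record the accuracy, precision, recall, F1-score, and AUC of seven models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM) in score tables, and plot the ROC curve of each model.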
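The four label-based metrics can all be read off a binary confusion matrix. The sketch below computes them by hand and checks them against scikit-learn. The labels are a made-up toy example, not loan data. AUC is left out here because it is computed from per-sample scores rather than from hard 0/1 predictions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels and predictions, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of all samples predicted correctly
precision = tp / (tp + fp)                          # share of predicted positives that are real
recall = tp / (tp + fn)                             # share of real positives that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

# accuracy 0.7, precision 0.667, recall 0.5, F1 0.571
print(f"accuracy  {accuracy:.4f}  sklearn {accuracy_score(y_true, y_pred):.4f}")
print(f"precision {precision:.4f}  sklearn {precision_score(y_true, y_pred):.4f}")
print(f"recall    {recall:.4f}  sklearn {recall_score(y_true, y_pred):.4f}")
print(f"F1        {f1:.4f}  sklearn {f1_score(y_true, y_pred):.4f}")
```

With imbalanced data such as overdue loans, high accuracy can coexist with low recall. That pattern is visible in the score tables.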
3. Code and comments
```python
# Classifiers: linear models, a single tree, and tree ensembles
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
# Data splitting, scaling, and evaluation metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
import pandas as pd
import matplotlib.pyplot as plt

# Load the data; 'status' is the target column (overdue or not).
data_all = pd.read_csv('data_all.csv')
X = data_all.drop(['status'], axis=1)
y = data_all['status']

# Hold out 30% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

# Standardize features: fit the scaler on the training set only, then apply it to both sets.
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
```
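The scaler is fit on the training split only, so test-set statistics never leak into training. scikit-learn's `make_pipeline` can bundle the same two steps. In the minimal sketch below, synthetic data from `make_classification` stands in for `data_all.csv`, which is not available here. The sketch checks that the pipeline gives the same probabilities as scaling by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data_all.csv (an assumption: about 25% positives).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

# The pipeline fits StandardScaler on X_train only, then reuses the learned
# means and standard deviations when transforming X_test.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
proba = pipe.predict_proba(X_test)[:, 1]

# The manual equivalent, as in the code above.
sc = StandardScaler().fit(X_train)
manual = LogisticRegression().fit(sc.transform(X_train), y_train)
manual_proba = manual.predict_proba(sc.transform(X_test))[:, 1]

print(np.allclose(proba, manual_proba))
```

A pipeline also makes it harder to fit the scaler on the full dataset by accident, for example inside cross-validation.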
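```python
def score(y_true, y_predict, y_predict_pro):
    """Print the five metrics for one model and plot its ROC curve.

    y_predict: hard 0/1 predictions; y_predict_pro: positive-class scores
    (predict_proba[:, 1], or decision_function for LinearSVC).
    """
    acc_score = accuracy_score(y_true, y_predict)
    pre_score = precision_score(y_true, y_predict)
    recall = recall_score(y_true, y_predict)
    F = f1_score(y_true, y_predict)
    auc_score = roc_auc_score(y_true, y_predict_pro)
    print('accuracy: %.4f  precision: %.4f  recall: %.4f  F1-score: %.4f  AUC: %.4f'
          % (acc_score, pre_score, recall, F, auc_score))

    # ROC curve built from the y_true argument (not the global y_test)
    fpr, tpr, thresholds = roc_curve(y_true, y_predict_pro)
    plt.plot(fpr, tpr, 'b', label='AUC = %0.4f' % auc_score)
    plt.plot([0, 1], [0, 1], 'r--', label='Random guess')
    plt.legend(loc='lower right')
    plt.title('ROC')
    plt.xlabel('false positive rate')
    plt.ylabel('true positive rate')
    plt.show()
    return acc_score, pre_score, recall, F, auc_score
```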
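`roc_curve` sweeps a decision threshold over the scores. At each threshold it records the true positive rate and false positive rate, and the AUC is the area under the resulting curve. The hypothetical helper `manual_roc` below is a simplified sketch of that idea on toy data. It keeps every distinct threshold, whereas scikit-learn drops some collinear points by default. It compares the trapezoidal area with `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def manual_roc(y_true, scores):
    """Hypothetical helper: FPR/TPR for every distinct score used as a threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Start above the highest score (nothing predicted positive), then lower the bar.
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    pos = (y_true == 1).sum()
    neg = (y_true == 0).sum()
    tpr = np.array([((scores >= t) & (y_true == 1)).sum() / pos for t in thresholds])
    fpr = np.array([((scores >= t) & (y_true == 0)).sum() / neg for t in thresholds])
    return fpr, tpr

# Toy labels and scores, invented for illustration.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

fpr, tpr = manual_roc(y_true, scores)
# Trapezoidal rule over the (fpr, tpr) points, which are ordered by increasing fpr.
auc_manual = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print(auc_manual, roc_auc_score(y_true, scores))  # both 0.875
```

Here 14 of the 16 positive/negative pairs are ranked correctly, which gives the 0.875. AUC is the probability that a random positive scores above a random negative.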
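```python
# Logistic regression
lr = LogisticRegression()
lr.fit(X_train_std, y_train)
lr_predict = lr.predict(X_test_std)
lr_predict_pro = lr.predict_proba(X_test_std)[:, 1]   # probability of class 1
score(y_test, lr_predict, lr_predict_pro)

# Linear SVM: LinearSVC has no predict_proba, so the signed distance
# returned by decision_function serves as the ranking score for ROC/AUC.
svc = LinearSVC()
svc.fit(X_train_std, y_train)
svc_predict = svc.predict(X_test_std)
svc_predict_pro = svc.decision_function(X_test_std)
score(y_test, svc_predict, svc_predict_pro)
```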
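Passing `decision_function` output to `roc_auc_score` is valid because AUC depends only on how the samples are ranked, not on the scale of the scores. The sketch below checks this on synthetic data (an assumption, standing in for the loan data). It squashes the SVM margins through a sigmoid, a strictly increasing transform, and the AUC does not change.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data.
X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)
sc = StandardScaler().fit(X_train)
svc = LinearSVC().fit(sc.transform(X_train), y_train)

margin = svc.decision_function(sc.transform(X_test))  # signed distances, not probabilities
squashed = 1 / (1 + np.exp(-margin))                   # monotonic rescaling into (0, 1)

auc_margin = roc_auc_score(y_test, margin)
auc_squashed = roc_auc_score(y_test, squashed)
print(auc_margin, auc_squashed)
```

The sigmoid values are not calibrated probabilities. They only show that the ranking, and therefore the AUC, is preserved.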
```python
# Decision tree
clf = DecisionTreeClassifier()
clf.fit(X_train_std, y_train)
clf_predict = clf.predict(X_test_std)
clf_predict_proba = clf.predict_proba(X_test_std)[:, 1]
score(y_test, clf_predict, clf_predict_proba)

# Random forest
rfc = RandomForestClassifier()
rfc.fit(X_train_std, y_train)
rfc_predict = rfc.predict(X_test_std)
rfc_predict_proba = rfc.predict_proba(X_test_std)[:, 1]
score(y_test, rfc_predict, rfc_predict_proba)

# GBDT (gradient boosting decision trees)
gbdt = GradientBoostingClassifier()
gbdt.fit(X_train_std, y_train)
gbdt_predict = gbdt.predict(X_test_std)
gbdt_predict_proba = gbdt.predict_proba(X_test_std)[:, 1]
score(y_test, gbdt_predict, gbdt_predict_proba)

# XGBoost
xgbs = XGBClassifier()
xgbs.fit(X_train_std, y_train)
xgbs_predict = xgbs.predict(X_test_std)
xgbs_predict_proba = xgbs.predict_proba(X_test_std)[:, 1]
score(y_test, xgbs_predict, xgbs_predict_proba)

# LightGBM: score it with its own predicted probabilities
lgbm = LGBMClassifier()
lgbm.fit(X_train_std, y_train)
lgbm_predict = lgbm.predict(X_test_std)
lgbm_predict_proba = lgbm.predict_proba(X_test_std)[:, 1]
score(y_test, lgbm_predict, lgbm_predict_proba)
```
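Because every model follows the same fit / predict / score steps, the repeated blocks can be collapsed into a loop. The loop collects the metrics into one table and draws all ROC curves on a single figure. The sketch below is a minimal version under stated assumptions. Synthetic data replaces `data_all.csv`. Only scikit-learn models are listed so that it runs without extra packages. `XGBClassifier()` and `LGBMClassifier()` can be added to the dictionary, since they expose the same `fit`, `predict`, and `predict_proba` methods.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for data_all.csv (an assumption: about 25% positives).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)
sc = StandardScaler().fit(X_train)
X_train_std, X_test_std = sc.transform(X_train), sc.transform(X_test)

models = {
    "Logistic regression": LogisticRegression(),
    "SVM": LinearSVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
}

rows = {}
fig, ax = plt.subplots()
for name, model in models.items():
    model.fit(X_train_std, y_train)
    pred = model.predict(X_test_std)
    # LinearSVC has no predict_proba, so fall back to its decision_function.
    if hasattr(model, "predict_proba"):
        score = model.predict_proba(X_test_std)[:, 1]
    else:
        score = model.decision_function(X_test_std)
    rows[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "F1-score": f1_score(y_test, pred),
        "AUC": roc_auc_score(y_test, score),
    }
    fpr, tpr, _ = roc_curve(y_test, score)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {rows[name]['AUC']:.4f})")

ax.plot([0, 1], [0, 1], "r--", label="Random guess")
ax.set_xlabel("false positive rate")
ax.set_ylabel("true positive rate")
ax.set_title("ROC")
ax.legend(loc="lower right")
fig.savefig("roc_all_models.png")
plt.close(fig)

summary = pd.DataFrame(rows).T  # one row per model, one column per metric
print(summary.round(4))
```

Plotting all curves on one set of axes makes the models easier to compare than seven separate figures. Building the table in code also avoids copying numbers by hand.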
4. Score summary
| Logistic regression | Score |
|---|---|
| accuracy | 0.7876664330763841 |
| precision | 0.6609195402298851 |
| recall | 0.3203342618384401 |
| F1-score | 0.4315196998123827 |
| AUC | 0.7657428562486307 |

| Decision tree | Score |
|---|---|
| accuracy | 0.6860546601261388 |
| precision | 0.3850129198966408 |
| recall | 0.415041782729805 |
| F1-score | 0.39946380697050937 |
| AUC | 0.5960976703911197 |

| SVM | Score |
|---|---|
| accuracy | 0.7813594954449895 |
| precision | 0.6535947712418301 |
| recall | 0.2785515320334262 |
| F1-score | 0.390625 |
| AUC | 0.7677954784931093 |

| Random forest | Score |
|---|---|
| accuracy | 0.7680448493342676 |
| precision | 0.5875 |
| recall | 0.2618384401114206 |
| F1-score | 0.3622350674373796 |
| AUC | 0.7194427926095167 |

| GBDT | Score |
|---|---|
| accuracy | 0.7806587245970568 |
| precision | 0.6116504854368932 |
| recall | 0.35097493036211697 |
| F1-score | 0.44601769911504424 |
| AUC | 0.763233805932 |

| XGBoost | Score |
|---|---|
| accuracy | 0.7841625788367204 |
| precision | 0.624390243902439 |
| recall | 0.3565459610027855 |
| F1-score | 0.45390070921985815 |
| AUC | 0.7708522424963224 |

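| LightGBM | Score |
|---|---|
| accuracy | 0.7701471618780659 |
| precision | 0.5688888888888889 |
| recall | 0.3565459610027855 |
| F1-score | 0.4383561643835616 |
| AUC | 0.7657428562486307 |

Note: this AUC is identical to the logistic regression AUC. In the run that produced it, the LightGBM call to `score()` was passed `lr_predict_pro` instead of `lgbm_predict_proba`. The value and its ROC curve therefore belong to logistic regression, not LightGBM.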

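One quick consistency check on the tables uses the identity F1 = 2PR / (P + R). The F1-score reported for every model should follow from its precision and recall. The sketch below recomputes F1 from the values copied out of the tables. The recomputed values agree with the reported ones, which also confirms the GBDT and LightGBM recall figures.

```python
# Precision, recall, and F1 copied from the score tables above.
reported = {
    "Logistic regression": (0.6609195402298851, 0.3203342618384401, 0.4315196998123827),
    "Decision tree": (0.3850129198966408, 0.415041782729805, 0.39946380697050937),
    "SVM": (0.6535947712418301, 0.2785515320334262, 0.390625),
    "Random forest": (0.5875, 0.2618384401114206, 0.3622350674373796),
    "GBDT": (0.6116504854368932, 0.35097493036211697, 0.44601769911504424),
    "XGBoost": (0.624390243902439, 0.3565459610027855, 0.45390070921985815),
    "LightGBM": (0.5688888888888889, 0.3565459610027855, 0.4383561643835616),
}

recomputed = {}
for name, (p, r, f1) in reported.items():
    # F1 is the harmonic mean of precision and recall.
    recomputed[name] = 2 * p * r / (p + r)
    print(f"{name:20s} reported F1 {f1:.6f}  recomputed {recomputed[name]:.6f}")
```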
An article about matplotlib: http://old.sebug.net/paper/books/scipydoc/matplotlib_intro.html