Task 5: Use grid search to tune the five models (using 5-fold cross-validation during tuning), evaluate each model, and remember to show the output of the code.
Note: due to time constraints, I will only do SVM for now.
Import the necessary packages
# Grid search for hyperparameter selection
from sklearn.model_selection import GridSearchCV
# KNN
from sklearn.neighbors import KNeighborsClassifier as KNN
# SVM (used below but not imported in the original)
from sklearn.svm import SVC
# Evaluation metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import metrics  # sklearn.cross_validation was removed; metrics is imported directly
# K-fold cross-validation
from sklearn.model_selection import cross_val_score as cvs
# Linear classifier trained with SGD
from sklearn.linear_model import SGDClassifier
# pandas for the confusion-matrix table, matplotlib for plotting
import pandas as pd
import matplotlib.pyplot as plt
Define an evaluation function
def do_estimation(y_train,
                  y_train_pred,
                  y_test,
                  y_test_pred,
                  model):
    # Accuracy
    acc_train = accuracy_score(y_train, y_train_pred)
    acc_pred = accuracy_score(y_test, y_test_pred)
    print('Pred train acc:{0:.4f}, Pred test acc:{1:.4f}'
          .format(acc_train, acc_pred), '\n')
    precision, recall, F1, _ = precision_recall_fscore_support(
        y_test, y_test_pred, average="binary")
    print("precision: {0:.2f}, recall: {1:.2f}, F1: {2:.2f}"
          .format(precision, recall, F1), '\n')
    clf_rpt = classification_report(y_test, y_test_pred)
    print(clf_rpt, '\n')
    # Confusion matrix
    confusion_table = pd.DataFrame(confusion_matrix(y_test, y_test_pred))
    confusion_table.index.name = 'Label'
    confusion_table.columns.name = 'Prediction'
    confusion_table.rename(columns={1: 'Yes', 0: 'No'},
                           index={1: 'Yes', 0: 'No'},
                           inplace=True)
    # print('oob_score: ', model.oob_score_, '\n')  # for random forest models
    y_train_prob = model.predict_proba(X_train)[:, 1]
    y_pred_prob = model.predict_proba(X_test)[:, 1]
    print("AUC Score (Train): %f" % metrics.roc_auc_score(y_train, y_train_prob))
    print("AUC Score (Test): %f" % metrics.roc_auc_score(y_test, y_pred_prob))
    return confusion_table
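As a self-contained sanity check of the metrics computed above, the same quantities can be exercised on a synthetic binary problem (this sketch uses `make_classification` and a `LogisticRegression` stand-in; the variable names are illustrative, not the notebook's):

```python
# Minimal sketch: compute accuracy, precision/recall/F1 and AUC on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_auc_score

X_syn, y_syn = make_classification(n_samples=400, n_features=10, random_state=2018)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.25,
                                          random_state=2018)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
precision, recall, f1, _ = precision_recall_fscore_support(y_te, y_pred,
                                                           average="binary")
# AUC needs a probability (or score), not the hard label
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print("acc: {:.2f}, precision: {:.2f}, recall: {:.2f}, F1: {:.2f}, AUC: {:.2f}"
      .format(accuracy_score(y_te, y_pred), precision, recall, f1, auc))
```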
svc = SVC(kernel='rbf',
class_weight='balanced',
random_state=2018)
Tune C and gamma
param_grid = {'C': [0.01, 0.05, 0.1, 1],
              'gamma': [0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(estimator=svc,
                    param_grid=param_grid,
                    scoring='roc_auc',
                    cv=5)  # the iid parameter was removed from scikit-learn
grid.fit(X_train, y_train)
grid.cv_results_, grid.best_params_, grid.best_score_  # grid_scores_ was replaced by cv_results_
print("The best parameters are %s with a 'roc_auc' score of %0.6f"
      % (grid.best_params_, grid.best_score_))
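To see what `cv_results_` actually contains, here is a small self-contained sketch of the same grid search run on synthetic data (a smaller hypothetical grid, not the notebook's `X_train`); `mean_test_score` holds the 5-fold mean AUC for each parameter combination:

```python
# Sketch: inspect GridSearchCV results on a synthetic binary problem.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=200, n_features=8,
                                     random_state=2018)
demo_grid = GridSearchCV(
    SVC(kernel='rbf', class_weight='balanced', random_state=2018),
    param_grid={'C': [0.1, 1], 'gamma': [0.01, 0.1]},  # 2 x 2 = 4 combinations
    scoring='roc_auc',
    cv=5)
demo_grid.fit(X_demo, y_demo)

# One row per parameter combination, with the cross-validated mean score
scores = pd.DataFrame(demo_grid.cv_results_)[['params', 'mean_test_score']]
print(scores)
print(demo_grid.best_params_, demo_grid.best_score_)
```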
Initialize the model with the best parameters
svc2 = SVC(kernel='rbf',
class_weight='balanced',
C = 0.1,
gamma = 0.01,
random_state=2018,
probability=True)
Fit the model, make predictions, and evaluate it
Note: the results are quite poor; as an exercise, the point for now is to learn the procedure, and the tuning can be refined later.
svc2.fit(X_train, y_train)
y_test_pred = svc2.predict(X_test)
y_train_pred = svc2.predict(X_train)
do_estimation(y_train,
              y_train_pred,
              y_test,
              y_test_pred,
              svc2)
Use K-fold cross-validation; here only the KNN model is used, since I have worked with it before. The other models remain to be studied.
Use cvs (cross_val_score) to compute the CV score for each value of n_neighbors in KNN
neighbors = [2 * i + 1 for i in range(40)]  # odd values of k: 1, 3, ..., 79
knn_scores = []
for k in neighbors:
    knn_s = KNN(n_neighbors=k)
    cv_scores = cvs(knn_s, X_train, y_train, cv=5, scoring='accuracy')
    knn_scores.append(cv_scores.mean())
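The loop above can be checked end-to-end on synthetic data (hypothetical names; this is not the notebook's `X_train`/`y_train`), picking the k with the highest mean 5-fold accuracy:

```python
# Sketch: select the best odd k for KNN by 5-fold cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_syn, y_syn = make_classification(n_samples=300, n_features=6,
                                   random_state=2018)
ks = [2 * i + 1 for i in range(10)]  # odd k values: 1, 3, ..., 19
mean_scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k),
                               X_syn, y_syn, cv=5, scoring='accuracy').mean()
               for k in ks]
best_k = ks[mean_scores.index(max(mean_scores))]
print(best_k, max(mean_scores))
```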
Plot the results
plt.plot(neighbors, knn_scores)
i = knn_scores.index(max(knn_scores))
plt.scatter(neighbors[i], knn_scores[i], color='red', s=100)
plt.xlabel('Number of Neighbors K')
plt.ylabel('CV Accuracy')  # the values plotted are accuracies, not error rates
plt.title('Best k = %d, knn_score = %.2f' % (neighbors[i], knn_scores[i]))
k_knn = neighbors[i]
Build and predict with the best k
knn = KNN(n_neighbors=k_knn, weights='distance', p=2)  # use the best k found above
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
y_train_knn = knn.predict(X_train)
Evaluate
do_estimation(y_train, y_train_knn, y_test, y_pred_knn, knn)
--- End ---