DataWhale Study Group Data Mining Practice, Tasks 4 and 5: Model Tuning and K-Fold Cross-Validation

This post shows how to tune an SVM with grid search using five-fold cross-validation and presents the output of the code. It also uses a KNN model to examine how the number of neighbors affects performance and to find the best value of K.


Task 5: use grid search to tune five models (with five-fold cross-validation for the parameter search), evaluate each model, and remember to show the output of the code.

Note: due to time constraints, I will only cover SVM here for now.

 

Import the necessary packages

# grid search for the best hyper-parameters
from sklearn.model_selection import GridSearchCV
# SVM
from sklearn.svm import SVC
# KNN
from sklearn.neighbors import KNeighborsClassifier as KNN
# evaluation metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import classification_report, confusion_matrix
# the old sklearn.cross_validation module was removed in scikit-learn 0.20,
# so metrics is imported directly
from sklearn import metrics
# K-fold cross-validation
from sklearn.model_selection import cross_val_score as cvs
# SGD-trained linear classifier (logistic regression via a log loss)
from sklearn.linear_model import SGDClassifier

import pandas as pd
import matplotlib.pyplot as plt
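
The code below assumes X_train, X_test, y_train and y_test already exist. As a minimal sketch of how they might be produced (the file name data.csv and the column name label are placeholders, not from the original post):

# Hypothetical setup: load a dataset with a binary 'label' column
# and make a stratified 70/30 train/test split.
from sklearn.model_selection import train_test_split

df = pd.read_csv('data.csv')
X = df.drop(columns=['label'])
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018, stratify=y)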

Define an evaluation function (it takes the feature matrices as arguments so that predict_proba can compute AUC)

def do_estimation(y_train,
                  y_train_pred,
                  y_test,
                  y_test_pred,
                  model,
                  X_train,
                  X_test):
    # accuracy on the training and test sets
    acc_train = accuracy_score(y_train, y_train_pred)
    acc_test = accuracy_score(y_test, y_test_pred)

    print('Pred train acc:{0:.4f}, Pred test acc:{1:.4f}'
          .format(acc_train, acc_test), '\n')

    # precision, recall and F1 for the positive class
    precision, recall, F1, _ = precision_recall_fscore_support(
        y_test, y_test_pred, average="binary")

    print("precision: {0:.2f}, recall: {1:.2f}, F1: {2:.2f}"
          .format(precision, recall, F1), '\n')

    clf_rpt = classification_report(y_test, y_test_pred)
    print(clf_rpt, '\n')

    # confusion matrix as a labelled DataFrame
    confusion_table = pd.DataFrame(confusion_matrix(y_test, y_test_pred))
    confusion_table.index.name = 'Label'
    confusion_table.columns.name = 'Prediction'
    confusion_table.rename(columns={1: 'Yes', 0: 'No'},
                           index={1: 'Yes', 0: 'No'},
                           inplace=True)

    # print('oob_score: ', model.oob_score_, '\n')  # for random forest models
    # AUC needs class probabilities, so the model must support predict_proba
    y_train_prob = model.predict_proba(X_train)[:, 1]
    y_test_prob = model.predict_proba(X_test)[:, 1]
    print("AUC Score (Train): %f" % metrics.roc_auc_score(y_train, y_train_prob))
    print("AUC Score (Test): %f" % metrics.roc_auc_score(y_test, y_test_prob))
    return confusion_table

 

svc = SVC(kernel='rbf',
          class_weight='balanced',
          random_state=2018)

 

Tune the parameters C and gamma

 

param_grid = {'C': [0.01, 0.05, 0.1, 1],
              'gamma': [0.001, 0.01, 0.1, 1]}

# 5-fold CV scored by ROC AUC
# (the iid parameter was removed in scikit-learn 0.24)
grid = GridSearchCV(estimator=svc,
                    param_grid=param_grid,
                    scoring='roc_auc',
                    cv=5)

grid.fit(X_train, y_train)

# grid_scores_ was removed in scikit-learn 0.20; cv_results_ holds the per-candidate scores
grid.cv_results_, grid.best_params_, grid.best_score_

 

print("The best parameters are %s with a 'roc_auc' score of %0.6f"
      % (grid.best_params_, grid.best_score_))
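
To see how every (C, gamma) combination scored rather than just the winner, cv_results_ can be loaded into a DataFrame. A small sketch (the column names are standard GridSearchCV output):

# One row per (C, gamma) combination, sorted by mean CV ROC AUC.
results = pd.DataFrame(grid.cv_results_)
print(results[['param_C', 'param_gamma', 'mean_test_score', 'std_test_score']]
      .sort_values('mean_test_score', ascending=False))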

 

Initialise the model with the best parameters

svc2 = SVC(kernel='rbf',
           class_weight='balanced',
           C=0.1,
           gamma=0.01,
           random_state=2018,
           probability=True)  # needed so predict_proba works in do_estimation

 

Use the model to make predictions and evaluate it

Note: the results are quite poor. As an exercise, the point for now is knowing the workflow; the tuning can be refined gradually.

svc2.fit(X_train,y_train)

y_test_pred = svc2.predict(X_test)
y_train_pred = svc2.predict(X_train)

do_estimation(y_train,
              y_train_pred,
              y_test,
              y_test_pred,
              svc2,
              X_train,
              X_test)
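
Beyond the printed AUC values, the ROC curve itself is often worth a look. A minimal sketch, assuming scikit-learn >= 1.0 (which provides RocCurveDisplay):

# Plot the test-set ROC curve for the tuned SVM.
from sklearn.metrics import RocCurveDisplay

RocCurveDisplay.from_estimator(svc2, X_test, y_test)
plt.title('ROC curve of the tuned SVM')
plt.show()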

 

 

Use K-fold cross-validation. Here I only use the KNN model, since I have used it before; the other models are still to be studied.

Use cvs (the cross_val_score alias imported above) to compute the mean CV score for each value of n_neighbors

# odd values of k from 1 to 79
neighbors = [2*i + 1 for i in range(40)]
knn_scores = []
for k in neighbors:
    knn_s = KNN(n_neighbors=k)
    # mean accuracy over 5 folds for this k
    cv_scores = cvs(knn_s, X_train, y_train, cv=5, scoring='accuracy')
    knn_scores.append(cv_scores.mean())
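
The same search can also be written with GridSearchCV, which additionally refits the best k on the full training set. A sketch under the same five-fold setup:

# Equivalent search via GridSearchCV over the same odd k values.
knn_grid = GridSearchCV(KNN(),
                        param_grid={'n_neighbors': neighbors},
                        scoring='accuracy',
                        cv=5)
knn_grid.fit(X_train, y_train)
print(knn_grid.best_params_, knn_grid.best_score_)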

Plot the scores

plt.plot(neighbors, knn_scores)
# mark the best k in red
i = knn_scores.index(max(knn_scores))
plt.scatter(neighbors[i], knn_scores[i], color='red', s=100)
plt.xlabel('Number of Neighbors K')
plt.ylabel('CV Accuracy')  # the plotted scores are accuracies, not misclassification errors
plt.title('Best k = %d, knn_score = %.2f' % (neighbors[i], knn_scores[i]))
k_knn = neighbors[i]

 

Fit and predict with the best k

# distance-weighted KNN with Euclidean distance (p=2), using the best k found above
knn = KNN(n_neighbors=k_knn, weights='distance', p=2)

knn.fit(X_train, y_train)

y_pred_knn = knn.predict(X_test)
y_train_knn = knn.predict(X_train)

Evaluate

do_estimation(y_train, y_train_knn, y_test, y_pred_knn, knn, X_train, X_test)
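
The task actually calls for five models. As a starting point for the remaining ones, here is a rough sketch that cross-validates several untuned classifiers side by side; the model choices are illustrative, not from the original post:

# Quick 5-fold CV baseline for several candidate models with default
# parameters, before grid-searching each of them individually.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

candidates = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'SGDClassifier': SGDClassifier(random_state=2018),
    'DecisionTree': DecisionTreeClassifier(random_state=2018),
    'RandomForest': RandomForestClassifier(random_state=2018),
    'SVM (rbf)': SVC(kernel='rbf', random_state=2018),
}
for name, clf in candidates.items():
    scores = cvs(clf, X_train, y_train, cv=5, scoring='accuracy')
    print('%-20s mean acc = %.4f (+/- %.4f)' % (name, scores.mean(), scores.std()))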

 

 

 

--- End ---
