Python-ML (Week 16 Assignment)

This post experimentally compares three common machine learning algorithms on a classification task: Gaussian naive Bayes, a support vector machine, and random forests with different parameter configurations. Each algorithm is evaluated with 10-fold cross-validation on accuracy, F1 score, and AUC-ROC.


Problem

Assignment
Steps
1. Create a classification dataset (n_samples ≥ 1000, n_features ≥ 10)
2. Split the dataset using 10-fold cross validation
3. Train the algorithms
   • GaussianNB
   • SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
   • RandomForestClassifier (possible n_estimators values [10, 100, 1000])
4. Evaluate the cross-validated performance
   • Accuracy
   • F1-score
   • AUC ROC
5. Write a short report summarizing the methodology and the results

Code

from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Build a synthetic binary classification dataset.
X, y = make_classification(n_samples=1000, n_features=10)

# 10-fold cross validation; shuffle so folds do not depend on sample order.
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

    # Gaussian naive Bayes baseline.
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print("Naive Bayes")
    print("acc:", metrics.accuracy_score(y_test, pred),
          "f1:", metrics.f1_score(y_test, pred),
          "auc", metrics.roc_auc_score(y_test, pred))

    # RBF-kernel SVM over the candidate C values.
    for C_data in [1e-02, 1e-01, 1e00, 1e01, 1e02]:
        clf = SVC(C=C_data, kernel='rbf', gamma=0.1)
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        print("SVM C_data", C_data)
        print("acc:", metrics.accuracy_score(y_test, pred),
              "f1:", metrics.f1_score(y_test, pred),
              "auc", metrics.roc_auc_score(y_test, pred))

    # Random forest over the candidate ensemble sizes.
    for n_data in [10, 100, 1000]:
        clf = RandomForestClassifier(n_estimators=n_data)
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        print("Random Forest n_data", n_data)
        print("acc:", metrics.accuracy_score(y_test, pred),
              "f1:", metrics.f1_score(y_test, pred),
              "auc", metrics.roc_auc_score(y_test, pred))
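Per-fold printouts like the ones above are hard to compare at a glance. As an alternative sketch (not part of the assignment code), scikit-learn's `cross_validate` can run the same 10-fold loop and average the three metrics in one call; shown here for GaussianNB only, with a fixed `random_state` so the dataset is reproducible:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# One call performs the 10-fold split, training, and scoring.
scores = cross_validate(GaussianNB(), X, y, cv=10,
                        scoring=["accuracy", "f1", "roc_auc"])
for metric in ["accuracy", "f1", "roc_auc"]:
    print(metric, scores["test_%s" % metric].mean())
```

The same call with `SVC(...)` or `RandomForestClassifier(...)` as the estimator gives directly comparable averaged scores for the other models.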

Results

Naive Bayes
acc: 0.93 f1: 0.9306930693069307 auc 0.930172068827531
SVM C_data 0.01
acc: 0.91 f1: 0.9032258064516129 auc 0.9117647058823529
SVM C_data 0.1
acc: 0.96 f1: 0.9591836734693878 auc 0.9607843137254901
SVM C_data 1.0
acc: 0.95 f1: 0.9494949494949494 auc 0.950580232092837
SVM C_data 10.0
acc: 0.94 f1: 0.9411764705882353 auc 0.9399759903961585
SVM C_data 100.0
acc: 0.92 f1: 0.9215686274509803 auc 0.9199679871948779
Random Forest n_data 10
acc: 0.97 f1: 0.9696969696969697 auc 0.9705882352941176
Random Forest n_data 100
acc: 0.98 f1: 0.98 auc 0.9803921568627452
Random Forest n_data 1000
acc: 0.97 f1: 0.9696969696969697 auc 0.9705882352941176
Naive Bayes
acc: 0.96 f1: 0.96 auc 0.96
SVM C_data 0.01
acc: 0.93 f1: 0.9263157894736842 auc 0.9299999999999999
SVM C_data 0.1
acc: 0.95 f1: 0.9484536082474226 auc 0.95
SVM C_data 1.0
acc: 0.94 f1: 0.9375 auc 0.94
SVM C_data 10.0
acc: 0.9 f1: 0.8936170212765958 auc 0.8999999999999999
SVM C_data 100.0
acc: 0.88 f1: 0.8749999999999999 auc 0.8799999999999999
Random Forest n_data 10
acc: 0.97 f1: 0.9702970297029702 auc 0.9699999999999999
Random Forest n_data 100
acc: 0.96 f1: 0.96 auc 0.96
Random Forest n_data 1000
acc: 0.96 f1: 0.96 auc 0.96
Naive Bayes
acc: 0.99 f1: 0.9894736842105264 auc 0.9895833333333333
SVM C_data 0.01
acc: 0.89 f1: 0.8910891089108911 auc 0.891826923076923
SVM C_data 0.1
acc: 0.97 f1: 0.967741935483871 auc 0.96875
SVM C_data 1.0
acc: 0.94 f1: 0.9347826086956522 auc 0.938301282051282
SVM C_data 10.0
acc: 0.94 f1: 0.9361702127659574 auc 0.9391025641025641
SVM C_data 100.0
acc: 0.93 f1: 0.924731182795699 auc 0.9286858974358976
Random Forest n_data 10
acc: 0.98 f1: 0.9787234042553191 auc 0.9791666666666667
Random Forest n_data 100
acc: 0.99 f1: 0.9894736842105264 auc 0.9895833333333333
Random Forest n_data 1000
acc: 0.98 f1: 0.9787234042553191 auc 0.9791666666666667
Naive Bayes
acc: 0.97 f1: 0.9719626168224299 auc 0.9727272727272727
SVM C_data 0.01
acc: 0.88 f1: 0.8775510204081634 auc 0.8909090909090909
SVM C_data 0.1
acc: 0.95 f1: 0.9532710280373831 auc 0.9525252525252524
SVM C_data 1.0
acc: 0.96 f1: 0.9629629629629629 auc 0.9616161616161615
SVM C_data 10.0
acc: 0.96 f1: 0.9629629629629629 auc 0.9616161616161615
SVM C_data 100.0
acc: 0.91 f1: 0.9142857142857144 auc 0.9141414141414141
Random Forest n_data 10
acc: 0.97 f1: 0.9719626168224299 auc 0.9727272727272727
Random Forest n_data 100
acc: 0.97 f1: 0.9719626168224299 auc 0.9727272727272727
Random Forest n_data 1000
acc: 0.97 f1: 0.9719626168224299 auc 0.9727272727272727
Naive Bayes
acc: 0.97 f1: 0.967741935483871 auc 0.96875
SVM C_data 0.01
acc: 0.89 f1: 0.8910891089108911 auc 0.891826923076923
SVM C_data 0.1
acc: 0.95 f1: 0.945054945054945 auc 0.9479166666666667
SVM C_data 1.0
acc: 0.95 f1: 0.946236559139785 auc 0.9487179487179486
SVM C_data 10.0
acc: 0.95 f1: 0.946236559139785 auc 0.9487179487179486
SVM C_data 100.0
acc: 0.93 f1: 0.924731182795699 auc 0.9286858974358976
Random Forest n_data 10
acc: 0.97 f1: 0.967741935483871 auc 0.96875
Random Forest n_data 100
acc: 0.97 f1: 0.967741935483871 auc 0.96875
Random Forest n_data 1000
acc: 0.97 f1: 0.967741935483871 auc 0.96875
Naive Bayes
acc: 0.99 f1: 0.9887640449438202 auc 0.9888888888888889
SVM C_data 0.01
acc: 0.88 f1: 0.8800000000000001 auc 0.888888888888889
SVM C_data 0.1
acc: 0.95 f1: 0.9411764705882353 auc 0.9444444444444444
SVM C_data 1.0
acc: 0.95 f1: 0.9425287356321839 auc 0.9464646464646465
SVM C_data 10.0
acc: 0.97 f1: 0.967032967032967 auc 0.9707070707070707
SVM C_data 100.0
acc: 0.87 f1: 0.853932584269663 auc 0.8676767676767676
Random Forest n_data 10
acc: 0.99 f1: 0.9887640449438202 auc 0.9888888888888889
Random Forest n_data 100
acc: 0.97 f1: 0.9655172413793104 auc 0.9666666666666667
Random Forest n_data 1000
acc: 0.99 f1: 0.9887640449438202 auc 0.9888888888888889
Naive Bayes
acc: 0.97 f1: 0.968421052631579 auc 0.9695512820512822
SVM C_data 0.01
acc: 0.92 f1: 0.9183673469387755 auc 0.920673076923077
SVM C_data 0.1
acc: 0.92 f1: 0.9111111111111111 auc 0.9174679487179487
SVM C_data 1.0
acc: 0.96 f1: 0.9574468085106383 auc 0.9591346153846154
SVM C_data 10.0
acc: 0.95 f1: 0.9484536082474228 auc 0.9503205128205129
SVM C_data 100.0
acc: 0.86 f1: 0.851063829787234 auc 0.858974358974359
Random Forest n_data 10
acc: 0.97 f1: 0.968421052631579 auc 0.9695512820512822
Random Forest n_data 100
acc: 0.97 f1: 0.968421052631579 auc 0.9695512820512822
Random Forest n_data 1000
acc: 0.97 f1: 0.968421052631579 auc 0.9695512820512822
Naive Bayes
acc: 0.97 f1: 0.970873786407767 auc 0.9703525641025642
SVM C_data 0.01
acc: 0.88 f1: 0.8695652173913044 auc 0.8846153846153846
SVM C_data 0.1
acc: 0.96 f1: 0.9607843137254902 auc 0.9607371794871795
SVM C_data 1.0
acc: 0.97 f1: 0.970873786407767 auc 0.9703525641025642
SVM C_data 10.0
acc: 0.95 f1: 0.9523809523809524 auc 0.9495192307692308
SVM C_data 100.0
acc: 0.9 f1: 0.9056603773584906 auc 0.8990384615384616
Random Forest n_data 10
acc: 0.97 f1: 0.970873786407767 auc 0.9703525641025642
Random Forest n_data 100
acc: 0.97 f1: 0.970873786407767 auc 0.9703525641025642
Random Forest n_data 1000
acc: 0.97 f1: 0.970873786407767 auc 0.9703525641025642
Naive Bayes
acc: 0.96 f1: 0.96 auc 0.96
SVM C_data 0.01
acc: 0.94 f1: 0.9375 auc 0.94
SVM C_data 0.1
acc: 0.94 f1: 0.9375 auc 0.94
SVM C_data 1.0
acc: 0.95 f1: 0.9484536082474226 auc 0.95
SVM C_data 10.0
acc: 0.95 f1: 0.9494949494949495 auc 0.95
SVM C_data 100.0
acc: 0.89 f1: 0.8952380952380952 auc 0.89
Random Forest n_data 10
acc: 0.94 f1: 0.94 auc 0.94
Random Forest n_data 100
acc: 0.96 f1: 0.96 auc 0.96
Random Forest n_data 1000
acc: 0.95 f1: 0.9504950495049505 auc 0.95
Naive Bayes
acc: 0.97 f1: 0.9714285714285713 auc 0.9704937775993576
SVM C_data 0.01
acc: 0.84 f1: 0.8222222222222222 auc 0.8490566037735849
SVM C_data 0.1
acc: 0.94 f1: 0.9400000000000001 auc 0.9433962264150944
SVM C_data 1.0
acc: 0.92 f1: 0.9199999999999999 auc 0.9233239662786029
SVM C_data 10.0
acc: 0.9 f1: 0.9019607843137256 auc 0.9020473705339221
SVM C_data 100.0
acc: 0.91 f1: 0.9090909090909092 auc 0.9138900040144521
Random Forest n_data 10
acc: 0.97 f1: 0.9714285714285713 auc 0.9704937775993576
Random Forest n_data 100
acc: 0.96 f1: 0.9615384615384616 auc 0.9610598153352067
Random Forest n_data 1000
acc: 0.97 f1: 0.9714285714285713 auc 0.9704937775993576


Analysis

  • The random forest performs very well: its accuracy is generally higher than both SVC and GaussianNB.
  • The SVC penalty parameter C matters: when C is very small (strong regularization) the model tends to underfit, and when C is very large (weak regularization) it tends to overfit; both extremes lower the cross-validated accuracy.
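The effect of C can be illustrated by comparing training and held-out accuracy at the extremes. A minimal sketch (a single train/test split rather than cross-validation, with an assumed `random_state` for reproducibility; exact numbers depend on the generated dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for C in [1e-2, 1e0, 1e2]:
    # Small C: strong regularization, risk of underfitting;
    # large C: weak regularization, risk of overfitting.
    clf = SVC(C=C, kernel="rbf").fit(X_tr, y_tr)
    results[C] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print("C=%g  train=%.3f  test=%.3f" % (C, results[C][0], results[C][1]))
```

A widening gap between training and test accuracy as C grows is the usual signature of overfitting, while low scores on both sides at small C indicate underfitting.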