Week 15 Python Homework: sklearn assignment

Toffc

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/toffc_/article/details/80729122

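This article builds a dataset of 1000 samples and 10 features and uses 10-fold cross-validation to compare three classifiers (Gaussian naive Bayes, support vector machine, and random forest) on accuracy, F1 score, and AUC ROC.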

#Create a classification dataset (n samples >= 1000, n features >= 10) 
from sklearn import datasets
dataset = datasets.make_classification(n_samples=1000, n_features=10,  
            n_informative=2, n_redundant=2, n_repeated=0, n_classes=2)  
  
#Split the dataset using 10-fold cross validation
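# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection.KFold replaces it
from sklearn.model_selection import KFold
kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(dataset[0]):
    X_train, y_train = dataset[0][train_index], dataset[1][train_index]
    X_test, y_test = dataset[0][test_index], dataset[1][test_index]
# The loop overwrites these variables, so only the last fold is used below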

#Train the algorithms & Evaluate the cross-validated performance
from sklearn.naive_bayes import GaussianNB  
from sklearn.svm import SVC  
from sklearn.ensemble import RandomForestClassifier  
from sklearn import metrics  

#GaussianNB
clf = GaussianNB()  
clf.fit(X_train, y_train)  
pred = clf.predict(X_test)  
print("GaussianNB:")  
print("pred: \n", pred)  
print("y_test: \n", y_test)  
#Evaluate the cross-validated performance  
print("Evaluate the cross-validated performance:")  
acc = metrics.accuracy_score(y_test, pred)  
print("Accuracy: ", acc)  
f1 = metrics.f1_score(y_test, pred)  
print("F1-score: ",f1)  
auc = metrics.roc_auc_score(y_test, pred)  
print("AUC ROC: ", auc)  
print()  
  
#SVC  
clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)  
clf.fit(X_train, y_train)  
pred = clf.predict(X_test)  
print("SVC: ")  
print("pred: \n", pred)  
print("y_test: \n", y_test)  
#Evaluate the cross-validated performance  
print("Evaluate the cross-validated performance:")  
acc = metrics.accuracy_score(y_test, pred)  
print("Accuracy: ", acc)  
f1 = metrics.f1_score(y_test, pred)  
print("F1-score: ",f1)  
auc = metrics.roc_auc_score(y_test, pred)  
print("AUC ROC: ", auc)  
print()  
  
#Random Forest  
clf = RandomForestClassifier(n_estimators=6)  
clf.fit(X_train, y_train)  
pred = clf.predict(X_test)  
print("RandomForestClassifier: ")  
print("pred: \n", pred)  
print("y_test: \n", y_test)  
#Evaluate the cross-validated performance  
print("Evaluate the cross-validated performance:")  
acc = metrics.accuracy_score(y_test, pred)  
print("Accuracy: ", acc)  
f1 = metrics.f1_score(y_test, pred)  
print("F1-score: ",f1)  
auc = metrics.roc_auc_score(y_test, pred)  
print("AUC ROC: ", auc)  
print()
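
The calls to metrics.roc_auc_score above receive hard 0/1 predictions, which reduces the ROC curve to a single operating point. As a rough sketch (not the author's code), the AUC can instead be computed from continuous scores. The helper name auc_from_scores, the fixed random_state values, and the single 90/10 split standing in for one fold are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Same dataset shape as in the post; random_state is an added assumption
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)
# One 90/10 split stands in for a single fold (assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)


def auc_from_scores(clf, X_train, y_train, X_test, y_test):
    """Fit clf and return (AUC from hard labels, AUC from continuous scores)."""
    clf.fit(X_train, y_train)
    auc_labels = roc_auc_score(y_test, clf.predict(X_test))
    if hasattr(clf, "decision_function"):
        # e.g. SVC: signed distance to the separating boundary
        scores = clf.decision_function(X_test)
    else:
        # e.g. GaussianNB: estimated probability of the positive class
        scores = clf.predict_proba(X_test)[:, 1]
    return auc_labels, roc_auc_score(y_test, scores)


for name, clf in [("GaussianNB", GaussianNB()),
                  ("SVC", SVC(C=0.1, kernel="rbf", gamma=0.1))]:
    auc_labels, auc_scores = auc_from_scores(clf, X_train, y_train, X_test, y_test)
    print(name, "AUC from labels:", auc_labels, "AUC from scores:", auc_scores)
```

The two values usually differ, because the score-based AUC reflects how well the classifier ranks every test sample rather than only where the default threshold falls.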

Output:

D:\Software\Python\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
GaussianNB:
pred: 
 [0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0
 0 0 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0]
y_test: 
 [0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
 0 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0
 0 1 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0]
Evaluate the cross-validated performance:
Accuracy:  0.97
F1-score:  0.9690721649484536
AUC ROC:  0.9697879151660664

SVC: 
pred: 
 [0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0
 0 0 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 0 0 1 1 1 0 1 0]
y_test: 
 [0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
 0 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0
 0 1 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0]
Evaluate the cross-validated performance:
Accuracy:  0.95
F1-score:  0.9473684210526316
AUC ROC:  0.9493797519007602

RandomForestClassifier: 
pred: 
 [0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
 0 1 0 0 1 0 0 0 1 0 1 1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0
 0 0 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 0 0 1 1 1 0 1 0]
y_test: 
 [0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 1 1
 0 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 0 0 0
 0 1 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0]
Evaluate the cross-validated performance:
Accuracy:  0.95
F1-score:  0.9484536082474228
AUC ROC:  0.9497799119647861

Conclusion:

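In the run shown above, GaussianNB scored highest on all three metrics (Accuracy 0.97, F1-score 0.969, AUC ROC 0.970). SVC and RandomForestClassifier both reached an Accuracy of 0.95, with Random Forest marginally ahead of SVC on F1-score and AUC ROC. These numbers come from a single 100-sample test fold of a randomly generated dataset, so the ranking can change from run to run.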
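
The scores printed above come from one test fold of 100 samples, so a difference of 0.02 in accuracy corresponds to just two samples. A more stable comparison averages each metric over all 10 folds. The following is a minimal sketch of that idea using cross_validate from sklearn.model_selection; the fixed random_state values and the summary dictionary are assumptions added for reproducibility and are not part of the original assignment code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Same dataset shape as in the post; random_state is an added assumption
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_repeated=0, n_classes=2,
                           random_state=0)

# Same hyperparameters as in the post
classifiers = {
    "GaussianNB": GaussianNB(),
    "SVC": SVC(C=1e-01, kernel="rbf", gamma=0.1),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=6, random_state=0),
}

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["accuracy", "f1", "roc_auc"]

summary = {}
for name, clf in classifiers.items():
    # cross_validate refits a clone of clf on each fold and scores the held-out part
    cv = cross_validate(clf, X, y, cv=kf, scoring=scoring)
    summary[name] = {m: (cv["test_" + m].mean(), cv["test_" + m].std())
                     for m in scoring}
    print(name)
    for m, (mean, std) in summary[name].items():
        print("  %s: %.3f +/- %.3f" % (m, mean, std))
```

Reporting the standard deviation alongside the mean shows whether a gap between two classifiers is larger than the fold-to-fold variation.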