Common sklearn Classifiers (Binary Classification)

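This article compares Random Forest, Logistic Regression, Decision Tree, GBDT, AdaBoost, Gaussian Naive Bayes, LDA, QDA, SVM, and Multinomial Naive Bayes, as well as XGBoost and a voting ensemble, on the Pima Indians diabetes dataset. For each algorithm it prints the accuracy and a detailed classification report.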
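For reference, each model in the script below prints an accuracy plus per-class precision, recall and F1. The following is a minimal pure-Python sketch of how those numbers come out of the true and predicted labels. The helper name `binary_report` and the toy label lists are made up for illustration; they are not part of sklearn or of the original script.

```python
def binary_report(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_report(y_true, y_pred)
print(report)
```

`classification_report` prints the same kind of figures for every class, together with their averages.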
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

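# Load the Pima Indians diabetes data: 8 feature columns followed by the 0/1 label.
# Note: read_csv treats the first row as a header; pass header=None if the file has none.
data_set = pd.read_csv('pima-indians-diabetes.csv')
data = data_set.values

y = data[:, 8]   # label column
X = data[:, :8]  # feature columns
# No random_state is set, so the split (and every score below) changes between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y)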

### Random Forest
print("==========================================")   
RF = RandomForestClassifier(n_estimators=10,random_state=11)
RF.fit(X_train,y_train)
predictions = RF.predict(X_test)
print("RF")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))


### Logistic Regression Classifier 
print("==========================================")      
from sklearn.linear_model import LogisticRegression
# The default max_iter=100 may stop before converging on unscaled features.
clf = LogisticRegression(penalty='l2', max_iter=1000)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("LR")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))
 
 
### Decision Tree Classifier    
print("==========================================")   
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("DT")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))

 
### GBDT(Gradient Boosting Decision Tree) Classifier    
print("==========================================")   
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(n_estimators=200)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("GBDT")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))

 
### AdaBoost Classifier
print("==========================================")   
from sklearn.ensemble import  AdaBoostClassifier
clf = AdaBoostClassifier()
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("AdaBoost")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))

 
### GaussianNB
print("==========================================")   
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("GaussianNB")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))

 
### Linear Discriminant Analysis
print("==========================================")   
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
clf = LinearDiscriminantAnalysis()
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Linear Discriminant Analysis")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))

 
### Quadratic Discriminant Analysis
print("==========================================")   
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
clf = QuadraticDiscriminantAnalysis()
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Quadratic Discriminant Analysis")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))


### SVM Classifier 
print("==========================================")   
from sklearn.svm import SVC
clf = SVC(kernel='rbf', probability=True)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("SVM")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))


### Multinomial Naive Bayes Classifier
print("==========================================")       
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB(alpha=0.01)
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("Multinomial Naive Bayes")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))


### XGBoost
import xgboost
print("==========================================")
clf = xgboost.XGBClassifier()
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("xgboost")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))


### Voting classifier
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier, RandomForestClassifier
import xgboost
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
clf1 = GradientBoostingClassifier(n_estimators=200)
clf2 = RandomForestClassifier(random_state=0, n_estimators=500)
# clf3 = LogisticRegression(random_state=1)
# clf4 = GaussianNB()
clf5 = xgboost.XGBClassifier()
# Only 'rf' is enabled below, so this "ensemble" behaves like the random forest alone.
# Uncomment the other entries (and clf3/clf4 above) to actually combine several models.
clf = VotingClassifier(estimators=[
    # ('gbdt',clf1),
    ('rf',clf2),
    # ('lr',clf3),
    # ('nb',clf4),
    # ('xgboost',clf5),
    ],
    voting='soft')  # soft voting averages the predicted class probabilities
clf.fit(X_train,y_train)
predictions = clf.predict(X_test)
print("voting_classify")
print(classification_report(y_test,predictions))
print("AC",accuracy_score(y_test,predictions))
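Every section above repeats the same fit / predict / report steps. As a rough sketch, the same comparison can be written as one loop over a dictionary of models. Because the CSV file is not available here, the example assumes a synthetic stand-in produced by `make_classification` (768 rows, 8 features); the model subset and the fixed `random_state` values are illustrative choices, not the author's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the diabetes data (assumption): 768 rows, 8 features, 2 classes.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

models = {
    "RF": RandomForestClassifier(n_estimators=10, random_state=11),
    "LR": LogisticRegression(max_iter=1000),
    "GBDT": GradientBoostingClassifier(n_estimators=200, random_state=0),
    "GaussianNB": GaussianNB(),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} accuracy = {acc:.3f}")
```

Fixing `random_state` in the split makes the ranking reproducible; a single split is still noisy, so small differences between models should not be over-interpreted.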
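scikit-learn (sklearn) is a popular Python machine learning library that includes many classifier models. Some common ones and how to use them:

1. Decision Tree: a tree-structured model that can be used for classification and regression. In scikit-learn, create a decision tree classifier with the `DecisionTreeClassifier` class, fit it with the `fit` method, then predict with the `predict` method.
2. Support Vector Machine (SVM): a powerful model that can perform nonlinear classification by mapping data points into a higher-dimensional space through a kernel. In scikit-learn, use the `SVC` class for classification or the `SVR` class for regression.
3. Logistic Regression: a linear model for binary classification that outputs the probability that a data point belongs to a class. In scikit-learn, use the `LogisticRegression` class.
4. Random Forest: an ensemble method that builds many decision trees and combines their outputs, by voting for classification or averaging for regression, to make the final prediction. In scikit-learn, use the `RandomForestClassifier` class.
5. K-Nearest Neighbors (KNN): a neighbor-based classifier that labels a point according to the classes of the k training points closest to it. In scikit-learn, use the `KNeighborsClassifier` class.

These are only a few of the classifiers scikit-learn offers. Choose one that fits the problem at hand, then train and predict with the same `fit` / `predict` API.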
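KNN is mentioned above but not exercised in the script, and distance-based models are sensitive to feature scale. A minimal sketch, assuming synthetic data from `make_classification` and an arbitrary grid of `n_neighbors` values, that standardizes the features in a `Pipeline` and picks k with `GridSearchCV`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (assumption; not the diabetes CSV).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Scaling lives inside the pipeline so it is re-fit on each cross-validation fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Step parameters are addressed as "<step name>__<parameter>".
search = GridSearchCV(pipe, param_grid={"knn__n_neighbors": [3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print("best k:", best_k)
print("test accuracy:", test_accuracy)
```

The same pipeline pattern applies to the SVM and logistic regression models, which also tend to benefit from standardized inputs.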