Stacking is an ensemble learning technique that combines multiple classification models via a meta-classifier. The individual classification models are trained on the complete training set; the meta-classifier is then fitted on the outputs of the individual models in the ensemble, i.e. the meta-features. The meta-classifier can be trained either on the predicted class labels or on the class probabilities produced by the ensemble.
Reference: https://blog.youkuaiyun.com/github_35965351/article/details/60763606
Reference: http://rasbt.github.io/mlxtend/
Alternatively, the class probabilities of the first-level classifiers can be used to train the meta-classifier (second-level classifier) by setting use_probas=True. If average_probas=True, the probabilities of the level-1 classifiers are averaged; if average_probas=False, the probabilities are stacked (recommended). For example, in a 3-class setting with 2 level-1 classifiers, the classifiers might make the following "probability" predictions for 1 training sample:
Classifier 1: [0.2, 0.5, 0.3]
Classifier 2: [0.3, 0.4, 0.4]
If average_probas=True, the meta-features would be:
[0.25, 0.45, 0.35]
In contrast, with average_probas=False, the result is k features, where k = [n_classes * n_classifiers], obtained by stacking these level-1 probabilities:
[0.2, 0.5, 0.3, 0.3, 0.4, 0.4]
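The two ways of building the meta-features can be sketched with plain NumPy (the probability values are the hypothetical ones from the example above, not output of a real classifier):

```python
import numpy as np

# Hypothetical level-1 probability predictions for one training sample
# in a 3-class, 2-classifier setting, as in the example above.
p1 = np.array([0.2, 0.5, 0.3])  # classifier 1
p2 = np.array([0.3, 0.4, 0.4])  # classifier 2

# average_probas=True: element-wise mean -> n_classes meta-features
avg = (p1 + p2) / 2
print(avg)      # [0.25 0.45 0.35]

# average_probas=False: concatenation -> n_classes * n_classifiers features
stacked = np.concatenate([p1, p2])
print(stacked)  # [0.2 0.5 0.3 0.3 0.4 0.4]
```

Stacking (average_probas=False) keeps each classifier's opinion as a separate feature, so the meta-classifier can learn to weight the level-1 models differently; averaging collapses that information, which is why stacking is recommended.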
from sklearn import datasets, model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import StackingClassifier
import numpy as np

# Example data (Iris); the original snippet left X and y undefined
X, y = datasets.load_iris(return_X_y=True)

clf1 = KNeighborsClassifier(n_neighbors=3)
clf2 = RandomForestClassifier(random_state=100, n_estimators=100)
clf3 = GaussianNB()
lr = LogisticRegression(C=1.0, solver='liblinear')

# Meta-classifier trained on the predicted class labels of the level-1 models
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3],
                          meta_classifier=lr)

print('10-fold cross validation:\n')
for clf, label in zip([clf1, clf2, clf3, sclf],
                      ['KNN',
                       'Random Forest',
                       'Naive Bayes',
                       'StackingClassifier']):
    scores = model_selection.cross_val_score(clf, X, y,
                                             cv=10, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))
clf1 = KNeighborsClassifier(n_neighbors=3)
clf2 = RandomForestClassifier(random_state=100, n_estimators=100)
clf3 = GaussianNB()
lr = LogisticRegression(C=1.0, solver='liblinear')

# Meta-classifier trained on the stacked class probabilities of the
# level-1 models (use_probas=True, average_probas=False)
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3],
                          use_probas=True,
                          average_probas=False,
                          meta_classifier=lr)

print('10-fold cross validation:\n')
for clf, label in zip([clf1, clf2, clf3, sclf],
                      ['KNN',
                       'Random Forest',
                       'Naive Bayes',
                       'StackingClassifier']):
    scores = model_selection.cross_val_score(clf, X, y,
                                             cv=10, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))