I. ensemble
1. Overview:
This module implements "ensemble learning" methods.
2. Classification:
"自适应提升分类器"(AdaBoost classifier):class sklearn.ensemble.AdaBoostClassifier([base_estimator=None,n_estimators=50,learning_rate=1.0,algorithm='SAMME.R',random_state=None])
#参数说明:
base_estimator:指定基本估计器;为object
n_estimators:指定提升中使用的最大估计器数;为int
learning_rate:指定学习率;为float
#在n_estimators和learning_rate间存在1个权衡
algorithm:指定使用的提升算法;为"SAMME"/"SAMME.R"
random_state:指定使用的随机数;为int/RandomState instance/None
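#Example: a minimal usage sketch, not from the original notes (the toy dataset and parameter values are illustrative assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy data; in practice X, y come from your own dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
# A lower learning_rate usually needs more estimators (the trade-off noted above)
clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))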
######################################################################################################################
"装袋分类器"(Bagging classifier)/"自举汇聚分类器"(Bootstrap aggregating classifier):class sklearn.ensemble.BaggingClassifier([base_estimator=None,n_estimators=10,max_samples=1.0,max_features=1.0,bootstrap=True,bootstrap_features=False,oob_score=False,warm_start=False,n_jobs=None,random_state=None,verbose=0])
#参数说明:其他参数同class sklearn.ensemble.AdaBoostClassifier()
max_samples,max_features:分别指定用于训练每个基本估计器的样本/特征数;均为int/float
bootstrap,bootstrap_features:分别指定是否通过替换来提取样本/特征;均为bool
oob_score:指定是否使用"包外样本"(out-of-bag samples)来估计泛化误差;为bool
warm_start:指定是否启用热启动;为bool
n_jobs:指定用于并行计算的任务数;为int
verbose:指定输出信息的冗余度;为int/bool
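#Example: a minimal sketch (toy data and values are assumptions; base_estimator follows the signature listed above, which newer scikit-learn releases rename to estimator):
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Each KNN is fitted on a random 50% of the samples (drawn with replacement)
# and a random 50% of the features; out-of-bag samples estimate generalization
clf = BaggingClassifier(base_estimator=KNeighborsClassifier(),
                        n_estimators=20, max_samples=0.5, max_features=0.5,
                        oob_score=True, random_state=0)
clf.fit(X, y)
print(clf.oob_score_)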
######################################################################################################################
"极端随机树分类器"(Extremely Randomized Trees classifier/ExtRa-trees classifier):class sklearn.ensemble.ExtraTreesClassifier([n_estimators=100,criterion='gini',max_depth=None,min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_features='auto',max_leaf_nodes=None,min_impurity_decrease=0.0,min_impurity_split=None,bootstrap=False,oob_score=False,n_jobs=None,random_state=None,verbose=0,warm_start=False,class_weight=None,ccp_alpha=0.0,max_samples=None])
#参数说明:其他参数同class sklearn.ensemble.BaggingClassifier()
criterion:指定用于衡量拆分质量的标准;为"gini"/"entropy"
max_depth:指定树的最大深度;为int
min_samples_split:指定拆分内部节点所需的最小样本数;为int
#若属于某内部节点的样本少于该值,则停止拆分
min_samples_leaf:指定叶节点中的最小样本数;为int
#若属于某叶节点的样本少于该值,则不进行该拆分
min_weight_fraction_leaf:The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node;为float
max_features:指定使用的最大特征数;为"auto"/"sqrt"/"log2"/int/float
max_leaf_nodes:指定最大叶节点数;为int
min_impurity_decrease:指定继续拆分所需的最小改进;为float
#若改进小于该值,则停止拆分
min_impurity_split:指定继续拆分所需的最小损失;为float
#若当前节点的损失小于该值,则停止拆分
class_weight:指定各个特征的权重;为dict/dict list/"balanced"/"balanced_subsample"
ccp_alpha:指定用于"最小成本-复杂性剪枝"(Minimal Cost-Complexity Pruning)的"复杂性参数"(Complexity parameter);为float>=0
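#Example: a minimal sketch (toy data and parameter values are illustrative assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_informative=5, weights=[0.8, 0.2], random_state=0)
# Depth/leaf constraints regularize the trees; class_weight="balanced" offsets the 80/20 class imbalance
clf = ExtraTreesClassifier(n_estimators=200, max_depth=8, min_samples_leaf=2,
                           class_weight="balanced", n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)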
######################################################################################################################
"用于分类的梯度提升"(Gradient Boosting for classification):class sklearn.ensemble.GradientBoostingClassifier([loss='deviance',learning_rate=0.1,n_estimators=100,subsample=1.0,criterion='friedman_mse',min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_depth=3,min_impurity_decrease=0.0,min_impurity_split=None,init=None,random_state=None,max_features=None,verbose=0,max_leaf_nodes=None,warm_start=False,validation_fraction=0.1,n_iter_no_change=None,tol=0.0001,ccp_alpha=0.0])
#参数说明:其他参数同class sklearn.ensemble.ExtraTreesClassifier()
loss:指定损失函数;为"deviance"/"exponential"
learning_rate:指定学习率;为float
subsample:指定用于拟合每个基本估计器的样本比例;为float
criterion:指定用于衡量拆分质量的标准;为"friedman_mse"/"mse"/"mae"
init:指定初始预测;为estimator/"zero"
validation_fraction:指定作为提前终止时的验证集的比例;为0≤float≤1
n_iter_no_change:指定改进小于tol时的最大迭代次数;为int
tol:指定最小改进;为float
#若2次迭代间的损失改进小于该值,则停止
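#Example: a minimal sketch of early stopping (toy data and parameter values are assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
# 20% of the training data is held out; boosting stops once the held-out loss fails to
# improve by at least tol for 5 consecutive iterations
clf = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1, subsample=0.8,
                                 validation_fraction=0.2, n_iter_no_change=5, tol=1e-4,
                                 random_state=0)
clf.fit(X, y)
print(clf.n_estimators_)  # boosting stages actually fitted (<= 500 due to early stopping)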
######################################################################################################################
"随机森林分类器"(random forest classifier):class sklearn.ensemble.RandomForestClassifier([n_estimators=100,criterion='gini',max_depth=None,min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_features='auto',max_leaf_nodes=None,min_impurity_decrease=0.0,min_impurity_split=None,bootstrap=True,oob_score=False,n_jobs=None,random_state=None,verbose=0,warm_start=False,class_weight=None,ccp_alpha=0.0,max_samples=None])
#参数说明:同class sklearn.ensemble.ExtraTreesClassifier()
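#Example: a minimal sketch (toy data and parameter values are assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
# With bootstrap=True (the default), oob_score=True gives a built-in validation estimate
clf = RandomForestClassifier(n_estimators=200, oob_score=True, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.oob_score_)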
######################################################################################################################
"带有最终分类器的估计器堆栈"(Stack of estimators with a final classifier):class sklearn.ensemble.StackingClassifier(<estimators>[,final_estimator=None,cv=None,stack_method='auto',n_jobs=None,passthrough=False,verbose=0])
#参数说明:其他参数同class sklearn.ensemble.BaggingClassifier()
estimators:指定基本估计器;为list,格式为(str,estimator)
final_estimator:指定用于结合基本估计器的分类器;为estimator
cv:指定交叉验证的拆分策略;为int/cross-validation generator/iterable/None
stack_method:指定为每个基本估计器调用的方法;为"auto"/"predict_proba"/"decision_function"/"predict"
passthrough:为False时,只使用估计器得到的预测值作为final_estimator的训练数据
为True时,也使用原始训练数据作为final_estimator的训练数据
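#Example: a minimal sketch (the choice of base/final estimators is an illustrative assumption):
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)
estimators = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
              ("svc", LinearSVC(random_state=0))]
# Cross-validated predictions of the base estimators (stack_method="auto") become the
# training data of the final LogisticRegression
clf = StackingClassifier(estimators=estimators,
                         final_estimator=LogisticRegression(), cv=5)
clf.fit(X, y)
print(clf.predict(X[:5]))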
######################################################################################################################
用于"unfitted estimators"的"软投票"(Soft Voting)/"多数规则分类器"(Majority Rule classifier):class sklearn.ensemble.VotingClassifier(<estimators>[,voting='hard',weights=None,n_jobs=None,flatten_transform=True,verbose=False])
#参数说明:其他参数同class sklearn.ensemble.StackingClassifier()
voting:指定投票策略;为"hard"/"soft"
#If "hard",uses predicted class labels for majority rule voting
#If "soft",predicts the class label based on the argmax of the sums of the predicted probabilities,which is recommended for an ensemble of well-calibrated classifiers
weights:指定各个类别的权重;为1×n_classifiers array-like
flatten_transform:指定是否压扁输出;为True(n_samples×n_classifiers*n_classes)/False(n_classifiers×n_samples×n_classes)
#仅当voting="soft"时生效
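#Example: a minimal sketch (the estimator mix and weights are illustrative assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)
# voting="soft" averages predicted probabilities; the weights count the forest twice
clf = VotingClassifier(estimators=[("lr", LogisticRegression(max_iter=1000)),
                                   ("rf", RandomForestClassifier(random_state=0)),
                                   ("nb", GaussianNB())],
                       voting="soft", weights=[1, 2, 1])
clf.fit(X, y)
print(clf.predict(X[:5]))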
######################################################################################################################
"基于直方图的梯度提升分类树"(Histogram-based Gradient Boosting Classification Tree):class sklearn.ensemble.HistGradientBoostingClassifier([loss='auto',learning_rate=0.1,max_iter=100,max_leaf_nodes=31,max_depth=None,min_samples_leaf=20,l2_regularization=0.0,max_bins=255,categorical_features=None,monotonic_cst=None,warm_start=False,early_stopping='auto',scoring='loss',validation_fraction=0.1,n_iter_no_change=10,tol=1e-07,verbose=0,random_state=None])
#参数说明:其他参数同class sklearn.ensemble.GradientBoostingClassifier()
loss:指定损失函数;为"auto"/"binary_crossentropy"/"categorical_crossentropy"
max_iter:指定提升过程的最大迭代次数;为int
l2_regularization:指定L2正则化惩罚项的系数;为float
max_bins:指定用于非缺失值的最大bin数;为int
categorical_features:指定所有类别型的特征;为int/bool 1×n_features/1×n_categorical_features array-like
monotonic_cst:指定对各个特征的"单调约束"(monotonic constraint);为1×n_features array-like,仅包含0(无约束)/1(正约束)/-1(负约束)
early_stopping:指定是否启用"早停法"(early stopping);为"auto"/bool
scoring:指定用于早停法的评分策略;为str/callable/None
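#Example: a minimal sketch (synthetic data and the constraint values are assumptions; on older scikit-learn releases this estimator is experimental and needs "from sklearn.experimental import enable_hist_gradient_boosting" first):
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
X[::10, 0] = np.nan                       # missing values are handled natively
y = (X[:, 1] + X[:, 2] > 0).astype(int)
# monotonic_cst: prediction forced to be non-decreasing in feature 1, unconstrained elsewhere
clf = HistGradientBoostingClassifier(max_iter=200, l2_regularization=1.0,
                                     monotonic_cst=[0, 1, 0, 0],
                                     early_stopping=True)
clf.fit(X, y)
print(clf.n_iter_)                        # iterations actually run (early stopping may cut it short)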
3. Regression:
AdaBoost regressor: class sklearn.ensemble.AdaBoostRegressor([base_estimator=None,n_estimators=50,learning_rate=1.0,loss='linear',random_state=None])
#Parameters: the other parameters are the same as in class sklearn.ensemble.AdaBoostClassifier()
loss: the loss function used when updating the sample weights; "linear"/"square"/"exponential"
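#Example: a minimal sketch (toy data and parameter values are assumptions):
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
# loss controls how the prediction errors are weighted when the sample weights are updated
reg = AdaBoostRegressor(n_estimators=100, learning_rate=0.5, loss="square", random_state=0)
reg.fit(X, y)
print(reg.predict(X[:3]))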
######################################################################################################################
"装袋回归器"(Bagging regressor)/"自举汇聚回归器"(Bootstrap aggregating regressor):class sklearn.ensemble.BaggingRegressor([base_estimator=None,n_estimators=10,max_samples=1.0,max_features=1.0,bootstrap=True,bootstrap_features=False,oob_score=False,warm_start=False,n_jobs=None,random_state=None,verbose=0])
#参数说明:同class sklearn.ensemble.BaggingClassifier()
######################################################################################################################
"极端随机树回归器"(Extremely Randomized Trees regressor/ExtRa-trees regressor):class sklearn.ensemble.ExtraTreesRegressor([n_estimators=100,criterion='mse',max_depth=None,min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_features='auto',max_leaf_nodes=None,min_impurity_decrease=0.0,min_impurity_split=None,bootstrap=False,oob_score=False,n_jobs=None,random_state=None,verbose=0,warm_start=False,ccp_alpha=0.0,max_samples=None])
#参数说明:其他参数同class sklearn.ensemble.ExtraTreesClassifier()
criterion:指定用于衡量拆分质量的标准;为"mse"/"mae"
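#Example: a minimal sketch (toy data and values are assumptions; the criterion names follow the signature above, which newer scikit-learn releases rename to "squared_error"/"absolute_error"):
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
# "mae" is slower than the default "mse" but less sensitive to outliers
reg = ExtraTreesRegressor(n_estimators=100, criterion="mae", n_jobs=-1, random_state=0)
reg.fit(X, y)
print(reg.score(X, y))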
######################################################################################################################
"用于回归的梯度提升"(Gradient Boosting for regression):class sklearn.ensemble.GradientBoostingRegressor([loss='ls',learning_rate=0.1,n_estimators=100,subsample=1.0,criterion='friedman_mse',min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_depth=3,min_impurity_decrease=0.0,min_impurity_split=None,init=None,random_state=None,max_features=None,alpha=0.9,verbose=0,max_leaf_nodes=None,warm_start=False,validation_fraction=0.1,n_iter_no_change=None,tol=0.0001,ccp_alpha=0.0])
#参数说明:其他参数同class sklearn.ensemble.GradientBoostingClassifier()
loss:指定损失函数;为"ls"/"lad"/"huber"/"quantile"
alpha:指定huber/quantile loss function的分位数;为float
#仅当loss="huber"/"quantile"时有效
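#Example: a minimal sketch of quantile regression (toy data and values are assumptions):
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
# loss="quantile" with alpha=0.9 fits an estimate of the 90th percentile of y given X
reg = GradientBoostingRegressor(loss="quantile", alpha=0.9, n_estimators=200, random_state=0)
reg.fit(X, y)
print(reg.predict(X[:3]))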
######################################################################################################################
"随机森林回归器"(random forest regressor):class sklearn.ensemble.RandomForestRegressor([n_estimators=100,criterion='mse',max_depth=None,min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_features='auto',max_leaf_nodes=None,min_impurity_decrease=0.0,min_impurity_split=None,bootstrap=True,oob_score=False,n_jobs=None,random_state=None,verbose=0,warm_start=False,ccp_alpha=0.0,max_samples=None])
#参数说明:同class sklearn.ensemble.ExtraTreesRegressor()
######################################################################################################################
"带有最终回归器的估计器堆栈"(Stack of estimators with a final regressor):class sklearn.ensemble.StackingRegressor(<estimators>[,final_estimator=None,cv=None,n_jobs=None,passthrough=False,verbose=0])
#参数说明:其他参数同class sklearn.ensemble.StackingClassifier()
final_estimator:指定用于结合基本估计器的回归器;为estimator
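#Example: a minimal sketch (the choice of base/final regressors is an illustrative assumption):
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge, LinearRegression

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
estimators = [("ridge", Ridge()),
              ("rf", RandomForestRegressor(n_estimators=50, random_state=0))]
# Cross-validated predictions of the base regressors train the final LinearRegression
reg = StackingRegressor(estimators=estimators, final_estimator=LinearRegression(), cv=5)
reg.fit(X, y)
print(reg.score(X, y))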
######################################################################################################################
用于"unfitted estimators"的"预测投票回归器"(Prediction voting regressor):class sklearn.ensemble.VotingRegressor(<estimators>[,weights=None,n_jobs=None,verbose=False])
#参数说明:其他参数同class sklearn.ensemble.VotingClassifier()
weights:指定各预测值的权重;为1×n_regressors array-like
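#Example: a minimal sketch (the regressor mix and weights are illustrative assumptions):
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
# The final prediction is the weighted average of the individual regressors' predictions
reg = VotingRegressor(estimators=[("gb", GradientBoostingRegressor(random_state=0)),
                                  ("rf", RandomForestRegressor(random_state=0)),
                                  ("lr", LinearRegression())],
                      weights=[2, 2, 1])
reg.fit(X, y)
print(reg.predict(X[:3]))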
######################################################################################################################
"基于直方图的梯度提升回归树"(Histogram-based Gradient Boosting Regression Tree):class sklearn.ensemble.HistGradientBoostingRegressor([loss='least_squares',learning_rate=0.1,max_iter=100,max_leaf_nodes=31,max_depth=None,min_samples_leaf=20,l2_regularization=0.0,max_bins=255,categorical_features=None,monotonic_cst=None,warm_start=False,early_stopping='auto',scoring='loss',validation_fraction=0.1,n_iter_no_change=10,tol=1e-07,verbose=0,random_state=None])
#参数说明:其他参数同class sklearn.ensemble.HistGradientBoostingClassifier()
loss:指定损失函数;为"least_squares"/"least_absolute_deviation"/"poisson"
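#Example: a minimal sketch (toy data and values are assumptions; older scikit-learn releases need "from sklearn.experimental import enable_hist_gradient_boosting" first):
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
# early_stopping=True holds out validation_fraction of the data and stops once the score
# stops improving by more than tol for n_iter_no_change consecutive iterations
reg = HistGradientBoostingRegressor(max_iter=300, learning_rate=0.05,
                                    l2_regularization=0.5, early_stopping=True,
                                    random_state=0)
reg.fit(X, y)
print(reg.n_iter_)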
4. Other
(1) Anomaly detection:
Isolation Forest algorithm: class sklearn.ensemble.IsolationForest([n_estimators=100,max_samples='auto',contamination='auto',max_features=1.0,bootstrap=False,n_jobs=None,random_state=None,verbose=0,warm_start=False])
#Parameters: the other parameters are the same as in class sklearn.ensemble.BaggingClassifier()
contamination: the expected proportion of outliers in the data set; "auto"/float in (0,0.5]
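#Example: a minimal sketch (the synthetic inliers/outliers and the contamination value are assumptions):
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),     # inliers
               rng.uniform(-6, 6, size=(10, 2))])   # a few scattered outliers
# contamination sets the expected share of outliers and hence the decision threshold
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = iso.fit_predict(X)                          # +1 for inliers, -1 for outliers
print(int((labels == -1).sum()))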
(2) Data transformation:
An ensemble of totally random trees: class sklearn.ensemble.RandomTreesEmbedding([n_estimators=100,max_depth=5,min_samples_split=2,min_samples_leaf=1,min_weight_fraction_leaf=0.0,max_leaf_nodes=None,min_impurity_decrease=0.0,min_impurity_split=None,sparse_output=True,n_jobs=None,random_state=None,verbose=0,warm_start=False])
#An unsupervised transformation of a dataset to a high-dimensional sparse representation
#Parameters: the other parameters are the same as in class sklearn.ensemble.ExtraTreesClassifier()
sparse_output: whether to return a sparse CSR matrix (instead of a dense array); bool
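#Example: a minimal sketch (toy data and parameter values are assumptions):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding

X, _ = make_classification(n_samples=200, n_features=6, random_state=0)
# Each sample is encoded by the leaves it falls into, giving a sparse one-hot representation
embedder = RandomTreesEmbedding(n_estimators=50, max_depth=3, sparse_output=True, random_state=0)
X_transformed = embedder.fit_transform(X)
print(X_transformed.shape)   # (n_samples, total number of leaves across all trees)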