I. calibration
1. Overview:
This module performs "probability calibration"
2. Usage:
(1) Classes:
Probability calibration based on "isotonic regression" or "logistic regression": class sklearn.calibration.CalibratedClassifierCV([base_estimator=None,method='sigmoid',cv=None,n_jobs=None,ensemble=True])
#Parameters:
base_estimator: specifies the base estimator; estimator instance
method: specifies the method used for calibration; "sigmoid"/"isotonic"
cv: specifies the cross-validation splitting strategy; int/cross-validation generator/iterable/"prefit"
n_jobs: specifies the number of parallel jobs; int
ensemble: specifies how calibration is performed when cv is not "prefit"; bool
#If True, the base_estimator is fitted on the training data and calibrated on the test data of each cv fold. The final estimator is an ensemble of n_cv fitted (classifier, calibrator) pairs, where n_cv is the number of cross-validation folds. The output is the average of the predicted probabilities of all pairs
#If False, cv is used to compute unbiased predictions via cross_val_predict, which are then used for calibration. At prediction time, the classifier used is the base_estimator trained on all the data. Note that this method is also implemented internally in sklearn.svm estimators with the probability=True parameter
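A minimal sketch of the workflow above (the synthetic data, GaussianNB base estimator, and parameter values are arbitrary illustrative choices; the estimator is passed positionally since the keyword was renamed from base_estimator to estimator in newer scikit-learn versions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

# Synthetic binary classification problem
X, y = make_classification(n_samples=300, random_state=0)

# Calibrate a naive Bayes classifier with sigmoid (Platt) scaling over 3 cv folds
clf = CalibratedClassifierCV(GaussianNB(), method='sigmoid', cv=3)
clf.fit(X, y)

proba = clf.predict_proba(X)
# Calibrated probabilities still form a valid distribution per sample
assert np.allclose(proba.sum(axis=1), 1.0)
```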
(2) Functions:
Compute the "predicted probabilities" and "true probabilities" of a "calibration curve": [<prob_true>,<prob_pred>=]sklearn.calibration.calibration_curve(<y_true>,<y_prob>[,normalize=False,n_bins=5,strategy='uniform'])
#Parameters:
y_true: specifies the true labels; 1×n_samples array-like
y_prob: specifies the predicted probability of the positive class; 1×n_samples array-like
normalize: specifies whether to normalize <y_prob>; bool
n_bins: specifies the number of bins the interval [0,1] is split into; int
strategy: specifies how the bin edges are defined; "uniform" (bins of equal width)/"quantile" (bins containing the same number of samples)
prob_true: returns the fraction of positive samples in each bin; 1×n_bins ndarray or smaller
prob_pred: returns the mean predicted probability in each bin; 1×n_bins ndarray or smaller
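A toy sketch of calibration_curve (the label/probability values are made up for illustration):

```python
import numpy as np
from sklearn.calibration import calibration_curve

# True labels and predicted positive-class probabilities (toy values)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95])

# 'uniform' strategy: 4 equal-width bins over [0, 1]
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=4)

# One entry per non-empty bin; a perfectly calibrated model has prob_true ≈ prob_pred
assert prob_true.shape == prob_pred.shape
```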
II. discriminant_analysis
1. Overview:
This module performs "Linear Discriminant Analysis" and "Quadratic Discriminant Analysis"
2. Usage:
"Linear Discriminant Analysis": class sklearn.discriminant_analysis.LinearDiscriminantAnalysis([solver='svd',shrinkage=None,priors=None,n_components=None,store_covariance=False,tol=0.0001,covariance_estimator=None])
#Parameters:
solver: specifies the solver; "svd"/"lsqr"/"eigen"
shrinkage: specifies whether (and how strongly) to apply shrinkage regularization; "auto"/float/None
#Has no effect when solver="svd"
priors: specifies the prior probability of each class; 1×n_classes array-like
n_components: specifies the number of components kept for dimensionality reduction; int
store_covariance: specifies whether to store the covariance matrix; bool
tol: specifies the tolerance (stop when the error falls below this value); float
covariance_estimator: specifies the covariance estimator; covariance estimator
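A minimal sketch of LDA used both as a classifier and for supervised dimensionality reduction (the iris dataset and n_components=2 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project onto at most min(n_classes-1, n_features) discriminant axes
lda = LinearDiscriminantAnalysis(solver='svd', n_components=2)
X2 = lda.fit_transform(X, y)

assert X2.shape == (150, 2)
assert lda.score(X, y) > 0.9   # training accuracy on iris is high
```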
######################################################################################################################
"Quadratic Discriminant Analysis": class sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis([priors=None,reg_param=0.0,store_covariance=False,tol=0.0001])
#Parameters: the other parameters are the same as in class sklearn.discriminant_analysis.LinearDiscriminantAnalysis()
reg_param: specifies the regularization applied to the per-class covariance estimates; float
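A minimal QDA sketch (iris and reg_param=0.1 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# reg_param shrinks each class covariance estimate toward the identity matrix
qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
qda.fit(X, y)

assert qda.score(X, y) > 0.9
```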
III. kernel_ridge
1. Overview:
This module performs "kernel ridge regression"
2. Usage:
Kernel ridge regression: class sklearn.kernel_ridge.KernelRidge([alpha=1,kernel='linear',gamma=None,degree=3,coef0=1,kernel_params=None])
#Parameters:
alpha: specifies the regularization strength; float/1×n_targets array-like
kernel: specifies the kernel mapping; str/callable (accepts kernel_params plus 2 positional arguments and returns a float)
gamma: specifies the gamma parameter of the kernel; float
#Only used by the RBF/laplacian/polynomial/exponential chi2/sigmoid kernels
degree: specifies the degree parameter of the kernel; float
#Only used by the polynomial kernel
coef0: specifies the coef0 parameter of the kernel; float
#Only used by the polynomial/sigmoid kernels
kernel_params: specifies additional parameters to pass to the kernel mapping; mapping of string to anything
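A minimal sketch of kernel ridge regression on a noiseless sine curve (the data and the alpha/gamma values are illustrative choices):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X).ravel()

# RBF kernel; alpha is the ridge penalty, gamma the kernel width
model = KernelRidge(alpha=0.1, kernel='rbf', gamma=0.5)
model.fit(X, y)
pred = model.predict(X)

assert pred.shape == (100,)
assert np.mean((pred - y) ** 2) < 0.1   # fits the smooth target closely
```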
IV. svm
1. Overview:
This module implements "Support Vector Machines" (SVM)
2. Classification:
"Linear Support Vector Classification": class sklearn.svm.LinearSVC([penalty='l2',loss='squared_hinge',dual=True,tol=0.0001,C=1.0,multi_class='ovr',fit_intercept=True,intercept_scaling=1,class_weight=None,verbose=0,random_state=None,max_iter=1000])
#Parameters: tol is the same as in class sklearn.discriminant_analysis.LinearDiscriminantAnalysis()
penalty: specifies the norm used in the penalty term; "l1"/"l2"
loss: specifies the loss function; "hinge"/"squared_hinge"
dual: specifies whether to solve the dual optimization problem; bool
C: specifies the inverse of the "regularization strength"; float>0
#i.e. the inverse of the coefficient of the norm penalty term
multi_class: specifies the strategy for multi-class problems; "crammer_singer"/"ovr"
fit_intercept: specifies whether to estimate the intercept; bool
intercept_scaling: float
#When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features; to lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased
class_weight: specifies the weight of each class; dict/"balanced"
verbose: specifies the verbosity level; int/bool
random_state: specifies the random number generator/seed; int/RandomState instance/None
max_iter: specifies the maximum number of iterations; int
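A minimal LinearSVC sketch (synthetic data; max_iter is raised above the default only to avoid convergence warnings, and the other values are the defaults written out):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic binary problem with the default 20 features
X, y = make_classification(n_samples=200, random_state=0)

# l2 penalty with squared hinge loss is the default combination
clf = LinearSVC(penalty='l2', loss='squared_hinge', C=1.0, max_iter=5000)
clf.fit(X, y)

assert clf.coef_.shape == (1, 20)   # one weight vector for the binary problem
assert clf.score(X, y) > 0.8
```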
######################################################################################################################
"Nu-Support Vector Classification": class sklearn.svm.NuSVC([nu=0.5,kernel='rbf',degree=3,gamma='scale',coef0=0.0,shrinking=True,probability=False,tol=0.001,cache_size=200,class_weight=None,verbose=False,max_iter=-1,decision_function_shape='ovr',break_ties=False,random_state=None])
#Parameters: the other parameters are the same as in class sklearn.svm.LinearSVC()
nu: specifies an upper bound on the fraction of "margin errors" and a lower bound on the fraction of "support vectors"; 0<float<=1
#Controls the regularization strength
kernel: specifies the kernel mapping; "linear"/"poly"/"rbf"/"sigmoid"/"precomputed"
degree: specifies the degree parameter of the kernel; float
#Only used by the polynomial kernel
gamma: specifies the gamma parameter of the kernel; float/"scale"/"auto"
#Only used by the RBF/polynomial/sigmoid kernels
coef0: specifies the coef0 parameter of the kernel; float
#Only used by the polynomial/sigmoid kernels
shrinking: specifies whether to use the "shrinking heuristic"; bool
probability: specifies whether to enable probability estimates; bool
tol: specifies the tolerance (stop when the error falls below this value); float
cache_size: specifies the size of the "kernel cache"; float (in MB)
class_weight: specifies the weight of each class; dict/"balanced"
verbose: specifies the verbosity level; int/bool
max_iter: specifies the maximum number of iterations; int
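A minimal NuSVC sketch (iris and the default nu=0.5 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.svm import NuSVC

X, y = load_iris(return_X_y=True)

# nu bounds the fraction of margin errors from above and support vectors from below
clf = NuSVC(nu=0.5, kernel='rbf', gamma='scale')
clf.fit(X, y)

# n_support_ counts support vectors per class; support_vectors_ stacks them all
assert clf.support_vectors_.shape[0] == clf.n_support_.sum()
```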
######################################################################################################################
"C-Support Vector Classification": class sklearn.svm.SVC([C=1.0,kernel='rbf',degree=3,gamma='scale',coef0=0.0,shrinking=True,probability=False,tol=0.001,cache_size=200,class_weight=None,verbose=False,max_iter=-1,decision_function_shape='ovr',break_ties=False,random_state=None])
#Parameters: C/random_state are the same as in class sklearn.svm.LinearSVC()
#            the other parameters are the same as in class sklearn.svm.NuSVC()
decision_function_shape: specifies whether decision_function returns a one-vs-rest function of shape n_samples×n_classes ("ovr") or the original one-vs-one function of shape n_samples×(n_classes*(n_classes-1)/2) ("ovo"); "ovo"/"ovr"
break_ties: specifies whether predict breaks ties according to the confidence values of decision_function (requires decision_function_shape="ovr" and n_classes>2; otherwise the first among the tied classes is returned); bool
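A minimal sketch contrasting the two decision_function_shape settings on a 3-class problem (iris is an illustrative choice; note that for 3 classes both shapes happen to have 3 columns, since 3*(3-1)/2 = 3):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes

ovr = SVC(decision_function_shape='ovr').fit(X, y)
ovo = SVC(decision_function_shape='ovo').fit(X, y)

assert ovr.decision_function(X).shape == (150, 3)  # n_classes columns
assert ovo.decision_function(X).shape == (150, 3)  # n_classes*(n_classes-1)/2 columns
```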
3. Regression:
"Linear Support Vector Regression": class sklearn.svm.LinearSVR([epsilon=0.0,tol=0.0001,C=1.0,loss='epsilon_insensitive',fit_intercept=True,intercept_scaling=1.0,dual=True,verbose=0,random_state=None,max_iter=1000])
#Parameters: the other parameters are the same as in class sklearn.svm.LinearSVC()
epsilon: specifies the epsilon parameter of the epsilon-insensitive loss function; float
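A minimal LinearSVR sketch (synthetic regression data; max_iter is raised only to avoid convergence warnings):

```python
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# epsilon=0 makes the loss penalize any deviation from the target
reg = LinearSVR(epsilon=0.0, C=1.0, max_iter=10000)
reg.fit(X, y)

assert reg.predict(X).shape == (100,)
```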
######################################################################################################################
"Nu-Support Vector Regression": class sklearn.svm.NuSVR([nu=0.5,C=1.0,kernel='rbf',degree=3,gamma='scale',coef0=0.0,shrinking=True,tol=0.001,cache_size=200,verbose=False,max_iter=-1])
#Parameters: C is the same as in class sklearn.svm.LinearSVC()
#            the other parameters are the same as in class sklearn.svm.NuSVC()
######################################################################################################################
"Epsilon-Support Vector Regression": class sklearn.svm.SVR([kernel='rbf',degree=3,gamma='scale',coef0=0.0,tol=0.001,C=1.0,epsilon=0.1,shrinking=True,cache_size=200,verbose=False,max_iter=-1])
#Parameters: epsilon is the same as in class sklearn.svm.LinearSVR()
#            the other parameters are the same as in class sklearn.svm.NuSVR()
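A minimal sketch contrasting SVR and NuSVR on the same toy data (the sine data and parameter values are illustrative choices):

```python
import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, (80, 1))
y = np.sin(X).ravel()

# SVR fixes the tube width epsilon directly; NuSVR lets nu control it instead
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1).fit(X, y)
nusvr = NuSVR(kernel='rbf', C=1.0, nu=0.5).fit(X, y)

assert svr.predict(X).shape == (80,)
assert nusvr.predict(X).shape == (80,)
```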
4. Other
(1) Classes:
"Unsupervised Outlier Detection": class sklearn.svm.OneClassSVM([kernel='rbf',degree=3,gamma='scale',coef0=0.0,tol=0.001,nu=0.5,shrinking=True,cache_size=200,verbose=False,max_iter=-1])
#Parameters: the same as in class sklearn.svm.NuSVR()
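A minimal OneClassSVM sketch on unlabeled Gaussian data (the data and nu=0.1 are illustrative choices):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.normal(0, 1, (200, 2))       # unlabeled "normal" data

# nu upper-bounds the fraction of training points flagged as outliers
oc = OneClassSVM(nu=0.1, kernel='rbf', gamma='scale').fit(X)
labels = oc.predict(X)               # +1 for inliers, -1 for outliers

assert set(np.unique(labels)) <= {-1, 1}
assert (labels == -1).mean() < 0.3   # roughly nu of the points are flagged
```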
(2) Functions:
Compute the lowest bound of C such that for all C∈(l1_min_c, infinity) the model is guaranteed not to be empty: [<l1_min_c>]sklearn.svm.l1_min_c(<X>,<y>[,loss='squared_hinge',fit_intercept=True,intercept_scaling=1.0])
#Parameters: the other parameters are the same as in class sklearn.svm.LinearSVC()
X: specifies the training data; n_samples×n_features array-like/n_samples×n_features sparse matrix
y: specifies the true labels of the training data; 1×n_samples array-like
loss: specifies the loss function; "squared_hinge"/"log"
l1_min_c: returns the lowest bound of C; float
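A minimal l1_min_c sketch (iris is an illustrative choice; below the returned bound, an l1-penalized linear model has all-zero coefficients):

```python
from sklearn.datasets import load_iris
from sklearn.svm import l1_min_c

X, y = load_iris(return_X_y=True)

# Smallest C for which the l1-penalized model has at least one non-zero coefficient
c_min = l1_min_c(X, y, loss='squared_hinge')

assert c_min > 0
```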