Commonly used sklearn functions

This article covers some important functions in the sklearn library: StandardScaler and PolynomialFeatures for data preprocessing, train_test_split for splitting data into training and test sets, make_blobs for generating data, GridSearchCV for model selection, euclidean_distances for distance computation, LinearRegression and Lasso for linear regression, RidgeClassifier and LogisticRegression for classification, and the clustering algorithms KMeans and AffinityPropagation. The key parameters of each function and their roles are explained.


sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)

 https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
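
A minimal sketch of how StandardScaler is typically used; the small matrix below is made up for illustration:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # illustrative data

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler(with_mean=True, with_std=True)
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)   # per-feature means learned from X
print(X_scaled)
```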


sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

degree: degree of the polynomial features

interaction_only: if True, produce only interaction terms and omit powers of a single feature (e.g. no x²)

include_bias: whether to include a bias (intercept) column of ones
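
A short sketch illustrating the effect of degree, interaction_only and include_bias on a made-up two-feature sample:

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[2.0, 3.0]])  # one sample, two features

# degree=2 with the bias column: [1, x1, x2, x1^2, x1*x2, x2^2]
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
print(poly.fit_transform(X))        # [[1. 2. 3. 4. 6. 9.]]

# interaction_only=True drops the pure powers x1^2 and x2^2
poly_inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(poly_inter.fit_transform(X))  # [[2. 3. 6.]]
```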


sklearn.model_selection.train_test_split(*arrays, test_size=0.25, train_size=None, random_state=None, shuffle=True, stratify=None)

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
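
A minimal illustration of splitting a small synthetic dataset; test_size, random_state and stratify are the parameters most often set:

```python
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(20).reshape(10, 2)                 # 10 synthetic samples
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])     # two balanced classes

# Hold out 30% of the samples; stratify keeps the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)
```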


sklearn.datasets.make_blobs(n_samples=100, n_features=2, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None)

https://scikit-learn.org/dev/modules/generated/sklearn.datasets.make_blobs.html

cluster_std: standard deviation of the clusters

center_box: bounding box within which each cluster center is generated
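
A small sketch of generating synthetic clusters; the parameter values are arbitrary:

```python
from sklearn.datasets import make_blobs

# Three Gaussian clusters; cluster_std controls the spread,
# center_box bounds where the random centers may fall
X, y = make_blobs(n_samples=150, n_features=2, centers=3,
                  cluster_std=0.8, center_box=(-5.0, 5.0), random_state=0)

print(X.shape)   # (150, 2)
print(set(y))    # {0, 1, 2}
```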


sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=None, iid='warn', refit=True, cv='warn', verbose=0, pre_dispatch='2*n_jobs', error_score='raise-deprecating', return_train_score='warn')

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
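
A hedged sketch of a grid search over the regularization strength of a logistic regression on the built-in Iris data; the parameter grid is just an example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Search over the regularization strength C with 5-fold cross-validation
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      scoring="accuracy", cv=5)
search.fit(X, y)

print(search.best_params_)   # the best C found
print(search.best_score_)    # mean cross-validated accuracy of that C
```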


sklearn.metrics.pairwise.euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False, X_norm_squared=None)

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html
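
A tiny example of pairwise distance computation on made-up points:

```python
from sklearn.metrics.pairwise import euclidean_distances
import numpy as np

X = np.array([[0.0, 0.0], [3.0, 4.0]])

# Pairwise distances between the rows of X (a symmetric 2x2 matrix here)
print(euclidean_distances(X))      # [[0. 5.] [5. 0.]]

# Distances between the rows of X and the rows of Y
Y = np.array([[1.0, 1.0]])
print(euclidean_distances(X, Y))   # [[1.414...] [3.605...]]
```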


sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

fit_intercept: whether to fit an intercept

normalize: whether to normalize the features before regression

coef_: fitted coefficients (attribute)

intercept_: fitted intercept (constant) term
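
A minimal sketch fitting a linear regression on synthetic noise-free data and reading coef_ and intercept_:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Synthetic data generated from y = 2*x1 + 3*x2 + 1
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

reg = LinearRegression(fit_intercept=True)
reg.fit(X, y)

print(reg.coef_)        # roughly [2. 3.]
print(reg.intercept_)   # roughly 1.0
```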


sklearn.linear_model.Lasso(alpha=1.0, fit_intercept=True, normalize=False, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

precompute: whether to use a precomputed Gram matrix to speed up the computation

max_iter: maximum number of iterations

tol: tolerance used to decide convergence
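
A short sketch fitting Lasso on synthetic data where only two features matter; alpha, max_iter and tol are set to illustrative values:

```python
from sklearn.linear_model import Lasso
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
# Only the first two features carry signal; Lasso should shrink the rest toward 0
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(50)

lasso = Lasso(alpha=0.1, max_iter=1000, tol=1e-4)
lasso.fit(X, y)

print(lasso.coef_)        # large weights on the first two features, near-zero elsewhere
print(lasso.intercept_)
```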


sklearn.linear_model.RidgeClassifier(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None)

 https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html
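
A minimal sketch of RidgeClassifier on the built-in Iris data; the split and the alpha value are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Ridge regression applied to classification via encoded targets
clf = RidgeClassifier(alpha=1.0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))   # mean accuracy on the held-out split
```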


sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html


sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=None, algorithm='auto')

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

tol: tolerance used to decide convergence

verbose: verbosity mode

fit_predict(X, y=None, sample_weight=None): compute the clustering and return the cluster index of each sample
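
A small sketch of KMeans using fit_predict on synthetic blobs; n_clusters and the other settings are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=0)

# fit_predict returns the cluster index of each sample in one call
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)

print(km.cluster_centers_)   # coordinates of the 3 centroids
print(labels[:10])           # cluster index of the first 10 samples
```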

sklearn.cluster.AffinityPropagation(damping=0.5, max_iter=200, convergence_iter=15, copy=True, preference=None, affinity='euclidean', verbose=False)

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html

damping: damping factor, the extent to which the current value is retained relative to the incoming value when messages are updated; this helps avoid numerical oscillation

affinity: which similarity measure to use

verbose: whether to print verbose output
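
A minimal sketch of AffinityPropagation on synthetic blobs; the damping value is illustrative:

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=0)

# Higher damping slows the message updates and helps avoid oscillation;
# affinity="euclidean" uses negative squared Euclidean distance as similarity
ap = AffinityPropagation(damping=0.9, affinity="euclidean")
labels = ap.fit_predict(X)

print(ap.cluster_centers_indices_)   # indices of the chosen exemplars
print(len(set(labels)))              # number of clusters found
```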

 

To demonstrate how to train and evaluate a logistic regression model with Python and scikit-learn, assume we use the classic Iris dataset. It contains measurements of three species of iris flowers and is well suited to classification tasks.

First, make sure the required libraries are installed, including `scikit-learn`. If not, install them with pip:

```bash
pip install scikit-learn pandas
```

Below is a simple example showing how to load the data, preprocess it, train a logistic regression model, and evaluate it:

```python
# Import the required libraries
from sklearn import datasets, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data    # features
y = iris.target  # target variable

# Preprocessing: encode the class labels as integers.
# The Iris targets are already 0/1/2, so this is effectively a no-op here,
# but it is the usual step when the labels are strings.
le = preprocessing.LabelEncoder()
y_encoded = le.fit_transform(y)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, random_state=42)

# Create the logistic regression model
log_reg = LogisticRegression()

# Train the model
log_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = log_reg.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_mat = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:")
print(conf_mat)
```

In this example we first load the dataset and split it into features (petal length, width, etc.) and the target variable (iris species). We then encode the class labels so the logistic regression model can work with them, split the data into training and test sets, train the model, and make predictions on the test set. Finally, we compute the accuracy and the confusion matrix to get a fuller picture of the model's performance.

If you have a specific dataset, simply replace `datasets.load_iris()` with the code that reads that dataset.