scikit-learn_LinearModel_01_Ordinary Least Squares

This post takes a close look at linear regression, in particular how ordinary least squares finds the best-fitting line by minimizing the residual sum of squares between the observed targets and the model's predictions. Using the diabetes dataset as an example, it shows how to run a linear regression with Python's scikit-learn library and discusses multicollinearity and its effect on the model.


1 Ordinary Least Squares

LinearRegression fits a linear model with coefficients w=(w1,…,wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Mathematically it solves a problem of the form:

$$\min_{w} \lVert X w - y \rVert_2^2$$

LinearRegression will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member:

>>> from sklearn import linear_model 
>>> reg = linear_model.LinearRegression() 
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2]) 
LinearRegression() 
>>> reg.coef_ 
array([0.5, 0.5])
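
Once fitted, the same object can also be used for prediction; the following is a minimal continuation sketch of the example above (the exact floating-point digits printed may differ slightly):

>>> reg.intercept_
0.0
>>> reg.predict([[3, 3]])
array([3.])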

The coefficient estimates for Ordinary Least Squares rely on the independence of the features. When features are correlated and the columns of the design matrix X have an approximate linear dependence, the design matrix becomes close to singular and as a result, the least-squares estimate becomes highly sensitive to random errors in the observed target, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.

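To make this concrete, below is a small illustrative sketch (not part of the original example; the two-feature construction and the noise scale are arbitrary choices) in which two nearly identical columns make the design matrix close to singular, so that a tiny perturbation of the target changes the estimated coefficients dramatically:

import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
n_samples = 50
x1 = rng.rand(n_samples)
x2 = x1 + 1e-6 * rng.rand(n_samples)    # nearly an exact copy of x1: near-singular design matrix
X = np.column_stack([x1, x2])
y_true = x1 + x2                        # the target depends on both features equally

for seed in range(3):
    noise = 1e-4 * np.random.RandomState(seed).randn(n_samples)
    reg = linear_model.LinearRegression().fit(X, y_true + noise)
    print(reg.coef_)                    # the two coefficients swing wildly from run to run

The predictions themselves barely move, but the individual coefficients are unstable, which is exactly the large-variance behaviour described above.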

Linear Regression Example

This example uses only the first feature of the diabetes dataset, in order to illustrate a two-dimensional plot of this regression technique. The straight line can be seen in the plot, showing how linear regression attempts to draw a straight line that will best minimize the residual sum of squares between the observed responses in the dataset, and the responses predicted by the linear approximation.
The coefficients, the residual sum of squares and the coefficient of determination are also calculated.

[Figure: scatter plot of the diabetes test samples with the fitted regression line]

# Code source: Jaques Grobler
# License: BSD 3 clause

import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
# Use only one feature
diabetes_X = diabetes_X[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(diabetes_y_test, diabetes_y_pred))
# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

Ordinary Least Squares Complexity
The least squares solution is computed using the singular value decomposition of X. If X is a matrix of shape (n_samples, n_features), this method has a cost of $O(n_{\text{samples}} n_{\text{features}}^2)$, assuming that $n_{\text{samples}} \ge n_{\text{features}}$.
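
As a hedged sanity check of this statement (a sketch, not scikit-learn's actual internal code path), the coefficients returned by LinearRegression on the diabetes data match a direct SVD-based least-squares solve with numpy.linalg.lstsq, where an appended column of ones plays the role of the intercept:

import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)
reg = linear_model.LinearRegression().fit(X, y)

# Augment X with a column of ones so the direct solve also estimates an intercept.
X1 = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(np.allclose(reg.coef_, coef[:-1]))      # True: same slope coefficients
print(np.allclose(reg.intercept_, coef[-1]))  # True: same intercept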
