Machine Learning Study Notes: Linear Models, Univariate Linear Regression

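This post walks through univariate linear regression, one of the linear models: its basic form, the objective of minimizing the mean squared error, and how to find the optimal w and b by setting the derivatives to zero. It uses second-order partial derivatives to show that the objective function is convex, then gives the concrete steps for solving for w and b.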


Linear Models

Univariate Linear Regression

  • Basic form

$$f(\boldsymbol{x})=w_{1} x_{1}+w_{2} x_{2}+\ldots+w_{d} x_{d}+b$$

Vector form:
$$f(\boldsymbol{x})=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$$
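As a minimal sketch of the vector form, the snippet below evaluates $f(\boldsymbol{x})=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$ for a single sample in plain Python. The function name `predict` and the sample numbers are illustrative assumptions, not part of the original notes.

```python
from typing import Sequence


def predict(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Evaluate f(x) = w^T x + b for one sample (hypothetical helper)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    # Inner product w^T x, then add the bias b.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Made-up weights and features: 2*3 + (-1)*4 + 0.5
print(predict([2.0, -1.0], [3.0, 4.0], 0.5))  # 2.5
```

In the univariate case treated in the rest of these notes, `w` and `x` each have length one, so the model reduces to $f(x)=wx+b$.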
Objective: minimize the mean squared error (the sum of squared errors below has the same minimizer)
$$\begin{aligned}\left(w^{*}, b^{*}\right) &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \\ &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2} \end{aligned}$$
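To make the objective concrete, here is a small sketch that evaluates the sum of squared errors for a candidate $(w, b)$. The function name `squared_error` and the toy data are assumptions chosen for illustration only.

```python
from typing import Sequence


def squared_error(w: float, b: float,
                  xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum over i of (y_i - w*x_i - b)^2 for the univariate model."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(squared_error(2.0, 1.0, xs, ys))  # 0.0, the true line fits exactly
print(squared_error(1.0, 1.0, xs, ys))  # 14.0, residuals are 0, 1, 2, 3
```

Minimizing this quantity over $w$ and $b$ is exactly the $\arg\min$ problem stated above.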
Method: least-squares "parameter estimation" for the linear regression model. Writing $E_{(w, b)}=\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}$ and differentiating with respect to $w$ and $b$ gives:
$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial w} &=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\ &=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\ &=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right) \end{aligned}$$

$$\frac{\partial E_{(w, b)}}{\partial b}=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)$$
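One way to sanity-check the two partial derivatives is to compare them with central finite differences of $E_{(w,b)}$. The sketch below does that in plain Python; the helper names, the toy data, the evaluation point and the step size `h` are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def squared_error(w: float, b: float,
                  xs: Sequence[float], ys: Sequence[float]) -> float:
    """E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def gradient(w: float, b: float,
             xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Analytic partial derivatives of E with respect to w and b."""
    m = len(xs)
    # dE/dw = 2 * (w * sum x_i^2 - sum (y_i - b) * x_i)
    dw = 2.0 * (w * sum(x * x for x in xs)
                - sum((y - b) * x for x, y in zip(xs, ys)))
    # dE/db = 2 * (m * b - sum (y_i - w * x_i))
    db = 2.0 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))
    return dw, db


# Hypothetical toy data and evaluation point.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, h = 0.5, -1.0, 1e-6

dw, db = gradient(w, b, xs, ys)
# Central differences approximate the same derivatives numerically.
num_dw = (squared_error(w + h, b, xs, ys) - squared_error(w - h, b, xs, ys)) / (2 * h)
num_db = (squared_error(w, b + h, xs, ys) - squared_error(w, b - h, xs, ys)) / (2 * h)

print(dw, num_dw)  # analytic value is -66.0; the numeric one should be close
print(db, num_db)  # analytic value is -34.0; the numeric one should be close
```

Agreement between the analytic and numeric values is evidence, not proof, that the formulas were derived correctly.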

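Here $E_{(w, b)}$ is a convex function of $w$ and $b$, so the optimal $w$ and $b$ are obtained where its derivatives with respect to both are zero.

Checking convexity:

Let $f(x, y)$ have continuous second-order partial derivatives on a region $D$, and write $A=f_{xx}^{\prime\prime}(x, y)$, $B=f_{xy}^{\prime\prime}(x, y)$, $C=f_{yy}^{\prime\prime}(x, y)$. Then:

(1) if $A > 0$ and $AC-B^{2} \geq 0$ everywhere on $D$, then $f(x, y)$ is convex on $D$;

(2) if $A < 0$ and $AC-B^{2} \geq 0$ everywhere on $D$, then $f(x, y)$ is concave on $D$.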

For $E_{(w, b)}$, the second-order partial derivatives are:

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial w}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial w}\left[2 w \sum_{i=1}^{m} x_{i}^{2}\right] \\ &=2 \sum_{i=1}^{m} x_{i}^{2} \end{aligned}$$

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w \partial b} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial b}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial b}\left[-2 \sum_{i=1}^{m} y_{i} x_{i}+2 \sum_{i=1}^{m} b x_{i}\right] \\ &=\frac{\partial}{\partial b}\left(2 \sum_{i=1}^{m} b x_{i}\right) \\ &=2 \sum_{i=1}^{m} x_{i} \end{aligned}$$

$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}\right] \\ &=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\ &=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\ &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right) \end{aligned}$$

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial b^{2}} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial b}\right) \\ &=\frac{\partial}{\partial b}\left[2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)\right] \\ &=\frac{\partial}{\partial b}(2 m b) \\ &=2 m \end{aligned}$$

That is:
$$A=2 \sum_{i=1}^{m} x_{i}^{2} \qquad B=2 \sum_{i=1}^{m} x_{i} \qquad C=2 m$$

$$\begin{aligned} A C-B^{2} &=2 m \cdot 2 \sum_{i=1}^{m} x_{i}^{2}-\left(2 \sum_{i=1}^{m} x_{i}\right)^{2} \\ &=4 m \sum_{i=1}^{m} x_{i}^{2}-4 m \overline{x} \sum_{i=1}^{m} x_{i} \\ &=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}\right) \end{aligned}$$

Moreover:
$$\sum_{i=1}^{m} x_{i} \overline{x}=\overline{x} \sum_{i=1}^{m} x_{i}=\overline{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} x_{i}=m \overline{x}^{2}=\sum_{i=1}^{m} \overline{x}^{2}$$
so the expression above becomes
$$\begin{aligned} A C-B^{2} &=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+x_{i} \overline{x}\right) \\ &=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+\overline{x}^{2}\right) \\ &=4 m \sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2} \geq 0 \end{aligned}$$
Since $A=2 \sum_{i=1}^{m} x_{i}^{2}>0$ whenever the $x_{i}$ are not all zero, and $AC-B^{2} \geq 0$, the function $E_{(w, b)}$ is convex in $w$ and $b$.
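The identity $AC-B^{2}=4 m \sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2}$ can be checked numerically on a small sample. The following sketch computes both sides; the function name `hessian_terms` and the sample values are assumptions for illustration.

```python
from typing import Sequence, Tuple


def hessian_terms(xs: Sequence[float]) -> Tuple[float, float, float]:
    """Return A, B, C: the second-order partials of E(w, b)."""
    m = len(xs)
    A = 2.0 * sum(x * x for x in xs)  # d^2E/dw^2
    B = 2.0 * sum(xs)                 # d^2E/(dw db)
    C = 2.0 * m                       # d^2E/db^2
    return A, B, C


# Hypothetical sample of inputs.
xs = [0.0, 1.0, 2.0, 3.0]
m = len(xs)
x_bar = sum(xs) / m

A, B, C = hessian_terms(xs)
det = A * C - B * B
closed_form = 4 * m * sum((x - x_bar) ** 2 for x in xs)

print(A, B, C)           # 28.0 12.0 8.0
print(det, closed_form)  # 80.0 80.0
```

Note that none of `A`, `B`, `C` depends on $w$, $b$ or the targets $y_i$, so the convexity conclusion holds for any data set whose $x_i$ are not all zero.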

Setting the derivative with respect to $b$ to zero:
$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial b} &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)=0 \\ m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) &=0 \\ b &=\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \\ &=\overline{y}-w \overline{x} \end{aligned}$$

Setting the derivative with respect to $w$ to zero gives $w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m} b x_{i}$. Substituting $b=\overline{y}-w \overline{x}$ into it yields
$$w \sum_{i=1}^{m} x_{i}^{2}-w \overline{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} y_{i} x_{i}-\overline{y} \sum_{i=1}^{m} x_{i}$$

$$w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\overline{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\overline{x} \sum_{i=1}^{m} x_{i}}$$

We also have
$$\overline{y} \sum_{i=1}^{m} x_{i}=\frac{1}{m} \sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i}=\overline{x} \sum_{i=1}^{m} y_{i}$$

$$\overline{x} \sum_{i=1}^{m} x_{i}=\frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}$$

so that
$$w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\overline{x} \sum_{i=1}^{m} y_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}$$

Extrema of convex and concave functions of two variables:

Let $f(x, y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$, and let $\left(x_{0}, y_{0}\right) \in D$ satisfy $f_{x}^{\prime}\left(x_{0}, y_{0}\right)=0$ and $f_{y}^{\prime}\left(x_{0}, y_{0}\right)=0$. Then $f\left(x_{0}, y_{0}\right)$ is necessarily the minimum (or maximum) of $f(x, y)$ on $D$.

Solving for $b$:
$$b=\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)$$
Solving for $w$:
$$w=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\overline{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}$$
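The two closed-form expressions translate almost line for line into code. The sketch below is one possible plain-Python rendering; the function name `fit_line`, the guard against a zero denominator, and the toy data are assumptions added for the example.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares w and b for y ~ w*x + b, using the closed-form solution."""
    m = len(xs)
    x_bar = sum(xs) / m
    # Denominator: sum x_i^2 - (1/m) * (sum x_i)^2
    denom = sum(x * x for x in xs) - sum(xs) ** 2 / m
    if denom == 0:
        # Assumption of this sketch: reject data where all x_i coincide,
        # because w is then not uniquely determined.
        raise ValueError("all x_i are equal; w is not determined")
    # Numerator: sum y_i * (x_i - x_bar)
    w = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / denom
    # b = (1/m) * sum (y_i - w * x_i)
    b = sum(y - w * x for x, y in zip(xs, ys)) / m
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(fit_line(xs, ys))  # (2.0, 1.0)
```

Computing $w$ first and then $b$ mirrors the derivation, since the expression for $b$ depends on $w$.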
Vectorizing $w$:

Since
$$\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}=\overline{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} x_{i} \overline{x}$$
substituting into the denominator gives
$$\begin{aligned} w &=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\overline{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m} x_{i} \overline{x}} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}\right)} \end{aligned}$$
Applying a similar transformation to the numerator and the denominator, and using $\sum_{i=1}^{m} y_{i} \overline{x}=\sum_{i=1}^{m} x_{i} \overline{y}=\sum_{i=1}^{m} \overline{x}\, \overline{y}$ and $\sum_{i=1}^{m} x_{i} \overline{x}=\sum_{i=1}^{m} \overline{x}^{2}$:
$$\begin{aligned} w &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}-y_{i} \overline{x}+y_{i} \overline{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+x_{i} \overline{x}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}-x_{i} \overline{y}+\overline{x}\, \overline{y}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+\overline{x}^{2}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2}} \end{aligned}$$

Define
$$\begin{array}{c}{\boldsymbol{x}=\left(x_{1}, x_{2}, \ldots, x_{m}\right)^{T} \quad \boldsymbol{y}=\left(y_{1}, y_{2}, \ldots, y_{m}\right)^{T}} \\ {\boldsymbol{x}_{d}=\left(x_{1}-\overline{x}, x_{2}-\overline{x}, \ldots, x_{m}-\overline{x}\right)^{T} \quad \boldsymbol{y}_{d}=\left(y_{1}-\overline{y}, y_{2}-\overline{y}, \ldots, y_{m}-\overline{y}\right)^{T}}\end{array}$$

Then
$$\begin{aligned} w &=\frac{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2}} \\ &=\frac{\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}}{\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}} \end{aligned}$$
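The vectorized form says that $w$ is a ratio of two inner products of the centered vectors. A rough sketch of that idea follows, using a small `dot` helper over plain Python lists so the snippet has no dependencies; the names `dot` and `fit_line_centered` and the toy data are assumptions of this example.

```python
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def fit_line_centered(xs: Sequence[float],
                      ys: Sequence[float]) -> Tuple[float, float]:
    """w = (x_d^T y_d) / (x_d^T x_d), then b = y_bar - w * x_bar."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    x_d = [x - x_bar for x in xs]  # centered inputs
    y_d = [y - y_bar for y in ys]  # centered targets
    w = dot(x_d, y_d) / dot(x_d, x_d)
    b = y_bar - w * x_bar
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(fit_line_centered(xs, ys))  # (2.0, 1.0)
```

With an array library, each call to `dot` would typically become a single vectorized operation, which is the practical motivation for rewriting the sums as $\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}$ and $\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}$.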
