Linear Models
Univariate Linear Regression
- Basic form
$$f(\boldsymbol{x})=w_{1} x_{1}+w_{2} x_{2}+\ldots+w_{d} x_{d}+b$$
Vector form
$$f(\boldsymbol{x})=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$$
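To make the vector form concrete, here is a minimal sketch in plain Python. The function name `predict` and the numeric values are assumptions chosen only for illustration; they do not come from the notes.

```python
def predict(w, x, b):
    """Return f(x) = w^T x + b for a weight list w, a feature list x and a scalar bias b."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# Illustrative values with d = 3 features.
w = [0.5, -1.0, 2.0]
x = [4.0, 3.0, 1.0]
b = 0.25
print(predict(w, x, b))  # 0.5*4 - 1*3 + 2*1 + 0.25 = 1.25
```

With a single feature (d = 1) this reduces to f(x) = wx + b, the univariate case treated in the rest of this section.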
Objective: minimize the mean squared error
$$\begin{aligned}\left(w^{*}, b^{*}\right) &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(f\left(x_{i}\right)-y_{i}\right)^{2} \\ &=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2} \end{aligned}$$
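The objective can be evaluated directly for any candidate pair (w, b). The sketch below assumes scalar inputs; the name `squared_error` and the toy data are hypothetical and chosen so the arithmetic is easy to follow.

```python
def squared_error(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w * x_i - b)^2 for scalar inputs x_i."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1, so (w, b) = (2, 1) fits exactly

print(squared_error(2.0, 1.0, xs, ys))  # 0.0
print(squared_error(1.0, 0.0, xs, ys))  # residuals 2, 3, 4, 5 -> 4 + 9 + 16 + 25 = 54.0
```

Minimizing this sum over (w, b) is what the arg min above expresses.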
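Method: least-squares "parameter estimation" of the linear regression model. Writing $E_{(w, b)}=\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}$ and differentiating it with respect to $w$ and $b$ gives: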
$$\begin{aligned}
\frac{\partial E_{(w, b)}}{\partial w} &=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\
&=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\
&=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)
\end{aligned}$$
$$\frac{\partial E_{(w, b)}}{\partial b}=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)$$
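One way to gain confidence in these two partial derivatives is to compare them with central finite differences of $E_{(w,b)}$. The following is a sketch only: the helper names, the step size `h`, the toy data and the evaluation point are all assumptions made for illustration.

```python
def squared_error(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w * x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad_w(w, b, xs, ys):
    # dE/dw = 2 * (w * sum(x_i^2) - sum((y_i - b) * x_i))
    return 2.0 * (w * sum(x * x for x in xs) - sum((y - b) * x for x, y in zip(xs, ys)))


def grad_b(w, b, xs, ys):
    # dE/db = 2 * (m * b - sum(y_i - w * x_i))
    m = len(xs)
    return 2.0 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))


def central_difference(f, t, h=1e-5):
    """Numerical derivative of a one-argument function f at t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.9, 5.2, 6.8, 9.1]
w0, b0 = 0.5, -1.0  # arbitrary point at which to compare the derivatives

num_w = central_difference(lambda w: squared_error(w, b0, xs, ys), w0)
num_b = central_difference(lambda b: squared_error(w0, b, xs, ys), b0)
print(grad_w(w0, b0, xs, ys), num_w)  # the two values should agree closely
print(grad_b(w0, b0, xs, ys), num_b)  # likewise
```

Because $E_{(w,b)}$ is quadratic in $w$ and $b$, the central difference matches the analytic derivative up to floating-point rounding.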
Here $E_{(w,b)}$ is a convex function of $w$ and $b$, so setting its derivatives with respect to $w$ and $b$ to zero yields the optimal $w$ and $b$.
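Determining convexity:
Let $f(x,y)$ have continuous second-order partial derivatives on a region $D$, and write $A = f''_{xx}(x,y)$, $B = f''_{xy}(x,y)$, $C = f''_{yy}(x,y)$. Then:
(1) if $A>0$ and $AC-B^{2} \geq 0$ everywhere on $D$, then $f(x,y)$ is convex on $D$;
(2) if $A<0$ and $AC-B^{2} \geq 0$ everywhere on $D$, then $f(x,y)$ is concave on $D$.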
$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial w}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial w}\left[2 w \sum_{i=1}^{m} x_{i}^{2}\right] \\ &=2 \sum_{i=1}^{m} x_{i}^{2} \end{aligned}$$
$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w \partial b} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial b}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial b}\left[-2 \sum_{i=1}^{m} y_{i} x_{i}+2 \sum_{i=1}^{m} b x_{i}\right] \\ &=\frac{\partial}{\partial b}\left(2 \sum_{i=1}^{m} b x_{i}\right) \\ &=2 \sum_{i=1}^{m} x_{i} \end{aligned}$$
$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}\right] \\ &=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\ &=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\ &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right) \end{aligned}$$
$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial b^{2}} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial b}\right) \\ &=\frac{\partial}{\partial b}\left[2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)\right] \\ &=\frac{\partial}{\partial b}(2 m b) \\ &=2 m \end{aligned}$$
That is:
$$A=2 \sum_{i=1}^{m} x_{i}^{2} \qquad B=2 \sum_{i=1}^{m} x_{i} \qquad C=2 m$$
$$\begin{aligned} A C-B^{2} &=2 m \cdot 2 \sum_{i=1}^{m} x_{i}^{2}-\left(2 \sum_{i=1}^{m} x_{i}\right)^{2} \\ &=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}\right) \end{aligned}$$
Also:
$$\sum_{i=1}^{m} x_{i} \overline{x}=\overline{x} \sum_{i=1}^{m} x_{i}=\overline{x} \cdot m \cdot \frac{1}{m} \cdot \sum_{i=1}^{m} x_{i}=m \overline{x}^{2}=\sum_{i=1}^{m} \overline{x}^{2}$$
So the expression above becomes
$$\begin{aligned} A C-B^{2} &=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+x_{i} \overline{x}\right) \\ &=4 m \sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+\overline{x}^{2}\right) \\ &=4 m \sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2} \geq 0 \end{aligned}$$
Since $A=2 \sum_{i=1}^{m} x_{i}^{2}>0$ whenever some $x_{i} \neq 0$, condition (1) holds and $E_{(w,b)}$ is convex.
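As a numerical sanity check of the identity $AC-B^{2}=4m\sum_{i=1}^{m}(x_{i}-\overline{x})^{2}$, the sketch below computes both sides for a small sample. The helper name `hessian_terms` and the data are assumptions chosen for illustration.

```python
def hessian_terms(xs):
    """Return (A, B, C): the second partial derivatives of E(w, b) for inputs xs."""
    m = len(xs)
    A = 2.0 * sum(x * x for x in xs)  # d^2 E / dw^2
    B = 2.0 * sum(xs)                 # d^2 E / (dw db)
    C = 2.0 * m                       # d^2 E / db^2
    return A, B, C


xs = [1.0, 2.0, 3.0, 4.0]
m = len(xs)
x_bar = sum(xs) / m

A, B, C = hessian_terms(xs)
lhs = A * C - B * B
rhs = 4.0 * m * sum((x - x_bar) ** 2 for x in xs)

print(A, B, C)   # 60.0 20.0 8.0
print(lhs, rhs)  # 80.0 80.0
```

Note that none of A, B, C depends on $w$, $b$ or the targets $y_i$, so the convexity conclusion holds at every point $(w, b)$.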
$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial b} &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)=0 \\ m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) &=0 \\ b &=\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right) \\ &=\overline{y}-w \overline{x} \end{aligned}$$
Substituting $b=\overline{y}-w \overline{x}$ into $w \sum_{i=1}^{m} x_{i}^{2}=\sum_{i=1}^{m} y_{i} x_{i}-\sum_{i=1}^{m} b x_{i}$ (the condition $\partial E_{(w, b)} / \partial w=0$) gives
$$w \sum_{i=1}^{m} x_{i}^{2}-w \overline{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} y_{i} x_{i}-\overline{y} \sum_{i=1}^{m} x_{i}$$
$$w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\overline{y} \sum_{i=1}^{m} x_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\overline{x} \sum_{i=1}^{m} x_{i}}$$
We also have
$$\overline{y} \sum_{i=1}^{m} x_{i}=\frac{1}{m} \sum_{i=1}^{m} y_{i} \sum_{i=1}^{m} x_{i}=\overline{x} \sum_{i=1}^{m} y_{i}$$
$$\overline{x} \sum_{i=1}^{m} x_{i}=\frac{1}{m} \sum_{i=1}^{m} x_{i} \sum_{i=1}^{m} x_{i}=\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}$$
Hence
$$w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\overline{x} \sum_{i=1}^{m} y_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}$$
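The closed-form expressions for $w$ and $b$ translate almost line by line into code. This is a minimal sketch assuming scalar inputs; the function name `fit_line` and the guard against identical inputs are illustrative additions, not part of the derivation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b using the closed-form solution."""
    m = len(xs)
    if m == 0 or m != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    x_bar = sum_x / m
    y_bar = sum_y / m

    # w = (sum(y_i x_i) - x_bar * sum(y_i)) / (sum(x_i^2) - (sum(x_i))^2 / m)
    numerator = sum(x * y for x, y in zip(xs, ys)) - x_bar * sum_y
    denominator = sum(x * x for x in xs) - sum_x ** 2 / m
    if denominator == 0:
        # All x_i identical: the slope is not uniquely determined.
        raise ValueError("all x values are identical")

    w = numerator / denominator
    b = y_bar - w * x_bar  # b = y_bar - w * x_bar
    return w, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)  # 2.0 1.0
```

The denominator equals $\sum_{i=1}^{m}(x_i-\overline{x})^2$, so it vanishes exactly when all inputs coincide, which is why that case is rejected.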
Extrema of convex (concave) functions of two variables:
Let $f(x,y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$, and let $\left(x_{0}, y_{0}\right) \in D$ satisfy $f_{x}^{\prime}\left(x_{0}, y_{0}\right)=0$ and $f_{y}^{\prime}\left(x_{0}, y_{0}\right)=0$. Then $f\left(x_{0}, y_{0}\right)$ is the minimum (respectively maximum) of $f(x,y)$ on $D$.
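For the squared error, this statement can be illustrated numerically: at the stationary point given by the closed-form solution, moving $(w, b)$ in any direction should not decrease the error. The sketch below is only an illustration on assumed toy data with an assumed grid of perturbations, not a proof.

```python
def squared_error(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w * x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.9, 5.2, 6.8, 9.1]
m = len(xs)
x_bar = sum(xs) / m
y_bar = sum(ys) / m

# Stationary point from the closed-form solution derived above.
w_star = (sum(x * y for x, y in zip(xs, ys)) - x_bar * sum(ys)) / (
    sum(x * x for x in xs) - sum(xs) ** 2 / m
)
b_star = y_bar - w_star * x_bar
e_star = squared_error(w_star, b_star, xs, ys)

# Evaluate E on a small grid of perturbations around (w_star, b_star).
steps = [-0.1, -0.05, 0.05, 0.1]
perturbed = [
    squared_error(w_star + dw, b_star + db, xs, ys)
    for dw in steps
    for db in steps
]

print(e_star, min(perturbed))
print(all(e >= e_star for e in perturbed))  # True
```

Every perturbed value is at least as large as the value at the stationary point, as the convexity argument predicts.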
Solving for $b$:
$$b=\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)$$
Solving for $w$:
$$w=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\overline{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}$$
Vectorizing $w$:
Substituting
$$\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}=\overline{x} \sum_{i=1}^{m} x_{i}=\sum_{i=1}^{m} x_{i} \overline{x}$$
into the denominator gives
$$\begin{aligned} w &=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\overline{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m} x_{i} \overline{x}} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}\right)} \end{aligned}$$
Similarly, rewrite the numerator and the denominator, using $\sum_{i=1}^{m} y_{i} \overline{x}=\sum_{i=1}^{m} x_{i} \overline{y}=\sum_{i=1}^{m} \overline{x}\, \overline{y}=m \overline{x}\, \overline{y}$:
$$\begin{aligned} w &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}-y_{i} \overline{x}+y_{i} \overline{x}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+x_{i} \overline{x}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(y_{i} x_{i}-y_{i} \overline{x}-x_{i} \overline{y}+\overline{x}\, \overline{y}\right)}{\sum_{i=1}^{m}\left(x_{i}^{2}-x_{i} \overline{x}-x_{i} \overline{x}+\overline{x}^{2}\right)} \\ &=\frac{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2}} \end{aligned}$$
Let
$$\begin{array}{c}{\boldsymbol{x}=\left(x_{1}, x_{2}, \ldots, x_{m}\right)^{T} \quad \boldsymbol{y}=\left(y_{1}, y_{2}, \ldots, y_{m}\right)^{T}} \\ {\boldsymbol{x}_{d}=\left(x_{1}-\overline{x}, x_{2}-\overline{x}, \ldots, x_{m}-\overline{x}\right)^{T} \quad \boldsymbol{y}_{d}=\left(y_{1}-\overline{y}, y_{2}-\overline{y}, \ldots, y_{m}-\overline{y}\right)^{T}}\end{array}$$
$$\begin{aligned} w &=\frac{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sum_{i=1}^{m}\left(x_{i}-\overline{x}\right)^{2}} \\ &=\frac{\boldsymbol{x}_{d}^{T} \boldsymbol{y}_{d}}{\boldsymbol{x}_{d}^{T} \boldsymbol{x}_{d}} \end{aligned}$$
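The vectorized form says that $w$ is a ratio of two dot products of centered vectors. The sketch below uses plain Python lists so that it runs without extra packages; the helper names `dot`, `slope_centered` and `slope_sums` are hypothetical. With NumPy arrays, the same ratio could be written as `x_d @ y_d / (x_d @ x_d)`.

```python
def dot(u, v):
    """Dot product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def slope_centered(xs, ys):
    """w = (x_d^T y_d) / (x_d^T x_d), where x_d and y_d are the centered vectors."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    x_d = [x - x_bar for x in xs]
    y_d = [y - y_bar for y in ys]
    return dot(x_d, y_d) / dot(x_d, x_d)


def slope_sums(xs, ys):
    """The un-centered formula: sum(y_i (x_i - x_bar)) / (sum(x_i^2) - (sum x_i)^2 / m)."""
    m = len(xs)
    x_bar = sum(xs) / m
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - sum(xs) ** 2 / m
    return numerator / denominator


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
print(slope_centered(xs, ys))  # 2.0
print(slope_sums(xs, ys))      # 2.0
```

Both functions compute the same quantity. The centered form maps directly onto vector operations, which is the point of the rewrite.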