Linear Models: Linear Regression

The basic form of linear regression

Given an example described by $d$ attributes, $\boldsymbol{x}=(x_1, x_2, \ldots, x_d)$, where $x_i$ is the value of $\boldsymbol{x}$ on the $i$-th attribute, a linear model tries to learn a function that makes predictions through a linear combination of the attributes:

$$f(\boldsymbol{x})=w_1 x_1+w_2 x_2+\ldots+w_d x_d+b$$
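As a quick numeric illustration (a minimal sketch; the weights, bias, and sample values below are made up), the prediction is just a dot product plus a bias:

```python
import numpy as np

# Hypothetical example with d = 3 attributes.
w = np.array([0.5, -1.2, 2.0])  # weights w_1, ..., w_d (made up)
b = 0.3                         # bias (made up)
x = np.array([1.0, 0.5, 2.0])   # one sample x = (x_1, ..., x_d)

f_x = np.dot(w, x) + b          # f(x) = w_1*x_1 + ... + w_d*x_d + b
print(f_x)                      # 4.2
```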
This is the same linear regression found in probability and statistics, and it is also frequently used when fitting data from physics experiments.

Simple (univariate) linear regression is the case where there is only one input attribute.
A discrete attribute whose values have an order relation, i.e., can be compared, can be converted into continuous values: for example, "good-looking" becomes 1 and "not good-looking" becomes 0. An attribute without an order relation that takes $k$ possible values can be converted into a $k$-dimensional one-hot vector, with a 1 in the position corresponding to the observed value and 0 elsewhere, as sketched below.
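A small sketch of both conversions (the attribute values are hypothetical; plain numpy, no ML library assumed):

```python
import numpy as np

# Ordinal attribute: an order exists, so map values to numbers directly,
# e.g. "good-looking" -> 1.0, "not good-looking" -> 0.0 as in the text.
ordinal_map = {"good-looking": 1.0, "not good-looking": 0.0}
print(ordinal_map["good-looking"])   # 1.0

# Non-ordinal attribute with k possible values: one-hot encode it as a
# k-dimensional vector with a 1 at the position of the observed value.
categories = ["red", "green", "blue"]   # k = 3, hypothetical values
observed = "green"
one_hot = np.zeros(len(categories))
one_hot[categories.index(observed)] = 1.0
print(one_hot)                          # [0. 1. 0.]
```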

Simple linear regression
Least squares estimation

Linear regression tries to learn $f(x_i)=wx_i+b$; the task is to determine $w$ and $b$.
Solving for the model by minimizing the mean squared error is called the "least squares method":

$$\begin{aligned} E_{(w, b)} &=\sum_{i=1}^{m}\left(y_{i}-f\left(x_{i}\right)\right)^{2} \\ &=\sum_{i=1}^{m}\left(y_{i}-\left(w x_{i}+b\right)\right)^{2} \\ &=\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2} \end{aligned}$$

What we ultimately seek are the values of $w$ and $b$ that minimize this expression.

The least squares method tries to find the line that minimizes the sum of squared distances from all samples to the line, the distances being measured vertically, i.e., the residuals $y_i - f(x_i)$.

Maximum likelihood estimation

Maximum likelihood estimation is used to estimate the parameter values of a probability distribution.
The method: for a discrete (or continuous) random variable $X$, suppose its probability mass function is $P(x;\theta)$ (or its probability density function is $p(x;\theta)$), where $\theta$ is the parameter to be estimated (there may be several). Given $x_1, x_2, x_3, \ldots, x_n$, $n$ independent and identically distributed samples from $X$, their joint probability is

$$L(\theta)=\prod_{i=1}^{n} P\left(x_{i} ; \theta\right)$$

The $\theta$ that maximizes $L(\theta)$ is the desired parameter estimate.
Since $L(\theta)$ is a product of many factors, we can take the logarithm to turn it into a sum, then differentiate to find the maximizer, as in the sketch below.
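As a small numerical sketch (assuming, for illustration, $X \sim N(\theta, 1)$ with known variance), maximizing the log-likelihood over a grid of candidate $\theta$ values recovers the sample mean, which is the analytical MLE in this case:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=1000)  # true theta = 2.0

def log_likelihood(theta, x):
    # ln L(theta) = sum_i ln p(x_i; theta) for X ~ N(theta, 1)
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

thetas = np.linspace(0.0, 4.0, 4001)
lls = np.array([log_likelihood(t, samples) for t in thetas])
theta_hat = thetas[np.argmax(lls)]
print(theta_hat, samples.mean())  # both close to 2.0
```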
Deriving the cost function
For the linear regression model, introduce an error term: $y=wx+b+\epsilon$,
where $\epsilon$ is random error beyond our control, usually assumed to follow a normal distribution with zero mean, $\epsilon \sim N(0, \sigma^{2})$ (an assumption due to Gauss; it can also be motivated by the central limit theorem). The probability density of $\epsilon$ is therefore

$$p(\epsilon)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\epsilon^{2}}{2 \sigma^{2}}\right)$$

Replacing $\epsilon$ with the equivalent expression $y-(wx+b)$ gives

$$p(y)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(y-(w x+b))^{2}}{2 \sigma^{2}}\right)$$

This clearly means $y \sim N(wx+b, \sigma^{2})$, so maximum likelihood estimation can now be used to estimate the values of $w$ and $b$. The likelihood function is
$$\begin{aligned} L(w, b) &= \prod_{i=1}^{m} p\left(y_{i}\right)=\prod_{i=1}^{m} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y_{i}-\left(w x_{i}+b\right)\right)^{2}}{2 \sigma^{2}}\right) \\ \ln L(w, b) &=\sum_{i=1}^{m} \ln \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y_{i}-w x_{i}-b\right)^{2}}{2 \sigma^{2}}\right) \\ &=\sum_{i=1}^{m} \ln \frac{1}{\sqrt{2 \pi} \sigma}+\sum_{i=1}^{m} \ln \exp \left(-\frac{\left(y_{i}-w x_{i}-b\right)^{2}}{2 \sigma^{2}}\right) \end{aligned}$$
The first sum is a constant in $(w, b)$ and the second equals $-\frac{1}{2\sigma^{2}} \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}$, so maximizing $\ln L$ amounts to minimizing the sum of squared errors. Finally,

$$\left(w^{*}, b^{*}\right)=\underset{(w, b)}{\arg \max } \ln L(w, b)=\underset{(w, b)}{\arg \min } \sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}$$

Maximum likelihood and least squares thus reach the same destination by different routes.
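This equivalence is easy to check numerically. A sketch on synthetic data (using `scipy.optimize.minimize` as a generic optimizer; the data-generating line and the noise level $\sigma = 2$ are made up): minimizing the negative log-likelihood and minimizing the sum of squared errors return the same $(w, b)$ up to solver tolerance.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + 1.5 + rng.normal(0.0, 2.0, size=200)  # true w = 3, b = 1.5

def neg_log_likelihood(params, sigma=2.0):
    w, b = params
    r = y - (w * x + b)
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + r**2 / (2 * sigma**2))

def sum_squared_errors(params):
    w, b = params
    return np.sum((y - (w * x + b)) ** 2)

wb_mle = minimize(neg_log_likelihood, x0=[0.0, 0.0]).x
wb_lsq = minimize(sum_squared_errors, x0=[0.0, 0.0]).x
print(wb_mle, wb_lsq)  # the two argmins agree
```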

Solving for w and b
  • Convex sets and convex functions

Convex set: let $D \subset \mathbb{R}^{n}$ be a set. If for any $\boldsymbol{x}, \boldsymbol{y} \in D$ and any $\alpha \in[0,1]$ we have

$$\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y} \in D$$

then $D$ is called a convex set. Geometrically: if two points belong to the set, then every point on the line segment between them also belongs to the set (a figure would help here). Common convex sets include the empty set $\varnothing$ and the $n$-dimensional Euclidean space $\mathbb{R}^{n}$.

Convex function: let $D$ be a nonempty convex set and $f$ a function defined on $D$. If for any $\boldsymbol{x}^{1}, \boldsymbol{x}^{2} \in D$ and $\alpha \in(0,1)$ we have

$$f\left(\alpha \boldsymbol{x}^{1}+(1-\alpha) \boldsymbol{x}^{2}\right) \leqslant \alpha f\left(\boldsymbol{x}^{1}\right)+(1-\alpha) f\left(\boldsymbol{x}^{2}\right)$$

then $f$ is a convex function (checked numerically in the sketch below).
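As a quick numerical check of the inequality (a sketch using $f(x)=x^{2}$, a standard convex function, at randomly drawn points):

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return x**2  # a known convex function

# The convexity inequality f(a*x1 + (1-a)*x2) <= a*f(x1) + (1-a)*f(x2)
# should hold for any points x1, x2 and any a in (0, 1).
for _ in range(5):
    x1, x2 = rng.normal(size=2)
    a = rng.uniform(0.0, 1.0)
    lhs = f(a * x1 + (1 - a) * x2)
    rhs = a * f(x1) + (1 - a) * f(x2)
    print(lhs <= rhs)  # True every time
```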

  • Gradient and Hessian matrix
    The vector of a function's first-order partial derivatives with respect to its variables is the gradient.
    The matrix of its second-order partial derivatives is the Hessian matrix:

$$\nabla^{2} f(\boldsymbol{x})=\left[\begin{array}{cccc}\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1}^{2}} & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1} \partial x_{2}} & \cdots & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1} \partial x_{n}} \\ \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2} \partial x_{1}} & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2}^{2}} & \cdots & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2} \partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n} \partial x_{1}} & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n} \partial x_{2}} & \cdots & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n}^{2}}\end{array}\right]$$
    Theorem: let $D \subset \mathbb{R}^{n}$ be a nonempty open convex set and let $f: D \subset \mathbb{R}^{n} \rightarrow \mathbb{R}$ be twice continuously differentiable on $D$. If the Hessian matrix of $f(\boldsymbol{x})$ is positive semidefinite on $D$, then $f(\boldsymbol{x})$ is a convex function on $D$. (Compare the second-derivative test for convexity of a one-variable function.)
    Therefore it suffices to show that the Hessian is positive semidefinite; then $E_{(w,b)}$ is a convex function of $w$ and $b$.
$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial w} &=\frac{\partial}{\partial w}\left[\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}\right] \\ &=\sum_{i=1}^{m} \frac{\partial}{\partial w}\left(y_{i}-w x_{i}-b\right)^{2} \\ &=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot\left(-x_{i}\right) \\ &=2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right) \end{aligned}$$

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial w}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial w}\left(2 w \sum_{i=1}^{m} x_{i}^{2}\right) \\ &=2 \sum_{i=1}^{m} x_{i}^{2} \end{aligned}$$

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial w \partial b} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial w}\right) \\ &=\frac{\partial}{\partial b}\left[2\left(w \sum_{i=1}^{m} x_{i}^{2}-\sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right)\right] \\ &=\frac{\partial}{\partial b}\left[-2 \sum_{i=1}^{m}\left(y_{i}-b\right) x_{i}\right] \\ &=\frac{\partial}{\partial b}\left(-2 \sum_{i=1}^{m} y_{i} x_{i}+2 \sum_{i=1}^{m} b x_{i}\right) \\ &=2 \sum_{i=1}^{m} x_{i} \end{aligned}$$

$$\begin{aligned} \frac{\partial E_{(w, b)}}{\partial b} &=\frac{\partial}{\partial b}\left[\sum_{i=1}^{m}\left(y_{i}-w x_{i}-b\right)^{2}\right] \\ &=\sum_{i=1}^{m} \frac{\partial}{\partial b}\left(y_{i}-w x_{i}-b\right)^{2} \\ &=\sum_{i=1}^{m} 2 \cdot\left(y_{i}-w x_{i}-b\right) \cdot(-1) \\ &=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right) \end{aligned}$$

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial b \partial w} &=\frac{\partial}{\partial w}\left(\frac{\partial E_{(w, b)}}{\partial b}\right) \\ &=\frac{\partial}{\partial w}\left[2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)\right] \\ &=\frac{\partial}{\partial w}\left(2 \sum_{i=1}^{m} w x_{i}\right) \\ &=2 \sum_{i=1}^{m} x_{i} \end{aligned}$$

$$\begin{aligned} \frac{\partial^{2} E_{(w, b)}}{\partial b^{2}} &=\frac{\partial}{\partial b}\left(\frac{\partial E_{(w, b)}}{\partial b}\right) \\ &=\frac{\partial}{\partial b}\left[2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)\right] \\ &=2 m \end{aligned}$$

$$\nabla^{2} E_{(w, b)}=\left[\begin{array}{cc}\frac{\partial^{2} E_{(w, b)}}{\partial w^{2}} & \frac{\partial^{2} E_{(w, b)}}{\partial w \partial b} \\ \frac{\partial^{2} E_{(w, b)}}{\partial b \partial w} & \frac{\partial^{2} E_{(w, b)}}{\partial b^{2}}\end{array}\right]=\left[\begin{array}{cc}2 \sum_{i=1}^{m} x_{i}^{2} & 2 \sum_{i=1}^{m} x_{i} \\ 2 \sum_{i=1}^{m} x_{i} & 2 m\end{array}\right]$$
    This matrix is indeed positive semidefinite: its diagonal entries $2 \sum_{i=1}^{m} x_{i}^{2}$ and $2m$ are nonnegative, and its determinant $4 m \sum_{i=1}^{m} x_{i}^{2}-4\left(\sum_{i=1}^{m} x_{i}\right)^{2}=4 m \sum_{i=1}^{m}\left(x_{i}-\bar{x}\right)^{2} \geqslant 0$; i.e., every principal minor is $\geqslant 0$ (verified numerically below).
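A numerical sanity check of this claim (a sketch; the sample values $x_i$ are arbitrary): the Hessian depends only on the $x_i$, and its eigenvalues come out nonnegative.

```python
import numpy as np

x = np.array([1.0, 2.5, 4.0, 0.5, 3.0])  # arbitrary inputs, m = 5
m = len(x)

# Hessian of E(w, b) derived above; note it involves neither y, w, nor b.
H = np.array([[2.0 * np.sum(x**2), 2.0 * np.sum(x)],
              [2.0 * np.sum(x),    2.0 * m        ]])

print(np.linalg.eigvalsh(H))  # both eigenvalues >= 0 => positive semidefinite
print(np.linalg.det(H))       # equals 4*m*sum((x - x.mean())**2) >= 0
```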

By the sufficiency theorem for convex functions, the $w$ and $b$ at which the partial derivatives equal zero are exactly the minimizers we seek.
Therefore:
$$\frac{\partial E_{(w, b)}}{\partial b}=2\left(m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)\right)=0$$

$$m b-\sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)=0$$

$$b=\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-w x_{i}\right)$$

$$b=\frac{1}{m} \sum_{i=1}^{m} y_{i}-w \cdot \frac{1}{m} \sum_{i=1}^{m} x_{i}=\bar{y}-w \bar{x}$$

Substituting $b=\bar{y}-w \bar{x}$ into $\frac{\partial E_{(w, b)}}{\partial w}=0$ and solving for $w$ gives

$$w=\frac{\sum_{i=1}^{m} y_{i} x_{i}-\bar{x} \sum_{i=1}^{m} y_{i}}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}=\frac{\sum_{i=1}^{m} y_{i}\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{m} x_{i}^{2}-\frac{1}{m}\left(\sum_{i=1}^{m} x_{i}\right)^{2}}$$
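A sketch of these closed-form expressions on synthetic data (the data-generating line is made up; `numpy.polyfit`, which fits the same least-squares line, is used as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 5.0, size=100)
y = 2.0 * x + 0.7 + rng.normal(0.0, 0.5, size=100)  # true w = 2.0, b = 0.7

m = len(x)
x_bar = x.mean()

# w = sum_i y_i*(x_i - x_bar) / (sum_i x_i^2 - (sum_i x_i)^2 / m)
w = np.sum(y * (x - x_bar)) / (np.sum(x**2) - np.sum(x)**2 / m)
# b = y_bar - w * x_bar
b = y.mean() - w * x_bar

print(w, b)
print(np.polyfit(x, y, deg=1))  # [slope, intercept]; should agree
```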

The three elements of machine learning
  • Model: based on the concrete problem, determine the hypothesis space.
  • Strategy: based on an evaluation criterion, determine how to pick the best model from that space (this usually produces a "loss function").
  • Algorithm: solve the loss-function minimization and obtain the final model.
Multivariate linear regression

When a sample is described by multiple attributes, we try to learn

$$f\left(\boldsymbol{x}_{i}\right)=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b$$

that is,

$$\begin{gathered} f\left(\boldsymbol{x}_{i}\right)=\left(\begin{array}{cccc} w_{1} & w_{2} & \cdots & w_{d} \end{array}\right)\left(\begin{array}{c} x_{i 1} \\ x_{i 2} \\ \vdots \\ x_{i d} \end{array}\right)+b \\ f\left(\boldsymbol{x}_{i}\right)=w_{1} x_{i 1}+w_{2} x_{i 2}+\ldots+w_{d} x_{i d}+b \end{gathered}$$

Absorb the bias into the weight vector by treating $b$ as an extra weight $w_{d+1}$ attached to a constant input of $1$, i.e., $b=w_{d+1} \cdot 1$. Then

$$f\left(\boldsymbol{x}_{i}\right)=w_{1} x_{i 1}+w_{2} x_{i 2}+\ldots+w_{d} x_{i d}+w_{d+1} \cdot 1$$

$$\begin{gathered} f\left(\boldsymbol{x}_{i}\right)=\left(\begin{array}{lllll} w_{1} & w_{2} & \cdots & w_{d} & w_{d+1} \end{array}\right)\left(\begin{array}{c} x_{i 1} \\ x_{i 2} \\ \vdots \\ x_{i d} \\ 1 \end{array}\right) \\ f\left(\hat{\boldsymbol{x}}_{i}\right)=\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{i} \end{gathered}$$
By least squares:

$$E_{\hat{\boldsymbol{w}}}=\sum_{i=1}^{m}\left(y_{i}-f\left(\hat{\boldsymbol{x}}_{i}\right)\right)^{2}=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{i}\right)^{2}$$
Vectorizing:

$$\begin{aligned} &E_{\hat{\boldsymbol{w}}}=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{i}\right)^{2}=\left(y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}\right)^{2}+\left(y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}\right)^{2}+\ldots+\left(y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}\right)^{2} \\ &E_{\hat{\boldsymbol{w}}}=\left(\begin{array}{cccc} y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1} & y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2} & \cdots & y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m} \end{array}\right)\left(\begin{array}{c} y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m} \end{array}\right) \end{aligned}$$
where

$$\left(\begin{array}{c} y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m} \end{array}\right)=\left(\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{array}\right)-\left(\begin{array}{c} \hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1} \\ \hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2} \\ \vdots \\ \hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m} \end{array}\right)=\left(\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{array}\right)-\left(\begin{array}{c} \hat{\boldsymbol{x}}_{1}^{\mathrm{T}} \hat{\boldsymbol{w}} \\ \hat{\boldsymbol{x}}_{2}^{\mathrm{T}} \hat{\boldsymbol{w}} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{\mathrm{T}} \hat{\boldsymbol{w}} \end{array}\right)$$

with

$$\boldsymbol{y}=\left(\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{array}\right), \qquad \left(\begin{array}{c} \hat{\boldsymbol{x}}_{1}^{\mathrm{T}} \hat{\boldsymbol{w}} \\ \hat{\boldsymbol{x}}_{2}^{\mathrm{T}} \hat{\boldsymbol{w}} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{\mathrm{T}} \hat{\boldsymbol{w}} \end{array}\right)=\left(\begin{array}{c} \hat{\boldsymbol{x}}_{1}^{\mathrm{T}} \\ \hat{\boldsymbol{x}}_{2}^{\mathrm{T}} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{\mathrm{T}} \end{array}\right) \cdot \hat{\boldsymbol{w}}=\left(\begin{array}{cc} \boldsymbol{x}_{1}^{\mathrm{T}} & 1 \\ \boldsymbol{x}_{2}^{\mathrm{T}} & 1 \\ \vdots & \vdots \\ \boldsymbol{x}_{m}^{\mathrm{T}} & 1 \end{array}\right) \cdot \hat{\boldsymbol{w}}=\mathbf{X} \cdot \hat{\boldsymbol{w}}$$
Therefore:

$$E_{\hat{\boldsymbol{w}}}=(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})^{\mathrm{T}}(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})$$
Next, take the partial derivative with respect to $\hat{\boldsymbol{w}}$; rules for differentiating with respect to a vector can be looked up in a matrix calculus handbook.
$$\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})^{\mathrm{T}}(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[\left(\boldsymbol{y}^{\mathrm{T}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}}\right)(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[\boldsymbol{y}^{\mathrm{T}} \boldsymbol{y}-\boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[-\boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}\right] \\ &=-\frac{\partial \boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}}{\partial \hat{\boldsymbol{w}}}-\frac{\partial \hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}}{\partial \hat{\boldsymbol{w}}}+\frac{\partial \hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}}{\partial \hat{\boldsymbol{w}}} \end{aligned}$$
By the matrix differentiation formulas $\frac{\partial \boldsymbol{x}^{\mathrm{T}} \boldsymbol{a}}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^{\mathrm{T}} \boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{a}$ and $\frac{\partial \boldsymbol{x}^{\mathrm{T}} \mathbf{A} \boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\mathbf{A}+\mathbf{A}^{\mathrm{T}}\right) \boldsymbol{x}$, we get

$$\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=-\mathbf{X}^{\mathrm{T}} \boldsymbol{y}-\mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}+\mathbf{X}^{\mathrm{T}} \mathbf{X}\right) \hat{\boldsymbol{w}} \\ &=2 \mathbf{X}^{\mathrm{T}}(\mathbf{X} \hat{\boldsymbol{w}}-\boldsymbol{y}) \end{aligned}$$
Setting it to zero:

$$\begin{aligned} &\frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}}=2 \mathbf{X}^{\mathrm{T}}(\mathbf{X} \hat{\boldsymbol{w}}-\boldsymbol{y})=0 \\ &2 \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-2 \mathbf{X}^{\mathrm{T}} \boldsymbol{y}=0 \\ &2 \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}=2 \mathbf{X}^{\mathrm{T}} \boldsymbol{y} \\ &\hat{\boldsymbol{w}}=\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathrm{T}} \boldsymbol{y} \end{aligned}$$
So the final model, assuming $\mathbf{X}^{\mathrm{T}} \mathbf{X}$ is invertible, is:

$$f\left(\hat{\boldsymbol{x}}_{i}\right)=\hat{\boldsymbol{x}}_{i}^{\mathrm{T}}\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}$$
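A sketch of the full multivariate pipeline on synthetic data (the weights and bias are made up): augment $\mathbf{X}$ with a column of ones so the bias becomes $w_{d+1}$, solve the normal equations, and predict. In practice `np.linalg.solve` (or `np.linalg.lstsq`) is preferable to explicitly inverting $\mathbf{X}^{\mathrm{T}} \mathbf{X}$ for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 200, 3
X_raw = rng.normal(size=(m, d))
w_true = np.array([1.0, -2.0, 0.5])                      # made-up weights
y = X_raw @ w_true + 4.0 + rng.normal(0.0, 0.1, size=m)  # true b = 4.0

# Augment each sample with a constant 1 so that b = w_{d+1}.
X = np.hstack([X_raw, np.ones((m, 1))])

# w_hat = (X^T X)^{-1} X^T y, computed via solve() rather than inv().
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # approximately [ 1.0, -2.0, 0.5, 4.0 ]

# Prediction for a new (augmented) sample: f(x_hat) = x_hat^T w_hat
x_new = np.array([0.5, 1.0, -1.0, 1.0])
print(x_new @ w_hat)
```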
