Algorithm description:
Linear Regression Algorithm |
---|
1. Input the feature matrix $X$ and the label vector $y$: $X = \begin{bmatrix} --x_1^T-- \\ --x_2^T-- \\ \vdots \\ --x_N^T-- \end{bmatrix}_{N \times (d+1)}$, $y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{N \times 1}$ <br> 2. Compute the pseudo-inverse $X^\dagger = (X^T X)^{-1} X^T$, of size $(d+1) \times N$ <br> 3. Return $\omega_{LIN} = X^\dagger y$, of size $(d+1) \times 1$ |
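The three steps above map almost line-for-line onto NumPy. Below is a minimal sketch, assuming NumPy; the function name and the bias-column handling are illustrative choices, not from the source:

```python
import numpy as np

def linear_regression(X_raw, y):
    """Analytic linear regression: return w_LIN = X† y.

    X_raw : (N, d) feature matrix, without the bias column
    y     : (N,)   label vector
    """
    N = X_raw.shape[0]
    X = np.hstack([np.ones((N, 1)), X_raw])  # N x (d+1): prepend x_0 = 1
    X_dag = np.linalg.pinv(X)                # (d+1) x N pseudo-inverse
    return X_dag @ y                         # (d+1,) weight vector w_LIN
```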
Derivation:
$$
\begin{aligned}
E_{in}(\omega) &= \frac{1}{N}\sum_{n=1}^{N}(\omega^T x_n - y_n)^2 = \frac{1}{N}\sum_{n=1}^{N}(x_n^T \omega - y_n)^2 \quad \text{(the inner product commutes)} \\
&= \frac{1}{N}\left\| \begin{matrix} x_1^T\omega - y_1 \\ x_2^T\omega - y_2 \\ \vdots \\ x_N^T\omega - y_N \end{matrix} \right\|^2
 = \frac{1}{N}\left\| \begin{bmatrix} --x_1^T-- \\ --x_2^T-- \\ \vdots \\ --x_N^T-- \end{bmatrix}\omega - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \right\|^2 \\
&= \frac{1}{N}\left\| \underbrace{X}_{N \times (d+1)}\,\underbrace{\omega}_{(d+1) \times 1} - \underbrace{y}_{N \times 1} \right\|^2
\end{aligned}
$$
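As a quick sanity check on this vectorization, the per-sample sum and the matrix-norm form should agree numerically. A small sketch with synthetic data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # N x (d+1)
y = rng.normal(size=N)
w = rng.normal(size=d + 1)                                 # arbitrary weights

E_sum    = np.mean([(X[n] @ w - y[n]) ** 2 for n in range(N)])  # (1/N) Σ (x_nᵀw − y_n)²
E_matrix = np.linalg.norm(X @ w - y) ** 2 / N                   # (1/N) ‖Xw − y‖²
assert np.isclose(E_sum, E_matrix)
```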
The error here is measured by the squared error, i.e.,
$$
err(\hat{y}, y) = (\hat{y} - y)^2
$$
For linear regression this becomes:
$$
\min_{\omega} E_{in}(\omega) = \frac{1}{N}\left\| X\omega - y \right\|^2
$$
When a function is continuous, differentiable, and convex, its minimum is attained where its gradient is zero:

$$
\begin{aligned}
E_{in}(\omega) &= \frac{1}{N}\left\| X\omega - y \right\|^2 = \frac{1}{N}\Bigl(\omega^T \underbrace{X^T X}_{A}\,\omega - 2\omega^T \underbrace{X^T y}_{b} + \underbrace{y^T y}_{c}\Bigr) \\
\nabla E_{in}(\omega) &= \frac{1}{N}(2A\omega - 2b) = \frac{2}{N}(X^T X \omega - X^T y) = 0 \\
\omega_{LIN} &= \underbrace{(X^T X)^{-1} X^T}_{X^\dagger}\, y = X^\dagger y
\end{aligned}
$$
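One can confirm the derivation numerically: at $\omega_{LIN}$ the gradient $\frac{2}{N}(X^T X\omega - X^T y)$ should vanish. A sketch, again with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 4
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
y = rng.normal(size=N)

w_lin = np.linalg.pinv(X) @ y                    # w_LIN = X† y
grad = (2.0 / N) * (X.T @ X @ w_lin - X.T @ y)   # ∇E_in(w_LIN)
assert np.allclose(grad, 0)                      # zero gradient at the minimizer
```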
The solution is written with the pseudo-inverse. In practice $X^T X$ is usually invertible, because $N \gg d+1$: the number of samples is typically far larger than the number of features. In the rare cases where $X^T X$ is singular, the pseudo-inverse is still well defined, which is why the solution is written as $\omega_{LIN} = X^\dagger y$.
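A constructed toy case illustrates the point: duplicating a feature column makes $X^T X$ exactly singular, so the plain inverse fails, while the pseudo-inverse still returns a least-squares solution (this example and its names are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
x1 = rng.normal(size=(N, 1))
X = np.hstack([np.ones((N, 1)), x1, x1])   # duplicated column: X^T X is singular
y = rng.normal(size=N)

try:
    w = np.linalg.inv(X.T @ X) @ X.T @ y   # raises LinAlgError: singular matrix
except np.linalg.LinAlgError:
    w = np.linalg.pinv(X) @ y              # pseudo-inverse is still well defined
print(np.linalg.norm(X @ w - y))           # residual of a valid least-squares fit
```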