Suppose an individual $x$ has $d$ features, i.e. $x=(x_1,x_2,\dots,x_d)$, where $x_i$ is the $i$-th feature. A linear model tries to produce a prediction as a linear combination of the features:
$$f(x)=w_1x_1+w_2x_2+\dots+w_dx_d+b=w^Tx+b$$
For example, in a hypothetical house-price model with $x_1$ = floor area and $x_2$ = building age, a learned model might read $f(x)=0.9\,x_1-0.2\,x_2+3$: area pushes the price up, age pulls it down, and the weights make each feature's contribution explicit.
Once we can solve for $w$ and $b$, we have the linear model. So how do we find $w$ and $b$?
Suppose the training set contains $n$ individuals, i.e. $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, where $x_i$ is the $i$-th individual and $y_i$ is the true value associated with it.
I. $f(x)=wx+b$
Intuitively, there are two candidate ways to measure the gap between a prediction $f(x_i)$ and the true value $y_i$:
1) $|f(x_i)-y_i|$
2) $(f(x_i)-y_i)^2$
Option 2 is Gauss's least square method. Summing this squared difference between the predicted and true values over all individuals gives
$$g(w,b)=\sum_{i=1}^{n}(wx_i+b-y_i)^2$$
Setting the partial derivatives $\partial g/\partial w$ and $\partial g/\partial b$ to zero and solving the resulting pair of linear equations yields the closed form
$$w=\frac{\sum_{i=1}^{n}y_i(x_i-\bar{x})}{\sum_{i=1}^{n}x_i^2-\frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2},\qquad b=\frac{1}{n}\sum_{i=1}^{n}(y_i-wx_i),$$
where $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$.
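The closed form above can be checked directly. A minimal sketch (the data here is made up for illustration, roughly $y=2x+1$ plus noise):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

n = len(x)
x_bar = x.mean()

# w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - (sum(x_i))^2 / n)
w = np.sum(y * (x - x_bar)) / (np.sum(x**2) - np.sum(x)**2 / n)
# b = (1/n) * sum(y_i - w * x_i)
b = np.mean(y - w * x)

print(w, b)  # close to the true slope 2 and intercept 1
```

The result should agree with a library fit such as `np.polyfit(x, y, 1)`, since both minimize the same sum of squared errors.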
II. $f(x)=w^Tx+b$
Now we move from the simple case to the general one. For the multivariate regression model $f(x)=w^Tx+b$, we still want the difference between predictions and true values to be as small as possible, and we again measure that difference with least squares:
$$g(w,b)=(w^Tx_1+b-y_1)^2+(w^Tx_2+b-y_2)^2+\dots+(w^Tx_n+b-y_n)^2$$
$$g(w,b)=\left[(w^Tx_1+b-y_1),(w^Tx_2+b-y_2),\dots,(w^Tx_n+b-y_n)\right]\begin{bmatrix}w^Tx_1+b-y_1\\w^Tx_2+b-y_2\\\vdots\\w^Tx_n+b-y_n\end{bmatrix}$$
Derivation:
$$\begin{aligned}
\begin{bmatrix}w^Tx_1+b-y_1\\w^Tx_2+b-y_2\\\vdots\\w^Tx_n+b-y_n\end{bmatrix}
&=\begin{bmatrix}w^Tx_1+b\\w^Tx_2+b\\\vdots\\w^Tx_n+b\end{bmatrix}-\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}
=\begin{bmatrix}(x_1^T,1)\,\tilde{w}\\(x_2^T,1)\,\tilde{w}\\\vdots\\(x_n^T,1)\,\tilde{w}\end{bmatrix}-\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}\\
&=\begin{bmatrix}x_1^T&1\\x_2^T&1\\\vdots&\vdots\\x_n^T&1\end{bmatrix}\tilde{w}-\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}
=X\tilde{w}-Y
\end{aligned}$$
Note: $w^Tx_i+b=(x_i^T,1)\begin{bmatrix}w\\b\end{bmatrix}$. Here we define
$$X=\begin{bmatrix}x_1^T&1\\x_2^T&1\\\vdots&\vdots\\x_n^T&1\end{bmatrix},\qquad Y=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix},\qquad \tilde{w}=\begin{bmatrix}w\\b\end{bmatrix}.$$
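The vectorization above can be verified numerically. A quick sketch on made-up random data: stacking each $x_i^T$ with a trailing 1 into $X$, and $(w,b)$ into $\tilde{w}$, the residual vector $X\tilde{w}-Y$ matches $w^Tx_i+b-y_i$ entry by entry:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
xs = rng.normal(size=(n, d))          # rows are x_i^T
Y = rng.normal(size=n)
w = rng.normal(size=d)
b = 0.7

X = np.hstack([xs, np.ones((n, 1))])  # append the constant-1 column
w_tilde = np.append(w, b)             # stack w and b into one vector

residual = X @ w_tilde - Y
elementwise = np.array([w @ xs[i] + b - Y[i] for i in range(n)])

print(np.allclose(residual, elementwise))   # True
# g equals the residual's squared norm, i.e. the sum of squared errors:
print(np.isclose(residual @ residual, np.sum(elementwise**2)))  # True
```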
Therefore $g(\tilde{w})=(X\tilde{w}-Y)^T(X\tilde{w}-Y)$. As before, we want the $\tilde{w}$ that minimizes $g(\tilde{w})$, so we again solve by setting partial derivatives to zero:
Derivation:
$$\begin{aligned}
g(\tilde{w})&=(X\tilde{w}-Y)^T(X\tilde{w}-Y)=\left((X\tilde{w})^T-Y^T\right)(X\tilde{w}-Y)\\
&=(X\tilde{w})^TX\tilde{w}-Y^TX\tilde{w}-(X\tilde{w})^TY+Y^TY\\
&=\tilde{w}^TX^TX\tilde{w}-Y^TX\tilde{w}-\tilde{w}^TX^TY+Y^TY
\end{aligned}$$
The two middle terms are scalars and transposes of each other, hence equal. Using the matrix-calculus identities $\frac{\partial\,\tilde{w}^TA\tilde{w}}{\partial\tilde{w}}=2A\tilde{w}$ (for symmetric $A$, here $A=X^TX$) and $\frac{\partial\,a^T\tilde{w}}{\partial\tilde{w}}=a$, we get
$$\frac{\partial g(\tilde{w})}{\partial\tilde{w}}=2X^TX\tilde{w}-2X^TY=0\quad\Longrightarrow\quad X^TX\tilde{w}=X^TY.$$
Finally, when $X^TX$ is invertible (i.e. $X$ has full column rank), the result is $\tilde{w}=(X^TX)^{-1}X^TY$.
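The normal-equation solution can be sketched in a few lines of NumPy. The data below is synthetic (a known $w$ and $b$ plus small noise); note that solving $X^TX\tilde{w}=X^TY$ with a linear solver is preferable to forming the explicit inverse $(X^TX)^{-1}$, which is slower and numerically less stable:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
xs = rng.normal(size=(n, d))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 3.0
Y = xs @ true_w + true_b + 0.01 * rng.normal(size=n)

X = np.hstack([xs, np.ones((n, 1))])   # augment with the bias column

# Solve X^T X w_tilde = X^T Y (equivalent to w_tilde = (X^T X)^{-1} X^T Y)
w_tilde = np.linalg.solve(X.T @ X, X.T @ Y)

w, b = w_tilde[:-1], w_tilde[-1]
print(w, b)  # close to true_w = [2, -1, 0.5] and true_b = 3
```

The same fit is returned by `np.linalg.lstsq(X, Y)`, which handles the rank-deficient case (where $X^TX$ is singular) via the pseudo-inverse.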