First, the cost function $J(\theta)$ is defined as
$$
J(\theta)=\frac{1}{2m}\sum_{i=1}^m\left(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\cdots+\theta_nx_n^{(i)}-y^{(i)}\right)^2
$$
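The definition can be checked numerically with a minimal sketch like the one below. The function name `cost`, the list-of-rows layout, and the toy data are assumptions made for illustration; the data additionally assumes the usual convention $x_0^{(i)}=1$.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum over samples of (theta . x - y)^2."""
    m = len(X)
    total = 0.0
    for row, target in zip(X, y):
        # theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n for this sample
        prediction = sum(t * x for t, x in zip(theta, row))
        total += (prediction - target) ** 2
    return total / (2 * m)


# Hypothetical toy data generated by y = 1 + 2*x1, with x0 = 1 in every row.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

print(cost([1.0, 2.0], X, y))  # 0.0, since these parameters fit the data exactly
print(cost([0.0, 0.0], X, y))  # (1 + 9 + 25) / 6 = 35/6
```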
Taking the partial derivative with respect to an arbitrary parameter $\theta_j$ gives
$$
\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m x_j^{(i)}\left(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\cdots+\theta_nx_n^{(i)}-y^{(i)}\right)
$$
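One way to gain confidence in this derivative is to compare it with a central finite difference of $J(\theta)$. The sketch below does that; `gradient`, `numerical_gradient`, the step `eps`, and the toy data (with $x_0^{(i)}=1$) are illustrative assumptions.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    m = len(X)
    return sum((sum(t * x for t, x in zip(theta, row)) - target) ** 2
               for row, target in zip(X, y)) / (2 * m)


def gradient(theta, X, y):
    """Analytic partial derivatives: dJ/dtheta_j = 1/m * sum_i x_j^(i) * residual_i."""
    m = len(X)
    grad = [0.0] * len(theta)
    for row, target in zip(X, y):
        residual = sum(t * x for t, x in zip(theta, row)) - target
        for j in range(len(theta)):
            grad[j] += row[j] * residual
    return [g / m for g in grad]


def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite-difference approximation of the same partial derivatives."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad


# Hypothetical toy data: y = 1 + 2*x1, with x0 = 1 in every row.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
theta = [0.0, 0.0]

analytic = gradient(theta, X, y)            # exactly [-3, -13/3] for this data
numeric = numerical_gradient(theta, X, y)
print(analytic, numeric)
```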
At an extremum every partial derivative must equal zero, so
$$
\frac{1}{m}\sum_{i=1}^m x_j^{(i)}\left(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\cdots+\theta_nx_n^{(i)}-y^{(i)}\right)=0
$$
Expanding the sum (the constant factor $\frac{1}{m}$ can be dropped) and rewriting it as a matrix product gives
$$
\left[\begin{matrix}
x_j^{(1)}&x_j^{(2)}&\cdots&x_j^{(m)}
\end{matrix}\right]
\left[\begin{matrix}
\theta_0x_0^{(1)}+\theta_1x_1^{(1)}+\cdots+\theta_nx_n^{(1)}-y^{(1)}\\
\theta_0x_0^{(2)}+\theta_1x_1^{(2)}+\cdots+\theta_nx_n^{(2)}-y^{(2)}\\
\vdots \\
\theta_0x_0^{(m)}+\theta_1x_1^{(m)}+\cdots+\theta_nx_n^{(m)}-y^{(m)}
\end{matrix}\right]
=0
$$
The second factor, the column vector, can itself be expanded as
$$
\left[\begin{matrix}
\theta_0x_0^{(1)}+\theta_1x_1^{(1)}+\cdots+\theta_nx_n^{(1)}-y^{(1)}\\
\theta_0x_0^{(2)}+\theta_1x_1^{(2)}+\cdots+\theta_nx_n^{(2)}-y^{(2)}\\
\vdots \\
\theta_0x_0^{(m)}+\theta_1x_1^{(m)}+\cdots+\theta_nx_n^{(m)}-y^{(m)}
\end{matrix}\right]=
\left[\begin{matrix}
x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\
x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\
\vdots & \vdots & \ddots &\vdots\\
x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}
\end{matrix}\right]
\left[\begin{matrix}
\theta_0\\
\theta_1\\
\vdots \\
\theta_n
\end{matrix}\right]-
\left[\begin{matrix}
y^{(1)}\\
y^{(2)}\\
\vdots \\
y^{(m)}
\end{matrix}\right]
$$
So for every parameter $\theta_j$ we have
$$
\left[\begin{matrix}
x_j^{(1)}&x_j^{(2)}&\cdots&x_j^{(m)}
\end{matrix}\right]
\left(
\left[\begin{matrix}
x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\
x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\
\vdots & \vdots & \ddots &\vdots\\
x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}
\end{matrix}\right]
\left[\begin{matrix}
\theta_0\\
\theta_1\\
\vdots \\
\theta_n
\end{matrix}\right]-
\left[\begin{matrix}
y^{(1)}\\
y^{(2)}\\
\vdots \\
y^{(m)}
\end{matrix}\right]
\right)=0
$$
Stacking the equations for all parameters $\theta_0,\theta_1,\dots,\theta_n$ into one system gives
$$
\left[\begin{matrix}
x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\\
x_1^{(1)}&x_1^{(2)}&\cdots&x_1^{(m)}\\
\vdots & \vdots & \ddots &\vdots\\
x_n^{(1)}&x_n^{(2)}&\cdots&x_n^{(m)}
\end{matrix}\right]
\left(
\left[\begin{matrix}
x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\
x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\
\vdots & \vdots & \ddots &\vdots\\
x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}
\end{matrix}\right]
\left[\begin{matrix}
\theta_0\\
\theta_1\\
\vdots \\
\theta_n
\end{matrix}\right]-
\left[\begin{matrix}
y^{(1)}\\
y^{(2)}\\
\vdots \\
y^{(m)}
\end{matrix}\right]
\right)=0
$$
Let
$$
X=
\left[\begin{matrix}
x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\
x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\
\vdots & \vdots & \ddots &\vdots\\
x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}
\end{matrix}\right],\quad
\Theta=\left[\begin{matrix}
\theta_0\\
\theta_1\\
\vdots \\
\theta_n
\end{matrix}\right],\quad
Y=\left[\begin{matrix}
y^{(1)}\\
y^{(2)}\\
\vdots \\
y^{(m)}
\end{matrix}\right]
$$
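As a small illustration of these definitions, the sketch below assembles $X$ and $Y$ from raw samples. It assumes the common convention $x_0^{(i)}=1$ (a bias feature), which the derivation does not state explicitly; `build_design_matrix` and the toy data are hypothetical.

```python
def build_design_matrix(samples):
    """Return X with one row per sample, prepending the assumed bias feature x0 = 1."""
    return [[1.0] + [float(v) for v in features] for features in samples]


# Hypothetical raw data: m = 3 samples with n = 2 features each.
samples = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
Y = [10.0, 20.0, 30.0]

X = build_design_matrix(samples)
m, n_plus_1 = len(X), len(X[0])
print(m, n_plus_1)  # 3 3, i.e. X has m rows and n + 1 columns
```

With this layout, row $i$ of $X$ holds the features of sample $i$, and $\Theta$ has $n+1$ entries to match the columns.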
Then the system above can be written compactly as
$$
X^T(X\Theta-Y)=0
$$
Expanding and simplifying, assuming $X^TX$ is invertible:
$$
\begin{aligned}
X^TX\Theta-X^TY&=0\\
X^TX\Theta&=X^TY\\
\Theta&=(X^TX)^{-1}X^TY
\end{aligned}
$$
This yields the Normal Equation
$$
\Theta=(X^TX)^{-1}X^TY
$$
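The Normal Equation can be tried out with a short pure-Python sketch. Note one deliberate substitution: rather than forming $(X^TX)^{-1}$ explicitly, it solves the equivalent linear system $X^TX\Theta=X^TY$ by Gaussian elimination with partial pivoting. The helper names (`transpose`, `matmul`, `solve`, `normal_equation`), the singularity threshold, and the toy data are all assumptions for illustration; in practice a linear algebra library routine such as NumPy's `numpy.linalg.solve` would normally replace the hand-written `solve`.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # threshold is an arbitrary assumption
            raise ValueError("X^T X is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def normal_equation(X, Y):
    """Return Theta satisfying X^T X Theta = X^T Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * b for a, b in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)


# Hypothetical toy data generated by y = 1 + 2*x1, with x0 = 1 in every row.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 5.0]

theta = normal_equation(X, Y)
print(theta)  # recovers theta_0 = 1, theta_1 = 2 for this data
```

Solving the system directly gives the same $\Theta$ as the inverse formula whenever $X^TX$ is invertible, and it avoids computing an inverse that is never needed on its own.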
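Solving for the parameters directly with this formula avoids iteration, and it gives the exact minimizer (up to floating-point error) rather than an iterative approximation. The drawback is that inverting the matrix $X^TX$ takes $O(n^3)$ time, so with many parameters ($n\ge 10^5$) the approach is essentially impractical and gradient descent should be used instead.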
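For comparison, here is a minimal sketch of the batch gradient descent alternative. The learning rate `alpha`, the iteration count, the zero initialization, and the toy data are assumptions chosen for illustration, not tuned recommendations.

```python
def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Minimize J(theta) by repeatedly stepping against the gradient."""
    m = len(X)
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        # Residual theta . x - y for every sample under the current parameters.
        residuals = [sum(t * x for t, x in zip(theta, row)) - target
                     for row, target in zip(X, y)]
        # dJ/dtheta_j = 1/m * sum_i x_j^(i) * residual_i
        grad = [sum(row[j] * r for row, r in zip(X, residuals)) / m
                for j in range(len(theta))]
        # Update all parameters simultaneously.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data generated by y = 1 + 2*x1, with x0 = 1 in every row.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

theta = gradient_descent(X, y)
print(theta)  # approaches theta_0 = 1, theta_1 = 2
```

Each iteration here costs $O(mn)$ and never builds or inverts $X^TX$, which is why this route scales better when $n$ is large, at the price of having to choose `alpha` and a stopping rule.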