The normal equation gives, in closed form, the parameters that minimize the cost function:

$$
\theta=(X^TX)^{-1}X^Ty
$$
There are two ways to derive it.
1. Matrix differentiation
The cost function is:
$$
\begin{aligned}
J(\theta)&=\frac{1}{2}\|X\theta-y\|^2\\
&=\frac{1}{2}(X\theta-y)^T(X\theta-y)\\
&=\frac{1}{2}(\theta^TX^T-y^T)(X\theta-y)\\
&=\frac{1}{2}(\theta^TX^TX\theta-\theta^TX^Ty-y^TX\theta+y^Ty)
\end{aligned}
$$
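As a quick sanity check on the expansion, the compact form $\frac{1}{2}(X\theta-y)^T(X\theta-y)$ and the four-term form can be compared numerically. This is only a sketch: the matrix sizes and random data are arbitrary assumptions, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # hypothetical design matrix: 5 samples, 3 features
y = rng.normal(size=5)        # hypothetical targets
theta = rng.normal(size=3)    # arbitrary parameter vector

# Compact form: half the squared length of the residual vector.
residual = X @ theta - y
compact = 0.5 * residual @ residual

# Four-term expansion. The two cross terms theta^T X^T y and y^T X theta
# are the same scalar, since one is the transpose of the other.
expanded = 0.5 * (theta @ X.T @ X @ theta
                  - theta @ X.T @ y
                  - y @ X @ theta
                  + y @ y)

print(np.isclose(compact, expanded))
```

Both cross terms are scalars, which is why they can later be differentiated separately and each contributes $X^Ty$.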
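The main matrix-derivative identities used are listed below. They apply to scalar-valued products, with the result shaped like the variable being differentiated; the last one assumes $A$ is symmetric:

$$
\begin{gathered}
\frac{\partial (AB)}{\partial B}=A^T\\
\frac{\partial (AB^T)}{\partial B}=A\\
\frac{\partial (X^TAX)}{\partial X}=2AX
\end{gathered}
$$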
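Identities like these are easy to misremember, so a finite-difference check can help. The sketch below tests two of them in vector form: the gradient of $a^Tx$ is $a$, and the gradient of $x^TAx$ is $2Ax$ when $A$ is symmetric. The helper `numerical_gradient` and all the data are hypothetical, introduced only for illustration.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = M + M.T                  # symmetric by construction
a = rng.normal(size=4)
x = rng.normal(size=4)

# Linear form: d(a^T x)/dx = a
grad_linear = numerical_gradient(lambda v: a @ v, x)
linear_ok = np.allclose(grad_linear, a, atol=1e-5)

# Quadratic form with symmetric A: d(x^T A x)/dx = 2 A x
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, x)
quadratic_ok = np.allclose(grad_quadratic, 2 * A @ x, atol=1e-5)

print(linear_ok, quadratic_ok)
```

In the derivation, $X^TX$ plays the role of the symmetric matrix $A$ and $\theta$ plays the role of $x$.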
Setting the derivative of $J(\theta)$ with respect to $\theta$ equal to 0 gives:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta}&=\frac{1}{2}\left(\frac{\partial}{\partial\theta}(\theta^TX^TX\theta)-\frac{\partial}{\partial\theta}(\theta^TX^Ty)-\frac{\partial}{\partial\theta}(y^TX\theta)+\frac{\partial}{\partial\theta}(y^Ty)\right)\\
&=\frac{1}{2}(2X^TX\theta-X^Ty-X^Ty)\\
&=X^TX\theta-X^Ty\\
&=0
\end{aligned}
$$
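The analytic gradient $X^TX\theta-X^Ty$ can be checked against a numerical gradient of $J(\theta)$. This is a minimal sketch under assumed random data; the function names `cost` and `analytic_gradient` are illustrative and do not come from the original post.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/2 * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return 0.5 * r @ r

def analytic_gradient(theta, X, y):
    """Gradient obtained from the derivation: X^T X theta - X^T y."""
    return X.T @ X @ theta - X.T @ y

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))   # hypothetical design matrix
y = rng.normal(size=6)        # hypothetical targets
theta = rng.normal(size=3)    # arbitrary evaluation point

# Central differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(theta.size)
])

max_abs_diff = np.max(np.abs(numeric - analytic_gradient(theta, X, y)))
print(max_abs_diff)
```

A difference close to floating-point noise suggests the closed-form gradient matches the cost function.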
$$
\begin{gathered}
X^TX\theta-X^Ty=0\\
X^TX\theta=X^Ty
\end{gathered}
$$
Multiplying both sides on the left by $(X^TX)^{-1}$, which requires $X^TX$ to be invertible, gives:

$$
\theta=(X^TX)^{-1}X^Ty
$$
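A minimal NumPy sketch of the result on synthetic data follows. The data-generating values (`true_theta`, the noise level, the sample count) are assumptions made for illustration. Besides the literal formula with an explicit inverse, it also solves the linear system $X^TX\theta=X^Ty$ directly and calls `np.linalg.lstsq`; in practice these are usually preferred over forming the inverse, although that is a numerical consideration rather than part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 50
x = rng.uniform(-1, 1, size=m)
X = np.column_stack([np.ones(m), x])      # intercept column plus one feature
true_theta = np.array([1.0, 2.0])         # assumed ground truth for the demo
y = X @ true_theta + 0.01 * rng.normal(size=m)

# Literal transcription of theta = (X^T X)^{-1} X^T y
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same solution without forming the inverse: solve X^T X theta = X^T y
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver applied to X theta ≈ y directly
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_inv, theta_solve, theta_lstsq)
```

All three should agree closely here, because this $X$ has full column rank and $X^TX$ is therefore invertible.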
For more matrix identities, see the Matrix Cookbook, a very comprehensive reference:
http://www2.imm.dtu.dk/pubdb/edoc/imm3274.pdf
2. Algebraic rearrangement
See the separate post "Normal equation公式推导" (Normal equation formula derivation).