1. Linear regression
Cost function: least squares
$$
J(\theta)=\frac{1}{2}\sum^m_{i=1}\left(\theta^Tx^{(i)}-y^{(i)}\right)^2
$$
Minimize it with gradient descent, updating every component $\theta_j$ with learning rate $\alpha$:
$$
\theta_j:=\theta_j-\alpha\frac{\partial J(\theta)}{\partial \theta_j}
$$
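The update rule can be tried on a toy problem. The following is a minimal sketch, not a reference implementation: the data set, the step size `alpha=0.05`, the iteration count, and the helper names `cost`, `numeric_partial`, and `gradient_descent` are all assumptions made for illustration. The partial derivative is estimated here by a central difference as a stand-in for an analytic derivative.

```python
def cost(theta, xs, ys):
    """J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    return 0.5 * sum(
        (sum(t * xj for t, xj in zip(theta, x)) - y) ** 2
        for x, y in zip(xs, ys)
    )


def numeric_partial(theta, j, xs, ys, eps=1e-6):
    """Central-difference estimate of dJ/dtheta_j (illustrative stand-in)."""
    up = list(theta)
    up[j] += eps
    down = list(theta)
    down[j] -= eps
    return (cost(up, xs, ys) - cost(down, xs, ys)) / (2 * eps)


def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Repeat theta_j := theta_j - alpha * dJ/dtheta_j for all j at once."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [numeric_partial(theta, j, xs, ys) for j in range(len(theta))]
        # Simultaneous update: every partial is evaluated at the old theta.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data generated from y = 1 + 2*x; x_0 = 1 is the intercept term.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(xs, ys)
print(theta)  # close to [1.0, 2.0]
```

The step size must be small enough for this data; a larger `alpha` can make the iteration diverge.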
So we need the derivative of $J(\theta)$:
- Differentiating directly with respect to each element:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
&=\sum^m_{i=1}\frac{\partial }{\partial \theta_j}\frac{1}{2}\left(\theta^Tx^{(i)}-y^{(i)}\right)^2\\
&=\sum^m_{i=1}\left\{\left(\theta^Tx^{(i)}-y^{(i)}\right)\frac{\partial }{\partial \theta_j}\left(\theta^Tx^{(i)}-y^{(i)}\right)\right\}\\
&=\sum^m_{i=1}\left\{\left(\theta^Tx^{(i)}-y^{(i)}\right)x^{(i)}_j\right\}
\end{aligned}
$$
- Converting to a matrix derivative:
Let:
$$
X=\left[ \begin{matrix} \text{---}\,(x^{(1)})^T\,\text{---}\\ \text{---}\,(x^{(2)})^T\,\text{---}\\ \vdots\\ \text{---}\,(x^{(m)})^T\,\text{---} \end{matrix} \right],\quad
\theta=\left[ \begin{matrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{matrix} \right],\quad
y=\left[ \begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)} \end{matrix} \right]
$$
Then we know that:
$$
h_{\theta}(x^{(i)})=(x^{(i)})^T\theta
$$
Therefore:
$$
X\theta-y=\left[ \begin{matrix} (x^{(1)})^T\theta-y^{(1)}\\ (x^{(2)})^T\theta-y^{(2)}\\ \vdots\\ (x^{(m)})^T\theta-y^{(m)} \end{matrix} \right]
$$
From this we can deduce:
$$
\frac{1}{2}(X\theta-y)^T(X\theta-y)=\frac{1}{2}\sum^m_{i=1}\left(\theta^Tx^{(i)}-y^{(i)}\right)^2=J(\theta)
$$
Let $w=X\theta-y$. Then the cost can be written as $J(\theta)=\frac{1}{2}w^Tw$, and its differential is:
$$
d(J(\theta))=\frac{1}{2}d(w^T)w+\frac{1}{2}w^Td(w)
$$
Since $J(\theta)$ is a scalar, so is $d(J(\theta))$, and a scalar equals its own trace:
$$
d(J(\theta))=tr\left(d(J(\theta))\right)=tr\left(\frac{1}{2}d(w^T)w+\frac{1}{2}w^Td(w)\right)=tr\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)\right)
$$
Moreover, using $tr(A^T)=tr(A)$:
$$
\begin{aligned}
tr\left(\frac{1}{2}d(w^T)w+\frac{1}{2}w^Td(w)\right)
&=tr\left(\frac{1}{2}(d(w))^Tw\right)+tr\left(\frac{1}{2}w^Td(w)\right)\\
&=\frac{1}{2}tr\left(\left(w^Td(w)\right)^T\right)+\frac{1}{2}tr\left(w^Td(w)\right)\\
&=\frac{1}{2}tr\left(w^Td(w)\right)+\frac{1}{2}tr\left(w^Td(w)\right)\\
&=tr\left(w^Td(w)\right)=tr\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)\right)
\end{aligned}
$$
So we obtain:
$$
w^Td(w)=\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)
$$
Hence:
$$
\frac{\partial J(\theta)}{\partial w}=w
$$
Moreover, since $w=X\theta-y$ with $X$ and $y$ constant:
$$
d(w)=Xd(\theta)
$$
So
$$
\begin{aligned}
d(J(\theta))&=tr\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^Td(w)\right)=tr\left(\left(\frac{\partial J(\theta)}{\partial w}\right)^TXd(\theta)\right)=tr\left(w^TXd(\theta)\right)\\
d(J(\theta))&=tr\left(\left(\frac{\partial J(\theta)}{\partial \theta}\right)^Td(\theta)\right)
\end{aligned}
$$
Comparing the two expressions gives:
$$
w^TXd(\theta)=\left(\frac{\partial J(\theta)}{\partial \theta}\right)^Td(\theta)
$$
Hence, transposing $w^TX$:
$$
\frac{\partial J(\theta)}{\partial \theta}=X^Tw=X^T(X\theta-y)
$$
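As a sanity check, the matrix result can be compared with the element-wise sum $\sum_i(\theta^Tx^{(i)}-y^{(i)})x^{(i)}_j$ on a small example. This is only a sketch: the numbers in `X`, `y`, and `theta` and the helper names `matvec`, `transpose`, and `gradient` are made up for illustration, and plain Python lists stand in for matrices.

```python
def matvec(A, v):
    """Matrix-vector product with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def gradient(X, theta, y):
    """Matrix form of the gradient: X^T (X theta - y)."""
    w = [p - t for p, t in zip(matvec(X, theta), y)]  # w = X theta - y
    return matvec(transpose(X), w)


# Hypothetical data: three samples, x_0 = 1 is the intercept term.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [3.0, 0.0, 1.0]
theta = [0.5, -1.0]

grad = gradient(X, theta, y)

# Element-wise form: dJ/dtheta_j = sum_i (theta^T x_i - y_i) * x_ij
elementwise = [
    sum(
        (sum(t * xk for t, xk in zip(theta, row)) - yi) * row[j]
        for row, yi in zip(X, y)
    )
    for j in range(len(theta))
]

print(grad)         # [-4.0, -11.0]
print(elementwise)  # [-4.0, -11.0]
```

Both forms compute the same numbers. The matrix form simply collects the per-component sums into a single product, which is what a vectorized implementation would evaluate.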