While working on ex_2, I ran into a gradient formula, so here is a derivation by hand. The formula is:
$$\frac{\partial J(\theta)}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}_j$$
- Premises:

  cost function

  $$J(\theta)=\frac{1}{m}\sum_{i=1}^m\left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

  hypothesis

  $$h_\theta(x^{(i)})=g(x^{(i)}\theta)$$

  logistic function

  $$g(x)=\frac{1}{1+\exp(-x)},\quad \exp(x)=e^x$$

- Derivation:
Differentiate $J(\theta)$ with respect to $\theta_j$, writing $h'_\theta(x^{(i)})$ for $\partial h_\theta(x^{(i)})/\partial\theta_j$:

$$\frac{\partial J(\theta)}{\partial\theta_j}=-\frac{1}{m}\sum_{i=1}^m\left(y^{(i)}\frac{h'_\theta(x^{(i)})}{h_\theta(x^{(i)})}+(1-y^{(i)})\frac{-h'_\theta(x^{(i)})}{1-h_\theta(x^{(i)})}\right)$$

$$=-\frac{1}{m}\sum_{i=1}^m\frac{y^{(i)}h'_\theta(x^{(i)})(1-h_\theta(x^{(i)}))-(1-y^{(i)})h'_\theta(x^{(i)})h_\theta(x^{(i)})}{h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))}$$

$$=-\frac{1}{m}\sum_{i=1}^m\frac{(y^{(i)}-h_\theta(x^{(i)}))h'_\theta(x^{(i)})}{h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))}\tag{1}$$
Let $H(x)=x^{(i)}\theta$, so that $h_\theta(x^{(i)})=g(H(x))$. By the chain rule,

$$h'_\theta(x^{(i)})=\frac{e^{-H(x)}}{(1+e^{-H(x)})^2}H'(x)\tag{2}$$

$$h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))=\frac{1}{1+e^{-H(x)}}\cdot\frac{e^{-H(x)}}{1+e^{-H(x)}}=\frac{e^{-H(x)}}{(1+e^{-H(x)})^2}\tag{3}$$

$$H'(x)=\frac{\partial H(x)}{\partial\theta_j}=x_j^{(i)}\tag{4}$$

Substituting (2), (3), and (4) into (1), the factor $\frac{e^{-H(x)}}{(1+e^{-H(x)})^2}$ cancels between numerator and denominator, and over all $m$ training examples we get

$$\frac{\partial J(\theta)}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}_j$$
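
As a sanity check on the result, the closed-form gradient can be compared with a central finite-difference approximation of the cost function. The following is only a minimal sketch in plain Python, not the exercise's own code: the function names (`sigmoid`, `cost`, `analytic_gradient`, `numerical_gradient`), the random test data, and the step size `eps` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = g(x . theta) for a single example x."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))


def cost(theta, X, y):
    """Cross-entropy cost J(theta), averaged over the m examples."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / len(y)


def analytic_gradient(theta, X, y):
    """Derived formula: (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(y)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = hypothesis(theta, xi) - yi
        for j, xj in enumerate(xi):
            grad[j] += err * xj / m
    return grad


def numerical_gradient(theta, X, y, eps=1e-6):
    """Central difference (J(theta + eps*e_j) - J(theta - eps*e_j)) / (2*eps)."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad


# Hypothetical test data: 20 examples, an intercept column plus 2 features.
random.seed(0)
m, n = 20, 3
X = [[1.0] + [random.gauss(0, 1) for _ in range(n - 1)] for _ in range(m)]
y = [float(random.randint(0, 1)) for _ in range(m)]
theta = [random.gauss(0, 1) for _ in range(n)]

max_diff = max(
    abs(a - b)
    for a, b in zip(analytic_gradient(theta, X, y), numerical_gradient(theta, X, y))
)
print(max_diff)  # should be very small if the derived formula is correct
```

If the two gradients agree up to finite-difference error, that supports the signs and indices in the derivation, in particular that $H'(x)$ equals $x_j^{(i)}$.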