Computation Graph
Example:
$$J(a,b,c) = 3(a + bc) \implies \begin{cases} u = bc \\ v = a + u \\ J = 3v \end{cases}$$
The computation graph of this function then evaluates these three steps from left to right: first $u = bc$, then $v = a + u$, and finally $J = 3v$.
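As a sketch of how the graph is used, the snippet below runs the forward pass through the three nodes and then backs the derivatives out with the chain rule, one edge at a time; the input values are illustrative:

```python
# Forward/backward pass through the graph J(a, b, c) = 3(a + bc).
# The values a = 5, b = 3, c = 2 are illustrative.
a, b, c = 5.0, 3.0, 2.0

# Forward pass: evaluate each node left to right.
u = b * c      # u = bc        -> 6.0
v = a + u      # v = a + u     -> 11.0
J = 3 * v      # J = 3v        -> 33.0

# Backward pass: propagate dJ/d(node) right to left via the chain rule.
dJ_dv = 3.0            # J = 3v      => dJ/dv = 3
dJ_du = dJ_dv * 1.0    # v = a + u   => dv/du = 1
dJ_da = dJ_dv * 1.0    # v = a + u   => dv/da = 1
dJ_db = dJ_du * c      # u = bc      => du/db = c
dJ_dc = dJ_du * b      # u = bc      => du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)  # 33.0 3.0 6.0 9.0
```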
Gradient Descent for Logistic Regression
One training sample:
$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$L(a,y) = -\left(y \log(a) + (1-y) \log(1-a)\right)$
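A minimal sketch of this forward pass in Python; the weights, input, and label below are hypothetical values chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))"""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3])  # weights (hypothetical)
x = np.array([1.0, 2.0])   # one training sample (hypothetical)
b = 0.1
y = 1.0                    # true label

z = np.dot(w, x) + b                               # z = w^T x + b
a = sigmoid(z)                                     # a = y_hat = sigma(z)
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))  # L(a, y)
```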
Computation graph:
Computing the derivatives:
$\frac{dL(a,y)}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$
$\frac{dL(a,y)}{dz} = \frac{dL}{da} \cdot \frac{da}{dz} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right) a(1-a) = a - y$
where $\frac{da}{dz} = a(1-a)$ because $\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$.
$\frac{dL(a,y)}{dw_1} = x_1 (a - y)$
$\frac{dL(a,y)}{dw_2} = x_2 (a - y)$
$\frac{dL(a,y)}{db} = a - y$
This, in effect, treats logistic regression as a single-layer neural network: the back propagation algorithm yields the derivative of the loss with respect to each parameter, which gradient descent then uses to step toward the cost-minimizing parameters.
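Putting those derivatives together, one gradient-descent step for a single sample looks like the sketch below; the weights, sample, and learning rate are assumed values, not fixed by the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3])   # hypothetical weights
x = np.array([1.0, 2.0])    # one sample
b, y = 0.1, 1.0

a = sigmoid(np.dot(w, x) + b)   # forward pass

dz = a - y            # dL/dz = a - y
dw = x * dz           # dL/dw_j = x_j (a - y), all components at once
db = dz               # dL/db = a - y

alpha = 0.01          # learning rate (assumed)
w = w - alpha * dw    # one gradient-descent update
b = b - alpha * db
```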
m training samples:
$J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(a^{(i)}, y^{(i)})$
$a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^T x^{(i)} + b)$
$\frac{\partial J(w,b)}{\partial w_1} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L(a^{(i)}, y^{(i)})}{\partial w_1}$
$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L(a^{(i)}, y^{(i)})}{\partial b}$
Logistic Regression Algorithm
Repeat {
    $J = 0;\quad dw_1 = 0;\quad dw_2 = 0;\quad db = 0$
    For i in range(m):
        $z^{(i)} = w^T x^{(i)} + b$
        $a^{(i)} = \sigma(z^{(i)})$
        $J \mathrel{+}= -\left(y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)})\right)$
        $dz^{(i)} = a^{(i)} - y^{(i)}$
        $dw_1 \mathrel{+}= x_1^{(i)} \, dz^{(i)}$
        $dw_2 \mathrel{+}= x_2^{(i)} \, dz^{(i)}$
        $db \mathrel{+}= dz^{(i)}$
    $J \mathrel{/}= m;\quad dw_1 \mathrel{/}= m;\quad dw_2 \mathrel{/}= m;\quad db \mathrel{/}= m$
    $w_1 = w_1 - \alpha \, dw_1$
    $w_2 = w_2 - \alpha \, dw_2$
    $b = b - \alpha \, db$
}
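A direct Python translation of the loop above; the function name, the `(n_features, m)` layout of `X`, and the toy data are my own choices, so treat this as a sketch rather than reference code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(w, b, X, y, alpha):
    """One 'Repeat' iteration. X: (n_features, m); y: (m,)."""
    m = X.shape[1]
    J, dw, db = 0.0, np.zeros_like(w), 0.0
    for i in range(m):                        # accumulate over the m samples
        a_i = sigmoid(np.dot(w, X[:, i]) + b)
        J += -(y[i] * np.log(a_i) + (1 - y[i]) * np.log(1 - a_i))
        dz_i = a_i - y[i]
        dw += X[:, i] * dz_i
        db += dz_i
    J, dw, db = J / m, dw / m, db / m         # average the accumulators
    return w - alpha * dw, b - alpha * db, J

# Toy usage: 100 linearly separable samples with 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
y = (X[0] + X[1] > 0).astype(float)
w, b = np.zeros(2), 0.0
for _ in range(1000):
    w, b, J = gradient_descent_step(w, b, X, y, alpha=0.1)
```

In practice the explicit per-sample loop is usually replaced by vectorized matrix operations over all m samples at once, which computes the same gradients far more efficiently.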
To be continued…