Deep Neural Networks

Forward
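Input: the total number of layers $L$, the weight matrices $W$ of all hidden layers and the output layer (indexed from layer 2), the bias vectors $b$, and the input vector $x$
Output: the output-layer activation $a^L$
- Initialize $a^1 = x$
- For $l = 2$ to $L$: $a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l)$
- The final result is the output $a^L$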
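The forward pass above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the sigmoid activation, the 3-2-1 layer sizes, the random parameters, and the names `forward`, `weights`, and `biases` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, an assumed choice for the activation sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Return the list of activations [a^1, ..., a^L] for a column vector x.

    weights[k] and biases[k] hold W^l and b^l for layer l = k + 2,
    because the parameters start at layer 2.
    """
    activations = [x]  # a^1 = x
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b      # z^l = W^l a^{l-1} + b^l
        activations.append(sigmoid(z))   # a^l = sigma(z^l)
    return activations

# Hypothetical 3-2-1 network with random parameters (illustration only).
rng = np.random.default_rng(0)
sizes = [3, 2, 1]
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in sizes[1:]]

x = np.array([[0.5], [-1.0], [2.0]])
a_L = forward(weights, biases, x)[-1]
print(a_L.shape)  # (1, 1)
```

Keeping every intermediate activation, rather than only $a^L$, is a deliberate choice here: backpropagation needs $a^{l-1}$ for each layer.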
Back Propagation
With the squared-error loss
$$J(W,b,x,y) = \frac{1}{2}\|a^L-y\|_2^2$$
the error term of the output layer is
$$\delta^L = \frac{\partial J(W,b,x,y)}{\partial z^L} = (a^L-y)\odot \sigma'(z^L)$$
For a hidden layer $l$, the chain rule gives
$$\delta^{l} = \frac{\partial J(W,b,x,y)}{\partial z^l} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^T\frac{\partial J(W,b,x,y)}{\partial z^{l+1}} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^T \delta^{l+1}$$
Because
$$z^{l+1} = W^{l+1}a^{l} + b^{l+1} = W^{l+1}\sigma(z^l) + b^{l+1}$$
the Jacobian is $\frac{\partial z^{l+1}}{\partial z^{l}} = W^{l+1}\,\mathrm{diag}(\sigma'(z^l))$, so
$$\delta^{l} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^T\frac{\partial J(W,b,x,y)}{\partial z^{l+1}} = (W^{l+1})^T\delta^{l+1}\odot \sigma'(z^l)$$
The gradients with respect to the parameters of layer $l$ are then
$$\frac{\partial J(W,b,x,y)}{\partial W^l} = \delta^{l}(a^{l-1})^T$$
$$\frac{\partial J(W,b,x,y)}{\partial b^l} = \delta^{l}$$
The symbol $\odot$ denotes the Hadamard product, i.e. element-wise multiplication.
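One way to gain confidence in these formulas is to compare the gradient $\delta^{l}(a^{l-1})^T$ against a central finite-difference estimate of the loss. The sketch below does this for the first weight matrix of a small network. The sigmoid activation, the 3-4-2 layer sizes, the random data, and the helper names `loss` and `backprop` are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def loss(weights, biases, x, y):
    """J = 1/2 * ||a^L - y||_2^2 for one sample."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(weights, biases, x, y):
    """Return (dJ/dW, dJ/db) per layer, using the delta recursion."""
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])  # delta^L
    grads_W = [None] * len(weights)
    grads_b = [None] * len(weights)
    for k in range(len(weights) - 1, -1, -1):
        grads_W[k] = delta @ activations[k].T  # delta^l (a^{l-1})^T
        grads_b[k] = delta                     # dJ/db^l = delta^l
        if k > 0:
            # delta^{l-1} = (W^l)^T delta^l ⊙ sigma'(z^{l-1})
            delta = (weights[k].T @ delta) * sigmoid_prime(zs[k - 1])
    return grads_W, grads_b

# Hypothetical 3-4-2 network and one random sample.
rng = np.random.default_rng(1)
sizes = [3, 4, 2]
weights = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((o, 1)) for o in sizes[1:]]
x = rng.standard_normal((3, 1))
y = rng.standard_normal((2, 1))

grads_W, grads_b = backprop(weights, biases, x, y)

# Central finite differences on every entry of the first weight matrix.
eps = 1e-6
W0 = weights[0]
numeric = np.zeros_like(W0)
for r in range(W0.shape[0]):
    for c in range(W0.shape[1]):
        old = W0[r, c]
        W0[r, c] = old + eps
        plus = loss(weights, biases, x, y)
        W0[r, c] = old - eps
        minus = loss(weights, biases, x, y)
        W0[r, c] = old
        numeric[r, c] = (plus - minus) / (2 * eps)

max_err = np.max(np.abs(numeric - grads_W[0]))
print("max abs difference:", max_err)
```

If the recursion is implemented correctly, the printed difference should be tiny compared with the gradient entries themselves.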
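Input: the total number of layers $L$, the number of neurons in each hidden layer and in the output layer, the activation function $\sigma$, the loss function, the step size $\alpha$, the maximum number of iterations MAX, the stopping threshold $\epsilon$, and $m$ training samples $\{(x_1,y_1), (x_2,y_2), ..., (x_m,y_m)\}$
Output: the weight matrices $W$ and bias vectors $b$ of each hidden layer and the output layer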
- Initialize the weight matrices $W$ and bias vectors $b$ of each hidden layer and the output layer to random values.
- For iter $= 1$ to MAX:
  - For $i = 1$ to $m$:
    - Set the DNN input $a^{i,1} = x_i$
    - For $l = 2$ to $L$, run the forward pass: $a^{i,l} = \sigma(z^{i,l}) = \sigma(W^l a^{i,l-1} + b^l)$
    - Compute the output-layer error $\delta^{i,L}$ from the loss function
    - For $l = L-1$ down to $2$, backpropagate: $\delta^{i,l} = (W^{l+1})^T\delta^{i,l+1}\odot \sigma'(z^{i,l})$
  - For $l = 2$ to $L$, update $W^l$ and $b^l$ of layer $l$: $W^l = W^l - \alpha \sum\limits_{i=1}^m \delta^{i,l}(a^{i,l-1})^T$, $b^l = b^l - \alpha \sum\limits_{i=1}^m \delta^{i,l}$
  - If the changes in all $W$ and $b$ are smaller than the stopping threshold $\epsilon$, exit the iteration loop.
- Output the weight matrices $W$ and bias vectors $b$ of each hidden layer and the output layer.
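The training procedure above can be sketched as batch gradient descent in NumPy. This is a minimal sketch under stated assumptions: the activation is taken to be the sigmoid, the loss is the squared error, the samples are stored as columns so that the sum over $i$ becomes a matrix product, and the toy OR dataset, the 2-3-1 layer sizes, the default hyperparameters, and the names `train` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(weights, biases, X):
    """Forward pass for all samples; columns of X are samples."""
    a = X
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

def train(X, Y, sizes, alpha=0.5, max_iter=5000, eps=1e-6, seed=0):
    """Batch gradient descent for a sigmoid DNN with squared-error loss.

    X has shape (n_in, m) and Y has shape (n_out, m): one column per sample.
    """
    rng = np.random.default_rng(seed)
    weights = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
    biases = [rng.standard_normal((o, 1)) for o in sizes[1:]]
    for _ in range(max_iter):
        # Forward pass for all m samples at once.
        activations = [X]
        for W, b in zip(weights, biases):
            activations.append(sigmoid(W @ activations[-1] + b))
        # Output-layer delta; for the sigmoid, sigma'(z) = a * (1 - a).
        a_out = activations[-1]
        delta = (a_out - Y) * a_out * (1.0 - a_out)
        # Backward pass: collect every step before changing any parameter.
        steps = []
        for k in range(len(weights) - 1, -1, -1):
            dW = alpha * delta @ activations[k].T          # sum over samples
            db = alpha * delta.sum(axis=1, keepdims=True)  # sum over samples
            steps.append((k, dW, db))
            if k > 0:
                a_prev = activations[k]
                delta = (weights[k].T @ delta) * a_prev * (1.0 - a_prev)
        # Apply the updates and track the largest parameter change.
        largest = 0.0
        for k, dW, db in steps:
            weights[k] -= dW
            biases[k] -= db
            largest = max(largest, np.abs(dW).max(), np.abs(db).max())
        if largest < eps:  # stopping threshold
            break
    return weights, biases

# Toy data (assumption): the logical OR of two inputs, four samples as columns.
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0.0, 1.0, 1.0, 1.0]])

weights, biases = train(X, Y, sizes=[2, 3, 1])
print(np.round(predict(weights, biases, X), 2))
```

Collecting all update steps before applying them keeps each $\delta^{l}$ computed from the weights of the current iteration, which matches the order of the steps in the algorithm.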