BP (Backpropagation)
For each epoch:
\qquad For each batch:
\qquad\qquad For each layer (n = N, …, 1, i.e., from the last layer back to the first):
\qquad\qquad\qquad Compute the derivatives of the loss with respect to this layer's parameters and this layer's input:
\qquad\qquad\qquad\qquad $\frac{\partial L}{\partial \omega^{n}} = \frac{\partial L}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial \omega^{n}}$ (used to update this layer's $\omega^{n}$)
\qquad\qquad\qquad\qquad $\frac{\partial L}{\partial x^{n}} = \frac{\partial L}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial x^{n}}$ (passed down to the layer below)
\qquad\qquad\qquad Update the parameters:
\qquad\qquad\qquad\qquad $\omega^{n} \leftarrow \omega^{n} - \eta \frac{\partial L}{\partial \omega^{n}}$
\qquad\qquad\qquad\qquad $b^{n} \leftarrow b^{n} - \eta \frac{\partial L}{\partial b^{n}}$
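A minimal NumPy sketch of this loop, for illustration only: the two-layer network, the sigmoid activation, the squared-error loss, and the toy XOR data are all assumptions not stated in the notes above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy data (assumed): XOR, just to make the loop runnable.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

# Two fully connected layers, 2 -> 4 -> 1 (sizes are assumptions).
sizes = [2, 4, 1]
W = [rng.normal(0, 1, (sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
b = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
eta = 1.0  # learning rate

for epoch in range(2000):              # each epoch
    # A single full batch here; a real loop would iterate over mini-batches.
    xs = [X]                           # cache x^n, the input to each layer n
    for Wn, bn in zip(W, b):           # forward pass
        xs.append(sigmoid(xs[-1] @ Wn + bn))

    # dL/dx^{N+1} for the squared-error loss L = 0.5 * mean((x^{N+1} - y)^2)
    grad_x = (xs[-1] - Y) / len(X)

    for n in reversed(range(len(W))):        # each layer, from last to first
        out = xs[n + 1]                      # x^{n+1} = sigmoid(z^n)
        grad_z = grad_x * out * (1.0 - out)  # chain through the sigmoid
        grad_W = xs[n].T @ grad_z            # dL/dW^n = dL/dx^{n+1} * dx^{n+1}/dW^n
        grad_b = grad_z.sum(axis=0)          # dL/db^n
        grad_x = grad_z @ W[n].T             # dL/dx^n, handed to the layer below
        W[n] -= eta * grad_W                 # W^n <- W^n - eta * dL/dW^n
        b[n] -= eta * grad_b                 # b^n <- b^n - eta * dL/db^n

# Show the network's outputs after training.
print(np.round(sigmoid(sigmoid(X @ W[0] + b[0]) @ W[1] + b[1]), 2))
```

Note that `grad_x` for the layer below is computed before `W[n]` is overwritten, so every gradient uses the weights that produced the forward pass, matching the pseudocode.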
Args:
- $\omega$: omega (the layer weights)
- $\eta$: eta (the learning rate)
Note:
- In BP, the values of $\frac{\partial L}{\partial \omega^{n}}$ and $\frac{\partial L}{\partial x^{n}}$ come from differentiating the forward-pass relation $L = f(\omega^{n} x^{n})$ (see the worked case below).
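As a worked scalar case, assuming for illustration one layer of the form $x^{n+1} = \sigma(z^{n})$ with $z^{n} = \omega^{n} x^{n} + b^{n}$ and some activation $\sigma$ (the note above only writes $L = f(\omega^{n} x^{n})$ and does not fix the activation):

$$
\frac{\partial x^{n+1}}{\partial \omega^{n}} = \sigma'(z^{n})\, x^{n},
\qquad
\frac{\partial x^{n+1}}{\partial x^{n}} = \sigma'(z^{n})\, \omega^{n}.
$$

Multiplying either factor by the incoming $\frac{\partial L}{\partial x^{n+1}}$ gives exactly the two derivatives computed in the loop above.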
Chain rule
$\frac{\partial L}{\partial \omega^{n}} = \frac{\partial L}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial \omega^{n}} = \frac{\partial L}{\partial x^{n+2}} \frac{\partial x^{n+2}}{\partial x^{n+1}} \frac{\partial x^{n+1}}{\partial \omega^{n}}$
Because every extra layer in this chain multiplies in another factor $\frac{\partial x^{n+1}}{\partial x^{n}}$, when these factors are consistently smaller than 1 in magnitude the gradient $\frac{\partial L}{\partial \omega^{i}}$ of the early layers shrinks exponentially with depth (the vanishing gradient problem).
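To make the claim concrete, a scalar bound assuming sigmoid activations (an assumption; the notes above do not fix the activation):

$$
\frac{\partial L}{\partial \omega^{1}}
= \frac{\partial L}{\partial x^{N+1}}
\left( \prod_{n=2}^{N} \sigma'(z^{n})\,\omega^{n} \right)
\frac{\partial x^{2}}{\partial \omega^{1}},
\qquad |\sigma'(z)| \le \tfrac{1}{4},
$$

so when every $|\omega^{n}| < 4$ the product is at most $\left(\max_{n}|\omega^{n}|/4\right)^{N-1}$, and the gradients of the earliest layers shrink geometrically as depth grows.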