First, we need to understand the chain rule:
Case 1: $y=g(x)$, $z=h(y)$, so a change propagates as $\Delta x \rightarrow \Delta y \rightarrow \Delta z$. If we want the derivative of $z$ with respect to $x$, clearly:

$$\frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx}$$
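To make this concrete, here is a minimal numerical sketch. The choices $g(x)=x^2$ and $h(y)=\sin y$ are hypothetical (not from the original); the chain-rule derivative is compared against a finite-difference estimate:

```python
import math

# Hypothetical functions for illustration: g(x) = x**2, h(y) = sin(y).
def g(x): return x * x
def h(y): return math.sin(y)

x = 1.5
y = g(x)

# Chain rule: dz/dx = dz/dy * dy/dx = cos(y) * 2x
analytic = math.cos(y) * 2 * x

# Finite-difference check of the same derivative.
eps = 1e-6
numeric = (h(g(x + eps)) - h(g(x - eps))) / (2 * eps)

print(analytic, numeric)  # the two values agree to ~1e-9
```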
Case 2, similarly: $x=g(s)$, $y=h(s)$, $z=k(x, y)$.
If we want the derivative of $z$ with respect to $s$, following the direction of data flow we get:
$$\frac{dz}{ds}=\frac{\partial z}{\partial x}\frac{dx}{ds}+\frac{\partial z}{\partial y}\frac{dy}{ds}$$
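A quick sanity check of this total-derivative formula, again with made-up functions ($g(s)=s^2$, $h(s)=\sin s$, $k(x,y)=xy$):

```python
import math

# Hypothetical functions: x = g(s), y = h(s), z = k(x, y).
def g(s): return s * s
def h(s): return math.sin(s)
def k(x, y): return x * y

s = 0.7
x, y = g(s), h(s)

# dz/ds = (dz/dx)(dx/ds) + (dz/dy)(dy/ds) = y * 2s + x * cos(s)
analytic = y * 2 * s + x * math.cos(s)

# Finite-difference check along s.
eps = 1e-6
numeric = (k(g(s + eps), h(s + eps)) - k(g(s - eps), h(s - eps))) / (2 * eps)
print(analytic, numeric)
```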
Suppose the loss function is $l$. When we use gradient descent to update the parameters $w$ and $b$, we need the derivatives of $l$ with respect to $w$ and $b$. By the chain rule, $\frac{\partial l}{\partial w}=\frac{\partial z}{\partial w}\frac{\partial l}{\partial z}$. The factor $\frac{\partial z}{\partial w}$ is easy to read off from the expression for $z$; the hard part is $\frac{\partial l}{\partial z}$, because, as in the figure above, $z$ may feed into many subsequent terms, so computing $\frac{\partial l}{\partial z}$ can involve many steps. The whole backpropagation algorithm therefore splits into two parts: ① a forward pass, which computes $\frac{\partial z}{\partial w}$; ② a backward pass, which computes $\frac{\partial l}{\partial z}$.
Forward pass:
$$\frac{\partial z}{\partial w_1}=x_1 \qquad \frac{\partial z}{\partial w_2}=x_2$$
From the forward pass we can obtain the figure below:
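As a quick sketch of why the forward pass already gives $\frac{\partial z}{\partial w_i}$, assume a neuron with pre-activation $z = w_1x_1 + w_2x_2 + b$ (all values below are invented for illustration):

```python
# Hypothetical neuron: z = w1*x1 + w2*x2 + b; all numbers are made up.
x1, x2 = 1.0, -2.0
w1, w2, b = 0.5, 0.3, 0.1

z = w1 * x1 + w2 * x2 + b  # computed during the forward pass

# The partial of z w.r.t. each weight is simply that weight's input,
# which the forward pass has already produced -- no extra work needed.
dz_dw1 = x1
dz_dw2 = x2
print(dz_dw1, dz_dw2)
```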
Backward pass:
From the figure above, it is easy to see that
$$\frac{\partial l}{\partial z}=\frac{\partial a}{\partial z}\frac{\partial l}{\partial a}, \qquad \frac{\partial l}{\partial a}=\frac{\partial z'}{\partial a}\frac{\partial l}{\partial z'}+\frac{\partial z''}{\partial a}\frac{\partial l}{\partial z''}, \qquad a=\sigma(z).$$
Here $\frac{\partial a}{\partial z}=\sigma'(z)$, and since $z'=aw_3+\cdots$ and $z''=aw_4+\cdots$, we have $\frac{\partial z'}{\partial a}=w_3$ and $\frac{\partial z''}{\partial a}=w_4$. Therefore
$$\frac{\partial l}{\partial z}=\sigma'(z)\left[w_3\frac{\partial l}{\partial z'}+w_4\frac{\partial l}{\partial z''}\right].$$
We can view $\frac{\partial l}{\partial z}$ as in the figure below: treat it as another neuron operating from right to left, whose inputs are $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$, whose weights are $w_3$ and $w_4$, and with $\sigma'(z)$ playing the role of the activation function.
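This "reversed neuron" can be written out directly. In the sketch below, $w_3$, $w_4$, and the two upstream gradients are hypothetical values, and the sigmoid is assumed as the activation, matching the $\sigma$ in the text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical values: w3, w4 are this neuron's outgoing weights;
# dl_dz1, dl_dz2 stand for dl/dz' and dl/dz'', assumed already
# computed by the layer to the right.
z = 0.4
w3, w4 = 0.8, -0.5
dl_dz1, dl_dz2 = 0.2, 0.6

# dl/dz = sigma'(z) * (w3 * dl/dz' + w4 * dl/dz''):
# the reversed neuron takes the upstream gradients as inputs,
# weights them by w3, w4, and scales by sigma'(z).
dl_dz = dsigmoid(z) * (w3 * dl_dz1 + w4 * dl_dz2)
print(dl_dz)
```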
Suppose $y_1$ and $y_2$ are the outputs of the last layer. Then
$$\frac{\partial l}{\partial z'}=\frac{\partial y_1}{\partial z'}\frac{\partial l}{\partial y_1} \qquad \frac{\partial l}{\partial z''}=\frac{\partial y_2}{\partial z''}\frac{\partial l}{\partial y_2},$$
as in the figure below:
If $y_1$ and $y_2$ are not the outputs of the last layer, we keep expanding the computation in the same way, layer by layer, until we reach the last layer, as in the figure below:
In this way we can keep propagating from the back toward the front and compute all of the $\frac{\partial l}{\partial z_i}$.
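Putting the pieces together, here is a minimal end-to-end sketch for a hypothetical 2-2-2 sigmoid network with a squared-error loss (all numbers are invented): a forward pass that stores every $z$ and $a$, then a backward pass that applies the output-layer base case followed by the reversed-neuron recursion to obtain every $\frac{\partial l}{\partial z_i}$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical 2-2-2 network; all numbers are made up.
x = [1.0, -1.0]
W1 = [[0.5, 0.3], [-0.4, 0.8]]   # hidden layer: W1[i][j] maps input j -> neuron i
W2 = [[0.7, -0.2], [0.1, 0.9]]   # output layer weights
t = [1.0, 0.0]                   # targets; loss l = sum((y_i - t_i)**2)

# Forward pass: store every pre-activation z and activation a.
z1 = [sum(W1[i][j] * x[j] for j in range(2)) for i in range(2)]
a1 = [sigmoid(z) for z in z1]
z2 = [sum(W2[i][j] * a1[j] for j in range(2)) for i in range(2)]
y  = [sigmoid(z) for z in z2]

# Backward pass, output layer first: dl/dz_i = sigma'(z_i) * dl/dy_i.
dl_dy  = [2 * (y[i] - t[i]) for i in range(2)]
dl_dz2 = [dsigmoid(z2[i]) * dl_dy[i] for i in range(2)]

# One layer back: dl/dz_j = sigma'(z_j) * sum_i w_ij * dl/dz'_i,
# exactly the reversed-neuron formula from the text.
dl_dz1 = [dsigmoid(z1[j]) * sum(W2[i][j] * dl_dz2[i] for i in range(2))
          for j in range(2)]

print(dl_dz2, dl_dz1)  # all dl/dz_i, from back to front
```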
Finally, the entire backpropagation procedure can be summarized in the figure below:
Adapted from 李宏毅 (Hung-yi Lee)'s open course on machine learning.