Everyone should already know the softmax formula, where $y_i$ is the $i$-th input score:
$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$
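Here is a minimal Python sketch of the formula above, using only the standard library. Subtracting the largest score before exponentiating is a common numerical-stability trick that is not part of the original text. It cancels in the ratio, so the result is unchanged, and it keeps `math.exp` from overflowing. The function name `softmax` and the sample scores are illustrative only.

```python
import math

def softmax(scores):
    """Map a list of input scores y_i to probabilities S(y_i)."""
    # Shifting every score by the same constant leaves the ratio unchanged,
    # but keeps math.exp from overflowing when scores are large.
    m = max(scores)
    exps = [math.exp(y - m) for y in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores
print(probs)       # approximately [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0 up to floating-point rounding
```

Because of the shift, very large inputs such as `softmax([1000.0, 1000.0])` still return `[0.5, 0.5]` instead of overflowing.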
The forward pass is straightforward (diagram borrowed from Zhihu user @香菜).
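The loss is the cross-entropy loss. Here $y_i$ is the one-hot label for class $i$ and $p_i$ is the softmax output for class $i$: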
$$C = -\sum_{i=1}^{m} y_i \log p_i$$
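The following is a short sketch of the cross-entropy formula in plain Python. The function name `cross_entropy` and the predicted distribution `p` are hypothetical. The code skips terms where $y_i = 0$. This is an implementation choice that does not change the value, because those terms contribute nothing, and it avoids evaluating `log(0)`.

```python
import math

def cross_entropy(labels, probs):
    """C = -sum_i y_i * log(p_i) for labels y and predicted probabilities p."""
    # Terms with y_i == 0 add nothing to the sum, so they are skipped;
    # this also avoids log(0) when a non-target probability is exactly 0.
    return -sum(y * math.log(p) for y, p in zip(labels, probs) if y != 0)

y = [1, 0, 0]        # one-hot label for class 0
p = [0.7, 0.2, 0.1]  # hypothetical softmax output
print(cross_entropy(y, p))  # equals -log(0.7), about 0.357
```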
Backpropagation
Backpropagation is the main focus of this article.
Suppose a 3-class problem with classes [0, 1, 2] and a true label of 0, i.e. $y_0 = 1$, $y_1 = 0$, $y_2 = 0$. The loss is then:
$$J = -(y_0 \log p_0 + y_1 \log p_1 + y_2 \log p_2)$$
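Substituting the one-hot values $y_0 = 1$, $y_1 = 0$, $y_2 = 0$ leaves a single term. This makes the derivatives in Step 1 easy to read off:

$$J = -(1 \cdot \log p_0 + 0 \cdot \log p_1 + 0 \cdot \log p_2) = -\log p_0$$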
Step 1: Partial derivatives of the loss with respect to p
Since $y_0 = 1$, $y_1 = 0$, $y_2 = 0$:
$$\frac{\partial J}{\partial p_0} = -\frac{1}{p_0}, \quad \frac{\partial J}{\partial p_1} = 0, \quad \frac{\partial J}{\partial p_2} = 0$$
In matrix (column-vector) form:
$$\frac{\partial J}{\partial p} = \begin{bmatrix} -\frac{1}{p_0} \\ 0 \\ 0 \end{bmatrix}$$
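To sanity-check this vector, the sketch below compares the analytic gradient with a central finite-difference estimate. It treats $p_0, p_1, p_2$ as independent variables, as Step 1 does. All function names and the sample values of `p` are hypothetical and chosen only for illustration.

```python
import math

def loss(p, label):
    # J = -log p_label for a one-hot label (only one term of the sum survives)
    return -math.log(p[label])

def grad_loss_wrt_p(p, label):
    # dJ/dp_k = -1/p_k for k == label, and 0 for every other k
    return [-1.0 / p[k] if k == label else 0.0 for k in range(len(p))]

def numerical_grad(f, p, h=1e-6):
    # Central differences, nudging one p_k at a time
    grad = []
    for k in range(len(p)):
        plus = list(p)
        plus[k] += h
        minus = list(p)
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

p = [0.7, 0.2, 0.1]  # hypothetical softmax output
analytic = grad_loss_wrt_p(p, label=0)
numeric = numerical_grad(lambda q: loss(q, 0), p)
print(analytic)  # [-1/0.7, 0.0, 0.0]
print(numeric)   # should agree closely with the analytic vector
```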
Step 2: Partial derivatives of p with respect to the scores (the key step)
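Next, compute the partial derivative of each output $p_k$ with respect to each score $\mathrm{score}_i$ (written $S_i$ in the matrix):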
$$\frac{\partial p_k}{\partial \mathrm{score}_i} = \begin{bmatrix} \frac{\partial p_0}{\partial S_0} & \frac{\partial p_1}{\partial S_0} & \frac{\partial p_2}{\partial S_0} \\ \frac{\partial p_0}{\partial S_1} & \frac{\partial p_1}{\partial S_1} & \frac{\partial p_2}{\partial S_1} \\ \frac{\partial p_0}{\partial S_2} & \frac{\partial p_1}{\partial S_2} & \frac{\partial p_2}{\partial S_2} \end{bmatrix}$$
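As a numerical companion to this matrix, the sketch below estimates every entry $\partial p_k / \partial S_i$ by finite differences. It uses the same layout: row $i$ is the score and column $k$ is the output. It then compares the estimates with the well-known closed form $\partial p_k / \partial S_i = p_k(\delta_{ik} - p_i)$, which is stated here only as a check and is not taken from the text above. The scores and function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # shift for numerical stability; does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def jacobian_numeric(scores, h=1e-6):
    # jac[i][k] estimates dp_k / dS_i by nudging one score S_i at a time
    n = len(scores)
    jac = []
    for i in range(n):
        plus = list(scores)
        plus[i] += h
        minus = list(scores)
        minus[i] -= h
        p_plus, p_minus = softmax(plus), softmax(minus)
        jac.append([(p_plus[k] - p_minus[k]) / (2 * h) for k in range(n)])
    return jac

def jacobian_closed_form(scores):
    # dp_k/dS_i = p_k * (1 - p_i) when i == k, and -p_k * p_i otherwise
    p = softmax(scores)
    n = len(p)
    return [[p[k] * ((1.0 if i == k else 0.0) - p[i]) for k in range(n)]
            for i in range(n)]

scores = [2.0, 1.0, 0.1]  # hypothetical S_0, S_1, S_2
num = jacobian_numeric(scores)
exact = jacobian_closed_form(scores)
for row_num, row_exact in zip(num, exact):
    print([round(v, 6) for v in row_num], [round(v, 6) for v in row_exact])
```

Each row of the matrix sums to zero. Raising one score can only shift probability mass between classes, because the outputs always sum to 1.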