Preface
Another blogger has written a more detailed derivation: https://blog.youkuaiyun.com/chaipp0607/article/details/101946040
1. Derivative of the cross-entropy function
- softmax: let the final-layer output for one sample be $[z_1, z_2, z_3, z_4, \dots, z_{10}]$, where the output layer is assumed to have 10 neurons.

  $$p_i=\frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}}$$
- cross_entropy:

  $$L=-\sum_{i=1}^{10} y_i \times \log(p_i)$$
- Chain rule:

  $$\frac{\partial L}{\partial z_i}=\sum_{j=1}^{10}\frac{\partial L}{\partial p_j}\frac{\partial p_j}{\partial z_i}$$
- Tackle each factor in turn:
- $\frac{\partial L}{\partial p_j}=-y_j\times\frac{1}{p_j}$
- $\frac{\partial p_j}{\partial z_i}$
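
The first factor, $\frac{\partial L}{\partial p_j}=-y_j\times\frac{1}{p_j}$, can be sanity-checked numerically against the softmax and cross-entropy definitions above. The following is a minimal sketch in plain Python, not part of the original post: the logit vector `z`, the one-hot label `y`, and all helper names are hypothetical and chosen only for illustration. It compares the analytic expression with a central finite-difference estimate, treating each $p_j$ as a free variable.

```python
import math

def softmax(z):
    """Softmax probabilities p_i = exp(z_i) / sum_j exp(z_j)."""
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, y):
    """Cross-entropy loss L = -sum_i y_i * log(p_i)."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

def dL_dp(p, y):
    """Analytic partial derivatives dL/dp_j = -y_j / p_j."""
    return [-yj / pj for yj, pj in zip(y, p)]

def numeric_dL_dp(p, y, eps=1e-6):
    """Central finite-difference estimate of dL/dp_j, with each p_j varied independently."""
    grads = []
    for j in range(len(p)):
        plus = list(p)
        plus[j] += eps
        minus = list(p)
        minus[j] -= eps
        grads.append((cross_entropy(plus, y) - cross_entropy(minus, y)) / (2 * eps))
    return grads

# Hypothetical example data (an assumption for illustration):
# 10 logits, matching the 10 output neurons, and a one-hot label for class index 3.
z = [0.5, -1.2, 2.0, 0.3, 0.0, 1.1, -0.7, 0.9, -2.0, 0.4]
y = [0.0] * 10
y[3] = 1.0

p = softmax(z)
analytic = dL_dp(p, y)
numeric = numeric_dL_dp(p, y)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))

print("loss:", cross_entropy(p, y))
print("max |analytic - numeric|:", max_gap)
```

With a one-hot label, only the entry for the true class is nonzero, so the two gradients should agree up to finite-difference error.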