Homework 1: Deriving the Softmax Gradient under the Cross-Entropy Loss
Created: March 18, 2022 1:19 PM
Given $L = -\log \frac{e^{s_k}}{\sum_j e^{s_j}}$, where $k$ is the target class, find $\frac{\partial L}{\partial s_i}$.
Define $p_i = \frac{e^{s_i}}{\sum_j e^{s_j}}$.
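The probabilities $p_i$ defined above can be computed directly; a minimal NumPy sketch (the function name and the max-subtraction stabilization are my own additions, not part of the original derivation):

```python
import numpy as np

def softmax(s):
    """Compute p_i = exp(s_i) / sum_j exp(s_j).

    Subtracting max(s) before exponentiating is the standard
    numerical-stability trick; the common factor exp(-max(s))
    cancels in the ratio, so the result is unchanged.
    """
    e = np.exp(s - np.max(s))
    return e / e.sum()
```

The shift invariance is easy to check: `softmax(s)` and `softmax(s + c)` agree for any constant `c`.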
- Case $i = k$:

$$
\begin{aligned}
\frac{\partial L}{\partial s_k}
&= -\frac{1}{p_k}\frac{\partial p_k}{\partial s_k} \\
&= -\frac{1}{p_k}\frac{e^{s_k}\cdot\sum_j e^{s_j} - e^{s_k}\cdot e^{s_k}}{\left(\sum_j e^{s_j}\right)^2} \\
&= -\frac{1}{p_k}\frac{e^{s_k}}{\sum_j e^{s_j}}\frac{\sum_{j\neq k} e^{s_j}}{\sum_j e^{s_j}} \\
&= -\frac{1}{p_k}\frac{e^{s_k}}{\sum_j e^{s_j}}\left(1-\frac{e^{s_k}}{\sum_j e^{s_j}}\right) \\
&= -\frac{1}{p_k}\, p_k (1-p_k) \\
&= p_k - 1
\end{aligned}
$$
- Case $i \neq k$ (only $p_k$ appears in $L$, so we differentiate $p_k$ with respect to $s_i$):

$$
\begin{aligned}
\frac{\partial L}{\partial s_i}
&= -\frac{1}{p_k}\frac{\partial p_k}{\partial s_i} \\
&= -\frac{1}{p_k}\frac{-e^{s_k} e^{s_i}}{\left(\sum_j e^{s_j}\right)^2} \\
&= \frac{1}{p_k}\frac{e^{s_k}}{\sum_j e^{s_j}}\frac{e^{s_i}}{\sum_j e^{s_j}} \\
&= \frac{1}{p_k}\cdot p_k\cdot p_i \\
&= p_i
\end{aligned}
$$
Summary
$$
\frac{\partial L}{\partial s_i} =
\begin{cases}
p_k - 1, & i = k \\
p_i, & i \neq k
\end{cases}
$$
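The closed-form gradient above can be checked numerically against central finite differences of the loss; a self-contained sketch (function names are my own):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))  # stable softmax
    return e / e.sum()

def grad_analytic(s, k):
    """dL/ds_i = p_i - 1[i == k], the derived closed form."""
    g = softmax(s)
    g[k] -= 1.0
    return g

def grad_numeric(s, k, eps=1e-5):
    """Central finite differences of L(s) = -log softmax(s)[k]."""
    g = np.zeros_like(s)
    for i in range(len(s)):
        sp, sm = s.copy(), s.copy()
        sp[i] += eps
        sm[i] -= eps
        g[i] = (-np.log(softmax(sp)[k]) + np.log(softmax(sm)[k])) / (2 * eps)
    return g
```

The two gradients should agree to roughly the square of the step size, which is a quick sanity check on the derivation.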
This note derives in detail the gradient of the softmax function under the cross-entropy loss, working through the case where the input index equals the target class and the case where it does not, and gives the final gradient expression.





