Pitfalls when reproducing PyTorch code in Paddle (7): how to use softmax_with_cross_entropy

License: CC 4.0 BY-SA

Tags: #deep-learning #pytorch #paddlepaddle

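This post covers the cross-entropy loss functions of PaddlePaddle and PyTorch. It introduces what fluid.layers.softmax_with_cross_entropy does. While reproducing a model, the loss turned out to be nan because the wrong Paddle API had been chosen as the counterpart of the PyTorch one. The post analyzes the differences between the APIs, extends the question by comparing the losses computed by different methods, and finally recommends softmax_with_cross_entropy for Paddle.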

fluid.layers.softmax_with_cross_entropy

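Purpose: this OP implements the softmax cross-entropy loss. It fuses the softmax operation with the cross-entropy computation, which yields numerically more stable gradients.
Link: PaddlePaddle API documentation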
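To make the stability claim concrete, the per-sample loss can be written out as follows. This is a sketch of the standard log-sum-exp identity, not a transcription of Paddle's kernel; the symbols z (logits), y (target class index), and m (largest logit) are introduced here for illustration.

```latex
\begin{aligned}
\mathcal{L}(z, y) &= -\log\frac{e^{z_y}}{\sum_j e^{z_j}} \\
&= -z_y + \log\sum_j e^{z_j} \\
&= -(z_y - m) + \log\sum_j e^{z_j - m}, \qquad m = \max_j z_j .
\end{aligned}
```

In the last form every exponent is at most 0 and the sum is at least 1, so the logarithm never receives 0. A separate softmax followed by a log can receive a probability that has already underflowed to 0.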

Reproduction problem

Error: the loss is nan
Fix: the wrong API had been mapped
PyTorch uses: F.cross_entropy
Originally planned: fluid.layers.cross_entropy
Changed to: fluid.layers.softmax_with_cross_entropy

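Problem analysis

Calling F.cross_entropy is mathematically equivalent to running the following code (PyTorch's own implementation goes through log_softmax, which is numerically more stable than taking the log of a softmax):

import torch
import torch.nn.functional as F

soft_out = F.softmax(out, dim=1)  # softmax over the class dimension
log_soft_out = torch.log(soft_out)
loss = F.nll_loss(log_soft_out, y)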
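The three steps can also be checked without any framework. The following is a minimal pure-Python sketch for a single sample; the helper names (softmax, nll_loss, cross_entropy_three_steps, cross_entropy_fused) and the sample logits are illustrative assumptions, not PyTorch or Paddle APIs.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating to avoid overflow.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(log_probs, target):
    # Negative log-likelihood of the target class for one sample.
    return -log_probs[target]

def cross_entropy_three_steps(row, target):
    # softmax -> log -> nll, mirroring the decomposition in the text.
    soft_out = softmax(row)
    log_soft_out = [math.log(p) for p in soft_out]
    return nll_loss(log_soft_out, target)

def cross_entropy_fused(row, target):
    # Log-sum-exp form: log(sum_j exp(z_j)) - z_y, shifted by the max.
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return lse - row[target]

logits = [2.0, 1.0, 0.1]  # made-up example logits
target = 0
a = cross_entropy_three_steps(logits, target)
b = cross_entropy_fused(logits, target)
print(a, b)
```

For moderate logits like these the two routes agree to within floating-point rounding; they diverge only when a probability underflows.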

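cross_entropy in the Paddle API does not include the softmax step (it expects probabilities as input), so softmax_with_cross_entropy is used instead.
The code actually used in the reproduction: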

# pytorch code
loss_cls = F.cross_entropy(cls[active, :], labels[active], reduction='none', ignore_index=IGN_FLAG)
    
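# paddlepaddle code
index_active = fluid.layers.nonzero(active)
cls_active = fluid.layers.gather(cls, index_active)
# labels are reshaped to [N, 1], the shape softmax_with_cross_entropy expects for hard labels
labels_active = fluid.layers.reshape(fluid.layers.gather(labels, index_active), [-1, 1])
loss_cls = fluid.layers.softmax_with_cross_entropy(cls_active, labels_active, ignore_index=IGN_FLAG)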
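What both snippets compute can be sketched in plain Python: select the active rows, return one loss per selected row, and give rows whose label equals the ignore index a loss of 0. The function name per_sample_cross_entropy, the value of IGN_FLAG, and the sample data are assumptions made for this illustration; the original code does not show them.

```python
import math

IGN_FLAG = -1  # hypothetical value; the original code does not show it

def per_sample_cross_entropy(logits, labels, active, ignore_index=IGN_FLAG):
    """Per-sample loss over active rows; ignored labels contribute 0."""
    losses = []
    for row, label, is_active in zip(logits, labels, active):
        if not is_active:
            continue  # row dropped, like cls[active, :] or gather
        if label == ignore_index:
            losses.append(0.0)  # kept in the output, but with zero loss
            continue
        m = max(row)
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(lse - row[label])
    return losses

# Made-up data: 4 samples, 3 classes.
cls = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5], [0.0, 3.0, 0.0], [1.0, 2.0, 3.0]]
labels = [0, IGN_FLAG, 1, 2]
active = [True, True, False, True]
loss_cls = per_sample_cross_entropy(cls, labels, active)
print(loss_cls)
```

The output has one entry per active row (three here), which corresponds to reduction='none' on the PyTorch side; the mean is taken afterwards if needed.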

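Extending the problem

In theory, fluid.layers.softmax_with_cross_entropy should be equivalent to fluid.layers.softmax followed by fluid.layers.cross_entropy. In actual testing, however, the mean losses of the two differ, and both differ from the result of the PyTorch method. The comparison is as follows: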

F.cross_entropy: Min = 0.7678133, Mean = 10.989796, Max = 320.526
fluid.layers.softmax + fluid.layers.cross_entropy: Min = 0.7678134, Mean = 9.627525e+17, Max = 1e+20
fluid.layers.softmax_with_cross_entropy: Min = 0.7678134, Mean = 9.82002, Max = 64.6236
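Summary: after checking, the discrepancy does not come from a miscalculation in some other API; it comes from differences among the three methods themselves. In practice, Paddle code should still use softmax_with_cross_entropy.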
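One plausible source of the gap between the separate and the fused route is underflow. The sketch below is pure Python with made-up logits, not the data behind the measurements above: when the target class has a very small logit, its softmax probability becomes exactly 0 and -log(0) is infinite, while the log-sum-exp form stays finite. A Max of 1e+20 is what one would expect if an implementation clamps that infinity to a large finite constant; this is an assumption, not something the measurements confirm.

```python
import math

def naive_loss(row, target):
    # softmax first, then log: the probability can underflow to exactly 0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    p = exps[target] / sum(exps)
    return math.inf if p == 0.0 else -math.log(p)

def fused_loss(row, target):
    # Log-sum-exp form never takes the log of an underflowed probability.
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return lse - row[target]

logits = [0.0, -1000.0]  # extreme, made-up logits
target = 1
naive = naive_loss(logits, target)
fused = fused_loss(logits, target)
print(naive, fused)
```

A few such samples are enough to dominate the mean, which would fit the pattern of identical Min values but very different Mean and Max values.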