[Deep Learning] What is cross-entropy loss?

lele_ne

Category: Deep Learning. Tags: deep learning, artificial intelligence

Article link: https://blog.youkuaiyun.com/lele_ne/article/details/145974328

CrossEntropyLoss

CrossEntropyLoss is made up of two parts: softmax and (negative) log likelihood.

Softmax

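Suppose we have a neural network that classifies images as the digit 3 or the digit 7. Its final layer outputs two activations, one for 3 and one for 7.
Suppose we have 6 images, so each image gets its own pair of activations. Here we simulate them with random numbers: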

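from fastai.vision.all import *   # also brings torch into scope
torch.random.manual_seed(42);     # fix the seed so the random activations are reproducible
acts = torch.randn((6,2))*2       # 6 images x 2 activations (column 0: "3", column 1: "7")
acts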

The sigmoid function maps every value into the range 0 to 1:

acts.sigmoid()

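Each row's two values now represent the model's confidence that the image is a 3 and that it is a 7. However, sigmoid is applied to each activation independently, so the two values in a row do not add up to 1.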
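
To see this without any library, here is a small plain-Python sketch (not the fastai/PyTorch code above) that applies sigmoid element by element to one hypothetical row of activations:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Hypothetical activations for one image: [score for "3", score for "7"]
row = [0.5, -1.2]
probs = [sigmoid(a) for a in row]   # sigmoid is applied to each value independently

print(probs)       # every value lies strictly between 0 and 1
print(sum(probs))  # ...but the row does not sum to 1
```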

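What we really want is how much more confident the model is that an image is a 3 rather than a 7. That depends only on the difference between the two activations, so we take the sigmoid of that difference: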

(acts[:,0]-acts[:,1]).sigmoid()

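This gives the probability for the first column (the image is a 3). To get the probability for the second column (the image is a 7), we simply subtract this value from 1.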

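This is in fact a special case of softmax: for two classes, the sigmoid of the difference between the two activations gives exactly the softmax probabilities. With more than two classes, we need softmax itself.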
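
A quick plain-Python check of this claim, using hypothetical activation values; `softmax2` is an illustrative helper, not a library function:

```python
import math

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def softmax2(a: float, b: float) -> tuple[float, float]:
    """Softmax over exactly two activations."""
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)

a3, a7 = 0.67, 0.26          # hypothetical activations for "3" and "7"
p3, p7 = softmax2(a3, a7)

# The sigmoid of the difference reproduces the softmax probability of "3"...
assert math.isclose(sigmoid(a3 - a7), p3)
# ...and 1 minus it gives the probability of "7"
assert math.isclose(1 - sigmoid(a3 - a7), p7)
print(p3, p7)
```

Algebraically, $e^{a}/(e^{a}+e^{b}) = 1/(1+e^{-(a-b)})$, which is exactly the sigmoid of $a-b$.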
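softmax exponentiates every activation and divides it by the sum of the exponentials in the same row: $\mathrm{softmax}(x_i) = e^{x_i} / \sum_j e^{x_j}$.
Let's apply softmax to acts row by row (dim=1) and check whether the result matches what we just computed.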

sm_acts = torch.softmax(acts, dim=1)
sm_acts

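Next, let's work through another example to understand the softmax computation better.
In the table below, the output column holds the 3 activations from the network's final layer, the exp column is $e$ raised to each output, and the softmax column is each exp value divided by the sum of the exp column.
For example, for an output of 0.02 the exp value is $e^{0.02} \approx 1.02$, and its softmax value is $1.02/4.60 \approx 0.22$, where 4.60 is the sum of the exp column.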

[Table: output, exp, and softmax columns for the 3 activations]
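
The same arithmetic in plain Python, as a sketch: the first activation (0.02) comes from the example above, while the other two are made-up values, so the exp sum here will not be 4.60. `softmax_table` is a hypothetical helper.

```python
import math

def softmax_table(outputs: list[float]) -> tuple[list[float], list[float]]:
    """Return the exp column and the softmax column for a list of activations."""
    exps = [math.exp(o) for o in outputs]   # e raised to each output
    total = sum(exps)                       # sum of the exp column
    return exps, [e / total for e in exps]  # each exp divided by that sum

outputs = [0.02, -1.0, 1.3]  # 0.02 from the text; -1.0 and 1.3 are hypothetical
exps, probs = softmax_table(outputs)

for o, e, p in zip(outputs, exps, probs):
    print(f"output={o:6.2f}  exp={e:5.2f}  softmax={p:.2f}")
print(f"sum of exp = {sum(exps):.2f}, sum of softmax = {sum(probs):.2f}")
```

Because exponentiation is monotonic, the largest output always gets the largest softmax value, and the softmax column always sums to 1.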

Log Likelihood

Define the true labels of these 6 images (0 means the digit 3, 1 means the digit 7):

from fastai.vision.all import *
targ = tensor([0,1,0,1,1,0])

Let's look at the softmax activations again:

sm_acts

For each image, we want the activation that corresponds to its true label:

idx = range(6)
sm_acts[idx, targ]
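
In plain Python, `sm_acts[idx, targ]` corresponds to picking column `targ[i]` from row `i`. A sketch with three hypothetical rows of softmax outputs:

```python
# Hypothetical softmax outputs for 3 images (columns: "3", "7") and their labels
sm = [[0.60, 0.40],
      [0.50, 0.50],
      [0.13, 0.87]]
targ = [0, 1, 0]

# Equivalent of sm_acts[idx, targ]: for row i, take the entry in column targ[i]
picked = [sm[i][t] for i, t in enumerate(targ)]
print(picked)  # [0.6, 0.5, 0.13]
```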

To make this easier to read, we can put all of this data into a DataFrame and display it:

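import re
import pandas as pd
from IPython.display import HTML, display

# One row per image: softmax outputs for "3" and "7", the label, the row index,
# and the probability picked out for the true label
df = pd.DataFrame(sm_acts, columns=["3","7"])
df['targ'] = targ
df['idx'] = idx
df['result'] = sm_acts[range(6), targ]
t = df.style.hide()   # hide the DataFrame's index column
# Display tweak only: drop the <style> block and the table id from the generated HTML
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))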

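In other words, for each row we use idx and targ to locate one entry in sm_acts and show it. That is exactly what sm_acts[idx, targ] does.
PyTorch also provides a function that does the same thing as sm_acts[range(6), targ], called F.nll_loss.
Despite its name, it does not take the logarithm; it only negates the selected values. It is designed to be applied to log-probabilities (for example the output of log_softmax), which is why it does not take the log itself.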

F.nll_loss(sm_acts, targ, reduction='none')
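
Here is a plain-Python sketch of what F.nll_loss computes for such inputs. The helper `nll_loss` is hypothetical and only mirrors the `reduction` options 'none', 'mean' (PyTorch's default) and 'sum'; it ignores the real function's other parameters such as `weight` and `ignore_index`.

```python
def nll_loss(inputs: list[list[float]], targets: list[int], reduction: str = "mean"):
    """Hypothetical pure-Python mirror of F.nll_loss: negate the entry at each target index."""
    losses = [-row[t] for row, t in zip(inputs, targets)]  # no log is taken here
    if reduction == "none":
        return losses
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")

sm = [[0.60, 0.40], [0.50, 0.50], [0.13, 0.87]]  # hypothetical softmax outputs
targ = [0, 1, 0]
print(nll_loss(sm, targ, reduction="none"))  # [-0.6, -0.5, -0.13]
print(nll_loss(sm, targ))                    # the mean of those values
```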

Taking the log

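Next, we add a loss column to the right of the table: take the log of the result column, then negate it.
Why negate it? Every value in result lies between 0 and 1, so ln(result) is negative; the minus sign turns it into a positive loss that grows as the probability assigned to the correct class shrinks.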
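
A short sketch of how -ln(p) behaves as the probability assigned to the correct class drops (the probabilities are arbitrary examples):

```python
import math

probs = [0.99, 0.9, 0.6, 0.5, 0.1, 0.01]  # probability given to the correct class
losses = [-math.log(p) for p in probs]    # ln(p) < 0 for p in (0, 1), so -ln(p) > 0

for p, loss in zip(probs, losses):
    print(f"p = {p:<4}  ln(p) = {math.log(p):7.3f}  loss = {loss:6.3f}")
```

Each tenfold drop in p adds $\ln 10 \approx 2.303$ to the loss, so a confident wrong prediction costs far more than an uncertain one.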

from IPython.display import HTML
df['loss'] = -torch.log(tensor(df['result']))
t = df.style.hide()
#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))

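Notice that rows 3 and 4 have much larger losses than the others. Why?
What rows 3 and 4 have in common is that the model assigns a high probability to the wrong class.
The -torch.log in our loss therefore penalizes these predictions heavily, which is exactly the signal the model needs during training to correct such mistakes.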

Negative Log Likelihood

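Taking the mean of the loss column gives the negative log likelihood loss, which, when applied to the log of softmax outputs like this, is also called cross-entropy loss.
In short, cross-entropy loss is computed in three steps: pass the activations through softmax, take the negative log of the probability assigned to each item's correct label (the negative log likelihood), and average those losses over all items.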
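
The three steps can be written out from scratch as a plain-Python sketch. The activations are hypothetical, and this naive version only mirrors the description; it is not PyTorch's implementation.

```python
import math

def softmax(row: list[float]) -> list[float]:
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(acts: list[list[float]], targets: list[int]) -> float:
    """Softmax -> negative log of the correct class's probability -> mean over items."""
    losses = []
    for row, t in zip(acts, targets):
        probs = softmax(row)                 # step 1: softmax
        losses.append(-math.log(probs[t]))   # step 2: negative log likelihood
    return sum(losses) / len(losses)         # step 3: mean over all items

acts = [[0.67, 0.26], [0.47, 0.46], [-2.25, -0.37]]  # hypothetical activations
targ = [0, 1, 0]
print(cross_entropy(acts, targ))
```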
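PyTorch already provides this as a class and a function, nn.CrossEntropyLoss() and F.cross_entropy(); the class form is the one more commonly used. Note that both take the raw activations (acts, not sm_acts), because they apply softmax internally, in the form of log_softmax followed by nll_loss.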

loss_func = nn.CrossEntropyLoss()
loss_func(acts, targ)
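
Because nn.CrossEntropyLoss receives raw activations, it can compute the log of the softmax directly instead of calling softmax and then log. The sketch below (plain Python, hypothetical helper names) uses the standard log-sum-exp trick: subtracting the largest activation before exponentiating keeps `math.exp` from overflowing while leaving the result mathematically unchanged.

```python
import math

def log_softmax(row: list[float]) -> list[float]:
    """log(softmax(row)) via log-sum-exp: x_i - (m + log(sum(exp(x_j - m))))."""
    m = max(row)  # shift so the largest exponent becomes exp(0) = 1
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def cross_entropy_from_logits(acts: list[list[float]], targets: list[int]) -> float:
    """Hypothetical mirror of nn.CrossEntropyLoss(): log_softmax, then NLL, then mean."""
    losses = [-log_softmax(row)[t] for row, t in zip(acts, targets)]
    return sum(losses) / len(losses)

big = [[1000.0, 0.0]]  # math.exp(1000.0) on its own would raise OverflowError
print(cross_entropy_from_logits(big, [0]))  # 0.0 (confident and correct)
print(cross_entropy_from_logits(big, [1]))  # 1000.0 (confident and wrong)
```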

If we pass reduction='none' when creating a CrossEntropyLoss, it skips the averaging and returns the loss for each item.

nn.CrossEntropyLoss(reduction='none')(acts, targ)
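
A plain-Python sketch of the `reduction` options, using a hypothetical `cross_entropy` helper (not PyTorch's function) and made-up activations:

```python
import math

def cross_entropy(acts: list[list[float]], targets: list[int], reduction: str = "mean"):
    """Hypothetical pure-Python mirror of the reduction argument of nn.CrossEntropyLoss."""
    losses = []
    for row, t in zip(acts, targets):
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        losses.append(lse - row[t])                            # -log softmax(row)[t]
    if reduction == "none":
        return losses                   # one loss per item
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")

acts = [[0.67, 0.26], [0.47, 0.46], [-2.25, -0.37]]  # hypothetical activations
targ = [0, 1, 0]
per_item = cross_entropy(acts, targ, reduction="none")
print(per_item)
print(cross_entropy(acts, targ))  # the default: the mean of per_item
```

Keeping the per-item losses is handy, for example, to find the images the model gets most wrong.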