BERT Series, Part 2: Several Loss Functions

License: CC 4.0 BY-SA

Tags: #loss-function #cross-entropy #loss #mse

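This post walks through several common loss functions: L1 loss (MAE), MSE (L2 loss), NLLLoss (negative log likelihood loss), and cross entropy. Using formulas and worked examples, it shows where each one applies, explains why cross entropy matters for multi-class classification, and shows how NLLLoss relates to cross entropy.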

L1 Loss (MAE)

$Loss(x,y)=\frac{1}{N} \sum_{i}( |x_i-y_i|)$

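Put simply, this is the absolute difference between two vectors, taken element by element. The default is to average over the elements; setting reduction='sum' returns the sum instead.
It is also known as mean absolute error (MAE).

For example: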

import torch
import torch.nn as nn

# L1 loss (MAE); the default reduction is 'mean'
loss = nn.L1Loss()
predict_ar = [0, 2, 3]
target_ar = [2, 2, 3]
input = torch.FloatTensor(predict_ar)
input.requires_grad = True
target = torch.FloatTensor(target_ar)
output = loss(input, target)
print(output)

The result is (2 + 0 + 0) / 3 = 0.667.
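To see the two reduction modes side by side, here is a minimal pure-Python sketch of the same calculation. The helper name `l1_loss` and its `reduction` argument are my own, chosen to mirror the PyTorch option; this is not the library implementation.

```python
def l1_loss(pred, target, reduction="mean"):
    """Hypothetical helper: L1 loss over two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Element-wise absolute errors |x_i - y_i|
    abs_errors = [abs(p - t) for p, t in zip(pred, target)]
    total = sum(abs_errors)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(abs_errors)
    raise ValueError(f"unknown reduction: {reduction}")


print(l1_loss([0, 2, 3], [2, 2, 3]))         # mean: 2/3
print(l1_loss([0, 2, 3], [2, 2, 3], "sum"))  # sum: 2
```

Only the first element differs (by 2), so the sum is 2 and the mean is 2/3.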

MSE (L2 Loss)

This one is also easy to understand: it is the mean squared error, $Loss(x,y)=\frac{1}{N} \sum_{i}(x_i-y_i)^2$.

import torch
import torch.nn as nn

# MSE (L2) loss; the default reduction is 'mean'
loss = nn.MSELoss()
predict_ar = [0, 2, 3]
target_ar = [2, 2, 3]

input = torch.FloatTensor(predict_ar)
input.requires_grad = True
target = torch.FloatTensor(target_ar)

output = loss(input, target)
print(output)

The result should be (4 + 0 + 0) / 3 = 4/3 = 1.333.
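The same numbers can be checked by hand. The sketch below is pure Python with a helper name (`mse_loss`) of my own; it is an illustration of the formula, not how PyTorch implements it.

```python
def mse_loss(pred, target, reduction="mean"):
    """Hypothetical helper: squared-error loss over two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Element-wise squared errors (x_i - y_i)^2
    sq_errors = [(p - t) ** 2 for p, t in zip(pred, target)]
    total = sum(sq_errors)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(sq_errors)
    raise ValueError(f"unknown reduction: {reduction}")


print(mse_loss([0, 2, 3], [2, 2, 3]))         # mean: 4/3
print(mse_loss([0, 2, 3], [2, 2, 3], "sum"))  # sum: 4
```

The single error of 2 contributes 2 to the L1 sum but 4 here, which is why MSE penalizes large errors more heavily than MAE.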

NLLLoss = negative log likelihood loss

https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss

The negative log likelihood loss. It is useful to train a classification problem with C classes.

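This loss is a little harder to grasp. Its goal is to measure, in a classification problem, how far the prediction is from the true label; it just involves a log and a negation.

So how is it computed?

loss(input, class) = -input[class]

You read that right, it is that simple.
With input=[-1.233, 2.657, 0.534] and a true label of 2 (class=2, zero-based), the loss is -0.534.

NLLLoss itself takes no log; it only picks out the entry at the target index and negates it. That is why, in practice, input is almost always passed through log_softmax before being fed to NLLLoss.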
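The difference between feeding raw scores and feeding log-probabilities is easy to see on that same vector. This is a pure-Python sketch; `log_softmax` and `nll` are helper names of my own that imitate the behavior described above for a single sample.

```python
import math


def log_softmax(scores):
    """Log of the softmax, computed stably by subtracting the max score first."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum_exp for s in scores]


def nll(values, target_class):
    """Pick the entry at the target index and negate it."""
    return -values[target_class]


raw = [-1.233, 2.657, 0.534]
print(nll(raw, 2))  # -0.534: raw scores can produce a negative "loss"

log_probs = log_softmax(raw)
print(nll(log_probs, 2))  # positive, because every log-probability is <= 0
```

After log_softmax the entries are logs of probabilities that sum to 1, so the negated value is a proper negative log-likelihood and can never drop below zero.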

Here is an example:

import torch
import torch.nn as nn

def nll_loss():
    # input = torch.rand(3, 3)
    input = torch.FloatTensor([[0.8065, 0.5920, 0.1936],
                               [0.0985, 0.5246, 0.6277],
                               [0.7537, 0.4928, 0.3823]])

    # Each row is one prediction over three classes.
    # If the true classes are [0, 2, 1], how is NLLLoss computed?

    # Preprocess with log-softmax
    ls = nn.LogSoftmax(dim=1)  # normalize across each row
    input = ls(input)
    target = torch.tensor([0, 2, 1])

    # By hand: negate the log-probability at each target index, then average
    loss_manual = (0.8539 + 0.9127 + 1.1611) / 3

    print("input:", input)
    print("target:", target)

    print("loss_manual:", loss_manual)
    # By machine
    loss_f = nn.NLLLoss()
    loss_value = loss_f(input, target)
    print("loss_value:", loss_value.numpy())

nll_loss()

The code is the best illustration.
Output:

input: tensor([[-0.8539, -1.0684, -1.4668],
        [-1.4419, -1.0158, -0.9127],
        [-0.9002, -1.1611, -1.2716]])
target: tensor([0, 2, 1])
loss_manual: 0.9758999999999999
loss_value: 0.97590446
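The three numbers in the manual calculation come from row 0 column 0, row 1 column 2, and row 2 column 1 of the log-softmax output. The sketch below reproduces the whole computation in pure Python without PyTorch; `log_softmax` and `nll_loss_mean` are helper names of my own.

```python
import math


def log_softmax(row):
    """Log of the softmax of one row, computed stably."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]


def nll_loss_mean(log_prob_rows, targets):
    """Negate the log-probability at each target index, then average."""
    picked = [-row[t] for row, t in zip(log_prob_rows, targets)]
    return sum(picked) / len(picked)


logits = [[0.8065, 0.5920, 0.1936],
          [0.0985, 0.5246, 0.6277],
          [0.7537, 0.4928, 0.3823]]
targets = [0, 2, 1]

log_probs = [log_softmax(row) for row in logits]
loss = nll_loss_mean(log_probs, targets)
print(round(loss, 4))  # 0.9759
```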

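Cross entropy

Formula: $-\sum_{\forall x} p(x) \log(q(x))$

The form used in most computations, averaged over $N$ samples with a one-hot label $\mathbf{y_i}$ and a predicted probability vector $\mathbf{\hat{y}_i}$:
$-\frac{1}{N}\left(\sum_{i=1}^{N} \mathbf{y_i} \cdot \log(\mathbf{\hat{y}_i})\right)$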
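A minimal sketch of the averaged one-hot form, written with the leading minus sign so that the loss is non-negative. The probabilities below are made-up illustrative values, and `cross_entropy` is a helper name of my own.

```python
import math


def cross_entropy(y_true, y_pred):
    """Mean over samples of -sum_k y[k] * log(y_hat[k])."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        # Skip zero entries of y: they contribute nothing to the dot product
        total -= sum(yk * math.log(pk) for yk, pk in zip(y, y_hat) if yk > 0)
    return total / len(y_true)


# Two samples, three classes; labels are one-hot (illustrative values only)
y_true = [[1, 0, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

loss = cross_entropy(y_true, y_pred)
print(loss)  # equals -(log 0.7 + log 0.6) / 2
```

With one-hot labels the dot product keeps only the log-probability of the true class, which is exactly the "pick one entry and negate it" step of NLLLoss.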

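Why can cross entropy be used to evaluate multi-class classification?

A is the true distribution of the data: p(x)
B is the distribution the model learned from the training data: q(x)
Cross entropy expresses the difference between the two distributions well, so minimizing it fits the model's distribution to the data.
For more detail, see: https://www.zhihu.com/question/65288314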
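One way to make this precise is the standard decomposition of cross entropy into entropy plus KL divergence. The symbols $H$ and $D_{KL}$ are notation added here; they do not appear in the formulas above.

$H(p,q) = -\sum_{\forall x} p(x) \log(q(x)) = -\sum_{\forall x} p(x) \log(p(x)) + \sum_{\forall x} p(x) \log \frac{p(x)}{q(x)} = H(p) + D_{KL}(p \| q)$

$H(p)$ depends only on the data, so minimizing the cross entropy over $q$ is the same as minimizing $D_{KL}(p \| q)$, which is zero exactly when $q = p$.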

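Thinking again about NLLLoss above

nn.CrossEntropyLoss passes the input through softmax and then computes the cross entropy loss against target. In other words, it combines nn.LogSoftmax() and nn.NLLLoss(). Strictly speaking, the part that actually computes the cross entropy is nn.NLLLoss().

Look closely at the formula from the official documentation below: there is a softmax in it, there is a log, and there is also the NLL step (x[class]).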
$\operatorname{loss}(x, \text {class})=-\log \left(\frac{\exp (x[\operatorname{class}])}{\sum_{j} \exp (x[j])}\right)$
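Expanding the logarithm makes the pieces explicit. This is only a rewriting of the formula above, with $c$ standing in for class as shorthand of my own:

$-\log \left(\frac{\exp (x[c])}{\sum_{j} \exp (x[j])}\right) = -x[c] + \log \sum_{j} \exp (x[j])$

For the first row of the example, $x = [0.8065, 0.5920, 0.1936]$ and $c = 0$: $\log \sum_{j} \exp (x[j]) \approx 1.6604$, so the loss is $-0.8065 + 1.6604 = 0.8539$, the first term of the manual calculation.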

Let's verify this with the example above:

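import torch
import torch.nn as nn

def nll_loss():
    # input = torch.rand(3, 3)
    raw_input = torch.FloatTensor([[0.8065, 0.5920, 0.1936],
                                   [0.0985, 0.5246, 0.6277],
                                   [0.7537, 0.4928, 0.3823]])

    # Each row is one prediction over three classes.
    # If the true classes are [0, 2, 1], how is NLLLoss computed?

    # Preprocess with log-softmax
    ls = nn.LogSoftmax(dim=1)  # normalize across each row
    input = ls(raw_input)
    target = torch.tensor([0, 2, 1])

    # By hand: negate the log-probability at each target index, then average
    loss_manual = (0.8539 + 0.9127 + 1.1611) / 3

    print("input:", input)
    print("target:", target)

    print("loss_manual:", loss_manual)
    # By machine: NLLLoss on the log-softmax output
    loss_f = nn.NLLLoss()
    loss_value = loss_f(input, target)
    print("loss_value:", loss_value.numpy())

    # CrossEntropyLoss takes the raw scores and applies log-softmax itself
    loss_f_crossentropy = nn.CrossEntropyLoss()
    loss_ce = loss_f_crossentropy(raw_input, target)
    print("loss_ce:", loss_ce.numpy())

nll_loss()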

Output:

input: tensor([[-0.8539, -1.0684, -1.4668],
        [-1.4419, -1.0158, -0.9127],
        [-0.9002, -1.1611, -1.2716]])
target: tensor([0, 2, 1])
loss_manual: 0.9758999999999999
loss_value: 0.97590446
loss_ce: 0.97590446

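Summary

A loss measures the distance between the prediction and the ground truth. The goal is to minimize it so that the fitted distribution becomes more accurate. Overfitting is outside the scope of this post.
Different losses measure the distance between two distributions in different ways, and a well-chosen loss gives a better fit.
L1 and L2 are very easy to understand.
NLLLoss minimizes the negative log-likelihood and is essentially the same thing as cross entropy. nn.CrossEntropyLoss adds a log-softmax step on top, which turns the raw scores into probabilities.
Seen this way, cross entropy covers most needs. NLLLoss is rarely used on its own, since CrossEntropyLoss already includes both the log-softmax and the per-class selection.
$\operatorname{loss}(x, \text {class})=-\log \left(\frac{\exp (x[\operatorname{class}])}{\sum_{j} \exp (x[j])}\right)$ is the core formula.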