Deep Learning: Computing the Softmax Cross-Entropy Loss

想胖的壮壮

Last modified 2024-06-15 19:55:07

Tags: deep learning, artificial intelligence

First published 2024-06-08 11:26:27

Article link: https://blog.youkuaiyun.com/weixin_47552266/article/details/139544020

Example code

import torch
from torch import nn

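# Multi-class cross-entropy loss with nn.CrossEntropyLoss(), which combines softmax with the loss computation
def test1():
    # Ground truth: either one-hot encoded or given as class indices
    # y_true = torch.tensor([[0, 1, 0], [0, 0, 1]], dtype=torch.float32)
    # Note: class-index targets must be 64-bit integers
    y_true = torch.tensor([1, 2], dtype=torch.int64)
    # Raw scores (logits); CrossEntropyLoss applies softmax internally
    y_pred = torch.tensor([[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]], dtype=torch.float32)
    # Instantiate the cross-entropy loss
    loss = nn.CrossEntropyLoss()
    # Compute the loss
    my_loss = loss(y_pred, y_true).numpy()
    print('loss:', my_loss)


if __name__ == '__main__':
    test1()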

Input data

y_true = torch.tensor([1, 2], dtype=torch.int64)
y_pred = torch.tensor([[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]], dtype=torch.float32)

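y_true: the true labels of the two samples, which belong to class 1 and class 2 respectively (classes are indexed from 0).
y_pred: the raw predicted scores (logits) of the two samples, one value per class for three classes. Although each row happens to sum to 1, nn.CrossEntropyLoss treats these values as unnormalized scores and applies softmax to them itself.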
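
The commented-out line in the example code suggests that the target may also be one-hot encoded. The following is a minimal sketch of that claim, assuming a PyTorch version in which nn.CrossEntropyLoss accepts class-probability targets (1.10 or later, as far as I know). The names y_true_index, y_true_onehot, and loss_fn are illustrative and do not appear in the original code.

```python
import torch
from torch import nn

# Raw scores (logits) for two samples and three classes, as in the example above
y_pred = torch.tensor([[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]], dtype=torch.float32)

# The same targets as class indices (int64) and as one-hot rows (float32)
y_true_index = torch.tensor([1, 2], dtype=torch.int64)
y_true_onehot = torch.tensor([[0, 1, 0], [0, 0, 1]], dtype=torch.float32)

loss_fn = nn.CrossEntropyLoss()  # default reduction is 'mean'
loss_index = loss_fn(y_pred, y_true_index)
loss_onehot = loss_fn(y_pred, y_true_onehot)

# Both target formats should yield the same mean loss
print(loss_index.item(), loss_onehot.item())
```

If the two printed values agree, the choice between the two target formats is a matter of convenience for hard labels.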

Step 1: Softmax transformation

The softmax function converts the raw predicted scores into a probability distribution. Its formula is:

$\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$

For the first sample, y_pred = [0.2, 0.6, 0.2]:

Compute the exponentials:

$e^{0.2} \approx 1.221, \quad e^{0.6} \approx 1.822, \quad e^{0.2} \approx 1.221$

Compute the softmax denominator:

$\sum_{j} e^{x_j} = 1.221 + 1.822 + 1.221 = 4.264$

Divide each exponential by the denominator:

$\text{Softmax}(0.2) = \frac{1.221}{4.264} \approx 0.286$

$\text{Softmax}(0.6) = \frac{1.822}{4.264} \approx 0.427$

$\text{Softmax}(0.2) = \frac{1.221}{4.264} \approx 0.286$

The softmax result is [0.286, 0.427, 0.286].

For the second sample, y_pred = [0.1, 0.8, 0.1]:

Compute the exponentials:

$e^{0.1} \approx 1.105, \quad e^{0.8} \approx 2.225, \quad e^{0.1} \approx 1.105$

Compute the softmax denominator:

$\sum_{j} e^{x_j} = 1.105 + 2.225 + 1.105 = 4.435$

Divide each exponential by the denominator:

$\text{Softmax}(0.1) = \frac{1.105}{4.435} \approx 0.249$

$\text{Softmax}(0.8) = \frac{2.225}{4.435} \approx 0.502$

$\text{Softmax}(0.1) = \frac{1.105}{4.435} \approx 0.249$

The softmax result is [0.249, 0.502, 0.249].
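
The hand calculation above can be reproduced with a few lines of plain Python. This is only a sketch: the helper softmax is a hypothetical function written for illustration, and subtracting the maximum score is a common stability trick that is not part of the original walkthrough (it leaves the result unchanged).

```python
import math

def softmax(scores):
    """Return the softmax of a list of raw scores."""
    # Subtracting the maximum does not change the result but avoids overflow
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs_1 = softmax([0.2, 0.6, 0.2])
probs_2 = softmax([0.1, 0.8, 0.1])
print([round(p, 3) for p in probs_1])  # [0.286, 0.427, 0.286]
print([round(p, 3) for p in probs_2])  # [0.249, 0.502, 0.249]
```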

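Step 2: Compute the cross-entropy loss

The cross-entropy loss of a single sample is:

$\text{CrossEntropyLoss}(p, y) = -\sum_{i=1}^{N} y_i \log(p_i)$

Here the sum runs over the N classes, y is the one-hot encoding of the true label, p is the softmax output, and log is the natural logarithm.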
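
As a short aside, when the target is one-hot with true class c (the symbol c is introduced here for illustration), every term except the true-class term vanishes, and substituting the softmax definition lets the loss be written directly in terms of the raw scores x:

$\text{CrossEntropyLoss} = -\log(p_c) = -\log\left(\frac{e^{x_c}}{\sum_{j} e^{x_j}}\right) = -x_c + \log\left(\sum_{j} e^{x_j}\right)$

This is why only one logarithm per sample has to be evaluated in the calculations that follow.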

For the first sample, the true label is 1 (y_true = 1) and the predicted distribution after softmax is [0.286, 0.427, 0.286]:

$\text{CrossEntropyLoss} = - [0 \cdot \log(0.286) + 1 \cdot \log(0.427) + 0 \cdot \log(0.286)]$

Since $0 \cdot \log(0.286) = 0$, only the true-class term remains:

$-\log(0.427) \approx 0.851$

For the second sample, the true label is 2 (y_true = 2) and the predicted distribution after softmax is [0.249, 0.502, 0.249]:

$\text{CrossEntropyLoss} = - [0 \cdot \log(0.249) + 0 \cdot \log(0.502) + 1 \cdot \log(0.249)]$

Since $0 \cdot \log(0.249) = 0$ and $0 \cdot \log(0.502) = 0$, only the true-class term remains:

$-\log(0.249) \approx 1.390$
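
The two per-sample losses can be checked with a small sketch that takes the rounded probabilities from Step 1 as given. The variable names probs, labels, and losses are illustrative.

```python
import math

# Softmax probabilities rounded to three decimals, taken from Step 1
probs = [[0.286, 0.427, 0.286], [0.249, 0.502, 0.249]]
labels = [1, 2]  # true class index of each sample

# With a one-hot target only the true-class term survives: loss = -log(p[label])
losses = [-math.log(p[label]) for p, label in zip(probs, labels)]
print([round(loss, 3) for loss in losses])
```

Note the minus sign: the logarithm of a probability below 1 is negative, so the loss itself is positive.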

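Step 3: Average the loss

By default, nn.CrossEntropyLoss averages the per-sample losses over the batch:

$\text{mean loss} = \frac{0.851 + 1.390}{2} = \frac{2.241}{2} = 1.1205$

The final cross-entropy loss my_loss is therefore approximately 1.12. The last digits of 1.1205 reflect the three-decimal rounding of the intermediate values used here.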
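
To see the whole calculation without intermediate rounding, here is a minimal end-to-end sketch in plain Python. It assumes the log-sum-exp form of the loss (the log of the summed exponentials minus the true-class score), and cross_entropy is a hypothetical helper written for this illustration, not a PyTorch function.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw scores and a true class index."""
    # -log(softmax(logits)[label]) equals logsumexp(logits) - logits[label]
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return log_sum_exp - logits[label]

y_pred = [[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]]
y_true = [1, 2]

per_sample = [cross_entropy(x, c) for x, c in zip(y_pred, y_true)]
mean_loss = sum(per_sample) / len(per_sample)
print(round(mean_loss, 4))  # 1.1201
```

The full-precision result, about 1.1201, differs from the hand-computed 1.1205 only because the hand calculation rounds every intermediate value to three decimals.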