Environment
- OS: macOS Mojave
- Python version: 3.7
- PyTorch version: 1.4.0
- IDE: PyCharm
Table of Contents
- 0. Preface
- 1. L1Loss
- 2. SmoothL1Loss
- 3. MSELoss
- 4. BCELoss
- 5. BCEWithLogitsLoss
- 6. CrossEntropyLoss
- 7. NLLLoss
- 8. PoissonNLLLoss
- 9. KLDivLoss
- 10. MarginRankingLoss
- 11. HingeEmbeddingLoss
- 12. MultiLabelMarginLoss
- 13. SoftMarginLoss
- 14. MultiLabelSoftMarginLoss
- 15. CosineEmbeddingLoss
- 16. MultiMarginLoss
- 17. TripletMarginLoss
- 18. CTCLoss
0. Preface
A loss function describes the discrepancy between a model's prediction and the ground truth. Strictly speaking, a loss function is defined for a single sample, whereas a cost function is defined over the whole training set:
- Loss function: $\text{Loss} = f(\hat{y}, y)$
- Cost function: $\text{Cost} = \sum_i^N f(\hat{y}_i, y_i)$ or $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}_i, y_i)$
In practice the two terms are not strictly distinguished. The function that is ultimately optimized is the objective function, i.e. the cost function plus a regularization term:
- Objective function: $\text{Obj} = \text{Cost} + \text{Regularization}$
PyTorch provides classes for 18 commonly used loss functions in the torch.nn module. They are subclasses of torch.nn.Module that override the forward method, which in turn calls functions from torch.nn.functional.
from torch.nn import Module, CrossEntropyLoss
issubclass(CrossEntropyLoss, Module) # True
When instantiating these classes, a reduction argument can be passed:
- 'mean' (the default) computes $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}_i, y_i)$
- 'sum' computes $\text{Cost} = \sum_i^N f(\hat{y}_i, y_i)$
- 'none' computes the per-sample $\text{Loss} = f(\hat{y}, y)$
from torch.nn import L1Loss
l1_loss = L1Loss(reduction='mean')
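The three reduction modes only differ in how the element-wise losses are aggregated; a minimal sketch with L1Loss:
import torch
from torch.nn import L1Loss
inputs = torch.tensor([1., 2., 3.])
target = torch.tensor([2., 2., 5.])
# element-wise losses: |1-2|=1, |2-2|=0, |3-5|=2
print(L1Loss(reduction='none')(inputs, target))  # tensor([1., 0., 2.])
print(L1Loss(reduction='sum')(inputs, target))   # tensor(3.)
print(L1Loss(reduction='mean')(inputs, target))  # tensor(1.)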
Below is a short tour of these loss classes. Some of them I have not used in practice yet, but this should make a handy reference later. ✅
1. L1Loss
The L1Loss class computes the element-wise absolute difference between inputs and target:
$$loss = |\hat{y} - y|$$
import torch
from torch.nn import L1Loss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
l1_loss = L1Loss(reduction='none')
print(l1_loss(inputs, target))
# tensor([3., 3., 3., 9., 1.])
2. SmoothL1Loss
The SmoothL1Loss class is a smoothed version of the L1 loss, computed as
$$loss = \begin{cases} \frac{1}{2}(\hat{y} - y)^2, & \text{if } |\hat{y} - y| < 1 \\ |\hat{y} - y| - \frac{1}{2}, & \text{otherwise} \end{cases}$$
import torch
from torch.nn import SmoothL1Loss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7.6], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
smooth_l1_loss = SmoothL1Loss(reduction='none')
print(smooth_l1_loss(inputs, target))
# tensor([2.5000, 2.5000, 2.5000, 8.5000, 0.0800])
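A quick manual check of the piecewise formula (a sketch using torch.where; with the defaults the threshold is 1):
import torch
inputs = torch.tensor([1, 5, 3, 9, 7.6])
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
diff = torch.abs(inputs - target)
# apply the two branches element-wise
manual = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
print(manual)
# tensor([2.5000, 2.5000, 2.5000, 8.5000, 0.0800])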
3. MSELoss
The MSELoss class computes the element-wise squared difference between inputs and target:
$$loss = (\hat{y} - y)^2$$
import torch
from torch.nn import MSELoss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
l2_loss = MSELoss(reduction='none')
print(l2_loss(inputs, target))
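# tensor([ 9.,  9.,  9., 81.,  1.])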
4. BCELoss
The BCELoss class computes the binary cross-entropy and requires the input values to lie in $[0, 1]$.
import torch
from torch.nn import BCELoss
# create data
inputs = torch.tensor([
[1, 3],
[4, 2]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.float)
binary_crossentropy_loss = BCELoss(
weight=None,
reduction='none'
)
# squash the inputs into (0, 1) with sigmoid
print(binary_crossentropy_loss(torch.sigmoid(inputs), target))
# tensor([[0.3133, 3.0486],
# [0.0181, 2.1269]])
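A quick manual check against the binary cross-entropy formula (a sketch; p is the sigmoid output used above):
import torch
inputs = torch.tensor([[1., 3.], [4., 2.]])
target = torch.tensor([[1., 0.], [1., 0.]])
p = torch.sigmoid(inputs)
# -(y * log(p) + (1 - y) * log(1 - p)), element-wise
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))
print(manual)
# tensor([[0.3133, 3.0486],
#         [0.0181, 2.1269]])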
5. BCEWithLogitsLoss
The BCELoss class requires the inputs to lie in $[0, 1]$, so torch.sigmoid has to be called before computing the binary cross-entropy.
With the BCEWithLogitsLoss class, no extra torch.sigmoid call is needed. The formula is
$$loss = -\left( y \log{\sigma(\hat{y})} + (1 - y) \log{(1 - \sigma(\hat{y}))} \right)$$
import torch
from torch.nn import BCEWithLogitsLoss
# create data
inputs = torch.tensor([
[1, 3],
[4, 2]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.float)
bce_with_logits_loss = BCEWithLogitsLoss(
weight=None,
reduction='none',
    pos_weight=None  # weight for positive samples
)
print(bce_with_logits_loss(inputs, target))
# tensor([[0.3133, 3.0486],
# [0.0181, 2.1269]])
6. CrossEntropyLoss
The CrossEntropyLoss class combines LogSoftmax and NLLLoss to compute the cross-entropy loss. For details, see the post Pytorch详解NLLLoss和CrossEntropyLoss.
import torch
from torch.nn import CrossEntropyLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
cross_entropy_loss = CrossEntropyLoss(
    weight=None,      # per-class loss weights
    ignore_index=-1,  # class index to ignore
reduction='none'
)
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.1269, 0.1269])
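A quick manual check of the LogSoftmax + NLLLoss decomposition (a sketch):
import torch
inputs = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
target = torch.tensor([0, 1, 1])
log_prob = torch.log_softmax(inputs, dim=1)
# take the log-probability of the true class and negate it
manual = -log_prob[torch.arange(len(target)), target]
print(manual)
# tensor([1.3133, 0.1269, 0.1269])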
Passing the weight argument:
import torch
from torch.nn import CrossEntropyLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
# set weight: samples with label 0 get weight 1, samples with label 1 get weight 2
weight = torch.tensor([1, 2], dtype=torch.float)
cross_entropy_loss = CrossEntropyLoss(weight, reduction='none')
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.2539, 0.2539])
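With reduction='none', each sample's loss is simply multiplied by the weight of its target class: the two class-1 samples become 2 × 0.1269 = 0.2539, while the class-0 sample stays at 1 × 1.3133 = 1.3133. (With reduction='mean', the weighted losses are divided by the sum of the weights of the samples involved rather than by N.)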
7. NLLLoss
The NLLLoss class picks out the prediction score corresponding to the true label and negates it. In practice the inputs should be log-probabilities (e.g. the output of LogSoftmax); raw scores are used below just to make the pick-and-negate behaviour visible.
import torch
from torch.nn import NLLLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
nll = NLLLoss(
weight=None,
ignore_index=-1,
reduction='none'
)
print(nll(inputs, target))
# tensor([-1., -3., -3.])
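To make the connection with CrossEntropyLoss explicit, feed log-probabilities into NLLLoss (a minimal sketch):
import torch
from torch.nn import NLLLoss, CrossEntropyLoss
inputs = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
target = torch.tensor([0, 1, 1])
log_prob = torch.log_softmax(inputs, dim=1)
# NLLLoss on log-probabilities reproduces CrossEntropyLoss on raw scores
print(NLLLoss(reduction='none')(log_prob, target))         # tensor([1.3133, 0.1269, 0.1269])
print(CrossEntropyLoss(reduction='none')(inputs, target))  # tensor([1.3133, 0.1269, 0.1269])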
8. PoissonNLLLoss
The PoissonNLLLoss class computes the negative log-likelihood loss for a Poisson-distributed target.
If the argument log_input is True, then
$$loss = e^{\hat{y}} - y \times \hat{y}$$
If log_input is False, then
$$loss = \hat{y} - y \times \log{(\hat{y} + eps)}$$
import torch
from torch.nn import PoissonNLLLoss
# create data
inputs = torch.tensor([
[0.3, 0.7],
[0.6, 0.4]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.long)
poisson_nll_loss = PoissonNLLLoss(
    log_input=True,  # whether the predictions are already in log space
    full=False,      # whether to add the Stirling approximation term; default False
    eps=1e-8,        # small constant to avoid log(0) when log_input=False
    reduction='none'
)
print(poisson_nll_loss(inputs, target))
# tensor([[1.0499, 2.0138],
# [1.2221, 1.4918]])
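A quick manual check of the log_input=True formula (a sketch):
import torch
inputs = torch.tensor([[0.3, 0.7], [0.6, 0.4]])
target = torch.tensor([[1., 0.], [1., 0.]])
# loss = exp(y_hat) - y * y_hat, element-wise
manual = torch.exp(inputs) - target * inputs
print(manual)
# tensor([[1.0499, 2.0138],
#         [1.2221, 1.4918]])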
9. KLDivLoss
The KLDivLoss class computes the KL divergence (relative entropy).
The theoretical definition of relative entropy is
$$D_{KL}(P \,\|\, Q) = E_{x \sim P}\left[\log{\frac{P(x)}{Q(x)}}\right] = E_{x \sim P}\left[\log{P(x)} - \log{Q(x)}\right]$$
However, PyTorch computes
$$loss = y \, (\log{y} - \hat{y})$$
This means that the input $\hat{y}$ must already be given as log-probabilities, which can be produced with LogSoftmax:
import torch
from torch.nn import KLDivLoss, LogSoftmax
# create data
inputs = torch.tensor([
[0.5, 0.3, 0.2],
[0.2, 0.3, 0.5]
])
target = torch.tensor([
[0.9, 0.05, 0.05],
[0.1, 0.7, 0.2]
])
# log-probability
inputs = LogSoftmax(dim=1)(inputs)
# reduction can also be 'batchmean'; in later versions reduction='mean' will behave the same as 'batchmean'
kl_div_loss = KLDivLoss(reduction='none')
print(kl_div_loss(inputs, target))
# tensor([[ 0.7510, -0.0928, -0.0878],
# [-0.1063, 0.5482, -0.1339]])
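A quick manual check of $loss = y(\log{y} - \hat{y})$, where $\hat{y}$ are the log-probabilities computed above (a sketch):
import torch
inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])
log_prob = torch.log_softmax(inputs, dim=1)
# loss = target * (log(target) - log_prob), element-wise
manual = target * (torch.log(target) - log_prob)
print(manual)
# tensor([[ 0.7510, -0.0928, -0.0878],
#         [-0.1063,  0.5482, -0.1339]])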
10. MarginRankingLoss
The MarginRankingLoss class measures the ranking relationship between two sets of scores and is commonly used in ranking tasks.
Formula: $loss = \max\{0, -y \times (\hat{y}_1 - \hat{y}_2) + margin\}$
import torch
from torch.nn import MarginRankingLoss
# create data
y1 = torch.tensor([
[1],
[2],
[3]
], dtype=torch.float)
y2 = torch.tensor([
[2],
[2],
[2]
], dtype=torch.float)
y_true = torch.tensor([1, 1, -1], dtype=torch.float)
margin_ranking_loss = MarginRankingLoss(
    margin=0.0,  # margin, the required gap between \hat{y}_1 and \hat{y}_2
reduction='none'
)
# y1 and y2 have shape (3, 1) while y_true has shape (3,), so broadcasting
# returns a 3 x 3 loss matrix: entry [i][j] is computed from
# y1[i] - y2[i] together with y_true[j].
print(margin_ranking_loss(y1, y2, y_true))
# tensor([[1., 1., 0.],
# [0., 0., 0.],
# [0., 0., 1.]])
11. HingeEmbeddingLoss
The HingeEmbeddingLoss class measures how similar a prediction is to the target and is commonly used for nonlinear embeddings and semi-supervised learning.
Formula:
$$loss = \begin{cases} \hat{y}, & \text{if } y = 1 \\ \max\{0, \Delta - \hat{y}\}, & \text{if } y = -1 \end{cases}$$
import torch
from torch.nn import HingeEmbeddingLoss
# create data
# inputs should be e.g. the absolute difference between two predictions
inputs = torch.tensor([[1., .8, .5]])
target = torch.tensor([[1, 1, -1]])
hinge_embedding_loss = HingeEmbeddingLoss(margin=1.0, reduction='none')
print(hinge_embedding_loss(inputs, target))
# tensor([[1.0000, 0.8000, 0.5000]])
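A quick manual check of the two branches (a sketch using torch.where):
import torch
inputs = torch.tensor([[1., .8, .5]])
target = torch.tensor([[1, 1, -1]])
margin = 1.0
# y = 1: loss = y_hat;  y = -1: loss = max(0, margin - y_hat)
manual = torch.where(target == 1, inputs, torch.clamp(margin - inputs, min=0))
print(manual)
# tensor([[1.0000, 0.8000, 0.5000]])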
12. MultiLabelMarginLoss
The MultiLabelMarginLoss class computes a multi-label margin loss. The formula is
$$loss = \sum_{ij} \frac{\max\{0, 1 - (x[y[j]] - x[i])\}}{\text{x.size}(0)}$$
where $i = 0$ to $\text{x.size}(0)-1$, $j = 0$ to $\text{y.size}(0)-1$, $0 \leq y[j] \leq \text{x.size}(0)-1$, and $i \neq y[j]$ for all $i$ and $j$.
import torch
from torch.nn import MultiLabelMarginLoss
# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
multi_label_margin_loss = MultiLabelMarginLoss(reduction='none')
print(multi_label_margin_loss(inputs, target))
# tensor([0.8500])
Manual calculation
import torch
# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
input_ = inputs[0]
item1 = (1 - (input_[0] - input_[1])) + (1 - (input_[0] - input_[2]))
item2 = (1 - (input_[3] - input_[1])) + (1 - (input_[3] - input_[2]))
print((item1 + item2) / input_.size(0))
# tensor([0.8500])
13. SoftMarginLoss
The SoftMarginLoss class is a two-class logistic loss. The formula is
$$loss = \log{\left(1 + e^{-y \times \hat{y}}\right)}$$
import torch
from torch.nn import SoftMarginLoss
# create data
inputs = torch.tensor([
[0.3, 0.7],
[0.5, 0.5]
])
target = torch.tensor([
[-1, 1],
[1, -1]
], dtype=torch.float)
soft_margin_loss = SoftMarginLoss(reduction='none')
print(soft_margin_loss(inputs, target))
# tensor([[0.8544, 0.4032],
# [0.4741, 0.9741]])
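A quick manual check of the formula (a sketch):
import torch
inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1., 1.], [1., -1.]])
# loss = log(1 + exp(-y * y_hat)), element-wise
manual = torch.log(1 + torch.exp(-target * inputs))
print(manual)
# tensor([[0.8544, 0.4032],
#         [0.4741, 0.9741]])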
14. MultiLabelSoftMarginLoss
The MultiLabelSoftMarginLoss class is the multi-label version of SoftMarginLoss. The formula is
$$loss = -\frac{1}{C} \sum_i \left[ y_i \log{\left(\frac{1}{1 + e^{-\hat{y}_i}}\right)} + (1 - y_i) \log{\left(\frac{e^{-\hat{y}_i}}{1 + e^{-\hat{y}_i}}\right)} \right]$$
where $C$ is the number of labels, $y_i$ is the true value of one label, and $\hat{y}_i$ is the predicted value for that label.
import torch
from torch.nn import MultiLabelSoftMarginLoss
# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)
multi_label_soft_margin_loss = MultiLabelSoftMarginLoss(weight=None, reduction='none')
print(multi_label_soft_margin_loss(inputs, target))
# tensor([0.5429])
Manual calculation
import torch
# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)
C = 3
i_0 = torch.log(torch.exp(-inputs[0][0]) / (1 + torch.exp(-inputs[0][0])))
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0][1])))
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0][2])))
res = -(1 / C) * (i_0 + i_1 + i_2)
print(res)
# tensor([0.5429])
15. CosineEmbeddingLoss
The CosineEmbeddingLoss class uses cosine similarity to measure how similar two inputs are, and is commonly used for nonlinear embeddings and semi-supervised learning. The formula is
$$loss = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max\{0, \cos(x_1, x_2) - margin\}, & \text{if } y = -1 \end{cases}$$
import torch
from torch.nn import CosineEmbeddingLoss
# create data
inputs1 = torch.tensor([
[.3, .5, .7],
[.3, .5, .7]
])
inputs2 = torch.tensor([
[.1, .3, .5],
[.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)
cosine_embedding_loss = CosineEmbeddingLoss(
    margin=0.0,  # margin in [-1, 1]; values in [0, 0.5] are suggested
reduction='none'
)
print(cosine_embedding_loss(inputs1, inputs2, target))
# tensor([0.0167, 0.9833])
Manual calculation
import torch
from torch.nn import CosineEmbeddingLoss
# create data
inputs1 = torch.tensor([
[.3, .5, .7],
[.3, .5, .7]
])
inputs2 = torch.tensor([
[.1, .3, .5],
[.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)
def cosine(a, b):
    numerator = a @ b
    denominator = torch.norm(a, 2) * torch.norm(b, 2)
    return numerator / denominator
res_y_pos = 1 - cosine(inputs1[0], inputs2[0]) # y = 1
res_y_neg = max(0, cosine(inputs1[1], inputs2[1])) # y = -1
print(res_y_pos, res_y_neg)
# tensor(0.0167) tensor(0.9833)
16. MultiMarginLoss
The MultiMarginLoss class computes the hinge loss for multi-class classification. The formula is
$$loss = \frac{1}{C} \sum_i \left( \max\{0, margin - x[y] + x[i]\} \right)^p$$
where $C$ is the number of classes.
import torch
from torch.nn import MultiMarginLoss
# create data
inputs = torch.tensor([
[0.1, 0.2, 0.7],
[0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)
multi_margin_loss = MultiMarginLoss(
    p=1,          # exponent, either 1 or 2
    margin=1.0,   # margin value
    weight=None,  # per-class loss weights
reduction='none'
)
print(multi_margin_loss(inputs, target))
# tensor([0.8000, 0.7000])
Manual calculation
import torch
# create data
inputs = torch.tensor([
[0.1, 0.2, 0.7],
[0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)
# for the first sample
inputs_ = inputs[0]
margin = 1
i_0 = margin - (inputs_[1] - inputs_[0]) # > 0
i_2 = margin - (inputs_[1] - inputs_[2]) # > 0
res = (i_0 + i_2) / inputs_.size(0)
print(res)
# tensor(0.8000)
17. TripletMarginLoss
The TripletMarginLoss class computes the triplet loss, commonly used in face recognition. The formula is
$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + margin, 0\}$$
where $d(x, y) = \|x - y\|_p$.
import torch
from torch.nn import TripletMarginLoss
# create data
anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])
triplet_margin_loss = TripletMarginLoss(
    margin=1.0,  # margin value
    p=2.0,       # order of the norm; default 2
eps=1e-06,
swap=False,
reduction='none'
)
print(triplet_margin_loss(anchor, pos, neg))
# tensor([1.5000])
Manual calculation
import numpy as np
np.sqrt((1. - 2.)**2) - np.sqrt((1. - .5)**2) + 1  # d(a, p) - d(a, n) + margin = 1.5
18. CTCLoss
Computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential (temporal) data.
from torch.nn import CTCLoss
CTCLoss(
blank=0, # blank label
reduction='mean',
    zero_infinity=False  # zero infinite losses and the associated gradients
)
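A minimal usage sketch with random data (the shapes, value ranges, and variable names below are illustrative assumptions): log_probs must be log-probabilities of shape (T, N, C), targets holds the label sequences, and the two length tensors give the valid length of each input and target sequence.
import torch
from torch.nn import CTCLoss
T, N, C, S = 50, 16, 20, 30  # input length, batch size, classes (incl. blank), max target length
log_probs = torch.randn(T, N, C).log_softmax(dim=2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # 0 is reserved for the blank label
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)
ctc_loss = CTCLoss(blank=0, reduction='mean', zero_infinity=False)
print(ctc_loss(log_probs, targets, input_lengths, target_lengths))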
Sequence modeling is something I still need to explore further…