Environment
- OS: macOS Mojave
- Python version: 3.7
- PyTorch version: 1.4.0
- IDE: PyCharm
Table of Contents
- 0. Preface
- 1. L1Loss
- 2. SmoothL1Loss
- 3. MSELoss
- 4. BCELoss
- 5. BCEWithLogitsLoss
- 6. CrossEntropyLoss
- 7. NLLLoss
- 8. PoissonNLLLoss
- 9. KLDivLoss
- 10. MarginRankingLoss
- 11. HingeEmbeddingLoss
- 12. MultiLabelMarginLoss
- 13. SoftMarginLoss
- 14. MultiLabelSoftMarginLoss
- 15. CosineEmbeddingLoss
- 16. MultiMarginLoss
- 17. TripletMarginLoss
- 18. CTCLoss
0. Preface
A loss function describes the discrepancy between a model's prediction and the ground truth. Strictly speaking, a loss function is defined for a single sample, whereas a cost function is defined over the whole training set:
- Loss function: $\text{Loss} = f(\hat{y}, y)$
- Cost function: $\text{Cost} = \sum_i^N f(\hat{y}_i, y_i)$ or $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}_i, y_i)$
In practice the two terms are not strictly distinguished. The function that is ultimately optimized is the objective function, i.e. the cost function plus a regularization term:
- Objective function: $\text{Obj} = \text{Cost} + \text{Regularization}$
PyTorch provides classes for 18 commonly used loss functions in the torch.nn module. They are subclasses of torch.nn.Module that override the forward method, which in turn calls functions from torch.nn.functional.
from torch.nn import Module, CrossEntropyLoss
issubclass(CrossEntropyLoss, Module) # True
When instantiating these classes, a reduction argument can be passed:
- 'mean' (the default) computes $\text{Cost} = \frac{1}{N} \sum_i^N f(\hat{y}_i, y_i)$
- 'sum' computes $\text{Cost} = \sum_i^N f(\hat{y}_i, y_i)$
- 'none' computes the per-sample $\text{Loss} = f(\hat{y}, y)$
from torch.nn import L1Loss
l1_loss = L1Loss(reduction='mean')
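The three reduction modes only differ in how the element-wise losses are aggregated; a minimal sketch with L1Loss:
import torch
from torch.nn import L1Loss
inputs = torch.tensor([1., 2., 3.])
target = torch.tensor([2., 2., 5.])
# element-wise losses: |1-2|=1, |2-2|=0, |3-5|=2
print(L1Loss(reduction='none')(inputs, target))  # tensor([1., 0., 2.])
print(L1Loss(reduction='sum')(inputs, target))   # tensor(3.)
print(L1Loss(reduction='mean')(inputs, target))  # tensor(1.)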
Below is a short tour of these loss classes. Some of them I have not used in practice yet, but this should make a handy reference later. ✅
1. L1Loss
The L1Loss class computes the element-wise absolute difference between inputs and target:
$$loss = |\hat{y} - y|$$
import torch
from torch.nn import L1Loss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
l1_loss = L1Loss(reduction='none')
print(l1_loss(inputs, target))
# tensor([3., 3., 3., 9., 1.])
2. SmoothL1Loss
The SmoothL1Loss class is a smoothed version of the L1 loss, computed as
$$loss = \begin{cases} \frac{1}{2}(\hat{y} - y)^2, & \text{if } |\hat{y} - y| < 1 \\ |\hat{y} - y| - \frac{1}{2}, & \text{otherwise} \end{cases}$$
import torch
from torch.nn import SmoothL1Loss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7.6], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
smooth_l1_loss = SmoothL1Loss(reduction='none')
print(smooth_l1_loss(inputs, target))
# tensor([2.5000, 2.5000, 2.5000, 8.5000, 0.0800])
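A quick manual check of the piecewise formula (a sketch using torch.where; with the defaults the threshold is 1):
import torch
inputs = torch.tensor([1, 5, 3, 9, 7.6])
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
diff = torch.abs(inputs - target)
# apply the two branches element-wise
manual = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
print(manual)
# tensor([2.5000, 2.5000, 2.5000, 8.5000, 0.0800])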
3. MSELoss
The MSELoss class computes the element-wise squared difference between inputs and target:
$$loss = (\hat{y} - y)^2$$
import torch
from torch.nn import MSELoss
# create data
inputs = torch.tensor([1, 5, 3, 9, 7], dtype=torch.float)
target = torch.tensor([4, 2, 6, 0, 8], dtype=torch.float)
l2_loss = MSELoss(reduction='none')
print(l2_loss(inputs, target))
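# tensor([ 9.,  9.,  9., 81.,  1.])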
4. BCELoss
The BCELoss class computes the binary cross-entropy and requires the input values to lie in $[0, 1]$.
import torch
from torch.nn import BCELoss
# create data
inputs = torch.tensor([
[1, 3],
[4, 2]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.float)
binary_crossentropy_loss = BCELoss(
weight=None,
reduction='none'
)
# squash the inputs into (0, 1) with sigmoid
print(binary_crossentropy_loss(torch.sigmoid(inputs), target))
# tensor([[0.3133, 3.0486],
# [0.0181, 2.1269]])
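A quick manual check against the binary cross-entropy formula (a sketch; p is the sigmoid output used above):
import torch
inputs = torch.tensor([[1., 3.], [4., 2.]])
target = torch.tensor([[1., 0.], [1., 0.]])
p = torch.sigmoid(inputs)
# -(y * log(p) + (1 - y) * log(1 - p)), element-wise
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))
print(manual)
# tensor([[0.3133, 3.0486],
#         [0.0181, 2.1269]])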
5. BCEWithLogitsLoss
The BCELoss class requires the inputs to lie in $[0, 1]$, so torch.sigmoid has to be called before computing the binary cross-entropy.
With the BCEWithLogitsLoss class, no extra torch.sigmoid call is needed. The formula is
$$loss = -\left( y \log{\sigma(\hat{y})} + (1 - y) \log{(1 - \sigma(\hat{y}))} \right)$$
import torch
from torch.nn import BCEWithLogitsLoss
# create data
inputs = torch.tensor([
[1, 3],
[4, 2]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.float)
bce_with_logits_loss = BCEWithLogitsLoss(
weight=None,
reduction='none',
    pos_weight=None  # weight for positive samples
)
print(bce_with_logits_loss(inputs, target))
# tensor([[0.3133, 3.0486],
# [0.0181, 2.1269]])
6. CrossEntropyLoss
The CrossEntropyLoss class combines LogSoftmax and NLLLoss to compute the cross-entropy loss. For details, see the post Pytorch详解NLLLoss和CrossEntropyLoss.
import torch
from torch.nn import CrossEntropyLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
cross_entropy_loss = CrossEntropyLoss(
    weight=None,      # per-class loss weights
    ignore_index=-1,  # class index to ignore
reduction='none'
)
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.1269, 0.1269])
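A quick manual check of the LogSoftmax + NLLLoss decomposition (a sketch):
import torch
inputs = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
target = torch.tensor([0, 1, 1])
log_prob = torch.log_softmax(inputs, dim=1)
# take the log-probability of the true class and negate it
manual = -log_prob[torch.arange(len(target)), target]
print(manual)
# tensor([1.3133, 0.1269, 0.1269])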
Passing the weight argument:
import torch
from torch.nn import CrossEntropyLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
# set weight: samples with label 0 get weight 1, samples with label 1 get weight 2
weight = torch.tensor([1, 2], dtype=torch.float)
cross_entropy_loss = CrossEntropyLoss(weight, reduction='none')
print(cross_entropy_loss(inputs, target))
# tensor([1.3133, 0.2539, 0.2539])
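With reduction='none', each sample's loss is simply multiplied by the weight of its target class: the two class-1 samples become 2 × 0.1269 = 0.2539, while the class-0 sample stays at 1 × 1.3133 = 1.3133. (With reduction='mean', the weighted losses are divided by the sum of the weights of the samples involved rather than by N.)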
7. NLLLoss
The NLLLoss class picks out the prediction score corresponding to the true label and negates it. In practice the inputs should be log-probabilities (e.g. the output of LogSoftmax); raw scores are used below just to make the pick-and-negate behaviour visible.
import torch
from torch.nn import NLLLoss
# create data
inputs = torch.tensor([
[1, 2],
[1, 3],
[1, 3]
], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
nll = NLLLoss(
weight=None,
ignore_index=-1,
reduction='none'
)
print(nll(inputs, target))
# tensor([-1., -3., -3.])
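To make the connection with CrossEntropyLoss explicit, feed log-probabilities into NLLLoss (a minimal sketch):
import torch
from torch.nn import NLLLoss, CrossEntropyLoss
inputs = torch.tensor([[1., 2.], [1., 3.], [1., 3.]])
target = torch.tensor([0, 1, 1])
log_prob = torch.log_softmax(inputs, dim=1)
# NLLLoss on log-probabilities reproduces CrossEntropyLoss on raw scores
print(NLLLoss(reduction='none')(log_prob, target))         # tensor([1.3133, 0.1269, 0.1269])
print(CrossEntropyLoss(reduction='none')(inputs, target))  # tensor([1.3133, 0.1269, 0.1269])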
8. PoissonNLLLoss
The PoissonNLLLoss class computes the negative log-likelihood loss for a Poisson-distributed target.
If the argument log_input is True, then
$$loss = e^{\hat{y}} - y \times \hat{y}$$
If log_input is False, then
$$loss = \hat{y} - y \times \log{(\hat{y} + eps)}$$
import torch
from torch.nn import PoissonNLLLoss
# create data
inputs = torch.tensor([
[0.3, 0.7],
[0.6, 0.4]
], dtype=torch.float)
target = torch.tensor([
[1, 0],
[1, 0]
], dtype=torch.long)
poisson_nll_loss = PoissonNLLLoss(
    log_input=True,  # whether the predictions are already in log space
    full=False,      # whether to add the Stirling approximation term; default False
    eps=1e-8,        # small constant to avoid log(0) when log_input=False
    reduction='none'
)
print(poisson_nll_loss(inputs, target))
# tensor([[1.0499, 2.0138],
# [1.2221, 1.4918]])
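A quick manual check of the log_input=True formula (a sketch):
import torch
inputs = torch.tensor([[0.3, 0.7], [0.6, 0.4]])
target = torch.tensor([[1., 0.], [1., 0.]])
# loss = exp(y_hat) - y * y_hat, element-wise
manual = torch.exp(inputs) - target * inputs
print(manual)
# tensor([[1.0499, 2.0138],
#         [1.2221, 1.4918]])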
9. KLDivLoss
The KLDivLoss class computes the KL divergence (relative entropy).
The theoretical definition of relative entropy is
$$D_{KL}(P \,\|\, Q) = E_{x \sim P}\left[\log{\frac{P(x)}{Q(x)}}\right] = E_{x \sim P}\left[\log{P(x)} - \log{Q(x)}\right]$$
However, PyTorch computes
$$loss = y \, (\log{y} - \hat{y})$$
This means that the input $\hat{y}$ must already be given as log-probabilities, which can be produced with LogSoftmax:
import torch
from torch.nn import KLDivLoss, LogSoftmax
# create data
inputs = torch.tensor([
[0.5, 0.3, 0.2],
[0.2, 0.3, 0.5]
])
target = torch.tensor([
[0.9, 0.05, 0.05],
[0.1, 0.7, 0.2]
])
# log-probability
inputs = LogSoftmax(dim=1)(inputs)
# reduction can also be 'batchmean'; in later versions reduction='mean' will behave the same as 'batchmean'
kl_div_loss = KLDivLoss(reduction='none')
print(kl_div_loss(inputs, target))
# tensor([[ 0.7510, -0.0928, -0.0878],
# [-0.1063, 0.5482, -0.1339]])
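A quick manual check of $loss = y(\log{y} - \hat{y})$, where $\hat{y}$ are the log-probabilities computed above (a sketch):
import torch
inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])
log_prob = torch.log_softmax(inputs, dim=1)
# loss = target * (log(target) - log_prob), element-wise
manual = target * (torch.log(target) - log_prob)
print(manual)
# tensor([[ 0.7510, -0.0928, -0.0878],
#         [-0.1063,  0.5482, -0.1339]])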
10. MarginRankingLoss
The MarginRankingLoss class measures the ranking relationship between two sets of scores and is commonly used in ranking tasks.
Formula: $loss = \max\{0, -y \times (\hat{y}_1 - \hat{y}_2) + margin\}$
import torch
from torch.nn import MarginRankingLoss
# create data
y1 = torch.tensor([
[1],
[2],
[3]
], dtype=torch.float)
y2 = torch.tensor([
[2],
[2],
[2]
], dtype=torch.float)
y_true = torch.tensor([1, 1, -1], dtype=torch.float)
margin_ranking_loss = MarginRankingLoss(
    margin=0.0,  # margin, the required gap between \hat{y}_1 and \hat{y}_2
reduction='none'
)
# y1 and y2 have shape (3, 1) while y_true has shape (3,), so broadcasting
# returns a 3 x 3 loss matrix: entry [i][j] is computed from
# y1[i] - y2[i] together with y_true[j].
print(margin_ranking_loss(y1, y2, y_true))
# tensor([[1., 1., 0.],
# [0., 0., 0.],
# [0., 0., 1.]])
11. HingeEmbeddingLoss
The HingeEmbeddingLoss class measures how similar a prediction is to the target and is commonly used for nonlinear embeddings and semi-supervised learning.
Formula:
$$loss = \begin{cases} \hat{y}, & \text{if } y = 1 \\ \max\{0, \Delta - \hat{y}\}, & \text{if } y = -1 \end{cases}$$
import torch
from torch.nn import HingeEmbeddingLoss
# create data
# inputs should be e.g. the absolute difference between two predictions
inputs = torch.tensor([[1., .8, .5]])
target = torch.tensor([[1, 1, -1]])
hinge_embedding_loss = HingeEmbeddingLoss(margin=1.0, reduction='none')
print(hinge_embedding_loss(inputs, target))
# tensor([[1.0000, 0.8000, 0.5000]])
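A quick manual check of the two branches (a sketch using torch.where):
import torch
inputs = torch.tensor([[1., .8, .5]])
target = torch.tensor([[1, 1, -1]])
margin = 1.0
# y = 1: loss = y_hat;  y = -1: loss = max(0, margin - y_hat)
manual = torch.where(target == 1, inputs, torch.clamp(margin - inputs, min=0))
print(manual)
# tensor([[1.0000, 0.8000, 0.5000]])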
12. MultiLabelMarginLoss
The MultiLabelMarginLoss class computes a multi-label margin loss. The formula is
$$loss = \sum_{ij} \frac{\max\{0, 1 - (x[y[j]] - x[i])\}}{\text{x.size}(0)}$$
where $i = 0$ to $\text{x.size}(0)-1$, $j = 0$ to $\text{y.size}(0)-1$, $0 \leq y[j] \leq \text{x.size}(0)-1$, and $i \neq y[j]$ for all $i$ and $j$.
import torch
from torch.nn import MultiLabelMarginLoss
# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
multi_label_margin_loss = MultiLabelMarginLoss(reduction='none')
print(multi_label_margin_loss(inputs, target))
# tensor([0.8500])
Manual calculation
import torch
# create data
inputs = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
target = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)
input_ = inputs[0]
item1 = (1 - (input_[0] - input_[1])) + (1 - (input_[0] - input_[2]))
item2 = (1 - (input_[3] - input_[1])) + (1 - (input_[3] - input_[2]))
print((item1 + item2) / input_.size(0))
# tensor([0.8500])
13. SoftMarginLoss
The SoftMarginLoss class is a two-class logistic loss. The formula is
$$loss = \log{\left(1 + e^{-y \times \hat{y}}\right)}$$
import torch
from torch.nn import SoftMarginLoss
# create data
inputs = torch.tensor([
[0.3, 0.7],
[0.5, 0.5]
])
target = torch.tensor([
[-1, 1],
[1, -1]
], dtype=torch.float)
soft_margin_loss = SoftMarginLoss(reduction='none')
print(soft_margin_loss(inputs, target))
# tensor([[0.8544, 0.4032],
# [0.4741, 0.9741]])
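A quick manual check of the formula (a sketch):
import torch
inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1., 1.], [1., -1.]])
# loss = log(1 + exp(-y * y_hat)), element-wise
manual = torch.log(1 + torch.exp(-target * inputs))
print(manual)
# tensor([[0.8544, 0.4032],
#         [0.4741, 0.9741]])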
14. MultiLabelSoftMarginLoss
The MultiLabelSoftMarginLoss class is the multi-label version of SoftMarginLoss. The formula is
$$loss = -\frac{1}{C} \sum_i \left[ y_i \log{\left(\frac{1}{1 + e^{-\hat{y}_i}}\right)} + (1 - y_i) \log{\left(\frac{e^{-\hat{y}_i}}{1 + e^{-\hat{y}_i}}\right)} \right]$$
where $C$ is the number of labels, $y_i$ is the true value of one label, and $\hat{y}_i$ is the predicted value for that label.
import torch
from torch.nn import MultiLabelSoftMarginLoss
# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)
multi_label_soft_margin_loss = MultiLabelSoftMarginLoss(weight=None, reduction='none')
print(multi_label_soft_margin_loss(inputs, target))
# tensor([0.5429])
Manual calculation
import torch
# create data
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)
C = 3
i_0 = torch.log(torch.exp(-inputs[0][0]) / (1 + torch.exp(-inputs[0][0])))
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0][1])))
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0][2])))
res = -(1 / C) * (i_0 + i_1 + i_2)
print(res)
# tensor([0.5429])
15. CosineEmbeddingLoss
The CosineEmbeddingLoss class uses cosine similarity to measure how similar two inputs are, and is commonly used for nonlinear embeddings and semi-supervised learning. The formula is
$$loss = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max\{0, \cos(x_1, x_2) - margin\}, & \text{if } y = -1 \end{cases}$$
import torch
from torch.nn import CosineEmbeddingLoss
# create data
inputs1 = torch.tensor([
[.3, .5, .7],
[.3, .5, .7]
])
inputs2 = torch.tensor([
[.1, .3, .5],
[.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)
cosine_embedding_loss = CosineEmbeddingLoss(
    margin=0.0,  # margin in [-1, 1]; values in [0, 0.5] are suggested
reduction='none'
)
print(cosine_embedding_loss(inputs1, inputs2, target))
# tensor([0.0167, 0.9833])
Manual calculation
import torch
from torch.nn import CosineEmbeddingLoss
# create data
inputs1 = torch.tensor([
[.3, .5, .7],
[.3, .5, .7]
])
inputs2 = torch.tensor([
[.1, .3, .5],
[.1, .3, .5]
])
target = torch.tensor([1, -1], dtype=torch.float)
def cosine(a, b):
    numerator = a @ b
    denominator = torch.norm(a, 2) * torch.norm(b, 2)
    return numerator / denominator
res_y_pos = 1 - cosine(inputs1[0], inputs2[0]) # y = 1
res_y_neg = max(0, cosine(inputs1[1], inputs2[1])) # y = -1
print(res_y_pos, res_y_neg)
# tensor(0.0167) tensor(0.9833)
16. MultiMarginLoss
The MultiMarginLoss class computes the hinge loss for multi-class classification. The formula is
$$loss = \frac{1}{C} \sum_i \left( \max\{0, margin - x[y] + x[i]\} \right)^p$$
where $C$ is the number of classes.
import torch
from torch.nn import MultiMarginLoss
# create data
inputs = torch.tensor([
[0.1, 0.2, 0.7],
[0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)
multi_margin_loss = MultiMarginLoss(
    p=1,          # exponent, either 1 or 2
    margin=1.0,   # margin value
    weight=None,  # per-class loss weights
reduction='none'
)
print(multi_margin_loss(inputs, target))
# tensor([0.8000, 0.7000])
Manual calculation
import torch
# create data
inputs = torch.tensor([
[0.1, 0.2, 0.7],
[0.2, 0.5, 0.3]
])
target = torch.tensor([1, 2], dtype=torch.long)
# for the first sample
inputs_ = inputs[0]
margin = 1
i_0 = margin - (inputs_[1] - inputs_[0]) # > 0
i_2 = margin - (inputs_[1] - inputs_[2]) # > 0
res = (i_0 + i_2) / inputs_.size(0)
print(res)
# tensor(0.8000)
17. TripletMarginLoss
The TripletMarginLoss class computes the triplet loss, commonly used in face recognition. The formula is
$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + margin, 0\}$$
where $d(x, y) = \|x - y\|_p$.
import torch
from torch.nn import TripletMarginLoss
# create data
anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])
triplet_margin_loss = TripletMarginLoss(
    margin=1.0,  # margin value
    p=2.0,       # order of the norm; default 2
eps=1e-06,
swap=False,
reduction='none'
)
print(triplet_margin_loss(anchor, pos, neg))
# tensor([1.5000])
Manual calculation
import numpy as np
np.sqrt((1. - 2.)**2) - np.sqrt((1. - .5)**2) + 1  # d(a, p) - d(a, n) + margin = 1.5
18. CTCLoss
Computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential (temporal) data.
from torch.nn import CTCLoss
CTCLoss(
blank=0, # blank label
reduction='mean',
    zero_infinity=False  # zero infinite losses and the associated gradients
)
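A minimal usage sketch with random data (the shapes, value ranges, and variable names below are illustrative assumptions): log_probs must be log-probabilities of shape (T, N, C), targets holds the label sequences, and the two length tensors give the valid length of each input and target sequence.
import torch
from torch.nn import CTCLoss
T, N, C, S = 50, 16, 20, 30  # input length, batch size, classes (incl. blank), max target length
log_probs = torch.randn(T, N, C).log_softmax(dim=2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # 0 is reserved for the blank label
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)
ctc_loss = CTCLoss(blank=0, reduction='mean', zero_infinity=False)
print(ctc_loss(log_probs, targets, input_lengths, target_lengths))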
Sequence modeling is something I still need to explore further…