I. Loss Functions
1. Cross Entropy Loss
Pixel-wise cross entropy is the most commonly used loss function in image segmentation. It examines each pixel individually, comparing the class prediction (softmax or sigmoid output) against the one-hot target vector.
1.1 Cross Entropy
Consider two probability distributions p and q over a sample set, where p is the true distribution, e.g. [1, 0, 0] means the current text belongs to the first class, and q is the fitted distribution, e.g. [0.7, 0.2, 0.1] for the same text.
The expected code length needed to identify a sample when encoding according to the true distribution p, i.e. the average code length, is the entropy:
H(p) = -\sum_{i=1}^{C} p(x_i)\log(p(x_i))
If instead the fitted distribution q is used to encode samples drawn from the true distribution p, the expected average code length is the cross entropy:
H(p,q) = -\sum_{i=1}^{C} p(x_i)\log(q(x_i))
By Gibbs' inequality, H(p,q) \geq H(p) always holds, with equality exactly when q is the true distribution. The number of extra bits needed on average when encoding with q instead of p is called the relative entropy, also known as the KL divergence:
D(p \| q) = H(p,q) - H(p) = \sum_{i=1}^{C} p(x_i)\log\left(\frac{p(x_i)}{q(x_i)}\right)
In classification problems in machine learning, we want to shrink the gap between the model's predictions and the labels, i.e. make the KL divergence as small as possible. Since the H(p) term of the KL divergence is constant here (not necessarily so in other problems), it suffices to minimize the cross entropy during optimization, which is why cross entropy is commonly used as the loss function.
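As a quick numerical check of these identities (using an illustrative non-degenerate p rather than the one-hot example above, so that every p(x_i)log term is well defined):

```python
import math

p = [0.5, 0.3, 0.2]   # "true" distribution (illustrative values)
q = [0.7, 0.2, 0.1]   # model's fitted distribution

H_p  = -sum(pi * math.log(pi) for pi in p)                  # entropy H(p)
H_pq = -sum(pi * math.log(qi) for pi, qi in zip(p, q))      # cross entropy H(p,q)
KL   = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # D(p||q)

assert H_pq >= H_p                      # Gibbs' inequality
assert abs(KL - (H_pq - H_p)) < 1e-12   # D(p||q) = H(p,q) - H(p)
```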
1.2 Binary Cross Entropy
A binary classification network applies a Sigmoid activation to its final output, producing a single channel. The binary cross entropy loss is:
L = -(y\log(p) + (1-y)\log(1-p))
where y is the label (1 for a positive sample, 0 for a negative sample) and p is the predicted probability that the sample is positive.
The PyTorch class for computing binary cross entropy is torch.nn.BCELoss(). Definition and usage:
torch.nn.BCELoss(
    weight=None,
    size_average=None,
    reduce=None,
    reduction='mean'
)
# Example usage
import torch
import torch.nn as nn
model = nn.Conv2d(1, 1, 3, padding=1)
criterion = nn.BCELoss(reduction='mean')
x = torch.randn(1, 1, 16, 16)
y = torch.randint(0, 2, size=(1, 1, 16, 16)).type(torch.FloatTensor)
preds = nn.Sigmoid()(model(x))  # BCELoss expects probabilities, so apply Sigmoid first
loss = criterion(preds, y)
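In practice, torch.nn.BCEWithLogitsLoss fuses the Sigmoid and BCELoss steps into a single, more numerically stable operation (via the log-sum-exp trick), so the model can output raw logits directly. A minimal sketch:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)
criterion = nn.BCEWithLogitsLoss(reduction='mean')  # sigmoid applied internally
x = torch.randn(1, 1, 16, 16)
y = torch.randint(0, 2, size=(1, 1, 16, 16)).float()
logits = model(x)               # raw scores; no explicit Sigmoid needed
loss = criterion(logits, y)

# Matches applying Sigmoid then BCELoss, but stays stable for large |logits|
ref = nn.BCELoss(reduction='mean')(torch.sigmoid(logits), y)
assert torch.allclose(loss, ref, atol=1e-5)
```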
1.3 Multi-class Cross Entropy
The cross entropy loss for a multi-class task is defined as:
L = -\sum_{i=1}^{C} y_i\log(p_i)
where p = [p_0, ..., p_{C-1}] is a probability distribution, each element p_i giving the probability that the sample belongs to class i, and y = [y_0, ..., y_{C-1}] is the sample's one-hot label: y_i = 1 if the sample belongs to class i, otherwise y_i = 0.
PyTorch provides two classes for computing cross entropy: torch.nn.CrossEntropyLoss() and torch.nn.NLLLoss().
torch.nn.CrossEntropyLoss() takes the network's raw (pre-softmax) output. Let z = [z_0, ..., z_{C-1}] denote this raw output; then the loss is:
L(z,c) = -y_c\log\left(\frac{\exp(z[c])}{\sum_{j=0}^{C-1}\exp(z[j])}\right) = -y_c z[c] + y_c\log\left(\sum_{j=0}^{C-1}\exp(z[j])\right)
If weight is specified, then
L(z,c) = w y_c\left(-z[c] + \log\left(\sum_{j=0}^{C-1}\exp(z[j])\right)\right)
where
w = weight[c] \cdot 1\{c \neq \text{ignore\_index}\}
Definition and usage:
torch.nn.CrossEntropyLoss(
    weight=None,
    ignore_index=-100,
    reduction="mean",
)
# Example usage
import torch
import torch.nn as nn
model = nn.Conv2d(1, 3, 3, padding=1)
criterion = nn.CrossEntropyLoss(weight=None,
                                ignore_index=-100,
                                reduction='mean')
x = torch.randn(1, 1, 16, 16)
y = torch.randint(0, 3, size=(1, 16, 16))  # PyTorch labels are per-pixel class indices, so y.shape = [B, H, W]
logits = model(x)
loss = criterion(logits, y)
torch.nn.NLLLoss() takes the network's log-softmax output, i.e. the log-probability of each class for every sample. Let a = [a_0, ..., a_{C-1}] denote these log-probabilities; then the loss is:
L(a,c) = -w \cdot y_c a[c] = -w \cdot y_c\log(p_c)
where p_c is the softmax probability of class c (so a[c] = \log(p_c)), and
w = weight[c] \cdot 1\{c \neq \text{ignore\_index}\}
Definition and usage:
torch.nn.NLLLoss(
    weight=None,
    ignore_index=-100,
    reduction="mean",
)
# Example usage
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Conv2d(1, 3, 3, padding=1),
    nn.LogSoftmax(dim=1)
)
criterion = nn.NLLLoss(weight=None,
                       ignore_index=-100,
                       reduction='mean')
x = torch.randn(1, 1, 16, 16)
y = torch.randint(0, 3, size=(1, 16, 16))
preds = model(x)  # log-probabilities, shape (1, 3, 16, 16)
loss = criterion(preds, y)
For reference, the sigmoid function is S(x) = \frac{1}{1+e^{-x}}; the softmax function is S(x_i) = \frac{e^{x_i}}{\sum_{j=0}^{C-1} e^{x_j}};
and log_softmax is LogS(x_i) = \log(S(x_i)) = \log\left(\frac{e^{x_i}}{\sum_{j=0}^{C-1} e^{x_j}}\right), which speeds up computation and keeps it numerically stable, preventing overflow.
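The relationship between the two classes can be checked directly: CrossEntropyLoss gives the same result as LogSoftmax followed by NLLLoss (a minimal sketch):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)           # 4 samples, 3 classes (raw scores)
target = torch.randint(0, 3, (4,))

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), target)
assert torch.allclose(ce, nll, atol=1e-6)  # identical up to floating point
```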
2. Weighted Cross Entropy Loss
The weighted cross entropy loss is defined as:
L = -\sum_{i=0}^{C-1} w_i y_i\log(p_i)
This adds a per-class weight w_i = \frac{N - N_i}{N} on top of the cross entropy loss, where N is the total number of pixels in the dataset and N_i is the number of pixels of class i. Classes with fewer samples thus receive larger weights, so the network learns the sample distribution in a balanced way, effectively addressing the problem of training on scarce classes when samples are imbalanced.
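A sketch of passing such class weights to CrossEntropyLoss (the pixel counts here are illustrative, not from a real dataset):

```python
import torch
import torch.nn as nn

# Suppose the dataset has class pixel counts N_i = [900, 80, 20], N = 1000
counts = torch.tensor([900.0, 80.0, 20.0])
N = counts.sum()
w = (N - counts) / N                        # w_i = (N - N_i) / N
criterion = nn.CrossEntropyLoss(weight=w)   # rarer classes get larger weights

logits = torch.randn(1, 3, 16, 16)
y = torch.randint(0, 3, size=(1, 16, 16))
loss = criterion(logits, y)
```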
3. Focal Loss
Kaiming He's team introduced Focal Loss in the RetinaNet paper to address the imbalance between easy and hard samples. One-stage object detectors typically generate on the order of 10k candidate boxes, of which only very few are positives, so positive and negative samples are highly imbalanced.
Binary Focal Loss:
L = \begin{cases} -\alpha(1-p)^\gamma\log(p) & \text{if } y=1 \\ -(1-\alpha)p^\gamma\log(1-p) & \text{if } y=0 \end{cases}
Multi-class Focal Loss:
L = \sum_{c=0}^{C-1} -\alpha_c(1-p_c)^\gamma\log(p_c)
PyTorch code:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    def __init__(self, gamma=0, alpha=None, reduction=True):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        if isinstance(alpha, (float, int)):
            self.alpha = torch.Tensor([alpha, 1 - alpha])
        if isinstance(alpha, list):
            self.alpha = torch.Tensor(alpha)
        self.reduction = reduction

    def forward(self, input, target):
        if input.dim() > 2:
            input = input.view(input.size(0), input.size(1), -1)  # N,C,H,W => N,C,H*W
            input = input.transpose(1, 2)                         # N,C,H*W => N,H*W,C
            input = input.contiguous().view(-1, input.size(2))    # N,H*W,C => N*H*W,C
        target = target.view(-1, 1)
        logpt = F.log_softmax(input, dim=1)
        logpt = logpt.gather(1, target)  # log-probability of each pixel's true class
        logpt = logpt.view(-1)
        pt = logpt.exp()                 # recover p_t from log(p_t)
        if self.alpha is not None:
            if self.alpha.type() != input.data.type():
                self.alpha = self.alpha.type_as(input.data)
            at = self.alpha.gather(0, target.data.view(-1))
            logpt = logpt * at
        loss = -1 * (1 - pt) ** self.gamma * logpt
        if self.reduction:
            return loss.mean()
        else:
            return loss.sum()

if __name__ == '__main__':
    model = nn.Sequential(
        nn.Conv2d(1, 3, 3, padding=1),
    )
    criterion = FocalLoss(gamma=2, alpha=[0.2, 0.3, 0.5], reduction=True)
    x = torch.randn(1, 1, 16, 16)
    y = torch.randint(0, 3, size=(1, 16, 16))
    preds = model(x)
    loss1 = criterion(preds, y)
4. Dice Loss
The Dice coefficient is a set-similarity measure, usually used to compute the similarity between two samples (range [0, 1]):
s = \frac{2|X \cap Y| + 1}{|X| + |Y| + 1}
Dice Loss:
L = 1 - \frac{2|X \cap Y|}{|X| + |Y|}
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceLoss(nn.Module):
    def __init__(self):
        super(DiceLoss, self).__init__()
        self.epsilon = 1e-5

    def forward(self, predict, target):
        assert predict.size() == target.size(), "the size of predict and target must be equal."
        num = predict.size(0)
        pre = torch.sigmoid(predict).view(num, -1)
        tar = target.view(num, -1)
        intersection = (pre * tar).sum(-1).sum()  # element-wise product of prediction and label acts as the intersection
        union = (pre + tar).sum(-1).sum()
        score = 1 - 2 * (intersection + self.epsilon) / (union + self.epsilon)
        return score
5. IoU Loss
L = 1 - \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|}
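A minimal soft IoU loss sketch in the same style as the DiceLoss above (the element-wise product serves as the intersection; the epsilon smoothing term is an assumption, not from the original):

```python
import torch
import torch.nn as nn

class IoULoss(nn.Module):
    def __init__(self, epsilon=1e-5):
        super(IoULoss, self).__init__()
        self.epsilon = epsilon

    def forward(self, predict, target):
        num = predict.size(0)
        pre = torch.sigmoid(predict).view(num, -1)
        tar = target.view(num, -1)
        intersection = (pre * tar).sum(-1)          # |X ∩ Y|
        union = (pre + tar).sum(-1) - intersection  # |X| + |Y| - |X ∩ Y|
        iou = (intersection + self.epsilon) / (union + self.epsilon)
        return 1 - iou.mean()

predict = torch.randn(2, 1, 16, 16)
target = torch.randint(0, 2, (2, 1, 16, 16)).float()
loss = IoULoss()(predict, target)
```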
6. Tversky Loss
The Tversky coefficient is a generalization of both the Dice coefficient and the Jaccard coefficient: with α = β = 0.5 it reduces to the Dice coefficient, and with α = β = 1 it becomes the Jaccard coefficient. α and β control the false negatives and false positives respectively; adjusting them trades one off against the other.
T(X,Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha|X - Y| + \beta|Y - X|}
L(X,Y) = 1 - \frac{1 + p\hat p}{1 + p\hat p + \beta(1-p)\hat p + (1-\beta)p(1-\hat p)}
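A sketch of a binary Tversky loss following the set formulation above, with α and β as hyperparameters (the smoothing constant and the sigmoid on raw predictions are assumptions, mirroring the DiceLoss style):

```python
import torch
import torch.nn as nn

class TverskyLoss(nn.Module):
    def __init__(self, alpha=0.5, beta=0.5, smooth=1.0):
        super(TverskyLoss, self).__init__()
        self.alpha, self.beta, self.smooth = alpha, beta, smooth

    def forward(self, predict, target):
        num = predict.size(0)
        p = torch.sigmoid(predict).view(num, -1)
        t = target.view(num, -1)
        tp = (p * t).sum(-1)            # |X ∩ Y|
        xy = (p * (1 - t)).sum(-1)      # |X - Y|
        yx = ((1 - p) * t).sum(-1)      # |Y - X|
        tversky = (tp + self.smooth) / (tp + self.alpha * xy + self.beta * yx + self.smooth)
        return (1 - tversky).mean()

# With alpha = beta = 0.5 this reduces to the Dice loss
loss = TverskyLoss(alpha=0.7, beta=0.3)(torch.randn(2, 1, 16, 16),
                                        torch.randint(0, 2, (2, 1, 16, 16)).float())
```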
II. Evaluation Metrics
1. Execution time
2. Memory footprint
3. Pixel Accuracy
The proportion of correctly labeled (segmented) pixels among all pixels.
PA = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}}
4. Mean Pixel Accuracy
The proportion of correctly classified pixels within each class, summed over classes and then averaged:
mPA = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}
5. Mean IoU
The ratio of the intersection to the union of the ground-truth and predicted sets, computed per class and averaged. This ratio can be rewritten as TP (the intersection) over the sum of TP, FP, and FN (the union), i.e. IoU = TP / (TP + FP + FN).
mIOU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}
6. Frequency Weighted Intersection over Union (FWIoU)
Weights each class's IoU by the class's frequency of occurrence.
FWIOU = \frac{1}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}}\sum_{i=0}^{k}\frac{p_{ii}\sum_{j=0}^{k} p_{ij}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}
Python code
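The code itself is missing from the original; the following is a sketch computing PA, mPA, mIoU, and FWIoU from a confusion matrix p, where p[i][j] counts pixels of true class i predicted as class j. (Here k denotes the number of classes, so it plays the role of k+1 in the formulas above.)

```python
def segmentation_metrics(p):
    """Compute PA, mPA, mIoU, FWIoU from a k x k confusion matrix p."""
    k = len(p)
    total = sum(sum(row) for row in p)                             # Σ_i Σ_j p_ij
    row_sum = [sum(p[i]) for i in range(k)]                        # Σ_j p_ij per true class
    col_sum = [sum(p[i][j] for i in range(k)) for j in range(k)]   # Σ_i p_ij per predicted class
    diag = [p[i][i] for i in range(k)]                             # correctly classified pixels

    pa = sum(diag) / total
    mpa = sum(diag[i] / row_sum[i] for i in range(k)) / k
    iou = [diag[i] / (row_sum[i] + col_sum[i] - diag[i]) for i in range(k)]
    miou = sum(iou) / k
    fwiou = sum(row_sum[i] * iou[i] for i in range(k)) / total
    return pa, mpa, miou, fwiou

# A perfect prediction yields 1.0 for all four metrics
pa, mpa, miou, fwiou = segmentation_metrics([[5, 0], [0, 3]])
```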