@Created: 2021.10.19
1. Loss Functions
This article lists the loss functions commonly used in PyTorch (usually called through torch.nn) and describes each one's purpose, mathematical formula, and calling code. Of course, PyTorch offers far more loss functions than these; when solving real problems you may need to explore further, borrow from existing work, or design your own loss function.
1.1 Binary Cross-Entropy Loss
torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
\ell(x, y)=\begin{cases}\operatorname{mean}(L), & \text{if reduction = 'mean'} \\ \operatorname{sum}(L), & \text{if reduction = 'sum'}\end{cases}

where L=\{l_{1},\dots,l_{N}\} and l_{n}=-w_{n}\left[y_{n}\cdot\log x_{n}+(1-y_{n})\cdot\log(1-x_{n})\right].
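A minimal usage sketch (the tensors here are random placeholders); note that BCELoss expects probabilities in [0, 1], so a sigmoid is applied to the raw scores first:

```python
import torch
import torch.nn as nn

m = nn.Sigmoid()                               # BCELoss expects inputs in [0, 1]
loss_fn = nn.BCELoss()
scores = torch.randn(3, requires_grad=True)    # raw scores for 3 samples
target = torch.empty(3).random_(2)             # binary labels {0, 1} as floats
loss = loss_fn(m(scores), target)
loss.backward()
```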
1.2 Cross-Entropy Loss
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
\operatorname{loss}(x, \text{class})=-\log\left(\frac{\exp(x[\text{class}])}{\sum_{j}\exp(x[j])}\right)=-x[\text{class}]+\log\left(\sum_{j}\exp(x[j])\right)
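A minimal sketch (shapes are illustrative): CrossEntropyLoss takes raw, unnormalized logits and class indices, applying log-softmax internally:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(3, 5, requires_grad=True)          # 3 samples, 5 classes, raw scores
target = torch.empty(3, dtype=torch.long).random_(5)    # class indices in [0, 5)
loss = loss_fn(logits, target)
loss.backward()
```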
1.3 L1 Loss
torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')
L_{n} = \left\| x_{n}-y_{n}\right\|_{1}
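A minimal sketch with placeholder tensors:

```python
import torch
import torch.nn as nn

loss_fn = nn.L1Loss()
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = loss_fn(pred, target)     # mean absolute error over all elements
loss.backward()
```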
1.4 Smooth L1 Loss
torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)
\operatorname{loss}(x, y)=\frac{1}{n}\sum_{i=1}^{n} z_{i}

where

z_{i}=\begin{cases}0.5\,(x_{i}-y_{i})^{2}/\text{beta}, & \text{if } |x_{i}-y_{i}|<\text{beta} \\ |x_{i}-y_{i}|-0.5\,\text{beta}, & \text{otherwise}\end{cases}
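A minimal sketch (random tensors as placeholder data); the loss is quadratic for errors below beta and linear above it:

```python
import torch
import torch.nn as nn

loss_fn = nn.SmoothL1Loss(beta=1.0)
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = loss_fn(pred, target)
loss.backward()
```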
1.5 MSE Loss
torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
l_{n}=\left(x_{n}-y_{n}\right)^{2}
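A minimal sketch with placeholder tensors:

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = loss_fn(pred, target)     # mean squared error over all elements
loss.backward()
```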
1.6 Negative Log-Likelihood Loss for a Poisson-Distributed Target
torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')
When log_input=True:

\operatorname{loss}\left(x_{n}, y_{n}\right)=e^{x_{n}}-x_{n} \cdot y_{n}
When log_input=False:

\operatorname{loss}\left(x_{n}, y_{n}\right)=x_{n}-y_{n} \cdot \log\left(x_{n}+\text{eps}\right)
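A minimal sketch (random tensors as placeholders); with the default log_input=True, the input is interpreted as the log of the Poisson rate:

```python
import torch
import torch.nn as nn

loss_fn = nn.PoissonNLLLoss()                     # log_input=True: input is log(lambda)
log_rate = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)                        # illustrative; real targets are counts
loss = loss_fn(log_rate, target)
loss.backward()
```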
1.7 KL Divergence
torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)
D_{\mathrm{KL}}(P, Q)=\mathrm{E}_{X \sim P}\left[\log \frac{P(X)}{Q(X)}\right]=\mathrm{E}_{X \sim P}[\log P(X)-\log Q(X)]=\sum_{i=1}^{n} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right)
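A minimal sketch (random distributions as placeholders): the input is expected to be log-probabilities and, with log_target=False, the target plain probabilities; reduction='batchmean' matches the mathematical definition above:

```python
import torch
import torch.nn as nn

loss_fn = nn.KLDivLoss(reduction='batchmean', log_target=False)
input = torch.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)  # log Q
target = torch.softmax(torch.randn(3, 5), dim=1)                         # P
loss = loss_fn(input, target)
loss.backward()
```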
1.8 MarginRankingLoss
torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')
\operatorname{loss}(x_{1}, x_{2}, y)=\max\left(0, -y \cdot (x_{1}-x_{2})+\text{margin}\right)
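A minimal sketch with placeholder scores; y = 1 means x1 should be ranked higher than x2, and y = -1 the opposite:

```python
import torch
import torch.nn as nn

loss_fn = nn.MarginRankingLoss(margin=0.5)
x1 = torch.randn(4, requires_grad=True)
x2 = torch.randn(4, requires_grad=True)
y = torch.tensor([1., -1., 1., -1.])   # ranking direction for each pair
loss = loss_fn(x1, x2, y)
loss.backward()
```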
1.9 Multi-Label Margin Loss
torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')
\operatorname{loss}(x, y)=\sum_{ij} \frac{\max\left(0,\,1-\left(x[y[j]]-x[i]\right)\right)}{x.\operatorname{size}(0)}
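A minimal sketch: the target holds class indices and is padded with -1 after the last valid label, so here only classes 3 and 0 are treated as positive:

```python
import torch
import torch.nn as nn

loss_fn = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])   # scores for 4 classes
y = torch.tensor([[3, 0, -1, 1]])          # labels 3 and 0; -1 terminates the label list
loss = loss_fn(x, y)
```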
1.10 Binary Classification Loss (SoftMarginLoss)
torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')
\operatorname{loss}(x, y)=\sum_{i} \frac{\log\left(1+\exp(-y[i] \cdot x[i])\right)}{x.\operatorname{nelement}()}
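A minimal sketch with placeholder scores; the targets must be +1 or -1:

```python
import torch
import torch.nn as nn

loss_fn = nn.SoftMarginLoss()
input = torch.randn(4, requires_grad=True)
target = torch.tensor([1., -1., 1., -1.])   # labels in {+1, -1}
loss = loss_fn(input, target)
loss.backward()
```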
1.11 Multi-Class Hinge Loss
torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')
\operatorname{loss}(x, y)=\frac{\sum_{i} \max\left(0,\, \text{margin}-x[y]+x[i]\right)^{p}}{x.\operatorname{size}(0)}

where the sum runs over i \neq y.
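A minimal sketch (random scores as placeholders) for a 5-class problem:

```python
import torch
import torch.nn as nn

loss_fn = nn.MultiMarginLoss(p=1, margin=1.0)
scores = torch.randn(3, 5, requires_grad=True)   # 3 samples, 5 classes
target = torch.tensor([1, 0, 4])                 # correct class index per sample
loss = loss_fn(scores, target)
loss.backward()
```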
1.12 Triplet Loss
torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')
L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text{margin},\, 0\right\}

where d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p}.
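A minimal sketch using random embeddings as placeholders (the batch size and embedding dimension are arbitrary):

```python
import torch
import torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(10, 128, requires_grad=True)     # 10 embeddings of dim 128
positive = torch.randn(10, 128, requires_grad=True)   # same class as the anchor
negative = torch.randn(10, 128, requires_grad=True)   # different class
loss = loss_fn(anchor, positive, negative)
loss.backward()
```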
1.13 HingeEmbeddingLoss
torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')
l_{n}=\begin{cases}x_{n}, & \text{if } y_{n}=1 \\ \max\left\{0, \Delta-x_{n}\right\}, & \text{if } y_{n}=-1\end{cases}
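A minimal sketch; the inputs are typically distances between pairs (the values below are illustrative), and the target marks each pair as similar (+1) or dissimilar (-1):

```python
import torch
import torch.nn as nn

loss_fn = nn.HingeEmbeddingLoss(margin=1.0)
distances = torch.randn(4, requires_grad=True).abs()  # placeholder pairwise distances
target = torch.tensor([1., -1., 1., -1.])             # 1: similar pair, -1: dissimilar pair
loss = loss_fn(distances, target)
loss.backward()
```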
1.14 Cosine Similarity Loss
torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')
\operatorname{loss}(x, y)=\begin{cases}1-\cos\left(x_{1}, x_{2}\right), & \text{if } y=1 \\ \max\left\{0, \cos\left(x_{1}, x_{2}\right)-\text{margin}\right\}, & \text{if } y=-1\end{cases}
where

\cos(\theta)=\frac{A \cdot B}{\|A\|\|B\|}=\frac{\sum_{i=1}^{n} A_{i} \times B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}} \times \sqrt{\sum_{i=1}^{n} B_{i}^{2}}}
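A minimal sketch with random embeddings as placeholders; y = 1 means the pair should be similar, y = -1 dissimilar:

```python
import torch
import torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss(margin=0.0)
x1 = torch.randn(3, 64, requires_grad=True)   # 3 embeddings of dim 64
x2 = torch.randn(3, 64, requires_grad=True)
y = torch.tensor([1., -1., 1.])               # desired similarity per pair
loss = loss_fn(x1, x2, y)
loss.backward()
```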
1.15 CTC Loss
torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
Purpose: classification of sequential (time-series) data.
It computes the loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of all possible alignments between the input and the target, producing a loss value that is differentiable with respect to each input node. The alignment between input and target is assumed to be "many-to-one", which constrains the target sequence length to be ≤ the input length.
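A minimal sketch in the style of the PyTorch documentation; the dimensions T, N, C, S and the length ranges are illustrative:

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 16, 20, 30   # input length, batch size, classes (incl. blank), max target length
loss_fn = nn.CTCLoss(blank=0)
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=10, high=S, size=(N,), dtype=torch.long)
loss = loss_fn(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```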
2. Optimizers
PyTorch conveniently provides an optimizer library, torch.optim, which includes the eleven optimizers listed below.
- torch.optim.ASGD
- torch.optim.Adadelta
- torch.optim.Adagrad
- torch.optim.Adam
- torch.optim.AdamW
- torch.optim.Adamax
- torch.optim.LBFGS
- torch.optim.RMSprop
- torch.optim.Rprop
- torch.optim.SGD
- torch.optim.SparseAdam
All of the optimization algorithms above inherit from Optimizer, the base class of every optimizer, which is defined as follows:
from collections import defaultdict

class Optimizer(object):
    def __init__(self, params, defaults):
        self.defaults = defaults            # default hyperparameters of the optimizer
        self.state = defaultdict(dict)      # per-parameter state cache
        self.param_groups = []              # list of parameter groups (each a dict)
Optimizer has three attributes:
- defaults: stores the optimizer's default hyperparameters;
- state: a cache of per-parameter state;
- param_groups: the parameter groups being managed, a list in which each element is a dict with the keys params, lr, momentum, dampening, weight_decay, nesterov.
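A minimal training-step sketch (the model, data, and hyperparameters are placeholders) showing the zero_grad / backward / step pattern shared by all of the optimizers above:

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 10)                 # a dummy batch of 8 samples
y = torch.randint(0, 2, (8,))          # dummy class labels

optimizer.zero_grad()                  # clear gradients accumulated from the previous step
loss = criterion(model(x), y)
loss.backward()                        # compute gradients
optimizer.step()                       # update parameters with the chosen update rule
print(optimizer.param_groups[0].keys())  # params, lr, momentum, dampening, weight_decay, nesterov, ...
```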