Loss functions (loss) and optimizers (optimizer) in PyTorch

@Created: 2021.10.19

1. Loss functions

This post lists the loss functions commonly used in PyTorch (generally called through torch.nn) and, for each one, describes its purpose, mathematical formula, and calling code. PyTorch offers far more loss functions than these; when solving real problems you will often need to explore further, borrow from existing work, or design your own loss function.

1.1 Binary cross-entropy loss (BCELoss)

torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')

$$\ell(x, y)=\begin{cases}\operatorname{mean}(L), & \text{if reduction}=\text{'mean'}\\\operatorname{sum}(L), & \text{if reduction}=\text{'sum'}\end{cases}$$
where $L=\{l_1,\dots,l_N\}^{\top}$ and $l_n=-w_n\left[y_n\log x_n+(1-y_n)\log(1-x_n)\right]$.
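A minimal usage sketch (the tensor shapes and label values are illustrative assumptions; BCELoss expects probabilities, so a sigmoid is applied to the raw scores first):

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
logits = torch.randn(4, requires_grad=True)   # raw scores for 4 samples
probs = torch.sigmoid(logits)                 # BCELoss expects values in (0, 1)
target = torch.tensor([0., 1., 1., 0.])       # binary labels as floats
loss = criterion(probs, target)
loss.backward()
```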

1.2 Cross-entropy loss (CrossEntropyLoss)

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

$$\operatorname{loss}(x, \text{class})=-\log\left(\frac{\exp(x[\text{class}])}{\sum_{j}\exp(x[j])}\right)=-x[\text{class}]+\log\left(\sum_{j}\exp(x[j])\right)$$
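A minimal usage sketch (shapes and labels are illustrative assumptions; the input should be raw, unnormalized scores since the softmax is applied inside the loss):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(3, 5, requires_grad=True)   # 3 samples, 5 classes; no softmax needed
target = torch.tensor([1, 0, 4])                 # ground-truth class indices
loss = criterion(logits, target)
```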

1.3 L1 loss (L1Loss)

torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

$$L_{n}=\left\|x_{n}-y_{n}\right\|_{1}$$
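A minimal usage sketch (shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()
pred = torch.randn(3, 2, requires_grad=True)
target = torch.randn(3, 2)
loss = criterion(pred, target)    # mean absolute error over all elements
```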

1.4 Smooth L1 loss (SmoothL1Loss)

torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)

$$\operatorname{loss}(x, y)=\frac{1}{n}\sum_{i=1}^{n} z_{i}$$

where, with the default $\beta = 1.0$,
$$z_{i}=\begin{cases}0.5\,(x_{i}-y_{i})^{2}/\beta, & \text{if }\left|x_{i}-y_{i}\right|<\beta\\\left|x_{i}-y_{i}\right|-0.5\,\beta, & \text{otherwise}\end{cases}$$
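A minimal usage sketch (shapes are illustrative assumptions; the loss is quadratic for small errors and linear for large ones):

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss(beta=1.0)
pred = torch.randn(3, 2, requires_grad=True)
target = torch.randn(3, 2)
loss = criterion(pred, target)
```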

1.5 MSE loss (MSELoss)

`torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`

$$l_{n}=\left(x_{n}-y_{n}\right)^{2}$$
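A minimal usage sketch (shapes are illustrative assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
pred = torch.randn(3, 2, requires_grad=True)
target = torch.randn(3, 2)
loss = criterion(pred, target)    # mean squared error over all elements
```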

1.6 Negative log-likelihood loss with a Poisson-distributed target (PoissonNLLLoss)

torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')

When the parameter log_input=True:
$$\operatorname{loss}\left(x_{n}, y_{n}\right)=e^{x_{n}}-x_{n}\cdot y_{n}$$
When the parameter log_input=False:
$$\operatorname{loss}\left(x_{n}, y_{n}\right)=x_{n}-y_{n}\cdot\log\left(x_{n}+\text{eps}\right)$$
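A minimal usage sketch (shapes and targets are illustrative assumptions; with log_input=True the prediction is interpreted as the log of the Poisson rate):

```python
import torch
import torch.nn as nn

criterion = nn.PoissonNLLLoss(log_input=True)
log_rate = torch.randn(5, requires_grad=True)   # predicted log of the Poisson rate
counts = torch.poisson(torch.rand(5) * 4)       # non-negative count targets
loss = criterion(log_rate, counts)
```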

1.7 KL divergence (KLDivLoss)

torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)

$$\begin{aligned}D_{\mathrm{KL}}(P, Q)=\mathrm{E}_{X\sim P}\left[\log\frac{P(X)}{Q(X)}\right]&=\mathrm{E}_{X\sim P}[\log P(X)-\log Q(X)]\\&=\sum_{i=1}^{n} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right)\end{aligned}$$
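A minimal usage sketch (shapes are illustrative assumptions; with the default log_target=False the prediction must be log-probabilities and the target plain probabilities):

```python
import torch
import torch.nn as nn

criterion = nn.KLDivLoss(reduction='batchmean')                            # 'batchmean' matches the math above
log_q = torch.log_softmax(torch.randn(2, 5, requires_grad=True), dim=1)   # predictions as log-probabilities
p = torch.softmax(torch.randn(2, 5), dim=1)                               # target distribution as probabilities
loss = criterion(log_q, p)
```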

1.8 MarginRankingLoss

torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

$$\operatorname{loss}(x_{1}, x_{2}, y)=\max(0,\ -y\cdot(x_{1}-x_{2})+\text{margin})$$
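A minimal usage sketch (shapes and labels are illustrative assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.MarginRankingLoss(margin=0.5)
x1 = torch.randn(4, requires_grad=True)   # scores of the first item in each pair
x2 = torch.randn(4, requires_grad=True)   # scores of the second item in each pair
y = torch.tensor([1., -1., 1., -1.])      # y=1: x1 should rank higher; y=-1: x2 should
loss = criterion(x1, x2, y)
```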

1.9 Multi-label margin loss (MultiLabelMarginLoss)

torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')

$$\operatorname{loss}(x, y)=\sum_{ij}\frac{\max\left(0,\ 1-\left(x[y[j]]-x[i]\right)\right)}{x.\operatorname{size}(0)}$$
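A minimal usage sketch (shapes and labels are illustrative assumptions; the target has the same shape as the input and lists class indices padded with -1):

```python
import torch
import torch.nn as nn

criterion = nn.MultiLabelMarginLoss()
x = torch.randn(2, 4, requires_grad=True)   # scores for 4 classes, 2 samples
y = torch.tensor([[3, 0, -1, -1],           # target class indices per sample,
                  [1, 2, 3, -1]])           # padded with -1 after the last valid label
loss = criterion(x, y)
```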

1.10 Binary classification loss (SoftMarginLoss)

torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')

$$\operatorname{loss}(x, y)=\sum_{i}\frac{\log\left(1+\exp(-y[i]\cdot x[i])\right)}{x.\operatorname{nelement}()}$$
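A minimal usage sketch (shapes are illustrative assumptions; labels take values in {1, -1}):

```python
import torch
import torch.nn as nn

criterion = nn.SoftMarginLoss()
x = torch.randn(5, requires_grad=True)
y = torch.tensor([1., -1., 1., 1., -1.])   # labels must be 1 or -1
loss = criterion(x, y)
```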

1.11 Multi-class hinge loss (MultiMarginLoss)

torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')

$$\operatorname{loss}(x, y)=\frac{\sum_{i}\max(0,\ \text{margin}-x[y]+x[i])^{p}}{x.\operatorname{size}(0)}$$
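A minimal usage sketch (shapes and labels are illustrative assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.randn(3, 5, requires_grad=True)   # scores for 5 classes
y = torch.tensor([4, 0, 2])                 # correct class index for each sample
loss = criterion(x, y)
```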

1.12 Triplet loss (TripletMarginLoss)

torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')

$$L(a, p, n)=\max\left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text{margin},\ 0\right\}$$
where $d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p}$.
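A minimal usage sketch (the batch size and embedding dimension are illustrative assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.TripletMarginLoss(margin=1.0, p=2)
anchor   = torch.randn(8, 128, requires_grad=True)   # anchor embeddings
positive = torch.randn(8, 128, requires_grad=True)   # should be close to the anchor
negative = torch.randn(8, 128, requires_grad=True)   # should be far from the anchor
loss = criterion(anchor, positive, negative)
```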

1.13 HingeEmbeddingLoss

torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')

$$l_{n}=\begin{cases}x_{n}, & \text{if } y_{n}=1\\\max\{0,\ \Delta-x_{n}\}, & \text{if } y_{n}=-1\end{cases}$$
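A minimal usage sketch (the input values and labels are illustrative assumptions; the input is typically a distance between two embeddings):

```python
import torch
import torch.nn as nn

criterion = nn.HingeEmbeddingLoss(margin=1.0)
distances = torch.rand(4, requires_grad=True)   # e.g. pairwise distances between embeddings
y = torch.tensor([1., -1., -1., 1.])            # 1 = similar pair, -1 = dissimilar pair
loss = criterion(distances, y)
```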

1.14 Cosine similarity loss (CosineEmbeddingLoss)

torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

$$\operatorname{loss}(x, y)=\begin{cases}1-\cos\left(x_{1}, x_{2}\right), & \text{if } y=1\\\max\left\{0,\ \cos\left(x_{1}, x_{2}\right)-\text{margin}\right\}, & \text{if } y=-1\end{cases}$$
where
$$\cos(\theta)=\frac{A\cdot B}{\|A\|\,\|B\|}=\frac{\sum_{i=1}^{n} A_{i}\times B_{i}}{\sqrt{\sum_{i=1}^{n}\left(A_{i}\right)^{2}}\times\sqrt{\sum_{i=1}^{n}\left(B_{i}\right)^{2}}}$$
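A minimal usage sketch (shapes and labels are illustrative assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.CosineEmbeddingLoss(margin=0.0)
x1 = torch.randn(4, 16, requires_grad=True)
x2 = torch.randn(4, 16, requires_grad=True)
y = torch.tensor([1., -1., 1., -1.])   # 1 = pair should be similar, -1 = dissimilar
loss = criterion(x1, x2, y)
```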

1.15 CTC loss (CTCLoss)

torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

Purpose: classification of sequential (time-series) data.

It computes the loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probabilities of the possible alignments between input and target, producing a loss value that is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which constrains the target sequence length so that it must be ≤ the input length.
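A minimal usage sketch (the sequence lengths, batch size, and class count are illustrative assumptions; the input must be log-probabilities of shape (T, N, C)):

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 2, 20, 10                                      # input length, batch, classes (incl. blank=0), target length
criterion = nn.CTCLoss(blank=0)
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)        # target sequences, no blank labels
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
```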

2. Optimizers

PyTorch conveniently provides an optimizer library, torch.optim, which contains the following optimizers:

  • torch.optim.ASGD
  • torch.optim.Adadelta
  • torch.optim.Adagrad
  • torch.optim.Adam
  • torch.optim.AdamW
  • torch.optim.Adamax
  • torch.optim.LBFGS
  • torch.optim.RMSprop
  • torch.optim.Rprop
  • torch.optim.SGD
  • torch.optim.SparseAdam

All of the optimization algorithms above inherit from Optimizer, the base class of every optimizer. It is defined (in simplified form) as follows:

from collections import defaultdict

class Optimizer(object):
    def __init__(self, params, defaults):
        self.defaults = defaults            # hyperparameters shared by all parameter groups
        self.state = defaultdict(dict)      # per-parameter state, e.g. momentum buffers
        self.param_groups = []              # list of dicts, one per parameter group

Optimizer has three attributes:
defaults: stores the optimizer's hyperparameters;
state: caches per-parameter state;
param_groups: the parameter groups being managed, a list in which each element is a dict with keys such as params, lr, momentum, dampening, weight_decay, and nesterov.
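A minimal end-to-end sketch of how a loss function and an optimizer work together in one training step (the toy model, data, and hyperparameters below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                      # toy model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 10)
labels = torch.randint(0, 2, (8,))

optimizer.zero_grad()                         # clear gradients from the previous step
loss = criterion(model(inputs), labels)
loss.backward()                               # compute gradients
optimizer.step()                              # update parameters with the chosen rule

print(list(optimizer.param_groups[0].keys())) # inspect a parameter group: params, lr, momentum, ...
```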

3. References

thorough-pytorch, Chapter 3: The main components of PyTorch
