Deep Residual Learning for Image Recognition -- ResNet Paper Reading Notes

ResNet addresses the degradation problem encountered when training deep networks by introducing residual mappings and shortcut connections. Instead of learning the original mapping H(x) directly, the network learns the residual F(x), realized through shortcut connections. Experiments show that ResNet keeps improving as depth increases, and that projection shortcuts perform slightly better in some settings.


Deep Residual Learning for Image Recognition--ResNet

Original paper: [link]

“ease the training of networks that are substantially deeper than those used previously”

Core idea: ease the training of networks that are substantially deeper than those used before.

 

Problems encountered as networks get deeper:

1.    vanishing/exploding gradients

normalized initialization and intermediate normalization layers

Vanishing or exploding gradients prevent the network from converging toward a local minimum. Normalized initialization and intermediate normalization layers largely solve this problem.

2.    degradation

Degradation problem: deeper networks actually train worse than shallower ones. The residual network is mainly designed to address this problem.

 

I.  Basic Ideas

1.    residual mapping

"denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping of F(x) := H(x) − x. The original mapping is recast into F(x) + x"

x is the input. Instead of producing the output H(x) directly, the stacked layers learn the residual F(x), and the output becomes F(x) + x. (x could be replaced by some other function of x, but the identity mapping has the advantage of adding no extra parameters to learn.)

2.    shortcut connections

The formulation of F(x)+x can be realized by feed forward neural networks with “shortcut connections” (Fig. 2). Shortcut connections are those skipping one or more layers.

How the residual network is realized: add a shortcut from the input to the output of the stacked layers, so that what the stacked layers learn becomes the residual; a minimal sketch follows below.
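As an illustration (not from the paper's code release), a minimal PyTorch sketch of a two-layer residual block with an identity shortcut, assuming the input and output dimensions match; class and variable names here are my own:

```python
import torch
from torch import nn

class BasicResidualBlock(nn.Module):
    """Minimal sketch of a two-layer residual block: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first layer of F
        out = self.bn2(self.conv2(out))           # second layer of F (no ReLU yet)
        return self.relu(out + x)                 # F(x) + x, then the final ReLU


# Usage: BasicResidualBlock(64)(torch.randn(1, 64, 56, 56))
```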

 

II.  Notice

1.    The dimensions of x and F must be equal. If this is not the case (e.g., when changing the input/output channels), we can perform a linear projection W_s by the shortcut connections to match the dimensions.

If the input and output dimensions differ (e.g., a different number of channels), the shortcut needs a projection that matches the dimensions.

2.    A function F that has two or three layers. If F has only a single layer, we have not observed advantages.

The residual function F should have more than one layer (a single layer shows no clear benefit); using more layers is also possible.


III.  Network Architectures

1.    The plain network is designed following VGG; the residual network adds shortcut connections on top of the plain network.

2.    When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options:

(A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter;

(B) The projection shortcut is used to match dimensions (done by 1×1 convolutions).

The dotted lines are shortcuts where the dimensions do not match. There are two options: pad the extra dimensions with zeros, or multiply by a projection matrix (see the sketch below).
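A hedged sketch of the two options for a dotted-line shortcut (spatial size halves, channel count increases); the function and class names are illustrative, not from the official implementation:

```python
import torch
import torch.nn.functional as F
from torch import nn

def shortcut_option_a(x, out_channels, stride=2):
    """Option A: identity with zero-padding on the extra channels; no new parameters."""
    x = x[:, :, ::stride, ::stride]          # downsample spatially by striding
    pad = out_channels - x.size(1)           # number of channels to add
    return F.pad(x, (0, 0, 0, 0, 0, pad))    # zero-pad along the channel dimension

class ShortcutOptionB(nn.Module):
    """Option B: 1x1 convolution (the projection W_s) to match dimensions."""
    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                              stride=stride, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.proj(x))
```

Option A adds no parameters, which is why the 34-layer comparisons against the plain network can be made with identical parameter counts; Option B adds a small number of parameters per dimension-changing shortcut.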

 

IV.  Implementation

multi-scale, standard color augmentation, BN, SGD with a mini-batch size of 256

learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10^4 iterations.

weight decay: 0.0001, momentum: 0.9, no dropout.
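A sketch of how these hyperparameters might translate into a PyTorch optimizer and learning-rate schedule; the placeholder model and the step milestones are assumptions standing in for the paper's "divide by 10 when the error plateaus" policy:

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 10)  # placeholder model; substitute a ResNet in practice

optimizer = optim.SGD(
    model.parameters(),
    lr=0.1,              # initial learning rate from the paper
    momentum=0.9,
    weight_decay=1e-4,   # weight decay 0.0001
)
# The paper divides the learning rate by 10 when the error plateaus; a fixed
# step schedule with placeholder milestones stands in for that manual policy.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)
```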

 

V.  Experiments

The plain network shows degradation going from 18 layers to 34 layers, whereas ResNet-34 outperforms ResNet-18.

(A) zero-padding shortcuts are used for increasing dimensions, and all shortcuts are parameter-free;

(B) projection shortcuts are used for increasing dimensions, and other shortcuts are identity;

(C) all shortcuts are projections.

ResNet-34 with options A, B, and C all outperform plain-34; B is better than A, and C is slightly better than B, indicating that projection shortcuts perform somewhat better.

 

VI.  Deeper Bottleneck Architectures

The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.

The 1×1 convolutions at both ends first reduce and then restore the dimensionality.

identity shortcuts lead to more efficient models for the bottleneck designs

For the bottleneck design, identity shortcuts are better than projection shortcuts.
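A minimal PyTorch sketch of the bottleneck design described above (1×1 reduce, 3×3, 1×1 restore, identity shortcut); the class name and the specific channel counts in the usage note are illustrative assumptions:

```python
import torch
from torch import nn

class BottleneckBlock(nn.Module):
    """Sketch of a bottleneck residual block with an identity shortcut."""
    def __init__(self, channels, bottleneck_channels):
        super().__init__()
        # 1x1: reduce dimensions
        self.reduce = nn.Conv2d(channels, bottleneck_channels, kernel_size=1, bias=False)
        # 3x3: the "bottleneck" layer with smaller input/output dimensions
        self.conv3x3 = nn.Conv2d(bottleneck_channels, bottleneck_channels,
                                 kernel_size=3, padding=1, bias=False)
        # 1x1: restore dimensions
        self.restore = nn.Conv2d(bottleneck_channels, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck_channels)
        self.bn2 = nn.BatchNorm2d(bottleneck_channels)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.conv3x3(out)))
        out = self.bn3(self.restore(out))
        return self.relu(out + x)  # identity shortcut connects the high-dimensional ends


# e.g., BottleneckBlock(256, 64) maps a 256-channel input through a 64-channel 3x3 layer
```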
