Deep Residual Learning for Image Recognition

This paper proposes a residual learning framework to address the degradation problem in training deep neural networks. As depth increases, the training error of plain networks saturates and then degrades rapidly, whereas residual networks benefit from substantially increased depth: they are easier to optimize and reach lower training error. By introducing shortcut connections, the stacked layers can more easily approximate identity mappings, which makes much deeper networks optimizable. Experiments show that deep residual networks gain accuracy from greatly increased depth while remaining efficient.


Abstract

A residual learning framework is presented. Comprehensive empirical evidence shows that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.


Introduction

1. Deep networks integrate low/mid/high-level features. The “levels” of features can be enriched by the number of stacked layers (depth).

2. Is learning better networks as easy as stacking more layers? No. The problem of vanishing/exploding gradients hampers convergence from the beginning. This problem, however, has been largely addressed by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging for stochastic gradient descent (SGD) with back-propagation.

3. When deeper networks are able to start converging, a degradation problem is exposed: with the network depth increasing, accuracy gets saturated and then degrades rapidly. Such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.


4. Consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction to the deeper model: the added layers are identity mappings, and the other layers are copied from the learned shallower model. Theoretically, the deeper model should therefore produce no higher training error than its shallower counterpart, but in practice the opposite is observed: current solvers are unable to find solutions that are comparably good to (or better than) this constructed one.
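A minimal sketch of this "solution by construction", assuming PyTorch (the function name is illustrative, not from the paper):

import torch.nn as nn

def deeper_by_construction(shallow_model: nn.Module, extra_layers: int) -> nn.Module:
    # The added layers are exact identity mappings; the other layers are the learned
    # shallower model, copied as-is. By construction this deeper model computes the
    # same function, so its training error can be no higher than the shallow model's.
    return nn.Sequential(shallow_model, *(nn.Identity() for _ in range(extra_layers)))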

5. In this paper, the authors address the degradation problem by introducing a deep residual learning framework. Instead of hoping each few stacked layers directly fit a desired underlying mapping H(x), they explicitly let these layers fit a residual mapping F(x) := H(x) − x, where H(x) is the original, unreferenced mapping. The original mapping is then recast as F(x) + x.

The formulation of F(x) + x can be realized by feedforward neural networks with “shortcut connections”. The shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers.
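A minimal sketch of such a building block, assuming PyTorch; the two stacked 3×3 convolutions with batch normalization follow the paper's basic design, but the class name and sizes are illustrative:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # F(x) is two stacked 3x3 convolutional layers; the shortcut performs identity mapping.
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))   # first layer of F
        out = self.bn2(self.conv2(out))            # second layer of F
        return self.relu(out + x)                  # F(x) + x, then ReLU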


Through evaluating their method, they show: 1) extremely deep residual nets are easy to optimize, while the counterpart “plain” nets exhibit higher training error as the depth increases; 2) deep residual nets can easily enjoy accuracy gains from greatly increased depth.


Related Work

1. Residual Representations.

2. Shortcut Connections: An early practice of training multi-layer perceptrons (MLPs) is to add a linear layer connected from the network input to the output.

Deep Residual Learning

If the added layers can be constructed as identity mappings, a deeper model should have training error no greater than its shallower counterpart.

Identity Mapping by Shortcuts:

A building block is defined as y = F(x, {W_i}) + x (Eqn. 1), where x and y are the input and output of the layers considered and F(x, {W_i}) is the residual mapping to be learned. When the dimensions of x and F do not match, a linear projection W_s can be applied on the shortcut: y = F(x, {W_i}) + W_s x (Eqn. 2).


 The form of the residual function F is flexible. Experiments in this paper involve a function F that has two or three layers, while more layers are possible.

Network Architectures

Three types of networks are compared: VGG-19 (as a reference), a 34-layer plain network, and a 34-layer residual network.

When the dimensions increase, two options are considered: (A) the shortcut still performs identity mapping, with extra zero entries padded for the increased dimensions; this option introduces no extra parameters. (B) The projection shortcut in Eqn. 2 is used to match dimensions (done by 1×1 convolutions).
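A minimal sketch of the two options, assuming PyTorch (the function names are illustrative); both reduce the spatial size with stride 2 while increasing the channel count:

import torch
import torch.nn as nn
import torch.nn.functional as F

def shortcut_option_a(x: torch.Tensor, out_channels: int, stride: int = 2) -> torch.Tensor:
    # Option A: identity mapping with zero entries padded for the extra channels;
    # no extra parameters are introduced.
    x = x[:, :, ::stride, ::stride]            # spatial subsampling
    pad = out_channels - x.size(1)
    return F.pad(x, (0, 0, 0, 0, 0, pad))      # zero-pad along the channel dimension

def shortcut_option_b(in_channels: int, out_channels: int, stride: int = 2) -> nn.Module:
    # Option B: projection shortcut (1x1 convolution) to match dimensions; adds parameters.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )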



Implementation

1. The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation. A 224×224 crop is randomly sampled from an image or its horizontal flip, with the per-pixel mean subtracted.

2. The standard color augmentation is used.

3. We adopt batch normalization(BN) right after each convolution and before activation.

4. We initialize the weights as in [13] and train all plain/residual nets from scratch.

5. We use SGD with a mini-batch size of 256.

6. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60×10^4 iterations.

7. We use a weight decay of 0.0001 and a momentum of 0.9.

8. We do not use dropout.
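Taken together, these settings correspond roughly to the following sketch, assuming PyTorch and torchvision (the model choice and the RandomResizedCrop approximation of scale augmentation are illustrative, not the paper's exact pipeline):

import torch
import torchvision.transforms as T
from torchvision.models import resnet34

# Augmentation: random scale + 224x224 crop, horizontal flip, mean subtraction.
# RandomResizedCrop approximates the shorter-side resize to [256, 480];
# the standard color augmentation is omitted here for brevity.
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),  # mean subtraction only
])

model = resnet34(num_classes=1000)  # BN after each convolution, no dropout

# SGD, mini-batch size 256, weight decay 1e-4, momentum 0.9; the learning rate starts
# at 0.1 and is divided by 10 when the error plateaus, for up to 60e4 iterations.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)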

Findings

1. The ResNet eases the optimization by providing faster convergence at the early stage, compared to the plain nets.

2. Deeper Bottleneck Architectures


The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.

If the identity shortcut is replaced with projection, one can show that the time complexity and model size are doubled, as the shortcut is connected to the two high-dimensional ends. So identity shortcuts lead to more efficient models for the bottleneck designs.

Deep non-bottleneck ResNets also gain accuracy from increased depth, but are not as economical as the bottleneck ResNet. So the usage of bottleneck designs is mainly due to practical considerations.
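A minimal sketch of such a bottleneck block, assuming PyTorch, with the 256→64→64→256 channel sizes used as the example in the paper; the rough weight counts in the comments illustrate why a projection shortcut at the 256-d ends would roughly double the block's size:

import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    # 1x1 reduces 256 -> 64 (~16k weights), 3x3 keeps 64 (~37k weights),
    # 1x1 restores 64 -> 256 (~16k weights): ~70k weights in total.
    # A 1x1 projection shortcut between the two 256-d ends would add ~65k more,
    # which is why identity shortcuts matter for the bottleneck design.
    def __init__(self, channels: int = 256, bottleneck: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.restore = nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.restore(out))
        return self.relu(out + x)   # identity shortcut at the two high-dimensional ends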


3. On CIFAR-10, the deep plain nets suffer from increased depth, and exhibit higher training error when going deeper. This phenomenon is similar to that on ImageNet and on MNIST, suggesting that such an optimization difficulty is a fundamental problem.

4. The residual functions might be generally closer to zero than the non-residual functions.

5. The testing result of this 1202-layer network is worse than that of the 110-layer network, although both have similar training error. We argue that this is because of overfitting: the 1202-layer network may be unnecessarily large for this small dataset.
