Paper Notes | Deep Residual Learning for Image Recognition

These notes cover deep residual learning: how residual networks gain optimization efficiency and accuracy from greatly increased depth while keeping complexity low, why residual mappings are easier to optimize than the original mappings, and the types of shortcut connections used to address the degradation problem.


Authors

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Abstract

Residual networks are easier to optimize and gain accuracy from considerably increased depth, yet they have lower complexity than VGG nets.

1 Introduction

We denote the desired underlying mapping as H(x) and let the stacked nonlinear layers fit the residual F(x) = H(x) - x, so that H(x) = F(x) + x. We hypothesize that it is easier to optimize the residual mapping than the original, unreferenced mapping. In the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.
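The idea above can be sketched as a building block in PyTorch. This is a minimal sketch, not the authors' code; the class name, channel count, and BN placement are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual building block: output = relu(F(x) + x),
    where F is a small stack of nonlinear layers (here two 3x3 convs).
    Illustrative sketch; names/hyperparameters are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(f + x)  # H(x) = F(x) + x
```

If the optimal H is close to the identity, the block only has to drive F toward zero, which is what makes the formulation easy to optimize.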

2 Related Work

2.1 Residual Representations

F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In CVPR, 2007.
H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid. Aggregating local image descriptors into compact codes. TPAMI, 2012.
W. L. Briggs, S. F. McCormick, et al. A Multigrid Tutorial. SIAM, 2000.

2.2 Shortcut Connections

#highway
R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv:1505.00387, 2015.
R. K. Srivastava, K. Greff, and J. Schmidhuber. Training very deep networks. arXiv:1507.06228, 2015.
#LSTM
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

3 Deep Residual Learning

3.1 Residual Learning

The degradation problem suggests that the solver might have difficulty approximating identity mappings with multiple nonlinear layers.

3.2 Identity Mapping by Shortcuts

When the dimensions of x and F(x) differ, we can perform a linear projection W_s by the shortcut connection to match the dimensions: y = F(x, {W_i}) + W_s x.
In these experiments, F has two or three layers; it cannot have only one layer, because y = W_1 x + x would be similar to a plain linear layer and offer no advantage.
Plain Network: the convolutional layers mostly have 3x3 filters and follow two simple design rules: 1) for the same output feature map size, the layers have the same number of filters; 2) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer.
Residual Network: when the dimensions increase (dotted-line shortcuts), there are two options: 1) zero-padding for the increased dimensions; 2) 1x1 projection convolutions (slightly better).
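Option 2 above (the 1x1 projection shortcut) can be sketched as follows. This is an illustrative sketch, not the authors' code: the class name and layer arrangement are assumptions, but it follows the stated rules, halving the feature map with stride 2 while doubling the filters.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Residual block across a dimension change (a dotted-line shortcut):
    the main path halves the spatial size and doubles the channels, and a
    1x1 convolution projects x so the addition is well-defined.
    Illustrative sketch; names are assumptions."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection shortcut: matches both channel count and spatial size
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(f + self.proj(x))  # y = F(x) + W_s x
```

The zero-padding option (option 1) would instead keep the identity shortcut and pad the extra channels with zeros, adding no parameters.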

4 Experiments

BN ensures that forward-propagated signals have non-zero variances.

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

ResNet eases the optimization by providing faster convergence at the early stage.
Shortcuts: identity or projection?
deep bottleneck architectures
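The bottleneck design replaces the two 3x3 layers with a 1x1 / 3x3 / 1x1 stack: the first 1x1 reduces the channels, the 3x3 operates on the reduced dimensions, and the last 1x1 restores them. A sketch, assuming the paper's 4x reduction factor; class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Deep bottleneck block (as used in the 50/101/152-layer models):
    1x1 reduce -> 3x3 -> 1x1 restore, plus the identity shortcut.
    Illustrative sketch; names are assumptions."""
    reduction = 4  # the paper reduces channels 4x inside the block

    def __init__(self, channels: int):
        super().__init__()
        mid = channels // self.reduction
        self.f = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)
```

The point of the design is economy: the 3x3 convolution runs on channels/4 inputs and outputs, so depth can grow substantially at similar time complexity.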
Analysis of layer responses

Object Detection Improvements

box refinement:

S. Gidaris and N. Komodakis. Object detection via a multi-region &
semantic segmentation-aware cnn model. In ICCV, 2015.

global context:
treat the full image as an RoI and pool a global feature (via an SPP layer) to combine with each region's feature.

Conclusions
