Degradation problem
Can accuracy be improved simply by stacking more layers? To answer this, the vanishing/exploding gradients problem, which severely hampers convergence, must first be addressed. It has largely been solved by normalized initialization ([1,2,3,4]) and intermediate normalization layers [5].
Once deep networks are able to converge, accuracy first saturates and then degrades rapidly as depth increases. Importantly, this degradation is not caused by overfitting: adding more layers also leads to higher training error, not just higher test error.
Residual learning
To address this problem, Microsoft proposed the deep residual learning framework. Suppose the desired underlying mapping is H(x). Instead of fitting H(x) directly, the stacked layers fit another mapping F(x) := H(x) − x, where F(x) is the residual; the original mapping is then recovered as F(x) + x.
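In the paper's notation, a building block with two weight layers can be written as follows (σ denotes the ReLU nonlinearity; biases are omitted for simplicity):

$$y = F(x, \{W_i\}) + x, \qquad F(x, \{W_1, W_2\}) = W_2\,\sigma(W_1 x)$$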
Shortcut Connections
F(x) + x can be realized as a feedforward network with shortcut connections. A shortcut connection skips one or more layers; it can take several forms, but here it is simply an identity mapping, which adds neither extra parameters nor extra computational complexity, so the whole network can still be trained end to end with SGD and backpropagation.
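A minimal sketch of such a block, written in PyTorch (an assumption for illustration; the original work did not use this framework, and the channel count and input size below are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers computing F(x), added to the input x via an identity shortcut."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F(x): conv -> BN -> ReLU -> conv -> BN
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        # Identity shortcut: F(x) + x, followed by the final ReLU.
        # The addition introduces no parameters and negligible computation.
        return F.relu(residual + x)


# Usage: the output shape matches the input shape because the shortcut is identity.
block = BasicResidualBlock(channels=64)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```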
[1]Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, pages 9–50. Springer, 1998.
[2]X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
[3]A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120, 2013.
[4]K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.
[5]S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.