Key Point List
1 Vanishing/exploding gradients are largely addressed by normalized initialization and intermediate normalization layers (BN), as in the sketch below
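A minimal sketch of this point, assuming PyTorch (not part of the original notes): a convolution with He/Kaiming ("normalized") initialization followed by a BatchNorm layer.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
nn.init.kaiming_normal_(conv.weight, mode="fan_out", nonlinearity="relu")  # normalized initialization
bn = nn.BatchNorm2d(64)  # intermediate normalization layer (BN)

x = torch.randn(8, 64, 32, 32)
out = torch.relu(bn(conv(x)))  # conv -> BN -> ReLU keeps activations well scaled
```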
2 As depth increases, a plain network's accuracy gets saturated and then degrades rapidly; this is not caused by overfitting (the degradation problem)
3 ResNet assumes that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping
Denote the desired underlying mapping as H(x) and the residual mapping as F(x):
F(x) := H(x) − x, so the stacked layers learn F(x) and the original mapping is recast as F(x) + x
Reasoning: The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers. With the residual learning reformulation, if identity mappings are optimal, the solvers may simply drive the weights of the multiple nonlinear layers toward zero to approach identity mappings.
If the optimal function is closer to an identity mapping than to a zero mapping, it should be easier for the solver to find the perturbations with reference to an identity mapping than to learn the function as a new one (a minimal residual-block sketch follows)
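A minimal sketch, assuming PyTorch (class and layer choices are illustrative, not from the paper): a basic residual block computing H(x) = F(x) + x. If the identity mapping is optimal, the solver can drive the convolution weights toward zero so the output approaches x.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))  # first layer of F(x)
        residual = self.bn2(self.conv2(residual))      # second layer of F(x)
        return self.relu(residual + x)                 # H(x) = F(x) + x
```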
4 The Design of ResNet:
Bottleneck
Reduces parameters and computation to enable deeper networks by shrinking the channel dimension inside the block (1×1 reduce, 3×3, 1×1 restore).
The parameter-free identity shortcuts are particularly important for the bottleneck architectures.
Identity Mapping by Shortcuts (see Figure 2)
When the dimension increases, two methods:
A) identity mapping with extra zero padding
B) projection shortcut by 1×1 convolutions
(a bottleneck sketch with a projection shortcut follows)
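A minimal sketch, assuming PyTorch (class name and channel arguments are illustrative): a bottleneck block where 1×1 convolutions shrink and then restore the channel dimension so the 3×3 convolution operates on fewer channels, cutting parameters and computation. When the output dimension differs from the input, option B uses a 1×1 projection shortcut; otherwise the parameter-free identity shortcut is kept.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1, bias=False)    # 1x1 reduce channels
        self.conv3x3 = nn.Conv2d(mid_channels, mid_channels, 3, stride=stride,
                                 padding=1, bias=False)                      # 3x3 on fewer channels
        self.expand = nn.Conv2d(mid_channels, out_channels, 1, bias=False)   # 1x1 restore channels
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_channels != out_channels:
            # option B: projection shortcut by 1x1 convolution when dimensions change
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))
        else:
            # parameter-free identity shortcut
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.conv3x3(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + self.shortcut(x))   # F(x) + shortcut(x)
```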
5 Experiments on ImageNet show:
- ResNets are easier to optimize; plain nets exhibit higher training error as depth increases
- ResNets readily gain accuracy from increased depth, producing results substantially better than previous networks
6 Summary:
ResNet addresses the degradation problem in deep neural networks by introducing residual blocks, which make deeper networks easier to optimize and train. It assumes that optimizing the residual mapping is simpler than optimizing the original, unreferenced mapping. The network design includes bottleneck structures that reduce parameters while preserving identity mappings via shortcuts. Experiments show that ResNet maintains low training error as depth increases and significantly improves accuracy.