ResNet

Paper: Deep Residual Learning for Image Recognition

---

Q: Is learning better networks as easy as stacking more layers?

A: Not necessarily. One obstacle is vanishing/exploding gradients, which has been largely addressed by normalized initialization and batch normalization (intermediate normalization layers); BN also ensures that forward-propagated signals have non-zero variances.
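As a rough illustration of the "non-zero variances" point, here is a toy check (my own sketch, not from the paper) that pushes a batch through a deliberately badly initialized plain stack, with and without BatchNorm between the layers:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 64)  # a batch of 256 feature vectors

def deep_stack(use_bn):
    """20 plain fully connected layers with a deliberately poor initialization."""
    layers = []
    for _ in range(20):
        lin = nn.Linear(64, 64, bias=False)
        nn.init.normal_(lin.weight, std=0.05)   # too small: the signal shrinks layer by layer
        layers.append(lin)
        if use_bn:
            layers.append(nn.BatchNorm1d(64))   # renormalizes the forward signal
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)

print(deep_stack(use_bn=False)(x).std().item())  # collapses toward 0
print(deep_stack(use_bn=True)(x).std().item())   # stays on the order of 1
```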


degradation problem

As network depth increases, accuracy saturates and then degrades rapidly. The degradation is not caused by overfitting: adding more layers leads to higher training error, not just higher test error.

--->not all systems are similarly easy to optimize

--->not caused by vanishing gradients (the plain nets are trained with BN)

--->the underlying reason remains to be studied

--->conjecture that deep plain nets may have exponentially low convergence rates


introduce a deep residual learning framework to address the degradation problem

--->hypothesis: it is easier to optimize the residual mapping F(x) = H(x) - x than to optimize the original, unreferenced mapping H(x).

--->the original mapping, recast as F(x) + x, can be realized with skip connections (shortcut connections).
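A minimal sketch of this idea in PyTorch (a toy block of my own, not the paper's exact configuration): the stacked layers learn the residual F(x), and the identity shortcut adds x back before the final activation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """output = relu(F(x) + x), where F is two 3x3 conv layers (identity shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # F(x): the residual mapping learned by the weight layers
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        # F(x) + x: the shortcut adds the input back, then the final ReLU
        return self.relu(out + x)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```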



Shortcut connection

--->those skipping one or more layers.


Deep Residual Learning 

--->The degradation problem suggests that solvers might have difficulty in approximating identity mappings by multiple nonlinear layers.

--->In real cases, this reformulation may help to precondition the problem.

--->when dimensions differ, perform a linear projection Ws by the shortcut connection to match them: y = F(x, {Wi}) + Ws x (see the sketch after these notes).


--->the dotted shortcuts increase dimensions.

--->follow 2 design rules:

1) for the same output feature map size, the layers have the same number of filters;

2) if the feature map size is halved, the number of filters is doubled, preserving the time complexity per layer.

--->adopt Batch Normalization right after each convolution and before activation.
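A minimal sketch (toy sizes of my own: 64→128 channels, 56→28 spatial) of how these pieces fit together at a stage boundary, where the feature map is halved, the filters are doubled, and the shortcut needs the linear projection Ws, here a strided 1×1 convolution, so the shapes match before the addition:

```python
import torch
import torch.nn as nn

in_ch, out_ch = 64, 128                       # feature map halved -> filters doubled

residual = nn.Sequential(                     # F(x): stride-2 3x3 conv, BN, ReLU, 3x3 conv, BN
    nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(out_ch),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(out_ch),
)
projection = nn.Sequential(                   # Ws: 1x1 conv with stride 2 matches channels and spatial size
    nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(out_ch),
)

x = torch.randn(1, in_ch, 56, 56)
y = torch.relu(residual(x) + projection(x))   # y = F(x) + Ws x
print(y.shape)                                # torch.Size([1, 128, 28, 28])
```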


Deeper Bottleneck Architecture

--->projection shortcuts are not essential for addressing the degradation problem.

--->identity shortcuts are important for not increasing the complexity of the bottleneck architecture.

--->the bottleneck design is chosen for economical considerations (to keep training time practical).

--->the 1×1 convolution layers are responsible for reducing and then restoring dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.
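To make the "economical" point concrete, a quick weight count (following the 64-d/256-d example in the paper's Fig. 5; biases and BN parameters ignored) shows the two block designs have similar complexity even though the bottleneck operates on 4× wider features:

```python
# Two 3x3 convs on 64-d features (the basic block used by ResNet-18/34)
basic = 2 * (3 * 3 * 64 * 64)
# Bottleneck on 256-d features: 1x1 reduce, 3x3 on 64-d, 1x1 restore
bottleneck = (1 * 1 * 256 * 64) + (3 * 3 * 64 * 64) + (1 * 1 * 64 * 256)
print(basic, bottleneck)  # 73728 69632
```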


Exploring over 1000 layers

--->worse test error despite similar training error, arguably due to overfitting on the small dataset

### ResNet Architecture Overview

ResNet (Residual Network) is an architecture designed to address the vanishing-gradient and degradation problems that arise when training deep neural networks. Its core idea is to build residual blocks with **shortcut connections**, allowing information to flow directly across deeper layers[^1].

#### Architectural characteristics

ResNet's design overcomes the tendency of conventional CNNs to perform worse as more layers are stacked. Specifically:

- The shortcut connection adds the input directly to the output of the stacked layers, yielding the residual learning form \( F(x) = H(x) - x \), where \( H(x) \) is the target mapping to be learned[^2].
- Shortcut connections effectively mitigate vanishing gradients and improve the trainability of deeper networks[^3].

#### Main variants and their differences

ResNet comes in several variants, chiefly ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. Their main differences are:

- For the shallower networks (ResNet-18 and ResNet-34), the basic unit consists of two 3×3 convolutions;
- For the deeper networks (ResNet-50 and above), a bottleneck architecture is used: each residual block contains three convolution layers, where the first 1×1 convolution compresses the channel dimension, the second 3×3 convolution extracts features, and the third 1×1 convolution restores the original dimension[^3].

Below is a typical implementation of ResNet-50:

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus a shortcut (ResNet-18/34)."""
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels * self.expansion)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)  # projection shortcut when dimensions change
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity                    # F(x) + x
        out = self.relu(out)
        return out


class Bottleneck(nn.Module):
    """Bottleneck block: 1x1 reduce, 3x3, 1x1 restore (ResNet-50/101/152)."""
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            # projection shortcut (1x1 conv) when the spatial size or channel count changes
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion))
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        # stem -> four residual stages -> global average pooling -> classifier
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        x = torch.flatten(self.avgpool(x), 1)
        return self.fc(x)


def resnet50():
    model = ResNet(Bottleneck, [3, 4, 6, 3])  # For ResNet-50
    return model
```

The code above defines a standard ResNet-50 structure that can be used for a variety of computer-vision tasks, such as image classification and object detection[^4].

---

### Application scenarios

ResNet has become a core component of many practical systems and performs strongly across several domains:

- Image classification: ResNet's results on the ImageNet dataset demonstrate its effectiveness and robustness[^4].
- Object detection and instance segmentation: as the backbone of Faster R-CNN or Mask R-CNN, ResNet significantly improves detection accuracy.
- Video processing: extended to three dimensions, 3D-ResNet has become an important tool for video action recognition.
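A minimal usage sketch, assuming the definitions above (the 224×224 input follows the usual ImageNet setting):

```python
import torch

model = resnet50()                       # Bottleneck blocks in a [3, 4, 6, 3] layout
x = torch.randn(2, 3, 224, 224)          # a batch of two 224x224 RGB images
logits = model(x)
print(logits.shape)                      # torch.Size([2, 1000])
```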