CondenseNet Algorithm Notes

CondenseNet is a deep learning model that optimizes DenseNet. By introducing learned group convolutions and pruning during training, it reduces computation and memory usage while matching DenseNet's accuracy at only about 1/10 of the training time. This post also compares against models such as ShuffleNet and MobileNet, and discusses the roles of grouped convolutions, training-time pruning, and cross-block dense connections.


Paper: CondenseNet: An Efficient DenseNet using Learned Group Convolutions
Link: https://arxiv.org/abs/1711.09224
Code: https://github.com/ShichenLiu/CondenseNet (PyTorch)

The first author is Gao Huang of Cornell University. The main contribution is an optimization of DenseNet that makes it more computationally efficient and cheaper to store. As mentioned in the DenseNet post, one of DenseNet's biggest drawbacks is its large GPU memory footprint, mainly because it produces a large number of additional feature maps. A technical report that followed DenseNet alleviates this to some extent by allocating a shared memory space for those additionally generated features, reducing memory usage during training. This paper instead reduces memory usage and improves speed mainly through grouped convolutions and pruning during training. The authors' experiments show that CondenseNet reaches roughly the same accuracy as DenseNet while requiring only about 1/10 of DenseNet's training time.

To summarize, the paper has three main features:
1. Grouped convolutions are introduced, with a modification to how the group operation is applied to the 1*1 convolutions (see the sketch after this list).
2. Weights are pruned from the very start of training, rather than pruning an already-trained model.
3. Dense connections across blocks are added on top of DenseNet.
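To make points 1 and 2 concrete, here is a minimal, hypothetical sketch of the learned group convolution idea: a dense 1*1 convolution whose input connections are gradually masked out ("condensed"), separately for each group of output filters, based on the accumulated L1 magnitude of the weights. The class and method names are illustrative, and the paper's full recipe (condensing stages tied to the training schedule, a group-lasso regularizer, and conversion to a standard group convolution at test time) is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedGroupConv1x1(nn.Module):
    """Simplified sketch: a dense 1x1 conv whose input connections are
    pruned per output-filter group as training progresses."""
    def __init__(self, in_channels, out_channels, groups=4):
        super(LearnedGroupConv1x1, self).__init__()
        assert out_channels % groups == 0
        self.groups = groups
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        # mask[o, i] == 0 means input channel i is no longer used by output filter o
        self.register_buffer('mask', torch.ones(out_channels, in_channels))

    def forward(self, x):
        masked_weight = self.conv.weight * self.mask.unsqueeze(-1).unsqueeze(-1)
        return F.conv2d(x, masked_weight)

    def condense(self, fraction):
        """Prune `fraction` of the input channels for every group of output
        filters, dropping those with the smallest accumulated L1 weight."""
        out_channels, in_channels = self.mask.shape
        filters_per_group = out_channels // self.groups
        num_prune = int(fraction * in_channels)
        weight = self.conv.weight.data.abs().view(out_channels, in_channels) * self.mask
        for g in range(self.groups):
            rows = slice(g * filters_per_group, (g + 1) * filters_per_group)
            importance = weight[rows].sum(dim=0)           # per-input-channel importance for this group
            prune_idx = importance.argsort()[:num_prune]   # least important input channels
            self.mask[rows, prune_idx] = 0
```

During training one would periodically call `condense()` so that, by the end, each group of output filters only reads from a small learned subset of the input channels; those sparse connections can then be rearranged into an ordinary group convolution for inference.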

The paper includes comparisons with model compression and acceleration algorithms such as ShuffleNet and MobileNet. These two networks mainly rely on depth-wise separable convolutions to achieve their speedup and compression; see the ShuffleNet and MobileNet posts for details, and the sketch below for the basic operation.
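For reference, a depth-wise separable convolution (the basic building block of MobileNet, and part of the ShuffleNet unit) can be sketched in PyTorch as follows; the module name and channel sizes are just for illustration:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution that mixes channels."""
    def __init__(self, in_channels, out_channels, stride=1):
        super(DepthwiseSeparableConv, self).__init__()
        # groups=in_channels makes the 3x3 convolution depthwise
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Example: 32 -> 64 channels on a 56x56 feature map
x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```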

The left diagram in Figure 1 shows the structure of DenseNet: the 1*1 Conv in the third layer mainly serves to reduce the number of channels (from l×k down to 4k), and the 3*3 Conv in the fifth layer produces an output with k channels. The middle diagram shows the structure of CondenseNet during training. The Permute layer is there to mitigate the negative effect that introducing the 1*1 L-Conv would otherwise have on the result; what it implements is a reshuffling across channels (a generic implementation is sketched below). Note that the original 1*1 Conv is replaced by a 1*1 L-Conv (learned group convolution), and the original 3*3 Conv is replaced by a 3*3 G-Conv (group convolution).
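The permutation itself can be implemented as a standard channel shuffle (as popularized by ShuffleNet); this is a generic sketch, not necessarily the authors' exact Permute layer:

```python
import torch

def channel_shuffle(x, groups):
    """Reorder channels so that the next group convolution sees
    channels coming from every group of the previous layer."""
    n, c, h, w = x.size()
    channels_per_group = c // groups
    # (N, C, H, W) -> (N, groups, C/groups, H, W) -> swap the two group dims -> flatten back
    x = x.view(n, groups, channels_per_group, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).view(-1))  # tensor([0., 4., 1., 5., 2., 6., 3., 7.])
```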

Below is a simple reference implementation of the network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv3x3(in_channels, out_channels, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_channels, out_channels, kernel_size=3,
                     stride=stride, padding=1, bias=False)


class Bottleneck(nn.Module):
    """BN-ReLU-1x1Conv followed by BN-ReLU-3x3Conv; the input is
    concatenated to the growth_rate newly produced feature maps."""
    def __init__(self, in_channels, growth_rate):
        super(Bottleneck, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(4 * growth_rate)
        self.conv2 = nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        out = torch.cat([x, out], 1)  # dense connection
        return out


class Transition(nn.Module):
    """1x1 convolution to adjust the channel count, followed by 2x2 average pooling."""
    def __init__(self, in_channels, out_channels):
        super(Transition, self).__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        out = self.conv(F.relu(self.bn(x)))
        out = F.avg_pool2d(out, 2)
        return out


class CondenseBlock(nn.Module):
    """A stack of Bottleneck layers; each layer already concatenates its input,
    so the channel count grows by growth_rate per layer."""
    def __init__(self, in_channels, growth_rate, num_layers, drop_rate=0.0):
        super(CondenseBlock, self).__init__()
        self.drop_rate = drop_rate
        self.layers = nn.ModuleList([Bottleneck(in_channels + i * growth_rate, growth_rate)
                                     for i in range(num_layers)])

    def forward(self, x):
        out = x
        for layer in self.layers:
            out = layer(out)
            if self.drop_rate > 0:
                out = F.dropout(out, p=self.drop_rate, training=self.training)
        return out


class CondenseNet(nn.Module):
    def __init__(self, growth_rate=12, block_config=(18, 18, 18), num_classes=10, drop_rate=0):
        super(CondenseNet, self).__init__()
        self.growth_rate = growth_rate
        # Initial convolution
        self.conv1 = nn.Conv2d(3, 2 * growth_rate, kernel_size=3, padding=1, bias=False)
        # First block
        self.block1 = CondenseBlock(2 * growth_rate, growth_rate, block_config[0], drop_rate=drop_rate)
        self.trans1 = Transition(2 * growth_rate + block_config[0] * growth_rate, int(1.5 * growth_rate))
        # Second block
        self.block2 = CondenseBlock(int(1.5 * growth_rate), growth_rate, block_config[1], drop_rate=drop_rate)
        self.trans2 = Transition(int(1.5 * growth_rate) + block_config[1] * growth_rate, int(2.25 * growth_rate))
        # Third block
        self.block3 = CondenseBlock(int(2.25 * growth_rate), growth_rate, block_config[2], drop_rate=drop_rate)
        # Final batch norm and classifier (block3's output includes the newly grown channels)
        num_features = int(2.25 * growth_rate) + block_config[2] * growth_rate
        self.bn = nn.BatchNorm2d(num_features)
        self.linear = nn.Linear(num_features, num_classes)

    def forward(self, x):
        out = self.conv1(x)
        out = F.max_pool2d(out, 3, stride=2, padding=1)
        out = self.block1(out)
        out = self.trans1(out)
        out = self.block2(out)
        out = self.trans2(out)
        out = self.block3(out)
        out = F.relu(self.bn(out))
        out = F.adaptive_avg_pool2d(out, 1)  # global average pooling
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
```

This implements the main structure of the network with three modules: Bottleneck, Transition, and CondenseBlock. Bottleneck is the basic unit inside a CondenseBlock, Transition adjusts the number of channels between blocks, and CondenseBlock, the main building block, stacks multiple Bottlenecks. The front of the model uses a 3x3 convolution and a max-pooling layer, and the end uses a global average pooling layer followed by a fully connected layer. Note that this sketch uses ordinary (dense) convolutions throughout; it does not include the learned group convolutions or the training-time condensation described above.
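Assuming the classes defined above, a quick shape check on a CIFAR-10-sized input might look like this:

```python
model = CondenseNet(growth_rate=12, block_config=(18, 18, 18), num_classes=10)
x = torch.randn(2, 3, 32, 32)   # a batch of two CIFAR-10-sized images
logits = model(x)
print(logits.shape)             # expected: torch.Size([2, 10])
```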