【DenseNet】《Densely Connected Convolutional Networks》

Latest recommended article published on 2022-01-28 16:50:20


License: CC 4.0 BY-SA


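DenseNet uses dense connections to alleviate the vanishing-gradient problem in deep neural networks, strengthen feature propagation, and encourage feature reuse, while substantially reducing the number of parameters. This post covers DenseNet's design motivation, architectural advantages, experimental results, and the authors' commentary.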



CVPR-2017 best paper award

For a small experiment on CIFAR-10, see the blog post 【Keras-DenseNet】CIFAR-10.

Contents

1 Motivation
2 Advantages
3 Architecture
4 Experiment
- 4.1 Classification Results on CIFAR and SVHN
- 4.2 Classification Results on ImageNet
5 Feature Reuse
6 Authors' Commentary

1 Motivation

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and “wash out” by the time it reaches the end (or beginning) of the network.

Current solutions such as ResNets, Highway Networks, and stochastic depth all share a key characteristic: they create short paths from early layers to later layers.

The authors distill this insight into a simple connectivity pattern: each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.

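A traditional convolutional network with $L$ layers has $L$ connections, one between each layer and the next. A DenseNet with $L$ layers has $L(L+1)/2$ direct connections, because the $l^{th}$ layer receives the network input plus the outputs of all $l-1$ layers before it.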
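The connection count can be checked by simple enumeration. The sketch below is only an illustration of the arithmetic; the function name `count_dense_connections` is made up for this note and does not come from the paper.

```python
def count_dense_connections(num_layers: int) -> int:
    """Count direct connections when every layer sees all earlier outputs."""
    connections = 0
    for layer in range(1, num_layers + 1):
        # Layer `layer` receives the network input x_0 plus the outputs
        # of layers 1 .. layer-1, i.e. `layer` incoming connections.
        connections += layer
    return connections


# The enumeration agrees with the closed form L(L+1)/2.
for L in (1, 5, 100):
    assert count_dense_connections(L) == L * (L + 1) // 2

print(count_dense_connections(5))  # 15
```

A plain chain of 5 layers would have only 5 connections, so the dense pattern grows quadratically rather than linearly with depth.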

2 Advantages

DenseNets have several compelling advantages:

- alleviate the vanishing-gradient problem
- strengthen feature propagation
- encourage feature reuse
- substantially reduce the number of parameters

Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision.

Further, we also observe that dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.

3 Architecture


3.1 Dense connectivity

ResNet:

$x_{l} = H_{l}(x_{l-1})+x_{l-1}$

DenseNet (dense blocks):
To be clear, dense connectivity exists only within a dense block; there is no dense connectivity between different dense blocks.

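$x_{l} = H_{l}([x_{0},x_{1},\ldots,x_{l-1}])$

$H_{l}(\cdot)$ can be a composite of operations such as BN, ReLU, pooling, or convolution.
$x_{l}$ denotes the output of the $l^{th}$ layer.
$[x_{0},x_{1},\ldots,x_{l-1}]$ denotes the concatenation of the feature maps produced by layers $0,\ldots,l-1$.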

Note:

ResNet's skip connection is an addition, e.g. 11 + 22 = 33.

DenseNet's skip connection is a concatenation, e.g. 11 and 22 give 1122.
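The difference between the two merge rules can be made concrete with plain lists. This is a toy sketch: each "channel" is reduced to a single number, and the helper names `residual_merge` and `dense_merge` are invented for illustration.

```python
def residual_merge(x, fx):
    """ResNet-style merge: element-wise addition, channel count unchanged."""
    if len(x) != len(fx):
        raise ValueError("addition requires matching channel counts")
    return [a + b for a, b in zip(x, fx)]


def dense_merge(features):
    """DenseNet-style merge: concatenate along the channel axis."""
    merged = []
    for f in features:
        merged.extend(f)
    return merged


x0 = [1.0, 1.0]  # two channels, each collapsed to one number
x1 = [2.0, 2.0]

print(residual_merge(x0, x1))  # [3.0, 3.0]
print(dense_merge([x0, x1]))   # [1.0, 1.0, 2.0, 2.0]
```

Addition keeps the width fixed but mixes the two signals; concatenation keeps both signals intact at the cost of a wider input for the next layer.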

3.2 Composite function

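We define $H_{l}(\cdot)$ as a composite function of three consecutive operations:

BN→ReLU→3×3Conv

This is the layer of the original DenseNet, as opposed to the bottleneck layers introduced later (DenseNet-B).
It corresponds to H1, H2, H3, H4 in Figure 1; the spatial size of the feature maps does not change.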

3.3 Pooling layers

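A dense block does not change the spatial size of the feature maps, but a network cannot keep a single resolution (w×h) all the way through, so pooling (down-sampling) is necessary. The authors therefore divide the network into multiple dense blocks and let the transition layers between the blocks do the pooling.

Transition layers sit between dense blocks and perform convolution and pooling: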

BN→1×1Conv→2×2 average pooling
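To see what the pooling step of a transition layer does to the resolution, here is a minimal pure-Python sketch of 2×2 average pooling with stride 2 on a single feature map. The function name `avg_pool_2x2` is an assumption of this note, and the BN and 1×1 convolution steps are left out.

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 on one feature map (list of rows)."""
    h, w = len(fmap), len(fmap[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even height and width")
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(avg_pool_2x2(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
```

Each transition layer halves the height and width, which is why the feature-map size is constant inside a block and only changes between blocks.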

3.4 Growth rate

If each function $H_{l}$ produces $k$ feature maps, it follows that the $l^{th}$ layer has $k_{0}+k\times(l-1)$ input feature-maps, where $k_{0}$ is the number of channels in the input layer.

The hyperparameter $k$ is called the growth rate.
E.g. in Figure 1, each of x1, x2, x3, x4 has 4 feature maps, so $k = 4$.
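The channel bookkeeping inside a dense block follows directly from the formula above. The sketch below uses hypothetical values ($k_0 = 16$, $k = 12$, 6 layers) chosen only for illustration; the function names are also made up.

```python
def input_channels(layer_index: int, k0: int, k: int) -> int:
    """Input feature maps seen by the layer_index-th layer (1-based) of a block."""
    if layer_index < 1:
        raise ValueError("layer_index is 1-based")
    return k0 + k * (layer_index - 1)


def block_output_channels(num_layers: int, k0: int, k: int) -> int:
    """Channels after concatenating the block input with all layer outputs."""
    return k0 + k * num_layers


k0, k, num_layers = 16, 12, 6  # hypothetical values
print([input_channels(l, k0, k) for l in range(1, num_layers + 1)])
# [16, 28, 40, 52, 64, 76]
print(block_output_channels(num_layers, k0, k))  # 88
```

Every layer adds only $k$ new feature maps, yet its input width grows linearly with depth, which motivates the bottleneck layers and compression described next.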

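3.5 Bottleneck layers (inside dense blocks)

Bottleneck layers reduce the number of channels fed to the 3×3 convolution: even when $k$ is small, concatenating the outputs of many layers makes the number of input channels very large.

BN→ReLU→1×1Conv→BN→ReLU→3×3Conv

This version of $H_{l}(\cdot)$ is called DenseNet-B.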
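A rough weight count shows why the 1×1 convolution helps. The sketch below assumes the 1×1 convolution produces $4k$ feature maps, which is the setting reported in the paper but not stated in this note; biases and BN parameters are ignored, and the input width of 1024 channels is a hypothetical example.

```python
def conv_weights(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a kernel x kernel convolution, ignoring biases."""
    return c_in * c_out * kernel * kernel


def plain_layer_weights(c_in: int, k: int) -> int:
    """BN -> ReLU -> 3x3 Conv producing k feature maps."""
    return conv_weights(c_in, k, 3)


def bottleneck_layer_weights(c_in: int, k: int, width_factor: int = 4) -> int:
    """1x1 Conv to width_factor * k channels, then 3x3 Conv to k channels."""
    mid = width_factor * k  # assumed bottleneck width (4k)
    return conv_weights(c_in, mid, 1) + conv_weights(mid, k, 3)


c_in, k = 1024, 32  # hypothetical late layer with a wide concatenated input
print(plain_layer_weights(c_in, k))       # 294912
print(bottleneck_layer_weights(c_in, k))  # 167936
```

Under these assumptions the saving only appears once the concatenated input is wide: with `c_in = 64` and `k = 32` the bottleneck version needs 45056 weights against 18432 for the plain layer.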

3.6 Compression (between dense blocks)

If a dense block contains $m$ feature-maps, we let the following transition layer generate $\lfloor \theta m \rfloor$ output feature maps, where $0 < \theta \le 1$.
When $\theta = 1$, the transition layer is the same as before: the number of feature maps is unchanged.
When $\theta < 1$, the model is called DenseNet-C.

When both the bottleneck and transition layers with $\theta <1$ are used, we refer to our model as DenseNet-BC.
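The effect of the compression factor can be traced across a stack of dense blocks. The sketch below is illustrative only: the configuration (24 input channels, $k = 12$, three blocks of 16 layers) and the function names are assumptions of this note.

```python
import math


def transition_output_channels(m: int, theta: float) -> int:
    """Feature maps kept by a transition layer: floor(theta * m)."""
    if not 0 < theta <= 1:
        raise ValueError("theta must be in (0, 1]")
    return math.floor(theta * m)


def trace_channels(k0, k, layers_per_block, theta):
    """Channel count at the output of each dense block in a stack of blocks."""
    channels = k0
    outputs = []
    for i, num_layers in enumerate(layers_per_block):
        channels += k * num_layers  # each layer appends k feature maps
        outputs.append(channels)
        if i < len(layers_per_block) - 1:
            # A transition layer follows every block except the last one.
            channels = transition_output_channels(channels, theta)
    return outputs


print(trace_channels(24, 12, [16, 16, 16], theta=0.5))  # [216, 300, 342]
print(trace_channels(24, 12, [16, 16, 16], theta=1.0))  # [216, 408, 600]
```

With $\theta = 0.5$ the last block in this example ends at 342 channels instead of 600, so every layer in the later blocks sees a much narrower input.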

4 Experiment

The main experiments are run on three benchmark datasets: CIFAR, SVHN, and ImageNet.

4.1 Classification Results on CIFAR and SVHN

Bold entries outperform the other methods; blue entries are the best results overall.
[Table]

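The results illustrate the following points:

- Accuracy
- Capacity: the general trend is that performance improves as L (depth) and k (growth rate) increase
- Parameter efficiency: strong results with few parameters
- Overfitting: alleviated by DenseNet-BC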

4.2 Classification Results on ImageNet

Accuracy: compared with ResNet
Parameter efficiency: compared with ResNet

5 Feature Reuse

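The authors connect every layer in the network directly to all of its preceding layers, which enables feature reuse. At the same time, each layer is made very "narrow", i.e. it learns only a small number of feature maps (in the extreme case a single feature map per layer), which reduces redundancy.
[Figure]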

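The figure above can be read roughly as a ratio: the numerator is the weight on the connection between layers $l$ and $s$, and the denominator is the total of all weights (all weights are measured with the L1 norm).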
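One way to read this statistic is sketched below: split the input channels of a layer's convolution by the source layer they came from, and average the absolute weights in each group. This is a simplified interpretation, not the authors' code; the function name, the collapsed spatial dimensions, and the toy weights are all assumptions.

```python
def mean_abs_weight_per_source(weights, source_sizes):
    """Average |weight| a target layer assigns to each source layer.

    weights[o][c] is the weight from input channel c to output channel o
    (spatial dimensions already collapsed); source_sizes[i] is the number
    of input channels contributed by source layer i, in concatenation order.
    """
    total_in = sum(source_sizes)
    if any(len(row) != total_in for row in weights):
        raise ValueError("each row must have sum(source_sizes) entries")
    result = []
    start = 0
    for size in source_sizes:
        group = [abs(row[c]) for row in weights for c in range(start, start + size)]
        result.append(sum(group) / len(group))
        start += size
    return result


# Toy layer: 2 output channels, inputs from two sources (2 and 3 channels).
weights = [
    [0.5, -0.5, 0.125, 0.0, -0.125],
    [0.25, 0.75, 0.0, 0.25, 0.0],
]
usage = mean_abs_weight_per_source(weights, [2, 3])
print(usage[0] > usage[1])  # True
```

In this toy example the first source receives the larger average weight, which under this reading would show up as a brighter cell for that source in the figure.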

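The first row of Dense Block 2 and Dense Block 3 shows the weights that the layers of the next dense block assign to the output of the transition layer. These values are fairly low, which suggests that the transition-layer outputs contain redundant features and therefore receive a small share of the weight.
This is in keeping with the strong results of DenseNet-BC, where exactly these outputs are compressed.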