[paper note] Densely Connected Convolutional Networks

chn13

Category: paper-note. Tags: image classification, cnn

Article link: https://blog.youkuaiyun.com/chn13/article/details/53608145

Current trend of CNN architecture: create short paths from early layers to later layers.
- ResNet
- Highway network: among the first architectures to effectively train networks with more than 100 layers, using bypassing paths with gating units
- Stochastic depth: improves the training of deep residual networks by randomly dropping layers during training, which makes it possible to train a 1202-layer ResNet
- FractalNets
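Making layers wider (more filters per layer) also helps.
DenseNet connects every layer directly to every other layer that has the same feature-map size.
Features are combined by concatenation (ResNet combines them by summation).
DenseNet layers are very narrow (e.g. 12 feature-maps per layer), so the network needs fewer parameters.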

[Figure: dense block]

Dense connectivity: layer $l$ takes the concatenation of the feature-maps of all preceding layers as input:
$x_l = H_l([x_0, x_1, \dots, x_{l-1}])$
Composite function:
H_l is defined as BN + ReLU + 3x3 Conv
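The concatenation rule can be made concrete with a toy sketch in plain Python. It is not a real network: a feature-map tensor is modelled as a list of channels, and `toy_layer` is a hypothetical stand-in for $H_l$ (a real $H_l$ would be BN + ReLU + 3x3 Conv). The names `toy_layer`, `dense_block` and `out_channels` are made up for this illustration.

```python
# Toy sketch of dense connectivity, assuming a feature-map tensor can be
# modelled as a list of channels (each channel is a flat list of floats).

def toy_layer(channels, out_channels):
    """Hypothetical stand-in for H_l: any number of input channels -> out_channels.

    Each output channel is the element-wise mean of all inputs, scaled by a
    per-output constant. It only mimics the input/output shapes of H_l.
    """
    n = len(channels)
    size = len(channels[0])
    mean = [sum(ch[i] for ch in channels) / n for i in range(size)]
    return [[(j + 1) * v for v in mean] for j in range(out_channels)]


def dense_block(x0, num_layers, out_channels):
    """Apply x_l = H_l([x_0, ..., x_{l-1}]) and return [x_0, ..., x_L]."""
    features = list(x0)  # running concatenation of everything produced so far
    for _ in range(num_layers):
        new = toy_layer(features, out_channels)  # layer sees all earlier features
        features = features + new                # concatenate, do not sum
    return features


x0 = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]  # 3 input channels, 4 "pixels" each
out = dense_block(x0, num_layers=4, out_channels=2)
print(len(out))       # 3 + 4 * 2 = 11 channels leave the block
print(out[:3] == x0)  # the block input is still present, unchanged
```

With summation, as in ResNet, the channel count would stay fixed and earlier features would be mixed into the sum; with concatenation they stay available verbatim to every later layer.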
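Pooling and dense blocks
- Concatenation requires feature-maps of the same size, so the network is split into dense blocks and down-sampling happens only between them (see figure above)
- A transition layer between two dense blocks consists of BN + 1x1 Conv + 2x2 average pooling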
Growth rate k:
- The number of feature-maps that each $H_l$ produces.
- The l-th layer of a dense block has $k_0 + k \times (l-1)$ input feature-maps, where $k_0$ is the number of channels in the block's input.
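A quick numeric check of the input-width formula may help. The function name and the values `k = 12`, `k0 = 16` are illustrative choices, not settings taken from these notes.

```python
def input_feature_maps(layer_index, k, k0):
    """Input feature-maps of the layer_index-th layer (1-based) of a dense block."""
    if layer_index < 1:
        raise ValueError("layer_index is 1-based")
    return k0 + k * (layer_index - 1)


k, k0 = 12, 16  # illustrative growth rate and block input width
for l in (1, 2, 6, 12):
    print(l, input_feature_maps(l, k, k0))
# 1 16
# 2 28
# 6 76
# 12 148
```

The input width grows linearly with depth inside a block, even though each layer only adds k feature-maps of its own.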
Bottleneck layers
- Placing a 1x1 Conv before each 3x3 Conv reduces the number of feature-maps the 3x3 Conv has to process, which improves computational efficiency.
- H_l is changed to BN + ReLU + 1x1 Conv + BN + ReLU + 3x3 Conv
- In the experiments, each 1x1 Conv produces 4k feature-maps.
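To see when the bottleneck pays off, the sketch below counts convolution weights for a plain layer (3x3 Conv from c inputs to k outputs) and a bottleneck layer (1x1 Conv to 4k, then 3x3 Conv to k). Biases and BN parameters are ignored, and the helper names are made up for this example.

```python
def conv_weights(in_ch, out_ch, kernel):
    """Weights in a kernel x kernel convolution, ignoring biases."""
    return in_ch * out_ch * kernel * kernel


def plain_layer_weights(in_ch, k):
    """BN + ReLU + 3x3 Conv producing k feature-maps."""
    return conv_weights(in_ch, k, 3)


def bottleneck_layer_weights(in_ch, k):
    """1x1 Conv to 4k feature-maps, then 3x3 Conv to k feature-maps."""
    return conv_weights(in_ch, 4 * k, 1) + conv_weights(4 * k, k, 3)


k = 12  # illustrative growth rate
for in_ch in (24, 48, 96, 192):
    print(in_ch, plain_layer_weights(in_ch, k), bottleneck_layer_weights(in_ch, k))
# 24 2592 6336
# 48 5184 7488
# 96 10368 9792
# 192 20736 14400
```

Under this simple count the bottleneck is cheaper only once the input width exceeds 7.2k (from 9ck > 4ck + 36k^2), so the saving comes from the later layers of a block, where many feature-maps have accumulated.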
Compression
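- Reduce the number of feature-maps in the transition layer by a factor $\theta$ ($0 < \theta \le 1$): if a dense block outputs $m$ feature-maps, the following transition layer outputs $\lfloor \theta m \rfloor$. The experiments use $\theta = 0.5$.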
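The effect of compression on channel counts can be traced with a small sketch. It assumes three dense blocks, an initial convolution with 2k output channels, and 16 layers per block; these are illustrative assumptions rather than values stated in these notes, and `trace_channels` is a made-up helper.

```python
import math


def trace_channels(k, layers_per_block, num_blocks=3, theta=0.5, init_channels=None):
    """Return (stage name, channel count) after each dense block and transition."""
    # Assumption: the stem convolution outputs 2k channels unless told otherwise.
    channels = 2 * k if init_channels is None else init_channels
    trace = [("input conv", channels)]
    for b in range(num_blocks):
        channels += layers_per_block * k  # each layer adds k feature-maps
        trace.append((f"dense block {b + 1}", channels))
        if b < num_blocks - 1:            # no transition after the last block
            channels = math.floor(theta * channels)
            trace.append((f"transition {b + 1}", channels))
    return trace


for name, ch in trace_channels(k=12, layers_per_block=16):
    print(name, ch)
# input conv 24
# dense block 1 216
# transition 1 108
# dense block 2 300
# transition 2 150
# dense block 3 342
```

With `theta=1.0` the same configuration ends at 600 channels instead of 342, which shows how compression keeps later blocks narrow.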

Datasets
- CIFAR-10/100, 32x32
  - Zero-padded with 4 pixels on each side
  - Randomly cropped to again produce 32×32 images
  - Half of the images are then horizontally mirrored
- SVHN (Street View House Numbers), 32x32
- ImageNet, 224x224, 1.2 million training images, 50,000 validation images, 1000 classes
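The CIFAR augmentation listed above (zero-pad 4 pixels, random 32x32 crop, mirror half of the images) can be sketched in plain Python on a nested-list image. The function `augment` and its parameters `pad`, `zero` and `rng` are illustrative names; a real pipeline would use an image library instead.

```python
import random


def augment(image, pad=4, zero=0, rng=random):
    """Zero-pad, randomly crop back to the original size, flip with probability 0.5.

    image is a list of rows; each row is a list of pixel values.
    """
    h, w = len(image), len(image[0])
    # Zero-pad `pad` pixels on every side: (h + 2*pad) x (w + 2*pad).
    blank_row = [zero] * (w + 2 * pad)
    padded = [list(blank_row) for _ in range(pad)]
    padded += [[zero] * pad + list(row) + [zero] * pad for row in image]
    padded += [list(blank_row) for _ in range(pad)]
    # Random h x w crop from the padded image.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    # Mirror horizontally half of the time.
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop


rng = random.Random(0)  # seeded generator so the example is repeatable
image = [[r * 32 + c + 1 for c in range(32)] for r in range(32)]
out = augment(image, rng=rng)
print(len(out), len(out[0]))  # 32 32
```

The crop offset ranges over 0..8 in each direction, so the original content is shifted by up to 4 pixels either way, with zeros filling the uncovered border.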
Settings: SGD with weight decay $10^{-4}$, Nesterov momentum 0.9 without dampening, initial learning rate 0.1 with a step decay schedule, and dropout when training without data augmentation
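A minimal sketch of the learning-rate decay, assuming the scheme is the step schedule reported in the paper for CIFAR and SVHN (divide by 10 at 50% and again at 75% of the training epochs). The function name and `total_epochs=300` are illustrative.

```python
def learning_rate(epoch, total_epochs, base_lr=0.1):
    """Step schedule: base_lr, /10 from 50% of training, /10 again from 75%."""
    lr = base_lr
    if epoch >= 0.5 * total_epochs:
        lr /= 10
    if epoch >= 0.75 * total_epochs:
        lr /= 10
    return lr


total_epochs = 300  # illustrative training length
for epoch in (0, 149, 150, 224, 225, 299):
    print(epoch, learning_rate(epoch, total_epochs))
```

The rate stays at 0.1 for the first half of training, drops to roughly 0.01 at epoch 150, and to roughly 0.001 at epoch 225.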
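Results (test error rate, lower is better):
- 3.46% on CIFAR-10 with data augmentation, DenseNet-BC, L=190, k=40
- 17.18% on CIFAR-100 with data augmentation, DenseNet-BC, L=190, k=40
- 1.59% on SVHN, DenseNet, L=100, k=24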
Capacity: performance keeps improving as L and k increase, which the authors read as a sign that DenseNet is less prone to overfitting (???)
Parameter efficiency.