TITLE: Pelee: A Real-Time Object Detection System on Mobile Devicesn
AUTHOR: Robert J. Wang, Xiang Li, Shuang Ao, Charles X. Ling
ASSOCIATION: University ofWestern Ontario
FROM: arXiv:1804.06882
CONTRIBUTION
- A variant of DenseNet architecture called PeleeNet for mobile devices is proposed.
- The network architecture of Single Shot MultiBox Detector (SSD) is optimized for speed acceleration and then combine it with PeleeNet.
METHOD
BUILDING BLOCKS
Two-Way Dense Layer. A 2-way dense layer is used to get different scales of receptive fields. One branch uses a small kernel size (3x3) to capture small-size objects. The other branch stacks two 3x3 convolution layers for larger objects. The structure is shown in the following figure.

Stem Block. This block is placed before the first dense layer for the sake of cost efficiency. This stem block can effectively improve the feature expression ability without adding computational cost too much. The structure is shown as follows.

Dynamic Number of Channels in Bottleneck Layer. The number of channels in the bottleneck layer varies according to the input shape to make sure the number of output channels does not exceed the number of its input channels.
Transition Layer without Compression. experiments show that the compression factor proposed by DenseNet hurts the feature expression so that the number of output channels is kept the same as the number of input channels in transition layers.
Composite Function. The post-activation (Convolution - Batch Normalization - Relu) is used for speed acceleration. For post-activation, all batch normalization layers can be merged with convolution layer at the inference stage. To compensate for the negative impact on accuracy caused by this change, a shallow and wide network structure is designed. In addition, a 1x1 convolution layer is added to the last dense block to get a stronger representational ability.
ARCHITECTURE
The framework of the work is illustrated in the following table.

OPTIMIZATION FOR SSD
Feature Map Selection. 5 scale feature maps (19x19, 10x10, 5x5, 3x3, and 1x1) are selected. Larger resolution features are discarded for speed acceleration.
Residual Prediction Block. For each feature map used for detection, a residual block (ResBlock) is constructed before conducting prediction, shown in the following figure.

PERFORMANCE
The classification performance on ILSVRC2012 is shown in the following table.

The detection performance on VOC2007 is shown in the following table.

The detection performance on COCO2015 is shown in the following table.

SOME IDEAS
From my own experince, DW convolution is not pruning friendly so that recently pruning methods, such as ThiNet and Net-Trim, works poorly on DW convolution. This work uses conventional convolutional layers, so maybe those pruning methods can play a role.

提出了一种适用于移动设备的目标检测系统Pelee,该系统基于改进的DenseNet架构PeleeNet,并针对单次多框检测器(SSD)进行了速度优化。通过使用两路密集层、动态通道数量等创新设计,实现了高效且准确的目标检测。
912

被折叠的 条评论
为什么被折叠?



