Reposted from: https://zhuanlan.zhihu.com/p/54082978
This post surveys several current efficient convolutional neural networks: MobileNet [1], MobileNetV2 [2], ShuffleNet [3], ShuffleNetV2 [4], Xception [5], and Light Xception [6].
The survey has two goals:
1. In a paper, the details of a network architecture are usually scattered across sections. By gathering them in one place, we form a specification for each network, from which the network can be implemented quickly and correctly;
2. By comparing these efficient CNNs, we can find what they share and where they differ.
The core of each specification is the "basic unit" and the "overall architecture"; the important details are quoted verbatim from the papers.
MobileNet
- Basic unit (a code sketch follows at the end of this section)
- Overall architecture
- Resource breakdown
- Width Multiplier
In order to construct these smaller and less computationally expensive models we introduce a very simple parameter α called width multiplier. The role of the width multiplier α is to thin a network uniformly at each layer. For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN.
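As a concrete illustration, below is a minimal PyTorch sketch of the MobileNet basic unit (3×3 depthwise conv and 1×1 pointwise conv, each followed by BN and a non-linearity), with the width multiplier α applied to the channel counts as described in the quote above. The class, helper, and argument names are my own, and ReLU is used where some implementations use ReLU6; treat this as a sketch, not the reference implementation.

```python
import torch.nn as nn

def scaled(channels, alpha):
    # width multiplier: M input channels become alpha*M, N output
    # channels become alpha*N (the rounding rule is my own choice)
    return max(1, int(channels * alpha))

class DepthwiseSeparable(nn.Module):
    """MobileNet basic unit: 3x3 depthwise conv -> BN -> ReLU,
    then 1x1 pointwise conv -> BN -> ReLU."""
    def __init__(self, in_ch, out_ch, stride=1, alpha=1.0):
        super().__init__()
        in_ch, out_ch = scaled(in_ch, alpha), scaled(out_ch, alpha)
        self.block = nn.Sequential(
            # depthwise: one 3x3 filter per input channel (groups=in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # pointwise: 1x1 conv that mixes channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```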
MobileNetV2
- Basic unit
We use batch normalization after every layer.
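As with MobileNet above, here is a minimal PyTorch sketch of the MobileNetV2 basic unit, the inverted residual with linear bottleneck: 1×1 expansion → 3×3 depthwise → 1×1 linear projection, BN after every layer per the quote, ReLU6 as the non-linearity. The names and the default expansion factor of 6 follow my reading of the paper; this is a sketch, not the reference implementation.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 basic unit: 1x1 expand -> 3x3 depthwise -> 1x1 linear
    projection, with BN after every layer and a residual connection when
    the input and output shapes match."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            # linear bottleneck: no non-linearity after the projection
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```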
- Overall architecture
- Width Multiplier
One minor implementation difference with [1] is that for multipliers less than one, we apply width multiplier to all layers except the very last convolutional layer. This improves performance for smaller models.
ShuffleNet
- Basic unit
we set the number of bottleneck channels to 1/4 of the output channels for each ShuffleNet unit.
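The operation that gives the network its name is the channel shuffle after the grouped 1×1 conv; the reshape-transpose-flatten trick from the paper is small enough to sketch directly (the function name is my own):

```python
import torch

def channel_shuffle(x, groups):
    # (N, g*c, H, W) -> (N, g, c, H, W), swap the group and channel
    # dims, then flatten back, so every group receives channels that
    # originated from all groups
    n, ch, h, w = x.size()
    x = x.view(n, groups, ch // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, ch, h, w)
```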
- Overall architecture
Note that for Stage 2, we do not apply group convolution on the first pointwise layer because the number of input channels is relatively small.
- Width multiplier
To customize the network to a desired complexity, we can simply apply a scale factor s on the number of channels. For example, we denote the networks in Table 1 as "ShuffleNet 1×", then "ShuffleNet s×" means scaling the number of filters in ShuffleNet 1× by s times, thus overall complexity will be roughly s² times of ShuffleNet 1×.
ShuffleNetV2
- Basic unit
It is clear that when c1 : c2 is approaching 1 : 1, the MAC becomes smaller and the network evaluation speed is faster.
In other words, when performing the channel split, dividing the channels evenly between the two branches works best.
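For completeness, the inequality behind this conclusion (guideline G1 in the paper): for a 1×1 convolution on an $h \times w$ feature map with $c_1$ input and $c_2$ output channels, the FLOPs are $B = hwc_1c_2$ and the memory access cost satisfies

$$\mathrm{MAC} = hw(c_1 + c_2) + c_1 c_2 \ge 2\sqrt{hwB} + \frac{B}{hw},$$

with equality exactly when $c_1 = c_2$, so for a fixed budget $B$ the even split minimizes MAC.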
At the beginning of each unit, the input of c feature channels are split into two branches with c − c′ and c′ channels, respectively. Following G3, one branch remains as identity. The other branch consists of three convolutions with the same input and output channels to satisfy G1. The two 1 × 1 convolutions are no longer group-wise, unlike [3]. This is partially to follow G2, and partially because the split operation already produces two groups.
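Below is a minimal PyTorch sketch of the stride-1 unit just described, reusing the channel_shuffle helper from the ShuffleNet section above. The class name and the even c′ = c/2 split are my own choices, and channels is assumed even; this is a sketch, not the reference implementation.

```python
import torch
import torch.nn as nn

class ShuffleV2Unit(nn.Module):
    """Stride-1 ShuffleNetV2 unit: split the channels in half, pass one
    half through 1x1 -> 3x3 depthwise -> 1x1 (equal input/output channels
    per G1), concatenate with the identity half, then shuffle the groups."""
    def __init__(self, channels):
        super().__init__()
        c = channels // 2  # even split: the best ratio per G1
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False),   # plain 1x1, not group-wise
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),                # no ReLU after the depthwise conv
            nn.Conv2d(c, c, 1, bias=False),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)             # channel split
        out = torch.cat([x1, self.branch(x2)], dim=1)
        return channel_shuffle(out, groups=2)  # from the ShuffleNet sketch
```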
- Overall architecture
- Width multiplier
the number of channels in each block is scaled to generate networks of different complexities, marked as 0.5×, 1×, etc.
Xception
- Basic unit
Two minor differences between an "extreme" version of an Inception module and a depthwise separable convolution would be:
1. The order of the operations: depthwise separable convolutions as usually implemented (e.g. in TensorFlow) perform first channel-wise spatial convolution and then perform 1x1 convolution, whereas Inception performs the 1x1 convolution first.
2. The presence or absence of a non-linearity after the first operation. In Inception, both operations are followed by a ReLU non-linearity, however depthwise separable convolutions are usually implemented without non-linearities.
We argue that the first difference is unimportant, in particular because these operations are meant to be used in a stacked setting.
the absence of any non-linearity leads to both faster convergence and better final performance.
- Overall architecture
Note that all Convolution and SeparableConvolution layers are followed by batch normalization (not included in the diagram). All SeparableConvolution layers use a depth multiplier of 1 (no depth expansion).
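Putting the excerpts together, here is a hedged sketch of a single Xception SeparableConv layer: depthwise 3×3 followed by pointwise 1×1 with no non-linearity in between, a depth multiplier of 1, and one BN after the pair. The class name and the exact BN placement are my own reading of the quotes above.

```python
import torch.nn as nn

class SeparableConv(nn.Module):
    """Xception-style separable conv: 3x3 depthwise then 1x1 pointwise,
    no non-linearity between them, followed by batch normalization.
    Depth multiplier is 1: the depthwise step keeps the channel count."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.pointwise(self.depthwise(x)))
```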
Light Xception
An efficient Xception-like architecture for our fast detector.
- Overall architecture
we do not use the pre-activation design
Overall comparison
1. In every model, each regular conv, pointwise conv (grouped or not), and depthwise conv is followed by a BN layer;
2. Whether to use pre-activation depends on the specific model and task;
3. MobileNet and MobileNetV2 place a non-linearity after the depthwise conv, while ShuffleNet, ShuffleNetV2, and Xception do not. Which choice is better is, much like the pre-activation question, a matter of the specific task and model;
4. Within a separable conv module, the order of the pointwise conv and the depthwise conv has little effect on the model;
5. The first and last layers often receive special treatment and deserve attention: for example, MobileNetV2 does not scale the last layer when applying the width multiplier, and ShuffleNet does not apply group convolution to the first pointwise conv of Stage 2.
Some reflections
Efficient network architectures are a bit like machine-learning algorithms: there is no single best one, only the one best suited to the problem. Each efficient architecture rests on its own assumptions, and none of them holds universally; for example, replacing a regular conv with a separable conv does not always make inference faster.
Hardware, software implementation, and the characteristics of the architecture itself can all shift where the inference-speed bottleneck lies; when the bottleneck shifts, the design approach must shift with it. That is why we need a firm grasp of the motivation and assumptions behind each efficient network design.