Xception: Deep Learning with Depthwise Separable Convolutions (Paper Reading Notes)

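This post walks through Xception, a deep learning architecture built on depthwise separable convolutions. It traces how the Inception module evolves into Xception and explains how depthwise separable convolutions work and how Xception uses them.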


Tags (space-separated): deep-learning-architectures paper-notes


Xception arguably paved the way for later mobile-oriented networks such as MobileNet and ShuffleNet, so it is well worth a read.

From Inception to Xception

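Put simply, Xception is the limiting case of Inception. In an Inception module, a convolutional layer is split into several parallel branches, as shown in the figure below.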

[Figure: image_1cdkisnjv1vp197tmql1af95tc9.png]

In the paper, the author reworks the structure above into the simplified form below.

[Figure: image_1cdkj4r3a1kfk6glb69esa190b26.png]

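The author then asks how many such parallel branches there can be at most. Taking this to the limit, with one spatial convolution per output channel of the 1×1 convolution, gives the structure below, which is the Xception module.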

[Figure: image_1cdkj80v663olkr1pe28stulb2j.png]

The author calls this structure:

An “extreme” version of our Inception module
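To make the "how many parallel branches?" question concrete, here is a minimal sketch that counts the weights of the 3×3 stage when the channels coming out of the 1×1 convolution are split into a varying number of non-overlapping segments. The function name `spatial_conv_params` and the channel count of 128 are illustrative assumptions, not values from the paper; biases are ignored, and each segment is assumed to keep its channel count.

```python
def spatial_conv_params(channels: int, segments: int, kernel: int = 3) -> int:
    """Weights in the spatial stage when `channels` are split into `segments` groups.

    Each segment gets its own kernel x kernel convolution that maps
    channels/segments input channels to channels/segments output channels.
    Biases are ignored.
    """
    if channels % segments != 0:
        raise ValueError("channels must be divisible by segments")
    per_segment = channels // segments
    return segments * kernel * kernel * per_segment * per_segment


if __name__ == "__main__":
    channels = 128  # illustrative value, not taken from the paper
    for segments in (1, 4, 32, channels):
        print(segments, spatial_conv_params(channels, segments))
```

With a single segment this is an ordinary 3×3 convolution. With as many segments as channels, every 3×3 filter sees exactly one channel, which is the extreme case the quote refers to.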

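The Xception module factors an ordinary convolutional layer into two parts: a depthwise convolution (a spatial convolution applied to each channel independently) and a pointwise convolution (a 1×1 convolution). Together, the two make up a depthwise separable convolution.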

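The author describes an ordinary convolutional layer as doing two things at once: extracting features across the two spatial dimensions of the feature maps and extracting features across channels. Consider how a single point in one of the next layer's feature maps is computed: each feature map (channel) of the previous layer is filtered spatially, and the per-channel results are then summed across channels.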
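Written out, that observation looks roughly as follows. The notation is introduced here for illustration and is not the paper's: $x_c$ is input channel $c$, $W_{o,c}$ is the $k \times k$ kernel slice connecting input channel $c$ to output channel $o$, and bias, stride, and padding are omitted.

```latex
y_o(i, j) \;=\; \sum_{c=1}^{C_{\text{in}}} \; \sum_{u=1}^{k} \sum_{v=1}^{k}
    W_{o,c}(u, v)\, x_c(i + u - 1,\; j + v - 1)
```

The inner double sum over $(u, v)$ is the spatial filtering of a single channel, and the outer sum over $c$ is the mixing across channels. An ordinary convolution learns both at once through the single weight tensor $W$.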
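With this in mind, the author first applies an independent spatial convolution to each channel (ignoring all other channels), and then a pointwise convolution (which mixes channels but ignores every other spatial position). These two steps correspond to the depthwise convolution and the pointwise convolution, respectively.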
Concretely:
depthwise convolution: convolve each channel independently with a 3×3 kernel
pointwise convolution: apply an ordinary 1×1 convolution to the output of the depthwise convolution
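The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function names, the channels-first `(C, H, W)` layout, "valid" padding, and stride 1 are all assumptions made here, and, as in most deep learning frameworks, the "convolution" is a cross-correlation (no kernel flip).

```python
import numpy as np


def depthwise_conv(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Filter each channel with its own k x k kernel ('valid' padding, stride 1).

    x: (C, H, W) input; kernels: (C, k, k), one spatial filter per channel.
    """
    channels, height, width = x.shape
    k = kernels.shape[-1]
    out_h, out_w = height - k + 1, width - k + 1
    out = np.zeros((channels, out_h, out_w), dtype=np.result_type(x, kernels))
    for u in range(k):
        for v in range(k):
            # Shifted window of the input, weighted per channel.
            # There is no sum over channels: channels never interact here.
            out += kernels[:, u, v, None, None] * x[:, u:u + out_h, v:v + out_w]
    return out


def pointwise_conv(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """1x1 convolution: mix channels independently at every spatial position.

    x: (C_in, H, W); weights: (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weights, x)


def depthwise_separable_conv(x, kernels, weights):
    """Depthwise k x k convolution followed by a pointwise 1x1 convolution."""
    return pointwise_conv(depthwise_conv(x, kernels), weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))      # 8 channels, 16x16 feature map
    kernels = rng.standard_normal((8, 3, 3))  # one 3x3 filter per channel
    weights = rng.standard_normal((32, 8))    # 1x1 conv: 8 -> 32 channels
    y = depthwise_separable_conv(x, kernels, weights)
    print(y.shape)  # (32, 14, 14)
```

The depthwise step only ever looks within one channel, and the pointwise step only ever looks at one spatial position, which is exactly the split described above.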

Theoretical explanation

The author gives a theoretical motivation for Inception. The paper's own words are quoted here:

  1. A convolution layer attempts to learn filters in a 3D space, with 2 spatial dimensions (width and height) and a channel dimension; thus a single convolution kernel is tasked with simultaneously mapping cross-channel correlations and spatial correlations.
  2. This idea behind the Inception module is to make this process easier and more efficient by explicitly factoring it into a series of operations that would independently look at cross-channel correlations and at spatial correlations.
  3. More precisely, the typical Inception module first looks at cross-channel correlations via a set of 1x1 convolutions, mapping the input data into 3 or 4 separate spaces that are smaller than the original input space, and then maps all correlations in these smaller 3D spaces, via regular 3x3 or 5x5 convolutions
  4. In effect, the fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly
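One way to see why decoupling can make the mapping "easier and more efficient" is to count weights. The sketch below compares a regular convolution, which maps channels and space jointly, with a fully decoupled version (a 1×1 convolution plus one spatial filter per output channel). The function names and the 256-channel sizes are illustrative assumptions, not figures from the paper, and biases are ignored.

```python
def regular_conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights of a regular convolution that maps space and channels jointly."""
    return kernel * kernel * c_in * c_out


def decoupled_conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights of a 1x1 convolution (c_in -> c_out) plus one k x k filter per output channel."""
    return c_in * c_out + kernel * kernel * c_out


if __name__ == "__main__":
    c_in, c_out = 256, 256  # illustrative sizes, not taken from the paper
    joint = regular_conv_params(c_in, c_out)
    decoupled = decoupled_conv_params(c_in, c_out)
    print(joint, decoupled, round(joint / decoupled, 1))
```

A smaller weight count does not by itself say anything about accuracy. Whether the decoupled form is "preferable" is the hypothesis the paper tests empirically.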

Differences between Xception and depthwise separable convolution

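The author stresses that the Xception module and a depthwise separable convolution are essentially equivalent, but they differ in a few details.
For example:
1. The Xception (extreme Inception) module applies the pointwise convolution first and the depthwise convolution second, whereas a depthwise separable convolution as usually implemented does the reverse;
2. In the Xception (extreme Inception) module, both the pointwise and the depthwise convolution are followed by a ReLU non-linearity, whereas depthwise separable convolutions are usually implemented without a non-linearity in between.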

  1. We remark that this extreme form of an Inception module is almost identical to a depthwise separable convolution, an operation that has been used in neural network.
  2. A depthwise separable convolution, commonly called “separable convolution” in deep learning frameworks such as TensorFlow and Keras, consists in a depthwise convolution, i.e. a spatial convolution performed independently over each channel of an input, followed by a pointwise convolution, i.e. a 1x1 convolution, projecting the channels output by the depthwise convolution onto a new channel space.

On the differences between the two:

  1. The order of the operations: depthwise separable convolutions as usually implemented (e.g. in TensorFlow) perform first channel-wise spatial convolution and then perform 1x1 convolution, whereas Inception performs the 1x1 convolution first.
  2. The presence or absence of a non-linearity after the first operation. In Inception, both operations are followed by a ReLU non-linearity, however depthwise separable convolutions are usually implemented without non-linearities.

We argue that the first difference is unimportant, in particular because these operations are meant to be used in a stacked setting. The second difference might matter, and we investigate it in the experimental section.
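The two differences can be placed side by side in a small NumPy sketch. Everything here is an illustrative assumption rather than the paper's code: the function names, the `(C, H, W)` layout, "valid" padding, and the choice of equal input and output channel counts so that both variants accept the same weight shapes.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def depthwise(x, kernels):
    """Per-channel k x k cross-correlation, 'valid' padding. x: (C, H, W), kernels: (C, k, k)."""
    k = kernels.shape[-1]
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    return sum(kernels[:, u, v, None, None] * x[:, u:u + h, v:v + w]
               for u in range(k) for v in range(k))


def pointwise(x, weights):
    """1x1 convolution. x: (C_in, H, W), weights: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weights, x)


def framework_style(x, kernels, weights):
    """Depthwise first, then pointwise, with no non-linearity in between."""
    return pointwise(depthwise(x, kernels), weights)


def inception_style(x, kernels, weights):
    """Pointwise (1x1) first, then depthwise, each followed by a ReLU."""
    return relu(depthwise(relu(pointwise(x, weights)), kernels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))
    kernels = rng.standard_normal((4, 3, 3))
    weights = rng.standard_normal((4, 4))  # square, so both orders accept the same shapes
    print(framework_style(x, kernels, weights).shape)  # (4, 6, 6)
    print(inception_style(x, kernels, weights).shape)  # (4, 6, 6)
```

When such modules are stacked, the end of one module is immediately followed by the start of the next, so the order within a module matters little. The ReLU placement, by contrast, changes what the composed function can represent.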

Xception network

Having introduced the Xception module, the author then substitutes it for the Inception modules of an Inception network to build the Xception network.

  1. we suggest that it may be possible to improve upon the Inception family of architectures by replacing Inception modules with depthwise separable convolutions, i.e. by building models that would be stacks of depthwise separable convolutions.
  2. We propose a convolutional neural network architecture based entirely on depthwise separable convolution layers. In effect, we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled
  3. The Xception architecture has 36 convolutional layers forming the feature extraction base of the network.
  4. In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections. This makes the architecture very easy to define and modify;
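The phrase "a linear stack of depthwise separable convolution layers with residual connections" can be sketched as below. This is a deliberately simplified illustration under stated assumptions, not the Xception architecture itself: the function names, the three layers per block, the constant channel count, and the identity shortcut are choices made here, and batch normalization, strides, and pooling are left out.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def separable_conv_same(x, kernels, weights):
    """Depthwise k x k conv with zero 'same' padding, then a 1x1 pointwise conv.

    x: (C, H, W); kernels: (C, k, k) with odd k; weights: (C_out, C).
    """
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1], x.shape[2]
    spatial = sum(kernels[:, u, v, None, None] * padded[:, u:u + h, v:v + w]
                  for u in range(k) for v in range(k))
    return np.einsum("oc,chw->ohw", weights, spatial)


def residual_block(x, layers):
    """Apply ReLU + separable conv for each (kernels, weights) pair, then add the input back."""
    out = x
    for kernels, weights in layers:
        out = separable_conv_same(relu(out), kernels, weights)
    return x + out  # identity shortcut; requires matching shapes


def stack(x, blocks):
    """A linear stack: the output of each block feeds the next."""
    for layers in blocks:
        x = residual_block(x, layers)
    return x


def make_layer(rng, channels):
    """Random weights for one separable layer (illustrative helper)."""
    return rng.standard_normal((channels, 3, 3)), rng.standard_normal((channels, channels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = 4  # illustrative; not the paper's channel counts
    blocks = [[make_layer(rng, channels) for _ in range(3)] for _ in range(2)]
    x = rng.standard_normal((channels, 8, 8))
    print(stack(x, blocks).shape)  # (4, 8, 8)
```

Because every block has the same shape contract, adding, removing, or resizing blocks is a change to a list, which is one reading of why the quote calls the architecture easy to define and modify.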

Network architecture:
[Figure: image_1ce3a02tr19q51icuiim1b3c1cmq9.png]
