Dilated convolution, data formats, code implementations: a concept roundup

Original post, last modified 2022-08-02 17:40:33


License: CC 4.0 BY-SA

Tags: #conv2d #dilation

First published 2019-01-24 14:33:26


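This post takes a close look at key concepts in convolutional neural networks, including the discrete convolution formula and the definitions and uses of padding, stride, and transpose. It demonstrates the convolution functions in TensorFlow and PyTorch through examples, and introduces the principle of dilated convolution and its historical roots in wavelet analysis.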


Discrete convolution: formula and definition

The official PyTorch documentation gives the underlying formula:

torch.nn — PyTorch master documentation

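$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$

where $\star$ is the valid 2D cross-correlation operator.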

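Comment: the lower and upper bounds of the summation show that the results for the individual input channels are added together. Moreover, the "convolution" here is actually computed as a cross-correlation. For an image F and a kernel k, cross-correlation is defined as:

$$(F \star k)(i, j) = \sum_{m}\sum_{n} F(i+m,\, j+n)\, k(m, n)$$

Cross-correlation differs from convolution only by a minus sign in the index: convolution uses $F(i-m,\, j-n)$, which amounts to flipping the kernel, whereas cross-correlation leaves the kernel unflipped. PyTorch does this because a neural network essentially searches over kernels by brute computational force. There are a great many kernels and their weights are learned, so a strict flip has little practical meaning.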
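
To make the sign difference concrete, here is a minimal pure-Python sketch. The helper names (`cross_correlate_valid`, `flip`, `convolve_valid`) are illustrative and not part of PyTorch; the sketch only assumes the textbook definitions of "valid" cross-correlation and convolution.

```python
def cross_correlate_valid(image, kernel):
    """Valid 2D cross-correlation: slide the kernel without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def flip(kernel):
    """Rotate the kernel by 180 degrees (reverse rows and columns)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve_valid(image, kernel):
    """True convolution equals cross-correlation with the flipped kernel."""
    return cross_correlate_valid(image, flip(kernel))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

corr = cross_correlate_valid(image, kernel)
conv = convolve_valid(image, kernel)
print(corr)  # [[-4, -4], [-4, -4]]
print(conv)  # [[4, 4], [4, 4]]
```

For this asymmetric kernel the two results differ in sign. A network trained with either operation would simply learn the mirrored weights.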

Padding, stride, and transpose

An excellent visual tutorial on convolution arithmetic:

conv_arithmetic/README.md at master · vdumoulin/conv_arithmetic · GitHub

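padding: zero padding. Rows and columns of zeros are added around the border, much like zero padding before a DFT.

strides: the step size of the sliding window.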
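
The effect of padding and stride on the output size can be sketched with one line of arithmetic. The function name `conv_output_size` is hypothetical; the formula is the standard one along a single axis, with `padding` assumed to be the number of zeros added on each side.

```python
def conv_output_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis for an input of length n.

    padding is the number of zeros added on EACH side of the axis.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (n + 2 * padding - effective_kernel) // stride + 1


print(conv_output_size(5, 3))                       # 3: no padding, stride 1
print(conv_output_size(5, 3, padding=1))            # 5: one ring of zeros
print(conv_output_size(5, 3, stride=2, padding=1))  # 3: stride 2
```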

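Two more parameters are harder to grasp. The first is transpose, formally known as upsampling or deconvolution (transposed convolution). Here is a simple way to think about it:

Turning a 2×2 feature map back into a 4×4 image is actually simple. First flatten the 2×2 map into a 1×4 row vector, then multiply it by a 4×16 matrix to get a 1×16 vector, and finally reshape that into 4×4.

The forward convolution corresponding to this process multiplies by a 16×4 matrix:

The kernel is 3×3. Take your time working through it.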
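
The matrix view above can be sketched in pure Python. This is an illustration under assumptions: the helper names (`conv_matrix`, `vec_mat`, `transpose`) and the example kernel values are made up, and the forward operation is a valid, stride-1 cross-correlation on a 4×4 input.

```python
def conv_matrix(kernel, in_size=4):
    """Build the (in_size^2) x (out_size^2) matrix M of a valid, stride-1
    cross-correlation, so that flat_input (row vector) @ M = flat_output."""
    k = len(kernel)
    out_size = in_size - k + 1
    m = [[0.0] * (out_size * out_size) for _ in range(in_size * in_size)]
    for oi in range(out_size):
        for oj in range(out_size):
            col = oi * out_size + oj
            for a in range(k):
                for b in range(k):
                    row = (oi + a) * in_size + (oj + b)
                    m[row][col] = kernel[a][b]
    return m


def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(v * mat[r][c] for r, v in enumerate(vec))
            for c in range(len(mat[0]))]


def transpose(mat):
    """Swap rows and columns."""
    return [list(col) for col in zip(*mat)]


kernel = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
m = conv_matrix(kernel)                     # 16 x 4

image = [float(v) for v in range(16)]       # a flattened 4x4 image
feature = vec_mat(image, m)                 # forward: 1x16 @ 16x4 -> 1x4

upsampled = vec_mat(feature, transpose(m))  # transposed: 1x4 @ 4x16 -> 1x16
upsampled_4x4 = [upsampled[r * 4:(r + 1) * 4] for r in range(4)]

print(len(m), len(m[0]))  # 16 4
print(feature)            # [303.0, 348.0, 483.0, 528.0]
print(upsampled_4x4)
```

Note that the transposed multiplication restores the 4×4 shape, not the original pixel values.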

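conv2d and max_pool are an inseparable pair, like Benbo'erba and Babo'erben, the two sidekick demons in Journey to the West. The two examples below illustrate padding='SAME' versus padding='VALID'.

"SAME": with stride 1, output size is the same as input size (in general it is ceil(input_size / stride)). This requires the filter window to slip outside the input map, hence the need to pad.
"VALID": the filter window stays at valid positions inside the input map, so with stride 1 the output size shrinks by filter_size - 1. No padding occurs.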

tf.nn.max_pool(
    value,
    ksize,
    strides,
    padding,
    data_format='NHWC',
    name=None
)

value: A 4-D Tensor of the format specified by data_format.
ksize: A list or tuple of 4 ints. The size of the window for each dimension of the input tensor.
strides: A list or tuple of 4 ints. The stride of the sliding window for each dimension of the input tensor.

Example

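# Written against the TensorFlow 1.x graph API (tf.Session); under TF 2.x the
# equivalent calls are in tf.compat.v1 with eager execution disabled.
import tensorflow as tf
tf.reset_default_graph()

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

# give a shape accepted by tf.nn.max_pool (NHWC: batch=1, height=2, width=3, channels=1)
x = tf.reshape(x, [1, 2, 3, 1])

# 2x2 window with stride 2 in both spatial dimensions
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

# static shapes are known before the graph is run
print(valid_pad.get_shape())
print(same_pad.get_shape())
with tf.Session() as sess:
    print(sess.run(valid_pad))
    print(sess.run(same_pad))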

Output:

(1, 1, 1, 1)
(1, 1, 2, 1)
[[[[ 5.]]]]
[[[[ 5.]
   [ 6.]]]]
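
The shapes and values above can be reproduced without TensorFlow. The following is a minimal pure-Python sketch, not TensorFlow's implementation: the names `pool_axis` and `max_pool_2d` are hypothetical, and it assumes the SAME rule where the output length is ceil(n / stride) and any extra padding goes after the data. Padded cells are skipped rather than treated as zeros.

```python
import math


def pool_axis(n, k, stride, padding):
    """Return (output length, padding before the data) for one axis."""
    if padding == 'VALID':
        return (n - k) // stride + 1, 0
    out = math.ceil(n / stride)
    pad_total = max((out - 1) * stride + k - n, 0)
    return out, pad_total // 2


def max_pool_2d(x, k, stride, padding):
    """Max pooling over a 2D list; out-of-range cells are skipped."""
    h, w = len(x), len(x[0])
    out_h, top = pool_axis(h, k, stride, padding)
    out_w, left = pool_axis(w, k, stride, padding)
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride - top, j * stride - left
            window = [x[r][c]
                      for r in range(r0, r0 + k) if 0 <= r < h
                      for c in range(c0, c0 + k) if 0 <= c < w]
            row.append(max(window))
        result.append(row)
    return result


x = [[1., 2., 3.],
     [4., 5., 6.]]
valid = max_pool_2d(x, 2, 2, 'VALID')
same = max_pool_2d(x, 2, 2, 'SAME')
print(valid)  # [[5.0]]
print(same)   # [[5.0, 6.0]]
```

With 'SAME', the second window hangs one column past the right edge and only sees 3 and 6, which is why the extra output value is 6.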

TensorFlow implementation

The data format for tf.nn.conv2d, as described in the official TensorFlow documentation:

tf.nn.conv2d | TensorFlow Core v2.9.1

tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)

input: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. A 4-D tensor. The dimension order is interpreted according to the value of data_format, see below for details.
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
strides: A list of ints. 1-D tensor of length 4. The stride of the sliding window for each dimension of input. The dimension order is determined by the value of data_format, see below for details.
dilations: An optional list of ints. Defaults to [1, 1, 1, 1]. 1-D tensor of length 4. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format, see above for details. Dilations in the batch and depth dimensions must be 1.
name: A name for the operation (optional).

Example

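# Written against the TensorFlow 1.x graph API (tf.Session); under TF 2.x the
# equivalent calls are in tf.compat.v1 with eager execution disabled.
import numpy as np
import tensorflow as tf
tf.reset_default_graph()

# a 5x5 image holding 0..24, reshaped to NHWC: [batch=1, height=5, width=5, channels=1]
x = np.arange(25).reshape(5,5)
x = tf.cast(x,tf.float32)
x = tf.reshape(x, [1, 5, 5, 1])

# filter shape: [filter_height, filter_width, in_channels, out_channels]
f=tf.Variable(tf.random_uniform([3,3,1,16],-1,1))

valid_pad = tf.nn.conv2d(x, f, [1,1,1,1], padding='VALID')
same_pad = tf.nn.conv2d(x, f, [1,1,1,1], padding='SAME')

# static shapes are known before the graph is run
print(valid_pad.get_shape())
print(same_pad.get_shape())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(same_pad)  # evaluates the op; the result is not printed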

Output:

(1, 3, 3, 16)
(1, 5, 5, 16)

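Data format differences between TensorFlow and PyTorch:

TensorFlow defaults to NHWC, that is, [batch, height, width, channels]. PyTorch uses NCHW, as the Conv2d signature and shapes below show.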

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

In the simplest case, the output value of the layer with input size (N, C_in, H, W) and output (N, C_out, H_out, W_out) is given by the sum of per-channel cross-correlations discussed in the first section.
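
To illustrate the difference between the two layouts, here is a small pure-Python sketch that reorders a nested list from NHWC to NCHW. The helpers `nhwc_to_nchw` and `shape` are hypothetical. In real code the same reordering would normally be done with the framework's own axis permutation, for example `tf.transpose(x, [0, 3, 1, 2])` or `x.permute(0, 3, 1, 2)` on a PyTorch tensor.

```python
def nhwc_to_nchw(x):
    """Reorder a nested list from [N][H][W][C] to [N][C][H][W]."""
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[[x[b][i][j][ch] for j in range(w)]
              for i in range(h)]
             for ch in range(c)]
            for b in range(n)]


def shape(x):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


# One 2x3 image with 2 channels, stored channels-last (NHWC).
nhwc = [[[[10 * i + j, -(10 * i + j)] for j in range(3)] for i in range(2)]]
nchw = nhwc_to_nchw(nhwc)
print(shape(nhwc))  # [1, 2, 3, 2]
print(shape(nchw))  # [1, 2, 2, 3]
```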

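Dilated convolution: principle and applications

The dilated convolution operation first appeared in wavelet analysis. See the following two academic papers from the formative period of the discrete wavelet transform.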

Holschneider, M., Kronland-Martinet, R., Morlet, J., and Tchamitchian, Ph. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets: Time-Frequency Methods and Phase Space. Proceedings of the International Conference, 1987.

Shensa, Mark J. The discrete wavelet transform: wedding the à trous and Mallat algorithms. IEEE Transactions on Signal Processing, 40(10), 1992.

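The classical definition of dilated convolution:

In that formula, r ∈ ℓ² means that the sequence r is square-summable: $\sum_{n} |r[n]|^2 < \infty$.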

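Modern AI papers define dilated convolution as follows:

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s} + l\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})$$

Comment: here l is the dilation factor. The image function F is defined on the integer grid, with coordinates such as [1,1], [1,2], and so on, and its pixel values are real numbers. Hence the image function is F: ℤ² → ℝ. The discrete filter function k likewise maps two-dimensional integer coordinates to ℝ.

Both expressions above can be derived from the familiar convolution formula.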
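
As a sketch of that derivation for the modern form only, assume the notation commonly used in dilated-convolution papers: F is the image, k the filter, p, s, t are points of ℤ², and l is the dilation factor (these symbols are assumptions where the text above does not state them). Starting from ordinary discrete convolution and scaling the filter index by l gives:

```latex
\begin{aligned}
(F * k)(\mathbf{p})
  &= \sum_{\mathbf{s} + \mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})
   = \sum_{\mathbf{t}} F(\mathbf{p} - \mathbf{t})\, k(\mathbf{t}), \\
(F *_l k)(\mathbf{p})
  &= \sum_{\mathbf{s} + l\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})
   = \sum_{\mathbf{t}} F(\mathbf{p} - l\,\mathbf{t})\, k(\mathbf{t}).
\end{aligned}
```

Setting l = 1 recovers ordinary convolution. For l > 1 the filter samples the image at points that are l pixels apart.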

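The figure below shows 1-dilated, 2-dilated, and 4-dilated kernels. When these layers are applied one after another, the receptive fields are 3×3, 7×7, and 15×15 respectively. This matches 2×(r-1)(k-1) + k for a k×k kernel whose dilation doubles from 1 up to r (here k = 3 and r = 1, 2, 4).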

Atrous convolution with rate r introduces r − 1 zeros between consecutive filter values, effectively enlarging the kernel size of a k × k filter to k_e = k + (k − 1)(r − 1).
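
Both quantities can be checked with a few lines of pure Python. This is a minimal sketch with hypothetical helper names: `effective_kernel_size` is the single-layer formula quoted above, `dilate_kernel` inserts the zeros explicitly, and `stacked_receptive_field` assumes stride-1 layers applied in sequence.

```python
def effective_kernel_size(k, r):
    """k_e = k + (k - 1)(r - 1) for a k x k filter with rate r."""
    return k + (k - 1) * (r - 1)


def dilate_kernel(kernel, r):
    """Insert r - 1 zeros between consecutive filter values on both axes."""
    k = len(kernel)
    ke = effective_kernel_size(k, r)
    out = [[0] * ke for _ in range(ke)]
    for a in range(k):
        for b in range(k):
            out[a * r][b * r] = kernel[a][b]
    return out


def stacked_receptive_field(k, rates):
    """Receptive field after applying k x k layers with the given rates in
    sequence (stride 1): each layer adds (k - 1) * rate."""
    return 1 + sum((k - 1) * r for r in rates)


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
for row in dilate_kernel(kernel, 2):
    print(row)

print(effective_kernel_size(3, 2), effective_kernel_size(3, 4))  # 5 9
print([stacked_receptive_field(3, rates)
       for rates in ([1], [1, 2], [1, 2, 4])])                   # [3, 7, 15]
```

A single 4-dilated 3×3 layer therefore spans 9×9, while the 15×15 figure comes from stacking the 1-, 2-, and 4-dilated layers.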