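Contents
- Discrete convolution: formula and definition
- Padding, stride, and transpose
- TensorFlow function implementations
- Dilated convolution: principles and applications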
Discrete convolution: formula and definition
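The PyTorch documentation gives the underlying formula:
torch.nn — PyTorch master documentation
$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$
where $\star$ is the valid 2D cross-correlation operator.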
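Comment: the lower and upper bounds of the sum show that the per-channel results are added up over all input channels. Note also that the "convolution" here is actually computed as a cross-correlation. For a 2D input I and kernel K, cross-correlation is defined as:
$$(I \star K)(i, j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m, n)$$
Cross-correlation differs from convolution only by a sign: convolution indexes the input as I(i-m, j-n), which flips the kernel, whereas cross-correlation does not flip it. PyTorch can skip the flip because the kernel weights are learned: a network has a great many kernels, all fitted by brute-force computation, so whether a kernel is stored flipped or not makes little practical difference.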
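To make the sign difference concrete, here is a minimal pure-Python sketch. The helper names (`cross_correlate2d`, `flip2d`, `convolve2d`) are made up for illustration and are not PyTorch functions; the sketch only assumes "valid" mode, i.e. no padding.

```python
def cross_correlate2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def flip2d(kernel):
    """Rotate the kernel by 180 degrees (reverse rows, then columns)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """Valid 2D convolution = cross-correlation with the flipped kernel."""
    return cross_correlate2d(image, flip2d(kernel))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)
print(corr)  # [[-4, -4], [-4, -4]]
print(conv)  # [[4, 4], [4, 4]]
```

For a kernel that is symmetric under a 180-degree rotation the two results coincide, which is one more reason the distinction rarely matters in practice.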
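Padding, stride, and transpose
An excellent illustrated guide to convolution arithmetic:
conv_arithmetic/README.md at master · vdumoulin/conv_arithmetic · GitHub
padding: zero padding. Rows and columns of zeros are added around the input, much like zero-padding before a DFT.
strides: the step size of the sliding window.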
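The effect of padding and stride on the output size can be sketched as follows. The function names are hypothetical, and the formula assumes the same amount of zero padding p on both sides of each axis: output = floor((n + 2p - k) / s) + 1.

```python
def zero_pad2d(image, p):
    """Surround a 2D list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    rows = [[0] * width for _ in range(p)]
    for row in image:
        rows.append([0] * p + list(row) + [0] * p)
    rows += [[0] * width for _ in range(p)]
    return rows


def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1


print(zero_pad2d([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(conv_output_size(5, 3))                       # 3
print(conv_output_size(5, 3, padding=1))            # 5
print(conv_output_size(5, 3, stride=2, padding=1))  # 3
```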
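Two more parameters are harder to understand. The first is transpose, also known as upsampling or deconvolution (more precisely, transposed convolution). Here is a simple way to think about it:
Turning a 2×2 feature map back into a 4×4 image is actually simple. First flatten the 2×2 map into a 1×4 row vector, then multiply it by a 4×16 matrix to get a 1×16 vector, and finally reshape that into 4×4.
The forward convolution corresponding to this process multiplies by a 16×4 matrix instead:
The kernel is 3×3. Take your time working through it.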
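The matrix view above can be checked with a small pure-Python sketch. It assumes a 4×4 input, a 3×3 kernel, stride 1 and no padding, and uses row vectors so that the forward pass is (1×16)·(16×4) and the transposed pass is (1×4)·(4×16). The names `conv_matrix`, `vec_mat` and `transpose` are illustrative helpers, not library functions.

```python
def conv_matrix(kernel, in_size=4):
    """Build the (in_size^2) x (out_size^2) matrix of a valid cross-correlation."""
    k = len(kernel)
    out_size = in_size - k + 1
    C = [[0] * (out_size * out_size) for _ in range(in_size * in_size)]
    for i in range(out_size):
        for j in range(out_size):
            col = i * out_size + j              # flattened output position
            for m in range(k):
                for n in range(k):
                    row = (i + m) * in_size + (j + n)  # flattened input position
                    C[row][col] = kernel[m][n]
    return C


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) for c in range(len(M[0]))]


def transpose(M):
    return [list(col) for col in zip(*M)]


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
C = conv_matrix(kernel)                   # 16 x 4

x = list(range(16))                       # a 4x4 image flattened to 1x16
y = vec_mat(x, C)                         # forward: 1x16 times 16x4 -> 1x4
up = vec_mat([1, 0, 0, 0], transpose(C))  # transposed: 1x4 times 4x16 -> 1x16

print(y)   # [303, 348, 483, 528]
print(up)  # [1, 2, 3, 0, 4, 5, 6, 0, 7, 8, 9, 0, 0, 0, 0, 0]
```

Reshaping `up` to 4×4 shows the kernel "stamped" into the top-left corner: a single active cell in the 2×2 map spreads over a 3×3 patch of the 4×4 output. Note that the transposed matrix restores the shape of the input, not its values.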
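conv2d and max_pool go hand in hand, like the inseparable pair 奔波尔霸 and 霸波尔奔 in Journey to the West. The two examples below illustrate padding='SAME' and padding='VALID'.
"SAME": with stride 1, the output size is the same as the input size. This requires the filter window to slip outside the input map, hence the need to pad. With stride s, the output size is ceil(input_size / s).
"VALID": the filter window stays at valid positions inside the input map, so with stride 1 the output size shrinks by filter_size - 1. No padding occurs.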
tf.nn.max_pool(
    value,
    ksize,
    strides,
    padding,
    data_format='NHWC',
    name=None
)
value: A 4-D Tensor of the format specified by data_format.
ksize: A list or tuple of 4 ints. The size of the window for each dimension of the input tensor.
strides: A list or tuple of 4 ints. The stride of the sliding window for each dimension of the input tensor.
Example
import tensorflow as tf  # TensorFlow 1.x API (graph mode with tf.Session)

tf.reset_default_graph()
# A 2x3 input
x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
x = tf.reshape(x, [1, 2, 3, 1])  # give a shape accepted by tf.nn.max_pool
# 2x2 window with stride 2, under both padding modes
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
print(valid_pad.get_shape())
print(same_pad.get_shape())
with tf.Session() as sess:
    print(sess.run(valid_pad))
    print(sess.run(same_pad))
Output:
(1, 1, 1, 1)
(1, 1, 2, 1)
[[[[ 5.]]]]
[[[[ 5.]
   [ 6.]]]]
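The result above can be reproduced without TensorFlow. The sketch below is a pure-Python approximation of the two padding modes for a single 2D channel, assuming TensorFlow's documented rule that "SAME" yields ceil(n / stride) outputs and puts any odd padding cell at the end; `pool_axis` and `max_pool2d` are hypothetical helper names.

```python
import math


def pool_axis(n, k, s, padding):
    """Return (output length, padding before) along one axis."""
    if padding == "VALID":
        return (n - k) // s + 1, 0
    out = math.ceil(n / s)                     # "SAME"
    pad_total = max((out - 1) * s + k - n, 0)
    return out, pad_total // 2                 # extra padding goes at the end


def max_pool2d(x, k, s, padding):
    """Max pooling over a 2D list with a k x k window and stride s."""
    h, w = len(x), len(x[0])
    oh, top = pool_axis(h, k, s, padding)
    ow, left = pool_axis(w, k, s, padding)
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            window = [x[r][c]
                      for r in range(i * s - top, i * s - top + k)
                      for c in range(j * s - left, j * s - left + k)
                      if 0 <= r < h and 0 <= c < w]  # padded cells are ignored
            row.append(max(window))
        out.append(row)
    return out


x = [[1., 2., 3.],
     [4., 5., 6.]]
print(max_pool2d(x, 2, 2, "VALID"))  # [[5.0]]
print(max_pool2d(x, 2, 2, "SAME"))   # [[5.0, 6.0]]
```

With "VALID" the third column is dropped because a full 2×2 window no longer fits. With "SAME" one padded column is added on the right, so the second window sees only 3 and 6.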
TensorFlow function implementations
The data format of tf.nn.conv2d, as described in the official TensorFlow documentation:
tf.nn.conv2d | TensorFlow Core v2.9.1
tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)
input: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. A 4-D tensor. The dimension order is interpreted according to the value of data_format, see below for details.
filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels].
strides: A list of ints. 1-D tensor of length 4. The stride of the sliding window for each dimension of input. The dimension order is determined by the value of data_format, see below for details.
dilations: An optional list of ints. Defaults to [1, 1, 1, 1]. 1-D tensor of length 4. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format, see above for details. Dilations in the batch and depth dimensions must be 1.
name: A name for the operation (optional).
Example
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (graph mode with tf.Session)

tf.reset_default_graph()
# 5x5 input reshaped to NHWC: [batch, height, width, channels]
x = np.arange(25).reshape(5, 5)
x = tf.cast(x, tf.float32)
x = tf.reshape(x, [1, 5, 5, 1])
# 16 random 3x3 filters: [filter_height, filter_width, in_channels, out_channels]
f = tf.Variable(tf.random_uniform([3, 3, 1, 16], -1, 1))
valid_pad = tf.nn.conv2d(x, f, [1, 1, 1, 1], padding='VALID')
same_pad = tf.nn.conv2d(x, f, [1, 1, 1, 1], padding='SAME')
print(valid_pad.get_shape())
print(same_pad.get_shape())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(same_pad)
Output:
(1, 3, 3, 16)
(1, 5, 5, 16)
Data format differences between TensorFlow and PyTorch:
TensorFlow defaults to NHWC: [batch, height, width, channels]
PyTorch uses NCHW: [batch, channels, height, width]
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
In the simplest case, the layer takes an input of size (N, C_in, H, W) and produces an output of size (N, C_out, H_out, W_out).
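To see what moving between the two layouts involves, here is a small pure-Python sketch on nested lists. `nhwc_to_nchw` and `shape` are hypothetical helpers written for this illustration.

```python
def nhwc_to_nchw(x):
    """Reorder a nested list from [batch, height, width, channels]
    to [batch, channels, height, width]."""
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[[x[b][i][j][ch] for j in range(w)]
              for i in range(h)]
             for ch in range(c)]
            for b in range(n)]


def shape(x):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# One 2x3 image with 2 channels in NHWC layout; channel 1 is offset by 100
nhwc = [[[[10 * i + j, 100 + 10 * i + j] for j in range(3)] for i in range(2)]]
nchw = nhwc_to_nchw(nhwc)

print(shape(nhwc))  # (1, 2, 3, 2)
print(shape(nchw))  # (1, 2, 2, 3)
print(nchw[0][0])   # [[0, 1, 2], [10, 11, 12]]
```

On real arrays the same reordering is an axis permutation with order (0, 3, 1, 2), for example `numpy.transpose(x, (0, 3, 1, 2))` or `tensor.permute(0, 3, 1, 2)` in PyTorch.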
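Dilated convolution: principles and applications
The dilated convolution operation first appeared in wavelet analysis. See the following two papers from the formative period of the discrete wavelet transform.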
Holschneider, M., Kronland-Martinet, R., Morlet, J., and Tchamitchian, Ph. A real-time algorithm for signal
analysis with the help of the wavelet transform. In Wavelets: Time-Frequency Methods and Phase Space.
Proceedings of the International Conference, 1987.
Shensa, Mark J. The discrete wavelet transform: wedding the à trous and Mallat algorithms. IEEE Transactions
on Signal Processing, 40(10), 1992.
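The classical definition of dilated convolution:
Here the function r belonging to the space l2 means that r is a square-summable sequence, i.e. $\sum_n |r(n)|^2 < \infty$.
Modern AI papers define dilated convolution as follows:
Comment: the image function F is defined on an integer grid, with coordinates such as [1,1], [1,2], and so on, and its pixel values are real numbers. Hence F maps Z² to R. The discrete filter k is likewise a map from two-dimensional integer coordinates to R.
We can derive both expressions above from the familiar convolution formula.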
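One way to carry out this derivation is sketched below. The notation is an assumption: p, s and t are points of the integer grid Z², and l is the dilation factor. F and k are the image and filter functions from the comment above.

```latex
% Familiar discrete convolution: pair every s with the t that sums to p
(F * k)(\mathbf{p})
  = \sum_{\mathbf{s} + \mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})
  = \sum_{\mathbf{t}} F(\mathbf{p} - \mathbf{t})\, k(\mathbf{t})

% Dilated convolution: the filter index t is stretched by the factor l
(F *_{l} k)(\mathbf{p})
  = \sum_{\mathbf{s} + l\,\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})
  = \sum_{\mathbf{t}} F(\mathbf{p} - l\,\mathbf{t})\, k(\mathbf{t})
```

Setting l = 1 recovers the ordinary convolution. For l > 1 the filter taps sample F at every l-th grid point, which is the same as inserting l - 1 zeros between neighboring taps of k.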
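The figure below shows 1-dilated, 2-dilated, and 4-dilated convolution kernels, with receptive fields of 3×3, 7×7, and 15×15 respectively. These sizes are cumulative: the layers are stacked with dilations 1, 2, 4, ..., r, and after the layer with dilation r the receptive field per side is 2×(r-1)(k-1) + k, which for k = 3 gives 3, 7, and 15.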
Atrous convolution with rate r introduces r − 1 zeros between consecutive filter values, effectively enlarging the kernel size of a k × k filter to k_e = k + (k − 1)(r − 1).
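Both size formulas can be checked numerically. The sketch below assumes stride 1 and one axis only; the function names are made up for this illustration. It separates the effective size of a single dilated kernel from the receptive field of several stacked layers, which is where the 3, 7, 15 sequence comes from.

```python
def effective_kernel_size(k, r):
    """Size of a k-tap kernel after inserting r - 1 zeros between taps."""
    return k + (k - 1) * (r - 1)


def stacked_receptive_field(k, rates):
    """Receptive field of stacked stride-1 layers with the given dilation rates."""
    rf = 1
    for r in rates:
        rf += (k - 1) * r  # each layer adds k_e - 1 = (k - 1) * r
    return rf


def dilate_kernel(kernel, r):
    """Insert r - 1 zeros between consecutive taps of a 1D kernel."""
    out = []
    for i, w in enumerate(kernel):
        if i:
            out.extend([0] * (r - 1))
        out.append(w)
    return out


print([effective_kernel_size(3, r) for r in (1, 2, 4)])  # [3, 5, 9]
print([stacked_receptive_field(3, rates)
       for rates in ([1], [1, 2], [1, 2, 4])])           # [3, 7, 15]
print(dilate_kernel([1, 2, 3], 2))                       # [1, 0, 2, 0, 3]
```

A single 3×3 kernel with rate 4 covers only 9×9. The 15×15 field appears once the rate-1, rate-2 and rate-4 layers are applied one after another.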