Dilated Convolution, Data Formats, and Code Implementation: A Concept Round-Up

This post digs into key concepts of convolutional neural networks: the discrete convolution formula, the definitions and uses of padding, stride, and transpose, worked examples of the convolution functions in TensorFlow and PyTorch, and the principle of dilated convolution together with its historical roots in wavelet analysis.


Contents

  • Discrete convolution: formula and definition

  • Padding, stride, and transpose

  • TensorFlow function implementation

  • Dilated convolution: principle and applications


Discrete convolution: formula and definition

The PyTorch docs give the underlying formula:

torch.nn — PyTorch master documentation

$$\operatorname{out}(N_i, C_{\text{out}_j}) = \operatorname{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}}-1} \operatorname{weight}(C_{\text{out}_j}, k) \star \operatorname{input}(N_i, k)$$

where $\star$ is the valid 2D cross-correlation operator.

Commentary: the upper and lower bounds of the summation show that the per-channel convolution results are added together across all input channels. Note also that the "convolution" here is actually computed as a correlation. Cross-correlation is defined as

$$(f \star g)(n) = \sum_{m} f(m)\, g(n + m).$$

Correlation and convolution differ only by a sign in the argument ($g(n+m)$ versus $g(n-m)$); that is, PyTorch's "convolution" does not flip the kernel. This works out fine because a neural network finds its kernels essentially by brute-force optimization: with so many learned kernels, a strict flip would simply be absorbed into the weights, so performing it adds nothing.
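A quick way to check this numerically: deep-learning "convolution" equals true convolution only after the kernel is flipped. A minimal sketch below (SciPy is an assumption of this sketch; it is not used elsewhere in the post):

import numpy as np
from scipy.signal import convolve2d, correlate2d

img = np.arange(25, dtype=np.float64).reshape(5, 5)
ker = np.arange(9, dtype=np.float64).reshape(3, 3)    # deliberately asymmetric kernel

corr = correlate2d(img, ker, mode='valid')            # what a Conv2d layer actually computes
conv = convolve2d(img, ker, mode='valid')             # true convolution (kernel flipped)
flip = correlate2d(img, ker[::-1, ::-1], mode='valid')

print(np.allclose(conv, flip))  # True: convolution = correlation with a flipped kernel
print(np.allclose(corr, conv))  # False: the two differ for an asymmetric kernel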


Padding, stride, and transpose

An excellent visual tutorial on convolution arithmetic:

conv_arithmetic/README.md at master · vdumoulin/conv_arithmetic · GitHub

padding: zero-padding; rows/columns of zeros are added around the borders, much like zero-padding before a DFT.

strides: the step size of the sliding window.

The remaining two notions are harder to grasp. First, transpose, formally known as upsampling or deconvolution. Here is a simple way to understand it:

Convolving a 2×2 feature map up into a 4×4 image is actually simple: flatten the 2×2 into a 1×4 row vector, multiply it by a 4×16 matrix to get a 1×16 vector, then reshape that into 4×4.

The forward convolution of that same process is multiplication by the corresponding 16×4 matrix:

The kernel here is 3×3; a concrete sketch follows below. Take your time to work through it.
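To make the matrix picture concrete, here is a minimal NumPy sketch (the index bookkeeping is my own illustration, not taken from any library). It builds the 16×4 matrix C for a 3×3 kernel on a 4×4 image, runs the forward convolution as x·C, and the transpose convolution as y·Cᵀ:

import numpy as np

k = np.arange(1., 10.).reshape(3, 3)   # a 3x3 kernel

# Build the 16x4 matrix C: column j holds the kernel weights placed at the
# input positions seen by output pixel j (4x4 input, VALID, stride 1 -> 2x2 output).
C = np.zeros((16, 4))
for oy in range(2):
    for ox in range(2):
        col = oy * 2 + ox
        for ky in range(3):
            for kx in range(3):
                C[(oy + ky) * 4 + (ox + kx), col] = k[ky, kx]

x = np.arange(16.)        # a 4x4 image flattened to a 1x16 row vector
y = x @ C                 # forward conv: 1x16 @ 16x4 -> 1x4, i.e. a 2x2 feature map
up = y @ C.T              # transpose conv: 1x4 @ 4x16 -> 1x16, i.e. a 4x4 map

print(y.reshape(2, 2))
print(up.reshape(4, 4))

Note that the transpose only restores the shape, not the original pixel values; the weights of a transposed-convolution layer are learned, not inverted.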

conv2d and max_pool are a matched pair, like the twin demons Benbo'erba and Babo'erben in Journey to the West. The two examples below illustrate padding='SAME' versus padding='VALID':

  • "SAME": output size is the same as input size. This requires the filter window to slip outside input map, hence the need to pad.
  • "VALID": Filter window stays at valid position inside input map, so output size shrinks by filter_size - 1. No padding occurs.
tf.nn.max_pool(
    value,
    ksize,
    strides,
    padding,
    data_format='NHWC',
    name=None
)
  • value: A 4-D Tensor of the format specified by data_format.
  • ksize: A list or tuple of 4 ints. The size of the window for each dimension of the input tensor.
  • strides: A list or tuple of 4 ints. The stride of the sliding window for each dimension of the input tensor.

Example

import tensorflow as tf
tf.reset_default_graph()

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

x = tf.reshape(x, [1, 2, 3, 1])  # NHWC shape accepted by tf.nn.max_pool

# ksize and strides are both given per dimension: [batch, height, width, channels]
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

print(valid_pad.get_shape())
print(same_pad.get_shape())
with tf.Session() as sess:
    print(sess.run(valid_pad))
    print(sess.run(same_pad))

Output:

(1, 1, 1, 1)
(1, 1, 2, 1)
[[[[ 5.]]]]
[[[[ 5.]
   [ 6.]]]]
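These shapes follow TensorFlow's documented output-size rules: with input size $n$, window size $f$, and stride $s$, 'VALID' gives $\lceil (n - f + 1)/s \rceil$ and 'SAME' gives $\lceil n/s \rceil$. Here the height has $n=2$, $f=2$, $s=2$, so both paddings give 1; the width has $n=3$, so 'VALID' gives $\lceil 2/2 \rceil = 1$ while 'SAME' gives $\lceil 3/2 \rceil = 2$, matching the shapes (1, 1, 1, 1) and (1, 1, 2, 1) above.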


TensorFlow function implementation

The data format of tf.nn.conv2d, as described in the TensorFlow docs:

tf.nn.conv2d  |  TensorFlow Core v2.9.1

tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)
  • input: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. A 4-D tensor. The dimension order is interpreted according to the value of data_format, see below for details.
  • filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
  • strides: A list of ints. 1-D tensor of length 4. The stride of the sliding window for each dimension of input. The dimension order is determined by the value of data_format, see below for details.
  • dilations: An optional list of ints. Defaults to [1, 1, 1, 1]. 1-D tensor of length 4. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format, see above for details. Dilations in the batch and depth dimensions must be 1.
  • name: A name for the operation (optional).

Example

import numpy as np
import tensorflow as tf
tf.reset_default_graph()

x = np.arange(25).reshape(5, 5)
x = tf.cast(x, tf.float32)
x = tf.reshape(x, [1, 5, 5, 1])  # NHWC: one 5x5 single-channel image

# filter shape is [filter_height, filter_width, in_channels, out_channels]
f = tf.Variable(tf.random_uniform([3, 3, 1, 16], -1, 1))

valid_pad = tf.nn.conv2d(x, f, [1, 1, 1, 1], padding='VALID')
same_pad = tf.nn.conv2d(x, f, [1, 1, 1, 1], padding='SAME')

print(valid_pad.get_shape())
print(same_pad.get_shape())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(same_pad)

Output:

(1, 3, 3, 16)
(1, 5, 5, 16)

The difference between TensorFlow and PyTorch data formats:

NHWC means [batch, height, width, channels]; this is TensorFlow's default. PyTorch instead defaults to NCHW, i.e. [batch, channels, height, width].

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

In the simplest case, the output value of the layer with input size (N, C_in, H, W) and output (N, C_out, H_out, W_out) is given by the same cross-correlation formula quoted in the first section; only the layout differs, with the channel dimension coming second.
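Converting between the two layouts is a single axis permutation (plain NumPy, just for illustration):

import numpy as np

x_nhwc = np.random.rand(8, 32, 32, 3).astype(np.float32)  # TensorFlow default: NHWC
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))               # PyTorch default: NCHW
print(x_nchw.shape)  # (8, 3, 32, 32)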


Dilated convolution: principle and applications

This operation first appeared in wavelet analysis; see the following two papers from the founding period of the discrete wavelet transform.

Holschneider, M., Kronland-Martinet, R., Morlet, J., and Tchamitchian, Ph. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets: Time-Frequency Methods and Phase Space, Proceedings of the International Conference, 1987.

Shensa, Mark J. The discrete wavelet transform: wedding the à trous and Mallat algorithms. IEEE Transactions on Signal Processing, 40(10), 1992.

The classical (à trous) definition of dilated convolution can be found in the papers above; there the filter r is taken from l², the space of square-summable sequences.

Modern AI papers (e.g. Yu and Koltun's dilated-convolution paper) define dilated convolution as follows. Let $F: \mathbb{Z}^2 \to \mathbb{R}$ be a discrete function and let $k: \Omega_r \to \mathbb{R}$ be a discrete filter with support $\Omega_r = [-r, r]^2 \cap \mathbb{Z}^2$. The ordinary discrete convolution is

$$(F * k)(\mathbf{p}) = \sum_{\mathbf{s} + \mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t}),$$

and the $l$-dilated convolution is

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s} + l\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t}).$$

Commentary: the image function F is indexed by the integer grid, e.g. [1,1], [1,2], and its pixel values are real numbers, so F maps Z² to R. The discrete filter k likewise maps 2D integer coordinates to R.

Both expressions can be derived from the familiar convolution formula; in particular, setting l = 1 in the dilated form recovers the ordinary convolution.

Stacking 1-dilated, 2-dilated, and 4-dilated 3×3 kernels yields receptive fields of 3×3, 7×7, and 15×15 respectively, following the formula 2(r − 1)(k − 1) + k.

Atrous convolution with rate r introduces r − 1 zeros between consecutive filter values, effectively enlarging the kernel size of a k × k filter to k_e = k + (k − 1)(r − 1).
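The zero-insertion view can be checked directly against the dilations argument of tf.nn.conv2d. A minimal sketch, assuming the same TF 1.x API used elsewhere in this post: a rate-2 dilated 3×3 kernel should behave exactly like an explicitly expanded 5×5 kernel with zeros between the taps (k_e = 3 + (3 − 1)(2 − 1) = 5).

import numpy as np
import tensorflow as tf
tf.reset_default_graph()

x = tf.constant(np.arange(49, dtype=np.float32).reshape(1, 7, 7, 1))
k3 = tf.constant(np.ones((3, 3, 1, 1), dtype=np.float32))

# rate-2 dilated 3x3 convolution: effective kernel size k_e = 3 + (3-1)(2-1) = 5
dilated = tf.nn.conv2d(x, k3, strides=[1, 1, 1, 1], padding='VALID',
                       dilations=[1, 2, 2, 1])

# equivalent explicit 5x5 kernel: taps at rows/cols 0, 2, 4, zeros in between
k5 = np.zeros((5, 5, 1, 1), dtype=np.float32)
k5[::2, ::2] = 1.0
expanded = tf.nn.conv2d(x, tf.constant(k5), strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    a, b = sess.run([dilated, expanded])
    print(a.shape, np.allclose(a, b))  # (1, 3, 3, 1) True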
