3D Convolution and Deconvolution (Transposed Convolution)

This post covers two kinds of convolution:

  • 3D convolution
  • Deconvolution (transposed convolution)

The next post will cover:

  • Dilated (atrous) convolution
  • Depthwise convolution

3D Convolution

While reading papers recently I kept running into 3D convolution and never quite understood how it operates. After going through some references I was still a bit confused, so I did some further testing and learning with TensorFlow.

The TensorFlow function tf.nn.conv3d

tf.nn.conv3d(
	input,
	filter,
	strides,
	padding,
	data_format='NDHWC',
	dilations=[1, 1, 1, 1, 1],
	name=None
)

Computes a 3-D convolution given 5-D input and filter tensors.

Args:
  • input: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. Shape [batch, in_depth, in_height, in_width, in_channels].
  • filter: A Tensor. Must have the same type as input. Shape [filter_depth, filter_height, filter_width, in_channels, out_channels]. in_channels must match between input and filter.
  • strides: A list of ints that has length >= 5. 1-D tensor of length 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1.
  • padding: A string from: “SAME”, “VALID”. The type of padding algorithm to use.
  • data_format: An optional string from: “NDHWC”, “NCDHW”. Defaults to “NDHWC”. The data format of the input and output data. With the default format “NDHWC”, the data is stored in the order of: [batch, in_depth, in_height, in_width, in_channels]. Alternatively, the format could be “NCDHW”, the data storage order is: [batch, in_channels, in_depth, in_height, in_width].
  • dilations: An optional list of ints. Defaults to [1, 1, 1, 1, 1]. 1-D tensor of length 5. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format, see above for details. Dilations in the batch and depth dimensions must be 1.
  • name: A name for the operation (optional).
Returns:

A Tensor. Has the same type as input.
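
To make the sliding-window behaviour concrete, here is a minimal NumPy sketch (my own illustration, not TensorFlow's implementation) of a single-channel 'VALID' 3D convolution with stride 1:

import numpy as np

def naive_conv3d(volume, kernel):
    # Naive 'VALID' 3D convolution, stride 1, single input/output channel.
    # volume: (D, H, W) array, kernel: (kd, kh, kw) array.
    D, H, W = volume.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for d in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # the window slides along the depth (frame/slice) axis as well
                out[d, i, j] = np.sum(volume[d:d+kd, i:i+kh, j:j+kw] * kernel)
    return out

print(naive_conv3d(np.ones((7, 8, 8)), np.ones((3, 3, 3))).shape)  # (5, 6, 6)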

Compared with the 2D convolution tf.nn.conv2d,

tf.nn.conv2d(
	input,
	filter,
	strides,
	padding,
	use_cudnn_on_gpu=True,
	data_format='NHWC',
	dilations=[1, 1, 1, 1],
	name=None
)

the differences are actually small and mainly concern the input and filter arguments. In 2D convolution the requirements for these two arguments are:

  • input: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. A 4-D tensor of shape [batch, height, width, channels].
  • filter: A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]

The main additions in 3D convolution are the in_depth dimension of the input and the filter_depth dimension of the filter.

  • input: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. Shape [batch, in_depth, in_height, in_width, in_channels]
  • in_depth is the temporal dimension (e.g. the frames of a video) or a spatial one (e.g. consecutive slices in biomedical imaging); it is the number of frames or slices. It should not be confused with the image channels, such as the 3 channels of an RGB input. Like the height and width of a planar image, it is simply another dimension along which the convolution slides.
  • filter: A Tensor. Must have the same type as input. Shape [filter_depth, filter_height, filter_width, in_channels, out_channels]. in_channels must match between input and filter
  • filter_depth is the filter's size along this dimension, analogous to the filter's 3 x 3 or 5 x 5 size in the spatial dimensions of a planar image.

The remaining arguments are analogous to their 3D counterparts.
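
To see where in_depth and in_channels sit in the NDHWC layout, here is a small sketch (the shapes are made up for illustration) that stacks RGB frames into the 5-D tensor conv3d expects:

import numpy as np

# a hypothetical clip of 7 RGB frames, each 224 x 224
frames = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(7)]
clip = np.stack(frames, axis=0)    # (in_depth=7, in_height=224, in_width=224, in_channels=3)
batch = clip[np.newaxis, ...]      # (batch=1, 7, 224, 224, 3) -- the NDHWC layout
print(batch.shape)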

Let's test this with a small program.

import tensorflow as tf
import numpy as np
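
# input layout:  [batch, in_depth, in_height, in_width, in_channels]
# filter layout: [filter_depth, filter_height, filter_width, in_channels, out_channels]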

input = tf.constant(1, shape=[1, 7, 224, 224, 3], dtype=tf.float32)
filter_1_2 = tf.constant(2, shape=[1, 3, 3, 3, 64], dtype=tf.float32)
filter_3_4 = tf.constant(2, shape=[1, 5, 3, 3, 64], dtype=tf.float32)

res_1 = tf.nn.conv3d(input=input,
                   filter=filter_1_2,
                   strides=[1, 1, 1, 1, 1],
                   padding='SAME')
res_2 = tf.nn.conv3d(input=input,
                   filter=filter_1_2,
                   strides=[1, 2, 1, 1, 1],
                   padding='SAME')

res_3 = tf.nn.conv3d(input=input,
                   filter=filter_3_4,
                   strides=[1, 1, 1, 1, 1],
                   padding='SAME')
res_4 = tf.nn.conv3d(input=input,
                   filter=filter_3_4,
                   strides=[1, 2, 1, 1, 1],
                   padding='SAME')
sess = tf.Session()

conv_res_1 = sess.run(res_1)
conv_res_2 = sess.run(res_2)

conv_res_3 = sess.run(res_3)
conv_res_4 = sess.run(res_4)


print conv_res_1.shape
print conv_res_2.shape

print conv_res_3.shape
print conv_res_4.shape

#print conv_res

Output:
(1, 7, 224, 224, 64)
(1, 4, 224, 224, 64)
(1, 7, 224, 224, 64)
(1, 4, 224, 224, 64)

You can see that convolution along the 2nd dimension (the depth) works on the same principle as along the 3rd and 4th dimensions (height and width).

Picture a 3D filter (in fact 4D once channels are included: depth, channel, height, width) sliding over the input data.

The filter slides along the third dimension (frames/slices) as well, producing floor((in_depth - 1) / stride_depth) + 1 single-channel feature maps. Each additional filter produces the same number of single-layer feature maps, and together they are stacked into an output of shape [batch, floor((in_depth - 1) / stride_depth) + 1, floor((in_height - 1) / stride_height) + 1, floor((in_width - 1) / stride_width) + 1, out_channels] under 'SAME' padding, exactly as the program above shows.
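
As a quick sanity check of these shapes, here is a small helper (my own sketch; ceil(in / stride) is the standard 'SAME'-padding output size) that reproduces the depths printed above:

def same_out_size(in_size, stride):
    # 'SAME' padding: out = ceil(in / stride), independent of the kernel size
    return (in_size - 1) // stride + 1

# input [1, 7, 224, 224, 3] from the example above
print([same_out_size(7, 1), same_out_size(224, 1)])  # [7, 224]  -> (1, 7, 224, 224, 64)
print([same_out_size(7, 2), same_out_size(224, 1)])  # [4, 224]  -> (1, 4, 224, 224, 64)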

Deconvolution

Deconvolution is used a lot in semantic segmentation. I once read a paper devoted specifically to deconvolution, but I no longer remember it clearly, so let's relearn it here.

Animated illustrations of transposed convolution make it very intuitive; a quick look at them should be enough to understand the operation.

I also recommend the paper "A guide to convolution arithmetic for deep learning".

The TensorFlow function tf.nn.conv2d_transpose

tf.nn.conv2d_transpose(
	value,
	filter,
	output_shape,
	strides,
	padding='SAME',
	data_format='NHWC',
	name=None
)
Args:
  • value: A 4-D Tensor of type float and shape [batch, height, width, in_channels] for NHWC data format or [batch, in_channels, height, width] for NCHW data format.
  • filter: A 4-D Tensor with the same type as value and shape [height, width, output_channels, in_channels]. filter's in_channels dimension must match that of value.
  • output_shape: A 1-D Tensor representing the output shape of the deconvolution op.
  • strides: A list of ints. The stride of the sliding window for each dimension of the input tensor.
  • padding: A string, either ‘VALID’ or ‘SAME’. The padding algorithm. See the “returns” section of tf.nn.convolution for details.
  • data_format: A string. ‘NHWC’ and ‘NCHW’ are supported.
  • name: Optional name for the returned tensor.

This output_shape argument is confusing: why isn't the output shape fully determined once input, filter, padding, and stride are fixed?

import tensorflow as tf
import numpy as np


value = tf.constant(1, shape=[1, 3, 3, 3], dtype=tf.float32)
filter = tf.constant(2, shape=[3, 3, 64, 3],dtype=tf.float32)
output_shape_1 = tf.constant([1, 6, 6, 64])
output_shape_2 = tf.constant([1, 5, 5, 64])

res_1 = tf.nn.conv2d_transpose(
    value=value,
    filter=filter,
    output_shape=output_shape_1,
    strides=[1, 2, 2, 1],
    padding='SAME'
)

res_2 = tf.nn.conv2d_transpose(
    value=value,
    filter=filter,
    output_shape=output_shape_2,
    strides=[1, 2, 2, 1],
    padding='SAME'
)

sess = tf.Session()

conv_res_1 = sess.run(res_1)
conv_res_2 = sess.run(res_2)

#conv_res_3 = sess.run(res_3)
#conv_res_4 = sess.run(res_4)


print conv_res_1.shape
print conv_res_2.shape

#print conv_res_3.shape
#print conv_res_4.shape

#print conv_res

Output
(1, 6, 6, 64)
(1, 5, 5, 64)

The example above shows that specifying output_shape matters: the shape produced by a deconvolution is not uniquely determined, so you specify one out of the set of feasible shapes.

So how does deconvolution operate exactly? In other words, given the input, filter, stride, and padding, which output_shape values are possible after deconvolution?

Consider the shape formula for convolution:

conv: i -> o

deconv: o -> i

o = floor((i + 2*p - k) / s) + 1, (1)

Deconvolution recovers the shape of i, the input that a forward convolution would map to o.

From (1) we get

(o - 1)*s <= i + 2*p - k <= (o - 1)*s + (s - 1)

  • If s = 1, then i = (o - 1) + k - 2p, so i is fixed and output_shape would not really need to be specified. For the common cases with k = 3: p = 1 (padding='SAME') gives i = o; p = 0 (padding='VALID') gives i = o + 2.
  • If s = 2, then i + 2p - k = 2(o - 1) or i + 2p - k = 2(o - 1) + 1, i.e. i = 2o - 2 + k - 2p or i = 2o - 1 + k - 2p. For the common cases with k = 3: p = 1 (padding='SAME') gives i = 2o - 1 or i = 2o; p = 0 (padding='VALID') gives i = 2o + 1 or i = 2o + 2.

Summary, considering only k = 3 (the common case); a small code sketch of these relations follows the list:

  • s = 1:
    • padding='SAME' -> i = o
    • padding='VALID' -> i = o + 2
  • s = 2:
    • padding='SAME' -> i = 2o or i = 2o - 1
    • padding='VALID' -> i = 2o + 1 or i = 2o + 2
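
Based on these relations, here is a small helper sketch (my own code, covering only the 'SAME'/'VALID' conventions used above) that lists the spatial output sizes a deconvolution may legally produce:

def possible_deconv_sizes(o, k, s, padding):
    # All sizes i such that a forward conv with kernel k, stride s and this
    # padding maps an input of size i to an output of size o.
    if padding == 'SAME':
        # forward conv, SAME: o = ceil(i / s)  ->  i in [s*(o-1)+1, s*o]
        return list(range(s * (o - 1) + 1, s * o + 1))
    if padding == 'VALID':
        # forward conv, VALID: o = floor((i - k) / s) + 1  ->  i in [s*(o-1)+k, s*(o-1)+k+s-1]
        return list(range(s * (o - 1) + k, s * (o - 1) + k + s))
    raise ValueError(padding)

print(possible_deconv_sizes(64, 3, 1, 'SAME'))   # [64]
print(possible_deconv_sizes(64, 3, 1, 'VALID'))  # [66]
print(possible_deconv_sizes(64, 3, 2, 'SAME'))   # [127, 128]
print(possible_deconv_sizes(64, 3, 2, 'VALID'))  # [129, 130]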

Program test

import tensorflow as tf
import numpy as np

value = tf.constant(1, shape=[1, 64, 64, 3], dtype=tf.float32)
filter = tf.constant(2, shape=[3, 3, 256, 3],dtype=tf.float32)

def deconv_s_1():
    # s = 1, padding='SAME'
    output_shape_1_same = tf.constant([1, 64, 64, 256])

    res_1 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_1_same,
        strides=[1, 1, 1, 1],
        padding='SAME'
    )

    # s = 1, padding='VALID'
    output_shape_1_valid = tf.constant([1, 66, 66, 256])
    res_2 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_1_valid,
        strides=[1, 1, 1, 1],
        padding='VALID'
    )

    sess = tf.Session()

    conv_res_1 = sess.run(res_1)
    conv_res_2 = sess.run(res_2)
    print "s = 1, padding='SAME', expected: i = o = 64 "
    print conv_res_1.shape
    print "s = 1, padding='VALID', expected i = o + 2 = 66"
    print conv_res_2.shape


def deconv_s_2():
    #s = 2, padding='SAME'

    output_shape_2_same_1 = tf.constant([1, 128, 128, 256])

    res_1 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_2_same_1,
        strides=[1, 2, 2, 1],
        padding='SAME'
    )
    # s = 2, padding='SAME'

    output_shape_2_same_2 = tf.constant([1, 127, 127, 256])

    res_2 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_2_same_2,
        strides=[1, 2, 2, 1],
        padding='SAME'
    )

    # s = 2, padding='VALID'

    output_shape_2_valid_1 = tf.constant([1, 129, 129, 256])
    res_3 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_2_valid_1,
        strides=[1, 2, 2, 1],
        padding='VALID'
    )

    # s = 2, padding='VALID'
    print "s = 2, padding='VALID'"

    output_shape_2_valid_2 = tf.constant([1, 130, 130, 256])
    res_4 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_2_valid_2,
        strides=[1, 2, 2, 1],
        padding='VALID'
    )

    sess = tf.Session()

    conv_res_1 = sess.run(res_1)
    conv_res_2 = sess.run(res_2)
    conv_res_3 = sess.run(res_3)
    conv_res_4 = sess.run(res_4)
    print "s = 2, padding='SAME', expected i = 2o = 128 or i = 2o - 1 = 127 "
    print conv_res_1.shape
    print conv_res_2.shape
    print "s = 2, padding='VALID', expected i = 2o + 1 = 129 or i = 2o + 2 = 130"
    print conv_res_3.shape
    print conv_res_4.shape

    # print conv_res

def deconv_error():

    output_shape_2_same_1 = tf.constant([1, 129, 129, 256])

    res_1 = tf.nn.conv2d_transpose(
        value=value,
        filter=filter,
        output_shape=output_shape_2_same_1,
        strides=[1, 2, 2, 1],
        padding='SAME'
    )

    sess = tf.Session()

    conv_res_1 = sess.run(res_1)

    print "s = 2, padding='SAME', expected i = 2o = 128 or i = 2o - 1 = 127 "
    print conv_res_1.shape


print "input_size = 64:"

print "stride = 1"
deconv_s_1()
print "stride = 2"
deconv_s_2()

deconv_error()

Output

input_size = 64:
2018-12-04 00:04:02.363507: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
stride = 1
s = 1, padding='SAME', expected: i = o = 64 
(1, 64, 64, 256)
s = 1, padding='VALID', expected i = o + 2 = 66
(1, 66, 66, 256)
stride = 2
s = 2, padding='VALID'
s = 2, padding='SAME', expected i = 2o = 128 or i = 2o - 1 = 127 
(1, 128, 128, 256)
(1, 127, 127, 256)
s = 2, padding='VALID', expected i = 2o + 1 = 129 or i = 2o + 2 = 130
(1, 129, 129, 256)
(1, 130, 130, 256)
2018-12-04 00:04:03.186144: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
2018-12-04 00:04:03.187537: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
2018-12-04 00:04:03.188916: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
2018-12-04 00:04:03.189200: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1

In summary, the programs above confirm our analysis.

So when the kernel size is 3, choosing padding='SAME' and stride = 2 makes the output size exactly 2 * the input size. For example:
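
A typical 2x upsampling layer in a decoder could look like this (a minimal sketch with made-up shapes, not taken from any particular network):

import tensorflow as tf

value = tf.constant(1, shape=[1, 32, 32, 128], dtype=tf.float32)      # feature map to upsample
up_filter = tf.constant(2, shape=[3, 3, 64, 128], dtype=tf.float32)   # [k, k, output_channels, in_channels]
upsampled = tf.nn.conv2d_transpose(
    value=value,
    filter=up_filter,
    output_shape=[1, 64, 64, 64],    # exactly 2 * the input spatial size
    strides=[1, 2, 2, 1],
    padding='SAME')
print(tf.Session().run(upsampled).shape)  # (1, 64, 64, 64)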

This post worked through the shape relationship between convolution and deconvolution and analyzed the possible values of output_shape.
