Convolution with tf.nn.conv2d, and how does tf.nn.conv2d_transpose implement deconvolution?

gqixl

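This article explains the convolution and deconvolution operations used in convolutional neural networks. It covers the effect of the different padding modes, the parameters of the conv2d function and how it works, and worked examples showing how the feature map size is computed.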

(1) Convolution

1. Padding modes:

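Notes:

1. Excerpted from http://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t

2. The padding modes differ in how they treat the border. VALID drops elements: in the example from that answer (input_width=13, with a filter width of 6 and a stride of 5), only 2 window positions fit, and the leftover elements on the right are discarded.

3. SAME pads instead: in the same example, 3 window positions are allowed, which requires 3 padding elements. When the total is odd, the extra element goes on the right, so one 0 is added on the left and two 0s on the right.

4. For the SAME padding, the output height and width are computed as: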

out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))

For the VALID padding, the output height and width are computed as:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
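
The two formulas, and the way SAME splits its padding, can be sketched in plain Python. This is a minimal illustration rather than TensorFlow's own code: the helper names `conv_output_size` and `same_padding` are made up for this example, and the filter width of 6 and stride of 5 are assumed values that reproduce the input_width=13 case from the notes.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension, using the formulas above."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def same_padding(in_size, filter_size, stride):
    """Zeros added (before, after) along one dimension under SAME padding."""
    out_size = conv_output_size(in_size, filter_size, stride, "SAME")
    total = max((out_size - 1) * stride + filter_size - in_size, 0)
    before = total // 2        # smaller half goes on the left/top
    after = total - before     # the extra zero, if any, goes on the right/bottom
    return before, after


# Assumed example values: input width 13, filter width 6, stride 5.
print(conv_output_size(13, 6, 5, "VALID"))  # 2
print(conv_output_size(13, 6, 5, "SAME"))   # 3
print(same_padding(13, 6, 5))               # (1, 2)
```

With these values VALID yields 2 positions and drops the last elements, while SAME yields 3 positions by adding one zero on the left and two on the right.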

2. conv2d parameters:

1. strides[0] = strides[3] = 1 (the batch and channel dimensions are not strided)

3. Explanation of the conv2d parameters:

`tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)`

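Apart from name, which sets the name of the operation, five parameters affect the computation:

First parameter, input: the input image to convolve. It must be a Tensor of shape [batch, in_height, in_width, in_channels], that is [number of images in a training batch, image height, image width, number of image channels]. Note that this is a 4-D Tensor, and its type must be either float32 or float64.

Second parameter, filter: the equivalent of the convolution kernel in a CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is [kernel height, kernel width, number of image channels, number of kernels], and it must have the same type as input. One thing to watch: its third dimension, in_channels, must equal the fourth dimension of input.

Third parameter, strides: the stride of the convolution along each dimension of the image. It is a 1-D vector of length 4, with strides[0] = strides[3] = 1.

Fourth parameter, padding: a string that must be either "SAME" or "VALID". It selects the padding mode (see the padding modes described in this article).

Fifth parameter, use_cudnn_on_gpu: a bool that controls whether cuDNN acceleration is used. It defaults to true.

The result is a Tensor, and this output is what we usually call the feature map.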
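
The shape constraints on these parameters can be checked without running TensorFlow. The sketch below is only an illustration: `conv2d_output_shape` is a hypothetical helper, not a TensorFlow function, and the second call uses made-up shapes. It assumes the NHWC layout described above.

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Check the conv2d shape rules and return the feature map shape (NHWC)."""
    batch, in_h, in_w, in_c = input_shape        # [batch, in_height, in_width, in_channels]
    f_h, f_w, f_in_c, out_c = filter_shape       # [filter_height, filter_width, in_channels, out_channels]
    if f_in_c != in_c:
        raise ValueError("filter in_channels must equal input in_channels")
    if len(strides) != 4 or strides[0] != 1 or strides[3] != 1:
        raise ValueError("strides must have length 4 with strides[0] = strides[3] = 1")
    if padding == "SAME":
        out_h = math.ceil(in_h / strides[1])
        out_w = math.ceil(in_w / strides[2])
    elif padding == "VALID":
        out_h = math.ceil((in_h - f_h + 1) / strides[1])
        out_w = math.ceil((in_w - f_w + 1) / strides[2])
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    # One feature map per kernel, so the last dimension is out_channels.
    return [batch, out_h, out_w, out_c]


print(conv2d_output_shape([1, 5, 5, 5], [3, 3, 5, 1], [1, 1, 1, 1], "VALID"))    # [1, 3, 3, 1]
# Hypothetical shapes: a batch of 8 RGB images, 16 kernels, stride 2.
print(conv2d_output_shape([8, 28, 28, 3], [5, 5, 3, 16], [1, 2, 2, 1], "SAME"))  # [8, 14, 14, 16]
```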

4. conv2d examples:

So how does TensorFlow actually carry out the convolution? A few examples make it concrete:

1.

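# Written for the TensorFlow 1.x graph API.
# On 2.x, import tensorflow.compat.v1 as tf and call tf.disable_v2_behavior().
import tensorflow as tf

# One 3x3 image with 5 channels, convolved with a single 1x1 kernel.
input = tf.Variable(tf.random_normal([1, 3, 3, 5]))
filter = tf.Variable(tf.random_normal([1, 1, 5, 1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    # Replaces the deprecated tf.initialize_all_variables().
    sess.run(tf.global_variables_initializer())
    res = sess.run(op)
    print(res.shape)  # (1, 3, 3, 1)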

2.

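# Written for the TensorFlow 1.x graph API.
# On 2.x, import tensorflow.compat.v1 as tf and call tf.disable_v2_behavior().
import tensorflow as tf

# One 5x5 image with 5 channels, convolved with a single 3x3 kernel.
input = tf.Variable(tf.random_normal([1, 5, 5, 5]))
filter = tf.Variable(tf.random_normal([3, 3, 5, 1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    # Replaces the deprecated tf.initialize_all_variables().
    sess.run(tf.global_variables_initializer())
    res = sess.run(op)
    print(res.shape)  # (1, 3, 3, 1)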

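Notes:

1. With VALID padding, the feature map size is
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1])) = ceil((5 - 3 + 1) / 1) = 3

out_width = ceil(float(in_width - filter_width + 1) / float(strides[2])) = ceil((5 - 3 + 1) / 1) = 3

So the feature map is 3*3.

2. The filter has 3*3*5*1 parameters: each of the 5 input channels gets its own 3*3 filter. conv2d takes the dot product of each channel with its filter and then sums the 5 results, which yields one feature map. To get 3 feature maps, the filter needs 3*3*5*3 parameters.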
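
The "dot product per channel, then sum" step can be written out with NumPy. This is a naive sketch for VALID padding only, meant to mirror the arithmetic rather than TensorFlow's actual implementation. `conv2d_valid` is a name invented for this example, and the random inputs are placeholders.

```python
import numpy as np


def conv2d_valid(x, w, stride=1):
    """Naive NHWC convolution with VALID padding (no kernel flip)."""
    batch, in_h, in_w, in_c = x.shape
    f_h, f_w, f_in_c, out_c = w.shape
    assert f_in_c == in_c, "filter in_channels must match input in_channels"
    out_h = (in_h - f_h) // stride + 1
    out_w = (in_w - f_w) // stride + 1
    out = np.zeros((batch, out_h, out_w, out_c))
    for i in range(out_h):
        for j in range(out_w):
            # Window of shape (batch, f_h, f_w, in_c) under the kernel.
            patch = x[:, i * stride:i * stride + f_h, j * stride:j * stride + f_w, :]
            # Multiply element-wise and sum over height, width and all input channels.
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 5, 5, 5))
w1 = rng.standard_normal((3, 3, 5, 1))   # one kernel  -> one feature map
w3 = rng.standard_normal((3, 3, 5, 3))   # three kernels -> three feature maps

print(conv2d_valid(x, w1).shape)            # (1, 3, 3, 1)
print(conv2d_valid(x, w3).shape, w3.size)   # (1, 3, 3, 3) 135
```

Each output value sums over a 3*3 window across all 5 channels, which is why a single kernel produces a single feature map regardless of the number of input channels.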

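(2) Deconvolution

First of all, however you think about deconvolution, keep one thing in mind at all times: the deconvolution operation is the reverse of convolution.

If you keep that point in mind, you already understand most of it. The function described next reinforces the idea.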

conv2d_transpose(value, filter, output_shape, strides, padding="SAME", data_format="NHWC", name=None)

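Apart from name, which sets the name of the operation, six parameters affect the computation:

First parameter, value: the input image to deconvolve. It must be a Tensor.

Second parameter, filter: the convolution kernel. It must be a Tensor of shape [filter_height, filter_width, out_channels, in_channels], that is [kernel height, kernel width, number of kernels, number of image channels].

Third parameter, output_shape: the shape of the output of the deconvolution. Attentive readers will notice that the convolution operation has no such parameter, so what is it for here? This is explained below.

Fourth parameter, strides: the stride of the deconvolution along each dimension of the image. It is a 1-D vector of length 4.

Fifth parameter, padding: a string that must be either "SAME" or "VALID". It selects the padding mode.

Sixth parameter, data_format: a string, either 'NHWC' or 'NCHW'. This parameter was added in newer versions of TensorFlow and describes the data layout of value. 'NHWC' is TensorFlow's standard layout, [batch, height, width, in_channels]; 'NCHW' is the Theano layout, [batch, in_channels, height, width]. The default is 'NHWC'.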
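
To make the "reverse of convolution" idea concrete, here is a naive NumPy sketch of a transposed convolution for VALID padding. It is an illustration under stated assumptions, not TensorFlow's implementation: `conv2d_transpose_valid` is a made-up name, and the sketch always uses the output size (in - 1) * stride + filter, whereas the real op takes the size from output_shape. The all-ones inputs are placeholders chosen so the result is easy to check by hand.

```python
import numpy as np


def conv2d_transpose_valid(value, w, stride=1):
    """Naive NHWC transposed convolution for VALID padding.

    value: [batch, height, width, in_channels]
    w:     [filter_height, filter_width, out_channels, in_channels]
    """
    batch, in_h, in_w, in_c = value.shape
    f_h, f_w, out_c, f_in_c = w.shape
    assert f_in_c == in_c, "filter in_channels must match value's channels"
    out_h = (in_h - 1) * stride + f_h
    out_w = (in_w - 1) * stride + f_w
    out = np.zeros((batch, out_h, out_w, out_c))
    for i in range(in_h):
        for j in range(in_w):
            # Each input pixel spreads a weighted copy of the kernel,
            # shape (batch, f_h, f_w, out_c), into the output.
            contrib = np.tensordot(value[:, i, j, :], w, axes=([1], [3]))
            out[:, i * stride:i * stride + f_h, j * stride:j * stride + f_w, :] += contrib
    return out


# A 3x3 single-channel map and a 3x3 kernel with 5 output channels:
# the shapes of the earlier 5x5x5 -> 3x3x1 convolution, run backward.
value = np.ones((1, 3, 3, 1))
w = np.ones((3, 3, 5, 1))
out = conv2d_transpose_valid(value, w)
print(out.shape)          # (1, 5, 5, 5)
print(out[0, 0, 0, 0])    # 1.0, a corner is touched by one window only
print(out[0, 2, 2, 0])    # 9.0, the centre is touched by all nine windows
```

A conv2d with a [3, 3, 5, 1] filter maps [1, 5, 5, 5] to [1, 3, 3, 1]; the transposed operation with the same filter shape maps [1, 3, 3, 1] back to [1, 5, 5, 5]. It restores the shape, not the original values.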