A step-by-step TensorFlow implementation of a convolutional neural network

Article link: https://blog.youkuaiyun.com/yuzhou164/article/details/62043674

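This post implements a convolutional neural network (CNN) in TensorFlow. The resulting architecture is shown in the figure below. The input passes through two convolutions, and each convolution is followed by one pooling step that reduces the spatial dimensions.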

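Walkthrough:

First, the source data is convolved with the first kernel, [5,5,1,32]: stride 1, size 5*5, 1 input channel. Because the code uses padding='VALID', this produces 32 feature maps of 24*24, which then pass through the ReLU nonlinear activation function.

Pooling stage: the first pooling uses ksize [1,2,2,1], i.e. a 2*2 window, with stride 2 and the max method, giving 32 feature maps of 12*12.

Then comes the second convolution, [5,5,32,64]: stride 1, size 5*5, 32 input channels. It produces 64 feature maps of 8*8, which again pass through the ReLU nonlinear activation function.

Pooling stage: the second pooling also uses ksize [1,2,2,1], i.e. a 2*2 window, with stride 2 and the max method, giving 64 feature maps of 4*4.

Finally, a fully connected network computes the result, which is classified with softmax.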
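The feature-map sizes at each stage can be checked with a few lines of arithmetic. The sketch below is plain Python, not part of the original script; the helper name `conv_out` is made up for illustration. It assumes the settings used in the code further down: padding='VALID' everywhere, stride 1 for the convolutions, and a 2*2 window with stride 2 for pooling.

```python
def conv_out(n_in, kernel, stride=1, padding="VALID"):
    """Output size along one spatial axis for a convolution or pooling window."""
    if padding == "SAME":
        return -(-n_in // stride)          # ceil(n_in / stride)
    return (n_in - kernel) // stride + 1   # VALID: no zero padding


size = 28  # MNIST images are 28*28
shapes = []
for name, kernel, stride, channels in [
    ("conv1", 5, 1, 32),
    ("pool1", 2, 2, 32),
    ("conv2", 5, 1, 64),
    ("pool2", 2, 2, 64),
]:
    size = conv_out(size, kernel, stride, "VALID")
    shapes.append((name, size, size, channels))
    print(name, (size, size, channels))

# Number of values per example after flattening the last pooling output
flat_size = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("flattened:", flat_size)
```

Switching the padding argument to "SAME" in the loop shows how the sizes would change with zero padding instead.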

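# CNNs are strong at image processing largely because of two properties:
# (1) Local invariance: pooling makes the output roughly invariant to small translations of the image (and, to a lesser degree, to small rotations and scalings)
# (2) Compositionality: each filter captures a small patch of lower-level features, and these are combined into higher-level representations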

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 30
batch_size = 100
display_step = 1

# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

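# Narrow vs. wide CNN:
# Wide CNN: the border is zero-padded, so the kernel can also be applied at the image edges.
#   Output size (stride 1): n_{out} = (n_{in} + 2*n_{padding} - n_{filter}) + 1
# Narrow CNN: no zero padding; output size = input size - kernel size + 1
#   (this code uses the narrow CNN, i.e. no zero padding)
#
# tf.nn.conv2d: given a 4-D input and a 4-D filter, computes a 2-D convolution; the result is again a 4-D tensor.
# Its definition is:
# def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None):
# The leading parameters are input, filter, strides, padding, use_cudnn_on_gpu, ...; each is explained below.
# input: the data to convolve, a tensor of shape [batch, in_height, in_width, in_channels],
#   i.e. [number of images in a training batch, image height, image width, number of image channels]
# filter: the convolution kernel (W below), of shape [filter_height, filter_width, in_channels, out_channels],
#   i.e. kernel height, kernel width, number of input channels, number of output channels.
#   in_channels must equal the in_channels of input; out_channels is the number of kernels, and therefore the number of feature maps produced (reference 2)
# Example: in the simplest case, a 3×3 single-channel image (shape [1,3,3,1]) convolved with a 1×1 kernel (shape [1,1,1,1]) gives one 3×3 feature map.
#   With a kernel of shape [2,2,1,5] and no padding, the result is 5 feature maps of 2*2.
# strides: a list of length 4 giving how far the convolution window slides along each dimension of input; all are set to 1 here.
#   With the default NHWC layout, strides[0] and strides[3] must be 1; strides[1] is the vertical step (height) and strides[2] is the horizontal step (width).
# padding: either SAME or VALID; it decides whether to keep the border positions where the kernel does not fully overlap the image.
#   With SAME they are kept: for a 5*5 input and a [3,3,1,1] kernel, padding='SAME' gives a 5*5 output feature map because of zero padding (wide CNN).
#   With padding='VALID' the output feature map is 3*3 (narrow CNN).
# use_cudnn_on_gpu: whether to use cuDNN acceleration; defaults to True.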

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='VALID')
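The shape rules described in the comments above can be reproduced without TensorFlow. The following is a minimal sketch in plain Python, not the library's implementation: `conv2d_naive`, `ones` and `shape3` are names invented for this example, the image is a single [H][W][C_in] nested list rather than a batch, and SAME padding is assumed to place any odd extra row or column at the bottom/right. Like tf.nn.conv2d, it slides the kernel without flipping it (cross-correlation).

```python
def conv2d_naive(image, kernel, stride=1, padding="VALID"):
    """Slide a kernel [fh][fw][C_in][C_out] over one image [H][W][C_in]."""
    h, w, c_in = len(image), len(image[0]), len(image[0][0])
    fh, fw, c_out = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    if padding == "SAME":
        out_h, out_w = -(-h // stride), -(-w // stride)
        top = max((out_h - 1) * stride + fh - h, 0) // 2
        left = max((out_w - 1) * stride + fw - w, 0) // 2
    else:  # VALID: only positions where the kernel fits entirely
        out_h, out_w = (h - fh) // stride + 1, (w - fw) // stride + 1
        top = left = 0
    out = [[[0.0] * c_out for _ in range(out_w)] for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(fh):
                for dj in range(fw):
                    y, x = i * stride + di - top, j * stride + dj - left
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        for c in range(c_in):
                            for k in range(c_out):
                                out[i][j][k] += image[y][x][c] * kernel[di][dj][c][k]
    return out


def ones(*dims):
    """Nested list filled with 1.0 (helper for this example only)."""
    if len(dims) == 1:
        return [1.0] * dims[0]
    return [ones(*dims[1:]) for _ in range(dims[0])]


def shape3(t):
    return (len(t), len(t[0]), len(t[0][0]))


image3, image5 = ones(3, 3, 1), ones(5, 5, 1)
out_1x1 = conv2d_naive(image3, ones(1, 1, 1, 1))
out_2x2 = conv2d_naive(image3, ones(2, 2, 1, 5))
out_same = conv2d_naive(image5, ones(3, 3, 1, 1), padding="SAME")
out_valid = conv2d_naive(image5, ones(3, 3, 1, 1), padding="VALID")
for name, out in [("1x1", out_1x1), ("2x2 x5", out_2x2),
                  ("SAME", out_same), ("VALID", out_valid)]:
    print(name, shape3(out))
```

The printed shapes are (height, width, out_channels): the 1×1 kernel keeps the 3×3 size, the [2,2,1,5] kernel gives five 2×2 maps, and the 5×5 input stays 5×5 under SAME but shrinks to 3×3 under VALID.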

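# Pooling: subsamples each feature map by summarizing small windows of it; max is the most common choice
# Purpose: (1) when the max is taken over an entire feature map, it gives a fixed-size output: e.g. 1000 kernels yield a 1000-dimensional output, regardless of the kernel size and the input size
    # (2) it reduces dimensionality while keeping the most salient features
# Pooling methods:
# tf.nn.avg_pool: average
# tf.nn.max_pool: maximum
# tf.nn.max_pool_with_argmax: max pooling that returns a tuple (output, argmax), i.e. the maxima and their indices
# tf.nn.avg_pool3d: 3-D average pooling
# tf.nn.max_pool3d: 3-D max pooling
# tf.nn.fractional_avg_pool
# tf.nn.fractional_max_pool
# ksize: the size of the pooling window; [1, 2, 2, 1] here means a window 2 high and 2 wide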
def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='VALID')
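To make the effect of a 2*2 window with stride 2 concrete, here is a small sketch of max pooling on a single 2-D feature map. It is plain Python written for illustration; `max_pool_naive` is a hypothetical helper, not a TensorFlow function, and it ignores the batch and channel dimensions that tf.nn.max_pool handles.

```python
def max_pool_naive(fmap, ksize=2, stride=2):
    """VALID max pooling over one 2-D feature map given as a list of rows."""
    h, w = len(fmap), len(fmap[0])
    out_h = (h - ksize) // stride + 1
    out_w = (w - ksize) // stride + 1
    return [
        [
            max(
                fmap[i * stride + di][j * stride + dj]
                for di in range(ksize)
                for dj in range(ksize)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
pooled = max_pool_naive(fmap)
print(pooled)  # [[4, 5], [3, 7]]
```

Each output value is the largest entry of one non-overlapping 2*2 block, so a 4*4 map becomes 2*2.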


# Create model
def multilayer_perceptron(x, weights, biases):
    #now, we want to change this to a CNN network

    #first reshape the data to 4-D

    x_image = tf.reshape(x, [-1,28,28,1])

    #then apply cnn layers
    # Nonlinear activation function: ReLU
    h_conv1 = tf.nn.relu(conv2d(x_image, weights['conv1']) + biases['conv_b1'])
    h_pool1 = max_pool_2x2(h_conv1)

    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights['conv2']) + biases['conv_b2'])
    h_pool2 = max_pool_2x2(h_conv2)

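    # With VALID padding the spatial size goes 28 -> 24 -> 12 -> 8 -> 4,
    # so each example flattens to 4*4*64 values
    h_pool2_flat = tf.reshape(h_pool2, [-1, 4*4*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights['fc1']) + biases['fc1_b'])


    # Output layer with linear activation
    out_layer = tf.matmul(h_fc1, weights['out']) + biases['out_b']
    return out_layer

# Store layers weight & biases

# First kernel: 5*5, 1 input channel, 32 output channels; with VALID padding the output is 32 feature maps of 24*24

# Second kernel: 5*5, 32 input channels, 64 output channels; with VALID padding the output is 64 feature maps of 8*8

weights = {
    'conv1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'conv2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # The fc1 input size must match the flattened pool2 output (4*4*64)
    'fc1' : tf.Variable(tf.random_normal([4*4*64,256])),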
    'out': tf.Variable(tf.random_normal([256,n_classes]))
}
biases = {
    'conv_b1': tf.Variable(tf.random_normal([32])),
    'conv_b2': tf.Variable(tf.random_normal([64])),
    'fc1_b': tf.Variable(tf.random_normal([256])),
    'out_b': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
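The cost above is the mean, over the batch, of the cross-entropy between the one-hot labels and the softmax of the logits. As a rough illustration of what that quantity is, the sketch below computes it in plain Python; the function names are invented for this example, and it is not how TensorFlow implements the op internally.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between labels and softmax(logits) for a single example."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum for z in logits]  # log of the softmax probabilities
    return -sum(t * lp for t, lp in zip(labels, log_probs))


def mean_loss(batch_logits, batch_labels):
    """Average the per-example losses, as tf.reduce_mean does for the batch."""
    losses = [softmax_cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


label = [0, 1, 0]
uniform = softmax_cross_entropy([0.0, 0.0, 0.0], label)     # all classes equally likely
confident = softmax_cross_entropy([0.0, 10.0, 0.0], label)  # strongly favors the true class
wrong = softmax_cross_entropy([10.0, 0.0, 0.0], label)      # strongly favors a wrong class
print(uniform, confident, wrong)
print(mean_loss([[0.0, 0.0, 0.0], [0.0, 10.0, 0.0]], [label, label]))
```

With equal logits the loss equals log(3), the loss of a uniform guess over three classes; it approaches 0 as the logit of the true class dominates and grows when a wrong class dominates.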

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

    #http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
    #http://blog.youkuaiyun.com/mao_xiao_feng/article/details/53444333

    #http://blog.youkuaiyun.com/lujiandong1/article/details/53728053