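This post implements a convolutional neural network (CNN) in TensorFlow. The architecture is shown in the figure below. The input passes through two convolutions, and each convolution is followed by one pooling step that reduces the spatial dimensions.
Process overview: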
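First, the source data is convolved with the first kernel, [5,5,1,32]: stride 1, size 5*5, 1 input channel. This produces 32 feature maps of 28*28 (this size assumes zero padding, padding='SAME'; without padding it would be 24*24), which are then passed through the ReLU nonlinear activation function.
Pooling stage: the first pooling uses a window of [1,2,2,1], i.e. size 2*2, with stride 2 and the max method, giving 32 feature maps of 14*14.
Next comes the second convolution, [5,5,32,64]: stride 1, size 5*5, 32 input channels. This produces 64 feature maps, again followed by the ReLU nonlinear activation function.
Pooling stage: the second pooling also uses [1,2,2,1], i.e. size 2*2, with stride 2 and the max method, giving 64 feature maps of 7*7 (the 7*7*64 values that feed the fully connected layer).
Finally, a fully connected network computes the result, and softmax is used for classification.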
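The feature-map sizes in the steps above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming TensorFlow's documented output-size rules for 'SAME' and 'VALID' padding; the helper names `conv_output_size` and `trace_shapes` are hypothetical and not part of the original script.

```python
import math


def conv_output_size(n_in, n_filter, stride=1, padding="VALID"):
    """Output size along one spatial axis for a convolution or pooling window."""
    if padding == "SAME":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(n_in / stride)
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        return math.ceil((n_in - n_filter + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def trace_shapes(n_in=28, conv_padding="SAME"):
    """Sizes after conv1, pool1, conv2, pool2 (5*5 kernels, 2*2 pooling, stride 2)."""
    sizes = [n_in]
    for _ in range(2):
        after_conv = conv_output_size(sizes[-1], 5, stride=1, padding=conv_padding)
        sizes.append(after_conv)
        after_pool = conv_output_size(after_conv, 2, stride=2, padding="VALID")
        sizes.append(after_pool)
    return sizes


print(trace_shapes(28, "SAME"))   # [28, 28, 14, 14, 7]
print(trace_shapes(28, "VALID"))  # [28, 24, 12, 8, 4]
```

With zero padding the last pooling output is 7*7, so the fully connected layer sees 7*7*64 values; without padding it would be 4*4*64.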
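# CNNs are powerful for image processing because of two properties:
# (1) Local invariance: pooling makes the result largely insensitive to small
#     translations, rotations and scalings of the image.
# (2) Compositionality: each filter captures a local patch of lower-level
#     features, and these patches are composed into higher-level representations.

# This script uses the TensorFlow 1.x graph API (placeholders and sessions).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets(".", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 30
batch_size = 100
display_step = 1

# Network Parameters
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Narrow vs. wide CNN:
# Wide CNN: the border is zero-padded, so the kernel can also be applied at the edges.
#   Output size (stride 1): n_{out} = (n_{in} + 2*n_{padding} - n_{filter}) + 1
# Narrow CNN: no zero padding.
#   Output size (stride 1): n_{out} = n_{in} - n_{filter} + 1
# The convolutions below use zero padding (padding='SAME'), so the sizes go
# 28*28 -> 14*14 -> 7*7, matching the 7*7*64 input of the fully connected layer.
# With padding='VALID' they would go 24*24 -> 12*12 -> 8*8 -> 4*4.

# tf.nn.conv2d: given a 4-D input and a 4-D filter, it computes a 2-D convolution
# (the result is again a 4-D tensor). The function is defined as:
#   def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None,
#              data_format=None, name=None)
# The first arguments are explained one by one below.
# input: the data to convolve, a tensor of shape
#   [batch, in_height, in_width, in_channels], i.e.
#   [number of images in a training batch, image height, image width, image channels].
# filter: W, the convolution kernel, of shape
#   [filter_height, filter_width, in_channels, out_channels], i.e. kernel height,
#   kernel width, input channels, output channels. in_channels must equal the
#   in_channels of input; out_channels is the number of kernels and therefore the
#   number of output feature maps (see reference 2).
#   Simplest case: a 3*3 single-channel image (shape [1,3,3,1]) convolved with a
#   1*1 kernel (shape [1,1,1,1]) gives one 3*3 feature map.
#   With a kernel of shape [2,2,1,5] and no padding, it gives 5 feature maps of 2*2.
# strides: a list of length 4, the distance the convolution window slides over
#   input at each step. Here every entry is 1. strides[0] (batch) and strides[3]
#   (channel) must be 1; strides[1] is the step along the height (downwards) and
#   strides[2] is the step along the width (to the right).
# padding: either 'SAME' or 'VALID'; it decides whether to keep the border where
#   the kernel does not fully overlap the image. With 'SAME' the border is kept by
#   zero padding: for a 5*5 input and a [3,3,1,1] kernel the output feature map is
#   5*5 (wide CNN). With 'VALID' the output feature map is 3*3 (narrow CNN).
# use_cudnn_on_gpu: whether to use cuDNN acceleration. Defaults to True.
def conv2d(x, W):
    # 'SAME' keeps the spatial size, which the 7*7*64 flatten size below relies on.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


# Pooling: extracts a subsample from each feature map, most commonly the maximum.
# Purpose:
# (1) It can provide a fixed-size output: with 1000 kernels, max pooling over each
#     whole feature map gives a 1000-dimensional output, regardless of the kernel
#     size and the input size.
# (2) It reduces dimensionality while trying to keep the most salient features.
# Pooling methods:
# tf.nn.avg_pool: average pooling
# tf.nn.max_pool: max pooling
# tf.nn.max_pool_with_argmax: max pooling that returns a tuple (output, argmax),
#   i.e. the maxima and their indices
# tf.nn.avg_pool3d: 3-D average pooling
# tf.nn.max_pool3d: 3-D max pooling
# tf.nn.fractional_avg_pool
# tf.nn.fractional_max_pool
# ksize: the size of the pooling window; [1, 2, 2, 1] means a 2*2 window.
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='VALID')


# Create model
def multilayer_perceptron(x, weights, biases):
    # Now, we want to change this to a CNN network.
    # First reshape the data to 4-D: [batch, height, width, channels].
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    # Then apply the CNN layers, each with the ReLU nonlinear activation.
    h_conv1 = tf.nn.relu(conv2d(x_image, weights['conv1']) + biases['conv_b1'])
    h_pool1 = max_pool_2x2(h_conv1)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights['conv2']) + biases['conv_b2'])
    h_pool2 = max_pool_2x2(h_conv2)
    # Flatten the 64 feature maps of 7*7 for the fully connected layer.
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights['fc1']) + biases['fc1_b'])
    # Output layer with linear activation
    out_layer = tf.matmul(h_fc1, weights['out']) + biases['out_b']
    return out_layer


# Store layers weight & biases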
# First convolution kernel: 5*5, 1 input channel, 32 output feature maps
# (28*28 each with zero padding).
# Second convolution kernel: 5*5, 32 input channels, 64 output feature maps
# (14*14 each with zero padding, 7*7 after pooling, matching fc1).
weights = {
    'conv1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'conv2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'fc1': tf.Variable(tf.random_normal([7*7*64, 256])),
    'out': tf.Variable(tf.random_normal([256, n_classes]))
}
biases = {
    'conv_b1': tf.Variable(tf.random_normal([32])),
    'conv_b2': tf.Variable(tf.random_normal([64])),
    'fc1_b': tf.Variable(tf.random_normal([256])),
    'out_b': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=",
                  "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy (evaluated inside the session)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

# References
#http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
#http://blog.youkuaiyun.com/mao_xiao_feng/article/details/53444333
#http://blog.youkuaiyun.com/lujiandong1/article/details/53728053
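
The loss and accuracy at the end of the script come from two small computations on the output-layer logits: softmax cross-entropy against the one-hot label, and an argmax comparison. The following is a rough pure-Python sketch of that arithmetic for illustration only; the function names `log_softmax`, `cross_entropy` and `accuracy` are hypothetical helpers, not TensorFlow APIs, and TensorFlow's own implementation may differ in detail.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, shifted by the max for numerical stability."""
    m = max(logits)
    log_total = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_total for v in logits]


def cross_entropy(logits, one_hot_label):
    """Softmax cross-entropy between one row of logits and a one-hot label."""
    return -sum(t * lp for t, lp in zip(one_hot_label, log_softmax(logits)))


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(batch_logits, batch_labels):
    """Fraction of rows whose predicted class equals the labelled class."""
    hits = sum(argmax(l) == argmax(t) for l, t in zip(batch_logits, batch_labels))
    return hits / len(batch_logits)


# Two equal logits give probability 0.5 each, so the loss is log(2).
print(cross_entropy([0.0, 0.0], [1, 0]))
# The first row is classified correctly, the second is not.
print(accuracy([[2.0, 1.0, 0.0], [0.0, 1.0, 3.0]], [[1, 0, 0], [0, 1, 0]]))  # 0.5
```

In the script, `cost` is the mean of this per-example loss over a batch, and `accuracy` is the same argmax comparison averaged over the test set.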