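In the earlier post "Stanford University Deep Learning Open Course CS231n Study Notes (9): Softmax Classification and Neural Network Classification Code Implementation", a softmax classifier and a simple two-layer neural network were implemented with Python/NumPy. This post replaces NumPy with TensorFlow to implement a softmax classifier and a simple neural network with two hidden layers. The underlying principles are the same; the main goal is to learn how to apply TensorFlow.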
Preface: introduction to the MNIST dataset
The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/ and consists of four parts:
- Training set images: train-images-idx3-ubyte.gz (60,000 training samples)
- Training set labels: train-labels-idx1-ubyte.gz (60,000 training labels)
- Test set images: t10k-images-idx3-ubyte.gz (10,000 test samples)
- Test set labels: t10k-labels-idx1-ubyte.gz (10,000 test labels)
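The four files use the simple IDX binary format described on the MNIST page: a header of big-endian unsigned 32-bit integers (magic number 2051 for image files, followed by the image count, rows and columns; magic number 2049 for label files, followed by the label count), then the raw unsigned bytes. The following is a minimal NumPy sketch of a reader, assuming the .gz files have been downloaded locally; the function names load_gz, parse_idx_images and parse_idx_labels are hypothetical helpers, not part of TensorFlow (input_data.read_data_sets does this parsing for you).

```python
import gzip
import struct

import numpy as np


def load_gz(path: str) -> bytes:
    """Read and decompress one of the .gz files (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        return f.read()


def parse_idx_images(data: bytes) -> np.ndarray:
    """Parse decompressed idx3-ubyte bytes into an (N, rows*cols) uint8 array."""
    # Header: four big-endian unsigned 32-bit ints: magic (2051), count, rows, cols
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError(f"not an idx3 image file (magic number {magic})")
    # Pixels follow as unsigned bytes, row-major, one image after another
    pixels = np.frombuffer(data, dtype=np.uint8, count=count * rows * cols, offset=16)
    return pixels.reshape(count, rows * cols)


def parse_idx_labels(data: bytes) -> np.ndarray:
    """Parse decompressed idx1-ubyte bytes into an (N,) uint8 array of digits 0-9."""
    # Header: magic (2049) and count, big-endian unsigned 32-bit ints
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError(f"not an idx1 label file (magic number {magic})")
    return np.frombuffer(data, dtype=np.uint8, count=count, offset=8)
```

With the files in the working directory, parse_idx_images(load_gz("train-images-idx3-ubyte.gz")) should return a (60000, 784) array, one flattened 28×28 image per row.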
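The MNIST dataset comes from the US National Institute of Standards and Technology (NIST). The training set consists of digits handwritten by approximately 250 different people, 50% of them high school students and 50% employees of the Census Bureau; the test set contains handwritten digits in the same proportions.
Each data point consists of two parts: an image of a handwritten digit and its corresponding label. We call the images "xs" and the labels "ys". Both the training set and the test set contain xs and ys; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.
Each image contains 28 × 28 = 784 pixels.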
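The sketch below shows in NumPy the preprocessing that, as far as I know, input_data.read_data_sets(..., one_hot=True) applies with its default settings: each 28×28 image is flattened into a 784-element float32 vector scaled to [0, 1], and each digit label becomes a length-10 one-hot vector. (By default read_data_sets also holds out 5,000 of the 60,000 training images as mnist.validation, so mnist.train.images has 55,000 rows.) The function name preprocess is hypothetical.

```python
import numpy as np


def preprocess(images: np.ndarray, labels: np.ndarray, num_classes: int = 10):
    """Flatten (N, 28, 28) uint8 images and one-hot encode integer labels (hypothetical helper)."""
    n = images.shape[0]
    # Flatten each image into one row of 784 pixels and scale 0..255 to 0.0..1.0
    xs = images.reshape(n, -1).astype(np.float32) / 255.0
    # One-hot: a row of zeros with a single 1.0 in the column of the true digit
    ys = np.zeros((n, num_classes), dtype=np.float32)
    ys[np.arange(n), labels] = 1.0
    return xs, ys
```

The one-hot layout is what makes tf.argmax(y_, 1) recover the class index later in the post.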

Softmax classifier:
Here is the code first, followed by a brief explanation of what it does.
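#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Written against the TensorFlow 1.x API (tf.placeholder, tf.Session, tensorflow.examples.tutorials)
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Download MNIST into MNIST_data/ (if not already there), with one-hot labels
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print("Download Done!")
# Input: a batch of flattened 28x28 images, each a 784-dimensional vector
x = tf.placeholder(tf.float32, [None, 784])
# Parameters: weight matrix and bias vector, initialized to zeros
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)  # softmax model: predicted class probabilities
y_ = tf.placeholder(tf.float32, [None, 10])  # ground-truth one-hot labels
# Softmax loss: cross-entropy summed over the batch
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
# Minimize the cross-entropy with gradient descent (learning rate 0.01)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# Initialize all variables and create a Session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)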
# Train the model: 1000 iterations, each on a random mini-batch of 100 examples
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# Evaluate the model: fraction of test images whose predicted class matches the label
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy on Test-dataset: ", sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Output:
Accuracy on Test-dataset: 0.9212
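Notes: as the code shows, building the softmax model in TensorFlow takes a single line: y = tf.nn.softmax(tf.matmul(x, W) + b).
To compute the cross-entropy, a placeholder is added for feeding in the correct labels, y_ = tf.placeholder(tf.float32, [None, 10]), and the cross-entropy is then computed from y and y_.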
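The loss line -tf.reduce_sum(y_ * tf.log(y)) is the cross-entropy H(y_, y) = -Σ y_ log y summed over the batch. Computing it from probabilities has a practical weakness: if any predicted probability underflows to 0, log returns -inf and the loss becomes infinite (or NaN during training). A minimal NumPy sketch of both variants follows; the function names are my own, and the "from logits" version uses the standard identity log softmax(z)_j = z_j - log Σ_k exp(z_k), which is the kind of computation tf.nn.softmax_cross_entropy_with_logits (used in the neural network example below) performs directly on logits.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the row max does not change the result but prevents exp overflow
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy_naive(y_true: np.ndarray, probs: np.ndarray) -> float:
    # Same formula as the TF line: -sum(y_ * log(y)) over the whole batch
    return float(-np.sum(y_true * np.log(probs)))


def cross_entropy_from_logits(y_true: np.ndarray, logits: np.ndarray) -> float:
    # log-softmax computed directly from logits, never forming log(0)
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.sum(y_true * log_probs))
```

For ordinary logits the two functions agree; for an extreme case such as logits [[1000, 0]] with true class 1, the naive version returns inf while the logits-based version returns 1000.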
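The code minimizes the cross-entropy with gradient descent at a learning rate of 0.01; TensorFlow also provides other optimizers, such as tf.train.AdamOptimizer, which the neural network example below uses.
What TensorFlow actually does is add, behind the scenes, a series of new operations to the computation graph that implement backpropagation and gradient descent.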
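To make this concrete, here is a hand-written NumPy version of one training step of the softmax model. It relies on the standard result that the gradient of softmax cross-entropy with respect to the logits is P - Y (predicted probabilities minus one-hot labels), then backpropagates through logits = XW + b. This is a sketch of the math only, not how TensorFlow implements it internally; gd_step and loss are hypothetical names.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss(W, b, X, Y):
    # Batch-summed cross-entropy, matching -tf.reduce_sum(y_ * tf.log(y))
    P = softmax(X @ W + b)
    return float(-np.sum(Y * np.log(P)))


def gd_step(W, b, X, Y, lr=0.01):
    """One plain gradient-descent step on the summed softmax cross-entropy."""
    P = softmax(X @ W + b)      # forward pass: predicted probabilities
    dlogits = P - Y             # dLoss/dlogits for softmax + cross-entropy
    dW = X.T @ dlogits          # backpropagate through logits = X W + b
    db = dlogits.sum(axis=0)
    return W - lr * dW, b - lr * db
```

Calling gd_step in a loop over mini-batches corresponds to what train_step does on each sess.run call in the TensorFlow code.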
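tf.argmax returns the index of the largest value of a tensor along a given axis. Because each label vector consists of 0s and a single 1, the index of the 1 is the class label. tf.argmax(y, 1) is the label the model predicts for each input x, while tf.argmax(y_, 1) is the correct label; tf.equal checks whether each prediction matches the true label. Casting the resulting booleans to floats and averaging them with tf.reduce_mean gives the accuracy.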
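The same evaluation pipeline can be mirrored step by step in NumPy, which makes it easy to check by hand. A minimal sketch (the function name accuracy is my own):

```python
import numpy as np


def accuracy(probs: np.ndarray, onehot_labels: np.ndarray) -> float:
    predicted = np.argmax(probs, axis=1)          # like tf.argmax(y, 1)
    actual = np.argmax(onehot_labels, axis=1)     # like tf.argmax(y_, 1)
    correct = np.equal(predicted, actual)         # like tf.equal(...): boolean vector
    return float(np.mean(correct.astype(np.float32)))  # like tf.reduce_mean(tf.cast(...))


probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])
labels = np.eye(3)[[1, 0, 1, 2]]
print(accuracy(probs, labels))  # → 0.5 (first two rows correct, last two wrong)
```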
Simple neural network:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
MNIST_data_folder = '/home/fcq/tfTest/mnist-data'  # dataset downloaded manually; the automatic download is very slow without a VPN
mnist = input_data.read_data_sets(MNIST_data_folder, one_hot=True)
# Learning rate, number of training steps, batch size, logging interval
learning_rate = 0.01
num_steps = 1000
batch_size = 128
display_step = 50
# Network parameters
n_hidden_1 = 256  # number of neurons in the 1st hidden layer
n_hidden_2 = 256  # number of neurons in the 2nd hidden layer
num_input = 784  # MNIST image size: 28*28
num_classes = 10  # number of MNIST classes (digits 0-9)
X = tf.placeholder("float", [None, num_input])
Y = tf.placeholder("float", [None, num_classes])
# Layer weights and biases (tf.random_normal defaults to mean 0, stddev 1)
weights = {
    'h1': tf.Variable(tf.random_normal([num_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, num_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}
# Build the neural network model
def neural_net(x):
    # 1st hidden layer, fully connected
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # 2nd hidden layer, fully connected
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output layer, fully connected
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
# Logits and softmax probabilities (note: neural_net applies no activation function between layers)
logits = neural_net(X)
prediction = tf.nn.softmax(logits)
# Loss and optimizer: mean softmax cross-entropy, minimized with Adam
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
# Accuracy evaluation
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all variables
init = tf.global_variables_initializer()
# Train the model
with tf.Session() as sess:
    sess.run(init)
    for step in range(1, num_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Compute loss and accuracy on the current mini-batch
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x, Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
    print("Optimization Finished!")
    # Print the final accuracy on the test set
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
Output:
Step 1, Minibatch Loss= 2135.4846, Training Accuracy= 0.172
Step 50, Minibatch Loss= 201.3117, Training Accuracy= 0.820
Step 100, Minibatch Loss= 287.2367, Training Accuracy= 0.828
Step 150, Minibatch Loss= 219.4535, Training Accuracy= 0.859
Step 200, Minibatch Loss= 148.0623, Training Accuracy= 0.891
Step 250, Minibatch Loss= 88.3242, Training Accuracy= 0.906
Step 300, Minibatch Loss= 190.2478, Training Accuracy= 0.852
Step 350, Minibatch Loss= 45.7799, Training Accuracy= 0.883
Step 400, Minibatch Loss= 104.1621, Training Accuracy= 0.898
Step 450, Minibatch Loss= 122.6171, Training Accuracy= 0.844
Step 500, Minibatch Loss= 80.5699, Training Accuracy= 0.891
Step 550, Minibatch Loss= 144.2236, Training Accuracy= 0.859
Step 600, Minibatch Loss= 58.8911, Training Accuracy= 0.898
Step 650, Minibatch Loss= 60.6473, Training Accuracy= 0.891
Step 700, Minibatch Loss= 70.2185, Training Accuracy= 0.891
Step 750, Minibatch Loss= 87.3257, Training Accuracy= 0.859
Step 800, Minibatch Loss= 50.0909, Training Accuracy= 0.891
Step 850, Minibatch Loss= 108.5662, Training Accuracy= 0.898
Step 900, Minibatch Loss= 55.2491, Training Accuracy= 0.875
Step 950, Minibatch Loss= 52.5570, Training Accuracy= 0.867
Step 1000, Minibatch Loss= 89.9613, Training Accuracy= 0.836
Optimization Finished!
Testing Accuracy: 0.8787
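Before reading too much into these numbers, note that neural_net stacks three fully connected layers with no activation function in between. Because (xW1 + b1)W2 + b2 = x(W1W2) + (b1W2 + b2), such a stack always folds into a single affine map. The NumPy sketch below checks this with small hypothetical layer sizes (6, 5, 5, 3 instead of 784, 256, 256, 10); collapse_linear_layers and forward_no_activation are my own names.

```python
import numpy as np


def forward_no_activation(x, weights, biases):
    """Apply affine layers one after another, with no nonlinearity, like neural_net."""
    h = x
    for W, b in zip(weights, biases):
        h = h @ W + b
    return h


def collapse_linear_layers(weights, biases):
    """Fold a stack of affine layers (no activation) into one equivalent W, b."""
    W, b = weights[0], biases[0]
    for Wk, bk in zip(weights[1:], biases[1:]):
        # (x W + b) Wk + bk = x (W Wk) + (b Wk + bk)
        W, b = W @ Wk, b @ Wk + bk
    return W, b


rng = np.random.default_rng(1)
sizes = [6, 5, 5, 3]
Ws = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=n) for n in sizes[1:]]
x = rng.normal(size=(4, 6))
Wc, bc = collapse_linear_layers(Ws, bs)
print(np.max(np.abs(forward_no_activation(x, Ws, bs) - (x @ Wc + bc))))  # close to 0
```

In other words, this network computes the same kind of function as the softmax classifier above; inserting a nonlinearity such as tf.nn.relu after each hidden layer is what breaks the equivalence.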
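Judging from the test results, the simple neural network (0.8787) is less accurate than the softmax classifier (0.9212). This may seem normal: CS231n mentions that the advantages of a neural network architecture only show once the network has enough layers. In this code, however, a more direct cause is that neural_net applies no nonlinear activation function between its layers, so the three stacked affine transformations are equivalent to a single affine map and the network can represent no more than softmax regression can. In addition, tf.random_normal initializes every weight with a standard deviation of 1, which produces very large logits and is consistent with the huge initial loss (2135.4846 at step 1). Adding an activation such as tf.nn.relu after each hidden layer is the usual remedy. References: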
https://www.tensorflow.org/get_started/mnist/beginners
http://www.jeyzhang.com/tensorflow-learning-notes.html