-
The MNIST dataset
contains 70,000 images of handwritten digits (white digits on a black background): 55,000 training images, 5,000 validation images, and 10,000 test images. Each image is 28*28 pixels; a pure black pixel has value 0 and a pure white pixel has value 1. Each label is a one-hot array of length 10, where the index of the nonzero element indicates which digit the image shows; for example, the label of an image of the digit 6 is [0,0,0,0,0,0,1,0,0,0].
1. Load the MNIST dataset with the read_data_sets() function from the input_data module:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('./data/', one_hot=True)
2. Query the number of samples in the training set (train), validation set (validation), and test set (test):
(1) Number of training samples:
print("train data size:", mnist.train.num_examples)
Output: train data size: 55000
(2) Number of validation samples:
print("validation data size:", mnist.validation.num_examples)
Output: validation data size: 5000
(3) Number of test samples:
print("test data size:", mnist.test.num_examples)
Output: test data size: 10000
3. Use the train.images attribute to access the pixel values of images in the MNIST dataset.
For example, to view the pixel values of image 0 in the training set:
mnist.train.images[0]
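A minimal sketch for inspecting a sample (the reshape to 28*28 and the use of numpy's argmax on the one-hot label are illustrative additions, not part of the course code):
import numpy as np
img = mnist.train.images[0]                  # flat vector of 784 floats in [0, 1]
print(img.reshape(28, 28).shape)             # (28, 28): the original image layout
print(np.argmax(mnist.train.labels[0]))      # the digit this image represents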
4. Use the mnist.train.next_batch() function to feed data into the neural network. It returns a random batch of BATCH_SIZE images and their labels. For example:
BATCH_SIZE = 200
xs, ys = mnist.train.next_batch(BATCH_SIZE)
print("xs shape:", xs.shape)
print("ys shape:", ys.shape)
Output: xs shape: (200, 784)
Output: ys shape: (200, 10)
5. Common functions for implementing handwritten-digit recognition on the MNIST dataset (a short demo follows the list):
(1) tf.get_collection("") returns a list of all variables in the named collection.
(2) tf.add() adds the corresponding elements of its arguments element-wise (tf.add_n(), used later, sums a whole list of tensors element-wise).
(3) tf.cast(x, dtype) casts x to the specified data type.
(4) tf.equal() compares two matrices or vectors element-wise, returning True where the corresponding elements are equal and False where they are not.
(5) tf.reduce_mean(x, axis) computes the mean of a matrix or tensor along the given axis. If the second argument is omitted, the mean is taken over all elements; axis=0 averages over the first dimension (per column); axis=1 averages over the second dimension (per row).
(6) tf.argmax(x, axis) returns the index of the largest element of x along the given axis.
(7) os.path.join() joins its string arguments into a path following the path-naming conventions.
(8) str.split() splits a string on the given separator and returns the list of parts.
(9) tf.Graph().as_default() sets the graph as the default graph and returns a context manager. It is generally used with the with keyword to rebuild an already-defined network inside a fresh computation graph.
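The following toy demo exercises a few of the ops above (the constant values are made up for illustration):
import tensorflow as tf
a = tf.constant([1, 0, 1])
b = tf.constant([1, 1, 1])
eq = tf.equal(a, b)                                  # [True, False, True]
frac = tf.reduce_mean(tf.cast(eq, tf.float32))       # fraction of equal elements
idx = tf.argmax(tf.constant([0.1, 0.7, 0.2]), 0)     # index of the largest value
with tf.Session() as sess:
    print(sess.run([eq, frac, idx]))                 # [True False True], 0.6666667, 1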
6. Saving the neural network model
During back propagation, the model is usually saved once every fixed number of training steps. Each save produces three files: a .meta file with the current graph structure, an .index file with the current parameter names, and a .data file with the current parameter values. In TensorFlow this is written as:
saver = tf.train.Saver()
with tf.Session() as sess:
    for i in range(STEPS):
        if i % save_interval == 0:
            saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)
7. Loading the neural network model
When testing the network, the trained model must be loaded first. In TensorFlow this is written as:
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(save_path)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
8. Loading the moving averages of the model parameters
If the model uses moving averages, the averaged (shadow) values of the parameters are saved in the checkpoint files. They are loaded by instantiating the saver object as follows:
ema = tf.train.ExponentialMovingAverage(moving_average_decay)
ema_restore = ema.variables_to_restore()
saver = tf.train.Saver(ema_restore)
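variables_to_restore() returns a dict that maps each parameter's shadow name to the parameter itself, so saver.restore() loads the averaged values into the ordinary variables. A minimal sketch (the variable name w1 is hypothetical):
w1 = tf.Variable(tf.zeros([2]), name='w1')
ema = tf.train.ExponentialMovingAverage(0.99)
print(ema.variables_to_restore())
# roughly: {'w1/ExponentialMovingAverage': <tf.Variable 'w1:0' shape=(2,) dtype=float32_ref>}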
9. Evaluating model accuracy
The network is usually evaluated by computing its classification accuracy on a set of data. In TensorFlow this is written as:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
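For intuition, a toy run of the two lines above (the predictions and labels are made-up values):
y = tf.constant([[0.1, 0.9], [0.8, 0.2]])    # logits for 2 samples
y_ = tf.constant([[0.0, 1.0], [1.0, 0.0]])   # matching one-hot labels
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
    print(sess.run(accuracy))                # 1.0: both samples classified correctly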
-
The standard neural-network template
It consists of the forward-propagation process, the back-propagation process, the regularization, exponentially decaying learning rate, and moving-average settings used during back propagation, and the test module.
1. Forward propagation (forward.py)
def forward(x, regularizer):
    w =
    b =
    y =
    return y
def get_weight(shape, regularizer):
def get_bias(shape):
2. Back propagation (backward.py)
def backward(mnist):
    x = tf.placeholder(dtype, shape)
    y_ = tf.placeholder(dtype, shape)
    # forward propagation to obtain the prediction
    y = forward(x, REGULARIZER)
    global_step =
    loss =
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # instantiate the saver object
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # initialize all model parameters
        sess.run(tf.global_variables_initializer())
        # train the model
        for i in range(STEPS):
            sess.run(train_step, feed_dict={x: , y_: })
            if i % save_interval == 0:
                saver.save()
3. Setting up regularization, the exponentially decaying learning rate, and moving averages
(1) The regularization term (regularization)
First, add the following to the forward-propagation file forward.py:
if regularizer is not None:
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
Next, add the following to the back-propagation file backward.py:
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
loss = cem + tf.add_n(tf.get_collection('losses'))
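Put together, the total loss is the mean cross-entropy plus the scaled L2 norm of every weight matrix added to the 'losses' collection. A sketch of the equivalent computation, assuming l2_regularizer is built on tf.nn.l2_loss (weight_matrices is a hypothetical list of all w tensors):
loss = cem + sum(REGULARIZER * tf.nn.l2_loss(w) for w in weight_matrices)   # l2_loss(w) = sum(w**2) / 2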
(2) Exponentially decaying learning rate
Using an exponentially decaying learning rate lets the model converge quickly toward a good solution early in training while avoiding large oscillations late in training.
To apply it, add the following to the back-propagation file backward.py:
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    LEARNING_RATE_STEP,
    LEARNING_RATE_DECAY,
    staircase=True)
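The decayed rate follows the formula below; with staircase=True the exponent global_step / LEARNING_RATE_STEP is truncated to an integer, so the rate drops in discrete steps rather than continuously:
learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step / LEARNING_RATE_STEP)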
(3) Moving averages
Applying moving averages to the parameters during training makes the model behave more robustly on test data.
Add the following to the back-propagation file backward.py:
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
ema_op = ema.apply(tf.trainable_variables())
# bundle the training step and the moving-average update into one op
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')
Running train_op now executes both train_step and ema_op, so the shadow averages are updated on every training step.
4. The test process (test.py)
First, define the model test function test():
def test(mnist):
    with tf.Graph().as_default() as g:
        # placeholders for x and y_
        x = tf.placeholder(dtype, shape)
        y_ = tf.placeholder(dtype, shape)
        # forward propagation to obtain the prediction y
        y = mnist_forward.forward(x, None)
        # instantiate a saver that restores the moving averages
        ema = tf.train.ExponentialMovingAverage(moving_average_decay)
        ema_restore = ema.variables_to_restore()
        saver = tf.train.Saver(ema_restore)
        # compute the accuracy
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        while True:
            with tf.Session() as sess:
                # load the trained model
                ckpt = tf.train.get_checkpoint_state(save_path)
                # if a checkpoint exists, restore it
                if ckpt and ckpt.model_checkpoint_path:
                    # restore the session
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    # recover the step count from the checkpoint file name
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    # compute the accuracy on the test data
                    accuracy_score = sess.run(accuracy, feed_dict={x: test_images, y_: test_labels})
                    # print the result
                    print("After %s training step(s), test accuracy=%g" % (global_step, accuracy_score))
                # if no model is found
                else:
                    print('No checkpoint file found')
                    return
Next, define the main() function:
def main():
    # load the test dataset
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    # call the test function test() defined above
    test(mnist)
if __name__ == '__main__':
    main()
-
Recognizing handwritten digits from the MNIST dataset
The implementation is split across three module files: a forward-propagation file describing the network structure (mnist_forward.py), a back-propagation file describing how the network parameters are optimized (mnist_backward.py), and a test file that measures the model's accuracy (mnist_test.py).
1. The forward-propagation file (mnist_forward.py)
import tensorflow as tf

INPUT_NODE = 784     # each image is 28*28 = 784 pixels
OUTPUT_NODE = 10     # ten output classes, one per digit
LAYER1_NODE = 500    # number of nodes in the hidden layer

def get_weight(shape, regularizer):
    w = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
    if regularizer is not None:
        tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.zeros(shape))
    return b

def forward(x, regularizer):
    w1 = get_weight([INPUT_NODE, LAYER1_NODE], regularizer)
    b1 = get_bias([LAYER1_NODE])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    w2 = get_weight([LAYER1_NODE, OUTPUT_NODE], regularizer)
    b2 = get_bias([OUTPUT_NODE])
    y = tf.matmul(y1, w2) + b2    # raw logits; softmax is applied inside the loss
    return y
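A quick sanity check that the graph builds with the expected shapes (the placeholder here is an illustrative addition, not part of the module):
import tensorflow as tf
import mnist_forward
x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
y = mnist_forward.forward(x, None)
print(y.shape)    # (?, 10): one 10-way logit vector per input image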
2. The back-propagation file (mnist_backward.py)
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import os

BATCH_SIZE = 200                # number of images fed to the network per step
LEARNING_RATE_BASE = 0.1        # initial learning rate
LEARNING_RATE_DECAY = 0.99      # learning-rate decay factor
REGULARIZER = 0.0001            # regularization weight
STEPS = 10000                   # number of training steps
MOVING_AVERAGE_DECAY = 0.99     # moving-average decay rate
MODEL_SAVE_PATH = "./model/"    # directory for saved checkpoints
MODEL_NAME = "mnist_model"      # checkpoint file name prefix

def backward(mnist):
    x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
    y_ = tf.placeholder(tf.float32, [None, mnist_forward.OUTPUT_NODE])
    y = mnist_forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)
    # cross-entropy loss plus the L2 regularization terms
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)
    loss = cem + tf.add_n(tf.get_collection('losses'))
    # exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE,
        LEARNING_RATE_DECAY,
        staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # maintain moving averages of all trainable variables
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    ema_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')
    saver = tf.train.Saver()
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        for i in range(STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    backward(mnist)

if __name__ == '__main__':
    main()
3. The test file (mnist_test.py)
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import mnist_backward

TEST_INTERVAL_SECS = 5   # re-check for a new checkpoint every 5 seconds

def test(mnist):
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
        y_ = tf.placeholder(tf.float32, [None, mnist_forward.OUTPUT_NODE])
        y = mnist_forward.forward(x, None)
        # restore the moving-average values of the parameters
        ema = tf.train.ExponentialMovingAverage(mnist_backward.MOVING_AVERAGE_DECAY)
        ema_restore = ema.variables_to_restore()
        saver = tf.train.Saver(ema_restore)
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(mnist_backward.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    # recover the step count from the checkpoint file name
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
                    print("After %s training step(s), test accuracy = %g" % (global_step, accuracy_score))
                else:
                    print("No checkpoint file found")
                    return
            time.sleep(TEST_INTERVAL_SECS)

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    test(mnist)

if __name__ == '__main__':
    main()
The run produces output like the following:
Extracting ./data/train-images-idx3-ubyte.gz
Extracting ./data/train-labels-idx1-ubyte.gz
Extracting ./data/t10k-images-idx3-ubyte.gz
Extracting ./data/t10k-labels-idx1-ubyte.gz
After 1 training step(s), loss on training batch is 3.07077.
After 1001 training step(s), loss on training batch is 0.258253.
After 2001 training step(s), loss on training batch is 0.297677.
After 3001 training step(s), loss on training batch is 0.323755.
After 4001 training step(s), loss on training batch is 0.201881.
After 5001 training step(s), loss on training batch is 0.238682.
After 6001 training step(s), loss on training batch is 0.174689.
After 7001 training step(s), loss on training batch is 0.191517.
After 8001 training step(s), loss on training batch is 0.174553.
After 9001 training step(s), loss on training batch is 0.158572.
INFO:tensorflow:Restoring parameters from ./model/mnist_model-1
After 1 training step(s), test accuracy = 0.0905
INFO:tensorflow:Restoring parameters from ./model/mnist_model-1001
After 1001 training step(s), test accuracy = 0.9469
INFO:tensorflow:Restoring parameters from ./model/mnist_model-2001
After 2001 training step(s), test accuracy = 0.961
INFO:tensorflow:Restoring parameters from ./model/mnist_model-3001
After 3001 training step(s), test accuracy = 0.9675
INFO:tensorflow:Restoring parameters from ./model/mnist_model-4001
After 4001 training step(s), test accuracy = 0.9711
INFO:tensorflow:Restoring parameters from ./model/mnist_model-5001
After 5001 training step(s), test accuracy = 0.9744
INFO:tensorflow:Restoring parameters from ./model/mnist_model-6001
After 6001 training step(s), test accuracy = 0.9753
INFO:tensorflow:Restoring parameters from ./model/mnist_model-7001
After 7001 training step(s), test accuracy = 0.9767
INFO:tensorflow:Restoring parameters from ./model/mnist_model-8001
After 8001 training step(s), test accuracy = 0.9773
INFO:tensorflow:Restoring parameters from ./model/mnist_model-9001
After 9001 training step(s), test accuracy = 0.9779
The terminal output shows that as the number of training steps increases, the loss on the training batches keeps falling and the accuracy on the test set keeps rising, indicating that the model generalizes well.