Getting Started with Deep Learning & TensorFlow Through an Image Restoration Example, Part 10: Visualizing the Training Process

Latest recommended article published on 2025-05-27 00:00:00

Aicoling

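Licensed under CC 4.0 BY-SA

Category: [Hands-on] TensorFlow 1.0 Getting Started. Tags: TensorBoard, metric visualization, TensorFlow, deep learning

Article link: https://blog.youkuaiyun.com/qq_43024357/article/details/82184722

This article describes how to visualize the training process with TensorBoard, covering TensorBoard's basic concepts, name scope management, and the visualization of monitored metrics. Example code shows how to generate and read the computation graph, and how to monitor key metrics such as the loss value and changes in the network parameters.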

Visualizing the Training Process with TensorBoard

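Training takes a long time, and waiting alone gets dull. Let's find something to do while the model trains: put TensorBoard to good use. (TensorBoard was installed together with TensorFlow, so there is no need to run conda install tensorboard; it can be used directly in the environment we created.)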
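To confirm that TensorBoard really is available in the active environment, one option is to ask Python whether the package can be found. This is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration, and the printed result depends on your own environment.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


# 'tensorflow' and 'tensorboard' are the packages this article relies on.
for name in ('tensorflow', 'tensorboard'):
    status = 'found' if is_installed(name) else 'not found'
    print('%s: %s' % (name, status))
```

If tensorboard is reported as not found, the most likely cause is that the script ran outside the environment in which TensorFlow was installed.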

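1. Introduction to TensorBoard

This section draws on Chapter 9 of 《TensorFlow实战谷歌深度学习框架》.

TensorBoard is the visualization tool that ships with TensorFlow. It visualizes the state of a running TensorFlow program from the log files that the program writes as it runs. TensorBoard and TensorFlow run in separate processes: TensorBoard automatically reads the newest TensorFlow log file and displays the latest state of the running program.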
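Because the two processes communicate only through files on disk, it can be helpful to look at the log directory yourself. The sketch below finds the most recently modified events file in a directory. It is not how TensorBoard is implemented internally; it only assumes that FileWriter names its files with the prefix events.out.tfevents. The helper name latest_event_file and the demo file names are hypothetical.

```python
import glob
import os
import tempfile


def latest_event_file(log_dir):
    """Return the path of the most recently modified events file, or None if there is none."""
    candidates = glob.glob(os.path.join(log_dir, 'events.out.tfevents.*'))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Demo on a temporary directory with two empty stand-in files (not real logs).
demo_dir = tempfile.mkdtemp()
for mtime, name in [(1000, 'events.out.tfevents.1000.host'),
                    (2000, 'events.out.tfevents.2000.host')]:
    path = os.path.join(demo_dir, name)
    open(path, 'w').close()
    os.utime(path, (mtime, mtime))  # set explicit access/modification times

newest = latest_event_file(demo_dir)
print(os.path.basename(newest))
```

Pointing latest_event_file at your own log directory shows which file a fresh run has just produced.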

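The code below is a simple TensorFlow program that writes a TensorBoard log. (Building the computation graph does not require running anything in a Session; the graph is created as soon as the operations are defined.)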

import tensorflow as tf
tf.reset_default_graph()                         # reset the default computation graph
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard' # directory where the TensorBoard log is saved

# Define a simple computation graph that adds two vectors
input1 = tf.constant([1.0,2.0,3.0],name='input1')
input2 = tf.constant([4.0,5.0,6.0],name='input2')
output = tf.add_n([input1,input2],name='add')

# Create a log writer and write the current TensorFlow graph to the log;
# the log is stored in a .tfevents file created inside TensorBoard_path
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
summary_writer.close()
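The remark that no Session is needed can be checked directly: the operations are registered in the graph at definition time. The following is a minimal sketch, not part of the original program. It builds the same three operations in an explicit tf.Graph (a choice made here so that the snippet does not depend on the state of the default graph) and lists their names without ever creating a Session.

```python
import tensorflow as tf

# Build the operations inside an explicit graph object.
graph = tf.Graph()
with graph.as_default():
    input1 = tf.constant([1.0, 2.0, 3.0], name='input1')
    input2 = tf.constant([4.0, 5.0, 6.0], name='input2')
    output = tf.add_n([input1, input2], name='add')

# No Session has been created, yet the graph already contains the operations.
op_names = [op.name for op in graph.get_operations()]
print(op_names)
```

These are the node names that TensorBoard displays in its graph view. This sketch writes no log; the steps that follow apply to the FileWriter program.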

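Run the log-writing program in Spyder (TensorFlow). Then open Anaconda Prompt, type activate tensorflow to enter the environment we created, and run: tensorboard.exe --logdir=E:\MNIST_data\TensorBoard (note the two hyphens before logdir). The console then prints the address at which TensorBoard is being served (screenshot omitted).
Open that address in a browser, for example http://HP-HP:6006 (the address differs from machine to machine, so use the one printed on your own console).
The page shows the figure below (screenshot omitted; in it, the part inside the red box is the computation graph we have been talking about all along, finally visible in its true form).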
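The launch command is easy to mistype, especially the double hyphen. The sketch below assembles the command as an argument list from a log directory and an optional port. The helper name tensorboard_command is hypothetical; --logdir and --port are the TensorBoard options assumed here, and the snippet only prints the command rather than running it.

```python
def tensorboard_command(log_dir, port=None):
    """Build the argument list for launching TensorBoard on log_dir."""
    cmd = ['tensorboard', '--logdir=' + log_dir]
    if port is not None:
        cmd.append('--port=' + str(port))
    return cmd


cmd = tensorboard_command('E:\\MNIST_data\\TensorBoard', port=6006)
print(' '.join(cmd))
```

The resulting list could be passed to subprocess.run from a script, although typing the printed line into Anaconda Prompt works just as well.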

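Easy, right? A single call to tf.summary.FileWriter is all it takes. Now let's see what the computation graph of the network we built earlier looks like.
Code: (train.py, with the call that writes the graph log added)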

import time
time_start=time.time() # time.time() returns the number of seconds since 1970-01-01 (the epoch)
import tensorflow as tf
import numpy as np

import input_data  # operations related to the input data
import model       # the model definition

img_W = 28                                                               # image width
img_H = 28                                                               # image height
batch_size = 10                                                          # number of samples per mini-batch
min_after_dequeue = 1000                                                 # minimum number of elements kept in the queue
capacity = min_after_dequeue + 3*batch_size                              # maximum number of elements in the queue

train_image_path = 'E:\\MNIST_data\\train_images\\'                      # path of the training input images
train_label_path = 'E:\\MNIST_data\\train_labels\\'                      # path of the training label (target output) images
Train_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\train_data_set.tfrecord'# path of the training TFRecord file

test_image_path = 'E:\\MNIST_data\\test_images\\'                        # path of the test input images
test_label_path = 'E:\\MNIST_data\\test_labels\\'                        # path of the test label (target output) images
Test_TFRecord_path = 'E:\\MNIST_data\\tfrecord\\test_data_set.tfrecord'  # path of the test TFRecord file
model_save_path = 'E:\\MNIST_data\\models\\conv_1.ckpt'                  # path where the model is saved
TensorBoard_path = 'E:\\MNIST_data\\TensorBoard'                         # directory where the TensorBoard log is saved

print('please wait for generating the TFRecord file of training sets...')
#input_data.generate_TFRecordfile(train_image_path,train_label_path,Train_TFRecord_path)# generate the training TFRecord file
print('please wait for generating the TFRecord file of test sets...')
#input_data.generate_TFRecordfile(test_image_path,test_label_path,Test_TFRecord_path)   # generate the test TFRecord file

Train_Images_Batch,Train_Labels_Batch = input_data.get_batch(Train_TFRecord_path)       # read the TFRecord file with multiple threads to produce mini-batches
Test_Images_Batch,Test_Labels_Batch = input_data.get_batch(Test_TFRecord_path)          # read the TFRecord file with multiple threads to produce mini-batches
# Placeholders through which the mini-batches are fed into the network
x = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'images')
y_label = tf.placeholder(tf.float32, shape=[None,img_W,img_H,1],name = 'labels')

y_conv = model.inference(x)

loss = tf.reduce_mean(tf.square(y_conv - y_label))     # cost function: mean squared error
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss) # optimize the parameters with the Adam optimizer (learning rate 1e-4)

init_op = (tf.local_variables_initializer(),tf.global_variables_initializer()) # initialization ops
saver = tf.train.Saver()
# Create the log writer and write the current computation graph to the log
summary_writer = tf.summary.FileWriter(TensorBoard_path,tf.get_default_graph())
with tf.Session() as sess:
    sess.run(init_op)
    coord = tf.train.Coordinator() # coordinates the shutdown of multiple threads
    threads = tf.train.start_queue_runners(sess=sess,coord=coord) # start the queue-runner threads
    try:
        for step in range(100):
            if coord.should_stop(): # becomes True once a stop has been requested (e.g. after the end marker is read); leave the loop
                break
            train_images_batch,train_labels_batch = sess.run([Train_Images_Batch,Train_Labels_Batch])
            train_images_batch = np.reshape(train_images_batch,[batch_size,img_W,img_H,1]) # the first axis indexes the samples
            train_labels_batch = np.reshape(train_labels_batch,[batch_size,img_W,img_H,1])
            sess.run(train_op,feed_dict={x:train_images_batch,y_label:train_labels_batch}) # feed the mini-batch to train_op to train the network
            if step%100 == 0:
                test_images_batch,test_labels_batch = sess.run([Test_Images_Batch,Test_Labels_Batch])
                test_images_batch = np.reshape(test_images_batch,[batch_size,img_W,img_H,1])
                test_labels_batch = np.reshape(test_labels_batch,[batch_size,img_W,img_H,1])
                train_loss = sess.run(loss,feed_dict={x:train_images_batch,y_label:train_labels_batch})
                test_loss = sess.run(loss,feed_dict={x:test_images_batch,y_label:test_labels_batch})
                # the losses are floats, so print them with %f (%d would truncate them to integers)
                print('step %d: loss on training set batch:%f  loss on testing set batch:%f' % (step,train_loss,test_loss))
                saver.save(sess, model_save_path)

    except tf.errors.OutOfRangeError: # raised when the end marker of the filename queue is reached
        print('epoch limit reached')
        coord.request_stop() # ask the other threads to stop reading data
    finally:
        coord.request_stop()
        coord.join(threads) # wait for all threads to exit
    saver.save(sess, model_save_path) # save the model
    summary_writer.close()
time_end=time.time() # time.time() returns the number of seconds since 1970-01-01 (the epoch)
print('\nTrain Finished\nTotal run time is : %f s \nThe network was saved in %s' %(time_end-time_start,model_save_path))
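The log written by train.py contains only the graph structure. To follow a metric such as the loss while training runs, the TF 1.x summary API can record a scalar at each step. What follows is a minimal sketch under stated assumptions, not the article's train.py: the one-parameter linear model, the data, the learning rate, and the temporary log directory are all made up for illustration. It imports tensorflow.compat.v1 so that it also runs on newer installations; on an old TF 1.x setup without that module, import tensorflow as tf and drop the disable_eager_execution call.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: a TensorFlow version that provides compat.v1

tf.disable_eager_execution()
tf.reset_default_graph()

log_dir = tempfile.mkdtemp()  # hypothetical log directory; use your own TensorBoard path

# Toy model (illustration only): fit w so that w * x matches target.
x = tf.placeholder(tf.float32, shape=[None], name='x')
target = tf.placeholder(tf.float32, shape=[None], name='target')
w = tf.Variable(0.0, name='w')
loss = tf.reduce_mean(tf.square(w * x - target))  # mean squared error
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

tf.summary.scalar('loss', loss)   # register the loss as a scalar summary
merged = tf.summary.merge_all()   # a single op that evaluates every registered summary

writer = tf.summary.FileWriter(log_dir, tf.get_default_graph())
losses = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: [1.0, 2.0, 3.0], target: [2.0, 4.0, 6.0]}
    for step in range(20):
        _, loss_value, summary = sess.run([train_op, loss, merged], feed_dict=feed)
        writer.add_summary(summary, global_step=step)  # one point on the loss curve
        losses.append(loss_value)
writer.close()

event_files = [f for f in os.listdir(log_dir) if f.startswith('events.out.tfevents')]
print('first loss: %f, last loss: %f' % (losses[0], losses[-1]))
```

Starting TensorBoard on log_dir should then show a loss curve under the Scalars tab alongside the graph. In train.py the same three additions would apply: register the scalar after loss is defined, fetch the merged summary in sess.run, and pass it to summary_writer.add_summary together with the step.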