tf.summary and tfrecord

This post describes how to use a custom TensorFlow Summary to record and display data from the training process in TensorBoard, such as the loss on the validation set, with a simple approach and example code.

tf.summary



Normally, when training a network, we add summaries as follows:

tf.summary.scalar(tag, value)   # attach summary ops to tensors in the graph
# ...
summary_op = tf.summary.merge_all()
summary_writer = tf.summary.FileWriter(logdir, graph=sess.graph)
summary_str = sess.run(summary_op)
summary_writer.add_summary(summary_str, global_step)
 
When we want to add other data to TensorBoard ourselves (for example the loss computed during validation), this approach is rather cumbersome. Instead, we can add custom data to TensorBoard as follows:

summary_writer = tf.summary.FileWriter(logdir)
summary = tf.Summary(value=[
    tf.Summary.Value(tag="summary_tag", simple_value=0), 
    tf.Summary.Value(tag="summary_tag2", simple_value=1),
])
# x is the step used as the horizontal-axis coordinate
summary_writer.add_summary(summary, x)
 

Or equivalently:

summary_writer = tf.summary.FileWriter(logdir)
summary = tf.Summary()
summary.value.add(tag="summary_tag", simple_value=0)
summary.value.add(tag="summary_tag2", simple_value=1)
# x is the step used as the horizontal-axis coordinate
summary_writer.add_summary(summary, x)
 

Note that x here must be an integer; if you pass a float, it is automatically converted to an integer.
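If the step comes from a floating-point value (for example a fractional epoch count), it is clearer to cast it explicitly before calling add_summary. A minimal sketch, where epoch and steps_per_epoch are assumed placeholder names:

step = int(epoch * steps_per_epoch)   # cast explicitly; add_summary expects an integer step
summary_writer.add_summary(summary, step)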

A complete example is given below:

import tensorflow as tf
summary_writer = tf.summary.FileWriter('/tmp/test')
summary = tf.Summary(value=[
    tf.Summary.Value(tag="summary_tag", simple_value=0), 
    tf.Summary.Value(tag="summary_tag2", simple_value=1),
])
summary_writer.add_summary(summary, 1)

summary = tf.Summary(value=[
    tf.Summary.Value(tag="summary_tag", simple_value=1), 
    tf.Summary.Value(tag="summary_tag2", simple_value=3),
])
summary_writer.add_summary(summary, 2)

summary_writer.close()
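In practice this is most useful inside a training loop, for example to log the validation loss mentioned at the beginning of this post. A minimal sketch, where compute_val_loss is an assumed placeholder for your own evaluation code:

for global_step in range(1, num_steps + 1):
    # ... run one training step ...
    val_loss = compute_val_loss()   # hypothetical helper, not part of this post
    summary = tf.Summary()
    summary.value.add(tag="val/loss", simple_value=float(val_loss))
    summary_writer.add_summary(summary, global_step)
summary_writer.flush()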
 

The result displayed in TensorBoard looks like this:
[Figure: TensorBoard display example]

References:

How to manually create a tf.Summary

Revision history:
2017-02-19: updated for the TensorFlow 1.0 API



To view the logs in TensorBoard, launch it from the command line (do not put quotes around the path), for example:

tensorboard --logdir E:/programData
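For the complete example above, which writes to /tmp/test, the command would be:

tensorboard --logdir /tmp/test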

tfrecord
Writing:
def _bytes_feature(value):
    # images are usually converted to a byte string with .tostring() before writing
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:
    for i, image_example in enumerate(dataset):
        sys.stdout.write('\r>> Converting image %d/%d' % (i + 1, len(dataset)))
        sys.stdout.flush()
        # image_buffer, class_label, roi and landmark are taken from image_example (details omitted)
        example = tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': _bytes_feature(image_buffer),
            'image/label': _int64_feature(class_label),
            'image/roi': _float_feature(roi),
            'image/landmark': _float_feature(landmark)   # note: each key maps to a feature with a colon
        }))
        tfrecord_writer.write(example.SerializeToString())
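The _int64_feature and _float_feature helpers are not shown in the original snippet; they follow the same pattern as _bytes_feature. A minimal sketch:

def _int64_feature(value):
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _float_feature(value):
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))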
Reading:
filename_queue = tf.train.string_input_producer([tfrecord_file], shuffle=True)
# read one serialized example at a time from the tfrecord queue
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
image_features = tf.parse_single_example(
    serialized_example,
    features={
        'image/encoded': tf.FixedLenFeature([], tf.string),   # one image per record
        'image/label': tf.FixedLenFeature([], tf.int64),
        'image/roi': tf.FixedLenFeature([4], tf.float32),
        'image/landmark': tf.FixedLenFeature([10], tf.float32)
    }
)

image_features is the dictionary of parsed features.
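What comes next depends on how the image bytes were encoded. A minimal sketch assuming the image was written as the raw bytes of a fixed-size uint8 array (the 48x48x3 shape is only an illustrative assumption and must match what was written):

image = tf.decode_raw(image_features['image/encoded'], tf.uint8)
image = tf.reshape(image, [48, 48, 3])   # assumed shape
label = tf.cast(image_features['image/label'], tf.int32)

# assemble mini-batches with queue runners (TF1-style input pipeline)
image_batch, label_batch = tf.train.batch(
    [image, label], batch_size=32, num_threads=4, capacity=1000)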
