Reading Data in TensorFlow

Original article · latest recommended article published 2021-02-12 18:15:32

Licensed under CC 4.0 BY-SA

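This article describes the three main ways to read data in TensorFlow: feeding, reading from files, and preloading data. It covers the key steps of filename queues, batching, shuffling, and thread management, and is aimed at beginners and developers who want a deeper understanding.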


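A TensorFlow program can read data in three ways: feeding data at each step, reading from files through an input pipeline, and preloading the data into the graph.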
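As a small illustration of the feeding approach, the sketch below supplies data to a placeholder through feed_dict. It assumes the TF 1.x graph API reached through tf.compat.v1 (aliased here as tf1, a name chosen for this example), and the input values are made up.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (alias chosen for this sketch)


def run_feed_example():
    # Build the ops in a private graph so the sketch has no global side effects.
    with tf.Graph().as_default():
        # The placeholder has no value until one is fed at run time.
        x = tf1.placeholder(tf.float32, shape=[None, 2], name="x")
        row_sums = tf.reduce_sum(x, axis=1)
        with tf1.Session() as sess:
            # feed_dict maps the placeholder to concrete Python data.
            return sess.run(row_sums, feed_dict={x: [[1.0, 2.0], [3.0, 4.0]]})


result = run_feed_example()
print(result)
```

Feeding is the simplest of the three approaches, but every step pays the cost of copying data from Python into the runtime, which is one reason the file-based pipeline described in the rest of the article exists.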

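Choose the reader that matches your file format, then pass the filename queue to the reader's read method. The read method outputs a key that identifies the input file and the record within it (very useful for debugging) together with a scalar string value. One or more parsers or conversion ops then decode this string into tensors that make up an example.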

key, value = reader.read(queue, name=None)
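The call above can be tried on a small file. The following is a minimal sketch, assuming the TF 1.x queue API through tf.compat.v1 (aliased as tf1) and a throwaway CSV file named demo.csv that exists only for this example. The queue runner threads must be started before read can return anything.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (alias chosen for this sketch)


def read_first_record(path):
    with tf.Graph().as_default():
        # Filename queue: holds the list of files to be read.
        filename_queue = tf1.train.string_input_producer([path], shuffle=False)
        # The reader matching the file format (text lines here).
        reader = tf1.TextLineReader()
        key, value = reader.read(filename_queue)
        with tf1.Session() as sess:
            coord = tf1.train.Coordinator()
            threads = tf1.train.start_queue_runners(sess=sess, coord=coord)
            try:
                return sess.run([key, value])
            finally:
                coord.request_stop()
                coord.join(threads)


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")  # hypothetical sample file
    with open(csv_path, "w") as f:
        f.write("1.0,2.0,0\n3.0,4.0,1\n")
    first_key, first_value = read_first_record(csv_path)

print(first_key)    # identifies the file and the record
print(first_value)  # the raw, undecoded line as bytes
```

The value is still a plain byte string at this point; turning it into numeric tensors is the job of the decoding step.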

Text / CSV files

Read: tf.TextLineReader()

Decode: tf.decode_csv()

Image files

Read: tf.WholeFileReader()

Decode: each image format has its own decoder

tf.image.decode_png()

tf.image.decode_jpeg()

Binary files

TFRecords files
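To make the decoding step concrete, here is a minimal sketch that decodes one CSV line and one PNG image. It assumes the tf.compat.v1 API (aliased as tf1). To stay self-contained it uses a constant string in place of a reader's value output and builds the PNG bytes in the graph with tf.image.encode_png instead of reading a file with tf.WholeFileReader. The sample values are made up.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (alias chosen for this sketch)


def decode_examples():
    with tf.Graph().as_default():
        # CSV: record_defaults sets each column's type and its fallback value.
        line = tf.constant("1.5,2.5,1")  # stands in for a reader's `value`
        f1, f2, label = tf1.decode_csv(
            line, record_defaults=[[0.0], [0.0], [0]])

        # PNG: encode a 1x2 RGB image, then decode it back into a uint8 tensor.
        pixels = tf.constant([[[255, 0, 0], [0, 255, 0]]], dtype=tf.uint8)
        png_bytes = tf.image.encode_png(pixels)
        image = tf.image.decode_png(png_bytes, channels=3)

        with tf1.Session() as sess:
            return sess.run([f1, f2, label, image])


f1_val, f2_val, label_val, image_val = decode_examples()
print(f1_val, f2_val, label_val)
print(image_val.shape)
```

Because the first two defaults are floats and the third is an integer, the decoded columns come back as float32, float32, and int32.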

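3. Batching

At the end of the input pipeline, another queue is needed to group examples for training, evaluation, and inference. tf.train.batch collects examples into batches in the order they arrive, while tf.train.shuffle_batch also shuffles them.

Purpose: read a batch of the specified size (number of examples) as tensors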

Args:

tensor_list: The list of tensors to enqueue.
batch_size: the number of examples pulled from the queue for each batch
num_threads: the number of threads enqueuing tensor_list
capacity: An integer. The maximum number of elements in the queue.
enqueue_many: Whether each tensor in tensor_list is a single example.
shapes: (Optional) The shapes for each example. Defaults to the inferred shapes for tensor_list.
name: (Optional) A name for the operations.

Returns:

A list of tensors with the same number and types as tensor_list.
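The sketch below contrasts tf.train.batch with tf.train.shuffle_batch on four made-up examples. It assumes the tf.compat.v1 API (aliased as tf1). The helper take_batches and its data are hypothetical, and enqueue_many=True is used so that each row of the input tensors counts as one example.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (alias chosen for this sketch)


def take_batches(shuffle, num_batches=2, batch_size=2):
    with tf.Graph().as_default():
        # Four made-up examples; the first dimension indexes the examples.
        features = tf.constant([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]])
        labels = tf.constant([0, 1, 2, 3])
        if shuffle:
            # min_after_dequeue controls how well the examples are mixed.
            batch = tf1.train.shuffle_batch(
                [features, labels], batch_size=batch_size, capacity=16,
                min_after_dequeue=4, num_threads=1, enqueue_many=True)
        else:
            batch = tf1.train.batch(
                [features, labels], batch_size=batch_size, num_threads=1,
                capacity=8, enqueue_many=True)
        feature_batch, label_batch = batch

        results = []
        with tf1.Session() as sess:
            coord = tf1.train.Coordinator()
            threads = tf1.train.start_queue_runners(sess=sess, coord=coord)
            try:
                for _ in range(num_batches):
                    results.append(tuple(sess.run([feature_batch, label_batch])))
            finally:
                coord.request_stop()
                coord.join(threads)
        return results


ordered = take_batches(shuffle=False)
shuffled = take_batches(shuffle=True)
print([labels.tolist() for _, labels in ordered])
print([labels.tolist() for _, labels in shuffled])
```

With a single enqueuing thread, the unshuffled batches follow the input order, while the shuffled batches draw the same labels in random order. Using more threads speeds up enqueuing but no longer guarantees the order for tf.train.batch.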

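4. Starting the threads manually

tf.train.start_queue_runners collects all queue runners in the graph and, by default, starts their threads.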

Args:

sess: Session used to run the queue ops. Defaults to the default session.
coord: Optional Coordinator for coordinating the started threads.
daemon: Whether the threads should be marked as daemons, meaning they don't block program exit.
start: Set to False to only create the threads, not start them.
collection: A GraphKey specifying the graph collection to get the queue runners from. Defaults to GraphKeys.QUEUE_RUNNERS.
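Putting the steps together, the following is a minimal end-to-end sketch: filename queue, reader, decoder, batching, and manually started threads managed by a Coordinator. It assumes the tf.compat.v1 API (aliased as tf1). The function run_pipeline, the file name train.csv, and its contents are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (alias chosen for this sketch)


def run_pipeline(path, steps=2, batch_size=2):
    with tf.Graph().as_default():
        # Filename queue, then a reader that yields one line per read.
        filename_queue = tf1.train.string_input_producer([path], shuffle=False)
        _, line = tf1.TextLineReader().read(filename_queue)
        # Decode each line into two float features and an integer label.
        x1, x2, label = tf1.decode_csv(
            line, record_defaults=[[0.0], [0.0], [0]])
        features = tf.stack([x1, x2])
        # Group single examples into batches.
        feature_batch, label_batch = tf1.train.batch(
            [features, label], batch_size=batch_size, num_threads=1, capacity=8)

        results = []
        with tf1.Session() as sess:
            coord = tf1.train.Coordinator()
            # Nothing flows through the queues until the runners are started.
            threads = tf1.train.start_queue_runners(sess=sess, coord=coord)
            try:
                for _ in range(steps):
                    if coord.should_stop():
                        break
                    results.append(tuple(sess.run([feature_batch, label_batch])))
            finally:
                # Ask the threads to stop, then wait for them to finish.
                coord.request_stop()
                coord.join(threads)
        return results


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "train.csv")  # hypothetical sample file
    with open(csv_path, "w") as f:
        f.write("1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,1\n7.0,8.0,0\n")
    batches = run_pipeline(csv_path)

for feats, labs in batches:
    print(feats.tolist(), labs.tolist())
```

Forgetting to call start_queue_runners is a common mistake: the first sess.run on a batch then blocks forever because no thread is filling the queues. Passing the Coordinator lets all threads shut down together when training ends or an error occurs.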