Reading CIFAR10 Binary Data


pythonHou



License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/pythonNew/article/details/82109709

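This post explains how to read the binary files of the CIFAR10 dataset: building a filename queue, reading fixed-length records with tf.FixedLengthRecordReader and decoding them with tf.decode_raw, fixing the shape and type of the image data, and batching the results. It also defines a CIFAR class that provides a method for reading the data.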


1 Analysis

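Construct a filename queue
Read the binary records and decode them
Fix the shape and data type of the image data, then return batches
Open a session and start the queue-runner threads to run the pipeline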

2 Code

Define the CIFAR class and set the image-related attributes

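import tensorflow as tf  # uses the TensorFlow 1.x queue-based input API


class CifarRead(object):
    """
    Read binary files; store and read TFRecords
    """

    def __init__(self):
        # Image attributes
        self.height = 32
        self.width = 32
        self.channel = 3

        # Each record: 1 label byte followed by 32*32*3 = 3072 image bytes
        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.channel
        self.bytes = self.label_bytes + self.image_bytes  # 3073 bytes per record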
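The record size matters because the binary files contain no delimiters between samples: the CIFAR-10 binary release describes each data_batch_N.bin as 10000 consecutive 3073-byte rows, i.e. exactly 30730000 bytes. The small sketch below redoes that arithmetic in plain Python; the constant names mirror the class attributes, and records_in_file is a hypothetical helper, not part of the original code.

```python
# Record layout of the CIFAR-10 binary files, mirroring CifarRead's attributes
HEIGHT, WIDTH, CHANNEL = 32, 32, 3
LABEL_BYTES = 1
IMAGE_BYTES = HEIGHT * WIDTH * CHANNEL    # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073


def records_in_file(file_size: int) -> int:
    """Return how many fixed-length records a file of file_size bytes holds."""
    if file_size % RECORD_BYTES != 0:
        raise ValueError(f"{file_size} bytes is not a whole number of records")
    return file_size // RECORD_BYTES


# Each data_batch_N.bin holds 10000 records, i.e. 30730000 bytes
print(records_in_file(30730000))  # 10000
```

Calling records_in_file(os.path.getsize(path)) on a downloaded batch file is a quick sanity check before building the reading pipeline.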

Implement the data-reading method bytes_read(self, file_list)

Construct the filename queue

# 1. Construct the filename queue
file_queue = tf.train.string_input_producer(file_list)
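tf.train.string_input_producer turns file_list into a queue of filenames; with its defaults (num_epochs=None, shuffle=True) it cycles through the list indefinitely and shuffles the names within each epoch. The pure-Python sketch below imitates that behaviour with a generator. It assumes the five training files data_batch_1.bin to data_batch_5.bin of the CIFAR-10 binary release sit in a hypothetical cifar-10-batches-bin directory; make_file_list and filename_queue are illustrative names, not TensorFlow API.

```python
import os
import random
from typing import Iterator, List, Optional


def make_file_list(data_dir: str) -> List[str]:
    """Paths of the five CIFAR-10 training batches (data_dir is an assumed location)."""
    return [os.path.join(data_dir, f"data_batch_{i}.bin") for i in range(1, 6)]


def filename_queue(file_list: List[str], num_epochs: Optional[int] = None,
                   shuffle: bool = True, seed: Optional[int] = None) -> Iterator[str]:
    """Yield filenames epoch by epoch, similar to a string_input_producer queue.

    num_epochs=None cycles forever; shuffle reorders the names within each epoch.
    """
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(file_list)
        if shuffle:
            rng.shuffle(order)
        yield from order
        epoch += 1
```

With shuffle=False and num_epochs=2, the generator yields the five paths twice in their original order, which makes the epoch boundaries easy to see.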

Read records with tf.FixedLengthRecordReader(bytes)

# 2. Read with tf.FixedLengthRecordReader(bytes)
# record_bytes is required: the number of bytes in one sample (3073)
reader = tf.FixedLengthRecordReader(self.bytes)

_, value = reader.read(file_queue)
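reader.read(file_queue) dequeues a filename and returns a (key, value) pair in which value holds the bytes of exactly one record. The sketch below reproduces that fixed-length chunking in plain Python for an already-open binary stream. read_fixed_length_records is a hypothetical helper, and raising on a truncated final record is a choice made for this sketch, not a statement about TensorFlow's behaviour.

```python
import io
from typing import BinaryIO, Iterator

RECORD_BYTES = 3073  # 1 label byte + 32*32*3 image bytes


def read_fixed_length_records(stream: BinaryIO,
                              record_bytes: int = RECORD_BYTES) -> Iterator[bytes]:
    """Yield consecutive record_bytes-sized chunks from a binary stream.

    Raises ValueError if the stream ends in the middle of a record.
    """
    while True:
        chunk = stream.read(record_bytes)
        if not chunk:  # clean end of stream
            return
        if len(chunk) != record_bytes:
            raise ValueError(f"truncated record: got {len(chunk)} bytes")
        yield chunk


# Two synthetic records held in memory instead of a real .bin file
fake_file = io.BytesIO(bytes([7]) + bytes(3072) + bytes([3]) + bytes(3072))
for record in read_fixed_length_records(fake_file):
    print(record[0], len(record))
```

On a real file the same generator can be driven with open(path, "rb") in place of the BytesIO object.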

Decode the data

# 3. Decode
# Static shape is (?,); at run time it is (3073,) = label (1,) + feature (3072,)
label_image = tf.decode_raw(value, tf.uint8)
# For convenient training, the features and the label are usually handled separately
print(label_image)
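tf.decode_raw(value, tf.uint8) reinterprets the raw string as a 1-D uint8 tensor with one element per byte. In plain Python, iterating over a bytes object already yields integers from 0 to 255, so the equivalent step is a one-liner. The record below is synthetic (label 6 followed by 3072 made-up pixel bytes), and decode_raw_uint8 is a hypothetical name.

```python
from typing import List


def decode_raw_uint8(value: bytes) -> List[int]:
    """Interpret each byte as an unsigned 8-bit integer, one element per byte."""
    return list(value)  # iterating over bytes yields ints in [0, 255]


record = bytes([6]) + bytes(range(256)) * 12  # synthetic record: 1 + 3072 bytes
label_image = decode_raw_uint8(record)
print(len(label_image))   # 3073
print(label_image[:3])    # [6, 0, 1]
```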

Split each record into the label and the image

# Slice with tf.slice
label = tf.cast(tf.slice(label_image, [0], [self.label_bytes]), tf.int32)

image = tf.slice(label_image, [self.label_bytes], [self.image_bytes])

print(label, image)
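tf.slice(x, begin, size) takes size elements starting at begin, so [0]/[1] selects the single label byte and [1]/[3072] selects the image bytes. The sketch below restates the same split on a plain list; slice_1d handles only the 1-D, non-negative-size case and, like split_record, is a hypothetical helper rather than TensorFlow API.

```python
from typing import List, Sequence, Tuple

LABEL_BYTES = 1
IMAGE_BYTES = 32 * 32 * 3  # 3072


def slice_1d(values: Sequence[int], begin: int, size: int) -> List[int]:
    """1-D analogue of tf.slice(values, [begin], [size]) for non-negative sizes."""
    if begin < 0 or size < 0 or begin + size > len(values):
        raise ValueError("slice out of range")
    return list(values[begin:begin + size])


def split_record(label_image: Sequence[int]) -> Tuple[int, List[int]]:
    """Split a decoded 3073-value record into (label, 3072 image values)."""
    label = slice_1d(label_image, 0, LABEL_BYTES)[0]
    image = slice_1d(label_image, LABEL_BYTES, IMAGE_BYTES)
    return label, image
```

In the TensorFlow version the label is additionally cast to tf.int32, while the image stays uint8 until later processing.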

Fix the shape of the data and batch it

# Fix the image shape
# CIFAR-10 stores the 1024 red values, then the green, then the blue (each row-major)
# reshape (3072,) ---> [channel, height, width]
# transpose [channel, height, width] ---> [height, width, channel]
depth_major = tf.reshape(image, [self.channel, self.height, self.width])
print(depth_major)

image_reshape = tf.transpose(depth_major, [1, 2, 0])

print(image_reshape)
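Because the bytes are channel-major, reshaping directly to [height, width, channel] would scramble the colours; the image must first be viewed as [channel, height, width] and then permuted with [1, 2, 0]. The sketch below spells out the index mapping in plain Python using nested lists. to_hwc is a hypothetical helper, and the small 2x2x3 input is only there to make the mapping visible.

```python
from typing import List, Sequence

HEIGHT, WIDTH, CHANNEL = 32, 32, 3


def to_hwc(image: Sequence[int], height: int = HEIGHT, width: int = WIDTH,
           channel: int = CHANNEL) -> List[List[List[int]]]:
    """Reshape a flat channel-major image to [channel, height, width],
    then transpose it to [height, width, channel] (permutation [1, 2, 0])."""
    if len(image) != channel * height * width:
        raise ValueError("unexpected image size")
    # reshape: depth_major[c][h][w] = image[c*height*width + h*width + w]
    depth_major = [[[image[c * height * width + h * width + w]
                     for w in range(width)]
                    for h in range(height)]
                   for c in range(channel)]
    # transpose [1, 2, 0]: out[h][w][c] = depth_major[c][h][w]
    return [[[depth_major[c][h][w] for c in range(channel)]
             for w in range(width)]
            for h in range(height)]


# Pixel (0, 0) gathers one value from each colour plane
print(to_hwc(list(range(12)), height=2, width=2, channel=3)[0][0])  # [0, 4, 8]
```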

# 4. Batching
image_batch, label_batch = tf.train.batch([image_reshape, label], batch_size=10, num_threads=1, capacity=10)

return image_batch, label_batch
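tf.train.batch collects single (image, label) examples into tensors holding batch_size examples each. In the TF1 graph these batch tensors are only filled once a session is running and the queue-runner threads have been started (for example with tf.train.start_queue_runners and a tf.train.Coordinator), which is the last step of the analysis. The plain-Python sketch below shows just the grouping logic; batch is a hypothetical name, and the handling of leftovers follows the documented meaning of tf.train.batch's allow_smaller_final_batch flag (default False: incomplete final batches are discarded).

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

Image = TypeVar("Image")


def batch(examples: Iterable[Tuple[Image, int]], batch_size: int = 10,
          allow_smaller_final_batch: bool = False
          ) -> Iterator[Tuple[List[Image], List[int]]]:
    """Group (image, label) pairs into (image_batch, label_batch) lists."""
    images: List[Image] = []
    labels: List[int] = []
    for image, label in examples:
        images.append(image)
        labels.append(label)
        if len(images) == batch_size:
            yield images, labels
            images, labels = [], []
    # Leftover examples are dropped unless allow_smaller_final_batch is True
    if images and allow_smaller_final_batch:
        yield images, labels
```

With 25 examples and batch_size=10, the default yields two full batches and drops the last five; passing allow_smaller_final_batch=True yields a third batch of five.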