Modifying the data path in convert_to_records.py

This article describes how to convert the MNIST dataset into TensorFlow's TFRecords format. The conversion is automated with a Python script, which makes the data easy to load for subsequent training.


# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Converts MNIST data to TFRecords file format with Example protos."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import sys

import tensorflow as tf

from tensorflow.contrib.learn.python.learn.datasets import mnist

FLAGS = None


def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def convert_to(data_set, name):
  """Converts a dataset to tfrecords."""
  images = data_set.images
  labels = data_set.labels
  num_examples = data_set.num_examples

  if images.shape[0] != num_examples:
    raise ValueError('Images size %d does not match label size %d.' %
                     (images.shape[0], num_examples))
  rows = images.shape[1]
  cols = images.shape[2]
  depth = images.shape[3]

  filename = os.path.join(FLAGS.directory, name + '.tfrecords')
  print('Writing', filename)
  writer = tf.python_io.TFRecordWriter(filename)
  for index in range(num_examples):
    image_raw = images[index].tostring()
    example = tf.train.Example(features=tf.train.Features(feature={
        'height': _int64_feature(rows),
        'width': _int64_feature(cols),
        'depth': _int64_feature(depth),
        'label': _int64_feature(int(labels[index])),
        'image_raw': _bytes_feature(image_raw)}))
    writer.write(example.SerializeToString())
  writer.close()


def main(unused_argv):
  # Get the data.
  data_sets = mnist.read_data_sets(FLAGS.directory,
                                   dtype=tf.uint8,
                                   reshape=False,
                                   validation_size=FLAGS.validation_size)

  # Convert to Examples and write the result to TFRecords.
  convert_to(data_sets.train, 'train')
  convert_to(data_sets.validation, 'validation')
  convert_to(data_sets.test, 'test')


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--directory',
      type=str,
      default='/tmp/data',
      help='Directory to download data files and write the converted result'
  )
  parser.add_argument(
      '--validation_size',
      type=int,
      default=5000,
      help="""\
      Number of examples to separate from the training data for the validation
      set.\
      """
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

This script converts MNIST into TensorFlow's standard TFRecords format. The default data directory is hard-coded to /tmp/data; I simply changed it to MNIST_data/ and placed the four MNIST .gz files in that folder, and that is all the path modification amounts to. A sketch of the change is shown below.
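A minimal sketch of the edit, assuming nothing else in the script changes: only the default of the --directory flag near the bottom of the file is pointed at the local folder. The same effect can also be achieved by passing the flag on the command line instead of editing the file.

  parser.add_argument(
      '--directory',
      type=str,
      default='MNIST_data/',  # was '/tmp/data'
      help='Directory to download data files and write the converted result'
  )

  # Equivalent without touching the script:
  #   python convert_to_records.py --directory MNIST_data/

With the four .gz files already sitting in MNIST_data/, running the script prints: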


Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Writing MNIST_data/train.tfrecords
Writing MNIST_data/validation.tfrecords
Writing MNIST_data/test.tfrecords
The three generated TFRecords files then show up under MNIST_data/. A quick way to inspect one of them is sketched below.
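To spot-check what was written, a single record can be parsed back out of one of the files. This is a minimal sketch using the TF 1.x tf.python_io.tf_record_iterator API; the feature names are the ones written by convert_to() above:

import tensorflow as tf

for record in tf.python_io.tf_record_iterator('MNIST_data/train.tfrecords'):
  example = tf.train.Example()
  example.ParseFromString(record)
  feature = example.features.feature
  print('height:', feature['height'].int64_list.value[0])
  print('width: ', feature['width'].int64_list.value[0])
  print('depth: ', feature['depth'].int64_list.value[0])
  print('label: ', feature['label'].int64_list.value[0])
  print('image bytes:', len(feature['image_raw'].bytes_list.value[0]))
  break  # only inspect the first record

For MNIST the expected values are height 28, width 28, depth 1, a label in 0-9, and 784 raw image bytes per record.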


The final goal is to train on my own images. My initial suspicion is that the .gz files, once unpacked, are just the raw image data, so tomorrow I can try packaging my own images the same way and generating TFRecords from them. After that, the next step is to read the data back from a queue, comparing this tutorial against the official one; a sketch of that queue-based reader follows.
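As a starting point for that next step, here is a minimal sketch of queue-based reading with the TF 1.x input pipeline (tf.train.string_input_producer, tf.TFRecordReader, tf.parse_single_example). The feature keys and the 28x28x1 image size match what convert_to() writes; the batch size and queue capacities are placeholder values.

import tensorflow as tf

def read_and_decode(filename_queue):
  # Parse one serialized Example written by convert_to().
  reader = tf.TFRecordReader()
  _, serialized_example = reader.read(filename_queue)
  features = tf.parse_single_example(
      serialized_example,
      features={
          'image_raw': tf.FixedLenFeature([], tf.string),
          'label': tf.FixedLenFeature([], tf.int64),
      })
  image = tf.decode_raw(features['image_raw'], tf.uint8)
  image.set_shape([28 * 28 * 1])
  image = tf.cast(image, tf.float32) * (1. / 255)  # scale pixels to [0, 1]
  label = tf.cast(features['label'], tf.int32)
  return image, label

filename_queue = tf.train.string_input_producer(['MNIST_data/train.tfrecords'])
image, label = read_and_decode(filename_queue)
images, labels = tf.train.shuffle_batch(
    [image, label], batch_size=100, capacity=2000, min_after_dequeue=1000)

with tf.Session() as sess:
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(sess=sess, coord=coord)
  image_batch, label_batch = sess.run([images, labels])
  print(image_batch.shape, label_batch.shape)  # (100, 784) (100,)
  coord.request_stop()
  coord.join(threads)

In newer TensorFlow versions the tf.data.TFRecordDataset API replaces this queue machinery, but the queue version matches the tutorials being compared here.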




                