data_load.py
# encoding: utf-8
"""
@author: monitor1379
@contact: yy4f5da2@hotmail.com
@site: www.monitor1379.com
@version: 1.0
@license: Apache Licence
@file: mnist_decoder.py
@time: 2016/8/16 20:03
对MNIST手写数字数据文件转换为bmp图片文件格式。
数据集下载地址为http://yann.lecun.com/exdb/mnist。
相关格式转换见官网以及代码注释。
========================
关于IDX文件格式的解析规则:
========================
THE IDX FILE FORMAT
the IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is
magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data
The magic number is an integer (MSB first). The first 2 bytes are always 0.
The third byte codes the type of the data:
0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)
The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....
The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).
The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
"""
import numpy as np
import struct
import matplotlib.pyplot as plt
# 训练集文件
train_images_idx3_ubyte_file = 'mnist/train-images-idx3-ubyte'
# 训练集标签文件
train_labels_idx1_ubyte_file = 'mnist/train-labels-idx1-ubyte'
# 测试集文件
test_images_idx3_ubyte_file = 'mnist/t10k-images-idx3-ubyte'
# 测试集标签文件
test_labels_idx1_ubyte_file = 'mnist/t10k-labels-idx1-ubyte'
def

本文档详细介绍了如何使用Python加载经典的MNIST手写数字数据集。通过`data_load.py`脚本,你可以轻松获取并预处理这个广泛用于图像识别和深度学习教程的数据集。
最低0.47元/天 解锁文章
2199

被折叠的 条评论
为什么被折叠?



