Theano MNIST dataset format

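This post explains how the MNIST dataset used in the Theano tutorials is stored and how to parse it, so that developers can feed data from other sources into a Theano program.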

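First, a link to an expert's translation of the Theano documentation: http://www.cnblogs.com/xueliangliu/archive/2013/04/03/2997437.html

It includes a link for downloading mnist.pkl.gz by hand (the code can also download the file automatically).

I do not work in image processing, so I had little idea of how images are stored. What should I do if I want to feed data from some other source into a Theano program?

The answer is to work out the storage format. The comment in the code (logistic_sgd.py, line 195) already spells it out: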

#train_set, valid_set, test_set format: tuple(input, target)
#input is an numpy.ndarray of 2 dimensions (a matrix)
#witch row's correspond to an example. target is a
#numpy.ndarray of 1 dimensions (vector)) that have the same length as
#the number of rows in the input. It should give the target
#target to the example with the same index in the input.

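In other words, train_X is a two-dimensional matrix with one row per example (rows × number of features), train_Y is a vector of length rows, and train_set is the tuple (train_X, train_Y).

So all we need to do is read our own file, build that matrix and that vector, and wrap them in Theano shared variables (theano.shared) so the program can use them.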
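As a rough illustration of that step, here is a minimal sketch that builds an (input, target) tuple from text data with NumPy. The `label,f1,f2,...` line layout, the `build_set` helper, and the dtypes are assumptions made for this example, not something the original code prescribes.

```python
import io

import numpy


def build_set(lines):
    """Parse 'label,f1,f2,...' lines into an (input, target) tuple.

    The line layout is a hypothetical format chosen for this sketch.
    """
    features, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(',')
        labels.append(int(fields[0]))
        features.append([float(v) for v in fields[1:]])
    x = numpy.asarray(features, dtype='float32')  # shape (rows, n_features)
    y = numpy.asarray(labels, dtype='int64')      # shape (rows,)
    return x, y


# A tiny in-memory stand-in for a real data file.
sample = io.StringIO(u"0,0.1,0.2,0.3\n1,0.4,0.5,0.6\n")
train_set = build_set(sample)
train_x, train_y = train_set
print(train_x.shape, train_y.shape)
```

In a Theano program, the two arrays would then be handed to theano.shared, for example `theano.shared(train_x, borrow=True)`, which is not run here so that the sketch depends only on NumPy.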

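=================== divider =========================

I did not expect to pick up DL again later, and things have changed a great deal since then.

Here is a further note on the MNIST dataset format.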

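import gzip
import pickle  # Python 3; the original Python 2 code used cPickle

# Load the dataset
with gzip.open('mnist.pkl.gz', 'rb') as f:
    # The file was pickled under Python 2, so Python 3 needs encoding='latin1'
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')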

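It turns out that this returns a tuple of three sets: training, validation, and test.

Each set has two parts. Taking the training set as an example, their shapes are (50000, 784) and (50000,), that is, 50,000 samples and 50,000 labels.

Each sample has 784 dimensions = 28*28, one value per pixel of a 28×28 image.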
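To make the layout concrete without needing the real file, the sketch below writes a tiny synthetic dataset in the same tuple-of-tuples layout, reads it back, and checks the shapes. The file name `tiny.pkl.gz`, the `make_set` helper, and the row counts are made up for the example; the real training set has 50000 rows.

```python
import gzip
import os
import pickle
import tempfile

import numpy


def make_set(rows, n_features=784):
    """Return a synthetic (input, target) pair with MNIST-like shapes."""
    x = numpy.zeros((rows, n_features), dtype='float32')
    y = numpy.zeros(rows, dtype='int64')
    return x, y


# Same structure as mnist.pkl.gz: (train_set, valid_set, test_set),
# where each set is an (input, target) tuple.
data = (make_set(10), make_set(4), make_set(4))

path = os.path.join(tempfile.mkdtemp(), 'tiny.pkl.gz')
with gzip.open(path, 'wb') as f:
    pickle.dump(data, f, protocol=2)

with gzip.open(path, 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f)

train_x, train_y = train_set
image = train_x[0].reshape(28, 28)  # one row viewed as a 28x28 image
print(train_x.shape, train_y.shape, image.shape)
```

Saving your own data this way should let it be loaded by the same code path as the original file, provided the feature count matches what the model expects.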

Reposted from: https://www.cnblogs.com/zklidd/p/3886597.html