I. Introducing and Obtaining the MNIST Dataset
1. Introduction to the MNIST Dataset
Overview
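MNIST contains 60,000 training examples and 10,000 test examples (mnist.test). The loader used here holds out 5,000 of the training examples as a validation set (mnist.validation), which leaves 55,000 in mnist.train.
Every MNIST example has two parts: an image of a handwritten digit and its corresponding label. We call the images "xs" and the labels "ys". Both the training set and the test set contain xs and ys; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.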
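The xs/ys layout can be sketched with NumPy alone. This is only an illustration under stated assumptions: the arrays are synthetic placeholders filled with zeros rather than real MNIST data, the batch size of 5 is arbitrary, and the row widths (784 values per flattened image, 10 values per label) are assumed to match what the loader reports when called with one_hot=True.

```python
import numpy as np

# Synthetic stand-ins for mnist.train.images / mnist.train.labels (not real data).
num_examples = 5  # hypothetical small batch size
xs = np.zeros((num_examples, 784), dtype=np.float32)  # one flattened image per row
ys = np.zeros((num_examples, 10), dtype=np.float32)   # one label row per image

# Row i of xs and row i of ys belong to the same example.
pairs = list(zip(xs, ys))

# A flattened row of 784 values can be viewed again as a 28x28 pixel grid.
first_image = xs[0].reshape(28, 28)

print(xs.shape, ys.shape)
print(len(pairs), first_image.shape)
```

Keeping images and labels in two arrays that share a row index is what lets a training loop slice the same range from both to form a batch.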
Reference tutorial: http://www.tensorfly.cn/tfdoc/tutorials/mnist_beginners.html
Downloading and reading the dataset
import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
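What one_hot=True does to the labels can be sketched as follows. This is a minimal NumPy illustration, not the loader's own code: to_one_hot is a hypothetical helper name, and the three input digits are made up for the example.

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Hypothetical helper: turn integer class labels into one-hot rows."""
    labels = np.asarray(labels, dtype=np.int64)
    one_hot = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set exactly one position per row: the column whose index is the label.
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot

encoded = to_one_hot([5, 0, 9])    # made-up example digits
decoded = encoded.argmax(axis=1)   # index of the single 1 recovers the digit

print(encoded)
print(decoded)
```

Taking argmax along each row is the usual way to turn a one-hot label (or a vector of predicted class scores) back into a digit.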
Run
# Note: this module ships with TensorFlow 1.x; it was removed in TensorFlow 2.x
from tensorflow.examples.tutorials.mnist import input_data
# one_hot: one-hot encoding; exactly one position is 1 and all the others are 0
mnist = input_data.read_data_sets("data/", one_hot=True)
train_images = mnist.train.images
train_labels = mnist.train.labels
test_images = mnist.test.images
test_labels = mnist.test.labels
print("train_images_shape:", train_images.shape)
print("train_labels_shape:", train_labels.shape)
print("test_images_shape:", test_images.shape)
print("test_labels_shape:", test_labels.shape)
print("train_images:", train_images[0])  # the first of the 55000 training images
print("train_images_length:",len(train_images[0]))
print("train_labels:", train_labels[0])
Output
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
train_images_shape: (55000, 784) # 55000 training images; 784 = 28*28 pixels
train_labels_shape: (55000, 10) # 10 columns, one per digit from 0 to 9
test_images_shape: (10000, 784) # 10000 test images: 10000 rows, 784 columns
test_labels_shape: (10000, 10)
train_images: [ 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0.