MNIST handwritten-digit recognition is the classic entry point for anyone learning AI, and pushing its accuracy higher is a rite of passage toward mastering model tuning. Most beginners reach roughly 90% accuracy on a first attempt; going beyond that takes real effort in model changes and hyperparameter tuning, and it is hard for a newcomer to know how high the accuracy can actually go. On my first pass through the material, by adjusting parameters and training epochs across many models, I reached 99.3% accuracy. My computer at the time lacked the compute to push further and I had not found a better model, so I set the problem aside and moved on to other topics.
Last month that old computer failed and would no longer boot; its drive used BitLocker, so the data, including my study code, was unrecoverable. After replacing it with a much faster machine, I reworked the LeNet-5 approach to MNIST from scratch and, combining it with solutions from the Kaggle site, built a model and pipeline that finally pushed MNIST accuracy to 99.96%. As I understand it, because the MNIST data itself has flaws, 100% should be unreachable unless the evaluation set is leaked into training. Three nines of accuracy seems good enough, so I am writing up the approach here to share with others who are just starting out; I hope it helps.
1. The standard LeNet-5 model
The code below is the most common LeNet-5 variant: it starts with 3x3 convolution kernels, builds the model with Sequential, and runs training step by step with a hand-written loop, which makes it a particularly good reference for beginners. The example also includes log saving and model saving code.
"""
LetNet-5 实战1:网上使用最多的模型, 测试用例精确度能达到99%
"""
import datetime
import tensorflow as tf
from tensorflow.keras import Sequential, layers, losses, datasets
def preprocess(x, y):
"""
预处理函数
"""
# [b, 28, 28], [b]
x = tf.cast(x, dtype=tf.float32) / 255.
y = tf.cast(y, dtype=tf.int32)
y = tf.one_hot(y, depth=10)
return x, y
# Logging and model-saving setup
current_time = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
log_dir = 'logs/' + current_time
summary_writer = tf.summary.create_file_writer(log_dir)

(x, y), (x_test, y_test) = datasets.mnist.load_data()  # load the MNIST dataset
batchsz = 128  # a batch size of 128 works well for this model
train_db = tf.data.Dataset.from_tensor_slices((x, y))  # wrap in a Dataset object
train_db = train_db.shuffle(100000)  # shuffle the samples
train_db = train_db.batch(batchsz)  # batch for training
train_db = train_db.map(preprocess)  # apply preprocessing
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.shuffle(1000).batch(batchsz).map(preprocess)
# Build LeNet-5 with a Sequential container
network = Sequential([
    layers.Conv2D(6, kernel_size=3, strides=1),  # first conv layer: 6 kernels of 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),  # pooling layer, halves height and width
    layers.ReLU(),  # activation
    layers.Conv2D(16, kernel_size=3, strides=1),  # second conv layer: 16 kernels of 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),  # pooling layer, halves height and width
    layers.ReLU(),  # activation
    layers.Flatten(),  # flatten for the fully connected layers
    layers.Dense(120, activation='relu'),  # fully connected layer, 120 units
    layers.Dense(84, activation='relu'),  # fully connected layer, 84 units
    layers.Dense(10)  # output layer: 10 raw logits (the loss below uses from_logits=True)
])
# Build the network once, supplying the input shape
network.build(input_shape=(batchsz, 28, 28, 1))
# Print a summary of the network
network.summary()
# Create the loss object; call the instance directly when computing the loss
criteon = losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.95)  # batch 128, lr=0.01, acc: 0.9914
# optimizer = tf.keras.optimizers.Nadam(learning_rate=0.002)  # batch 128, acc ~0.89
# optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)
# Train for 30 epochs
epoch = 30
steps = 0
for n in range(epoch):
    for step, (x, y) in enumerate(train_db):
        with tf.GradientTape() as tape:
            # Insert the channel dimension => [b, 28, 28, 1]
            x = tf.expand_dims(x, axis=3)
            # Forward pass: class scores [b, 28, 28, 1] => [b, 10]
            out = network(x)
            # Cross-entropy loss (a scalar)
            loss = criteon(y, out)
        # Compute gradients automatically
        grads = tape.gradient(loss, network.trainable_variables)
        # Update parameters automatically
        optimizer.apply_gradients(zip(grads, network.trainable_variables))
        if step % 100 == 0:
            steps += 100
            print('epoch:', n, 'step:', step, 'loss:', float(loss))
            with summary_writer.as_default():
                tf.summary.scalar('loss', float(loss), step=steps)
    correct, total = 0, 0
    for x, y in test_db:
        # Insert the channel dimension => [b, 28, 28, 1]
        x = tf.expand_dims(x, axis=3)
        # Forward pass: predicted distribution over the 10 classes
        out = network(x)
        # Strictly we should apply softmax before argmax, but softmax
        # preserves the relative order of elements, so it can be skipped
        pred = tf.argmax(out, axis=-1)
        y = tf.cast(y, tf.int64)
        y = tf.argmax(y, axis=-1)
        # Count correct predictions
        correct += float(tf.reduce_sum(tf.cast(tf.equal(pred, y), tf.float32)))
        # Count total samples
        total += x.shape[0]
    with summary_writer.as_default():
        tf.summary.scalar('acc', float(correct / total), step=n)
    print("epoch:", n, "acc:", float(correct / total))
tf.saved_model.save(network, 'model-lenet')
This is the most common LeNet-5 network structure. Tuning its parameters and batch size, my best run reached 99.14% accuracy, with most runs landing around 99%. The two commented-out optimizers in the code gave mediocre results; neither matched SGD with momentum.
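SGD with momentum 0.95 was the best optimizer here. As a minimal numpy sketch of the update rule involved (the Keras convention, `velocity = momentum * velocity - lr * grad`; this is an illustration on a toy quadratic loss, not the library implementation), momentum accumulates past gradients so the iterate keeps moving through shallow regions:

```python
import numpy as np

# SGD with momentum on the toy loss L(w) = w^2, whose gradient is 2w.
# Hyperparameters mirror the ones used in the model above.
lr, momentum = 0.01, 0.95
w, velocity = 5.0, 0.0
for _ in range(500):
    grad = 2.0 * w
    velocity = momentum * velocity - lr * grad  # accumulate a decaying sum of gradients
    w += velocity                               # step by the velocity, not the raw gradient
print(f"final w: {w:.6f}")  # spirals in toward the minimum at w = 0
```

With momentum this high, the iterate oscillates around the minimum while converging, which is why pairing it with a moderate learning rate (0.01) matters.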
2. An improved LeNet-5 model
The accuracy of the classic model above is hard to improve further. Searching online, I found that others have proposed an improved LeNet-5: it opens with two 5x5 convolutional layers, inserts Dropout between blocks for regularization, trains with a smaller batch size, and switches the optimizer to Adam.
batchsz = 32
...
# Build the improved LeNet-5 with a Sequential container
network = Sequential([
    layers.Conv2D(32, kernel_size=5, padding='Same', activation='relu', strides=1),  # first conv block: 32 kernels of 5x5
    layers.Conv2D(32, kernel_size=5, padding='Same', activation='relu', strides=1),  # first conv block: 32 kernels of 5x5
    layers.MaxPooling2D(pool_size=2),  # pooling layer, halves height and width
    layers.Dropout(0.25),  # dropout for regularization
    layers.Conv2D(64, kernel_size=3, padding='Same', activation='relu', strides=1),  # second conv block: 64 kernels of 3x3
    layers.Conv2D(64, kernel_size=3, padding='Same', activation='relu', strides=1),  # second conv block: 64 kernels of 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),  # pooling layer, halves height and width
    layers.Dropout(0.25),  # dropout for regularization
    layers.Flatten(),  # flatten for the fully connected layers
    layers.Dense(256, activation='relu'),  # fully connected layer, 256 units
    layers.Dense(10, activation='softmax')  # output layer: 10 class probabilities
])
...
# Create the loss object; from_logits defaults to False since the model ends in softmax
criteon = losses.CategoricalCrossentropy()
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.95)  # batch 128, lr=0.01, acc: 0.9914
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
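Note a subtle difference from the first model: there the network emitted raw logits and the loss used `from_logits=True`, while here the network ends in softmax and the loss keeps its default `from_logits=False`. The two formulations compute the same cross-entropy. A small numpy sketch of that equivalence (illustrative math only, not TensorFlow code):

```python
import numpy as np

# Cross-entropy computed from softmax probabilities equals cross-entropy
# computed directly from logits, because log(softmax(z)) simplifies to a
# log-sum-exp of the logits.
logits = np.array([2.0, 0.5, -1.0])
y_true = np.array([1.0, 0.0, 0.0])  # one-hot target

# Route 1: softmax first, then -sum(y * log(p))  (from_logits=False style)
shifted = logits - logits.max()            # shift for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()
ce_from_probs = -np.sum(y_true * np.log(probs))

# Route 2: log-softmax straight from the logits  (from_logits=True style)
log_probs = shifted - np.log(np.exp(shifted).sum())
ce_from_logits = -np.sum(y_true * log_probs)

assert np.isclose(ce_from_probs, ce_from_logits)
```

In practice the logits route is the numerically safer one, which is why pairing a linear output layer with `from_logits=True` is generally recommended; the softmax-output form used here works too, as the results below show.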
The improved LeNet-5 works well: with the model and parameter changes, it reached 99.38% accuracy in 30 epochs. From the numbers I have seen, that is about the best result others achieve with ordinary models of this kind.
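The Dropout layers do the regularization work in this variant. A minimal numpy sketch of the idea (assuming the standard "inverted dropout" formulation, which is what Keras `Dropout` implements): during training each activation is zeroed with probability `rate`, and the survivors are rescaled by `1/(1-rate)` so the expected activation is unchanged; at inference, activations pass through untouched.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` during
    training, scaling survivors so the expected value is preserved."""
    if not training:
        return x  # inference: identity
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(1000)
y = dropout(x, rate=0.25, training=True, rng=rng)
print(round(y.mean(), 2))  # close to 1.0: the expected activation is preserved
```

Because each forward pass sees a different random mask, the network cannot rely on any single unit, which is what curbs the overfitting that caps the plain model's accuracy.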
3. A top Kaggle MNIST model
Pushing accuracy higher than this is very hard; I tried models with more layers and more width, without clear gains. Searching further, I found a top-ranked MNIST model on Kaggle which, according to its write-up, exceeds 99.9% accuracy once the dataset is extended. It expands and adjusts LeNet-5 substantially, reaching 10 layers with dense layers up to 512 units, and adds image augmentation, dynamic learning-rate adjustment, and checkpoint callbacks to the training strategy.
Trained first on plain MNIST without extending the dataset, it reaches 99.35% accuracy, not much better than the improved model in the previous section.
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, layers, datasets

(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()  # load the MNIST dataset
x_train, x_test = x_train / 255.0, x_test / 255.0
# Add a channel dimension
x_train = np.expand_dims(x_train, axis=3)
x_test = np.expand_dims(x_test, axis=3)
print("train shape:", x_train.shape)
print("test shape:", x_test.shape)
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.20,
    shear_range=15,
    zoom_range=0.10,
    validation_split=0.15,
    horizontal_flip=False
)
train_generator = datagen.flow(
    x_train,
    y_train,
    batch_size=256,
    subset='training',
)
validation_generator = datagen.flow(
    x_train,
    y_train,
    batch_size=64,
    subset='validation',
)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Reshape((28, 28, 1)),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5), activation="relu", padding="same",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPool2D((2, 2)),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.MaxPool2D((2, 2)),
        tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.MaxPool2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="sigmoid"),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(512, activation="sigmoid"),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(256, activation="sigmoid"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(10, activation="sigmoid")
    ])
    model.compile(
        optimizer="adam",
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model
model = create_model()
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                 factor=0.1,
                                                 patience=5,
                                                 min_lr=0.000001,
                                                 verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath='model.hdf5',
                                                monitor='val_loss',
                                                save_best_only=True,
                                                save_weights_only=True,
                                                verbose=1)
history = model.fit(train_generator,
                    epochs=10,
                    validation_data=validation_generator,
                    callbacks=[reduce_lr, checkpoint],
                    verbose=1)
model.summary()
# Step 5: evaluate the model
loss, acc = model.evaluate(x_test, y_test)
print("train model, accuracy:{:5.2f}%".format(100 * acc))
model.load_weights('model.hdf5')
final_loss, final_acc = model.evaluate(x_test, y_test, verbose=2)
print("Model accuracy: ", final_acc, ", model loss: ", final_loss)
4. Extending the dataset
Judging from the second and third rounds of optimization, MNIST itself is the limiting factor: the dataset has inherent shortcomings and some labels are flawed, so algorithms alone can hardly push accuracy higher.
Facebook provides the QMNIST dataset, which extends the MNIST dataset. By downloading QMNIST and merging it with MNIST, the combined dataset reaches about 190,000 samples, of which 180,000 are used for training and validation and 10,000 are held out for independent testing. The merging code is shown below. With the model from the previous section and this merged dataset, MNIST prediction accuracy rises to 99.96%; judging from the Kaggle leaderboard, anything above three nines is a very strong result.
import codecs
import gzip
import lzma
import warnings

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist

warnings.filterwarnings("ignore")
plt.rcParams['figure.figsize'] = [20, 20]


def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict


def plot_images(images, labels, shape=(3, 3)):
    fig, p = plt.subplots(shape[0], shape[1])
    i = 0
    for x in p:
        for ax in x:
            ax.imshow(images[i])
            ax.set_title(labels[i])
            i += 1


def plot_images_no_title(images, shape=(3, 3)):
    fig, p = plt.subplots(shape[0], shape[1])
    i = 0
    for x in p:
        for ax in x:
            ax.imshow(images[i])
            i += 1


def get_int(b):
    # Interpret raw bytes as a big-endian integer
    return int(codecs.encode(b, 'hex'), 16)


def open_maybe_compressed_file(path):
    if path.endswith('.gz'):
        return gzip.open(path, 'rb')
    elif path.endswith('.xz'):
        return lzma.open(path, 'rb')
    else:
        return open(path, 'rb')


def read_idx3_ubyte(path):
    """Read an IDX3 image file; magic number 0x0803 marks uint8 data with 3 dimensions."""
    with open_maybe_compressed_file(path) as f:
        data = f.read()
    assert get_int(data[:4]) == 8 * 256 + 3
    length = get_int(data[4:8])
    num_rows = get_int(data[8:12])
    num_cols = get_int(data[12:16])
    parsed = np.frombuffer(data, dtype=np.uint8, offset=16)
    return tf.convert_to_tensor(parsed)


def read_idx2_int(path):
    """Read an IDX2 integer-label file; magic number 0x0C02 marks int32 data with 2 dimensions."""
    with open_maybe_compressed_file(path) as f:
        data = f.read()
    assert get_int(data[:4]) == 12 * 256 + 2
    length = get_int(data[4:8])
    width = get_int(data[8:12])
    parsed = np.frombuffer(data, dtype=np.dtype('>i4'), offset=12)
    return tf.convert_to_tensor(parsed.astype('i4'))
# Load MNIST data
(X_train_mnist, y_train_mnist), (X_test_mnist, y_test_mnist) = mnist.load_data()
# Preprocess MNIST to match our preprocessing
X_mnist = X_train_mnist.reshape(-1, 28, 28, 1)
X_mnist = X_mnist.astype(np.float32) / 255
y_mnist = y_train_mnist
# Preprocess the test data the same way
X_test_mnist = X_test_mnist.reshape(-1, 28, 28, 1)
X_test_mnist = X_test_mnist.astype(np.float32) / 255
# Final dataset shape
print("MNIST image dataset shape:", X_mnist.shape)
# plot_images(X_mnist[:9], y_mnist[:9], shape=(3, 3))
# plt.show()
# Read QMNIST training data
qmnist_data = "d:/qmnist-main/qmnist-train-images-idx3-ubyte.gz"
qminst_label = "d:/qmnist-main/qmnist-train-labels-idx2-int.gz"
qmnist = read_idx3_ubyte(qmnist_data)
y_qmnist = read_idx2_int(qminst_label)
# Reshape and normalize the images
X_qmnist = np.array(qmnist, dtype="float32") / 255
X_qmnist = X_qmnist.reshape(-1, 28, 28, 1)
# Convert the EagerTensor to a NumPy array, reshape to 2-D
# (8 label fields per sample) and keep only the first column, the class label
y_qmnist = np.array(y_qmnist)
y_qmnist = y_qmnist.reshape(-1, 8)
y_qmnist = y_qmnist[:, 0]
print("QMNIST image dataset shape:", X_qmnist.shape)
# Read QMNIST test data
qmnist_test_data = "d:/qmnist-main/qmnist-test-images-idx3-ubyte.gz"
qminst_test_label = "d:/qmnist-main/qmnist-test-labels-idx2-int.gz"
qmnist_test = read_idx3_ubyte(qmnist_test_data)
y_qmnist_test = read_idx2_int(qminst_test_label)
# Reshape and normalize the images
X_qmnist_test = np.array(qmnist_test, dtype="float32") / 255
X_qmnist_test = X_qmnist_test.reshape(-1, 28, 28, 1)
# Same label handling: keep only the first of the 8 label fields
y_qmnist_test = np.array(y_qmnist_test)
y_qmnist_test = y_qmnist_test.reshape(-1, 8)
y_qmnist_test = y_qmnist_test[:, 0]
print("QMNIST test image dataset shape:", X_qmnist_test.shape)
# Combine MNIST and QMNIST
x_train = np.concatenate((X_mnist, X_qmnist, X_qmnist_test))
y_train = np.concatenate((y_mnist, y_qmnist, y_qmnist_test))
print("Train image dataset shape:", x_train.shape)
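The merge above concatenates everything into one pool, so the 180,000/10,000 train/test division still has to be carved out before training. A minimal sketch of one way to do that (a hypothetical `split_merged` helper of my own, demonstrated on dummy arrays; the shuffle matters so MNIST and QMNIST samples mix across both splits):

```python
import numpy as np

def split_merged(x_all, y_all, test_size, seed=42):
    """Shuffle the merged pool and hold out `test_size` samples for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x_all))               # mix MNIST and QMNIST samples
    test_idx, train_idx = idx[:test_size], idx[test_size:]
    return (x_all[train_idx], y_all[train_idx]), (x_all[test_idx], y_all[test_idx])

# Toy demonstration with dummy data in the same shape convention;
# on the real merged pool, test_size would be 10000.
x_demo = np.zeros((100, 28, 28, 1), dtype=np.float32)
y_demo = np.arange(100) % 10
(x_tr, y_tr), (x_te, y_te) = split_merged(x_demo, y_demo, test_size=10)
print(x_tr.shape, x_te.shape)  # (90, 28, 28, 1) (10, 28, 28, 1)
```

Because QMNIST's test set is folded into the pool, holding out an independent 10,000 samples this way is what keeps the 99.96% figure an honest test-set number rather than a training-set one.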
This article walked through four approaches to raising MNIST handwritten-digit accuracy, from the basic LeNet-5 to a top Kaggle model, describing each model's improvements and results, and finally reached 99.96% accuracy by extending the training data.