The Most Complete and Practical Guide to Improving MNIST Handwritten Digit Recognition Accuracy

This article shares four approaches to improving MNIST handwritten digit recognition accuracy, from the basic LeNet-5 model up to a top Kaggle model, describing the improvement strategy and results of each, and finally pushing recognition accuracy to 99.96% through dataset expansion.

MNIST handwritten digit recognition is a rite of passage for everyone learning AI, and pushing its accuracy upward is good training for mastering model work. Most people reach about 90% at first; going higher takes real effort in modifying models and tuning parameters, and beginners rarely know how high the accuracy can actually go or how to get there. On my first pass, by adjusting parameters and training epochs across many models, I reached 99.3%. My computer at the time lacked the compute to go further and I had not found a better model, so I set the problem aside and moved on to other topics.

Last month my old computer failed and would not boot; the disk was encrypted with BitLocker, so the data could not be recovered and my study code was lost. After replacing it with a much more powerful machine, I reworked the LeNet-5 approach to MNIST from scratch, combined it with solutions from Kaggle, and built a model and pipeline that finally pushed MNIST recognition accuracy to 99.96%. As I understand it, because the MNIST data itself has defects, 100% accuracy should be unreachable unless the evaluation set is leaked into training. Three nines of accuracy seems good enough, so I am writing up the full approach here in the hope that it helps others who are just starting out.

1. The Standard LeNet-5 Model

The following is the most common LeNet-5 scheme. It starts from 3×3 convolution kernels, builds the model with a Sequential container, and trains it step by step with a hand-written loop, which makes it especially suitable for beginners to adapt. The example also includes code for log saving and model saving.

"""
LeNet-5 in practice (1): the most widely used model online; test accuracy reaches ~99%
"""
import datetime
import tensorflow as tf
from tensorflow.keras import Sequential, layers, losses, datasets


def preprocess(x, y):
    """
    预处理函数
    """
    # [b, 28, 28], [b]
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)
    return x, y


# Logging and model-saving setup
current_time = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
log_dir = 'logs/' + current_time
summary_writer = tf.summary.create_file_writer(log_dir)
(x, y), (x_test, y_test) = datasets.mnist.load_data()  # load the MNIST dataset
batchsz = 128   # a batch size of 128 works well for this model
train_db = tf.data.Dataset.from_tensor_slices((x, y))  # convert to a Dataset object
train_db = train_db.shuffle(100000)  # shuffle the samples
train_db = train_db.batch(batchsz)  # batch for training
train_db = train_db.map(preprocess)  # apply preprocessing

test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.shuffle(1000).batch(batchsz).map(preprocess)
# Build LeNet-5 with a Sequential container
network = Sequential([
    layers.Conv2D(6, kernel_size=3, strides=1),  # first conv layer: 6 kernels of size 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),  # pooling layer, halves height and width
    layers.ReLU(),  # activation
    layers.Conv2D(16, kernel_size=3, strides=1),  # second conv layer: 16 kernels of size 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),  # pooling layer, halves height and width
    layers.ReLU(),  # activation
    layers.Flatten(),  # flatten for the fully connected layers
    layers.Dense(120, activation='relu'),  # fully connected layer, 120 units
    layers.Dense(84, activation='relu'),  # fully connected layer, 84 units
    layers.Dense(10)  # output layer: 10 logits, no activation since the loss uses from_logits=True
])
# build the network once, giving the input shape
network.build(input_shape=(batchsz, 28, 28, 1))
# print the network summary
network.summary()

# Create the loss object; call the instance directly when computing the loss
criteon = losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.95)  # batch 128, lr=0.01, acc: 0.9914
# optimizer = tf.keras.optimizers.Nadam(learning_rate=0.002)  # batch 128, acc ~0.89
# optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)

# train for 30 epochs
epoch = 30
steps = 0
for n in range(epoch):
    for step, (x, y) in enumerate(train_db):
        with tf.GradientTape() as tape:
            # insert the channel dimension => [b, 28, 28, 1]
            x = tf.expand_dims(x, axis=3)
            # forward pass: produce the 10-class outputs, [b, 28, 28, 1] => [b, 10]
            out = network(x)
            # compute the cross-entropy loss (a scalar)
            loss = criteon(y, out)

        # compute gradients automatically
        grads = tape.gradient(loss, network.trainable_variables)
        # apply the parameter update
        optimizer.apply_gradients(zip(grads, network.trainable_variables))
        if step % 100 == 0:
            steps += 100
            print('epoch:', n, 'step:', step, 'loss:', float(loss))
            with summary_writer.as_default():
                tf.summary.scalar('loss', float(loss), step=steps)

    correct, total = 0, 0
    for x, y in test_db:
        # insert the channel dimension => [b, 28, 28, 1]
        x = tf.expand_dims(x, axis=3)
        # forward pass: predicted distribution over 10 classes, [b, 28, 28, 1] => [b, 10]
        out = network(x)
        # strictly, softmax should be applied before argmax,
        # but softmax is monotonic and preserves ordering, so it can be skipped
        pred = tf.argmax(out, axis=-1)
        y = tf.cast(y, tf.int64)
        y = tf.argmax(y, axis=-1)
        # count correct predictions
        correct += float(tf.reduce_sum(tf.cast(tf.equal(pred, y), tf.float32)))
        # count total samples
        total += x.shape[0]
    with summary_writer.as_default():
        tf.summary.scalar('acc', float(correct / total), step=n)
    print("epoch:", n, "acc:", float(correct / total))

tf.saved_model.save(network, 'model-lenet')

This is the canonical LeNet-5 setup. Tuning parameters and batch sizes, my best run reached 99.14% accuracy, with most runs landing around 99%. The two optimizers commented out in the code performed noticeably worse than SGD in my tests. The loss and accuracy curves written via tf.summary can be inspected by running tensorboard --logdir logs.
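The script ends by exporting the trained network in SavedModel format. As a quick check, the saved model can be reloaded for inference; a minimal sketch follows (the exact reload behavior of models exported with tf.saved_model.save can vary across TF 2.x versions):

import tensorflow as tf

# Reload the 'model-lenet' directory exported by tf.saved_model.save above
loaded = tf.saved_model.load('model-lenet')

# Classify one test image with the restored model
(_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x = tf.cast(x_test[:1], tf.float32) / 255.
x = tf.reshape(x, [1, 28, 28, 1])
logits = loaded(x)
print('predicted:', int(tf.argmax(logits, axis=-1)[0]), 'label:', int(y_test[0]))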

2. An Improved LeNet-5 Model

The typical model above is hard to improve further. Searching online articles, I found an improved LeNet-5 proposed by other learners: it starts with two 5×5 convolution layers, inserts Dropout between blocks to reduce overfitting, uses a smaller batch size, and switches the optimizer to Adam.

batchsz = 32
...
# Build the improved LeNet-5 with a Sequential container
network = Sequential([
    layers.Conv2D(32, kernel_size=5, padding='Same', activation='relu', strides=1),  # first conv block: 32 kernels of size 5x5
    layers.Conv2D(32, kernel_size=5, padding='Same', activation='relu', strides=1),  # first conv block: 32 kernels of size 5x5
    layers.MaxPooling2D(pool_size=2),  # pooling layer, halves height and width
    layers.Dropout(0.25),  # dropout to reduce overfitting
    layers.Conv2D(64, kernel_size=3, padding='Same', activation='relu', strides=1),  # second conv block: 64 kernels of size 3x3
    layers.Conv2D(64, kernel_size=3, padding='Same', activation='relu', strides=1),  # second conv block: 64 kernels of size 3x3
    layers.MaxPooling2D(pool_size=2, strides=2),  # pooling layer, halves height and width
    layers.Dropout(0.25),  # dropout to reduce overfitting
    layers.Flatten(),  # flatten for the fully connected layers
    layers.Dense(256, activation='relu'),  # fully connected layer, 256 units
    layers.Dense(10, activation='softmax')  # output layer, 10 classes
])
...
# Create the loss object; the model now ends in softmax, so from_logits is not needed
criteon = losses.CategoricalCrossentropy()
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.95)  # best setting for the part-1 model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)  # Adam works better for this model

The improved LeNet-5 works well: with the model and parameter changes, it reached 99.38% accuracy after 30 epochs. From what I have seen, that is about the best result most people get from a conventional model.

3. A Top Kaggle MNIST Model

Improving accuracy beyond this point is very difficult. I tried deeper and wider models without much visible gain. Searching further, I found a top-ranked MNIST model on Kaggle; according to its write-up, it can exceed 99.9% once the dataset is expanded. The model extends LeNet-5 substantially, growing to 10 layers with dense layers up to 512 units, and its training adds image augmentation, dynamic learning-rate adjustment, and checkpoint callbacks.

Trained first on the unexpanded MNIST data, it reaches 99.35% accuracy, not much better than the improved model of part 2.

import numpy as np
import tensorflow as tf

from tensorflow.keras import Sequential, layers, datasets

(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()  # load the MNIST dataset
x_train, x_test = x_train / 255.0, x_test / 255.0
# add a channel dimension
x_train = np.expand_dims(x_train, axis=3)
x_test = np.expand_dims(x_test, axis=3)

print("train shape:", x_train.shape)
print("test shape:", x_test.shape)

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.20,
    shear_range=15,
    zoom_range=0.10,
    validation_split=0.15,
    horizontal_flip=False
)

train_generator = datagen.flow(
    x_train,
    y_train,
    batch_size=256,
    subset='training',
)

validation_generator = datagen.flow(
    x_train,
    y_train,
    batch_size=64,
    subset='validation',
)


def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Reshape((28, 28, 1)),
        tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5), activation="relu", padding="same",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPool2D((2, 2)),

        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.MaxPool2D((2, 2)),

        tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation="relu", padding="same"),
        tf.keras.layers.MaxPool2D((2, 2)),

        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="sigmoid"),
        tf.keras.layers.Dropout(0.25),

        tf.keras.layers.Dense(512, activation="sigmoid"),
        tf.keras.layers.Dropout(0.25),

        tf.keras.layers.Dense(256, activation="sigmoid"),
        tf.keras.layers.Dropout(0.1),

        tf.keras.layers.Dense(10, activation="sigmoid")
    ])

    model.compile(
        optimizer="adam",
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    return model


model = create_model()

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                 factor=0.1,
                                                 patience=5,
                                                 min_lr=0.000001,
                                                 verbose=1)

checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath='model.hdf5',
                                                monitor='val_loss',
                                                save_best_only=True,
                                                save_weights_only=True,
                                                verbose=1)

history = model.fit(train_generator,
                    epochs=10,
                    validation_data=validation_generator,
                    callbacks=[reduce_lr, checkpoint],
                    verbose=1)
model.summary()

# evaluate the model with the last-epoch weights
loss, acc = model.evaluate(x_test, y_test)
print("Last-epoch test accuracy: {:5.2f}%".format(100 * acc))


# reload the best checkpoint saved by ModelCheckpoint and evaluate again
model.load_weights('model.hdf5')
final_loss, final_acc = model.evaluate(x_test, y_test, verbose=2)
print("Model accuracy: ", final_acc, ", model loss: ", final_loss)

4. Expanding the Dataset with QMNIST

Judging from the results of parts 2 and 3, the MNIST dataset itself has inherent shortcomings, including some flawed labels, so it is hard to raise accuracy further through algorithms alone.

Facebook has released the QMNIST dataset, an extension of MNIST. Downloading QMNIST and merging it with MNIST brings the total to about 190,000 samples: 180,000 for training and validation, with the 10,000 original MNIST test images kept for independent testing. The merge is shown below. Using the model from part 3 on this merged dataset raises MNIST prediction accuracy to 99.96%; judging from the Kaggle leaderboard, anything beyond three nines is an excellent result.

import matplotlib.pyplot as plt
import warnings

import gzip
import lzma
import codecs

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist

warnings.filterwarnings("ignore")
plt.rcParams['figure.figsize'] = [20, 20]


def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict


def plot_images(images, labels, shape=(3, 3)):
    fig, p = plt.subplots(shape[0], shape[1])
    i = 0
    for x in p:
        for ax in x:
            ax.imshow(images[i])
            ax.set_title(labels[i])
            i += 1


def plot_images_no_title(images, shape=(3, 3)):
    fig, p = plt.subplots(shape[0], shape[1])
    i = 0
    for x in p:
        for ax in x:
            ax.imshow(images[i])
            i += 1


def get_int(b):
    return int(codecs.encode(b, 'hex'), 16)


def open_maybe_compressed_file(path):
    if path.endswith('.gz'):
        return gzip.open(path, 'rb')
    elif path.endswith('.xz'):
        return lzma.open(path, 'rb')
    else:
        return open(path, 'rb')


def read_idx3_ubyte(path):
    """Read an idx3-ubyte image file (the QMNIST image format); returns a flat uint8 tensor."""
    with open_maybe_compressed_file(path) as f:
        data = f.read()
        assert get_int(data[:4]) == 8 * 256 + 3  # idx3-ubyte magic number 0x00000803
        length = get_int(data[4:8])
        num_rows = get_int(data[8:12])
        num_cols = get_int(data[12:16])
        parsed = np.frombuffer(data, dtype=np.uint8, offset=16)
        return tf.convert_to_tensor(parsed)  # flat; reshaped to (N, 28, 28, 1) by the caller


def read_idx2_int(path):
    """Read an idx2-int label file (the QMNIST label format); returns a flat int32 tensor."""
    with open_maybe_compressed_file(path) as f:
        data = f.read()
        assert get_int(data[:4]) == 12 * 256 + 2  # idx2-int magic number 0x00000C02
        length = get_int(data[4:8])
        width = get_int(data[8:12])
        parsed = np.frombuffer(data, dtype=np.dtype('>i4'), offset=12)
        return tf.convert_to_tensor(parsed.astype('i4'))


# Load MNIST data
(X_train_mnist, y_train_mnist), (X_test_mnist, y_test_mnist) = mnist.load_data()

# Preprocess MNIST to match our preprocessing
X_mnist = X_train_mnist.reshape(-1, 28, 28, 1)
X_mnist = X_mnist.astype(np.float32) / 255
y_mnist = y_train_mnist

# preprocess the test data the same way
X_test_mnist = X_test_mnist.reshape(-1, 28, 28, 1)
X_test_mnist = X_test_mnist.astype(np.float32) / 255
# final dataset shape
print("MNIST image dataset shape:", X_mnist.shape)

# plot_images(X_mnist[:9], y_mnist[:9], shape=(3, 3))
# plt.show()


# Read the QMNIST training data
qmnist_data = "d:/qmnist-main/qmnist-train-images-idx3-ubyte.gz"
qmnist_label = "d:/qmnist-main/qmnist-train-labels-idx2-int.gz"

qmnist = read_idx3_ubyte(qmnist_data)
y_qmnist = read_idx2_int(qmnist_label)

# we reshape and normalize the data
X_qmnist = np.array(qmnist, dtype="float32") / 255
X_qmnist = X_qmnist.reshape(-1, 28, 28, 1)

# convert the EagerTensor to a NumPy array and reshape to (N, 8);
# QMNIST labels carry 8 integer columns, and column 0 is the digit class
y_qmnist = np.array(y_qmnist)
y_qmnist = y_qmnist.reshape(-1, 8)
y_qmnist = y_qmnist[:, 0]

print("QMNIST image dataset shape:", X_qmnist.shape)

# Read the QMNIST test data
qmnist_test_data = "d:/qmnist-main/qmnist-test-images-idx3-ubyte.gz"
qmnist_test_label = "d:/qmnist-main/qmnist-test-labels-idx2-int.gz"

qmnist_test = read_idx3_ubyte(qmnist_test_data)
y_qmnist_test = read_idx2_int(qmnist_test_label)

# we reshape and normalize the data
X_qmnist_test = np.array(qmnist_test, dtype="float32") / 255
X_qmnist_test = X_qmnist_test.reshape(-1, 28, 28, 1)

# same for the test labels: reshape to (N, 8) and keep column 0 (the digit class)
y_qmnist_test = np.array(y_qmnist_test)
y_qmnist_test = y_qmnist_test.reshape(-1, 8)
y_qmnist_test = y_qmnist_test[:, 0]

print("QMNIST test image dataset shape:", X_qmnist_test.shape)

# Combine MNIST and QMNIST
x_train = np.concatenate((X_mnist, X_qmnist, X_qmnist_test))
y_train = np.concatenate((y_mnist, y_qmnist, y_qmnist_test))

print("Train image dataset shape:", x_train.shape)