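43. Image Classification with Deep Convolutional Neural Networks - 优快云 Blog

Article link: https://blog.youkuaiyun.com/vim8coder/article/details/154891168

Image Classification with Deep Convolutional Neural Networks

Deep convolutional neural networks (CNNs) are highly effective at image classification. This article applies CNNs to two tasks, handwritten digit recognition and gender classification of face images, and walks through the relevant techniques and implementation steps.

Handwritten Digit Recognition

For handwritten digit recognition, a CNN classifies each input image as one of the ten digits. The following simple example shows how: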

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocessing: scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model (integer labels, so sparse categorical cross-entropy)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model (Conv2D expects a channel axis: 28 x 28 -> 28 x 28 x 1)
model.fit(x_train.reshape(-1, 28, 28, 1), y_train, epochs=5)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test.reshape(-1, 28, 28, 1), y_test)
print(f"Test accuracy: {test_acc}")

# Predict and show the first 10 test images with their predicted digits
predictions = model.predict(x_test.reshape(-1, 28, 28, 1))
fig = plt.figure(figsize=(16, 8.5))
for i in range(10):
    ax = fig.add_subplot(2, 5, i + 1)
    ax.imshow(x_test[i], cmap='gray')
    # .numpy() converts the scalar tensor to a plain integer for the label text
    ax.text(0.9, 0.1, '{}'.format(tf.argmax(predictions[i]).numpy()),
            size=15, color='blue',
            horizontalalignment='center',
            verticalalignment='center',
            transform=ax.transAxes)
plt.show()
```

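The code above first loads MNIST and scales the pixel values to [0, 1]. It then builds a simple CNN with one convolutional layer, one max-pooling layer, and two fully connected layers, compiles and trains the model, and finally evaluates it on the test set and visualizes the predictions.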
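To make the layer sizes concrete, the following pure-Python sketch traces the output shape and parameter count of each layer in the MNIST model above. It assumes the Keras defaults (`padding='valid'` and stride 1 for `Conv2D`, pooling stride equal to the window size); the helper names `conv2d_valid`, `maxpool` and `dense` are illustrative, not Keras APIs.

```python
def conv2d_valid(h, w, c_in, filters, k):
    """Output shape and parameter count of a stride-1, 'valid' Conv2D layer."""
    out = (h - k + 1, w - k + 1, filters)
    params = k * k * c_in * filters + filters  # kernel weights + one bias per filter
    return out, params


def maxpool(h, w, c, p):
    """Output shape of non-overlapping p x p max pooling (stride = p)."""
    return (h // p, w // p, c)


def dense(n_in, units):
    """Parameter count of a fully connected layer (weights + biases)."""
    return n_in * units + units


(h, w, c), p_conv = conv2d_valid(28, 28, 1, filters=32, k=3)  # 26 x 26 x 32
h, w, c = maxpool(h, w, c, 2)                                  # 13 x 13 x 32
flat = h * w * c                                               # Flatten output size
p_d1 = dense(flat, 128)
p_d2 = dense(128, 10)
total = p_conv + p_d1 + p_d2

print(f"conv params: {p_conv}, flatten size: {flat}, total params: {total}")
```

Under these assumptions the first `Dense(128)` layer holds 692,352 of the 693,962 parameters, because it connects to every one of the 5,408 flattened features.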

Gender Classification of Face Images

Next, we use the CelebA dataset to classify face images by gender.

1. Load the CelebA dataset

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Download and prepare CelebA, then load its three splits
celeba_bldr = tfds.builder('celeb_a')
celeba_bldr.download_and_prepare()
celeba = celeba_bldr.as_dataset(shuffle_files=False)
celeba_train = celeba['train']
celeba_valid = celeba['validation']
celeba_test = celeba['test']

def count_items(ds):
    # Count the elements by iterating over the whole dataset
    n = 0
    for _ in ds:
        n += 1
    return n

print('Train set:  {}'.format(count_items(celeba_train)))
print('Validation: {}'.format(count_items(celeba_valid)))
print('Test set:   {}'.format(count_items(celeba_test)))

# Keep only the first 16,000 training and 1,000 validation examples
celeba_train = celeba_train.take(16000)
celeba_valid = celeba_valid.take(1000)
print('Train set:  {}'.format(count_items(celeba_train)))
print('Validation: {}'.format(count_items(celeba_valid)))
```

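In this step, we load CelebA with tensorflow_datasets and count the samples in each split. To speed up training, we keep only the first 16,000 training samples and the first 1,000 validation samples.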

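2. Image transformations and data augmentation

Data augmentation is an effective technique when training data is limited. The following code demonstrates some common image transformations: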

```python
import matplotlib.pyplot as plt

# Take 5 examples
examples = []
for example in celeba_train.take(5):
    examples.append(example['image'])

fig = plt.figure(figsize=(16, 8.5))

# Crop to a bounding box (offset 50 rows and 20 columns, 128 x 128 window)
ax = fig.add_subplot(2, 5, 1)
ax.set_title('Crop to a \nbounding-box', size=15)
ax.imshow(examples[0])
ax = fig.add_subplot(2, 5, 6)
img_cropped = tf.image.crop_to_bounding_box(examples[0], 50, 20, 128, 128)
ax.imshow(img_cropped)

# Horizontal flip
ax = fig.add_subplot(2, 5, 2)
ax.set_title('Flip (horizontal)', size=15)
ax.imshow(examples[1])
ax = fig.add_subplot(2, 5, 7)
img_flipped = tf.image.flip_left_right(examples[1])
ax.imshow(img_flipped)

# Adjust contrast
ax = fig.add_subplot(2, 5, 3)
ax.set_title('Adjust contrast', size=15)
ax.imshow(examples[2])
ax = fig.add_subplot(2, 5, 8)
img_adj_contrast = tf.image.adjust_contrast(examples[2], contrast_factor=2)
ax.imshow(img_adj_contrast)

# Adjust brightness
ax = fig.add_subplot(2, 5, 4)
ax.set_title('Adjust brightness', size=15)
ax.imshow(examples[3])
ax = fig.add_subplot(2, 5, 9)
img_adj_brightness = tf.image.adjust_brightness(examples[3], delta=0.3)
ax.imshow(img_adj_brightness)

# Central crop (keep the central 70%) and resize back to 218 x 178
ax = fig.add_subplot(2, 5, 5)
ax.set_title('Central crop\nand resize', size=15)
ax.imshow(examples[4])
ax = fig.add_subplot(2, 5, 10)
img_center_crop = tf.image.central_crop(examples[4], 0.7)
img_resized = tf.image.resize(img_center_crop, size=(218, 178))
# tf.image.resize returns float32; cast back to uint8 for display
ax.imshow(img_resized.numpy().astype('uint8'))

plt.show()
```

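The code above shows how to crop, flip, and adjust the contrast and brightness of an image. When applied randomly during training, such transformations increase the diversity of the data and reduce overfitting.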
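The contrast and brightness operations are simple per-pixel formulas. `tf.image.adjust_contrast` is documented as computing, for each channel, `(x - mean) * contrast_factor + mean`, and `tf.image.adjust_brightness` adds `delta` to every pixel of a float image. For uint8 input, TensorFlow first converts the image to floats in [0, 1], so `delta=0.3` shifts brightness by 30% of the full range. The pure-Python sketch below applies both formulas to one channel stored as a flat list of floats. It illustrates the arithmetic only and is not TensorFlow's implementation. Clipping to [0, 1] stands in for the saturation applied when converting back to uint8.

```python
def adjust_contrast(channel, factor):
    """(x - mean) * factor + mean for every pixel of one channel, clipped to [0, 1]."""
    mean = sum(channel) / len(channel)
    return [min(max((x - mean) * factor + mean, 0.0), 1.0) for x in channel]


def adjust_brightness(channel, delta):
    """x + delta for every pixel, clipped to [0, 1]."""
    return [min(max(x + delta, 0.0), 1.0) for x in channel]


pixels = [0.2, 0.4, 0.6, 0.8]        # one channel, already scaled to [0, 1]
print(adjust_contrast(pixels, 2))    # values move away from the channel mean (0.5)
print(adjust_brightness(pixels, 0.3))
```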

We can also apply random transformations and chain them into a data augmentation pipeline:

```python
tf.random.set_seed(1)
fig = plt.figure(figsize=(14, 12))
for i, example in enumerate(celeba_train.take(3)):
    image = example['image']

    ax = fig.add_subplot(3, 4, i * 4 + 1)
    ax.imshow(image)
    if i == 0:
        ax.set_title('Orig', size=15)

    ax = fig.add_subplot(3, 4, i * 4 + 2)
    img_crop = tf.image.random_crop(image, size=(178, 178, 3))
    ax.imshow(img_crop)
    if i == 0:
        ax.set_title('Step 1: Random crop', size=15)

    ax = fig.add_subplot(3, 4, i * 4 + 3)
    img_flip = tf.image.random_flip_left_right(img_crop)
    ax.imshow(tf.cast(img_flip, tf.uint8))
    if i == 0:
        ax.set_title('Step 2: Random flip', size=15)

    ax = fig.add_subplot(3, 4, i * 4 + 4)
    img_resize = tf.image.resize(img_flip, size=(128, 128))
    ax.imshow(tf.cast(img_resize, tf.uint8))
    if i == 0:
        ax.set_title('Step 3: Resize', size=15)

plt.show()
```
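Conceptually, `tf.image.random_crop` picks a random offset at which the requested window fits inside the image, and `tf.image.random_flip_left_right` mirrors the image with probability 0.5. The sketch below reproduces that logic in pure Python on a 2-D grayscale image stored as a list of rows. The function names mirror TensorFlow's for readability, but they are hypothetical stand-ins. A seeded `random.Random` plays the role of `tf.random.set_seed`.

```python
import random


def random_crop(image, size, rng):
    """Crop a size[0] x size[1] window at a random offset where it fits."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size[0])     # randint bounds are inclusive
    left = rng.randint(0, w - size[1])
    return [row[left:left + size[1]] for row in image[top:top + size[0]]]


def random_flip_left_right(image, rng):
    """Mirror every row with probability 0.5, otherwise return the image unchanged."""
    return [row[::-1] for row in image] if rng.random() < 0.5 else image


rng = random.Random(1)
# Toy 6 x 5 image whose pixel values encode their (row, column) position
image = [[10 * r + c for c in range(5)] for r in range(6)]
crop = random_crop(image, (4, 4), rng)
out = random_flip_left_right(crop, rng)
for row in out:
    print(row)
```

Because each call draws fresh random numbers, the same training image yields a different crop and orientation in every epoch, which is what makes the pipeline an augmentation rather than a fixed transformation.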

For convenience, we wrap these steps in a preprocessing function:

```python
def preprocess(example, size=(64, 64), mode='train'):
    image = example['image']
    label = example['attributes']['Male']  # boolean attribute
    if mode == 'train':
        # Training: random 178 x 178 crop, resize, random horizontal flip
        image_cropped = tf.image.random_crop(image, size=(178, 178, 3))
        image_resized = tf.image.resize(image_cropped, size=size)
        image_flip = tf.image.random_flip_left_right(image_resized)
        return image_flip / 255.0, tf.cast(label, tf.int32)
    else:
        # Validation/test: deterministic crop of the central 178 x 178 region
        image_cropped = tf.image.crop_to_bounding_box(image, offset_height=20, offset_width=0,
                                                      target_height=178, target_width=178)
        image_resized = tf.image.resize(image_cropped, size=size)
        return image_resized / 255.0, tf.cast(label, tf.int32)
```
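The evaluation branch uses a fixed crop instead of a random one, so validation and test scores are reproducible. CelebA's aligned images are 218 x 178 (height x width), so a 178 x 178 window with `offset_height=20` and `offset_width=0` sits exactly in the center. The arithmetic is sketched below. The helper name `center_crop_offsets` is illustrative.

```python
def center_crop_offsets(height, width, target_h, target_w):
    """Top/left offsets that center a target_h x target_w window in height x width."""
    return (height - target_h) // 2, (width - target_w) // 2


# CelebA aligned images: 218 rows x 178 columns
offsets = center_crop_offsets(218, 178, 178, 178)
print(offsets)  # matches offset_height=20, offset_width=0 in preprocess
```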

We then apply the preprocessing function to the training and validation sets:

```python
import numpy as np

BATCH_SIZE = 32
BUFFER_SIZE = 1000
IMAGE_SIZE = (64, 64)
# Integer number of batches per epoch; needed because ds_train repeats indefinitely
steps_per_epoch = int(np.ceil(16000 / BATCH_SIZE))

ds_train = celeba_train.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='train'))
ds_train = ds_train.shuffle(buffer_size=BUFFER_SIZE).repeat()
ds_train = ds_train.batch(BATCH_SIZE)

ds_valid = celeba_valid.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='eval'))
ds_valid = ds_valid.batch(BATCH_SIZE)
```
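`shuffle(buffer_size=BUFFER_SIZE)` does not shuffle the whole dataset at once. According to the tf.data documentation, it fills a buffer with `buffer_size` elements, randomly samples one, and replaces it with the next incoming element. With a buffer of 1,000 and a 16,000-sample subset, an example can move at most 999 positions earlier than its original position. The pure-Python generator below is a simplified sketch of that behavior, not tf.data's implementation. The name `buffered_shuffle` is hypothetical.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Yield items in a partially shuffled order using a fixed-size buffer."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Emit a randomly chosen element; the next input refills its slot
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: emit the remaining elements in random order
    rng.shuffle(buffer)
    yield from buffer


rng = random.Random(1)
order = list(buffered_shuffle(range(20), buffer_size=5, rng=rng))
print(order)
```

Since `ds_train` is repeated indefinitely, Keras cannot infer where an epoch ends, so `steps_per_epoch` (16,000 / 32 = 500 batches) defines the epoch length.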

3. Train the CNN gender classifier

```python
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(rate=0.5),

    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(rate=0.5),

    tf.keras.layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),

    tf.keras.layers.Conv2D(256, (3, 3), padding='same', activation='relu')
])

# Three 2x2 poolings take 64 x 64 down to 8 x 8: output shape (None, 8, 8, 256)
print(model.compute_output_shape(input_shape=(None, 64, 64, 3)))

# Global average pooling: (None, 8, 8, 256) -> (None, 256)
model.add(tf.keras.layers.GlobalAveragePooling2D())
print(model.compute_output_shape(input_shape=(None, 64, 64, 3)))

# One output unit without activation: the model outputs a logit
model.add(tf.keras.layers.Dense(1, activation=None))
tf.random.set_seed(1)
model.build(input_shape=(None, 64, 64, 3))
model.summary()

# from_logits=True applies the sigmoid inside the loss. Accuracy is thresholded
# at logit 0 (probability 0.5); the default threshold of 0.5 assumes probabilities.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy', threshold=0.0)])

history = model.fit(ds_train, validation_data=ds_valid,
                    epochs=20,
                    steps_per_epoch=steps_per_epoch)
```

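In this step, we build a CNN with four convolutional layers, max-pooling and dropout layers, a global average pooling layer, and a fully connected output layer that produces a single logit. We then compile and train the model.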
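Because the last layer is `Dense(1, activation=None)`, the model outputs logits, and `BinaryCrossentropy(from_logits=True)` applies the sigmoid inside the loss. For a logit x and label z, a numerically stable form of this loss is `max(x, 0) - x * z + log(1 + exp(-|x|))`, the rearrangement documented for `tf.nn.sigmoid_cross_entropy_with_logits`. The pure-Python sketch below compares it with the naive formula. It is an illustration of the math, not Keras's code.

```python
import math


def bce_naive(logit, label):
    """-[z*log(p) + (1-z)*log(1-p)] with p = sigmoid(logit); breaks for large |logit|."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def bce_from_logits(logit, label):
    """Stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


for x, z in [(2.0, 1), (-1.5, 0), (0.3, 0)]:
    print(x, z, bce_naive(x, z), bce_from_logits(x, z))

# The naive version raises a math domain error here because 1 - p rounds to 0
print(bce_from_logits(1000.0, 0))
```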

4. Visualize the learning curves

```python
hist = history.history
x_arr = np.arange(len(hist['loss'])) + 1

fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(1, 2, 1)
ax.plot(x_arr, hist['loss'], '-o', label='Train loss')
ax.plot(x_arr, hist['val_loss'], '--<', label='Validation loss')
ax.legend(fontsize=15)
ax.set_xlabel('Epoch', size=15)
ax.set_ylabel('Loss', size=15)

ax = fig.add_subplot(1, 2, 2)
ax.plot(x_arr, hist['accuracy'], '-o', label='Train acc.')
ax.plot(x_arr, hist['val_accuracy'], '--<', label='Validation acc.')
ax.legend(fontsize=15)
ax.set_xlabel('Epoch', size=15)
ax.set_ylabel('Accuracy', size=15)

plt.show()
```

The learning curves show how the loss and accuracy on the training and validation sets change from epoch to epoch.
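When reading the curves, one useful check is the epoch with the lowest validation loss. If it falls well before the last epoch while the training loss keeps decreasing, the model is likely starting to overfit. The hypothetical helper below works on the `hist` dictionary from `history.history` and uses the same 1-based epoch numbering as `x_arr`. Keras can also automate this check with `tf.keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)`.

```python
def best_epoch(hist, key='val_loss', mode='min'):
    """Return (1-based epoch, value) of the best entry in hist[key]."""
    values = hist[key]
    pick = min if mode == 'min' else max
    idx = pick(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]


# Usage with the dictionary produced by model.fit:
# epoch, loss = best_epoch(hist)
# epoch, acc = best_epoch(hist, key='val_accuracy', mode='max')
```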

5. Continue training and evaluate the model

```python
history = model.fit(ds_train, validation_data=ds_valid,
                    epochs=30, initial_epoch=20,
                    steps_per_epoch=steps_per_epoch)

ds_test = celeba_test.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='eval')).batch(32)
test_results = model.evaluate(ds_test)
print('Test Acc: {:.2f}%'.format(test_results[1] * 100))
```

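If the learning curves have not converged yet, we can continue training the model; the code above trains for 10 more epochs. Finally, we evaluate the model's performance on the test set.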
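In `fit`, `epochs` is the index of the final epoch rather than a count, so `epochs=30, initial_epoch=20` runs 10 more epochs (21 to 30). The second call returns a new `History`, so assigning it to `history` drops the first 20 epochs. The earlier `hist` variable from step 4 still holds them. To plot all 30 epochs, the two dictionaries can be concatenated, as in this hypothetical helper.

```python
def merge_histories(*hists):
    """Concatenate the per-epoch metric lists of several History.history dicts."""
    merged = {}
    for hist in hists:
        for key, values in hist.items():
            merged.setdefault(key, []).extend(values)
    return merged


# Usage: hist holds epochs 1-20, history.history holds epochs 21-30
# hist_all = merge_histories(hist, history.history)
# x_arr = np.arange(len(hist_all['loss'])) + 1
```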

6. Predict and visualize the results

```python
ds = ds_test.unbatch().take(10)
pred_logits = model.predict(ds.batch(10))
probas = tf.sigmoid(pred_logits)
probas = probas.numpy().flatten() * 100

fig = plt.figure(figsize=(15, 7))
for j, example in enumerate(ds):
    ax = fig.add_subplot(2, 5, j + 1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(example[0])
    if example[1].numpy() == 1:
        label = 'M'
    else:
        label = 'F'
    ax.text(0.5, -0.15, 'GT: {:s}\nPr(Male)={:.0f}%'.format(label, probas[j]),
            size=16,
            horizontalalignment='center',
            verticalalignment='center',
            transform=ax.transAxes)
plt.tight_layout()
plt.show()
```

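We take 10 test samples, run the model on them, and display each image with its ground-truth label and the predicted probability that the face is male.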
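Because the model outputs logits, `tf.sigmoid` is needed to turn each one into Pr(Male). A logit of 0 maps to exactly 50%, so thresholding the logit at 0 and thresholding the probability at 0.5 give the same decision. The pure-Python sketch below mirrors the annotation in the plot and adds the resulting prediction. The helper name `describe` is hypothetical.

```python
import math


def describe(logit, label):
    """Format the ground truth and Pr(Male) like the plot annotation, plus the decision."""
    prob = 1.0 / (1.0 + math.exp(-logit)) * 100
    gt = 'M' if label == 1 else 'F'
    pred = 'M' if logit > 0 else 'F'  # equivalent to prob > 50%
    return 'GT: {:s}\nPr(Male)={:.0f}%  -> predicted {:s}'.format(gt, prob, pred)


print(describe(2.2, 1))
print(describe(-0.4, 0))
```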

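Summary

Through the steps above, we learned how to use CNNs for handwritten digit recognition and for gender classification of face images. In the gender classification task, we used data augmentation to improve the model's generalization. The following table summarizes the workflow:

| Step | Operation |
| ---- | ---- |
| 1 | Load the CelebA dataset and take subsets |
| 2 | Apply image transformations and data augmentation |
| 3 | Build and train the CNN gender classifier |
| 4 | Visualize the learning curves |
| 5 | Continue training and evaluate the model |
| 6 | Predict and visualize the results |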

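The whole workflow can also be represented by the following mermaid flowchart:

```mermaid
graph TD;
    A[Load CelebA dataset] --> B[Take subsets];
    B --> C[Image transformations and data augmentation];
    C --> D[Build and train CNN model];
    D --> E[Visualize learning curves];
    E --> F{Continue training?};
    F -- Yes --> D;
    F -- No --> G[Evaluate model];
    G --> H[Predict and visualize results];
```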

With these steps and techniques, CNNs can be applied effectively to image classification tasks, and their performance can be improved.

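Technical Details

Several key techniques used in CNN image classification deserve a closer look.

Convolutional layers

Convolutional layers are a core component of CNNs. A convolutional layer slides its kernels over the input and produces feature maps that capture local patterns. The handwritten digit model uses a single convolutional layer with 32 kernels, while the gender classifier stacks four convolutional layers with an increasing number of kernels (32, 64, 128, 256), so that deeper layers can capture higher-level features. Both models use 3x3 kernels, a common choice that tends to work well in practice.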

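```python
import tensorflow as tf

# Excerpt: only the convolutional layers of the gender classifier (pooling and
# dropout omitted). padding='same' keeps the 64 x 64 spatial size unchanged.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(256, (3, 3), padding='same', activation='relu')
])
print(model.compute_output_shape(input_shape=(None, 64, 64, 3)))
```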
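For a `Conv2D` layer with k x k kernels, C_in input channels and F filters, the parameter count is k·k·C_in·F + F (one bias per filter), independent of the image size. The pure-Python sketch below applies this formula to the four convolutional layers of the gender classifier, assuming 3-channel RGB input. The helper `conv_params` is illustrative.

```python
def conv_params(k, c_in, filters):
    """Kernel weights (k*k*c_in per filter) plus one bias per filter."""
    return k * k * c_in * filters + filters


channels = [3, 32, 64, 128, 256]  # RGB input, then the filter count of each layer
per_layer = [conv_params(3, c_in, c_out)
             for c_in, c_out in zip(channels[:-1], channels[1:])]
print(per_layer)
print(sum(per_layer))
```

Pooling, dropout and global average pooling add no parameters. The final `Dense(1)` on the 256 pooled features adds 256 + 1 = 257, so the full gender classifier has 388,673 trainable parameters under these assumptions, whatever the input resolution.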

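Pooling layers

Pooling layers shrink the feature maps while retaining the most important feature information. Common pooling operations are max pooling (MaxPooling) and average pooling (AveragePooling). Our models use max pooling with a 2x2 window, which halves the height and width of each feature map.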

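```python
import tensorflow as tf

# Excerpt: a convolution followed by 2x2 max pooling, which halves the
# height and width of the feature maps (64 x 64 -> 32 x 32 here)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
])
print(model.compute_output_shape(input_shape=(None, 64, 64, 3)))
```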
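A 2x2 max pooling with stride 2 keeps the largest value in each non-overlapping 2x2 window. The pure-Python sketch below implements this for a single-channel feature map stored as a list of rows. Keras applies the same operation to each channel independently. As with Keras's default `padding='valid'`, a trailing odd row or column is dropped. The function name is illustrative.

```python
def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling (stride 2) on a list-of-rows feature map."""
    h, w = len(fmap) // 2 * 2, len(fmap[0]) // 2 * 2  # drop an odd trailing row/column
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 1]]
print(max_pool_2x2(fmap))  # one maximum per 2x2 window
```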

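Global average pooling layer

Global average pooling is a special pooling layer that averages all elements of each feature map into a single scalar. It collapses the two spatial dimensions (height and width), so each feature map contributes one value. The layer itself has no trainable weights, and the Dense layer that follows needs far fewer parameters than it would after a Flatten layer. In the gender classifier, global average pooling compresses the 256 feature maps of size 8x8 into 256 scalars.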

```python
# Appended after the last Conv2D layer: (None, 8, 8, 256) -> (None, 256)
model.add(tf.keras.layers.GlobalAveragePooling2D())
```
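The sketch below shows global average pooling in pure Python for an H x W x C tensor stored as rows of pixels, where each pixel is a list of channel values. It also compares the weights a `Dense(1)` layer would need after `Flatten` versus after global average pooling on the 8 x 8 x 256 output of the last convolutional layer. The function name is illustrative.

```python
def global_average_pool(fmap):
    """Average an H x W x C feature map over H and W, giving one value per channel."""
    pixels = [px for row in fmap for px in row]
    n_channels = len(pixels[0])
    return [sum(px[ch] for px in pixels) / len(pixels) for ch in range(n_channels)]


# Tiny 2 x 2 feature map with 3 channels
fmap = [[[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
        [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]]]
print(global_average_pool(fmap))

# Weights + bias of a Dense(1) layer on an 8 x 8 x 256 tensor
h, w, c = 8, 8, 256
print('after Flatten:', h * w * c + 1)
print('after global average pooling:', c + 1)
```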

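Data augmentation

Data augmentation is an important technique for improving a model's generalization. Randomly transforming the training data, for example by cropping, flipping, or adjusting contrast and brightness, increases data diversity and reduces overfitting. In the gender classification task, the preprocess function defined earlier applies random cropping and flipping during training, and a fixed center crop during validation and testing.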

Optimization Suggestions

To further improve the model's performance, consider the following:

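1. Increase the training data: for gender classification we used only a small subset of the training data (16,000 samples). Training on more data is likely to improve the model's generalization.
2. Adjust the model architecture: try changing the number of convolutional layers or the number and size of the kernels, or add more fully connected layers. For example, a deeper convolutional stack can extract more complex features.
3. Tune hyperparameters: adjust the learning rate, batch size, number of epochs, and similar settings. For example, a learning-rate scheduler can gradually lower the learning rate during training, which often helps the model settle into a better solution once progress at the initial rate stalls.
4. Use a pretrained model: start from a model pretrained on a large-scale dataset, such as ResNet or VGG, and fine-tune it on your own data. Reusing the features the pretrained model has already learned speeds up training and helps when labeled data is limited.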
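For the learning-rate suggestion, Keras provides `tf.keras.optimizers.schedules.ExponentialDecay`, which computes `initial_learning_rate * decay_rate ** (step / decay_steps)` (with integer division of the exponent when `staircase=True`). The pure-Python function below reproduces that formula so the decay can be inspected without TensorFlow. The values used (1e-3, 500 steps per decay, rate 0.9) are illustrative assumptions, not tuned settings.

```python
def exponential_decay(step, initial_lr=1e-3, decay_steps=500, decay_rate=0.9,
                      staircase=False):
    """initial_lr * decay_rate ** (step / decay_steps), as in Keras ExponentialDecay."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


for epoch in (0, 10, 20):
    step = epoch * 500  # 500 steps per epoch: 16,000 samples / batch size 32
    print(epoch, exponential_decay(step))

# Assumed TensorFlow equivalent, passed to the optimizer as its learning rate:
# schedule = tf.keras.optimizers.schedules.ExponentialDecay(
#     initial_learning_rate=1e-3, decay_steps=500, decay_rate=0.9)
# optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```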

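Summary and Outlook

This article showed how to use CNNs for handwritten digit recognition and for gender classification of face images. In the gender classification task, data augmentation was used to improve the model's generalization. The full workflow consists of loading the dataset, applying data augmentation, building and training the model, visualizing the learning curves, evaluating the model, and predicting results.

CNNs still have considerable room to develop in image classification. As hardware continues to improve, models can grow in complexity and capability. Combining CNNs with other techniques, such as attention mechanisms or generative adversarial networks, may further improve classification accuracy and efficiency.

The optimization suggestions are summarized below:
1. Increase the training data
2. Adjust the model architecture
3. Tune hyperparameters
4. Use a pretrained model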

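The optimization process can be represented by the following mermaid flowchart:

```mermaid
graph TD;
    A[Initial model] --> B[Tune hyperparameters];
    B --> C{Performance improved?};
    C -- Yes --> D[Keep tuning hyperparameters];
    C -- No --> E[Adjust model architecture];
    E --> F{Performance improved?};
    F -- Yes --> D;
    F -- No --> G[Add more training data];
    G --> H{Performance improved?};
    H -- Yes --> D;
    H -- No --> I[Use a pretrained model];
    I --> J{Performance improved?};
    J -- Yes --> D;
    J -- No --> K[Stop optimizing];
```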

By optimizing and refining the model iteratively, we can get better CNN performance on image classification tasks.