41. Image Classification with Deep Convolutional Neural Networks

js777

Published 2025-11-05 12:27:06

License: CC 4.0 BY-SA

Column: Mastering Core Machine Learning Skills. Tags: deep convolutional neural networks, CNN, image classification

Article link: https://blog.youkuaiyun.com/js777/article/details/154923365

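Deep convolutional neural networks (CNNs) are highly effective for image classification. This article shows how to use CNNs for handwritten digit recognition and for gender classification of face images, and discusses how data augmentation helps when training on a small dataset.

Handwritten Digit Recognition

For handwritten digit recognition, the following code displays the handwritten inputs together with their predicted labels: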

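# Part of the code is omitted here. This assumes the predictions `preds` already
# exist, along with `axes`, a sequence of Matplotlib axes (one per displayed
# digit image). `axes` is an assumed name for the omitted plotting setup.
import matplotlib.pyplot as plt

for ax, pred in zip(axes, preds):
    # Write the predicted label in the lower-right corner of each image
    ax.text(0.9, 0.1, '{}'.format(pred),
            size=15, color='blue',
            horizontalalignment='center',
            verticalalignment='center',
            transform=ax.transAxes)
plt.show()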

In this set of plotted examples, all predicted labels are correct. As an exercise, try displaying some misclassified digits instead.
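To find misclassified digits to display, one option is to compare the predictions with the true labels and keep the indices where they differ. The sketch below uses NumPy with made-up values; `misclassified_indices`, `example_preds`, and `example_labels` are hypothetical names that do not appear in the original code.

```python
import numpy as np

def misclassified_indices(preds, labels):
    """Return the indices where the predicted label differs from the true label."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    return np.flatnonzero(preds != labels)

# Hypothetical predictions and ground-truth labels for eight test digits
example_preds = [7, 2, 1, 0, 4, 1, 4, 9]
example_labels = [7, 2, 1, 0, 4, 1, 9, 9]

wrong = misclassified_indices(example_preds, example_labels)
print(wrong)  # [6]
```

The resulting indices can then be used to select which images and predictions to plot.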

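Gender Classification from Face Images

Next, we implement a CNN for gender classification of face images using the CelebA dataset. CelebA contains 202,599 face images of celebrities, and each image comes with 40 binary facial attributes, including gender (male or female) and age (young or old).

Loading the CelebA Dataset

We can load the CelebA dataset with the following steps: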

import tensorflow as tf
import tensorflow_datasets as tfds

celeba_bldr = tfds.builder('celeb_a')
celeba_bldr.download_and_prepare()
celeba = celeba_bldr.as_dataset(shuffle_files=False)
celeba_train = celeba['train']
celeba_valid = celeba['validation']
celeba_test = celeba['test']

def count_items(ds):
    n = 0
    for _ in ds:
        n += 1
    return n

print('Train set:  {}'.format(count_items(celeba_train)))
print('Validation: {}'.format(count_items(celeba_valid)))
print('Test set:   {}'.format(count_items(celeba_test)))

# Take a subset
celeba_train = celeba_train.take(16000)
celeba_valid = celeba_valid.take(1000)
print('Train set:  {}'.format(count_items(celeba_train)))
print('Validation: {}'.format(count_items(celeba_valid)))

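Note the shuffle_files=False argument passed to celeba_bldr.as_dataset(). If file shuffling were enabled instead, the training data would be reshuffled on every iteration, so take(16000) could return a different subset each time. That would defeat the purpose of training the model on a fixed, small dataset.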
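The effect of a fixed versus reshuffled order on a "take the first n" subset can be illustrated without TensorFlow. The following is only a toy sketch: `records` stands in for the stored examples, and `take` is a hypothetical helper that mimics what taking the first n elements of a dataset yields.

```python
import itertools
import random

def take(iterable, n):
    """Return the first n items of an iterable as a list."""
    return list(itertools.islice(iterable, n))

records = list(range(10))  # stand-in for the stored training examples

# Fixed order: every pass over the data yields the same subset.
fixed_pass_1 = take(iter(records), 3)
fixed_pass_2 = take(iter(records), 3)
print(fixed_pass_1, fixed_pass_2)

# Reshuffled order: the "first three" can differ from pass to pass.
rng = random.Random(0)
for _ in range(2):
    shuffled = records[:]
    rng.shuffle(shuffled)
    print(take(iter(shuffled), 3))
```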

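Image Transformations and Data Augmentation

Data augmentation covers a broad set of techniques for dealing with limited training data. For image data, there are transformations specific to images, such as cropping, flipping, and changing the contrast, brightness, and saturation. The following code applies several deterministic transformations using the tf.image module: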

import matplotlib.pyplot as plt

# Take five examples
examples = []
for example in celeba_train.take(5):
    examples.append(example['image'])

fig = plt.figure(figsize=(16, 8.5))

# Crop to a bounding box
ax = fig.add_subplot(2, 5, 1)
ax.set_title('Crop to a \nbounding-box', size=15)
ax.imshow(examples[0])
ax = fig.add_subplot(2, 5, 6)
img_cropped = tf.image.crop_to_bounding_box(examples[0], 50, 20, 128, 128)
ax.imshow(img_cropped)

# Flip horizontally
ax = fig.add_subplot(2, 5, 2)
ax.set_title('Flip (horizontal)', size=15)
ax.imshow(examples[1])
ax = fig.add_subplot(2, 5, 7)
img_flipped = tf.image.flip_left_right(examples[1])
ax.imshow(img_flipped)

# Adjust contrast
ax = fig.add_subplot(2, 5, 3)
ax.set_title('Adjust contrast', size=15)
ax.imshow(examples[2])
ax = fig.add_subplot(2, 5, 8)
img_adj_contrast = tf.image.adjust_contrast(examples[2], contrast_factor=2)
ax.imshow(img_adj_contrast)

# Adjust brightness
ax = fig.add_subplot(2, 5, 4)
ax.set_title('Adjust brightness', size=15)
ax.imshow(examples[3])
ax = fig.add_subplot(2, 5, 9)
img_adj_brightness = tf.image.adjust_brightness(examples[3], delta=0.3)
ax.imshow(img_adj_brightness)

# Central crop and resize
ax = fig.add_subplot(2, 5, 5)
ax.set_title('Central crop\nand resize', size=15)
ax.imshow(examples[4])
ax = fig.add_subplot(2, 5, 10)
img_center_crop = tf.image.central_crop(examples[4], 0.7)
img_resized = tf.image.resize(img_center_crop, size=(218, 178))
ax.imshow(img_resized.numpy().astype('uint8'))

plt.show()
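To make the geometry of two of these transformations concrete, the sketch below reproduces the bounding-box crop and the horizontal flip with plain NumPy slicing on an array in height x width x channels layout. It is an illustration of what the operations do to the pixel grid, not the tf.image implementation; the `image` array is synthetic stand-in data with the 218 x 178 size of a CelebA image.

```python
import numpy as np

# Stand-in for one CelebA image: height 218, width 178, 3 color channels
image = np.arange(218 * 178 * 3, dtype=np.uint32).reshape(218, 178, 3)

# Same region as crop_to_bounding_box(image, 50, 20, 128, 128):
# start at row 50 and column 20, keep 128 rows and 128 columns
cropped = image[50:50 + 128, 20:20 + 128, :]

# Same effect as flip_left_right(image): reverse the order of the columns
flipped = image[:, ::-1, :]

print(cropped.shape)  # (128, 128, 3)
print(flipped.shape)  # (218, 178, 3)
```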

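These transformations can also be randomized, which is the recommended way to apply data augmentation during model training. The following code shows an example of random transformations: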

tf.random.set_seed(1)
fig = plt.figure(figsize=(14, 12))
for i, example in enumerate(celeba_train.take(3)):
    image = example['image']

    ax = fig.add_subplot(3, 4, i*4+1)
    ax.imshow(image)
    if i == 0:
        ax.set_title('Orig', size=15)

    ax = fig.add_subplot(3, 4, i*4+2)
    img_crop = tf.image.random_crop(image, size=(178, 178, 3))
    ax.imshow(img_crop)
    if i == 0:
        ax.set_title('Step 1: Random crop', size=15)

    ax = fig.add_subplot(3, 4, i*4+3)
    img_flip = tf.image.random_flip_left_right(img_crop)
    ax.imshow(tf.cast(img_flip, tf.uint8))
    if i == 0:
        ax.set_title('Step 2: Random flip', size=15)

    ax = fig.add_subplot(3, 4, i*4+4)
    img_resize = tf.image.resize(img_flip, size=(128, 128))
    ax.imshow(tf.cast(img_resize, tf.uint8))
    if i == 0:
        ax.set_title('Step 3: Resize', size=15)

plt.show()
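The random crop and random flip steps can be sketched in NumPy to show where the randomness enters: the crop window's top-left corner is drawn at random, and the mirror is applied with probability 0.5. This is a simplified stand-in (the resize step is left out), and `random_crop_and_flip` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def random_crop_and_flip(image, crop_size, rng):
    """Cut a randomly placed crop_size window, then mirror it with probability 0.5."""
    height, width = image.shape[:2]
    crop_h, crop_w = crop_size
    # Choose the top-left corner so the window stays inside the image
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    patch = image[top:top + crop_h, left:left + crop_w]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    return patch

rng = np.random.default_rng(1)
image = np.zeros((218, 178, 3), dtype=np.uint8)  # synthetic 218 x 178 image
patch = random_crop_and_flip(image, (178, 178), rng)
print(patch.shape)  # (178, 178, 3)
```

Because the crop is as wide as the image, only the vertical position of the window varies here.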

For convenience, we can define a wrapper function, preprocess(), that performs the data augmentation:

def preprocess(example, size=(64, 64), mode='train'):
    image = example['image']
    label = example['attributes']['Male']
    if mode == 'train':
        image_cropped = tf.image.random_crop(image, size=(178, 178, 3))
        image_resized = tf.image.resize(image_cropped, size=size)
        image_flip = tf.image.random_flip_left_right(image_resized)
        return image_flip/255.0, tf.cast(label, tf.int32)
    else:
        image_cropped = tf.image.crop_to_bounding_box(image, offset_height=20, offset_width=0,
                                                      target_height=178, target_width=178)
        image_resized = tf.image.resize(image_cropped, size=size)
        return image_resized/255.0, tf.cast(label, tf.int32)
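The offsets used in the evaluation branch follow from simple arithmetic: with 218 x 178 images and a 178 x 178 crop, offset_height=20 and offset_width=0 place the crop at the center. The short check below works this out; `center_offset` is a hypothetical helper introduced only for illustration.

```python
def center_offset(full_size, target_size):
    """Offset that centers a window of target_size inside full_size."""
    return (full_size - target_size) // 2

IMAGE_HEIGHT, IMAGE_WIDTH = 218, 178  # image size assumed from the crops above
CROP = 178

offset_height = center_offset(IMAGE_HEIGHT, CROP)
offset_width = center_offset(IMAGE_WIDTH, CROP)
print(offset_height, offset_width)  # 20 0

# In training mode the crop is placed at random, so its top edge can land
# anywhere from row 0 to row IMAGE_HEIGHT - CROP.
max_random_top = IMAGE_HEIGHT - CROP
print(max_random_top)  # 40
```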

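The following example applies this function to two images five times each, so the effect of the random augmentation can be seen: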

tf.random.set_seed(1)
ds = celeba_train.shuffle(1000, reshuffle_each_iteration=False)
ds = ds.take(2).repeat(5)
ds = ds.map(lambda x: preprocess(x, size=(178, 178), mode='train'))

fig = plt.figure(figsize=(15, 6))
for j, example in enumerate(ds):
    ax = fig.add_subplot(2, 5, j//2+(j%2)*5+1)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(example[0])

plt.show()

We then apply this preprocessing function to the training and validation datasets:

import numpy as np

BATCH_SIZE = 32
BUFFER_SIZE = 1000
IMAGE_SIZE = (64, 64)
steps_per_epoch = int(np.ceil(16000/BATCH_SIZE))  # integer number of batches per epoch

ds_train = celeba_train.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='train'))
ds_train = ds_train.shuffle(buffer_size=BUFFER_SIZE).repeat()
ds_train = ds_train.batch(BATCH_SIZE)

ds_valid = celeba_valid.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='eval'))
ds_valid = ds_valid.batch(BATCH_SIZE)
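Because the training dataset is repeated indefinitely with repeat(), the number of batches that make up one epoch has to be supplied explicitly. The arithmetic behind that number, and behind the size of the final validation batch, can be checked as follows; `batches_per_pass` is a hypothetical helper name.

```python
import math

def batches_per_pass(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)

train_steps = batches_per_pass(16000, 32)
valid_steps = batches_per_pass(1000, 32)
# The validation set does not divide evenly, so the last batch is smaller
last_valid_batch = 1000 - (valid_steps - 1) * 32

print(train_steps)       # 500
print(valid_steps)       # 32
print(last_valid_batch)  # 8
```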

Training a CNN Gender Classifier

We can build and train the CNN model with TensorFlow's Keras API:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(rate=0.5),

    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(rate=0.5),

    tf.keras.layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),

    tf.keras.layers.Conv2D(256, (3, 3), padding='same', activation='relu')
])

# Check the shape of the output feature maps
print(model.compute_output_shape(input_shape=(None, 64, 64, 3)))

# Add a global average pooling layer
model.add(tf.keras.layers.GlobalAveragePooling2D())
print(model.compute_output_shape(input_shape=(None, 64, 64, 3)))

# Add a fully connected layer
model.add(tf.keras.layers.Dense(1, activation=None))

tf.random.set_seed(1)
model.build(input_shape=(None, 64, 64, 3))
model.summary()
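The shapes and parameter counts reported by the summary can be worked out by hand. With 'same' padding, each convolution keeps the spatial size, each 2 x 2 max pooling halves it, and a 3 x 3 convolution has (3 * 3 * in_channels + 1) * out_channels trainable parameters. The sketch below does this bookkeeping in plain Python; `conv_params` is a hypothetical helper.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels

size, channels = 64, 3  # 64 x 64 RGB input
total_params = 0

# (number of filters, whether a 2 x 2 max pooling layer follows)
for filters, pooled in [(32, True), (64, True), (128, True), (256, False)]:
    total_params += conv_params(3, channels, filters)
    channels = filters
    if pooled:
        size //= 2  # pooling halves height and width

feature_map_shape = (size, size, channels)
# Global average pooling leaves one value per channel; the dense layer
# then has one weight per channel plus a bias.
total_params += channels + 1

print(feature_map_shape)  # (8, 8, 256)
print(total_params)       # 388673
```

Dropout and pooling layers have no trainable parameters, so they do not contribute to the total.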

Next, we compile and train the model:

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(ds_train, validation_data=ds_valid,
                    epochs=20,
                    steps_per_epoch=steps_per_epoch)
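Since the last layer has no activation, the model outputs logits, which is why the loss is created with from_logits=True. As a rough illustration of what that means, the sketch below compares binary cross-entropy computed from a sigmoid probability with a commonly used numerically stable form computed directly from the logit. It is a plain-Python check of the math, not the Keras implementation, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    """Binary cross-entropy from a predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    """Same loss computed from the logit z without forming the probability."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# Hypothetical (label, logit) pairs
for y, z in [(1, 2.0), (0, 2.0), (1, -1.5), (0, 0.0)]:
    print(y, z, bce_from_probability(y, sigmoid(z)), bce_from_logit(y, z))
```

Both columns agree up to floating-point error, which is the sense in which the logit-based loss and the sigmoid-then-loss formulation are equivalent.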

We can visualize the training and validation loss and accuracy curves:

hist = history.history
x_arr = np.arange(len(hist['loss'])) + 1

fig = plt.figure(figsize=(12, 4))

ax = fig.add_subplot(1, 2, 1)
ax.plot(x_arr, hist['loss'], '-o', label='Train loss')
ax.plot(x_arr, hist['val_loss'], '--<', label='Validation loss')
ax.legend(fontsize=15)
ax.set_xlabel('Epoch', size=15)
ax.set_ylabel('Loss', size=15)

ax = fig.add_subplot(1, 2, 2)
ax.plot(x_arr, hist['accuracy'], '-o', label='Train acc.')
ax.plot(x_arr, hist['val_accuracy'], '--<', label='Validation acc.')
ax.legend(fontsize=15)
ax.set_xlabel('Epoch', size=15)
ax.set_ylabel('Accuracy', size=15)

plt.show()

Based on the learning curves, we can continue training the model for 10 more epochs:

history = model.fit(ds_train, validation_data=ds_valid,
                    epochs=30, initial_epoch=20,
                    steps_per_epoch=steps_per_epoch)
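With epochs=30 and initial_epoch=20, training resumes at epoch 20 and stops at epoch 30. Note that the second call reassigns `history`, so the curves from the first 20 epochs are lost unless `history.history` was saved beforehand. One way to keep a continuous record is to concatenate the two dictionaries, sketched here with made-up numbers; `merge_histories`, `first_run`, and `second_run` are hypothetical names.

```python
def merge_histories(first, second):
    """Concatenate per-epoch metric lists from two training runs."""
    return {key: first[key] + second[key] for key in first}

# Made-up values standing in for history.history from the two fit() calls
first_run = {'loss': [0.60, 0.50], 'val_loss': [0.62, 0.55]}
second_run = {'loss': [0.45], 'val_loss': [0.52]}

merged = merge_histories(first_run, second_run)
print(len(merged['loss']))  # 3
print(merged['val_loss'])   # [0.62, 0.55, 0.52]
```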

Finally, we evaluate the model on the test dataset:

ds_test = celeba_test.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='eval')).batch(32)
test_results = model.evaluate(ds_test)
print('Test Acc: {:.2f}%'.format(test_results[1]*100))

We can also obtain the predictions for a few test examples:

ds = ds_test.unbatch().take(10)
pred_logits = model.predict(ds.batch(10))
probas = tf.sigmoid(pred_logits)
probas = probas.numpy().flatten()*100

fig = plt.figure(figsize=(15, 7))
for j, example in enumerate(ds):
    ax = fig.add_subplot(2, 5, j+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(example[0])
    if example[1].numpy() == 1:
        label = 'M'
    else:
        label = 'F'
    ax.text(0.5, -0.15, 'GT: {:s}\nPr(Male)={:.0f}%'.format(label, probas[j]),
            size=16,
            horizontalalignment='center',
            verticalalignment='center',
            transform=ax.transAxes)

plt.tight_layout()
plt.show()
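The conversion from logits to the percentages shown under each image is a plain sigmoid. The sketch below applies it to a few made-up logits and also derives a class decision at the usual 0.5 probability cutoff, which corresponds to a logit of 0; the 0.5 cutoff is an assumption for illustration and is not taken from the original code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [-2.0, 0.0, 3.0]  # hypothetical model outputs
probabilities = [sigmoid(z) for z in logits]
predicted_male = [p >= 0.5 for p in probabilities]

for z, p, is_male in zip(logits, probabilities, predicted_male):
    print('logit={:+.1f}  Pr(Male)={:.0f}%  -> {}'.format(
        z, p * 100, 'M' if is_male else 'F'))
```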

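The whole workflow can be represented by the following mermaid flowchart:

graph LR
    A[Load the CelebA dataset] --> B[Data augmentation]
    B --> C[Build the CNN model]
    C --> D[Compile the model]
    D --> E[Train the model]
    E --> F[Evaluate the model]
    F --> G[Predict]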

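To achieve higher accuracy, readers can try using the entire training dataset or modifying the CNN architecture, for example by changing the dropout probabilities and the number of filters in the convolutional layers, or by replacing the global average pooling layer with a fully connected layer.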
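To get a feel for the trade-off in the last suggestion, the sketch below compares global average pooling with flattening on a random array shaped like the final 8 x 8 x 256 feature map. It uses NumPy only and synthetic values; the parameter counts refer to a single-output dense layer placed after each variant.

```python
import numpy as np

# Synthetic stand-in for the final feature map of one image
feature_map = np.random.default_rng(0).random((8, 8, 256))

# Global average pooling: one mean value per channel
pooled = feature_map.mean(axis=(0, 1))
# Flattening: every spatial position of every channel becomes a feature
flattened = feature_map.reshape(-1)

# A dense layer with one output unit needs one weight per input plus a bias
dense_params_after_pooling = pooled.size + 1
dense_params_after_flatten = flattened.size + 1

print(pooled.shape, dense_params_after_pooling)     # (256,) 257
print(flattened.shape, dense_params_after_flatten)  # (16384,) 16385
```

Flattening gives the final layer many more weights to fit, which may help or may increase overfitting on a small training subset.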

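Summary and Outlook

In the steps above, we completed the handwritten digit recognition and face-image gender classification tasks, took a closer look at deep convolutional neural networks (CNNs) and their main components, including the convolution operation and pooling layers, and learned how to implement a deep CNN with TensorFlow's Keras API.

For the gender classification task, we used the CelebA dataset. To speed up training, we used only a small part of the training data, and we applied data augmentation to improve the model's generalization performance and reduce overfitting. The steps are summarized below:
1. Load the CelebA dataset: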

import tensorflow as tf
import tensorflow_datasets as tfds

celeba_bldr = tfds.builder('celeb_a')
celeba_bldr.download_and_prepare()
celeba = celeba_bldr.as_dataset(shuffle_files=False)
celeba_train = celeba['train']
celeba_valid = celeba['validation']
celeba_test = celeba['test']

2. Take a subset:

celeba_train = celeba_train.take(16000)
celeba_valid = celeba_valid.take(1000)

3. Apply data augmentation:

def preprocess(example, size=(64, 64), mode='train'):
    image = example['image']
    label = example['attributes']['Male']
    if mode == 'train':
        image_cropped = tf.image.random_crop(image, size=(178, 178, 3))
        image_resized = tf.image.resize(image_cropped, size=size)
        image_flip = tf.image.random_flip_left_right(image_resized)
        return image_flip/255.0, tf.cast(label, tf.int32)
    else:
        image_cropped = tf.image.crop_to_bounding_box(image, offset_height=20, offset_width=0,
                                                      target_height=178, target_width=178)
        image_resized = tf.image.resize(image_cropped, size=size)
        return image_resized/255.0, tf.cast(label, tf.int32)

import numpy as np
BATCH_SIZE = 32
BUFFER_SIZE = 1000
IMAGE_SIZE = (64, 64)
steps_per_epoch = int(np.ceil(16000/BATCH_SIZE))  # integer number of batches per epoch

ds_train = celeba_train.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='train'))
ds_train = ds_train.shuffle(buffer_size=BUFFER_SIZE).repeat()
ds_train = ds_train.batch(BATCH_SIZE)

ds_valid = celeba_valid.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='eval'))
ds_valid = ds_valid.batch(BATCH_SIZE)

4. Build the CNN model:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(rate=0.5),

    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(rate=0.5),

    tf.keras.layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),

    tf.keras.layers.Conv2D(256, (3, 3), padding='same', activation='relu')
])

model.add(tf.keras.layers.GlobalAveragePooling2D())
model.add(tf.keras.layers.Dense(1, activation=None))

tf.random.set_seed(1)
model.build(input_shape=(None, 64, 64, 3))
model.summary()

5. Compile the model:

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

6. Train the model:

history = model.fit(ds_train, validation_data=ds_valid,
                    epochs=20,
                    steps_per_epoch=steps_per_epoch)

history = model.fit(ds_train, validation_data=ds_valid,
                    epochs=30, initial_epoch=20,
                    steps_per_epoch=steps_per_epoch)

7. Evaluate the model:

ds_test = celeba_test.map(lambda x: preprocess(x, size=IMAGE_SIZE, mode='eval')).batch(32)
test_results = model.evaluate(ds_test)
print('Test Acc: {:.2f}%'.format(test_results[1]*100))

8. Predict:

ds = ds_test.unbatch().take(10)
pred_logits = model.predict(ds.batch(10))
probas = tf.sigmoid(pred_logits)
probas = probas.numpy().flatten()*100

fig = plt.figure(figsize=(15, 7))
for j, example in enumerate(ds):
    ax = fig.add_subplot(2, 5, j+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(example[0])
    if example[1].numpy() == 1:
        label = 'M'
    else:
        label = 'F'
    ax.text(0.5, -0.15, 'GT: {:s}\nPr(Male)={:.0f}%'.format(label, probas[j]),
            size=16,
            horizontalalignment='center',
            verticalalignment='center',
            transform=ax.transAxes)

plt.tight_layout()
plt.show()

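With these steps, the model reaches reasonably good accuracy on the test set. It can still be improved further, for example:
- Use the entire training dataset: more training examples give the model more to learn from, which typically leads to higher accuracy.
- Modify the CNN architecture: change the dropout probabilities and the number of filters in the convolutional layers, or replace the global average pooling layer with a fully connected layer.

Going forward, we can explore more applications of CNNs, such as object detection and image segmentation. We can also try other deep learning architectures, such as recurrent neural networks (RNNs), which process sequence data and are widely used in areas such as language translation and image captioning.

In short, deep learning is a field full of challenges and opportunities, and continued experimentation and innovation will help us achieve better results.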