24. Image Denoising, CNN, RNN, LSTM, and Transfer Learning in Deep Learning


皮肤PHP


License: CC 4.0 BY-SA

Column: 六步玩转Python机器学习. Tags: deep learning, image denoising, autoencoder

Article link: https://blog.youkuaiyun.com/k5l6m/article/details/152189606


1. Image Denoising with Autoencoders

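Autoencoders learn powerful features in their compressed hidden layer, which lets them reconstruct the input effectively. A denoising autoencoder is a stochastic version of the autoencoder that puts this to work for denoising: it is fed a corrupted (noisy) copy of each image and trained to reconstruct the original clean image, so it has to learn features that are robust to the noise instead of simply copying its input.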
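To make the training objective concrete, here is a minimal NumPy sketch of a denoising autoencoder; it is not the Keras model used in the steps below. It assumes synthetic 16-pixel data built from four binary patterns, a single sigmoid hidden layer of 8 units, and plain gradient descent, and all names are illustrative. The essential detail is that the network receives the noisy input while its loss is computed against the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic clean data (assumption): 200 samples built from 4 binary 16-pixel
# patterns, standing in for flattened images with values in [0, 1]
n, d, k = 200, 16, 8                      # samples, pixels, hidden (code) size
patterns = (rng.random((4, d)) > 0.5).astype(float)
X = patterns[rng.integers(0, 4, size=n)]

# Corrupt the input: add Gaussian noise, then clip back to [0, 1]
noise_factor = 0.5
X_noisy = np.clip(X + noise_factor * rng.normal(size=X.shape), 0.0, 1.0)

# Encoder (d -> k) and decoder (k -> d), both with sigmoid activations
W1 = rng.normal(scale=0.1, size=(d, k)); b1 = np.zeros(k)
W2 = rng.normal(scale=0.1, size=(k, d)); b2 = np.zeros(d)

def forward(X_in):
    H = sigmoid(X_in @ W1 + b1)           # compressed hidden representation
    R = sigmoid(H @ W2 + b2)              # reconstruction
    return H, R

def mse(R, target):
    return np.mean((R - target) ** 2)

# Key point: the network sees X_noisy but is scored against the CLEAN X
loss_before = mse(forward(X_noisy)[1], X)

lr = 5.0
for _ in range(1000):
    H, R = forward(X_noisy)
    dZ2 = 2.0 * (R - X) / R.size * R * (1.0 - R)   # through MSE and decoder sigmoid
    dH = dZ2 @ W2.T                                # uses W2 before this step's update
    dZ1 = dH * H * (1.0 - H)                       # through encoder sigmoid
    W2 -= lr * (H.T @ dZ2); b2 -= lr * dZ2.sum(axis=0)
    W1 -= lr * (X_noisy.T @ dZ1); b1 -= lr * dZ1.sum(axis=0)

loss_after = mse(forward(X_noisy)[1], X)
print(f"reconstruction MSE vs clean images: {loss_before:.4f} -> {loss_after:.4f}")
```

Training on (noisy input, clean target) pairs mirrors the call `model.fit(X_train_noisy, X_train, ...)` in the Keras steps of this section.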

1.1 Steps

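Introduce noise: add Gaussian noise to the digits dataset. The code assumes X_train holds the flattened MNIST digits scaled to [0, 1] and that an autoencoder `model` has already been defined.

import numpy as np

# Introduce Gaussian noise to the images
noise_factor = 0.5
X_train_noisy = X_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=X_train.shape)
# Clip back to the valid pixel range [0, 1]
X_train_noisy = np.clip(X_train_noisy, 0., 1.)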

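Visualization function: define a helper that plots one image in a grid of subplots.

import matplotlib.pyplot as plt

# Function for visualization
def draw(data, row, col, n):
    plt.subplot(row, col, n)
    plt.imshow(data, cmap=plt.cm.gray_r)
    plt.axis('off')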

Display the noisy images: show the first ten noisy digits.

show_size = 10
plt.figure(figsize=(20,20))
for i in range(show_size):
    draw(X_train_noisy[i].reshape(28,28), 1, show_size, i+1)
plt.show()

Train the model: fit it on the noisy training images, with the clean images as targets.

model.fit(X_train_noisy, X_train, epochs=5, batch_size=258)

Predict denoised images: run the model on the noisy inputs and display the reconstructions.

# Prediction for denoised image
X_train_pred = model.predict(X_train_noisy)
show_size = 10
plt.figure(figsize=(20,20))
for i in range(show_size):
    draw(X_train_pred[i].reshape(28,28), 1, show_size, i+1)
plt.show()

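1.2 Notes

The model can be tuned to make the denoised images sharper, for example by training for more epochs or changing the size of its hidden layers.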

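2. Convolutional Neural Networks (CNN)

In image classification, CNNs have become the algorithm of choice for building effective models. A CNN is similar to an ordinary neural network, but it explicitly assumes that its input is an image. This lets us encode certain properties of images into the architecture, which makes the forward function more efficient to implement and greatly reduces the number of parameters in the network. The neurons are arranged in three dimensions: width, height, and depth.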
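As a rough illustration of the parameter savings, the arithmetic below compares a hypothetical fully connected layer that maps a 32×32×3 image to a 32×32×5 output with a convolutional layer of five 3×3 filters producing the same output shape (assuming "same" padding). Both counts include one bias per output unit or per filter.

```python
# Input volume: a 32x32 RGB image (CIFAR-10 size)
H, W, C = 32, 32, 3
n_filters, F = 5, 3

# Fully connected: every output unit connects to every input value
n_in = H * W * C                 # 3,072 inputs
n_out = H * W * n_filters        # 5,120 outputs
fc_params = n_in * n_out + n_out           # weights + biases

# Convolutional: each 3x3xC filter is shared across all spatial positions
conv_params = n_filters * (F * F * C) + n_filters   # weights + biases

print(f"fully connected: {fc_params:,}  convolutional: {conv_params:,}")
```

The fully connected version needs 15,733,760 parameters, while the convolution needs 140, because each filter's 27 weights are reused at every spatial position.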

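2.1 The CIFAR-10 Dataset

CIFAR-10 is a standard image dataset for computer vision and deep learning. It contains 60,000 color photos of 32×32 pixels (50,000 for training and 10,000 for testing); each pixel has RGB values, and the images fall into ten classes such as airplane, automobile, and bird.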
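The CIFAR-10 code later scales the pixels to [0, 1] and one-hot encodes the labels. The sketch below shows the same two steps in plain NumPy on a small synthetic batch shaped like `cifar10.load_data()` output (uint8 images in channels-last order, labels of shape (n, 1)); the random data is an assumption standing in for the real dataset.

```python
import numpy as np

nb_classes = 10
rng = np.random.default_rng(2017)

# Synthetic stand-in for CIFAR-10: 8 uint8 RGB images and (n, 1) integer labels
X = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
y = rng.integers(0, nb_classes, size=(8, 1))

# Scale pixel values from [0, 255] to [0, 1]
X = X.astype('float32') / 255.0

# One-hot encode the labels: row i has a single 1 in column y[i]
# (the same layout that np_utils.to_categorical produces)
Y = np.eye(nb_classes, dtype='float32')[y.ravel()]

print(X.shape, X.dtype, Y.shape)
```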

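2.2 Main Layers of a CNN

A CNN consists mainly of four types of layers: input, convolutional, pooling, and fully connected, with an element-wise activation layer (typically ReLU) usually following each convolution. The table below shows what each layer does and how the dimensions change:

| Layer | Role | Input shape (CIFAR-10 example) | Output shape (example) |
| ---- | ---- | ---- | ---- |
| Input | Holds the raw pixel values | 32×32×3 | 32×32×3 |
| Convolutional | Computes dot products between the filter weights and small local regions of the input | 32×32×3 | 32×32×5 (assuming 5 filters and "same" padding) |
| ReLU | Applies an element-wise activation function, max(0, x) | 32×32×5 | 32×32×5 |
| Pooling | Downsamples along the spatial dimensions (width and height) | 32×32×5 | 16×16×5 (2×2 pooling) |
| Fully connected | Computes the class scores | 16×16×5 | 1×1×10 |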
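To check the shapes in the table, here is a deliberately naive NumPy forward pass: a stride-1 convolution with zero "same" padding, a ReLU, 2×2 max pooling, and a fully connected layer producing 10 scores. The helper names and random weights are illustrative assumptions; as in deep-learning libraries, the "convolution" is a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_same(x, filters, bias):
    """Naive stride-1 convolution with zero 'same' padding.
    x: (H, W, C_in); filters: (F, F, C_in, C_out); bias: (C_out,)"""
    F = filters.shape[0]
    p = F // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))        # zero padding on H and W only
    H, W, _ = x.shape
    out = np.zeros((H, W, filters.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + F, j:j + F, :]          # small local region
            # dot product of the patch with every filter at once -> (C_out,)
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

def relu(x):
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping size x size max pooling over H and W."""
    H, W, C = x.shape
    x = x[:H - H % size, :W - W % size, :]
    return x.reshape(H // size, size, W // size, size, C).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                  # input layer: 32x32x3
filters = rng.normal(size=(3, 3, 3, 5))          # 5 filters of size 3x3x3
bias = np.zeros(5)

conv = conv2d_same(image, filters, bias)         # 32x32x5
act = relu(conv)                                 # 32x32x5
pooled = max_pool(act)                           # 16x16x5
W_fc = rng.normal(size=(pooled.size, 10))        # fully connected weights
scores = pooled.reshape(-1) @ W_fc               # 10 class scores

print(conv.shape, act.shape, pooled.shape, scores.shape)
```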

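2.3 CNN Example Using Keras with the Theano Backend

graph LR
    A[Import libraries] --> B[Set backend and parameters]
    B --> C[Load data]
    C --> D[Preprocess data]
    D --> E[Define model layers]
    E --> F[Create model]
    F --> G[Compile model]
    G --> H[Train model]
    H --> I[Visualize model]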

import keras
# Choose the image dimension ordering that matches the active backend
backend_name = keras.backend.backend()   # 'tensorflow' or 'theano'
if backend_name == 'tensorflow':
    keras.backend.set_image_dim_ordering('tf')
else:
    keras.backend.set_image_dim_ordering('th')
from keras.models import Sequential
from keras.datasets import cifar10
from keras.layers import (Dense, Dropout, Activation, Conv2D, MaxPooling2D,
                          Flatten)
from keras.utils import np_utils
from keras.preprocessing import sequence
from keras import backend as K
from IPython.display import SVG, display
from keras.utils.vis_utils import model_to_dot, plot_model
from matplotlib import pyplot as plt
import numpy as np
np.random.seed(2017)
img_rows, img_cols = 32, 32
img_channels = 3
batch_size = 256
nb_classes = 10
nb_epoch = 4
nb_filters = 10
nb_conv = 3
nb_pool = 2
kernel_size = 3 # convolution kernel size
if K.image_dim_ordering() == 'th':
    input_shape = (3, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 3)

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# define two groups of layers: feature (convolutions) and classification (dense)
feature_layers = [
    Conv2D(nb_filters, kernel_size, input_shape=input_shape),
    Activation('relu'),
    Conv2D(nb_filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=(nb_pool, nb_pool)),
    Flatten(),
]
classification_layers = [
    Dense(512),
    Activation('relu'),
    Dense(nb_classes),
    Activation('softmax')
]
# create complete model
model = Sequential(feature_layers + classification_layers)
model.compile(loss='categorical_crossentropy', optimizer="adadelta", 
metrics=['accuracy'])

SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))

model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
          epochs=nb_epoch, batch_size=batch_size, verbose=2)
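Note that `Conv2D` in Keras defaults to `padding='valid'`, so this model does not keep the 32×32 size assumed in the table. The arithmetic sketch below (hypothetical helper names, channels-last ordering assumed) works out the layer shapes and parameter counts for this model; they should correspond to what `model.summary()` reports.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard output-size formula: (W - F + 2P) / S + 1
    return (size - kernel + 2 * pad) // stride + 1

def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out   # weights + one bias per filter

def dense_params(n_in, n_out):
    return n_in * n_out + n_out                     # weights + biases

img, channels, nb_filters, kernel_size, nb_pool = 32, 3, 10, 3, 2

s1 = conv_out(img, kernel_size)      # first Conv2D, 'valid' padding
s2 = conv_out(s1, kernel_size)       # second Conv2D
s3 = s2 // nb_pool                   # MaxPooling2D with 2x2 pool and stride 2
flat = s3 * s3 * nb_filters          # Flatten

params = {
    'conv1': conv_params(kernel_size, channels, nb_filters),
    'conv2': conv_params(kernel_size, nb_filters, nb_filters),
    'dense512': dense_params(flat, 512),
    'dense10': dense_params(512, 10),
}
total = sum(params.values())
print('spatial sizes:', s1, s2, s3, 'flattened:', flat)
print(params, 'total:', total)
```

The first `Dense(512)` layer holds about 99% of the roughly one million parameters, which is typical when a large flattened feature map feeds a wide dense layer.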

2.4 Visualizing Each Layer

# Visualization helpers (assume channels-last, i.e. 'tf' ordering, arrays)
def draw(data, row, col, n):
    plt.subplot(row, col, n)
    plt.imshow(data)
def draw_digit(data, row, col):
    # One figure per sample, showing its `col` feature maps side by side
    for j in range(row):
        plt.figure(figsize=(16,16))
        for i in range(col):
            plt.subplot(1, col, i+1)
            plt.imshow(data[j,:,:,i])
            plt.axis('off')
        plt.tight_layout()
    plt.show()

### Input layer (original image)
show_size = 10
plt.figure(figsize=(16,16))
for i in range(show_size):
    draw(X_train[i], 1, show_size, i+1)
plt.show()

# Output of the first Conv2D + ReLU block (model.layers[1])
get_first_layer_output = K.function([model.layers[0].input],
                                    [model.layers[1].output])
first_layer = get_first_layer_output([X_train[0:show_size]])[0]
print('first layer shape: ', first_layer.shape)
draw_digit(first_layer, first_layer.shape[0], first_layer.shape[3])

# Output of the second Conv2D + ReLU block (model.layers[3])
get_second_layer_output = K.function([model.layers[0].input],
                                     [model.layers[3].output])
second_layers = get_second_layer_output([X_train[0:show_size]])[0]
print('second layer shape: ', second_layers.shape)
draw_digit(second_layers, second_layers.shape[0], second_layers.shape[3])

# Output of the MaxPooling2D layer (model.layers[4])
get_third_layer_output = K.function([model.layers[0].input],
                                    [model.layers[4].output])
third_layers = get_third_layer_output([X_train[0:show_size]])[0]
print('third layer shape: ', third_layers.shape)
draw_digit(third_layers, third_layers.shape[0], third_layers.shape[3])

3. CNN on the MNIST Dataset

The following example applies a CNN to the MNIST dataset:

import keras
# Inspect the current backend and dimension ordering
print(keras.backend.backend(), keras.backend.image_dim_ordering())
# Choose the image dimension ordering that matches the active backend
backend_name = keras.backend.backend()
if backend_name == 'tensorflow':
    keras.backend.set_image_dim_ordering('tf')
else:
    keras.backend.set_image_dim_ordering('th')
from matplotlib import pyplot as plt
%matplotlib inline
import numpy as np
np.random.seed(2017)
from keras import backend as K
from keras.models import Sequential
from keras.datasets import mnist
from keras.layers import (Dense, Dropout, Activation, Conv2D, MaxPooling2D,
                          Flatten)
from keras.utils import np_utils
from IPython.display import SVG, display
from keras.utils.vis_utils import model_to_dot, plot_model
img_rows, img_cols = 28, 28  # MNIST images are 28x28 grayscale
nb_classes = 10 # digits 0-9
nb_filters = 5 # the number of filters
nb_pool = 2 # window size of pooling
nb_conv = 3 # window or kernel size of filter
nb_epoch = 5
kernel_size = 3 # convolution kernel size
if K.image_dim_ordering() == 'th':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

# data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Add a channel axis where the current dimension ordering expects it
X_train = X_train.reshape((X_train.shape[0],) + input_shape)
X_test = X_test.reshape((X_test.shape[0],) + input_shape)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# define two groups of layers: feature (convolutions) and classification (dense)
feature_layers = [
    Conv2D(nb_filters, kernel_size, input_shape=input_shape),
    Activation('relu'),
    Conv2D(nb_filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size = nb_pool),
    Dropout(0.25),
    Flatten(),
]
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(nb_classes),
    Activation('softmax')
]
# create complete model
model = Sequential(feature_layers + classification_layers)
model.compile(loss='categorical_crossentropy', optimizer="adadelta", 
metrics=['accuracy'])

SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))
print(model.summary())

model.fit(X_train, Y_train, batch_size=256, epochs=nb_epoch, 
verbose=2,  validation_split=0.2)

3.1 Visualizing Each Layer

# Visualization helpers (assume channels-last arrays)
def draw(data, row, col, n):
    plt.subplot(row, col, n)
    plt.imshow(data, cmap=plt.cm.gray_r)
    plt.axis('off')
def draw_digit(data, row, col):
    # One figure per sample, showing its `col` feature maps side by side
    for j in range(row):
        plt.figure(figsize=(8,8))
        for i in range(col):
            plt.subplot(1, col, i+1)
            plt.imshow(data[j,:,:,i], cmap=plt.cm.gray_r)
            plt.axis('off')
        plt.tight_layout()
    plt.show()

# Sample input layer (original image)
show_size = 10
plt.figure(figsize=(20,20))
for i in range(show_size):
    draw(X_train[i].reshape(28,28), 1, show_size, i+1)
plt.show()

# First layer with 5 filters
get_first_layer_output = K.function([model.layers[0].input],
                                    [model.layers[1].output])
first_layer = get_first_layer_output([X_train[0:show_size]])[0]
print ('first layer shape: ', first_layer.shape)
draw_digit(first_layer, first_layer.shape[0], first_layer.shape[3])

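4. Recurrent Neural Networks (RNN)

Multilayer perceptrons (MLPs) perform poorly on sequential event models such as probabilistic language models, because they take a fixed-size input and keep no memory of earlier inputs; the RNN architecture addresses this. An RNN resembles an MLP but adds a feedback loop that feeds information from the previous time step into the current one. This architecture can generate sequences to simulate situations and create synthetic data, and it suits sequence data in tasks such as speech and text mining, image captioning, and time-series forecasting.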
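The feedback loop of a vanilla (Elman-style) RNN can be written as h_t = tanh(Wx·x_t + Wh·h_{t-1} + b). The minimal NumPy sketch below runs that recurrence over one sequence with random weights (the sizes and names are illustrative assumptions) and shows that perturbing only the first input still changes the last hidden state, because the loop carries information forward.

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b, h0):
    """Vanilla RNN: h_t = tanh(Wx x_t + Wh h_{t-1} + b) over a (T, input_dim) sequence."""
    h = h0
    hs = []
    for x_t in xs:
        h = np.tanh(Wx @ x_t + Wh @ h + b)   # previous h is fed back into this step
        hs.append(h)
    return np.stack(hs)                      # (T, hidden_dim)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 6, 3, 4
Wx = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
Wh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
xs = rng.normal(size=(T, input_dim))

hs = rnn_forward(xs, Wx, Wh, b, np.zeros(hidden_dim))

# Change only the FIRST input and compare the LAST hidden states
xs2 = xs.copy()
xs2[0] += 1.0
hs2 = rnn_forward(xs2, Wx, Wh, b, np.zeros(hidden_dim))
print('max change in h_T:', np.abs(hs[-1] - hs2[-1]).max())
```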

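4.1 Pros and Cons of RNNs

Pros: the network remembers past information and can repeatedly predict what happens next.
Cons: high memory usage, and long-term temporal dependencies are hard to learn because gradients propagated back through many time steps tend to vanish (or explode).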
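Why long-term dependencies are hard: for a scalar RNN h_t = tanh(w·h_{t-1} + x_t), the chain rule gives dh_T/dh_0 as a product of T factors w·(1 - h_t²). When |w| < 1, every factor is below 1 in magnitude, so the gradient shrinks geometrically with T (with large |w| it can instead explode). The toy sketch below, with arbitrary random inputs and w = 0.9 as assumptions, prints that shrinkage.

```python
import numpy as np

def grad_through_time(w, xs, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x_t in xs:
        h = np.tanh(w * h + x_t)
        grad *= w * (1.0 - h ** 2)   # chain rule factor d h_t / d h_{t-1}
    return grad

rng = np.random.default_rng(0)
xs = rng.normal(scale=0.5, size=100)
w = 0.9
for T in (5, 20, 50, 100):
    print(T, abs(grad_through_time(w, xs[:T])))
```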

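5. Long Short-Term Memory (LSTM)

LSTM is an improved RNN architecture that overcomes the problems of the vanilla RNN and can capture long-range dependencies. It combines a linear memory cell with a set of gating units that control the flow of information: when information enters the memory, when it is forgotten, and when it is output. The cell state's self-recurrence applies no squashing activation function (it is only scaled element-wise by the forget gate), so gradients flowing back along the cell state are much less prone to vanishing during backpropagation.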

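5.1 LSTM Component Formulas

| LSTM component | Formula |
| ---- | ---- |
| Input gate layer | $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$ |
| Forget gate layer | $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$ |
| Output gate layer | $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$ |
| Memory cell state vector | $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$ |
| Hidden state (output) | $h_t = o_t \odot \tanh(c_t)$ |

Here $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.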
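A minimal NumPy sketch of one LSTM time step, using the gate and cell-state equations above plus the standard output equation h_t = o_t ⊙ tanh(c_t). The weight shapes, random initialization, and parameter dictionary are assumptions for illustration; Keras's `LSTM` layer stores its weights in a different, packed layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names like 'W_i', 'U_i', 'b_i' to arrays."""
    i_t = sigmoid(p['W_i'] @ x_t + p['U_i'] @ h_prev + p['b_i'])   # input gate
    f_t = sigmoid(p['W_f'] @ x_t + p['U_f'] @ h_prev + p['b_f'])   # forget gate
    o_t = sigmoid(p['W_o'] @ x_t + p['U_o'] @ h_prev + p['b_o'])   # output gate
    c_tilde = np.tanh(p['W_c'] @ x_t + p['U_c'] @ h_prev + p['b_c'])  # candidate
    c_t = f_t * c_prev + i_t * c_tilde     # element-wise memory cell update
    h_t = o_t * np.tanh(c_t)               # hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
p = {}
for g in 'ifoc':                           # input, forget, output, candidate
    p[f'W_{g}'] = rng.normal(scale=0.5, size=(hidden, input_dim))
    p[f'U_{g}'] = rng.normal(scale=0.5, size=(hidden, hidden))
    p[f'b_{g}'] = np.zeros(hidden)

# Run the step over a short random sequence of 5 inputs
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, p)
print('h:', h)
print('c:', c)
```

With the forget gate saturated near 1 and the input gate near 0, c_t ≈ c_{t-1}: the cell carries its content, and likewise its gradient, across time steps largely unchanged, which is the mechanism behind the long-range memory described above.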

5.2 LSTM Example with Keras

import numpy as np
np.random.seed(2017)  # for reproducibility
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32
print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

#Model configuration
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, recurrent_dropout=0.2, dropout=0.2))  # try using a GRU instead, for fun
model.add(Dense(1))
model.add(Activation('sigmoid'))
# Try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam', 
metrics=['accuracy'])
#Train
model.fit(X_train, y_train, batch_size=batch_size, epochs=5, validation_data=(X_test, y_test))

# Evaluate
train_score, train_acc = model.evaluate(X_train, y_train, batch_size=batch_size)
test_score, test_acc = model.evaluate(X_test, y_test, batch_size=batch_size)
print ('Train score:', train_score)
print ('Train accuracy:', train_acc)
print ('Test score:', test_score)
print ('Test accuracy:', test_acc)
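`pad_sequences` brings every review to exactly `maxlen` tokens before it reaches the Embedding layer. With its default settings (`padding='pre'`, `truncating='pre'`, `value=0`), long reviews keep their last `maxlen` tokens and short ones are left-padded with zeros. The hypothetical `pad_pre` helper below re-implements only that default behaviour in NumPy to make it visible; it is not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical sketch of pad_sequences' defaults: keep the LAST maxlen
    items of long sequences and left-pad short ones with `value`."""
    out = np.full((len(sequences), maxlen), value, dtype='int32')
    for row, seq in enumerate(sequences):
        trunc = seq[-maxlen:]                 # truncating='pre': keep the tail
        if len(trunc):
            out[row, -len(trunc):] = trunc    # padding='pre': right-align the tokens
    return out

reviews = [[11, 12, 13], [21, 22, 23, 24, 25, 26], []]
print(pad_pre(reviews, maxlen=4))
```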

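6. Transfer Learning

Transfer learning aims to reuse the knowledge gained while solving one problem to solve a different but related problem. Just as people pick up a new skill more easily by building on past experience, in machine learning we can use the knowledge stored in an existing model to tackle a new problem.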

6.1 Transfer Learning Example

The following example performs transfer learning on the MNIST dataset:

import numpy as np
np.random.seed(2017)  # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
batch_size = 128
nb_classes = 5
nb_epoch = 5
# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = 2
# convolution kernel size
kernel_size = 3
input_shape = (img_rows, img_cols, 1)
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# create two datasets one with digits below 5 and one with 5 and above
X_train_lt5 = X_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
X_test_lt5 = X_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]
X_train_gte5 = X_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5  # make classes start at 0 for
X_test_gte5 = X_test[y_test >= 5]         # np_utils.to_categorical
y_test_gte5 = y_test[y_test >= 5] - 5

# Helper: reshape, normalize and one-hot encode the data, then compile, train and evaluate the model
def train_model(model, train, test, nb_classes):
    X_train = train[0].reshape((train[0].shape[0],) + input_shape)
    X_test = test[0].reshape((test[0].shape[0],) + input_shape)
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255
    print('X_train shape:', X_train.shape)
    print(X_train.shape[0], 'train samples')
    print(X_test.shape[0], 'test samples')
    # convert class vectors to binary class matrices
    Y_train = np_utils.to_categorical(train[1], nb_classes)
    Y_test = np_utils.to_categorical(test[1], nb_classes)
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    model.fit(X_train, Y_train, 
              batch_size=batch_size, epochs=nb_epoch,
              verbose=1,
              validation_data=(X_test, Y_test))
    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

# define two groups of layers: feature (convolutions) and classification (dense)
feature_layers = [
    Conv2D(nb_filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(nb_filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=(pool_size, pool_size)),
    Dropout(0.25),
    Flatten(),
]
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(nb_classes),
    Activation('softmax')
]
# create complete model
model = Sequential(feature_layers + classification_layers)
# train model for 5-digit classification [0..4]
train_model(model, (X_train_lt5, y_train_lt5), (X_test_lt5, y_test_lt5), 
nb_classes)

# Transfer existing trained model on 0 to 4 to build model for digits 5 to 9
# freeze feature layers and rebuild model
for layer in feature_layers:
    layer.trainable = False
# transfer: train dense layers for new classification task [5..9]
train_model(model, (X_train_gte5, y_train_gte5), (X_test_gte5,  
y_test_gte5), nb_classes)

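This example first trains a simple CNN on MNIST to classify the digits 0-4, then applies transfer learning: the feature layers are frozen and only the fully connected layers are fine-tuned to classify the digits 5-9. Reusing the existing model's knowledge this way can make training on the new task faster and more effective.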
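The effect of `layer.trainable = False` can be pictured with a NumPy analogue: a frozen feature layer whose weights are never updated, followed by a small logistic-regression head that is the only part trained on the new task. As assumptions for illustration, the "pretrained" feature weights are simply random, the new task is a synthetic binary problem, and the head is trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature layer (assumption: fixed random weights), kept frozen
W_feat = rng.normal(size=(20, 16))
W_feat_before = W_feat.copy()

def features(X):
    return np.maximum(0.0, X @ W_feat)      # ReLU(X W_feat); W_feat is never updated

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# New task: synthetic binary labels that depend on the inputs
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Frozen features are computed once; scale them for stable gradient steps
Phi = features(X)
Phi = Phi / np.maximum(Phi.std(axis=0), 1e-8)

# Trainable classification head: logistic regression on the frozen features
w, b, lr = np.zeros(16), 0.0, 0.1
loss_before = bce(sigmoid(Phi @ w + b), y)
for _ in range(500):
    grad = sigmoid(Phi @ w + b) - y         # d(BCE)/d(logit) per sample
    w -= lr * (Phi.T @ grad) / len(y)       # only the head parameters change
    b -= lr * grad.mean()
loss_after = bce(sigmoid(Phi @ w + b), y)
print(f"head loss: {loss_before:.3f} -> {loss_after:.3f}")
```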

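6.2 Summary of the Transfer Learning Steps

Data preparation:
- Load the MNIST dataset.
- Split it into two parts: one with the digits 0-4 and one with the digits 5-9 (whose labels are shifted down by 5).
- Preprocess the data: reshape it, convert the data type, and normalize it.
Model construction:
- Define the feature layers and the classification layers.
- Create the complete model.
Training the initial model:
- Compile the model, setting the loss function, optimizer, and evaluation metric.
- Train the model on the digits 0-4 data and evaluate its performance on the test set.
Transfer learning:
- Freeze the feature layers so their parameters are not updated in subsequent training.
- Recompile the model (Keras applies a changed trainable flag only after recompiling), setting the loss function, optimizer, and evaluation metric.
- Train on the digits 5-9 data, which updates only the classification layers' parameters, and evaluate its performance on the test set.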
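The data-preparation step can be reproduced with NumPy boolean masks on a tiny synthetic label array (an assumption standing in for MNIST). Shifting the labels 5-9 down by 5 lets both tasks share the same 5-unit softmax output and the same one-hot encoding.

```python
import numpy as np

# Synthetic stand-in for MNIST (assumption): 12 labels and blank 28x28 images
y_train = np.array([0, 5, 3, 9, 4, 7, 1, 6, 2, 8, 5, 0])
X_train = np.zeros((len(y_train), 28, 28))

lt5 = y_train < 5                          # boolean mask for digits 0-4
X_train_lt5, y_train_lt5 = X_train[lt5], y_train[lt5]
X_train_gte5 = X_train[~lt5]
y_train_gte5 = y_train[~lt5] - 5           # map 5-9 to 0-4

# Both subsets now fit a 5-class one-hot encoding
Y_lt5 = np.eye(5)[y_train_lt5]
Y_gte5 = np.eye(5)[y_train_gte5]
print(y_train_lt5, y_train_gte5)
```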

graph LR
    A[Data preparation] --> B[Model construction]
    B --> C[Train initial model]
    C --> D[Transfer learning]

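7. Summary

This article covered several deep learning models and techniques: image denoising with autoencoders, convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and transfer learning.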

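7.1 Image Denoising with Autoencoders

By extracting features in a compressed hidden layer, an autoencoder can denoise images effectively. The steps are: introduce noise, define a visualization function, train the model, and predict the denoised images. The model can be tuned to make the denoised images clearer.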

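7.2 CNN

CNNs excel at image classification: their architecture makes the forward function more efficient and reduces the number of network parameters. Using the CIFAR-10 and MNIST datasets as examples, we built and trained CNNs and visualized their layers. A CNN consists mainly of input, convolutional, pooling, and fully connected layers, each playing a different role in processing image data.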

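7.3 RNN and LSTM

Through its feedback loop, an RNN overcomes the MLP's weakness at modeling sequential events, but it uses a lot of memory and struggles with long-term temporal dependencies. LSTM, an improved RNN architecture, uses gating units to address the long-term dependency problem and capture long-range dependencies, which makes it well suited to sequence data.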

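7.4 Transfer Learning

Transfer learning reuses an existing model's knowledge to solve a new problem, which can make training on the new task faster and more effective. The MNIST example first trained a simple CNN to classify the digits 0-4, then froze the feature layers and fine-tuned the fully connected layers to classify the digits 5-9.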

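Studying and practicing these models and techniques gives a better understanding of how deep learning is applied in different fields and offers more ideas and methods for solving real-world problems.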

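7.5 Comparison of Techniques

| Technique | Typical use | Strengths | Weaknesses |
| ---- | ---- | ---- | ---- |
| Autoencoder image denoising | Image denoising | Extracts features in a compressed hidden layer to reconstruct images | May need model tuning to improve the denoising quality |
| CNN | Image classification | Efficient forward function, fewer network parameters | Demands substantial hardware resources |
| RNN | Sequence data processing | Remembers past information for sequence prediction | High memory usage; struggles with long-term dependencies |
| LSTM | Sequence data processing | Addresses the RNN's long-term dependency problem | Higher model complexity |
| Transfer learning | Quickly solving related problems | Reuses existing knowledge to improve training efficiency and performance | Requires a suitable existing model |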