Optimizing and Visualizing Deep Learning Models in Practice
1. Grid Search Initialization and Parallel Training
In deep learning, we can initialize and start a hyperparameter search with Hyperopt and print the final result, as follows:
trials = Trials()
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=50,
trials=trials)
print(best)
To train on multiple GPUs in parallel, you need MongoDB to create a database that handles the asynchronous updates.
2. Learning Rates and Learning Rate Schedulers
Using a small learning rate helps avoid local optima, but convergence usually takes longer. A warm-up period can shorten training time: use a larger learning rate during the first few epochs and lower it after a set number of epochs. Decreasing the learning rate after every single step is not advisable, however; in that case a different optimizer may work better (for example, an optimizer whose decay is specified as a hyperparameter). In theory, if the warm-up learning rate is too large, the model may never reach the global optimum.
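As a concrete illustration, a warm-up schedule can be sketched as a plain Python function. The base rate, warm-up length, and decay factor below are illustrative assumptions, not values taken from this recipe:

```python
def warmup_schedule(epoch, base_lr=0.1, warmup_epochs=5,
                    decay=0.5, decay_every=10):
    """Ramp up linearly to base_lr during warm-up, then step-decay."""
    if epoch < warmup_epochs:
        # warm-up: scale the learning rate up over the first epochs
        return base_lr * (epoch + 1) / warmup_epochs
    # afterwards: halve the rate every `decay_every` epochs
    return base_lr * decay ** ((epoch - warmup_epochs) // decay_every)

print(warmup_schedule(0))   # small rate at the start of warm-up
print(warmup_schedule(4))   # full base rate at the end of warm-up
print(warmup_schedule(15))  # one decay step later
```

Such a function can be handed directly to a scheduler callback, which queries it once per epoch.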
The following steps set up a custom learning rate scheduler in Keras:
1. Import the required libraries:
import math
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint, LearningRateScheduler, Callback
from keras import backend as K
from keras.datasets import cifar10
2. Load and preprocess the data:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
validation_split = 0.1
SEED = 42  # any fixed random seed for reproducibility
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
    test_size=validation_split, random_state=SEED)
X_train = X_train.astype('float32')
X_train /= 255.
X_val = X_val.astype('float32')
X_val /= 255.
X_test = X_test.astype('float32')
X_test /= 255.
n_classes = 10
y_train = to_categorical(y_train, n_classes)
y_val = to_categorical(y_val, n_classes)
y_test = to_categorical(y_test, n_classes)
3. Set the learning rate schedule (use floats, since these values are assigned directly to the optimizer's learning rate):
learning_rate_schedule = {0: 0.1, 10: 0.01, 25: 0.0025}
class get_learning_rate(Callback):
def on_epoch_end(self, epoch, logs={}):
optimizer = self.model.optimizer
if epoch in learning_rate_schedule:
K.set_value(optimizer.lr,
learning_rate_schedule[epoch])
lr = K.eval(optimizer.lr)
print('\nlr: {:.4f}'.format(lr))
Besides custom callbacks, Keras also provides the LearningRateScheduler and ReduceLROnPlateau callbacks, which implement epoch-dependent learning rate schemes, or reduce the learning rate when a monitored loss or metric plateaus.
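The dict-based schedule used in the callback above can equally be written as a plain function suitable for LearningRateScheduler (a sketch; the function name is ours):

```python
learning_rate_schedule = {0: 0.1, 10: 0.01, 25: 0.0025}

def lr_for_epoch(epoch, schedule=learning_rate_schedule):
    """Return the most recent rate whose starting epoch is <= epoch."""
    start_epochs = [e for e in sorted(schedule) if e <= epoch]
    return schedule[start_epochs[-1]]

# LearningRateScheduler(lr_for_epoch) would then apply this every epoch
```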
4. Add the custom callback to the list of callbacks:
callbacks =[EarlyStopping(monitor='val_acc', patience=5,
verbose=2),
ModelCheckpoint('checkpoints/{epoch:02d}.h5',
save_best_only=True),
TensorBoard('~/notebooks/logs-lrscheduler',
write_graph=True, write_grads=True,
write_images=True, embeddings_freq=0,
embeddings_layer_names=None,
embeddings_metadata=None),
get_learning_rate()
]
5. Define and compile the model:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
optimizer = SGD()
model.compile(loss='categorical_crossentropy',
optimizer=optimizer, metrics=['accuracy'])
6. Start training:
n_epochs = 20
batch_size = 128
history = model.fit(X_train, y_train, epochs=n_epochs,
batch_size=batch_size,
validation_data=[X_val, y_val],
verbose = 1, callbacks=callbacks)
3. Comparing Optimizers
The most commonly used optimizer in deep learning is stochastic gradient descent (SGD). Other optimizers are variants of SGD that add heuristics to speed up convergence, and some need fewer hyperparameters to tune. The choice of optimizer depends largely on the user's ability to tune it; there is no single ideal solution for every problem, but some optimizers have few parameters and outperform others in their default settings.
The following steps compare different optimizers:
1. Load the libraries:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.wrappers.scikit_learn import KerasRegressor
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.optimizers import SGD, Adadelta, Adam, RMSprop, Adagrad, Nadam, Adamax
from keras.utils import to_categorical
from keras.datasets import cifar10
2. Create the training, validation, and test sets and preprocess them:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
validation_split = 0.1
SEED = 42  # any fixed random seed for reproducibility
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
    test_size=validation_split, random_state=SEED)
X_train = X_train.astype('float32')
X_train /= 255.
X_val = X_val.astype('float32')
X_val /= 255.
X_test = X_test.astype('float32')
X_test /= 255.
n_classes = 10
y_train = to_categorical(y_train, n_classes)
y_val = to_categorical(y_val, n_classes)
y_test = to_categorical(y_test, n_classes)
3. Define a function that creates the model:
def create_model(opt):
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
return model
4. Define the callbacks to use during training:
def create_callbacks(opt):
callbacks = [EarlyStopping(monitor='val_acc', patience=5,
                 verbose=2),
             ModelCheckpoint('checkpoints/weights.{epoch:02d}-'+opt+'.h5',
                 save_best_only=False, verbose=True),
             TensorBoard()]
return callbacks
5. Create a dict of the optimizers to try:
opts = dict({
'sgd': SGD(),
'sgd-0001': SGD(lr=0.0001, decay=0.00001),
'adam': Adam(),
'adam-0001': Adam(lr=0.0001),
'adadelta': Adadelta(),
'rmsprop': RMSprop(),
'rmsprop-0001': RMSprop(lr=0.0001),
'nadam': Nadam(),
'adamax': Adamax()
})
You can also use Hyperopt to run the different optimizers.
6. Train the networks and store the results:
n_epochs = 1000
batch_size = 128
results = []
# Loop through the optimizers
for opt in opts:
model = create_model(opt)
callbacks = create_callbacks(opt)
model.compile(loss='categorical_crossentropy',
optimizer=opts[opt], metrics=['accuracy'])
hist = model.fit(X_train, y_train, batch_size=batch_size,
epochs=n_epochs,
validation_data=(X_val, y_val),
verbose=1,
callbacks=callbacks)
best_epoch = np.argmax(hist.history['val_acc'])
best_acc = hist.history['val_acc'][best_epoch]
best_model = create_model(opt)
# Load the model weights with the highest validation accuracy
best_model.load_weights('checkpoints/weights.{:02d}-{}.h5'.format(best_epoch, opt))
best_model.compile(loss='categorical_crossentropy', optimizer=opts[opt],
                   metrics=['accuracy'])
score = best_model.evaluate(X_test, y_test, verbose=0)
results.append([opt, best_epoch, best_acc, score[1]])
7. Compare the results:
res = pd.DataFrame(results)
res.columns = ['optimizer', 'epochs', 'val_accuracy', 'test_accuracy']
res
The training results of the different optimizers look as follows (assuming the code above has been run):
| optimizer | epochs | val_accuracy | test_accuracy |
| --- | --- | --- | --- |
| sgd | … | … | … |
| sgd-0001 | … | … | … |
| adam | … | … | … |
| … | … | … | … |
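The optimizers compared above differ mainly in their update rule. A minimal sketch of vanilla SGD and SGD with momentum minimizing a one-dimensional quadratic (our own toy example, not part of the recipe) shows the difference:

```python
def grad(x):
    # gradient of the toy objective f(x) = x**2
    return 2.0 * x

lr = 0.1

# vanilla SGD: step directly against the gradient
x = 5.0
for _ in range(300):
    x -= lr * grad(x)

# SGD with momentum: accumulate a velocity term and step along it
x_m, v, momentum = 5.0, 0.0, 0.9
for _ in range(300):
    v = momentum * v - lr * grad(x_m)
    x_m += v

print(abs(x), abs(x_m))  # both end up near the minimum at 0
```

Adaptive methods such as Adam or RMSprop additionally rescale each step by running statistics of past gradients, which is why their defaults often work with little tuning.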
4. Determining Network Depth
When building a deep learning model from scratch, it is hard to decide up front how many (different kinds of) layers to stack. It is often a good idea to look at a well-known deep learning model and build on top of it. In general, first let the model overfit the training data as much as possible, to make sure it can actually learn from the input data, and then apply regularization techniques such as dropout to prevent overfitting and promote generalization.
5. Adding Dropout to Prevent Overfitting
Adding dropout is a popular way to prevent overfitting in neural networks. The following steps use the Cifar10 dataset to demonstrate the difference in performance with and without dropout:
1. Import all libraries:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint
import numpy as np
from matplotlib import pyplot as plt
from keras.datasets import cifar10
2. Load the Cifar10 dataset and preprocess it:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
validation_split = 0.1
SEED = 42  # any fixed random seed for reproducibility
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
    test_size=validation_split, random_state=SEED)
X_train = X_train.astype('float32')
X_train /= 255.
X_val = X_val.astype('float32')
X_val /= 255.
X_test = X_test.astype('float32')
X_test /= 255.
n_classes = 10
y_train = to_categorical(y_train, n_classes)
y_val = to_categorical(y_val, n_classes)
y_test = to_categorical(y_test, n_classes)
3. Add callbacks to monitor training and prevent overfitting:
callbacks =[EarlyStopping(monitor='val_acc', patience=5,
verbose=2),
ModelCheckpoint('checkpoints/{epoch:02d}.h5',
save_best_only=True),
TensorBoard('~/notebooks/logs-lrscheduler',
write_graph=True, write_grads=True, write_images=True,
embeddings_freq=0, embeddings_layer_names=None,
embeddings_metadata=None),
]
4. Define the model architecture and compile the model:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
# model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
optimizer = Adam()
model.compile(loss='categorical_crossentropy',
optimizer=optimizer, metrics=['accuracy'])
5. Start training:
n_epochs = 1000
batch_size = 128
history = model.fit(X_train, y_train, epochs=n_epochs,
batch_size=batch_size,
validation_data=[X_val, y_val],
verbose = 1, callbacks=callbacks)
6. Add dropout to the model architecture:
model_dropout = Sequential()
model_dropout.add(Conv2D(32, (3, 3), padding='same',
input_shape=X_train.shape[1:]))
model_dropout.add(Activation('relu'))
model_dropout.add(Conv2D(32, (3, 3)))
model_dropout.add(Activation('relu'))
model_dropout.add(MaxPooling2D(pool_size=(2, 2)))
model_dropout.add(Dropout(0.25))
model_dropout.add(Conv2D(64, (3, 3), padding='same'))
model_dropout.add(Activation('relu'))
model_dropout.add(Conv2D(64, (3, 3)))
model_dropout.add(Activation('relu'))
model_dropout.add(MaxPooling2D(pool_size=(2, 2)))
model_dropout.add(Dropout(0.25))
model_dropout.add(Flatten())
model_dropout.add(Dense(512))
model_dropout.add(Activation('relu'))
model_dropout.add(Dropout(0.5))
model_dropout.add(Dense(n_classes))
model_dropout.add(Activation('softmax'))
optimizer = Adam()
model_dropout.compile(loss='categorical_crossentropy',
optimizer=optimizer, metrics=['accuracy'])
7. Start training again:
n_epochs = 1000
batch_size = 128
history_dropout = model_dropout.fit(X_train, y_train,
epochs=n_epochs, batch_size=batch_size,
validation_data=[X_val, y_val],
verbose = 1, callbacks=callbacks)
8. Plot the training and validation accuracy of the model without dropout:
plt.plot(np.arange(len(history.history['acc'])),
history.history['acc'], label='training')
plt.plot(np.arange(len(history.history['val_acc'])),
history.history['val_acc'], label='validation')
plt.title('Accuracy of model without dropouts')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend(loc=0)
plt.show()
The plot shows that the model without dropout clearly overfits the training data.
9. Plot the training and validation accuracy of the model with dropout:
plt.plot(np.arange(len(history_dropout.history['acc'])),
history_dropout.history['acc'], label='training')
plt.plot(np.arange(len(history_dropout.history['val_acc'])),
history_dropout.history['val_acc'], label='validation')
plt.title('Accuracy of model with dropouts')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend(loc=0)
plt.show()
As you can see, with dropout the model generalizes better: it overfits the training data less, although there is still room for improvement.
Here is the workflow with and without dropout, side by side:
graph LR
A[Import libraries] --> B[Load and preprocess data]
B --> C[Define callbacks]
C --> D[Define and compile model without dropout]
D --> E[Train model without dropout]
C --> F[Define and compile model with dropout]
F --> G[Train model with dropout]
E --> H[Plot accuracy without dropout]
G --> I[Plot accuracy with dropout]
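What Dropout(0.25) does during training can be sketched in plain Python: each activation is zeroed with probability p and the survivors are scaled by 1/(1-p) (inverted dropout, as Keras uses), so that at test time the layer is simply the identity. A stdlib-only sketch, with a list standing in for a tensor:

```python
import random

def dropout(values, p=0.25, training=True):
    """Inverted dropout on a list of activations."""
    if not training:
        return list(values)  # identity at test time
    keep = 1.0 - p
    # zero each value with probability p, scale survivors by 1/keep
    return [v / keep if random.random() < keep else 0.0 for v in values]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.25)
```

The scaling keeps the expected activation magnitude the same in training and test mode.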
6. Making Models More Robust with Data Augmentation
In computer vision tasks, adding data augmentation is a popular way to improve network performance. Augmenting the data during training increases the size of the training set and makes the model more robust to small variations in the training data. The following steps use Keras's ImageDataGenerator for data augmentation:
1. Import all libraries:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint
from keras.utils import to_categorical
from keras.datasets import cifar10
2. Load and preprocess the training and validation data:
(X_train, y_train), (X_val, y_val) = cifar10.load_data()
X_train = X_train.astype('float32')/255.
X_val = X_val.astype('float32')/255.
n_classes = 10
y_train = to_categorical(y_train, n_classes)
y_val = to_categorical(y_val, n_classes)
3. Define the model architecture:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
4. Define the Adam optimizer and compile the model:
opt = Adam(lr=0.0001)
model.compile(loss='categorical_crossentropy',
optimizer=opt, metrics=['accuracy'])
5. Set up image augmentation with ImageDataGenerator:
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.15,
height_shift_range=0.15,
horizontal_flip=True,
vertical_flip=False)
datagen.fit(X_train)
6. Set the callbacks (batch_size is defined here because it is used in the checkpoint filename):
batch_size = 128
callbacks = [EarlyStopping(monitor='val_acc', patience=5, verbose=2),
             ModelCheckpoint('checkpoints/weights.{epoch:02d}-'+str(batch_size)+'.hdf5',
                 save_best_only=True),
             TensorBoard('~/notebooks/logs-lrscheduler',
                 write_graph=True, write_grads=True, write_images=True,
                 embeddings_freq=0, embeddings_layer_names=None,
                 embeddings_metadata=None)
            ]
7. Start training the model:
batch_size = 128
n_epochs = 1000
history = model.fit_generator(datagen.flow(X_train, y_train,
batch_size=batch_size),
steps_per_epoch=X_train.shape[0] // batch_size,
epochs=n_epochs,
validation_data=(X_val, y_val),
callbacks=callbacks
)
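The horizontal_flip option above mirrors an image along its width. The core operation can be sketched with a nested-list "image" (our own toy data):

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) along its width."""
    return [row[::-1] for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
flipped = horizontal_flip(img)  # columns reversed in every row
```

ImageDataGenerator applies such transforms randomly to each batch, so the model rarely sees the exact same input twice.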
7. Improving Accuracy with Test-Time Augmentation (TTA)
Data augmentation at training time is a well-known and widely used technique, while test-time augmentation (TTA) is less familiar. With TTA, you present different views of the same image (for example, flipped or slightly rotated versions) to the trained model, so that it makes more accurate predictions. TTA can be seen as an ensemble of multiple models: you can average the probabilities, or use another ensembling technique to combine the individual predictions.
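The combination step can be sketched with probability averaging. Assume each view of the same image yields a class-probability vector; the ensemble prediction averages them elementwise (toy numbers, ours):

```python
def tta_average(predictions):
    """Average class-probability vectors predicted for different views."""
    n = len(predictions)
    return [sum(p[i] for p in predictions) / n
            for i in range(len(predictions[0]))]

# probabilities for the original, a flipped, and a slightly rotated view
views = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.8, 0.1, 0.1]]
avg = tta_average(views)  # still a valid probability vector
```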
8. Visualizing the Training Process
8.1 Visualizing Training with TensorBoard
You can visualize the TensorFlow training process with TensorBoard. The following steps use TensorBoard while classifying Fashion-MNIST:
1. Import TensorFlow and the utilities for loading MNIST-format datasets:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
2. Point to and load the Fashion-MNIST dataset:
mnist = input_data.read_data_sets('Data/fashion', one_hot=True)
3. Create placeholders for the input data:
n_classes = 10
input_size = 784
x = tf.placeholder(tf.float32, shape=[None, input_size])
y = tf.placeholder(tf.float32, shape=[None, n_classes])
4. Define a function that creates and initializes the weights:
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
5. Define a function that creates and initializes the biases:
def bias_variable(shape):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
6. Define functions for the convolutional and max-pooling layers:
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')
7. Define the full network architecture:
W_conv1 = weight_variable([7, 7, 1, 100])
b_conv1 = bias_variable([100])
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([4, 4, 100, 150])
b_conv2 = bias_variable([150])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
W_conv3 = weight_variable([4, 4, 150, 250])
b_conv3 = bias_variable([250])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3)
W_fc1 = weight_variable([4 * 4 * 250, 300])
b_fc1 = bias_variable([300])
h_pool3_flat = tf.reshape(h_pool3, [-1, 4*4*250])
h_fc1 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([300, n_classes])
b_fc2 = bias_variable([n_classes])
y_pred = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
8. Extract the cross-entropy:
with tf.name_scope('cross_entropy'):
diff = tf.nn.softmax_cross_entropy_with_logits(labels=y,
logits=y_pred)
with tf.name_scope('total'):
cross_entropy = tf.reduce_mean(diff)
tf.summary.scalar('cross_entropy', cross_entropy)
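softmax_cross_entropy_with_logits combines a softmax over the logits with the cross-entropy against the labels. Numerically it reduces to the following stdlib-only sketch (our own helper names):

```python
import math

def softmax(logits):
    """Softmax, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Negative log probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_class])

loss = cross_entropy([2.0, 1.0, 0.1], 0)
```

A confident, correct prediction drives the loss toward zero; a confident, wrong one makes it large.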
9. Train with AdamOptimizer and a learning rate of 0.001:
learning_rate = 0.001
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
10. Extract the accuracy:
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(tf.argmax(y_pred, 1),
tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,
tf.float32))
tf.summary.scalar('accuracy', accuracy)
11. Create an interactive TensorFlow session:
sess = tf.InteractiveSession()
12. Set up the summary writers for TensorBoard:
log_dir = 'tensorboard-example'
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(log_dir + '/train',
sess.graph)
val_writer = tf.summary.FileWriter(log_dir + '/val')
13. Define the hyperparameters before training:
n_steps = 1000
batch_size = 128
dropout = 0.25
evaluate_every = 10
14. Start training:
tf.global_variables_initializer().run()
for i in range(n_steps):
x_batch, y_batch = mnist.train.next_batch(batch_size)
summary, _, train_acc = sess.run([merged, train_step,
accuracy], feed_dict={x: x_batch, y: y_batch, keep_prob:
dropout})
train_writer.add_summary(summary, i)
if i % evaluate_every == 0:
summary, val_acc = sess.run([merged, accuracy],
feed_dict={x: mnist.test.images, y: mnist.test.labels,
keep_prob: 1.0})
val_writer.add_summary(summary, i)
print('Step {:04.0f}: train_acc: {:.4f}; val_acc: {:.4f}'.format(i, train_acc, val_acc))
train_writer.close()
val_writer.close()
15. Connect to TensorBoard:
Open a new terminal window. If you are logged in to a server, set up a new connection to the server with SSH tunneling on a different port; for example, with GCP:
gcloud compute ssh --ssh-flag="-L 6006:localhost:6006" --zone
"instance-zone" "instance-name"
After logging in, start the TensorBoard connection, pointing at the log directory used above:
tensorboard --logdir='tensorboard-example'
The terminal will output the location where you can reach TensorBoard, such as http://instance-name:6006. Because SSH tunneling is enabled, you can open the dashboard locally in your browser at http://localhost:6006/ and track the training progress of your model there.
The TensorBoard visualization workflow looks like this:
graph LR
A[Import libraries] --> B[Load dataset]
B --> C[Create placeholders]
C --> D[Define weight and bias functions]
D --> E[Define convolution and pooling functions]
E --> F[Define network architecture]
F --> G[Extract cross-entropy and accuracy]
G --> H[Create session and summary writers]
H --> I[Define hyperparameters]
I --> J[Start training]
J --> K[Connect to TensorBoard]
With the methods above, you can optimize deep learning models, prevent overfitting, make models more robust, and visualize training, improving both the performance and the interpretability of your models.
8.2 Analyzing Network Weights and More
In deep learning, analyzing the network weights helps us understand how the model learns and what happens inside it. The specific steps are not detailed here, but the general approach is straightforward. For example, we can look at the distribution of the weights, which tells us how strongly each layer is learning; if the weights of a layer are tightly concentrated, that layer may have limited learning capacity, or its input features may have a very uniform influence.
We can also watch how the weights evolve. During training the weights are updated at every iteration, and by recording them at different stages we can see how they move from their initial state toward an optimum. This is very helpful for spotting problems such as vanishing or exploding gradients.
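For example, given the flattened weights of one layer, a few summary statistics already reveal how concentrated the distribution is (a sketch with made-up weights; real values would come from model.get_weights()):

```python
import statistics

def weight_summary(weights):
    """Mean, standard deviation, and range of a layer's weights."""
    return {'mean': statistics.mean(weights),
            'std': statistics.pstdev(weights),
            'range': max(weights) - min(weights)}

layer_weights = [-0.2, -0.1, 0.0, 0.1, 0.2]  # made-up example values
summary = weight_summary(layer_weights)
```

Comparing these numbers across layers, or across training checkpoints, makes shrinking (vanishing) or blowing-up (exploding) weights easy to spot.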
8.3 Freezing Layers
In some cases we want to freeze certain layers of a model, that is, fix their weights so they do not change during training. This is common in transfer learning: when using a pretrained model, we often fine-tune only some of its layers and keep the weights of the others unchanged.
The steps for freezing layers are as follows:
1. Load a pretrained model:
from keras.applications import VGG16
base_model = VGG16(weights='imagenet', include_top=False)
2. Freeze some of the layers:
for layer in base_model.layers[:10]:
layer.trainable = False
In the code above, we load the pretrained VGG16 model and freeze the weights of its first 10 layers.
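Freezing simply flips each layer's trainable flag, which determines which parameters the optimizer updates. A toy sketch of counting trainable parameters before and after freezing (our own layer records, not the Keras API):

```python
# hypothetical layer records: name, parameter count, trainable flag
layers = [{'name': 'block%d' % i, 'params': 1000, 'trainable': True}
          for i in range(15)]

# freeze the first 10 layers, as in the VGG16 snippet above
for layer in layers[:10]:
    layer['trainable'] = False

# only the remaining layers contribute trainable parameters
trainable_params = sum(l['params'] for l in layers if l['trainable'])
```

Fewer trainable parameters mean faster fine-tuning and less risk of destroying the pretrained features.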
3. Build the new model:
from keras.models import Model
from keras.layers import Dense, Flatten
x = base_model.output
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
4. Compile and train the model:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)
8.4 Storing the Network Topology and Trained Weights
After training a model, we usually want to store the network topology and the trained weights for later use or sharing.
Storing the network topology
Use the model.to_json() or model.to_yaml() method to save the network topology as a JSON or YAML file:
model_json = model.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
Storing the trained weights
Use the model.save_weights() method to save the trained weights to an HDF5 file:
model.save_weights("model_weights.h5")
Loading the model and weights
When you need the model again, load the topology and weights as follows:
from keras.models import model_from_json
# Load the network topology
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# Load the weights
loaded_model.load_weights("model_weights.h5")
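The topology side of this save/load cycle behaves like an ordinary JSON round trip; a stdlib-only sketch with a toy topology dict (the real string comes from model.to_json()):

```python
import json

# toy stand-in for the structure serialized by model.to_json()
topology = {'class_name': 'Sequential',
            'layers': [{'class_name': 'Dense', 'units': 512},
                       {'class_name': 'Dense', 'units': 10}]}

model_json = json.dumps(topology)  # what would be written to model.json
loaded = json.loads(model_json)    # what model_from_json would parse
```

Weights are stored separately (in HDF5) because they are large binary arrays, while the topology is small, human-readable text.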
9. Summary
This article covered optimizing, training, and visualizing deep learning models. The key techniques are summarized in the following table:
| Technique | Purpose | Key code example |
| --- | --- | --- |
| Grid search initialization | Find the best hyperparameter combination | `trials = Trials(); best = fmin(f_nn, space, algo=tpe.suggest, max_evals=50, trials=trials)` |
| Learning rate scheduler | Avoid local optima and shorten training time | `learning_rate_schedule = {0: 0.1, 10: 0.01, 25: 0.0025}; class get_learning_rate(Callback): ...` |
| Optimizer comparison | Pick the best-suited optimizer | `opts = dict({...}); for opt in opts: ...` |
| Adding dropout | Prevent overfitting | `model_dropout.add(Dropout(0.25)); ...` |
| Data augmentation | Make the model more robust | `datagen = ImageDataGenerator(...); datagen.fit(X_train)` |
| Test-time augmentation (TTA) | Improve accuracy | Present different views of each input to the trained model and combine the predictions |
| TensorBoard visualization | Monitor the training process | `merged = tf.summary.merge_all(); train_writer = tf.summary.FileWriter(log_dir + '/train', sess.graph); ...` |
| Analyzing network weights | Understand how the model learns | Inspect weight distributions and how they change |
| Freezing layers | Fix the weights of some layers | `for layer in base_model.layers[:10]: layer.trainable = False` |
| Storing topology and weights | Save the model for later use | `model_json = model.to_json(); model.save_weights("model_weights.h5")` |
With these techniques, you can improve the performance, interpretability, and generalization of deep learning models. In practice, choose and combine them flexibly according to the problem and dataset at hand.
Here is a mermaid flowchart of the full optimization and visualization workflow:
graph LR
A[Prepare data] --> B[Build model]
B --> C[Optimize hyperparameters]
C --> D[Train model]
D --> E[Prevent overfitting]
E --> F[Make model more robust]
F --> G[Improve accuracy]
G --> H[Visualize training]
H --> I[Analyze model internals]
I --> J[Store model]
B --> K[Freeze some layers]
K --> D
We hope this article gives deep learning enthusiasts and practitioners a useful reference for understanding and applying these techniques.