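Source: https://cloud.tencent.com/developer/article/1049579
How to Checkpoint Deep Learning Models in Keras
Application checkpointing is a fault-tolerance technique for long-running processes.
It takes a snapshot of the system state so that, if the system fails, not all progress is lost. A checkpoint can be used directly, or as the starting point for a new run that picks up where the previous one left off.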
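The general idea can be illustrated outside of deep learning. The following is a minimal sketch, not part of the original tutorial: a toy long-running job saves its state to a JSON file after every step and resumes from that file after a simulated crash. The function name run_with_checkpoint, the fail_at parameter, and the file name state.json are all hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoint(total_steps, path, fail_at=None):
    """Sum the integers 0..total_steps-1, saving progress after every step."""
    # Resume from the snapshot if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "total": 0}

    for step in range(state["next_step"], total_steps):
        if step == fail_at:
            # Hypothetical failure, used only to demonstrate recovery.
            raise RuntimeError("simulated crash at step %d" % step)
        state["total"] += step
        state["next_step"] = step + 1
        # Write to a temporary file first, then rename, so a crash during
        # the write cannot leave a half-written checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)
    return state["total"]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "state.json")
try:
    run_with_checkpoint(10, checkpoint_path, fail_at=6)
except RuntimeError as err:
    print(err)

# The second run picks up at step 6 instead of starting over.
result = run_with_checkpoint(10, checkpoint_path)
print(result)
```

The second call produces the same total as an uninterrupted run because only the steps after the last saved snapshot are repeated.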
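When training deep learning models, the checkpoint is the model's weights. These weights can be used as-is to make predictions, or as the basis for continued training.
The Keras library provides checkpointing through its callback API.
The ModelCheckpoint callback class lets you define where to checkpoint the model weights, how the file should be named, and under what circumstances to make a checkpoint of the model.
The API lets you specify which metric to monitor, such as loss or accuracy on the training or validation dataset. You can specify whether to look for an improvement by maximizing or minimizing the score. Finally, the filename used to store the weights can include variables such as the epoch number or the metric value.
The ModelCheckpoint can then be passed to the training process when calling the fit() function on the model.
Note that you may need to install the h5py library to output network weights in HDF5 format.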
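One way to find out before a long training run whether h5py is available is to ask Python's import machinery. This is a small sketch using only the standard library; the helper name is_installed is an assumption made for illustration, and the pip command in the message is just one common way to install the package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("h5py"):
    print("h5py found: HDF5 weight files can be written")
else:
    print("h5py missing: install it first, for example with 'pip install h5py'")
```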
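Checkpoint Neural Network Model Improvements
A good use of checkpointing is to output the model weights each time an improvement is observed during training.
The example below creates a small neural network for the Pima Indians onset of diabetes binary classification problem. You can download this dataset from the UCI Machine Learning Repository. The example uses 33% of the data for validation.
Checkpointing is set up to save the network weights only when classification accuracy on the validation dataset improves (monitor='val_acc' and mode='max'). The weights are stored in a file whose name includes the epoch number and the score (weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5). Keras 2.3 and later report this metric as val_accuracy when the model is compiled with metrics=['accuracy'], so on those versions use val_accuracy both in monitor and in the filename.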
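The placeholders in the filename use Python's format-string syntax. The sketch below fills in the template by hand with str.format to show what the resulting names look like; the epoch numbers and scores are illustrative values, and the variable names are assumptions.

```python
template = "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"

# {epoch:02d} pads the epoch number to at least two digits;
# {val_acc:.2f} rounds the score to two decimal places.
early = template.format(epoch=5, val_acc=0.7559)
late = template.format(epoch=140, val_acc=0.83858)

print(early)  # weights-improvement-05-0.76.hdf5
print(late)   # weights-improvement-140-0.84.hdf5
```

Because the score is rounded to two decimals, two different checkpoints can show the same score in their names; the epoch number keeps the filenames distinct.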
    # Checkpoint the weights when validation accuracy improves
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.callbacks import ModelCheckpoint
    import numpy
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load pima indians dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:, 0:8]
    Y = dataset[:, 8]
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # checkpoint: write a new file each time validation accuracy improves
    filepath = "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint]
    # Fit the model (verbose=0 hides the progress bars so only checkpoint messages are shown)
    model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, callbacks=callbacks_list, verbose=0)
Running the example produces the following output (truncated):
    ...
    Epoch 00134: val_acc did not improve
    Epoch 00135: val_acc did not improve
    Epoch 00136: val_acc did not improve
    Epoch 00137: val_acc did not improve
    Epoch 00138: val_acc did not improve
    Epoch 00139: val_acc did not improve
    Epoch 00140: val_acc improved from 0.83465 to 0.83858, saving model to weights-improvement-140-0.84.hdf5
    Epoch 00141: val_acc did not improve
    Epoch 00142: val_acc did not improve
    Epoch 00143: val_acc did not improve
    Epoch 00144: val_acc did not improve
    Epoch 00145: val_acc did not improve
    Epoch 00146: val_acc improved from 0.83858 to 0.84252, saving model to weights-improvement-146-0.84.hdf5
    Epoch 00147: val_acc did not improve
    Epoch 00148: val_acc improved from 0.84252 to 0.84252, saving model to weights-improvement-148-0.84.hdf5
    Epoch 00149: val_acc did not improve
You will see a number of files in your working directory containing the network weights in HDF5 format. For example:
    ...
    weights-improvement-53-0.76.hdf5
    weights-improvement-71-0.76.hdf5
    weights-improvement-77-0.78.hdf5
    weights-improvement-99-0.78.hdf5
This is a very simple checkpointing strategy. It may create a lot of unnecessary checkpoint files if the validation accuracy moves up and down over training epochs. Nevertheless, it ensures that you have a snapshot of the best model discovered during your run.
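Since each file is written only when the score improves, the file with the highest epoch number holds the best weights of the run, and the earlier files can be treated as stale. The following is a rough sketch of picking that file out of a list of names; the helper latest_checkpoint and the regular expression are assumptions written for the filename pattern used in this example, not part of Keras.

```python
import re

# Matches names such as weights-improvement-99-0.78.hdf5 and captures the epoch.
PATTERN = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")


def latest_checkpoint(filenames):
    """Return the matching filename with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = PATTERN.search(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


files = [
    "weights-improvement-53-0.76.hdf5",
    "weights-improvement-71-0.76.hdf5",
    "weights-improvement-77-0.78.hdf5",
    "weights-improvement-99-0.78.hdf5",
]
keep = latest_checkpoint(files)
stale = [name for name in files if name != keep]

print(keep)   # weights-improvement-99-0.78.hdf5
print(stale)
```

In a real working directory the list of names could come from os.listdir(), and the stale files could be deleted once the run has finished.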
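Checkpoint Best Neural Network Model Only
A simpler checkpoint strategy is to save the model weights to the same file, if and only if the validation accuracy improves.
This can be done with the same code as above by changing the output filename to a fixed name that does not include the score or the epoch number.
In this case, the model weights are written to the file "weights.best.hdf5" only if the classification accuracy of the model on the validation dataset improves over the best seen so far.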
    # Checkpoint the weights for best model on validation accuracy
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.callbacks import ModelCheckpoint
    import numpy
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load pima indians dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:, 0:8]
    Y = dataset[:, 8]
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # checkpoint: a fixed filename, so each improvement overwrites the previous best
    filepath = "weights.best.hdf5"
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
    callbacks_list = [checkpoint]
    # Fit the model
    model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, callbacks=callbacks_list, verbose=0)
Running the example produces the following output (truncated):
    ...
    Epoch 00139: val_acc improved from 0.79134 to 0.79134, saving model to weights.best.hdf5
    Epoch 00140: val_acc did not improve
    Epoch 00141: val_acc did not improve
    Epoch 00142: val_acc did not improve
    Epoch 00143: val_acc did not improve
    Epoch 00144: val_acc improved from 0.79134 to 0.79528, saving model to weights.best.hdf5
    Epoch 00145: val_acc improved from 0.79528 to 0.79528, saving model to weights.best.hdf5
    Epoch 00146: val_acc did not improve
    Epoch 00147: val_acc did not improve
    Epoch 00148: val_acc did not improve
    Epoch 00149: val_acc did not improve
You should see the weight file in your local directory:
    weights.best.hdf5
This is a handy checkpoint strategy to use routinely in your experiments. It ensures that your best model is saved for later use. It also saves you from writing code to manually track and serialize the best model during training.
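To make concrete what the callback tracks for you, here is a simplified sketch of a save-best-only rule written in plain Python. It is an illustration of the idea, not the Keras implementation; the function name epochs_that_save and the score lists are made up for the example.

```python
def epochs_that_save(scores, mode="max"):
    """Return the 1-based epochs at which a save-best-only rule would write."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    # Start from the worst possible value so the first epoch always saves.
    best = float("-inf") if mode == "max" else float("inf")
    saved = []
    for epoch, score in enumerate(scores, start=1):
        # Strict comparison: a score equal to the best so far is not an improvement.
        improved = score > best if mode == "max" else score < best
        if improved:
            best = score
            saved.append(epoch)
    return saved


# Made-up validation accuracies (higher is better).
acc_epochs = epochs_that_save([0.70, 0.74, 0.73, 0.74, 0.79, 0.78], mode="max")
# Made-up validation losses (lower is better).
loss_epochs = epochs_that_save([0.9, 0.7, 0.8, 0.6], mode="min")

print(acc_epochs)   # [1, 2, 5]
print(loss_epochs)  # [1, 2, 4]
```

With a fixed filename, each of these epochs would overwrite the previous file, so only the weights from the last listed epoch remain on disk.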
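Loading a Checkpointed Neural Network Model
Now that you have seen how to checkpoint your deep learning models during training, you need to review how to load and use a checkpointed model.
The checkpoint only includes the model weights. It assumes you know the network structure. The structure can also be serialized to a file in JSON or YAML format.
In the example below, the model structure is known, and the best weights are loaded from the previous experiment, stored in the working directory in the weights.best.hdf5 file.
The model is then used to make predictions on the entire dataset.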
    # How to load and use weights from a checkpoint
    from keras.models import Sequential
    from keras.layers import Dense
    import numpy
    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # create model (must match the structure used when the weights were saved)
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # load weights
    model.load_weights("weights.best.hdf5")
    # Compile model (required to make predictions)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    print("Created model and loaded weights from file")
    # load pima indians dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:, 0:8]
    Y = dataset[:, 8]
    # estimate accuracy on whole dataset using loaded weights
    scores = model.evaluate(X, Y, verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Running the example produces the following output:
    Created model and loaded weights from file
    acc: 77.73%