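Saving and loading a general checkpoint for inference or for resuming training lets you pick up where you last left off. When saving a general checkpoint, you must save more than just the model's state_dict. It is also important to save the optimizer's state_dict, because it contains buffers and parameters that are updated as the model trains. Other items you may want to save are the epoch you left off on, the latest recorded training loss, external torch.nn.Embedding layers, and more, depending on your own algorithm.
To save multiple checkpoints, organize them in a dictionary and use torch.save() to serialize the dictionary. A common PyTorch convention is to save these checkpoints with the .tar file extension. To load the items, first initialize the model and optimizer, then load the dictionary locally with torch.load(). From there, you can access the saved items by querying the dictionary. In this recipe, we will explore how to save and load multiple checkpoints.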
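To make the point about the optimizer's state concrete, the following minimal sketch inspects an optimizer's state_dict before and after a single update step. The tiny `nn.Linear` model and the random input are stand-ins chosen only for illustration; they are not part of the recipe.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in model (an assumption for illustration, not the recipe's Net).
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Before any step, the optimizer holds hyperparameters but no per-parameter state.
state_before = optimizer.state_dict()
print(list(state_before.keys()))
print(len(state_before["state"]))

# One update step on random data populates the momentum buffers.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# After the step there is one state entry per parameter (weight and bias).
state_after = optimizer.state_dict()
print(len(state_after["state"]))
```

If only the model's state_dict were saved, these momentum buffers would be lost, and a resumed run would restart the optimizer from scratch.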
Steps
- Import all necessary libraries for loading our data
- Define and initialize the neural network
- Initialize the optimizer
- Save the general checkpoint
- Load the general checkpoint
- Import necessary libraries for loading our data
For this recipe, we will use torch and its subsidiaries torch.nn and torch.optim.
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu in the network's forward pass
import torch.optim as optim
- Define and initialize the neural network
For the sake of example, we will create a neural network for training on images. To learn more, see the Defining a Neural Network recipe.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
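As a quick sanity check on the network above, the sketch below runs one forward pass on a dummy batch. The 3x32x32 input size is an assumption (it is the size for which the `16 * 5 * 5` flatten in `fc1` works out); the recipe itself does not state the input size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 32 -> 28 -> 14
        x = self.pool(F.relu(self.conv2(x)))  # 14 -> 10 -> 5
        x = x.view(-1, 16 * 5 * 5)            # flatten 16 channels of 5x5
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

net = Net()
dummy = torch.randn(1, 3, 32, 32)  # assumed input: one 3-channel 32x32 image
out = net(dummy)
print(out.shape)  # torch.Size([1, 10])
```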
- Initialize the optimizer
We will use SGD with momentum.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
- Save the general checkpoint
Collect all relevant information and build your dictionary.
# Additional information
EPOCH = 5
PATH = "model.pt"
LOSS = 0.4
torch.save({
    'epoch': EPOCH,
    'model_state_dict': net.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': LOSS,
}, PATH)
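In a real training run the same dictionary is typically written once per epoch rather than once with fixed values. The sketch below shows one possible way to do that. The `save_checkpoint` helper, the file naming scheme, the stand-in `nn.Linear` model, and the random training data are all hypothetical and chosen only for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

def save_checkpoint(model, optimizer, epoch, loss, directory):
    """Hypothetical helper: write one checkpoint file per epoch."""
    path = os.path.join(directory, f"checkpoint_epoch_{epoch}.tar")
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }, path)
    return path

# Stand-in model and optimizer (assumptions, not the recipe's Net).
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

checkpoint_dir = tempfile.mkdtemp()
saved_paths = []
for epoch in range(1, 4):
    # Placeholder training step on random data.
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
    # Record the epoch and latest loss alongside both state_dicts.
    saved_paths.append(
        save_checkpoint(model, optimizer, epoch, loss.item(), checkpoint_dir)
    )

print(saved_paths)
```

Keeping one file per epoch costs disk space but makes it possible to roll back to an earlier state; overwriting a single file is the simpler alternative.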
- Load the general checkpoint
Remember to first initialize the model and optimizer, then load the dictionary locally.
model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # bind to the new model, not the old net
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
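# For inference: put layers such as dropout and batch normalization in evaluation mode
model.eval()
# - or -
# To resume training: put the model back in training mode
model.train()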
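The loaded `epoch` value is what lets a training loop continue from the right place. The following sketch, under stated assumptions, saves a checkpoint at epoch 2 and then resumes from epoch 3 in a freshly built model. The `make_model_and_optimizer` helper, the stand-in `nn.Linear` model, the total of 4 epochs, and the use of `map_location="cpu"` are illustrative choices, not part of the recipe.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

def make_model_and_optimizer():
    """Hypothetical factory: build a fresh stand-in model and its optimizer."""
    model = nn.Linear(4, 2)
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    return model, optimizer

# First run: pretend training stopped after epoch 2, then save.
model, optimizer = make_model_and_optimizer()
path = os.path.join(tempfile.mkdtemp(), "checkpoint.tar")
torch.save({
    'epoch': 2,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': 0.4,
}, path)

# Second run: rebuild the objects first, then restore their state.
model, optimizer = make_model_and_optimizer()
checkpoint = torch.load(path, map_location="cpu")  # load tensors onto the CPU
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
restored_ok = torch.equal(model.weight, checkpoint['model_state_dict']['weight'])

# Continue with the epoch after the one that was saved.
start_epoch = checkpoint['epoch'] + 1
epochs_run = []
model.train()
for epoch in range(start_epoch, 5):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()  # placeholder training step
    loss.backward()
    optimizer.step()
    epochs_run.append(epoch)

print(restored_ok, epochs_run)
```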
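This post describes how to save and load model checkpoints in PyTorch in order to resume training or run inference. When saving a checkpoint, store the optimizer's state as well as the model's state_dict, along with information such as the epoch and the training loss. When loading, initialize the model and optimizer first, then load the dictionary with torch.load(). The saved file conventionally uses the .tar extension.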