SAVING AND LOADING A GENERAL CHECKPOINT IN PYTORCH

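This post explains how to save and load a model checkpoint in PyTorch so that you can resume training or run inference. When saving a checkpoint, you need to store more than the model's state_dict: the optimizer's state, the epoch, the training loss, and similar information should be saved as well. When loading, first initialize the model and optimizer, then load the dictionary with torch.load(). The saved file conventionally uses the .tar extension.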

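Saving and loading a general checkpoint, whether for inference or for resuming training, lets you pick up where you left off. When saving a general checkpoint, you must save more than just the model's state_dict. It is also important to save the optimizer's state_dict, because it contains buffers and parameters that are updated as the model trains. Other items you may want to save are the epoch you left off on, the latest recorded training loss, external torch.nn.Embedding layers, and more, depending on your own algorithm.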

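To save multiple components in one checkpoint, organize them in a dictionary and serialize that dictionary with torch.save(). A common PyTorch convention is to save these checkpoints with the .tar file extension. To load the items, first initialize the model and optimizer, then load the dictionary locally with torch.load(). From there, you can access the saved items by querying the dictionary. In this recipe, we will explore how to save and load a general checkpoint.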

Steps

  1. Import all necessary libraries for loading our data
  2. Define and initialize the neural network
  3. Initialize the optimizer
  4. Save the general checkpoint
  5. Load the general checkpoint

  1. Import necessary libraries for loading our data
    For this recipe, we will use torch and its submodules torch.nn, torch.nn.functional, and torch.optim.
import torch
import torch.nn as nn
import torch.nn.functional as F  # provides F.relu, used in Net.forward
import torch.optim as optim
  2. Define and initialize the neural network
    For the sake of example, we will create a neural network for training on images. To learn more, see the Defining a Neural Network recipe.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Two convolution layers that share one 2x2 max-pooling layer
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Three fully connected layers ending in 10 output scores
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # F is torch.nn.functional (import torch.nn.functional as F)
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # flatten each sample to 400 features
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
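As a quick sanity check on the layer sizes, the sketch below passes a random batch through the same network. It assumes 3-channel 32x32 inputs, which the text does not state but which is what the 16 * 5 * 5 input size of fc1 implies; the batch size of 4 and the name `dummy` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 6x14x14
        x = self.pool(F.relu(self.conv2(x)))  # 6x14x14 -> 16x5x5
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()
dummy = torch.randn(4, 3, 32, 32)  # assumed input size: batch of 4 RGB 32x32 images
with torch.no_grad():              # no gradients needed for a shape check
    out = net(dummy)
print(out.shape)                   # one row of 10 scores per image
```

If the input resolution were different, the flattening step would no longer produce 400 features per sample and fc1 would need to be resized accordingly.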
  3. Initialize the optimizer
    We will use SGD with momentum.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
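To illustrate why the optimizer's state_dict is worth saving, the following minimal sketch looks at what SGD with momentum stores before and after a single update. The model `tiny` is a hypothetical stand-in (a small nn.Linear) rather than the network above, and the input data is random.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
tiny = nn.Linear(4, 2)  # hypothetical stand-in for the real network
opt = optim.SGD(tiny.parameters(), lr=0.001, momentum=0.9)

# Before any update, the per-parameter state is empty.
state_before = len(opt.state_dict()["state"])

# One forward/backward/update step on random data.
loss = tiny(torch.randn(8, 4)).sum()
loss.backward()
opt.step()

# After the update, each parameter (weight and bias) has a momentum buffer.
sd = opt.state_dict()
state_after = len(sd["state"])
has_momentum = all("momentum_buffer" in s for s in sd["state"].values())

print(state_before, state_after, has_momentum)
print(sd["param_groups"][0]["lr"], sd["param_groups"][0]["momentum"])
```

The momentum buffers are not part of the model's state_dict, so a checkpoint that omits the optimizer would restart momentum from zero when training resumes.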
  4. Save the general checkpoint
    Collect all relevant information and build your dictionary.
# Additional information
EPOCH = 5
PATH = "model.pt"
LOSS = 0.4

torch.save({
            'epoch': EPOCH,
            'model_state_dict': net.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': LOSS,
            }, PATH)
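One way to confirm that a checkpoint was written as intended is to load it straight back and inspect its keys. The sketch below does this in a temporary directory and uses the .tar extension mentioned earlier; the file name `checkpoint.tar` and the small nn.Linear model are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)  # hypothetical stand-in for the real network
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.tar")  # assumed file name
    torch.save({
        "epoch": 5,
        "model_state_dict": net.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": 0.4,
    }, path)
    saved = torch.load(path)  # read back while the temporary file still exists

# The loaded object is an ordinary dictionary with the keys we stored.
keys = sorted(saved.keys())
weights_match = torch.equal(saved["model_state_dict"]["weight"], net.weight.detach())
print(keys, weights_match)
```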
  5. Load the general checkpoint
    Remember to first initialize the model and optimizer, then load the dictionary locally.
model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # optimize the newly created model's parameters

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

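model.eval()   # for inference: sets dropout and batch-norm layers to evaluation mode
# - or -
model.train()  # for resuming training: sets those layers back to training mode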
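The loaded epoch and optimizer state can then be used to continue training. The following is a minimal sketch of that round trip on synthetic data, not a full training script. The helper `run_epochs`, the small nn.Linear model, the MSE loss, and the convention that 'epoch' holds the number of completed epochs are all assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
inputs = torch.randn(16, 4)   # synthetic training batch
targets = torch.randn(16, 2)
criterion = nn.MSELoss()


def run_epochs(model, optimizer, start_epoch, end_epoch):
    """Hypothetical helper: train for epochs [start_epoch, end_epoch), return the last loss."""
    model.train()
    loss_value = None
    for _ in range(start_epoch, end_epoch):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        loss_value = loss.item()
    return loss_value


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.tar")

    # First session: train for 3 epochs, then save a general checkpoint.
    model = nn.Linear(4, 2)
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    last_loss = run_epochs(model, optimizer, 0, 3)
    torch.save({
        "epoch": 3,  # assumed convention: number of completed epochs
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": last_loss,
    }, path)

    # Second session: rebuild the objects first, then restore their state.
    resumed = nn.Linear(4, 2)
    resumed_opt = optim.SGD(resumed.parameters(), lr=0.001, momentum=0.9)
    checkpoint = torch.load(path)
    resumed.load_state_dict(checkpoint["model_state_dict"])
    resumed_opt.load_state_dict(checkpoint["optimizer_state_dict"])
    start_epoch = checkpoint["epoch"]

# The restored weights match the saved ones before training continues.
restored_ok = torch.equal(resumed.weight.detach(), model.weight.detach())

# Continue from the saved epoch up to a total of 5 epochs.
final_loss = run_epochs(resumed, resumed_opt, start_epoch, 5)
print(start_epoch, restored_ok, final_loss)
```

Whether 'epoch' stores the last finished epoch or the next one to run is a choice each training script has to make and apply consistently when saving and when resuming.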