The meaning of requires_grad for model parameters in PyTorch

Originally published 2023-07-06 17:56:03 · 12k views

Licensed under CC 4.0 BY-SA


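In PyTorch, updating the parameters of a layer such as nn.Linear requires two things: the parameters must have requires_grad=True, and they must be passed to the optimizer. requires_grad determines whether a gradient is kept for the parameter so that the optimizer can use it. The experiment below shows that a parameter is not updated when requires_grad=True but it is not registered with the optimizer, nor when requires_grad=False but it is registered. Only when both conditions hold does the parameter change.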

Conclusion
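To update the parameters of a particular layer (for example an nn.Linear) during training, two things must hold. First, the parameter's requires_grad attribute must be True, for example a_Linear.weight.requires_grad = True, where a_Linear is an nn.Linear layer. Second, the parameter must be included in a parameter group when the optimizer is constructed, that is, registered with the optimizer, for example optimizer = torch.optim.SGD(params=[{'params': a_Linear.parameters()}], lr=0.1). For the first point, code usually sets requires_grad to True or False explicitly for specific layers. For the second point, the usual pattern is to build the complete model first and then create the optimizer with either optimizer = torch.optim.SGD(params=[{'params': model.parameters()}], lr=0.1) or optimizer = torch.optim.SGD(params=[{'params': [p for p in model.parameters() if p.requires_grad]}], lr=0.1).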
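The two conditions can be sketched as follows. This is a minimal illustration on a hypothetical two-layer nn.Sequential model (not the model used in the experiment later in this post): the second layer is frozen, and the optimizer is built only from the parameters whose requires_grad is True.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only for illustration.
model = nn.Sequential(nn.Linear(1, 2), nn.Linear(2, 1))

# Condition 1: decide which parameters keep gradients.
# Here the second layer is frozen.
for p in model[1].parameters():
    p.requires_grad = False

# Condition 2: register only the trainable parameters with the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params=[{'params': trainable}], lr=0.1)

# Only the weight and bias of the first layer end up in the parameter group.
num_registered = len(optimizer.param_groups[0]['params'])
print(num_registered)
```

Filtering on p.requires_grad keeps the optimizer's parameter group in line with whatever was frozen earlier, so the two conditions cannot drift apart.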
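requires_grad states whether a gradient is kept for this parameter: whether the gradient computed for it during the backward pass is stored in memory (in the parameter's .grad attribute) rather than being freed immediately, so that it is available when the optimizer runs optim.step() to update the parameter.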
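When requires_grad = False, no gradient is kept, so even if the parameter is registered with the optimizer there is no gradient to update it with, and the parameter stays unchanged. This does not stop gradients from propagating further back, however: whether the parameters of some layer (say the third) have requires_grad set to False or True, the gradients of the parameters in the earlier layers (the first or second) are the same.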
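A small check of this claim, written as a sketch under stated assumptions: the helper first_layer_grad and the two-layer setup are hypothetical and introduced here only for illustration. The same network is built twice from the same seed, once with the second layer frozen and once without, and the gradient of the first layer is compared.

```python
import torch
import torch.nn as nn

def first_layer_grad(freeze_second: bool):
    """Return (grad of first-layer weight, grad of second-layer weight)."""
    torch.manual_seed(0)  # same initial weights on every call
    first, second = nn.Linear(1, 2), nn.Linear(2, 1)
    second.weight.requires_grad = not freeze_second
    second.bias.requires_grad = not freeze_second
    loss = second(first(torch.tensor([2.0]))).sum()
    loss.backward()
    return first.weight.grad, second.weight.grad

g1_frozen, g2_frozen = first_layer_grad(freeze_second=True)
g1_free, g2_free = first_layer_grad(freeze_second=False)

# No gradient is stored for the frozen layer...
print(g2_frozen is None)
# ...but the earlier layer receives the same gradient either way.
print(torch.allclose(g1_frozen, g1_free))
```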
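When requires_grad = True, the gradient is kept after the backward pass and can be used by the optimizer to update the parameter. But if the parameter is not registered with the optimizer, it cannot be updated even though its gradient has been stored.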
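To make this case concrete, here is a minimal sketch with two hypothetical one-weight layers named registered and unregistered (names and constant initial values are assumptions chosen for easy arithmetic). Both keep requires_grad=True, but only one is passed to the optimizer.

```python
import torch
import torch.nn as nn

registered, unregistered = nn.Linear(1, 1), nn.Linear(1, 1)
# Constant initial values so the arithmetic is easy to follow.
nn.init.constant_(registered.weight, 1.0)
nn.init.constant_(registered.bias, 0.0)
nn.init.constant_(unregistered.weight, 3.0)
nn.init.constant_(unregistered.bias, 0.0)

# Only `registered` is handed to the optimizer.
optimizer = torch.optim.SGD(registered.parameters(), lr=0.1)

optimizer.zero_grad()
loss = unregistered(registered(torch.tensor([2.0]))).sum()  # 3 * (1 * 2) = 6
loss.backward()
optimizer.step()

# The unregistered layer has a stored gradient (d loss / d w = 1 * 2 = 2)...
print(unregistered.weight.grad.item())
# ...but its value is still 3, because no optimizer ever touches it.
print(unregistered.weight.item())
# The registered layer moved from 1 to roughly 1 - 0.1 * 6 = 0.4.
print(registered.weight.item())
```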


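Experiment
The following three cases are checked:

(1) requires_grad = True, parameters not registered with the optimizer.

(2) requires_grad = False, parameters registered with the optimizer.

(3) requires_grad = True, parameters registered with the optimizer.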

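The case where requires_grad = False and the parameters are not registered with the optimizer is trivially one in which the parameters are not updated, so it is not examined. When comparing (1) with (2) and (1) with (3), the points of interest are whether the parameter keeps a gradient (inspected through tensor.grad) and whether its value is updated. The full code is as follows:

import torch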
import torch.nn as nn
import torch.nn.functional as F
import numpy as np 
import random

def set_random_seed(seed):
    """Seed every random number generator used here, for reproducibility."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
    np.random.seed(seed)  # Numpy module.
    random.seed(seed)  # Python random module.
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True

class test_net(nn.Module):
    def __init__(self) -> None:
        super().__init__()

        self.a_Linear=nn.Linear(1,2)
        self.a_Linear.weight.data = torch.Tensor([[5.],[7.]])
        self.a_Linear.bias.data = torch.Tensor([3.])
        self.a_Linear.weight.requires_grad = True
        self.a_Linear.bias.requires_grad = True
        
        self.b_Linear=nn.Linear(2,2)
        self.b_Linear.weight.data = torch.Tensor([[5.,6.],[7.,8.]])
        self.b_Linear.bias.data = torch.Tensor([9.,10.])
        self.b_Linear.weight.requires_grad = True
        self.b_Linear.bias.requires_grad = True
        
        self.c_Linear=nn.Linear(2,1)
        self.c_Linear.weight.data = torch.Tensor([[1.,2.]])
        self.c_Linear.bias.data = torch.Tensor([3.])
        self.c_Linear.weight.requires_grad = False
        self.c_Linear.bias.requires_grad = False
        
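        # d_Linear is the layer under test: set requires_grad on its weight and
        # bias to True or False to switch between the cases.
        self.d_Linear=nn.Linear(2,1)
        self.d_Linear.weight.data = torch.Tensor([[7.,8.]])
        self.d_Linear.bias.data = torch.Tensor([9.])
        self.d_Linear.weight.requires_grad = False
        self.d_Linear.bias.requires_grad = False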

    def forward(self, x):
        x = self.a_Linear(x)
        x = self.b_Linear(x)        
        x1 = self.d_Linear(x)
        x2 = self.c_Linear(x)
        return x1 + x2

set_random_seed(1)
model = test_net()
# print([n for n,p in model.named_parameters()])
# a_Linear is always registered; keep or drop the 'd_Linear' condition to
# register d_Linear with the optimizer or leave it out.
optimizer = torch.optim.SGD(params=[{'params': [p for n,p in model.named_parameters() if 'a_Linear' in n or 'd_Linear' in n]}], lr=0.1)
optimizer.zero_grad()
a=torch.tensor([2.], requires_grad=False)
loss = model(a)
loss.backward()
optimizer.step()
# After one step, print each parameter's value and its stored gradient.
for n,p in model.named_parameters():
    print(f"{n}'s weight: {p.data}")
    print(f"{n}'s grad: {p.grad}")
print('done.')

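Specifically, the focus is on the values and gradients of the weight and bias of self.d_Linear in the three cases (green boxes in the output screenshots). In addition, the values and gradients of self.a_Linear (light blue boxes) can be compared across cases to see whether the changes to self.d_Linear affect self.a_Linear.

Case (1): requires_grad = True, parameters not registered with the optimizer (output screenshot).

Case (2): requires_grad = False, parameters registered with the optimizer (output screenshot).

Case (3): requires_grad = True, parameters registered with the optimizer (output screenshot).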

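Summary: comparing the green boxes across the three cases shows that the parameters are updated only when requires_grad=True and they are registered with the optimizer. Comparing the light blue boxes shows that setting requires_grad to True or False on d_Linear has no effect on a_Linear's parameters.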
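The three cases can also be checked programmatically rather than by reading printed output. The following is a minimal sketch, not the author's script: it assumes a simplified, hypothetical two-layer model in which a stands in for a_Linear and d for d_Linear, and the helper run_case is introduced here only for illustration.

```python
import torch
import torch.nn as nn

def run_case(d_requires_grad: bool, d_registered: bool):
    """Take one SGD step; `d` is the layer under test, `a` is always trained."""
    a, d = nn.Linear(1, 1), nn.Linear(1, 1)
    for layer, w, b in ((a, 5.0, 3.0), (d, 7.0, 9.0)):
        nn.init.constant_(layer.weight, w)
        nn.init.constant_(layer.bias, b)
    d.weight.requires_grad = d_requires_grad
    d.bias.requires_grad = d_requires_grad

    params = list(a.parameters()) + (list(d.parameters()) if d_registered else [])
    optimizer = torch.optim.SGD(params, lr=0.1)

    d_before = d.weight.detach().clone()
    optimizer.zero_grad()
    d(a(torch.tensor([2.0]))).sum().backward()
    optimizer.step()

    d_has_grad = d.weight.grad is not None
    d_changed = not torch.equal(d.weight.detach(), d_before)
    return d_has_grad, d_changed, a.weight.grad.item()

results = {
    "(1) requires_grad=True, not registered": run_case(True, False),
    "(2) requires_grad=False, registered": run_case(False, True),
    "(3) requires_grad=True, registered": run_case(True, True),
}
for name, (has_grad, changed, a_grad) in results.items():
    print(f"{name}: d has grad={has_grad}, d changed={changed}, a.weight.grad={a_grad}")
```

In this sketch d changes only in case (3), and the gradient reaching a.weight is 7 * 2 = 14 in every case, which matches the summary above.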



                
