Parameters and Buffers in PyTorch

wukurua

Category: pytorch. Tags: pytorch, deep learning, python

Article link: https://blog.youkuaiyun.com/qq_40744423/article/details/122127366

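Conclusions first:

- A parameter receives a gradient during backpropagation and is updated by optimizer.step(); a buffer is never handed to the optimizer, so it is not updated.
- Both parameters and buffers are stored in the OrderedDict returned by model.state_dict(), which is also the object that gets written when a model is saved.
- When the model is moved to another device, the registered tensors (parameters and buffers), i.e. the contents of model.state_dict(), move with it.

Let's go through these one by one.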

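Contents

First, create both kinds of tensors
- 1. Creating a parameter
- 2. Creating a buffer
Parameters are updated by `optimizer.step`; buffers are not
Why register a tensor that needs no updates instead of storing it as a plain attribute?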

First, create both kinds of tensors

1. Creating a parameter

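- Assign an nn.Parameter() directly to a model attribute self.xxx; it is then registered in the module's parameters automatically.
- Create a standalone nn.Parameter() object that is not yet a model attribute, then register it with register_parameter().

Parameters created either way are returned by model.parameters(), and once registered they are also included in model.state_dict() automatically.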

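# Option 1: assign to an attribute (inside the module's __init__)
self.param = nn.Parameter(torch.randn(3, 3))
# Option 2: register explicitly (an alternative to option 1)
param = nn.Parameter(torch.randn(3, 3))  # standalone Parameter object
self.register_parameter("param", param)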
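
As a minimal sketch, both options can be placed in one module and checked against named_parameters() and state_dict(). The class name TwoWays and the attribute names w1 and w2 are hypothetical, chosen only for illustration; they do not come from the original post.

```python
import torch
import torch.nn as nn


class TwoWays(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Option 1: attribute assignment registers the parameter
        self.w1 = nn.Parameter(torch.randn(3, 3))
        # Option 2: explicit registration under the name "w2"
        self.register_parameter("w2", nn.Parameter(torch.randn(3, 3)))


model = TwoWays()
param_names = [name for name, _ in model.named_parameters()]
state_keys = list(model.state_dict().keys())
print(param_names)  # ['w1', 'w2']
print(state_keys)   # ['w1', 'w2']
```

Either way the tensor ends up under its registered name, so the choice is mostly a matter of style; register_parameter() is handy when the name is only known at runtime.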

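2. Creating a buffer

Register it with register_buffer(). Buffers are returned by model.buffers(), and a registered buffer is included in model.state_dict() automatically (unless it was registered with persistent=False).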

self.register_buffer('my_buffer', torch.randn(2, 3))
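
The following sketch checks where a buffer shows up. It assumes a hypothetical module WithBuffers and a second, hypothetical buffer named scratch that is registered with persistent=False to show the one case in which a buffer is left out of state_dict().

```python
import torch
import torch.nn as nn


class WithBuffers(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.register_buffer("my_buffer", torch.randn(2, 3))
        # persistent=False keeps this buffer out of state_dict()
        self.register_buffer("scratch", torch.zeros(1), persistent=False)


model = WithBuffers()
buffer_names = [name for name, _ in model.named_buffers()]
state_keys = list(model.state_dict().keys())
n_params = len(list(model.parameters()))
print(buffer_names)  # ['my_buffer', 'scratch']
print(state_keys)    # ['my_buffer']
print(n_params)      # 0
```

Neither buffer is returned by model.parameters(), which is why an optimizer built from model.parameters() never sees them.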

Parameters are updated by `optimizer.step`; buffers are not

import torch
import torch.nn as nn

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_output):
        super(Net, self).__init__()
        self.register_buffer('my_buffer', torch.randn(2, 3))
        self.linear = torch.nn.Linear(n_feature, n_output)

    def forward(self, x):
        x = self.linear(x)  # output value
        return x


model = Net(3, 1)
print('Before update:')
# Both parameters and buffers are in the OrderedDict returned by model.state_dict()
print(model.state_dict())

# One optimization step
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
inputs = torch.ones(3)  # renamed from `input` to avoid shadowing the builtin
output = model(inputs)
target = torch.ones([1])
loss = loss_fn(output, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print('After update:')
print(model.state_dict())

Here the parameters are created in the __init__ of torch.nn.Linear: its attributes weight and bias are Parameter objects, which are then initialized (excerpt; the exact code varies between PyTorch versions):

self.weight = Parameter(torch.Tensor(out_features, in_features))
if bias:
    self.bias = Parameter(torch.Tensor(out_features))
else:
    self.register_parameter('bias', None)
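
This can be confirmed from the outside without reading the library source. The sketch below only uses the public nn.Linear constructor; the variable names with_bias and no_bias are made up for the example.

```python
import torch.nn as nn

with_bias = nn.Linear(3, 1)
no_bias = nn.Linear(3, 1, bias=False)

# weight and bias are nn.Parameter instances
weight_type = type(with_bias.weight).__name__
print(weight_type, tuple(with_bias.weight.shape))  # Parameter (1, 3)
print(isinstance(with_bias.bias, nn.Parameter))    # True

# With bias=False the name 'bias' is registered as None
print(no_bias.bias)                                # None
no_bias_keys = list(no_bias.state_dict().keys())
print(no_bias_keys)                                # ['weight']
```

Registering 'bias' as None keeps the attribute defined, so code that reads layer.bias does not raise, while state_dict() simply skips it.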

Out:

Before update:
OrderedDict([('my_buffer', tensor([[-0.6783, -0.1426, -0.5545],
        [-0.0529, -1.6932,  0.3820]])), ('linear.weight', tensor([[-0.5150, -0.1703,  0.2062]])), ('linear.bias', tensor([0.1766]))])
After update:
OrderedDict([('my_buffer', tensor([[-0.6783, -0.1426, -0.5545],
        [-0.0529, -1.6932,  0.3820]])), ('linear.weight', tensor([[-0.5050, -0.1603,  0.2162]])), ('linear.bias', tensor([0.1866]))])

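my_buffer is unchanged, while linear.weight and linear.bias were updated. Each value moved by exactly 0.01: with L1Loss and an all-ones input, the gradient with respect to every weight and the bias is -1 here (the output is below the target), so SGD adds lr × 1 = 0.01.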

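torch.nn.Parameter is a subclass of torch.Tensor whose main purpose is to serve as a trainable parameter of an nn.Module. loss.backward() computes gradients for the parameters and optimizer.step() applies the update; buffers are not returned by model.parameters(), so the optimizer never touches them.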
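
A rough way to verify this programmatically, rather than by comparing printed tensors, is sketched below. It reuses the Net layout from this post with fixed sizes; the helper names n_optimized, before and changed are assumptions made for the example.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("my_buffer", torch.randn(2, 3))
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)


torch.manual_seed(0)
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Tensors handed to the optimizer: only linear.weight and linear.bias
n_optimized = sum(len(group["params"]) for group in optimizer.param_groups)

# state_dict() tensors share storage with the model, so clone the snapshot
before = {k: v.clone() for k, v in model.state_dict().items()}

loss = nn.L1Loss()(model(torch.ones(3)), torch.ones(1))
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = model.state_dict()
changed = {k: not torch.equal(before[k], after[k]) for k in before}
print(n_optimized)           # 2
print(changed)
print(model.my_buffer.grad)  # None: no gradient was computed for the buffer
```

The buffer stays identical for two reasons at once: it has no gradient after backward(), and it is not among the tensors the optimizer iterates over.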

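Why register a tensor that needs no updates instead of storing it as a plain attribute?

- A tensor that is not registered does not appear in model.state_dict(), so it is not saved when the model's state_dict is saved.
- When the model is moved between CPU and GPU with model.to(device), registered tensors are moved automatically; plain attributes are not.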

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_output):
        super(Net, self).__init__()

        self.register_buffer('my_buffer', torch.randn(2, 3))
        # plain attribute, not registered
        self.my_tensor = torch.randn(1)

        self.linear = torch.nn.Linear(n_feature, n_output)

    def forward(self, x):
        x = self.linear(x)  # output value
        return x


model = Net(3, 1)
model.cuda()  # requires a CUDA-capable GPU
print(model.state_dict())
print(model.my_tensor)

Out:

OrderedDict([('my_buffer', tensor([[-0.3508, -1.4253,  0.7532],
        [-2.0955,  1.6653, -0.7471]], device='cuda:0')), ('linear.weight', tensor([[-0.0708, -0.0424,  0.5221]], device='cuda:0')), ('linear.bias', tensor([0.5139], device='cuda:0'))])
tensor([-0.9557])

The plain attribute self.my_tensor is not in model.state_dict(), and it stayed on the CPU when the model was moved to the GPU, whereas the buffer my_buffer and the parameters linear.weight and linear.bias were all moved to the GPU.
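
For readers without a GPU, a similar effect can be observed with a dtype conversion, since model.to(...) only visits registered parameters and buffers whether it is given a device or a dtype. The sketch below is a CPU-only variant under that assumption; it is not the original author's experiment, and it assumes the default dtype is float32.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("my_buffer", torch.randn(2, 3))
        self.my_tensor = torch.randn(1)  # plain attribute, not registered
        self.linear = nn.Linear(3, 1)


model = Net()
model.to(torch.float64)  # converts registered floating-point tensors only

print(model.my_buffer.dtype)                # torch.float64
print(model.linear.weight.dtype)            # torch.float64
print(model.my_tensor.dtype)                # torch.float32
print("my_tensor" in model.state_dict())    # False
```

The plain attribute keeps its original dtype and is missing from state_dict(), which mirrors what happened to its device in the GPU run above.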

References: