https://stackoverflow.com/questions/62067400/understanding-accumulated-gradients-in-pytorch
The linked question works through a small computational graph and shows that accumulating gradients over two forward passes yields gradients that are strictly equal to those from a single forward pass over the full batch.
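The equality follows from linearity of the gradient (a one-line derivation added for clarity, with $\ell_i(w)$ the loss of micro-batch $i$ out of $N$):

$\nabla_w \Big(\tfrac{1}{N}\sum_{i=1}^{N} \ell_i(w)\Big) = \sum_{i=1}^{N} \nabla_w \tfrac{\ell_i(w)}{N}$

so calling backward() on each micro-batch loss divided by $N$, with no optimizer step in between, accumulates exactly the full-batch gradient in .grad.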


Code:
import numpy as np
import torch


class ExampleLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Initialize the weight at 1
        self.weight = torch.nn.Parameter(torch.Tensor([1]).float(),
                                         requires_grad=True)

    def forward(self, x):
        return self.weight * x


model = ExampleLinear()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def calculate_loss(x: torch.Tensor) -> torch.Tensor:
    # Target is y = 2x; return the element-wise squared error
    y = 2 * x
    y_hat = model(x)
    return (y - y_hat) ** 2


# With multiple batches, accumulating gradients (see the sketch below)
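A minimal sketch of how the comparison can be completed, following the pattern of the linked answer (the batch values and print messages here are illustrative, not taken verbatim from the original):

# --- Two micro-batches of size 1, accumulating gradients ---
batches = [torch.tensor([4.0]), torch.tensor([2.0])]

optimizer.zero_grad()
for batch in batches:
    # Scale each loss by 1/len(batches) so the accumulated gradient equals the
    # gradient of the mean loss over the full batch, not the sum of the parts.
    loss = calculate_loss(batch).mean() / len(batches)
    loss.backward()                       # gradients add up in model.weight.grad
accumulated_grad = model.weight.grad.clone()
print(f"Accumulated gradient: {accumulated_grad}")   # tensor([-20.])

# --- One full batch of size 2, no accumulation ---
optimizer.zero_grad()
loss = calculate_loss(torch.tensor([4.0, 2.0])).mean()
loss.backward()
full_batch_grad = model.weight.grad.clone()
print(f"Full-batch gradient:  {full_batch_grad}")    # tensor([-20.])

# The two gradients are strictly equal, as stated above
print(torch.equal(accumulated_grad, full_batch_grad))  # True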

The article explores how gradient accumulation affects weight updates in PyTorch, particularly when training with small batches. It notes that for a simple model such as ExampleLinear, accumulation reproduces the full-batch gradient exactly, but in large models such as BERT the strategy can break BatchNormalization, because the normalization statistics are still computed over each tiny micro-batch; the article recommends using GroupNormalization instead.
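As a concrete illustration of that recommendation (this snippet is my addition, not from the article): GroupNorm computes statistics per sample over groups of channels, so its output does not depend on how many samples each accumulation step sees, whereas BatchNorm's statistics do.

import torch
import torch.nn as nn

# BatchNorm normalizes with per-step batch statistics: with gradient accumulation,
# each step only sees a tiny micro-batch, so those statistics become noisy.
bn = nn.BatchNorm2d(num_features=64)

# GroupNorm normalizes each sample over groups of channels, independent of batch size.
gn = nn.GroupNorm(num_groups=8, num_channels=64)

x = torch.randn(2, 64, 16, 16)           # a micro-batch of just 2 samples
print(bn(x).shape, gn(x).shape)          # both: torch.Size([2, 64, 16, 16])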