PyTorch: one of the variables needed for gradient computation has been modified by an inplace operation

This post looks at the gradient-computation error that in-place operations cause in PyTorch, explains why it happens, and shows how to fix it so you can avoid this common pitfall during training.


While training a network written in PyTorch, the program kept raising the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 5]], which is output 0 of LogsumexpBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

 

The key pieces of information here are LogsumexpBackward and torch.FloatTensor [8, 5]. Searching the corresponding code module with those hints revealed an in-place operation:

predicted_value += self.xxx
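For reference, here is a minimal, self-contained sketch that reproduces the same class of error; the variable names and shapes are only illustrative, not the original code. torch.logsumexp saves its output for the backward pass, so modifying that output in place breaks gradient computation:

import torch

scores = torch.randn(8, 5, 3, requires_grad=True)
predicted_value = torch.logsumexp(scores, dim=-1)   # shape [8, 5], creates a LogsumexpBackward node
offset = torch.randn(8, 5)

predicted_value += offset          # in-place addition modifies the output saved for backward
predicted_value.sum().backward()   # RuntimeError: ... modified by an inplace operation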

It turns out that PyTorch places restrictions on in-place operations during automatic differentiation: operations such as += cannot be used on tensors whose values autograd still needs. See the official notes at https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd, as well as the blog posts linked at the end.
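As the hint in the error message suggests, anomaly detection can point at the offending computation automatically. A small sketch, reusing the reproducer above:

import torch

# Anomaly detection makes autograd record the forward-pass stack trace of every
# operation; it slows training, so enable it only while debugging.
torch.autograd.set_detect_anomaly(True)

scores = torch.randn(8, 5, 3, requires_grad=True)
predicted_value = torch.logsumexp(scores, dim=-1)
predicted_value += torch.randn(8, 5)    # the in-place op we want to locate
try:
    predicted_value.sum().backward()
except RuntimeError as err:
    # Before the exception is raised, PyTorch prints a warning with the traceback
    # of the forward call that created the failing LogsumexpBackward node, which
    # identifies the tensor that was modified in place.
    print(err)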

Here is the official explanation:

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.

There are two main reasons that limit the applicability of in-place operations:

In-place operations can potentially overwrite values required to compute gradients.

Every in-place operation actually requires the implementation to rewrite the computational graph. Out-of-place versions simply allocate new objects and keep references to the old graph, while in-place operations require changing the creator of all inputs to the Function representing this operation. This can be tricky, especially if there are many Tensors that reference the same storage (e.g. created by indexing or transposing), and in-place functions will actually raise an error if the storage of modified inputs is referenced by any other Tensor.

Roughly, this says:

In-place operations are meant to save memory, but in practice they rarely reduce memory usage by a significant amount, so PyTorch discourages them. Two reasons limit their applicability:

1. An in-place operation can overwrite values that autograd still needs to compute gradients (a small demonstration of this follows the list);

2. Many tensors may share the same storage (for example, tensors created by indexing or transposing). An out-of-place operation simply allocates a new block of memory and keeps a reference to the old graph, whereas an in-place operation modifies the existing data, which forces autograd to rewrite the graph and will even raise an error if the modified storage is referenced by any other tensor.
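The "is at version 1; expected version 0" part of the error message reflects how reason 1 is enforced: every tensor carries a version counter that each in-place operation increments, and at backward time autograd checks that saved tensors still have the version they had when they were saved. A small illustration (._version is an internal attribute, shown here purely for inspection):

import torch

x = torch.randn(8, 5, requires_grad=True)
y = x.exp()            # exp saves its output for the backward pass
print(y._version)      # 0
y += 1                 # in-place op increments the version counter
print(y._version)      # 1
# y.sum().backward()   # would now fail: expected version 0, got version 1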


With this in mind, the fix is simple: replace the += with an out-of-place assignment of the form variable = variable + other. For the snippet above, the following works:

predicted_value = predicted_value + self.xxx
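Applied to the illustrative sketch from earlier, the out-of-place form allocates a new tensor and leaves the output saved by logsumexp untouched, so backward runs cleanly:

import torch

scores = torch.randn(8, 5, 3, requires_grad=True)
predicted_value = torch.logsumexp(scores, dim=-1)
offset = torch.randn(8, 5)

predicted_value = predicted_value + offset   # new tensor; the saved output is unchanged
predicted_value.sum().backward()             # no version error
print(scores.grad.shape)                     # torch.Size([8, 5, 3])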

The following posts on in-place operations in PyTorch are also worth a read:

https://blog.youkuaiyun.com/york1996/article/details/81835873

https://blog.youkuaiyun.com/goodxin_ie/article/details/89577224

https://blog.youkuaiyun.com/xijuezhu8128/article/details/86590311

https://zhuanlan.zhihu.com/p/38475183
