PyTorch: one of the variables needed for gradient computation has been modified by an inplace operation

This post looks at the gradient-computation error that in-place operations cause in PyTorch, explains why it happens, and shows how to fix it so you can avoid this common pitfall during training.


While training a network written in PyTorch, the program kept raising the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 5]], which is output 0 of LogsumexpBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

 

The key pieces of information here are LogsumexpBackward and torch.FloatTensor [8, 5]. Searching the corresponding code module with those hints revealed an in-place operation:

predicted_value += self.xxx
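For reference, here is a minimal, self-contained sketch that reproduces the same class of error; the variable names and shapes are only illustrative, not the original code. torch.logsumexp saves its output for the backward pass, so modifying that output in place breaks gradient computation:

import torch

scores = torch.randn(8, 5, 3, requires_grad=True)
predicted_value = torch.logsumexp(scores, dim=-1)   # shape [8, 5], creates a LogsumexpBackward node
offset = torch.randn(8, 5)

predicted_value += offset          # in-place addition modifies the output saved for backward
predicted_value.sum().backward()   # RuntimeError: ... modified by an inplace operation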

It turns out that PyTorch places restrictions on in-place operations during automatic differentiation: operations such as += cannot be used on tensors whose values autograd still needs. See the official notes at https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd, as well as the blog posts linked at the end.
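As the hint in the error message suggests, anomaly detection can point at the offending computation automatically. A small sketch, reusing the reproducer above:

import torch

# Anomaly detection makes autograd record the forward-pass stack trace of every
# operation; it slows training, so enable it only while debugging.
torch.autograd.set_detect_anomaly(True)

scores = torch.randn(8, 5, 3, requires_grad=True)
predicted_value = torch.logsumexp(scores, dim=-1)
predicted_value += torch.randn(8, 5)    # the in-place op we want to locate
try:
    predicted_value.sum().backward()
except RuntimeError as err:
    # Before the exception is raised, PyTorch prints a warning with the traceback
    # of the forward call that created the failing LogsumexpBackward node, which
    # identifies the tensor that was modified in place.
    print(err)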

Here is the official explanation:

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.

There are two main reasons that limit the applicability of in-place operations:

In-place operations can potentially overwrite values required to compute gradients.

Every in-place operation actually requires the implementation to rewrite the computational graph. Out-of-place versions simply allocate new objects and keep references to the old graph, while in-place operations require changing the creator of all inputs to the Function representing this operation. This can be tricky, especially if there are many Tensors that reference the same storage (e.g. created by indexing or transposing), and in-place functions will actually raise an error if the storage of modified inputs is referenced by any other Tensor.

Roughly, this says:

In-place operations are meant to save memory, but in practice they rarely reduce memory usage by a significant amount, so PyTorch discourages them. Two reasons limit their applicability:

1. An in-place operation can overwrite values that autograd still needs to compute gradients (a small demonstration of this follows the list);

2. Many tensors may share the same storage (for example, tensors created by indexing or transposing). An out-of-place operation simply allocates a new block of memory and keeps a reference to the old graph, whereas an in-place operation modifies the existing data, which forces autograd to rewrite the graph and will even raise an error if the modified storage is referenced by any other tensor.
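The "is at version 1; expected version 0" part of the error message reflects how reason 1 is enforced: every tensor carries a version counter that each in-place operation increments, and at backward time autograd checks that saved tensors still have the version they had when they were saved. A small illustration (._version is an internal attribute, shown here purely for inspection):

import torch

x = torch.randn(8, 5, requires_grad=True)
y = x.exp()            # exp saves its output for the backward pass
print(y._version)      # 0
y += 1                 # in-place op increments the version counter
print(y._version)      # 1
# y.sum().backward()   # would now fail: expected version 0, got version 1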


With this in mind, the fix is simple: replace the += with an out-of-place assignment of the form variable = variable + other. For the snippet above, the following works:

predicted_value = predicted_value + self.xxx
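Applied to the illustrative sketch from earlier, the out-of-place form allocates a new tensor and leaves the output saved by logsumexp untouched, so backward runs cleanly:

import torch

scores = torch.randn(8, 5, 3, requires_grad=True)
predicted_value = torch.logsumexp(scores, dim=-1)
offset = torch.randn(8, 5)

predicted_value = predicted_value + offset   # new tensor; the saved output is unchanged
predicted_value.sum().backward()             # no version error
print(scores.grad.shape)                     # torch.Size([8, 5, 3])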

The following posts on in-place operations in PyTorch are also worth a read:

https://blog.youkuaiyun.com/york1996/article/details/81835873

https://blog.youkuaiyun.com/goodxin_ie/article/details/89577224

https://blog.youkuaiyun.com/xijuezhu8128/article/details/86590311

https://zhuanlan.zhihu.com/p/38475183
