The difference between .data and .detach() in torch

License: CC 4.0 BY-SA

Tags:

#python #pytorch

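This article explains why PyTorch does not allow in-place operations on leaf tensors with requires_grad=True, and why tensors that are needed for gradient computation must not be modified before the backward pass. It also compares .data and .detach() in terms of how they detach from the computation history and how safe they are, and recommends .detach() whenever detaching is needed so that gradients are computed correctly.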

1. Overview

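In PyTorch, there are two situations in which an in-place operation is not allowed:

An in-place operation cannot be applied to a leaf tensor with requires_grad=True.
An in-place operation cannot be applied to a tensor that is needed during gradient computation.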

Case 1: a leaf tensor with requires_grad=True
A tensor such as x in x = torch.ones(2, 2, requires_grad=True) is created directly by the user. It has no grad_fn and is called a leaf node.
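To make the leaf / non-leaf distinction concrete, here is a minimal sketch. The variable names are illustrative; `is_leaf` and `grad_fn` are standard tensor attributes.

```python
import torch

# x is created directly by the user, so it is a leaf and has no grad_fn.
x = torch.ones(2, 2, requires_grad=True)
# y is the result of an operation on x, so it is not a leaf and records a grad_fn.
y = x * 2

print("x:", x.is_leaf, x.grad_fn)
print("y:", y.is_leaf, y.grad_fn)
```

Only x falls under the first restriction; y is subject to the second one if a later operation saves it for the backward pass.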

import torch
x = torch.ones(1, requires_grad=True)
print("x.data:", x.data)
print("x.data.requires_grad:", x.data.requires_grad)
y = 2*x
x *= 100  # in-place op on a leaf tensor that requires grad: raises the RuntimeError shown below
y.backward()
print(x)
print(x.grad)

Output:

x.data: tensor([1.])
x.data.requires_grad: False
Traceback (most recent call last):
  File "D:/Desktop/TEMP/TorchLearning/main.py", line 6, in <module>
    x *= 100
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

Fix: replace the line x *= 100 with an in-place update through .data, which autograd does not track:

x.data *= 100

Output:

x.data: tensor([1.])
x.data.requires_grad: False
tensor([100.], requires_grad=True)
tensor([2.])
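As an alternative that does not go through .data, the same update can be wrapped in torch.no_grad(). This variant is not part of the original post; it is a sketch that assumes, as in the example above, that the gradient of y = 2*x does not depend on the value of x.

```python
import torch

x = torch.ones(1, requires_grad=True)
y = 2 * x

# Inside no_grad the in-place update of the leaf is permitted
# and is not recorded in the computation graph.
with torch.no_grad():
    x *= 100

y.backward()
print(x)
print(x.grad)
```

Unlike the .data route, this update still goes through autograd's bookkeeping, so modifying a value that the backward pass needs would be reported rather than silently accepted.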

Case 2: a tensor needed during gradient computation

import torch
x = torch.FloatTensor([[1., 2.]])
w1 = torch.FloatTensor([[2.], [1.]])
w2 = torch.FloatTensor([3.])
w1.requires_grad = True
w2.requires_grad = True

d = torch.matmul(x, w1)
f = torch.matmul(d, w2)
d[:] = 1 # this in-place write makes f.backward() fail with RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

f.backward()

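The error arises because $\frac{\partial f}{\partial w_2}=g(d)$: the derivative of $f$ with respect to $w_2$ is a function of $d$. When $f$ was computed, $d$ had a particular value, and the derivative of $f$ with respect to $w_2$ depends on that value. If $d$ changes after $f$ has been computed, the derivative would be evaluated with the wrong value, so autograd raises an error instead.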
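Plugging in the numbers from the code above ($x = [1, 2]$, $w_1 = [2, 1]^\top$, $w_2 = 3$) makes the dependency explicit:

$$
d = x\,w_1 = 1\cdot 2 + 2\cdot 1 = 4, \qquad
f = d\,w_2 = 4\cdot 3 = 12, \qquad
\frac{\partial f}{\partial w_2} = d = 4 .
$$

If $d$ is overwritten with 1 after $f$ has been computed, evaluating the same rule would give $\frac{\partial f}{\partial w_2} = 1$, which no longer matches the $f = 12$ that was actually computed.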

Fix:

import torch
x = torch.FloatTensor([[1., 2.]])
w1 = torch.FloatTensor([[2.], [1.]])
w2 = torch.FloatTensor([3.])
w1.requires_grad = True
w2.requires_grad = True

d = torch.matmul(x, w1)
d[:] = 1   # moved before f is computed, so the error no longer occurs
f = torch.matmul(d, w2)
f.backward()
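Note that reordering changes what is computed: f is now built from the overwritten d. The sketch below reruns the reordered code with torch.tensor instead of torch.FloatTensor (an equivalent construction chosen here for brevity) and prints the results so this can be checked.

```python
import torch

x = torch.tensor([[1., 2.]])
w1 = torch.tensor([[2.], [1.]], requires_grad=True)
w2 = torch.tensor([3.], requires_grad=True)

d = torch.matmul(x, w1)   # d holds 4 at this point
d[:] = 1                  # overwritten before any later op has saved it
f = torch.matmul(d, w2)   # f is computed from the overwritten d
f.backward()

print("f:", f)
print("w2.grad:", w2.grad)   # gradient w.r.t. w2 equals the overwritten d
print("w1.grad:", w1.grad)   # the value computed from w1 was discarded
```

The gradient is consistent with the forward pass here, which is why autograd accepts it, but no useful signal flows back to w1 because its contribution to d was replaced by a constant.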

2. The difference between .data and .detach()

Similarities

Both share the same underlying data as x
Both are independent of the computation history of x
Both have requires_grad = False
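The three shared properties can be checked directly. This is a small sketch with illustrative names; `data_ptr()` returns the address of the first element, so equal values indicate shared memory.

```python
import torch

x = torch.ones(3, requires_grad=True)
h = x * 2  # a non-leaf tensor that has a computation history

for name, t in [(".data", h.data), (".detach()", h.detach())]:
    same_memory = t.data_ptr() == h.data_ptr()
    print(name, "| shares memory:", same_memory,
          "| grad_fn:", t.grad_fn,
          "| requires_grad:", t.requires_grad)
```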

Differences
y = x.data is unsafe in some situations, namely the second in-place case described above.

import torch
x = torch.FloatTensor([[1., 2.]])
w1 = torch.FloatTensor([[2.], [1.]])
w2 = torch.FloatTensor([3.])
w1.requires_grad = True
w2.requires_grad = True

d = torch.matmul(x, w1)
d_ = d.data

f = torch.matmul(d, w2)
d_[:] = 1

f.backward()

# This code runs without raising an error, but the result is wrong.
# Printing w2.grad gives 1, whereas the correct value is 4.

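The code above ought to raise an error: d_ and d share the same data, so modifying d_ also modifies d. Yet no error is raised, and the computed gradient is silently wrong.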
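The reason autograd misses the change is its version counter: every tracked in-place write bumps a per-tensor counter, and backward compares it with the value recorded when the tensor was saved. The sketch below inspects that counter through `_version`, an internal attribute used here only for illustration; the helper `version_after_write` is hypothetical.

```python
import torch

def version_after_write(make_alias):
    """Return d's version counter after an in-place write through an alias of d."""
    x = torch.tensor([[1., 2.]])
    w1 = torch.tensor([[2.], [1.]], requires_grad=True)
    d = torch.matmul(x, w1)
    alias = make_alias(d)
    alias[:] = 1          # in-place write through the alias
    return d._version     # internal attribute, read for illustration only

v_data = version_after_write(lambda t: t.data)
v_detach = version_after_write(lambda t: t.detach())
print("after write via .data:    ", v_data)
print("after write via .detach():", v_detach)
```

A write through .data leaves the counter of d unchanged, so autograd has no way to notice it, while a write through .detach() bumps the counter of d.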

For this reason, the PyTorch release notes point out that when you want the effect of detaching, .detach() is the safer choice.

import torch
x = torch.FloatTensor([[1., 2.]])
w1 = torch.FloatTensor([[2.], [1.]])
w2 = torch.FloatTensor([3.])
w1.requires_grad = True
w2.requires_grad = True

d = torch.matmul(x, w1)

d_ = d.detach() # with .detach() instead of .data, the program raises an error at f.backward()

f = torch.matmul(d, w2)
d_[:] = 1
f.backward()
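The sketch below wraps the example in a hypothetical helper `run` so the error can be caught and inspected. It also shows one way to obtain a modifiable copy without touching d: cloning the detached tensor. The clone step is an addition for illustration, not something the original post proposes.

```python
import torch

def run(make_alias):
    """Build the graph, write in place through an alias of d, return w2.grad."""
    x = torch.tensor([[1., 2.]])
    w1 = torch.tensor([[2.], [1.]], requires_grad=True)
    w2 = torch.tensor([3.], requires_grad=True)
    d = torch.matmul(x, w1)
    d_ = make_alias(d)
    f = torch.matmul(d, w2)
    d_[:] = 1
    f.backward()
    return w2.grad

# Writing through .detach() is detected when backward runs.
try:
    run(lambda d: d.detach())
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# A clone owns its own memory, so d is left untouched and the gradient is correct.
safe_grad = run(lambda d: d.detach().clone())
print("w2.grad with detach().clone():", safe_grad)
```

If the goal is only to read or modify a copy of the values, cloning avoids the shared-memory problem altogether, at the cost of one extra allocation.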