copy.deepcopy(train_model) raises: Only Tensors created explicitly by the user support the deepcopy

While training a PyTorch model, calling `copy.deepcopy()` on the model raised a RuntimeError, because tensors that were not created explicitly by the user do not support deepcopy. The problem was traced to a model submodule that returned `self.features`; changing it to return a local variable `features` resolved the issue. The before/after code below shows the change.

Error message:

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Possible cause:

During training it is common to run validation alongside training, and the most concise way to do so is usually to deep-copy the model under training with `copy.deepcopy()`. For example, my validation.py does:

    val_model = copy.deepcopy(train_model)

However, because of the limitations of `copy.deepcopy()`, calling `copy.deepcopy(model)` may hit this error: "Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment". The full traceback:

  File "/home/users/xinxin.li/HAT-dev-toolchain/hat/engine/ddp_trainer.py", line 359, in _with_exception
    fn(*args)
  File "/home/users/xinxin.li/HAT-dev-toolchain/tools/train.py", line 186, in train_entrance
    trainer.fit()
  File "/home/users/xinxin.li/HAT-dev-toolchain/hat/engine/loop_base.py", line 523, in fit
    storage=self.storage,
  File "/home/users/xinxin.li/HAT-dev-toolchain/hat/engine/loop_base.py", line 73, in on_epoch_end
    cb.on_epoch_end(**kwargs)
  File "/home/users/xinxin.li/HAT-dev-toolchain/hat/callbacks/validation.py", line 207, in on_epoch_end
    self._do_val(epoch_id, model, ema_model, device, val_metrics)
  File "/home/users/xinxin.li/HAT-dev-toolchain/hat/callbacks/validation.py", line 163, in _do_val
    val_model = self._select_and_init_val_model(train_model=eval_model)
  File "/home/users/xinxin.li/HAT-dev-toolchain/hat/callbacks/validation.py", line 147, in _select_and_init_val_model
    val_model = copy.deepcopy(train_model)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 306, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 306, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/site-packages/torch/_tensor.py", line 85, in __deepcopy__
    raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
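To see why this happens, here is a minimal standalone sketch (a hypothetical `CachingNet`, not my actual model) that reproduces the error: once forward() has cached a non-leaf tensor on the module, the module itself can no longer be deep-copied.

    import copy
    import torch
    import torch.nn as nn

    class CachingNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, 3, padding=1)

        def forward(self, x):
            # The conv output has a grad_fn, i.e. it is not a graph leaf;
            # assigning it to self puts it into the module's __dict__.
            self.features = self.conv(x)
            return self.features

    model = CachingNet()
    model(torch.randn(1, 3, 16, 16))  # forward pass caches a non-leaf tensor on the module
    copy.deepcopy(model)              # raises: Only Tensors created explicitly by the user ...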

How to troubleshoot:

1. Open /home/users/xinxin.li/anaconda3/envs/python36/lib/python3.6/copy.py, set a breakpoint (or a temporary print) at the `_deepcopy_dict` line shown in the traceback, and output the corresponding key and value (see the sketch after this list).

2. Re-run the program. The key/value printed just before the error tells you which attribute of which submodule deepcopy chokes on; map that back to the corresponding line of your network definition, and that is where the error comes from.
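For reference, the temporary instrumentation can look roughly like this (a sketch of CPython 3.6's `_deepcopy_dict`, the frame at line 240 in the traceback, with one print added; revert it once the offending attribute is found):

    # copy.py, _deepcopy_dict (Python 3.6), with a temporary debug print
    def _deepcopy_dict(x, memo, deepcopy=deepcopy):
        y = {}
        memo[id(x)] = y
        for key, value in x.items():
            print('deepcopying attribute:', key, type(value))  # last key printed before the crash is the culprit
            y[deepcopy(key, memo)] = deepcopy(value, memo)
        return y

If you prefer not to edit the standard library, wrapping `copy.deepcopy(sub)` in a try/except over `train_model.named_modules()` will also narrow down the failing submodule.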

My diagnosis:

A submodule of my model built and returned `self.features`, so after the first forward pass the module held non-leaf tensors (outputs of autograd operations), which `torch.Tensor.__deepcopy__` refuses to copy. After changing it to build and return a local variable instead, the error went away.

Code before the fix:

    def forward(self, input_image):
        # Intermediate outputs are stored on self, so after forward() the module
        # holds non-leaf tensors and copy.deepcopy(model) fails.
        self.features = []
        x = (input_image - 0.45) / 0.225
        x = self.encoder.conv1(x)
        x = self.encoder.bn1(x)
        self.features.append(self.encoder.relu(x))
        self.features.append(self.encoder.layer1(self.encoder.maxpool(self.features[-1])))
        self.features.append(self.encoder.layer2(self.features[-1]))
        self.features.append(self.encoder.layer3(self.features[-1]))
        self.features.append(self.encoder.layer4(self.features[-1]))

        return self.features

Code after the fix:

    def forward(self, input_image):
        # Fixed: features is a local list; nothing is cached on the module.
        features = []
        x = (input_image - 0.45) / 0.225
        x = self.encoder.conv1(x)
        x = self.encoder.bn1(x)
        features.append(self.encoder.relu(x))
        features.append(self.encoder.layer1(self.encoder.maxpool(features[-1])))
        features.append(self.encoder.layer2(features[-1]))
        features.append(self.encoder.layer3(features[-1]))
        features.append(self.encoder.layer4(features[-1]))

        return features
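If rewriting the model is not an option, another common workaround is to skip deepcopy altogether and copy only the weights into a freshly built model. A minimal sketch, assuming the model can be rebuilt from its config; `build_model()` is a placeholder for however train_model was constructed:

    import torch

    def make_val_model(train_model: torch.nn.Module, build_model, device: str = "cuda"):
        # A fresh instance carries no cached activations or other non-leaf tensors.
        val_model = build_model()
        # state_dict() holds only parameters and buffers, which load cleanly.
        val_model.load_state_dict(train_model.state_dict())
        return val_model.to(device).eval()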

Reference: "解决使用copy.deepcopy()拷贝Tensor或model时报错只支持用户显式创建的Tensor问题" (Arnold-FY-Chen, CSDN blog).
