PyTorch error when loading a deep model: RuntimeError: cuda runtime error (10) : invalid device ordinal

Attempting to load a trained PyTorch deep learning model on a server GPU other than the one used for training raises RuntimeError: cuda runtime error (10) : invalid device ordinal. The error occurs because the saved model includes GPU location information. The fix is to remap the loaded tensors to the CPU, or to a specific available GPU, via the map_location argument. The reference links give a detailed solution.


Background and problem description:

  I trained my model on GPU 2 of a server. The training job was still running, so I had to rerun my test experiments on a different GPU to check the results. When reloading the trained deep learning model in PyTorch, I got: RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:32.

THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=32 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "las_test.py", line 35, in <module>
    listener = torch.load(listener_model_path)
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 303, in load
    return _load(f, map_location, pickle_module)
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 469, in _load
    result = unpickler.load()
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 437, in persistent_load
    data_type(size), location)
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 88, in default_restore_location
    result = fn(storage, location)
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 70, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 68, in _cuda
    with torch.cuda.device(device):
  File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py", line 227, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:32

Solution:

  The error happens because PyTorch records the GPU (device) information when a model is saved; when the model is loaded back in a process where that device is not available or not the same, it fails with invalid device ordinal. We can trace the error back to its source:
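The device tag that gets recorded can be made visible with a small CPU-only sketch (so it runs anywhere): the callable passed as map_location receives each storage together with the location tag written at save time, and that tag is exactly where the stale cuda:2 reference lives in the failing checkpoint.

```python
import io
import torch

# Save a tensor; torch.save records a location tag for each storage
# ('cpu' here, 'cuda:2' for a tensor that lived on GPU 2 at save time).
buf = io.BytesIO()
torch.save(torch.arange(4), buf)
buf.seek(0)

seen_tags = []

def remap(storage, location):
    # 'location' is the device tag stored in the checkpoint.
    seen_tags.append(location)
    return storage  # returning the CPU storage keeps the tensor on CPU

t = torch.load(buf, map_location=remap)
print(seen_tags)   # the recorded tag(s), e.g. ['cpu']
print(t.device)
```

On a checkpoint saved from GPU 2, the same callable would receive `'cuda:2'` as the location, which is why loading fails on a machine or process where that ordinal does not exist.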

File "/home/zyh/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 303, in load
    return _load(f, map_location, pickle_module)

Stepping into serialization.py, we find that the docstring already explains all of this:

def load(f, map_location=None, pickle_module=pickle):
    """Loads an object saved with :func:`torch.save` from a file.

    :meth:`torch.load` uses Python's unpickling facilities but treats storages,
    which underlie tensors, specially. They are first deserialized on the
    CPU and are then moved to the device they were saved from. If this fails
    (e.g. because the run time system doesn't have certain devices), an exception
    is raised. However, storages can be dynamically remapped to an alternative
    set of devices using the `map_location` argument.

    If `map_location` is a callable, it will be called once for each serialized
    storage with two arguments: storage and location. The storage argument
    will be the initial deserialization of the storage, residing on the CPU.
    Each serialized storage has a location tag associated with it which
    identifies the device it was saved from, and this tag is the second
    argument passed to map_location. The builtin location tags are `'cpu'` for
    CPU tensors and `'cuda:device_id'` (e.g. `'cuda:2'`) for CUDA tensors.
    `map_location` should return either None or a storage. If `map_location` returns
    a storage, it will be used as the final deserialized object, already moved to
    the right device. Otherwise, :math:`torch.load` will fall back to the default
    behavior, as if `map_location` wasn't specified.

    If `map_location` is a string, it should be a device tag, where all tensors
    should be loaded.

    Otherwise, if `map_location` is a dict, it will be used to remap location tags
    appearing in the file (keys), to ones that specify where to put the
    storages (values).

    User extensions can register their own location tags and tagging and
    deserialization methods using `register_package`.

    Args:
        f: a file-like object (has to implement read, readline, tell, and seek),
            or a string containing a file name
        map_location: a function, string or a dict specifying how to remap storage
            locations
        pickle_module: module used for unpickling metadata and objects (has to
            match the pickle_module used to serialize file)

    Example:
        >>> torch.load('tensors.pt')
        # Load all tensors onto the CPU
        >>> torch.load('tensors.pt', map_location='cpu')
        # Load all tensors onto the CPU, using a function
        >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage)
        # Load all tensors onto GPU 1
        >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1))
        # Map tensors from GPU 1 to GPU 0
        >>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})
        # Load tensor from io.BytesIO object
        >>> with open('tensor.pt') as f:
                buffer = io.BytesIO(f.read())
        >>> torch.load(buffer)
    """

The examples at the end of the docstring give exactly the solutions we need:

# Load all tensors onto the CPU:

torch.load('path of your model', map_location='cpu')

# Load all tensors onto the CPU, using a function:

torch.load('path of your model', map_location=lambda storage, loc: storage)

# Load all tensors onto GPU 1:

torch.load('path of your model', map_location=lambda storage, loc: storage.cuda(1))

# Map tensors from GPU 1 (the GPU the model was trained on) to GPU 0 (the remapped GPU):

torch.load('path of your model', map_location={'cuda:1':'cuda:0'})

In my case, loading all tensors onto GPU 0 resolved the problem.
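Putting it together, here is a defensive loading pattern (a sketch; the in-memory buffer stands in for a real checkpoint path) that picks a device that actually exists at load time, so a stale saved tag like cuda:2 can never trigger invalid device ordinal:

```python
import io
import torch

# Pick whichever device is actually available in this process;
# falling back to CPU avoids "invalid device ordinal" entirely.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Stand-in for a real checkpoint file: save a tensor to an in-memory buffer.
buf = io.BytesIO()
torch.save(torch.ones(3), buf)
buf.seek(0)

# map_location also accepts a torch.device directly: every storage in the
# checkpoint is deserialized straight onto `device`, regardless of the
# GPU it was saved from.
tensor = torch.load(buf, map_location=device)
print(tensor.device)
```

The same map_location argument works for entire saved models and for state_dicts alike, since both are deserialized through the same storage-remapping machinery.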

 

Reference:

https://blog.youkuaiyun.com/shincling/article/details/78919282

https://pytorch-cn.readthedocs.io/zh/latest/
