[Problem] The program is already set to train on cuda:6, but it keeps reporting "For debugging consider passing CUDA_LAUNCH_BLOCKING=1."

This post describes a CUDA error (out of memory) hit while loading pretrained weights with PyTorch. The likely cause is that the checkpoint records the GPU it was saved from during training; the problem was solved by using the `map_location` argument of `torch.load` to map the weights onto the intended CUDA device.


Problem:

Traceback (most recent call last):
  File "2_generate_PM_CAM_bcss.py", line 47, in <module>
    model.load_state_dict(torch.load(args.weights), strict=False)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/serialization.py", line 1046, in _load
    result = unpickler.load()
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/serialization.py", line 1016, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/serialization.py", line 176, in default_restore_location
    result = fn(storage, location)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/serialization.py", line 158, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/_utils.py", line 79, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/cuda/__init__.py", line 661, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I had already set the following in my code:

torch.cuda.set_device(6)

That is, I had already told the program to run on cuda:6, yet everything kept showing the program running on cuda:1.
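
A short sketch of why `torch.cuda.set_device(6)` alone does not help here: it only sets the current device for calls such as `.cuda()` without an index, while `torch.load` by default restores each tensor to the device tag stored in the checkpoint (the `obj.cuda(device)` frame in the traceback above). The path "weights.pth" below is just a placeholder:

import torch

torch.cuda.set_device(6)

x = torch.zeros(3).cuda()      # no index given -> current device, i.e. cuda:6
print(x.device)                # cuda:6

# torch.load ignores the current device: it reads the location tag stored in
# the checkpoint (e.g. 'cuda:1') and calls storage.cuda(1) directly.
state_dict = torch.load("weights.pth")           # placeholder checkpoint path
print(next(iter(state_dict.values())).device)    # cuda:1 if it was saved from cuda:1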

Solution:

state_dict = torch.load(args.weights, map_location=torch.device('cuda:6' if torch.cuda.is_available() else 'cpu'))
model.load_state_dict(state_dict, strict=False)

That is, use the `map_location` argument to map the model parameters onto an available device.
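
For reference, `map_location` also accepts a plain string, a dict that remaps saved device tags, or a callable, so any of the following variants would work as well (the checkpoint is assumed here to have been saved from cuda:1):

# Load everything onto the CPU first, then move the model once:
state_dict = torch.load(args.weights, map_location="cpu")
model.load_state_dict(state_dict, strict=False)
model = model.to("cuda:6")

# Or remap only the device recorded in the checkpoint:
state_dict = torch.load(args.weights, map_location={"cuda:1": "cuda:6"})

# Or decide with a callable that receives (storage, saved_location):
state_dict = torch.load(args.weights, map_location=lambda storage, loc: storage.cuda(6))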

My guess: when the model weights were saved, the GPU they were trained on was recorded along with them, so when the weights are loaded they naturally end up back on that GPU. (This may not be entirely accurate; I'm off to lunch and will take another look afterwards — I spent the whole morning debugging this.)
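
That guess matches how PyTorch serialization behaves: each storage is saved together with a location tag such as 'cuda:1', and `torch.load` replays that tag unless `map_location` overrides it. A quick sketch to check it (the file name "probe.pth" is made up; run on a machine where the referenced GPUs exist):

import torch

# Save a tensor that lives on cuda:1, then reload it without map_location.
torch.save({"w": torch.randn(2, 2, device="cuda:1")}, "probe.pth")

ckpt = torch.load("probe.pth")                         # default: restored to cuda:1
print(ckpt["w"].device)                                # cuda:1

ckpt = torch.load("probe.pth", map_location="cuda:6")  # override the saved location
print(ckpt["w"].device)                                # cuda:6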
