RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCB

Original post, last modified 2024-05-18 11:00:54

License: CC 4.0 BY-SA

Tags: #pytorch #AI #python

First published 2024-04-24 09:06:58

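Summary: this post describes a CUDA-related error (a cublas runtime error) that appeared while running a deep learning model with PyTorch. The author uninstalled and reinstalled different versions of PyTorch and its companion libraries and looked into CUDA version compatibility, but then ran into a new error. The troubleshooting steps and the likely causes are described below.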

1. The error

After installing the packages listed in the project's requirement.txt, running the code failed with:

Traceback (most recent calls WITHOUT Sacred internals):
  File "main.py", line 36, in my_main
    run(_run, config, _log)
  File "/home/felicity/PyProjects/MASER/run.py", line 52, in run
    run_sequential(args=args, logger=logger)
  File "/home/felicity/PyProjects/MASER/run.py", line 179, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/home/felicity/PyProjects/MASER/runners/episode_runner.py", line 84, in run
    Qtot = mixer(goal_Qvalue, st).reshape(1)
  File "/home/felicity/anaconda3/envs/mase/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/felicity/PyProjects/MASER/modules/mixers/qmix.py", line 38, in forward
    hidden = F.elu(th.bmm(agent_qs, w1) + b1)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:450

2. Solution

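This error comes from a CUDA-related problem while running the deep learning model. In the traceback above, the failing call is th.bmm, a batched matrix multiply that PyTorch runs through cuBLAS on the GPU, so a cublas runtime error points at the GPU side of the computation. Possible causes include running out of GPU memory, a driver problem, or a CUDA version mismatch.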
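Before changing any packages, it can help to record what is actually installed, since each of the causes above shows up in the versions or the device properties. The snippet below is a minimal sketch, not part of the original post: `collect_env` is a hypothetical helper name, and it only uses standard `torch` attributes. It reports the PyTorch version, the CUDA version the wheel was built against, and the GPU's name, compute capability, and total memory.

```python
def collect_env():
    """Gather version and device facts relevant to a cuBLAS failure.

    Hypothetical helper for illustration; returns a plain dict.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    info = {"torch_installed": True}
    info["torch_version"] = torch.__version__
    # CUDA version the wheel was built against; None for CPU-only builds.
    info["built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        info["compute_capability"] = (props.major, props.minor)
        info["total_memory_gib"] = round(props.total_memory / 1024**3, 2)
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```

Comparing `built_with_cuda` and `compute_capability` against the installed driver (for example from `nvidia-smi`) is usually the quickest way to tell a version mismatch apart from a memory problem.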

2.1 Tried, no effect

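While searching, I found a blogger who said the problem was the package source (mirror). I uninstalled PyTorch and reinstalled it from the Tsinghua mirror, but the error remained, so this did not work for me. Others may still want to try it: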

Reference (CSDN blog): "PyTorch RuntimeError: cublas runtime error :cu:259, and how to fix it"

2.2 Tried, worked

In the issues I saw that many people had hit similar errors, so, without expecting much, I tried a solution that one expert had posted there.

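The installation was very slow. I left the machine installing overnight, and when I checked first thing this morning it had finished. I ran the program and the RuntimeError no longer appeared.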

Here is the fix:

pip uninstall torch
pip uninstall torchvision
pip uninstall torchaudio

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Using the Tsinghua Mirror

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113 -i https://pypi.tuna.tsinghua.edu.cn/simple
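After a reinstall like this, a quick way to check the result without launching the full training run is to exercise the same operation that failed in the traceback, a batched matrix multiply on the GPU. The following is a minimal sketch under that assumption; `bmm_smoke_test` is a hypothetical name and the tensor shapes are arbitrary.

```python
def bmm_smoke_test():
    """Run one small batched matmul on the GPU and report a status string."""
    try:
        import torch
    except ImportError:
        return "no-torch"
    if not torch.cuda.is_available():
        return "no-cuda"

    try:
        a = torch.randn(2, 3, 4, device="cuda")
        b = torch.randn(2, 4, 5, device="cuda")
        out = torch.bmm(a, b)  # batched matrix multiply, as in the traceback
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"failed: {exc}"

    return "ok" if out.shape == (2, 3, 5) else "unexpected-shape"


print(bmm_smoke_test())
```

If this prints `ok`, the basic cuBLAS path works with the new install. It does not prove the whole project will run, since later errors can have other causes.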

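I have just seen another blogger say the cause is an incompatibility between the CUDA version and a high-end GPU. So perhaps that was the real cause all along, and I installed a compatible CUDA build by accident. Reference (CSDN blog): "RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-jh50bw2"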
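One way to test this hypothesis is to compare the GPU's compute capability with the architectures the installed PyTorch binary was compiled for. The sketch below is an illustration, not something from the original post: `parse_arch` and `is_arch_supported` are hypothetical helpers, and the matching rule is a simplified heuristic (an `sm_XY` binary is treated as usable on a device with the same major version and an equal or higher minor version; a `compute_XY` PTX entry is treated as usable on any equal or newer device). `torch.cuda.get_arch_list()` is only available in reasonably recent PyTorch releases, so the call is guarded.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_86' or 'compute_80' into (kind, major, minor)."""
    kind, _, rest = tag.partition("_")
    digits = "".join(ch for ch in rest if ch.isdigit())
    return kind, int(digits[:-1]), int(digits[-1])


def is_arch_supported(capability, arch_list):
    """Heuristic: can a binary built for arch_list run on this capability?"""
    major, minor = capability
    for tag in arch_list:
        kind, a_major, a_minor = parse_arch(tag)
        # Compiled code: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available() and hasattr(torch.cuda, "get_arch_list"):
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("binary built for:", archs)
    print("supported (heuristic):", is_arch_supported(capability, archs))
```

For example, under this heuristic `is_arch_supported((8, 6), ["sm_70", "sm_75"])` is `False`, which is the kind of mismatch that installing a newer CUDA build of PyTorch would resolve.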

However, the program still did not run all the way through, and a new error appeared: