1. The error
I installed the packages listed in the requirement.txt that ships with the code, and running the program fails with:
“Traceback (most recent calls WITHOUT Sacred internals):
File "main.py", line 36, in my_main
run(_run, config, _log)
File "/home/felicity/PyProjects/MASER/run.py", line 52, in run
run_sequential(args=args, logger=logger)
File "/home/felicity/PyProjects/MASER/run.py", line 179, in run_sequential
episode_batch = runner.run(test_mode=False)
File "/home/felicity/PyProjects/MASER/runners/episode_runner.py", line 84, in run
Qtot = mixer(goal_Qvalue, st).reshape(1)
File "/home/felicity/anaconda3/envs/mase/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/felicity/PyProjects/MASER/modules/mixers/qmix.py", line 38, in forward
hidden = F.elu(th.bmm(agent_qs, w1) + b1)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:450
”
2. Solution
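This error comes from a CUDA-related problem while running the deep learning model. A cublas runtime error is usually tied to computation on the GPU. Possible causes include insufficient GPU memory, driver problems, and a CUDA version mismatch.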
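To narrow down which of these causes applies, it can help to print what the installed PyTorch build sees and to run a tiny batched matrix multiply on the GPU, the same kind of call (th.bmm) that fails in the traceback. The following is only a minimal sketch: the function name cuda_report and the tensor shapes are my own choices for illustration, not part of MASER, and it assumes the GPU of interest is device 0.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the installed PyTorch build and the GPU."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version the wheel was built against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report

    # Assumption: the GPU of interest is device 0.
    report["device_name"] = torch.cuda.get_device_name(0)
    report["compute_capability"] = torch.cuda.get_device_capability(0)

    # Smoke test: a small batched matmul on the GPU.
    try:
        a = torch.randn(2, 3, 4, device="cuda")
        b = torch.randn(2, 4, 5, device="cuda")
        out = torch.bmm(a, b)
        torch.cuda.synchronize()  # force the kernel to actually run
        report["bmm_ok"] = tuple(out.shape) == (2, 3, 5)
    except RuntimeError as err:
        report["bmm_ok"] = False
        report["bmm_error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If bmm_ok comes back False even for these tiny tensors, running out of memory is an unlikely explanation, and the driver or the CUDA build of PyTorch becomes the more likely suspect.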
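2.1 Tried, no effect
I searched around and saw some bloggers say the problem is the package source (mirror). I uninstalled everything and reinstalled from the Tsinghua mirror, but the error was still there, so this did not work for me. Others may still want to give it a try.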
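2.2 Tried, worked
The repository's issues contain many reports of similar errors, so with a nothing-to-lose attitude I tried a solution posted there by one of the experts.
The install was very slow. I left the machine installing overnight, and when I came in early this morning it had finished. I ran the program and the RuntimeError no longer appeared.
Here is the fix: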
pip uninstall torch
pip uninstall torchvision
pip uninstall torchaudio
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
Using the Tsinghua Mirror
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113 -i https://pypi.tuna.tsinghua.edu.cn/simple
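I just saw another blogger say the cause is an incompatibility between the CUDA version and a high-end GPU, so perhaps that was the real cause all along and I simply installed a compatible CUDA build by accident: RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-jh50bw2-CSDN博客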
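One rough way to test this hypothesis is to compare the GPU's compute capability with the architectures the installed PyTorch binary was compiled for. The sketch below is an assumption-laden illustration: capability_supported is a hypothetical helper of mine, and an exact sm_XY match is only a heuristic, since a build may also ship PTX entries (listed as compute_XY) that can still run on newer GPUs. torch.cuda.get_arch_list() is only available in reasonably recent PyTorch releases.

```python
import importlib.util


def capability_supported(capability, arch_list):
    """Return True if the (major, minor) capability has an exact sm_XY entry.

    Heuristic only: it ignores PTX ("compute_XY") forward compatibility.
    """
    tag = "sm_{}{}".format(*capability)
    return tag in arch_list


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed")
    else:
        import torch

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)  # e.g. (8, 6)
            archs = torch.cuda.get_arch_list()         # e.g. ['sm_37', ...]
            print("GPU capability:", cap)
            print("Build targets:", archs)
            print("Exact match:", capability_supported(cap, archs))
        else:
            print("CUDA is not available to this PyTorch build")
```

If the GPU's capability is missing from the list, that would be consistent with the idea that the original PyTorch build was too old for the card and that the cu113 wheels fixed it as a side effect.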
However, the program still did not run all the way through; a new error appeared: