CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1702400366987/work/c10/cuda/CUDAFunctions.cpp:108.)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
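Environment: Ubuntu 22.04; GPU driver Driver Version: 535.171.04, CUDA Version: 12.2; cuda-toolkit version 520.61.05, i.e. CUDA 11.8. After installing CUDA, calling torch.cuda.is_available() produced the errors above. Following the hint in reference 1, I ran sudo apt install nvidia-modprobe, but the problem persisted. The pattern was that everything worked right after boot and the error appeared some time later.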
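In the end I completely uninstalled CUDA and reinstalled it as described in reference 4, downloaded cuda-samples from reference 2, and ran the post-installation tests described in reference 3. The tests passed, but the problem came back later: it worked right after boot and stopped working after the machine had been left idle for a while.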
I eventually found that the cause was the system going to sleep and the display blanking (the display is driven by the discrete GPU).
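To confirm the "works after boot, fails after sleep" pattern, it can help to record the same two facts right after boot and again after the machine has been idle. The following is a minimal sketch, not part of the original troubleshooting: the helper name cuda_report is made up for illustration, and it assumes nvidia-smi may or may not be on PATH and PyTorch may or may not be installed.

```python
import importlib.util
import shutil
import subprocess


def cuda_report():
    """Collect two facts about the GPU setup without raising.

    Hypothetical helper for illustration. A value of None means
    "could not check" (tool or package not installed).
    """
    report = {"nvidia_smi_ok": None, "torch_cuda_available": None}

    # nvidia-smi -L lists the GPUs the driver can see.
    if shutil.which("nvidia-smi"):
        try:
            proc = subprocess.run(
                ["nvidia-smi", "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["nvidia_smi_ok"] = proc.returncode == 0
        except subprocess.TimeoutExpired:
            report["nvidia_smi_ok"] = False

    # Only import torch if it is installed.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            report["torch_cuda_available"] = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            report["torch_cuda_available"] = None

    return report


print(cuda_report())
```

Running this once after boot and once after the machine has slept gives two reports to compare. A report where torch_cuda_available flips from True to False would be consistent with the sleep-related cause described above.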
systemctl status sleep.target
○ sleep.target - Sleep
Loaded: loaded (/lib/systemd/system/sleep.target; static)
Active: inactive (dead) since Wed 2024-10-30 17:16:13 CST; 3min 10s ago
Docs: man:systemd.special(7)
The Loaded state is loaded; it needs to be changed to masked:
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
Created symlink /etc/systemd/system/sleep.target → /dev/null.
Created symlink /etc/systemd/system/suspend.target → /dev/null.
Created symlink /etc/systemd/system/hibernate.target → /dev/null.
Created symlink /etc/systemd/system/hybrid-sleep.target → /dev/null.
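As the output shows, masking a unit creates a symlink from /etc/systemd/system/<unit> to /dev/null. A short sketch of how that could be verified for all four targets follows; the function name masked_units and its parameters are hypothetical, chosen only for this example.

```python
import os

SLEEP_UNITS = (
    "sleep.target",
    "suspend.target",
    "hibernate.target",
    "hybrid-sleep.target",
)


def masked_units(unit_dir="/etc/systemd/system", units=SLEEP_UNITS):
    """Map each unit name to True if unit_dir/<unit> is a symlink to /dev/null.

    Hypothetical helper: it only inspects the symlinks that
    `systemctl mask` reported creating; it does not query systemd.
    """
    result = {}
    for unit in units:
        path = os.path.join(unit_dir, unit)
        result[unit] = os.path.islink(path) and os.readlink(path) == "/dev/null"
    return result


for unit, masked in masked_units().items():
    print(f"{unit}: {'masked' if masked else 'not masked'}")
```

This only reads the filesystem, so it needs no root privileges and does not change anything.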
Reboot, then check the status again:
systemctl status sleep.target
○ sleep.target
Loaded: masked (Reason: Unit sleep.target is masked.)
Active: inactive (dead)
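The before/after comparison above comes down to reading the word after "Loaded:" in the status output. A minimal sketch of that check is below; the function names loaded_state and unit_loaded_state are made up for this example, and the sample strings are copied from the two outputs shown in this post.

```python
import re
import subprocess


def loaded_state(status_text):
    """Return the first word after 'Loaded:' in `systemctl status` output, or None."""
    match = re.search(r"^\s*Loaded:\s*(\S+)", status_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def unit_loaded_state(unit):
    """Run `systemctl status <unit>` and return its Loaded state.

    systemctl exits non-zero for inactive units, so the return code
    is deliberately not checked here.
    """
    proc = subprocess.run(
        ["systemctl", "status", unit, "--no-pager"],
        capture_output=True,
        text=True,
    )
    return loaded_state(proc.stdout)


before = """○ sleep.target - Sleep
Loaded: loaded (/lib/systemd/system/sleep.target; static)
Active: inactive (dead)"""

after = """○ sleep.target
Loaded: masked (Reason: Unit sleep.target is masked.)
Active: inactive (dead)"""

print(loaded_state(before))  # loaded
print(loaded_state(after))   # masked
```

On a live system, unit_loaded_state("sleep.target") would run the same check against the real unit.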
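sleep.target is now in the masked state. In addition, under Settings > Power, I set the screen to never turn off and disabled automatic suspend. The problem has not occurred since, so the issue is resolved.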
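The power settings can also be applied from a script instead of the Settings dialog. The sketch below assumes the default GNOME desktop of Ubuntu 22.04 and assumes that these gsettings keys correspond to the "screen off: never" and "no automatic suspend" options; the post itself only used the graphical dialog. The function names are hypothetical, and the script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Assumption: GNOME desktop. idle-delay 0 disables screen blanking;
# sleep-inactive-*-type "nothing" disables automatic suspend.
POWER_SETTINGS = [
    ("org.gnome.desktop.session", "idle-delay", "0"),
    ("org.gnome.settings-daemon.plugins.power", "sleep-inactive-ac-type", "nothing"),
    ("org.gnome.settings-daemon.plugins.power", "sleep-inactive-battery-type", "nothing"),
]


def gsettings_commands(settings=POWER_SETTINGS):
    """Build one `gsettings set` argument list per (schema, key, value)."""
    return [["gsettings", "set", schema, key, value] for schema, key, value in settings]


def apply_power_settings(dry_run=True):
    """Print the commands (dry run) or execute them as the current desktop user."""
    for cmd in gsettings_commands():
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


apply_power_settings(dry_run=True)
```

These are per-user desktop settings, so with dry_run=False the script should be run as the logged-in desktop user rather than with sudo.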
References:
2 https://github.com/NVIDIA/cuda-samples/releases
3 CUDA学习3——samples下载与编译运行(优快云_0036_20231126), 优快云 blog (title in English: "CUDA Study 3: downloading, compiling, and running the samples")