SparseBEV environment issue (CUDA Runtime vs. nvcc compiler version conflict) and training time

早睡早起吧

Last modified 2025-05-16 12:21:13

Category: Autonomous Driving BEV Perception. Tags: computer vision, python, autonomous driving, pytorch, artificial intelligence

First published 2025-03-25 23:18:50

Article link: https://blog.youkuaiyun.com/weixin_41859890/article/details/146486590

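Preface

A record of the CUDA runtime library vs. nvcc compiler problem I ran into, plus the training time issue:

While setting up the SparseBEV environment, the build kept failing with a CUDA version error: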

RuntimeError:
      The detected CUDA version (10.2) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

Yet when I ran these checks:

(sparsebev) lm@ubuntu-server:/data1/wjx/SparseBEV/models/csrc$ python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available())"
2.0.0
11.8
True
(sparsebev) lm@ubuntu-server:/data1/wjx/SparseBEV/models/csrc$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

PyTorch inside the sparsebev virtual environment clearly reports CUDA 11.8, so why does nvcc --version report CUDA 10.2?
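The two checks above can be combined into one small script. This is a minimal sketch, not part of the original setup: the helper names `parse_nvcc_release` and `describe_nvcc` are made up for illustration, and the regular expression assumes the "release X.Y" wording seen in the nvcc output above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return the 'X.Y' that follows 'release' in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", version_text)
    return match.group(1) if match else None


def describe_nvcc():
    """Return (path, release) for the nvcc that the current PATH resolves to."""
    path = shutil.which("nvcc")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Prints which nvcc binary the shell would run and the release it announces.
    print(describe_nvcc())
```

If the printed path points outside the conda environment (for example somewhere under a system CUDA directory), the nvcc being used does not belong to the environment that PyTorch was installed into.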

Let's start from two concepts.

I. What are the CUDA runtime library and the nvcc compiler?

1.1 Concepts

Run nvidia-smi:

| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |

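The "CUDA Version: 12.5" shown above is often read as the CUDA runtime version. Strictly speaking, it is the highest CUDA version that the installed driver supports; the runtime actually used by a program is whichever one ships with it (here, the 11.8 runtime installed alongside PyTorch). So what is the CUDA Runtime? It is a high-level API that wraps the lower-level CUDA Driver API.

It is a set of prebuilt library code provided by NVIDIA that lets your program (PyTorch, for example) run computations on the GPU without you having to write low-level GPU instructions yourself. On disk, its core file is typically libcudart.so (for example libcudart.so.11.8).

A simple analogy: it is like a takeout hamburger. You just eat it; you don't have to grow the wheat, raise the cattle, or bake the bun. The CUDA runtime library is the ready-made "hamburger" that gets you onto the GPU quickly.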
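To see which runtime library files an environment actually contains, one option is to search it for libcudart. The sketch below is only an illustration: `find_cudart` is a hypothetical helper, and searching `sys.prefix` assumes the runtime was installed into the active conda or virtualenv prefix.

```python
import glob
import os
import sys


def find_cudart(root):
    """Return every libcudart.so* file found anywhere below `root`, sorted."""
    pattern = os.path.join(root, "**", "libcudart.so*")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # sys.prefix is the root of the active Python environment.
    for path in find_cudart(sys.prefix):
        print(path)
```

An environment that lists libcudart files but has no nvcc in its bin directory can run GPU code through PyTorch yet cannot compile .cu sources on its own.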

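Run nvcc -V (this output reports release 11.8, unlike the 10.2 shown earlier, presumably because it was captured after the fix):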

(sparsebev) lm@ubuntu-server:/data1/wjx/SparseBEV/pretrain$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

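nvcc is the CUDA compiler (NVIDIA CUDA Compiler); nvcc -V prints its version.

It is the core compiler tool of the CUDA Toolkit. It compiles CUDA source code (usually files with the .cu extension) into binaries that run on NVIDIA GPUs (for example a .so file).

Compilation is driven by nvcc and the Toolkit it ships with; it does not rely on the runtime library that was installed alongside PyTorch.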
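Which Toolkit a PyTorch extension build uses depends on how the Toolkit directory is located. The sketch below approximates the lookup order that, to my understanding, `torch.utils.cpp_extension` follows on Linux (environment variables first, then the nvcc on PATH, then /usr/local/cuda). `guess_cuda_home` is a hypothetical stand-in written for illustration, not PyTorch's own function.

```python
import os
import shutil


def guess_cuda_home(environ=None, which=shutil.which):
    """Approximate how an extension build might locate the CUDA Toolkit root."""
    environ = os.environ if environ is None else environ

    # 1. An explicit CUDA_HOME or CUDA_PATH takes priority.
    home = environ.get("CUDA_HOME") or environ.get("CUDA_PATH")
    if home:
        return home

    # 2. Otherwise take the nvcc on PATH and go two levels up (<root>/bin/nvcc).
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))

    # 3. Fall back to the conventional system location, if it exists.
    default = "/usr/local/cuda"
    return default if os.path.exists(default) else None


if __name__ == "__main__":
    print(guess_cuda_home())
```

When PyTorch is importable, printing `torch.utils.cpp_extension.CUDA_HOME` shows the directory it actually resolved, which is the more reliable check.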

1.2 Cause of the error

RuntimeError:
      The detected CUDA version (10.2) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

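This error means the project needs the CUDA compiler to build its extensions, but the conda install command below only installed the CUDA runtime libraries, not the compiler: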

conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Installing the CUDA runtime library is enough for PyTorch to run GPU code.

python -c "import torch; print(torch.cuda.is_available())"

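The command above prints True because PyTorch talks to the GPU through the CUDA runtime library. You still cannot compile CUDA code (such as .cu files), because nvcc was never installed into the environment.

[The mismatch error is raised because the build picked up the CUDA 10.2 compiler from the system directories, which is the version the error message reports.]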
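The version check behind this error can be sketched as follows. It assumes, as I understand PyTorch's extension builder to work, that a difference in the major version is what triggers the hard error, while a minor difference only produces a warning. `cuda_versions_compatible` is a hypothetical helper, not a PyTorch API.

```python
def cuda_versions_compatible(nvcc_release, torch_cuda):
    """Return True when nvcc and PyTorch's CUDA share the same major version."""
    nvcc_major = int(nvcc_release.split(".")[0])
    torch_major = int(torch_cuda.split(".")[0])
    return nvcc_major == torch_major


if __name__ == "__main__":
    # The situation from this article: system nvcc 10.2, PyTorch built with 11.8.
    print(cuda_versions_compatible("10.2", "11.8"))
    # The situation after installing the matching Toolkit.
    print(cuda_versions_compatible("11.8", "11.8"))
```

Under that assumption, nvcc 10.2 against PyTorch 11.8 fails on the major version (10 vs. 11), which matches the RuntimeError quoted above.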

1.3 Download the full CUDA Toolkit (including nvcc)

1.3.1 Download and install (no sudo required):

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
chmod +x cuda_11.8.0_520.61.05_linux.run
./cuda_11.8.0_520.61.05_linux.run --no-opengl-libs --toolkit --installpath="$HOME"
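Once the Toolkit is installed, the build still has to be pointed at it. The sketch below shows one way to prepare an environment for a build subprocess. It is an illustration under assumptions: `build_env` is a hypothetical helper, and the layout (nvcc under `<prefix>/bin`, libraries under `<prefix>/lib64`) is the usual one for a runfile install on x86_64 Linux, with the prefix taken to be the home directory used in the command above.

```python
import os


def build_env(cuda_prefix, base_env=None):
    """Return a copy of the environment that puts `cuda_prefix` first for builds."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_prefix

    def prepend(var, directory):
        # Put `directory` in front of any existing value, without a trailing separator.
        old = env.get(var, "")
        env[var] = directory + (os.pathsep + old if old else "")

    prepend("PATH", os.path.join(cuda_prefix, "bin"))
    prepend("LD_LIBRARY_PATH", os.path.join(cuda_prefix, "lib64"))
    return env


if __name__ == "__main__":
    # Assumed install prefix: the home directory passed to --installpath.
    env = build_env(os.path.expanduser("~"))
    print(env["CUDA_HOME"])
    print(env["PATH"].split(os.pathsep)[0])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when invoking the extension build, or the same three variables can be exported in the shell before building.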
