FSDv2 Environment Setup: A Full Record of the Pitfalls

FSDv2: paper, open-source code
First, here is the environment I finally got working:

PyTorch version: 1.8.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3070
Nvidia driver version: 460.91.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] pytorch-jacinto-ai==1.0.0+76b6fea
[pip3] torch==1.8.1+cu111
[pip3] torch-scatter==2.1.1
[pip3] torchaudio==0.8.1
[pip3] torchelastic==0.2.2
[pip3] torchex==0.1.0
[pip3] torchtext==0.9.0
[pip3] torchvision==0.9.1+cu111
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py38he904b0f_0  
[conda] mkl_fft                   1.3.0            py38h54f3939_0  
[conda] mkl_random                1.1.1            py38h0573a6f_0  
[conda] numpy                     1.19.5                   pypi_0    pypi
[conda] pytorch-jacinto-ai        1.0.0+76b6fea             dev_0    <develop>
[conda] torch                     1.8.1+cu111              pypi_0    pypi
[conda] torch-scatter             2.1.1                    pypi_0    pypi
[conda] torchaudio                0.8.1                    pypi_0    pypi
[conda] torchelastic              0.2.2                    pypi_0    pypi
[conda] torchex                   0.1.0                     dev_0    <develop>
[conda] torchtext                 0.9.0                      py38    pytorch
[conda] torchvision               0.9.1+cu111              pypi_0    pypi

## pip list
cumm-cu102                    0.4.11
mmcv-full                     1.3.8
mmdet                         2.14.0
mmdet3d                       0.15.0        /codes/SST
mmengine                      0.8.4
mmpycocotools                 12.0.3
mmsegmentation                0.14.1

Problems encountered and how I solved them:

  1. ModuleNotFoundError: No module named 'ingroup_indices'
    Fix: download and build TorchEx.

  2. provided PTX was compiled with an unsupported toolchain
    Fix: check that the CUDA, cuDNN, and torch versions match each other and the GPU.
    Reference: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
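One part of the version check in item 2 can be sketched in Python: does the CUDA version a torch build was compiled with support the GPU's compute capability at all? This is only a minimal sketch. The helper names and the small lookup table are my own; the table is an excerpt of the kind of data listed in the reference above. It does not check the driver or cuDNN.

```python
import re

# Minimum CUDA toolkit version able to target each compute capability.
# Small hand-copied excerpt (assumption: extend it for other GPUs).
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere (A100)
    (8, 6): (11, 1),   # Ampere (RTX 30xx, e.g. RTX 3070)
}


def cuda_from_torch_version(version):
    """Extract (major, minor) from a local tag such as '1.8.1+cu111'."""
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def build_supports_gpu(torch_version, capability):
    """True/False if known, None if the version tag or GPU is not in the table."""
    cuda = cuda_from_torch_version(torch_version)
    needed = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if cuda is None or needed is None:
        return None
    return cuda >= needed


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("capability:", cap, "| supported:",
                  build_supports_gpu(torch.__version__, cap))
```

For the environment at the top of this post, `build_supports_gpu("1.8.1+cu111", (8, 6))` is the relevant call.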

  3. OSError: /opt/conda/lib/python3.8/site-packages/torch_scatter/_scatter_cuda.so: undefined symbol: _ZNK2at6Tensor6deviceEv
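    Fix: reinstall torch_scatter from a prebuilt wheel. The torch/CUDA tag in the URL should match the torch you actually have installed:
    pip install --no-index torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu111.html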

  4. ImportError: /data/codes/SST/mmdet3d/ops/knn/knn_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv
    Fix: rebuild mmdet3d inside the SST folder (do not download the official 0.15.0 code; the author modified the source), using either of:
    pip install -e . -v
    python setup.py build develop
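The undefined symbols in items 3 and 4 are mangled C++ names. As a rough illustration, the sketch below decodes just this one simple pattern (a nested name with no arguments). It is a hand-written toy, not a general demangler, and `demangle_simple` is a hypothetical helper name.

```python
def demangle_simple(symbol):
    """Decode symbols of the form _ZN[K]<len><name>...Ev only."""
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    # An optional 'K' marks a const member function.
    is_const = symbol[pos:pos + 1] == "K"
    if is_const:
        pos += 1
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    # 'E' closes the nested name, 'v' means an empty parameter list.
    if symbol[pos:] != "Ev":
        raise ValueError("unsupported symbol form")
    return "::".join(parts) + "()" + (" const" if is_const else "")


if __name__ == "__main__":
    for sym in ("_ZNK2at6Tensor6deviceEv", "_ZNK2at6Tensor7is_cudaEv"):
        print(sym, "->", demangle_simple(sym))
```

Both symbols decode to `at::Tensor` methods, which is consistent with the `.so` having been compiled against a different torch from the one installed. That is why reinstalling or rebuilding the extension fixes it. For arbitrary symbols, the standard `c++filt` tool does the same job.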

  5. ImportError: cannot import name 'SparseModule' from 'mmcv.ops' (/opt/conda/lib/python3.8/site-packages/mmcv/ops/__init__.py)
    Relevant code in SST:

    from .spconv import IS_SPCONV2_AVAILABLE

    if IS_SPCONV2_AVAILABLE:
        from spconv.pytorch import SparseModule, SparseSequential
    else:
        from mmcv.ops import SparseModule, SparseSequential


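    Cause: SparseModule only appears in mmcv-full>=1.4.12, so the fallback import from mmcv.ops fails with the mmcv version used here. The real cause is that IS_SPCONV2_AVAILABLE=False, i.e. spconv2 was not installed successfully.
    Fix: reinstall spconv2. (It is said to need CUDA > 11.0, and the prebuilt wheels are for 10.2/11.3/11.4. Do not trust the author's claim that the 11.x versions are compatible with each other: the 11.3 wheel does not work with CUDA 11.1! In the end I installed the 10.2 build and it ran.)
    pip install spconv-cuxxx
    Here xxx is the CUDA tag, e.g. spconv-cu102 for the 10.2 build, which matches the cumm-cu102 entry in the pip list above.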
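Before rerunning SST, you can check whether spconv2 imports at all. The sketch below is not SST's own `IS_SPCONV2_AVAILABLE` logic, which is not shown here. It only tries the same import as the `if` branch above and reports why it failed. `can_import` is a hypothetical helper name.

```python
import importlib


def can_import(module_name):
    """Return (ok, reason); reason is empty when the import succeeds."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # a broken CUDA build may raise more than ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


if __name__ == "__main__":
    for name in ("cumm", "spconv", "spconv.pytorch"):
        ok, reason = can_import(name)
        print(f"{name:16s} {'OK' if ok else 'FAILED  ' + reason}")
```

If `spconv.pytorch` fails here, the `mmcv.ops` fallback is what triggers the ImportError in item 5.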

  6. AttributeError: module 'mmcv' has no attribute 'Config'
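    Fix: first check whether mmcv was installed by mistake instead of mmcv-full. If not, suspect the mmcv version and reinstall mmcv according to the requirement in requirements/mminstall.txt (1.3.8<=mmcv<=1.4.0). Change the cu111/torch1.9.0 part of the URL to match your own CUDA and torch versions: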

    pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
    

    Reference: https://mmcv.readthedocs.io/zh_CN/v1.3.16/get_started/installation.html
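The two checks in item 6 can be scripted: which of mmcv and mmcv-full is installed, and whether its version is inside the 1.3.8 to 1.4.0 range. This is a minimal sketch using only the standard library. The helper names are my own, and the version parsing is deliberately simplistic (numeric components only).

```python
from importlib import metadata
from itertools import takewhile


def installed_version(dist_name):
    """Version string of an installed distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def parse_version(text):
    """'1.3.8' -> (1, 3, 8); local tags such as '+cu111' are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def mmcv_in_range(version, low="1.3.8", high="1.4.0"):
    """True if low <= version <= high (bounds from requirements/mminstall.txt)."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


if __name__ == "__main__":
    full = installed_version("mmcv-full")
    lite = installed_version("mmcv")
    print("mmcv-full:", full, "| mmcv:", lite)
    if full is None:
        print("mmcv-full is missing; install it rather than plain mmcv")
    else:
        print("version in range:", mmcv_in_range(full))
```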

  7. OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
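    Fix: the mmcv and torch versions do not match; reinstall torch or mmcv-full. With pip:

    pip install -U torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

    or, with conda:

    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge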
    