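While you walk the road it is all hardship and danger; looking back, it seems like nothing more than a light breeze.
1. Installing apex with pip
Installing apex with pip is easy, but using it afterwards fails with: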
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
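Before rebuilding anything, it can help to confirm which modules the current environment actually contains. The following is a minimal sketch, not an official apex check: the helper name `has_module` is hypothetical, and it assumes `python3` on your PATH is the interpreter of the environment where apex was installed. It only asks Python whether the module can be located; it does not load it.

```shell
#!/bin/sh
# has_module NAME: exit status 0 if the interpreter can locate module NAME.
# Hypothetical helper for illustration; it is not part of apex.
has_module() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) is not None else 1)' "$1"
}

# apex is the Python package; fused_layer_norm_cuda is the compiled
# extension named in the error message above.
for mod in apex fused_layer_norm_cuda; do
    if has_module "$mod"; then
        echo "$mod: found"
    else
        echo "$mod: missing"
    fi
done
```

If `apex` is found but `fused_layer_norm_cuda` is missing, the Python package is present without its compiled extension, which is the situation the error describes.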
2. Installing apex from source
git clone https://github.com/NVIDIA/apex
cd apex
# Build with core extensions (cpp and cuda)
APEX_CPP_EXT=1 APEX_CUDA_EXT=1 pip install -v --no-build-isolation .
# To build with additional extensions, specify them with environment variables
APEX_CPP_EXT=1 APEX_CUDA_EXT=1 APEX_FAST_MULTIHEAD_ATTN=1 APEX_FUSED_CONV_BIAS_RELU=1 pip install -v --no-build-isolation .
# To build all contrib extensions at once
APEX_CPP_EXT=1 APEX_CUDA_EXT=1 APEX_ALL_CONTRIB_EXT=1 pip install -v --no-build-isolation .
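These are the official installation steps. In practice, though, something unexpected is bound to happen, right? So here is the road I walked.
3. Problems encountered during installation
3.1 GNU version
The error message is: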
unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
Main cause: the installed gcc is newer than this nvcc supports. You can downgrade gcc, but I chose to bypass the check. I am on Ubuntu, so:
export NVCC_APPEND_FLAGS=-allow-unsupported-compiler
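If you would rather set the flag only when it is needed, the check can be scripted. This is a rough sketch under assumptions: the helper names `gcc_major` and `needs_override` and the variable `MAX_GCC_MAJOR` are my own, and the limit of 12 is taken from the error message above, so other CUDA releases will have a different limit.

```shell
#!/bin/sh
# Highest gcc major version accepted by nvcc, per the error message above.
# Assumption: adjust this for other CUDA releases.
MAX_GCC_MAJOR=12

# gcc_major VERSION: print the major component of a string such as "13.3.0".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# needs_override VERSION: succeed when the major version exceeds the limit.
needs_override() {
    [ "$(gcc_major "$1")" -gt "$MAX_GCC_MAJOR" ]
}

if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if needs_override "$ver"; then
        export NVCC_APPEND_FLAGS=-allow-unsupported-compiler
        echo "gcc $ver is newer than $MAX_GCC_MAJOR: NVCC_APPEND_FLAGS set"
    else
        echo "gcc $ver is within the supported range: nothing to do"
    fi
fi
```

Source the script (`. ./script.sh`) rather than executing it, otherwise the exported variable does not reach the shell where you run pip.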
With that set, I continued with the build.
I am using a conda environment, with the following versions:
torch 2.7.1
torchvision 0.22.1
CUDA Version: 12.9
The following problem then appeared:
3.2 'nvcc' failed with exit code 127
sh: 1: /home/pyUser/anaconda3/envs/pytorch/bin/../targets/x86_64-linux/nvvm/bin/cicc: not found
error: command '/home/pyUser/anaconda3/envs/pytorch/bin/nvcc' failed with exit code 127
At this point you are almost there; you only need to specify CUDA_HOME:
CUDA_HOME=/usr/local/cuda-12.9 APEX_CPP_EXT=1 APEX_CUDA_EXT=1 pip install -v --no-build-isolation .
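The `cicc: not found` error shows that the nvcc inside the conda environment could not find its companion binary. Before pointing CUDA_HOME at a directory, you can check that the directory holds both. A minimal sketch: the helper name `toolkit_is_complete` is hypothetical, and the `nvvm/bin/cicc` location is the layout I would expect in a standard `/usr/local/cuda-*` install, so treat it as an assumption for other layouts.

```shell
#!/bin/sh
# toolkit_is_complete DIR: succeed when DIR holds an executable nvcc and
# the cicc binary that nvcc invokes.
# Assumption: cicc lives under DIR/nvvm/bin, as in a standard toolkit install.
toolkit_is_complete() {
    [ -x "$1/bin/nvcc" ] && [ -x "$1/nvvm/bin/cicc" ]
}

# Use CUDA_HOME if it is already set, otherwise the path from this post.
candidate=${CUDA_HOME:-/usr/local/cuda-12.9}
if toolkit_is_complete "$candidate"; then
    echo "$candidate: nvcc and cicc found"
else
    echo "$candidate: nvcc or cicc missing, choose another CUDA_HOME"
fi
```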
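However, make sure the toolkit that CUDA_HOME points to is the same version as the CUDA Version listed above. That is the road I came by.
The road is long and hard, but keep walking and you will arrive.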
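The version match can also be checked from the command line before starting a long build. The sketch below parses the `release X.Y` text that `nvcc --version` prints and compares it with an expected value. The helper names and the `EXPECTED_CUDA` variable are hypothetical; 12.9 is simply the version from my environment.

```shell
#!/bin/sh
# nvcc_release TEXT: print "MAJOR.MINOR" taken from the "release X.Y" part
# of the text that `nvcc --version` prints. Prints nothing if absent.
nvcc_release() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# versions_match NVCC_TEXT EXPECTED: succeed when the parsed release equals EXPECTED.
versions_match() {
    [ "$(nvcc_release "$1")" = "$2" ]
}

# Assumptions: the path and version below come from this post; change both.
cuda_home=${CUDA_HOME:-/usr/local/cuda-12.9}
expected=${EXPECTED_CUDA:-12.9}
if [ -x "$cuda_home/bin/nvcc" ]; then
    if versions_match "$("$cuda_home/bin/nvcc" --version)" "$expected"; then
        echo "nvcc under $cuda_home reports release $expected"
    else
        echo "nvcc under $cuda_home does not report release $expected"
    fi
fi
```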