From a missing fused_layer_norm_cuda module to installing apex: a chain of fixes, each leading to the next

I have recently been looking into the ComfyUI plugin SDXL_EcomID_ComfyUI, a face-swapping plugin that keeps a person's identity consistent, developed by Alimama:
SDXL_EcomID_ComfyUI: https://github.com/alimama-creative/SDXL_EcomID_ComfyUI
SDXL-EcomID: https://www.modelscope.cn/models/alimama-creative/SDXL-EcomID/file/view/master?fileName=README_ZH.md&status=1

What went wrong

No module named 'fused_layer_norm_cuda'

The error appears at runtime:

  File "E:\ComfyUI\ComfyUI_windows_portable1.0\ComfyUI\custom_nodes\PuLID_ComfyUI\eva_clip\eva_vit_model.py", line 418, in <listcomp>
    Block(
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\ComfyUI\custom_nodes\PuLID_ComfyUI\eva_clip\eva_vit_model.py", line 253, in __init__
    self.norm1 = norm_layer(dim)
                 ^^^^^^^^^^^^^^^
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\normalization\fused_layer_norm.py", line 294, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "importlib\__init__.py", line 126, in import_module
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

After trying all sorts of things, the cause turned out to be that apex had been installed without its CUDA extensions.
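To confirm that diagnosis before rebuilding anything, one can check which of apex's compiled extension modules are importable. This is a minimal sketch: the names fused_layer_norm_cuda and amp_C come from the traceback above and the build log further down, while the helper missing_modules is made up for illustration. It should be run with the same embedded Python that ComfyUI uses.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Extension names taken from the traceback and build log in this post.
    missing = missing_modules(["fused_layer_norm_cuda", "amp_C"])
    print("missing compiled extensions:", missing or "none")
```

If fused_layer_norm_cuda shows up as missing while apex itself imports, apex was most likely installed as a Python-only build.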

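A quick aside on CUDA

nvidia-smi and nvcc -V report different version numbers
CUDA has two APIs: the Runtime API and the Driver API.
Besides the GPU driver version, nvidia-smi reports the CUDA version of the Driver API.
nvcc --version reports the version that corresponds to the CUDA Runtime API, that is, the installed CUDA Toolkit.
If nvcc will not run, install the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads

The two versions are allowed to differ; this is not a conflict.
When installing GPU-related Python packages, the version to go by is usually the one from nvcc --version.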
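Reading the toolkit version can also be scripted. The sketch below parses the "release X.Y" fragment that nvcc prints; the helper name parse_nvcc_release is hypothetical, and the script assumes nvcc is on PATH.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not on PATH; the CUDA Toolkit may not be installed")
    else:
        out = subprocess.check_output([nvcc, "--version"], text=True)
        print("CUDA Toolkit version:", parse_nvcc_release(out))
```

The CUDA version that the installed PyTorch wheel was built against is available separately as torch.version.cuda, which is worth comparing with this result before compiling extensions.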

cuda_dir FileNotFoundError: [WinError 2] The system cannot find the file specified.

raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

Adding an environment variable that points at the CUDA installation lets the build resolve cuda_dir.
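The post does not record which variable was added. Assuming it is CUDA_HOME or CUDA_PATH (the two variables PyTorch's extension builder consults when locating CUDA), the following sketch checks whether either points at a directory that actually contains bin/nvcc, which is the path the failing call builds. The helper name find_nvcc is hypothetical.

```python
import os


def find_nvcc(env=None):
    """Return (variable, nvcc_path) for the first CUDA variable that works, else None."""
    env = os.environ if env is None else env
    for var in ("CUDA_HOME", "CUDA_PATH"):  # assumed variable names, see lead-in
        cuda_dir = env.get(var)
        if not cuda_dir:
            continue
        for exe in ("nvcc.exe", "nvcc"):
            candidate = os.path.join(cuda_dir, "bin", exe)
            if os.path.isfile(candidate):
                return var, candidate
    return None


if __name__ == "__main__":
    print(find_nvcc() or "no CUDA_HOME/CUDA_PATH pointing at a directory with bin/nvcc")
```

For the setup in this post the variable would point at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1, the directory named in the build log.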

Installing apex fails with an unsupported Microsoft Visual Studio version error

Install Visual Studio 2022 with the "Desktop development with C++" workload

cd E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded
git clone https://github.com/NVIDIA/apex
cd apex
git checkout 2ec84eb
..\python -m pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
 --expt-relaxed-constexpr -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 --use-local-env
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\crt/host_config.h(153): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  multi_tensor_adam.cu
  error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin\\nvcc' failed with exit code 2
  error: subprocess-exited-with-error

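Adding an environment variable for nvcc did not help.
Setting the _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH property in VS did not help either.
What finally worked was going to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\crt
and editing host_config.h: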

#if _MSC_VER < 1910 || _MSC_VER >= 1930

Change it to

#if _MSC_VER < 1910 || _MSC_VER >= 2030

Raising the upper bound keeps the version check from firing.
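Why the larger bound silences the error can be seen by restating the preprocessor condition as a plain function. This is only an illustrative model of the two #if lines quoted above, not code from the CUDA headers; the value 1941 is an assumption for the _MSC_VER reported by the MSVC 14.41 toolset that appears in the later log.

```python
def host_compiler_rejected(msc_ver, lower=1910, upper=1930):
    """Mirror of the preprocessor test `_MSC_VER < lower || _MSC_VER >= upper`."""
    return msc_ver < lower or msc_ver >= upper


if __name__ == "__main__":
    msc_ver = 1941  # assumed _MSC_VER for MSVC toolset 14.41
    print("rejected with original bound:", host_compiler_rejected(msc_ver))
    print("rejected with raised bound:  ", host_compiler_rejected(msc_ver, upper=2030))
```

The error message itself names a less invasive override, the nvcc flag -allow-unsupported-compiler, and carries the same warning that an unsupported host compiler may miscompile.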

error STL1002: Unexpected compiler version, expected CUDA 12.4 or newer.

Then a new problem came up:

NCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 --use-local-env
  C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.41.34120/include\yvals_core.h(888): error: static assertion failed with "error STL1002: Unexpected compiler version, expected CUDA 12.4 or newer."
    static_assert(false, "error " "STL1002" ": " "Unexpected compiler version, expected CUDA 12.4 or newer.");
    ^

  1 error detected in the compilation of "csrc/multi_tensor_adam.cu".
  multi_tensor_adam.cu
  error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin\\nvcc' failed with exit code 1
  error: subprocess-exited-with-error

Open "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include\yvals_core.h":

#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__)
#if __CUDACC_VER_MAJOR__ < 12 || (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ < 4)
_EMIT_STL_ERROR(STL1002, "Unexpected compiler version, expected CUDA 12.4 or newer.");
#endif

Change it to:

#ifndef _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH
#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__)
#if __CUDACC_VER_MAJOR__ < 10 || (__CUDACC_VER_MAJOR__ == 10 && __CUDACC_VER_MINOR__ < 1)
_EMIT_STL_ERROR(STL1002, "Unexpected compiler version, expected CUDA 12.4 or newer.");
#endif // ^^^ old CUDA ^^
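Both header edits modify files that belong to the CUDA Toolkit and to Visual Studio, so it seems prudent to keep a pristine copy. Below is a minimal sketch of a guarded replace; the helper patch_file is hypothetical, works on raw bytes so line endings are left untouched, and refuses to write unless the old text occurs exactly once.

```python
import shutil
from pathlib import Path


def patch_file(path, old, new):
    """Replace exactly one occurrence of `old` with `new`, keeping a .bak copy."""
    path = Path(path)
    data = path.read_bytes()
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    count = data.count(old_b)
    if count != 1:
        raise ValueError(f"expected exactly one match in {path}, found {count}")
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)  # keep the original header only once
    path.write_bytes(data.replace(old_b, new_b))
    return backup


# Example call (not executed here); the path and strings are the ones quoted in this post:
# patch_file(
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\crt\host_config.h",
#     "_MSC_VER >= 1930",
#     "_MSC_VER >= 2030",
# )
```

Writing under Program Files will probably require an elevated prompt. Restoring the .bak files after apex is built returns both toolchains to their shipped state.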

apex compiles and installs successfully

Check that the install went through; it did:

..\python.exe -m pip list | findstr apex
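The same check can be done from Python with the standard library. A minimal sketch; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("apex:", installed_version("apex"))
```

Note that a listed apex only shows the package is present. As the original failure demonstrated, apex can be installed while its CUDA extensions are missing, so an import test is still needed.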

Try it out:

..\python -c "from apex import amp"
E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\apex>..\python -c "from apex import amp"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\__init__.py", line 8, in <module>
    from . import amp
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,\
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\amp.py", line 1, in <module>
    from . import compat, rnn_compat, utils, wrap
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\rnn_compat.py", line 1, in <module>
    from . import utils, wrap
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\wrap.py", line 3, in <module>
    from ._amp_state import _amp_state
  File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\_amp_state.py", line 14, in <module>
    from torch._six import container_abcs
ModuleNotFoundError: No module named 'torch._six'    

The message is: No module named 'torch._six'

The cause is that recent versions of PyTorch have removed torch._six.
Edit E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\_amp_state.py

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    import collections.abc as container_abcs
else:
    from torch._six import container_abcs

Change it to

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs

In E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\_initialize.py,
comment out one import and add two variables:

#from torch._six import string_classes

int_classes = int
string_classes = str
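An alternative to branching on the version number is to try the old import and fall back when it is unavailable. This is a sketch of that pattern under the assumption that the three names below are the only ones apex takes from torch._six in these two files; it is not what apex itself ships.

```python
try:
    # Older PyTorch releases still provide these names in torch._six.
    from torch._six import container_abcs, int_classes, string_classes
except ImportError:
    # torch._six (or one of the names) is gone, or torch is not installed:
    # use the standard-library equivalents instead.
    import collections.abc as container_abcs

    int_classes = int
    string_classes = str


if __name__ == "__main__":
    print(isinstance({}, container_abcs.Mapping), isinstance("x", string_classes))
```

Because ModuleNotFoundError is a subclass of ImportError, the single except clause covers both a missing torch._six module and a missing name inside it.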

After that, amp finally imports:

..\python -c "from apex import amp"

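In other words, fused_layer_norm_cuda is finally installed.

SDXL_EcomID_ComfyUI can now generate images normally.

Summary:

SDXL_EcomID_ComfyUI was missing fused_layer_norm_cuda, which meant installing apex; compiling apex hit several errors that required editing header files; then the removal of torch._six in newer PyTorch broke the import. One boss fight after another!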

