I have recently been experimenting with the ComfyUI plugin SDXL_EcomID_ComfyUI, a face-swap plugin that keeps a person's identity consistent across images, developed by Alimama:
SDXL_EcomID_ComfyUI: https://github.com/alimama-creative/SDXL_EcomID_ComfyUI
SDXL-EcomID: https://www.modelscope.cn/models/alimama-creative/SDXL-EcomID/file/view/master?fileName=README_ZH.md&status=1
How the errors unfolded
No module named 'fused_layer_norm_cuda'
The following error appeared at runtime:
File "E:\ComfyUI\ComfyUI_windows_portable1.0\ComfyUI\custom_nodes\PuLID_ComfyUI\eva_clip\eva_vit_model.py", line 418, in <listcomp>
Block(
File "E:\ComfyUI\ComfyUI_windows_portable1.0\ComfyUI\custom_nodes\PuLID_ComfyUI\eva_clip\eva_vit_model.py", line 253, in __init__
self.norm1 = norm_layer(dim)
^^^^^^^^^^^^^^^
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\normalization\fused_layer_norm.py", line 294, in __init__
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "importlib\__init__.py", line 126, in import_module
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1140, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
After a lot of searching, the cause turned out to be that apex had been installed without its CUDA extensions.
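A quick way to confirm this diagnosis before rebuilding anything is to ask Python whether the modules can be located at all. The sketch below is only an illustration, not part of the plugin; the helper name has_module is made up for this example. Run it with the same interpreter ComfyUI uses (here, the python.exe in python_embeded).

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located by the current interpreter.

    has_module is a hypothetical helper for this post; it only looks the
    module up and does not execute it.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # apex itself may be present while its compiled CUDA extension is not.
    for name in ("apex", "fused_layer_norm_cuda"):
        print(f"{name}: {'found' if has_module(name) else 'missing'}")
```

If apex is reported as found but fused_layer_norm_cuda as missing, that matches the situation described above: a Python-only apex install.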
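A quick aside on CUDA
nvidia-smi and nvcc --version report different version numbers
CUDA has two APIs: the Runtime API and the Driver API.
The output of nvidia-smi shows the GPU driver version and also the CUDA Driver API version.
The output of nvcc --version corresponds to the CUDA Runtime API, that is, the installed toolkit.
If nvcc will not run, install the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads
The two versions are allowed to differ and do not conflict, as long as the version reported by the driver is not older than the toolkit's.
When installing GPU-related Python packages, the version that usually matters is the one from nvcc --version.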
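To see both numbers side by side, the two tools' output can be parsed with a couple of regular expressions. This is a minimal sketch that assumes the usual output shapes ("CUDA Version: X.Y" in the nvidia-smi header, "release X.Y" in nvcc --version); the function names are invented for this example.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple

Version = Tuple[int, int]


def parse_driver_cuda(smi_text: str) -> Optional[Version]:
    """Pull the 'CUDA Version: X.Y' field out of nvidia-smi's header."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_toolkit_cuda(nvcc_text: str) -> Optional[Version]:
    """Pull 'release X.Y' out of the output of nvcc --version."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def run(cmd: List[str]) -> str:
    """Return a tool's stdout, or an empty string if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return ""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    except OSError:
        return ""


if __name__ == "__main__":
    driver = parse_driver_cuda(run(["nvidia-smi"]))
    toolkit = parse_toolkit_cuda(run(["nvcc", "--version"]))
    print("driver supports CUDA up to:", driver)
    print("installed toolkit (nvcc):  ", toolkit)
    if driver and toolkit:
        # Rule of thumb: the driver's version should not be older than the toolkit's.
        print("driver is new enough:", driver >= toolkit)
```

A result of None for the toolkit means nvcc is not on PATH, which is the case where the CUDA Toolkit still needs to be installed.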
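cuda_dir FileNotFoundError: [WinError 2] The system cannot find the file specified.
raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
FileNotFoundError: [WinError 2] The system cannot find the file specified.
Setting an environment variable that points at the CUDA install directory lets the build resolve cuda_dir.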
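The notes above do not record which variable was set. As an assumption, the sketch below checks CUDA_HOME and then CUDA_PATH, the names PyTorch's extension builder consults, and verifies that bin/nvcc actually exists under that directory, which is the file the failing subprocess call was looking for. The helper name find_nvcc is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_nvcc(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the nvcc path under CUDA_HOME or CUDA_PATH, or None.

    Assumption: these two variable names are the ones the build reads;
    the original post only says "an environment variable".
    """
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if not root:
            continue
        for exe in ("nvcc.exe", "nvcc"):
            candidate = Path(root) / "bin" / exe
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    print(find_nvcc() or "nvcc not found via CUDA_HOME / CUDA_PATH")
```

On the machine in this post the expected location would be under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1, the directory that appears in the build logs.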
Installing apex fails with an unsupported Microsoft Visual Studio version
Install Visual Studio 2022 and select the "Desktop development with C++" workload.
cd E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded
git clone https://github.com/NVIDIA/apex
cd apex
git checkout 2ec84eb
..\python -m pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
--expt-relaxed-constexpr -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 --use-local-env
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
multi_tensor_adam.cu
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin\\nvcc' failed with exit code 2
error: subprocess-exited-with-error
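Adding an environment variable for nvcc did not help.
Defining _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH in the Visual Studio properties did not help either.
What finally worked was going to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\crt
and editing host_config.h. Change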
#if _MSC_VER < 1910 || _MSC_VER >= 1930
to
#if _MSC_VER < 1910 || _MSC_VER >= 2030
Raising the upper bound stops the version check from firing.
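Why the edit works can be seen by evaluating the preprocessor condition by hand. The sketch below mirrors the #if line in Python; it assumes that MSVC toolset 14.41 (the one in the later log paths) reports _MSC_VER as 1941, and the function name is made up for illustration.

```python
def host_compiler_rejected(msc_ver: int, upper_bound: int = 1930) -> bool:
    """Python mirror of: #if _MSC_VER < 1910 || _MSC_VER >= upper_bound"""
    return msc_ver < 1910 or msc_ver >= upper_bound


# Assumption: toolset 14.41 (Visual Studio 2022) defines _MSC_VER as 1941.
MSC_VER_14_41 = 1941

print(host_compiler_rejected(MSC_VER_14_41))        # header as quoted: True, C1189 fires
print(host_compiler_rejected(MSC_VER_14_41, 2030))  # edited header: False, check passes
```

The nvcc message also mentions the -allow-unsupported-compiler flag as the supported way to skip this check; editing the header has the same effect for every build on the machine.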
error STL1002: Unexpected compiler version, expected CUDA 12.4 or newer.
Then a new problem appeared:
NCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17 --use-local-env
C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.41.34120/include\yvals_core.h(888): error: static assertion failed with "error STL1002: Unexpected compiler version, expected CUDA 12.4 or newer."
static_assert(false, "error " "STL1002" ": " "Unexpected compiler version, expected CUDA 12.4 or newer.");
^
1 error detected in the compilation of "csrc/multi_tensor_adam.cu".
multi_tensor_adam.cu
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin\\nvcc' failed with exit code 1
error: subprocess-exited-with-error
Open "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include\yvals_core.h" and find:
#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__)
#if __CUDACC_VER_MAJOR__ < 12 || (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ < 4)
_EMIT_STL_ERROR(STL1002, "Unexpected compiler version, expected CUDA 12.4 or newer.");
#endif
Change it to:
#ifndef _ALLOW_COMPILER_AND_STL_VERSION_MISMATCH
#if defined(__CUDACC__) && defined(__CUDACC_VER_MAJOR__)
#if __CUDACC_VER_MAJOR__ < 10 || (__CUDACC_VER_MAJOR__ == 10 && __CUDACC_VER_MINOR__ < 1)
_EMIT_STL_ERROR(STL1002, "Unexpected compiler version, expected CUDA 12.4 or newer.");
#endif // ^^^ old CUDA ^^
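The same kind of hand evaluation shows why lowering the minimum from 12.4 to 10.1 lets CUDA 12.1 through. This is a sketch of the condition only; stl1002_fires is a hypothetical name, and the version pairs come from the logs above.

```python
from typing import Tuple


def stl1002_fires(cuda: Tuple[int, int], minimum: Tuple[int, int]) -> bool:
    """Python mirror of the check in yvals_core.h:

    __CUDACC_VER_MAJOR__ < MAJOR ||
    (__CUDACC_VER_MAJOR__ == MAJOR && __CUDACC_VER_MINOR__ < MINOR)
    """
    major, minor = cuda
    min_major, min_minor = minimum
    return major < min_major or (major == min_major and minor < min_minor)


installed = (12, 1)  # the toolkit in the build logs (CUDA\v12.1)

print(stl1002_fires(installed, (12, 4)))  # original header: True, STL1002 is emitted
print(stl1002_fires(installed, (10, 1)))  # edited header: False, compilation continues
```

Note that the error text in the edited header still says "expected CUDA 12.4 or newer", so if the check ever fires again the message will no longer match the actual threshold.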
apex compiles and installs successfully
Check that the package is really there; it is:
..\python.exe -m pip list | findstr apex
Now try importing it:
..\python -c "from apex import amp"
E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\apex>..\python -c "from apex import amp"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\__init__.py", line 8, in <module>
from . import amp
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\__init__.py", line 1, in <module>
from .amp import init, half_function, float_function, promote_function,\
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\amp.py", line 1, in <module>
from . import compat, rnn_compat, utils, wrap
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\rnn_compat.py", line 1, in <module>
from . import utils, wrap
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\wrap.py", line 3, in <module>
from ._amp_state import _amp_state
File "E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\_amp_state.py", line 14, in <module>
from torch._six import container_abcs
ModuleNotFoundError: No module named 'torch._six'
The error: No module named 'torch._six'
As the message indicates, the cause is that recent versions of PyTorch have removed torch._six.
Edit E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\_amp_state.py and change
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    import collections.abc as container_abcs
else:
    from torch._six import container_abcs
to
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
Then edit E:\ComfyUI\ComfyUI_windows_portable1.0\python_embeded\Lib\site-packages\apex\amp\_initialize.py:
comment out one import and add two variables:
#from torch._six import string_classes
int_classes = int
string_classes = str
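Both edits amount to replacing names that torch._six used to provide with their standard-library equivalents. As an alternative sketch (not what apex ships, and not the exact edit made above), the same effect can be written as a single fallback that works whether or not torch._six exists:

```python
# Compatibility shim: prefer torch._six when it is available, otherwise fall
# back to the plain Python equivalents used in the manual edits above.
try:
    from torch._six import container_abcs, string_classes, int_classes
except ImportError:  # torch._six is gone in recent PyTorch (or torch is absent)
    import collections.abc as container_abcs

    string_classes = str
    int_classes = int

# The names behave the same way in either branch for these checks.
print(isinstance({"a": 1}, container_abcs.Mapping))
print(isinstance("text", string_classes))
print(isinstance(3, int_classes))
```

The try/except form avoids comparing version numbers, so it does not need TORCH_MAJOR or TORCH_MINOR at all.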
After that, amp finally imports:
..\python -c "from apex import amp"
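That also takes care of fused_layer_norm_cuda.
Installation finally complete
SDXL_EcomID_ComfyUI now generates images normally.
Summary:
SDXL_EcomID_ComfyUI was missing fused_layer_norm_cuda, so apex had to be installed; compiling apex hit several errors that required editing header files; then the removal of torch._six in newer PyTorch broke the import. One boss fight after another!
Here is an image from a successful face swap: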