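First, here is the error I ran into: Running setup.py clean for horovod Failed to build horovod ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (horovod). I stayed up debugging until 2 a.m., and this morning a sudden idea turned out to actually solve it.
My setup is cuda11.8 + pytorch2.1.0 + python3.8.19 + wsl2 + ubuntu20.04. The installation tutorials and blogs online are old: most are based on older CUDA versions around 11.0, and very few cover installing inside a conda environment. The official documentation describes a conda install based on CUDA 10.0, and on sites both in China and abroad I did not find anything about installing horovod in a conda environment on WSL. That said, those older documents and blogs are undeniably still useful as references.
The conclusion first: horovod cannot be installed on native Windows!!! It can only be installed on Linux or macOS. For Windows, others have managed to install horovod inside docker; see this blog post: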
Distributed training, multi-machine multi-GPU: installing the horovod framework via docker (CSDN blog)
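Installing horovod in a conda environment is troublesome in itself, and problems crop up every now and then. Adding WSL on top makes it even less certain that it can be installed at all. At the time I also kept wondering whether Ubuntu 20.04 under WSL really behaves exactly like a real Ubuntu 20.04.
In theory, though, Ubuntu 20.04 installed on WSL gives you a usable Linux under Windows, so I went in with a let's-just-try attitude. Hitting errors and red text, fixing them, and trying over and over was a painful stretch. I made more than ten attempts, and at one point I gave up on WSL and tried installing the horovod framework on a real Linux system. In the end, inspiration struck this morning and one more try actually succeeded. I am leaving out the error-fixing process along the way and giving only the correct steps.
PS: I tried everything people online suggest about upgrading setuptools, cmake, gcc, g++ and so on. For my installation it made no difference.
The inspiration came from the following articles.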
https://zhuanlan.zhihu.com/p/677440225
How to fix the "Failed building wheel for xxx" error when installing Python libraries with pip (python section, 脚本之家)
Building wheel for horovod (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [277 lines of output]
/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try pip install --use-pep517.
********************************************************************************
!!
dist.fetch_build_eggs(dist.setup_requires)
/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
warnings.warn(msg)
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/horovod
copying horovod/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod
creating build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/mpi_run.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/launch.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/task_fn.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/gloo_run.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/js_run.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
copying horovod/runner/run_task.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
creating build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/gradient_aggregation.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/functions.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/compression.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/util.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/gradient_aggregation_eager.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
creating build/lib.linux-x86_64-cpython-38/horovod/mxnet
copying horovod/mxnet/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
copying horovod/mxnet/functions.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
copying horovod/mxnet/compression.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
copying horovod/mxnet/mpi_ops.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
creating build/lib.linux-x86_64-cpython-38/horovod/common
copying horovod/common/exceptions.py -> build/lib.linux-x86_64-cpython-38/horovod/common
copying horovod/common/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/common
copying horovod/common/util.py -> build/lib.linux-x86_64-cpython-38/horovod/common
copying horovod/common/process_sets.py -> build/lib.linux-x86_64-cpython-38/horovod/common
copying horovod/common/basics.py -> build/lib.linux-x86_64-cpython-38/horovod/common
copying horovod/common/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/common
creating build/lib.linux-x86_64-cpython-38/horovod/data
copying horovod/data/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/data
copying horovod/data/data_loader_base.py -> build/lib.linux-x86_64-cpython-38/horovod/data
creating build/lib.linux-x86_64-cpython-38/horovod/torch
copying horovod/torch/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
copying horovod/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
copying horovod/torch/functions.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
copying horovod/torch/compression.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
copying horovod/torch/optimizer.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
creating build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/utils.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/elastic_v2.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/strategy.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/ray_logger.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/runner.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/adapter.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/worker.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
copying horovod/ray/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
creating build/lib.linux-x86_64-cpython-38/horovod/spark
copying horovod/spark/mpi_run.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
copying horovod/spark/conf.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
copying horovod/spark/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
copying horovod/spark/gloo_run.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
copying horovod/spark/runner.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
creating build/lib.linux-x86_64-cpython-38/horovod/_keras
copying horovod/_keras/callbacks.py -> build/lib.linux-x86_64-cpython-38/horovod/_keras
copying horovod/_keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/_keras
copying horovod/_keras/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/_keras
creating build/lib.linux-x86_64-cpython-38/horovod/keras
copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-cpython-38/horovod/keras
copying horovod/keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/keras
copying horovod/keras/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/keras
creating build/lib.linux-x86_64-cpython-38/horovod/runner/common
copying horovod/runner/common/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common
creating build/lib.linux-x86_64-cpython-38/horovod/runner/driver
copying horovod/runner/driver/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/driver
copying horovod/runner/driver/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/driver
creating build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/registration.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/discovery.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/driver.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/settings.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/constants.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/rendezvous.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
copying horovod/runner/elastic/worker.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
creating build/lib.linux-x86_64-cpython-38/horovod/runner/http
copying horovod/runner/http/http_server.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/http
copying horovod/runner/http/http_client.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/http
copying horovod/runner/http/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/http
creating build/lib.linux-x86_64-cpython-38/horovod/runner/task
copying horovod/runner/task/task_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/task
copying horovod/runner/task/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/task
creating build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/cache.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/streams.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/threads.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/lsf.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
copying horovod/runner/util/network.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
creating build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
copying horovod/runner/common/service/task_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
copying horovod/runner/common/service/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
copying horovod/runner/common/service/compute_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
copying horovod/runner/common/service/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
creating build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/timeout.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/config_parser.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/tiny_shell_exec.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/host_hash.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/settings.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/codec.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/env.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/safe_shell_exec.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/network.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/hosts.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
copying horovod/runner/common/util/secret.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
creating build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
copying horovod/tensorflow/data/compute_worker.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
copying horovod/tensorflow/data/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
copying horovod/tensorflow/data/compute_service.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
creating build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
copying horovod/tensorflow/keras/callbacks.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
copying horovod/tensorflow/keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
copying horovod/tensorflow/keras/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
creating build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
copying horovod/torch/elastic/state.py -> build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
copying horovod/torch/elastic/sampler.py -> build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
copying horovod/torch/elastic/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
creating build/lib.linux-x86_64-cpython-38/horovod/spark/data_loaders
copying horovod/spark/data_loaders/pytorch_data_loaders.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/data_loaders
copying horovod/spark/data_loaders/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/data_loaders
creating build/lib.linux-x86_64-cpython-38/horovod/spark/tensorflow
copying horovod/spark/tensorflow/compute_worker.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/tensorflow
copying horovod/spark/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/tensorflow
creating build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/cache.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/_namedtuple_fix.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/params.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/serialization.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/backend.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/store.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/constants.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
copying horovod/spark/common/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
creating build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/host_discovery.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/job_id.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/mpirun_rsh.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/rendezvous.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
copying horovod/spark/driver/rsh.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
creating build/lib.linux-x86_64-cpython-38/horovod/spark/torch
copying horovod/spark/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
copying horovod/spark/torch/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
copying horovod/spark/torch/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
copying horovod/spark/torch/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
copying horovod/spark/torch/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
creating build/lib.linux-x86_64-cpython-38/horovod/spark/task
copying horovod/spark/task/gloo_exec_fn.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
copying horovod/spark/task/task_info.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
copying horovod/spark/task/mpirun_exec_fn.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
copying horovod/spark/task/task_service.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
copying horovod/spark/task/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
creating build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
copying horovod/spark/lightning/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
copying horovod/spark/lightning/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
copying horovod/spark/lightning/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
copying horovod/spark/lightning/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
copying horovod/spark/lightning/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
copying horovod/spark/lightning/legacy.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
creating build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/tensorflow.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/optimizer.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/bare.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
copying horovod/spark/keras/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
running build_ext
Running CMake in build/temp.linux-x86_64-cpython-38/RelWithDebInfo:
cmake /tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/build/lib.linux-x86_64-cpython-38 -DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/envs/topk/bin/python
cmake --build . --config RelWithDebInfo -- -j8 VERBOSE=1
-- Could not find CCache. Consider installing CCache to speed up compilation.
-- The CXX compiler identification is GNU 11.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/leo/miniconda3/envs/topk/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build architecture flags: -mf16c -mavx -mfma
-- Using command /home/leo/miniconda3/envs/topk/bin/python
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_CXX_FOUND)
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 12.1.66
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.1.66")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Linking against static NCCL library
-- Could not find static NCCL library. Trying to find shared lib instead.
CMake Error at /home/leo/miniconda3/envs/topk/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)
Call Stack (most recent call first):
/home/leo/miniconda3/envs/topk/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
cmake/Modules/FindNCCL.cmake:53 (find_package_handle_standard_args)
CMakeLists.txt:223 (find_package)
-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/setup.py", line 213, in <module>
setup(name='horovod',
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/__init__.py", line 117, in setup
return distutils.core.setup(**attrs)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
self.run_command(cmd)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
super().run_command(command)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/command/bdist_wheel.py", line 384, in run
self.run_command("build")
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
super().run_command(command)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
super().run_command(command)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 98, in run
_build_ext.run(self)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/setup.py", line 145, in build_extensions
subprocess.check_call(command, cwd=cmake_build_dir)
File "/home/leo/miniconda3/envs/topk/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/build/lib.linux-x86_64-cpython-38', '-DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/envs/topk/bin/python']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for horovod
Running setup.py clean for horovod
Failed to build horovod
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (horovod)
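The log above is long, and the line that actually stops the build is easy to miss. As a minimal sketch (the helper name `find_missing_packages` is hypothetical, not part of pip or horovod), the following scans pasted build output for CMake's "Could NOT find" messages and separates the ones inside a "CMake Error" block from the ones that are only status messages:

```python
import re

# Matches CMake's "Could NOT find <package> (missing: VAR1 VAR2 ...)" message.
MISSING_RE = re.compile(r"Could NOT find (\S+) \(missing: ([^)]*)\)")


def find_missing_packages(log_text):
    """Return (package, missing_vars, fatal) for each 'Could NOT find' line.

    A line counts as fatal when it belongs to a 'CMake Error' block rather
    than to a '-- ' status message.
    """
    results = []
    in_error_block = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("CMake Error"):
            in_error_block = True
            continue
        if line.startswith("--"):
            in_error_block = False
        match = MISSING_RE.search(line)
        if match:
            results.append((match.group(1), match.group(2).split(), in_error_block))
    return results


# Excerpt copied from the build output above.
SAMPLE_LOG = """\
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_CXX_FOUND)
CMake Error at /home/leo/miniconda3/envs/topk/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)
-- Configuring incomplete, errors occurred!
"""

for package, missing, fatal in find_missing_packages(SAMPLE_LOG):
    print(package, missing, "FATAL" if fatal else "status only")
```

On this excerpt the MPI lines come out as status messages, while the NCCL line is the one inside the CMake error block.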
1. The core problem: there is no wheel file.
Creating a wheel file is all it takes, but there is no prebuilt horovod-0.28.1-cp38-cp38-linux_x86_64.whl, so you have to build it yourself.
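The wheel filename quoted above encodes which interpreter and platform the wheel is for. A small sketch of how the fields split, following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag); the function name `parse_wheel_filename` is made up for illustration:

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,      # cp38 = CPython 3.8
        "abi": abi_tag,
        "platform": platform_tag,  # linux_x86_64 = built locally on 64-bit Linux
    }


info = parse_wheel_filename("horovod-0.28.1-cp38-cp38-linux_x86_64.whl")
print(info)
```

A wheel tagged cp38 only matches a Python 3.8 environment, so the wheel has to be built with the interpreter of the conda environment it will be installed into.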
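1. To be on the safe side, upgrade setuptools and related tools first.
# Activate your virtual environment first (I don't know whether these packages actually help, but install them all just in case)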
sudo apt install -y cmake
pip install --upgrade pip setuptools
conda install gxx_linux-64
conda install gcc_linux-64 gxx_linux-64 make cmake -c conda-forge
2. Install CUDA and PyTorch
conda install cudatoolkit=11.8 -c nvidia
conda install -c nvidia cudnn
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
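3. Install NCCL2
Go to the NVIDIA website and download the version that matches your setup. It has to be the matching version!!!
Here is mine as an example: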
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt install libnccl2=2.16.5-1+cuda11.8 libnccl-dev=2.16.5-1+cuda11.8
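The earlier build failure complained about NCCL_INCLUDE_DIR and NCCL_LIBRARY, which correspond to the header nccl.h and the library libnccl.so. A rough sketch for checking that both files are present after the install; the default directories below are assumptions about where the Ubuntu packages and a CUDA toolkit usually place them, and `locate_nccl` is a hypothetical helper:

```python
from pathlib import Path

# Assumed default locations; adjust them for your own system.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/cuda/include")
DEFAULT_LIB_DIRS = ("/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64")


def locate_nccl(include_dirs=DEFAULT_INCLUDE_DIRS, lib_dirs=DEFAULT_LIB_DIRS):
    """Return (header_path, library_path); either is None if not found."""
    header = None
    for directory in include_dirs:
        candidate = Path(directory) / "nccl.h"
        if candidate.is_file():
            header = candidate
            break

    library = None
    for directory in lib_dirs:
        folder = Path(directory)
        if not folder.is_dir():
            continue
        # Matches libnccl.so as well as versioned names such as libnccl.so.2
        hits = sorted(folder.glob("libnccl.so*"))
        if hits:
            library = hits[0]
            break
    return header, library


header, library = locate_nccl()
print("nccl.h:", header or "not found")
print("libnccl.so:", library or "not found")
```

If either one prints "not found", the horovod build is likely to stop at the same NCCL check again.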
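4. Install Open MPI
This is how I did it: check the release date of the NCCL2 version you installed, then go to the Open MPI download page (Open MPI: Version 5.0) and pick a release from around the same time. Do not use version 3.1.3; the official documentation says this version has problems.
I downloaded the openmpi-4.1.6 tarball and then ran the following commands: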
tar -xvzf openmpi-4.1.6.tar.gz
cd openmpi-4.1.6
./configure --prefix=/usr/local
sudo make all install
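After the build finishes it is worth confirming which Open MPI version actually ends up on the PATH. A minimal sketch, assuming `mpirun --version` prints a first line of the form "mpirun (Open MPI) 4.1.6"; the helper names are hypothetical:

```python
import re
import shutil
import subprocess

# The release this post warns against.
KNOWN_BAD_VERSIONS = {(3, 1, 3)}


def parse_openmpi_version(version_output):
    """Extract (major, minor, patch) from `mpirun --version` output, or None."""
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_openmpi_version():
    """Run mpirun if it is on PATH and return its parsed version, else None."""
    if shutil.which("mpirun") is None:
        return None
    completed = subprocess.run(
        ["mpirun", "--version"], capture_output=True, text=True, check=False
    )
    return parse_openmpi_version(completed.stdout)


version = parse_openmpi_version("mpirun (Open MPI) 4.1.6\n")
print(version, "known bad" if version in KNOWN_BAD_VERSIONS else "ok")
```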
5. Install the horovod framework
Many tutorials tell you to run the following:
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
or
HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod
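In both commands the `HOROVOD_GPU_OPERATIONS=NCCL` prefix sets an environment variable for that single pip process only; the second command differs just by `--no-cache-dir`. For illustration, here is a sketch of the same invocation assembled from Python. The function name is hypothetical, and calling pip as `python -m pip` is my own choice so that the pip of the active environment is used:

```python
import os
import sys


def horovod_install_command(no_cache=False):
    """Build the pip command and environment without running anything."""
    command = [sys.executable, "-m", "pip", "install"]
    if no_cache:
        command.append("--no-cache-dir")
    command.append("horovod")
    # Copy the current environment and add the variable from the tutorials.
    env = dict(os.environ, HOROVOD_GPU_OPERATIONS="NCCL")
    return command, env


command, env = horovod_install_command(no_cache=True)
print(" ".join(command[1:]))
# To actually run it:
#   import subprocess
#   subprocess.run(command, env=env, check=True)
```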
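However, this produces the error shown at the start of this post. The fix: if there is no wheel, build the wheel yourself!!!
1. Go to horovod · PyPI, download the horovod source archive, and extract it to get a horovod directory.
2. Enter the horovod folder and create a build folder.
3. Next, make sure your nvcc is usable by running the following command: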
nvcc --version
If it prints the version normally, there is nothing more to do. If it prints something like "not found", edit your .bashrc file:
nano ~/.bashrc
Add the following two lines to it:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Then save and exit, and run the following command:
source ~/.bashrc
Reactivate your virtual environment.
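The nvcc check can also be done from inside the activated environment. A minimal sketch using `shutil.which`: it looks on the PATH first and then in /usr/local/cuda/bin, the directory used in the export line above. The helper name `find_nvcc` is hypothetical:

```python
import os
import shutil


def find_nvcc(path=None, extra_dirs=("/usr/local/cuda/bin",)):
    """Return the full path of nvcc, or None if it cannot be found.

    `path` defaults to the current PATH; `extra_dirs` are fallback
    directories to try when nvcc is not on PATH.
    """
    search_path = os.environ.get("PATH", "") if path is None else path
    found = shutil.which("nvcc", path=search_path)
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which("nvcc", path=directory)
        if found:
            return found
    return None


location = find_nvcc()
print(location or "nvcc not found; check the PATH line in ~/.bashrc")
```

If nvcc is only found through the fallback directory, the PATH export has not taken effect in the current shell yet.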
4. Navigate to the directory that contains setup.py.
Here you can try the following command:
python setup.py install
But I ran into this error:
/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/__init__.py:81: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try pip install --use-pep517.
********************************************************************************
!!
dist.fetch_build_eggs(dist.setup_requires)
running install
/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running `setup.py directly.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running `setup.py and easy_install.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
writing horovod.egg-info/PKG-INFO
writing dependency_links to horovod.egg-info/dependency_links.txt
writing entry points to horovod.egg-info/entry_points.txt
writing requirements to horovod.egg-info/requires.txt
writing top-level names to horovod.egg-info/top_level.txt
reading manifest file 'horovod.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'third_party/eigen/Eigen/Eigen'
warning: no previously-included files found matching 'third_party/eigen/Eigen/IterativeLinearSolvers'
warning: no previously-included files found matching 'third_party/eigen/Eigen/MetisSupport'
warning: no previously-included files found matching 'third_party/eigen/Eigen/Sparse'
warning: no previously-included files found matching 'third_party/eigen/Eigen/SparseCholesky'
warning: no previously-included files found matching 'third_party/eigen/Eigen/SparseLU'
warning: no previously-included files found matching 'third_party/eigen/Eigen/src/IterativeSolvers/*'
warning: no previously-included files found matching 'third_party/eigen/Eigen/src/OrderingMethods/Amd.h'
warning: no previously-included files found matching 'third_party/eigen/Eigen/src/SparseCholesky/*'
warning: no previously-included files found matching 'third_party/eigen/unsupported/test/mpreal/mpreal.h'
warning: no previously-included files found matching 'third_party/eigen/unsupported/Eigen/FFT'
warning: no previously-included files found matching 'third_party/eigen/unsupported/Eigen/MPRealSupport'
warning: no previously-included files found matching 'third_party/eigen/doc/PreprocessorDirectives.dox'
warning: no previously-included files found matching 'third_party/eigen/doc/UsingIntelMKL.dox'
warning: no previously-included files found matching 'third_party/eigen/doc/SparseLinearSystems.dox'
warning: no previously-included files found matching 'third_party/eigen/COPYING.GPL'
warning: no previously-included files found matching 'third_party/eigen/COPYING.LGPL'
warning: no previously-included files found matching 'third_party/eigen/COPYING.README'
adding license file 'LICENSE'
writing manifest file 'horovod.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/horovod
copying horovod/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod
creating build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/mpi_run.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/launch.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/task_fn.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/gloo_run.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/js_run.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/run_task.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
creating build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/gradient_aggregation.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/functions.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/compression.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/util.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/gradient_aggregation_eager.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
creating build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/functions.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/compression.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/mpi_ops.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
creating build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/exceptions.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/util.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/process_sets.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/basics.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/common
creating build/lib.linux-x86_64-cpython-310/horovod/data
copying horovod/data/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/data
copying horovod/data/data_loader_base.py -> build/lib.linux-x86_64-cpython-310/horovod/data
creating build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/functions.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/compression.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/optimizer.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
creating build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/utils.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/elastic_v2.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/strategy.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/ray_logger.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/runner.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/adapter.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/worker.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
creating build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/mpi_run.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/conf.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/gloo_run.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/runner.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
creating build/lib.linux-x86_64-cpython-310/horovod/_keras
copying horovod/_keras/callbacks.py -> build/lib.linux-x86_64-cpython-310/horovod/_keras
copying horovod/_keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/_keras
copying horovod/_keras/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/_keras
creating build/lib.linux-x86_64-cpython-310/horovod/keras
copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-cpython-310/horovod/keras
copying horovod/keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/keras
copying horovod/keras/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/keras
creating build/lib.linux-x86_64-cpython-310/horovod/runner/common
copying horovod/runner/common/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common
creating build/lib.linux-x86_64-cpython-310/horovod/runner/driver
copying horovod/runner/driver/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/driver
copying horovod/runner/driver/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/driver
creating build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/registration.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/discovery.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/driver.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/settings.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/constants.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/rendezvous.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/worker.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
creating build/lib.linux-x86_64-cpython-310/horovod/runner/http
copying horovod/runner/http/http_server.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/http
copying horovod/runner/http/http_client.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/http
copying horovod/runner/http/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/http
creating build/lib.linux-x86_64-cpython-310/horovod/runner/task
copying horovod/runner/task/task_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/task
copying horovod/runner/task/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/task
creating build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/cache.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/streams.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/threads.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/lsf.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/network.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
creating build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/task_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/compute_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
creating build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/timeout.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/config_parser.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/tiny_shell_exec.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/host_hash.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/settings.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/codec.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/env.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/safe_shell_exec.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/network.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/hosts.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/secret.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
creating build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
copying horovod/tensorflow/data/compute_worker.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
copying horovod/tensorflow/data/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
copying horovod/tensorflow/data/compute_service.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
creating build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
copying horovod/tensorflow/keras/callbacks.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
copying horovod/tensorflow/keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
copying horovod/tensorflow/keras/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
creating build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
copying horovod/torch/elastic/state.py -> build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
copying horovod/torch/elastic/sampler.py -> build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
copying horovod/torch/elastic/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
creating build/lib.linux-x86_64-cpython-310/horovod/spark/data_loaders
copying horovod/spark/data_loaders/pytorch_data_loaders.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/data_loaders
copying horovod/spark/data_loaders/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/data_loaders
creating build/lib.linux-x86_64-cpython-310/horovod/spark/tensorflow
copying horovod/spark/tensorflow/compute_worker.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/tensorflow
copying horovod/spark/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/tensorflow
creating build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/cache.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/_namedtuple_fix.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/params.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/serialization.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/backend.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/store.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/constants.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
creating build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/host_discovery.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/job_id.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/mpirun_rsh.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/rendezvous.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/rsh.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
creating build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
creating build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/gloo_exec_fn.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/task_info.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/mpirun_exec_fn.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/task_service.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
creating build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/legacy.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
creating build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/tensorflow.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/optimizer.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/bare.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
running build_ext
Running CMake in build/temp.linux-x86_64-cpython-310/RelWithDebInfo:
cmake /home/leo/horovod-0.28.1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/home/leo/horovod-0.28.1/build/lib.linux-x86_64-cpython-310 -DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/bin/python
cmake --build . --config RelWithDebInfo -- -j8 VERBOSE=1
-- Could not find CCache. Consider installing CCache to speed up compilation.
-- The CXX compiler identification is GNU 11.2.0
-- Check for working CXX compiler: /home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++
-- Check for working CXX compiler: /home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build architecture flags: -mf16c -mavx -mfma
-- Using command /home/leo/miniconda3/bin/python
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_CXX_FOUND)
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- Looking for a CUDA host compiler - /home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++
-- The CUDA compiler identification is unknown
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- broken
CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
The CUDA compiler
"/usr/local/cuda/bin/nvcc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp
Run Build Command(s):/usr/bin/make cmTC_0ffd3/fast && /usr/bin/make -f CMakeFiles/cmTC_0ffd3.dir/build.make CMakeFiles/cmTC_0ffd3.dir/build
make[1]: Entering directory '/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_0ffd3.dir/main.cu.o
/usr/local/cuda/bin/nvcc -ccbin=/home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++ -x cu -c /home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_0ffd3.dir/main.cu.o
Linking CUDA executable cmTC_0ffd3
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_0ffd3.dir/link.txt --verbose=1
"" -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/leo/miniconda3/lib -Wl,-rpath-link,/home/leo/miniconda3/lib -L/home/leo/miniconda3/lib CMakeFiles/cmTC_0ffd3.dir/main.cu.o -o cmTC_0ffd3
Error running link command: No such file or directory
make[1]: *** [CMakeFiles/cmTC_0ffd3.dir/build.make:87: cmTC_0ffd3] Error 2
make[1]: Leaving directory '/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp'
make: *** [Makefile:121: cmTC_0ffd3/fast] Error 2
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:177 (enable_language)
-- Configuring incomplete, errors occurred!
See also "/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeOutput.log".
See also "/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
File "/home/leo/horovod-0.28.1/setup.py", line 213, in <module>
setup(name='horovod',
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/install.py", line 87, in run
self.do_egg_install()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/install.py", line 139, in do_egg_install
self.run_command('bdist_egg')
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 167, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 153, in call_command
self.run_command(cmdname)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/command/install_lib.py", line 110, in build
self.run_command('build_ext')
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
_build_ext.run(self)
File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "/home/leo/horovod-0.28.1/setup.py", line 145, in build_extensions
subprocess.check_call(command, cwd=cmake_build_dir)
File "/home/leo/miniconda3/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/leo/horovod-0.28.1', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/home/leo/horovod-0.28.1/build/lib.linux-x86_64-cpython-310', '-DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/bin/python']' returned non-zero exit status 1.
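Most of the log above is routine file copying; only a few lines describe the actual failure. A minimal sketch for pulling those lines out of a saved log is given below. The file name build.log and the marker list are my own assumptions for illustration; they are not part of horovod, pip, or CMake.

```python
from pathlib import Path

# Substrings that flag the real problem in a log like the one above.
# This list is an assumption for illustration; extend it as needed.
MARKERS = ("CMake Error", "Could NOT find", "-- broken", "Error running link command")


def key_lines(log_text, markers=MARKERS):
    """Return (line_number, line) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    log_path = Path("build.log")  # hypothetical file holding the saved output
    if log_path.exists():
        for number, line in key_lines(log_path.read_text(errors="replace")):
            print(f"{number}: {line}")
```

Run against the log above, this would surface the "Could NOT find MPI" lines, the "nvcc -- broken" compiler check, the "CMake Error" line, and the failed link command.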
Taking a different approach solved the problem. From the horovod source directory, run the following commands:
pip install build
python -m build
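At this point horovod-0.28.1-cp38-cp38-linux_x86_64.whl appears. That was the breakthrough; the remaining question is how to install the .whl file.
First go to the folder that contains the .whl file, which should be the dist folder, then run the following commands there:
conda activate <your-env>  # make sure your virtual environment is activated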
python -m pip install horovod-0.28.1-cp38-cp38-linux_x86_64.whl
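The two steps above (find the wheel in dist, install it with the pip of the active environment) can also be sketched as a small script. The helper names newest_wheel and install_wheel are hypothetical, and the sketch assumes it is run from the horovod source directory with the conda environment already activated.

```python
import subprocess
import sys
from pathlib import Path


def newest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


def install_wheel(wheel_path):
    """Install the wheel with the pip that belongs to the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", str(wheel_path)])


# Usage, from the horovod source directory with the conda env activated:
#   wheel = newest_wheel()
#   if wheel is not None:
#       install_wheel(wheel)
```

Calling pip through sys.executable keeps the install tied to whichever Python is currently active, which is the same reason the command above uses python -m pip rather than a bare pip.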
That's it, the installation is done. Next, a quick check that it actually worked:
python
import horovod
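If no error is reported, the installation succeeded!
The above is only my personal experience; I can't guarantee that following these steps will always work, so adapt as needed. Now I can finally get back to my paper and my code!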
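A bare import of the top-level package mainly shows that the Python files are in place. A slightly stricter check, sketched below, also tries to load a framework binding such as horovod.torch, which is where the compiled extension is needed. The helper names is_importable and check_horovod are hypothetical, and picking "torch" as the framework is an assumption based on the PyTorch setup described earlier.

```python
import importlib
import importlib.util


def is_importable(module_name):
    """True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is itself missing.
        return False


def check_horovod(framework="torch"):
    """Report whether horovod and one framework binding can be loaded."""
    if not is_importable("horovod"):
        return "horovod is not installed in this environment"
    version = importlib.import_module("horovod").__version__
    try:
        importlib.import_module(f"horovod.{framework}")
    except Exception as exc:  # e.g. the extension was not built for this framework
        return f"horovod {version} found, but horovod.{framework} failed to load: {exc}"
    return f"horovod {version} with horovod.{framework} looks usable"


print(check_horovod())
```

The horovodrun --check-build command gives a similar summary from the shell, listing which frameworks and controllers the installed build supports.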