Fixing horovod installation failures in a WSL + Ubuntu + Conda environment. The error is: Building wheel for horovod (setup.py) ... error

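First, here is the error I ran into: "Running setup.py clean for horovod / Failed to build horovod / ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (horovod)". I debugged until 2 a.m., and the next morning an idea I tried on a whim actually worked.

My setup: CUDA 11.8 + PyTorch 2.1.0 + Python 3.8.19 + WSL2 + Ubuntu 20.04. The installation tutorials and blog posts online are old. Most were written for CUDA versions around 11.0, and few cover conda environments. The official documentation describes a conda installation on CUDA 10.0, and I could not find any guide, Chinese or English, for installing horovod in a conda environment under WSL. Still, those older documents and blog posts are worth consulting.

The conclusion first: horovod cannot be installed natively on Windows! It only installs on Linux or macOS. On Windows, others have successfully installed horovod inside Docker; see this blog post: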

Distributed training, multi-node multi-GPU: installing the horovod framework with Docker (CSDN blog)

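Installing horovod in a conda environment is troublesome on its own, and problems keep cropping up. Add WSL on top of that and it is even less certain that it will install at all. While working on it I kept wondering whether the Ubuntu 20.04 in WSL is really identical to a native Ubuntu 20.04.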

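In principle, though, installing Ubuntu 20.04 on WSL gives you a working Linux under Windows, so it was worth a try. Hitting errors and red warnings, fixing them, and retrying over and over was painful: I made more than ten attempts, and at one point I even gave up on WSL and tried installing horovod on a real Linux system. In the end, an idea that came to me one morning worked. I'll skip the debugging process and go straight to the steps that worked.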

PS: I tried all the commonly suggested fixes (upgrading setuptools, cmake, gcc, g++, and so on), and none of them helped in my case.

The idea came from the following articles:

Fixing "ERROR: Failed to build installable wheels for some pyproject.toml based projects (dlib)" (CSDN blog)

https://zhuanlan.zhihu.com/p/677440225

How to fix the "Failed building wheel for xxx" error when installing Python packages with pip (脚本之家)

The full pip output:

Building wheel for horovod (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [277 lines of output]
      /home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!

              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try pip install --use-pep517.
              ********************************************************************************

      !!
        dist.fetch_build_eggs(dist.setup_requires)
      /home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
        warnings.warn(msg)
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-38
      creating build/lib.linux-x86_64-cpython-38/horovod
      copying horovod/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod
      creating build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/mpi_run.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/launch.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/task_fn.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/gloo_run.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/js_run.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      copying horovod/runner/run_task.py -> build/lib.linux-x86_64-cpython-38/horovod/runner
      creating build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/gradient_aggregation.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/functions.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/compression.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/util.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/gradient_aggregation_eager.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow
      creating build/lib.linux-x86_64-cpython-38/horovod/mxnet
      copying horovod/mxnet/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
      copying horovod/mxnet/functions.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
      copying horovod/mxnet/compression.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
      copying horovod/mxnet/mpi_ops.py -> build/lib.linux-x86_64-cpython-38/horovod/mxnet
      creating build/lib.linux-x86_64-cpython-38/horovod/common
      copying horovod/common/exceptions.py -> build/lib.linux-x86_64-cpython-38/horovod/common
      copying horovod/common/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/common
      copying horovod/common/util.py -> build/lib.linux-x86_64-cpython-38/horovod/common
      copying horovod/common/process_sets.py -> build/lib.linux-x86_64-cpython-38/horovod/common
      copying horovod/common/basics.py -> build/lib.linux-x86_64-cpython-38/horovod/common
      copying horovod/common/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/common
      creating build/lib.linux-x86_64-cpython-38/horovod/data
      copying horovod/data/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/data
      copying horovod/data/data_loader_base.py -> build/lib.linux-x86_64-cpython-38/horovod/data
      creating build/lib.linux-x86_64-cpython-38/horovod/torch
      copying horovod/torch/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
      copying horovod/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
      copying horovod/torch/functions.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
      copying horovod/torch/compression.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
      copying horovod/torch/optimizer.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
      copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-cpython-38/horovod/torch
      creating build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/utils.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/elastic_v2.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/strategy.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/ray_logger.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/runner.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/adapter.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/worker.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      copying horovod/ray/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/ray
      creating build/lib.linux-x86_64-cpython-38/horovod/spark
      copying horovod/spark/mpi_run.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
      copying horovod/spark/conf.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
      copying horovod/spark/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
      copying horovod/spark/gloo_run.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
      copying horovod/spark/runner.py -> build/lib.linux-x86_64-cpython-38/horovod/spark
      creating build/lib.linux-x86_64-cpython-38/horovod/_keras
      copying horovod/_keras/callbacks.py -> build/lib.linux-x86_64-cpython-38/horovod/_keras
      copying horovod/_keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/_keras
      copying horovod/_keras/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/_keras
      creating build/lib.linux-x86_64-cpython-38/horovod/keras
      copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-cpython-38/horovod/keras
      copying horovod/keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/keras
      copying horovod/keras/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/keras
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/common
      copying horovod/runner/common/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/driver
      copying horovod/runner/driver/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/driver
      copying horovod/runner/driver/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/driver
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/registration.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/discovery.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/driver.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/settings.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/constants.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/rendezvous.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      copying horovod/runner/elastic/worker.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/elastic
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/http
      copying horovod/runner/http/http_server.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/http
      copying horovod/runner/http/http_client.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/http
      copying horovod/runner/http/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/http
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/task
      copying horovod/runner/task/task_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/task
      copying horovod/runner/task/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/task
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/cache.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/streams.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/threads.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/lsf.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      copying horovod/runner/util/network.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/util
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
      copying horovod/runner/common/service/task_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
      copying horovod/runner/common/service/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
      copying horovod/runner/common/service/compute_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
      copying horovod/runner/common/service/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/service
      creating build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/timeout.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/config_parser.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/tiny_shell_exec.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/host_hash.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/settings.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/codec.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/env.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/safe_shell_exec.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/network.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/hosts.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      copying horovod/runner/common/util/secret.py -> build/lib.linux-x86_64-cpython-38/horovod/runner/common/util
      creating build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
      copying horovod/tensorflow/data/compute_worker.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
      copying horovod/tensorflow/data/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
      copying horovod/tensorflow/data/compute_service.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/data
      creating build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
      copying horovod/tensorflow/keras/callbacks.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
      copying horovod/tensorflow/keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
      copying horovod/tensorflow/keras/elastic.py -> build/lib.linux-x86_64-cpython-38/horovod/tensorflow/keras
      creating build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
      copying horovod/torch/elastic/state.py -> build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
      copying horovod/torch/elastic/sampler.py -> build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
      copying horovod/torch/elastic/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/torch/elastic
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/data_loaders
      copying horovod/spark/data_loaders/pytorch_data_loaders.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/data_loaders
      copying horovod/spark/data_loaders/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/data_loaders
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/tensorflow
      copying horovod/spark/tensorflow/compute_worker.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/tensorflow
      copying horovod/spark/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/tensorflow
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/cache.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/_namedtuple_fix.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/params.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/serialization.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/backend.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/store.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/constants.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      copying horovod/spark/common/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/common
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/host_discovery.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/job_id.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/mpirun_rsh.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/rendezvous.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/driver_service.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      copying horovod/spark/driver/rsh.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/driver
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/torch
      copying horovod/spark/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
      copying horovod/spark/torch/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
      copying horovod/spark/torch/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
      copying horovod/spark/torch/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
      copying horovod/spark/torch/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/torch
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/task
      copying horovod/spark/task/gloo_exec_fn.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
      copying horovod/spark/task/task_info.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
      copying horovod/spark/task/mpirun_exec_fn.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
      copying horovod/spark/task/task_service.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
      copying horovod/spark/task/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/task
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      copying horovod/spark/lightning/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      copying horovod/spark/lightning/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      copying horovod/spark/lightning/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      copying horovod/spark/lightning/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      copying horovod/spark/lightning/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      copying horovod/spark/lightning/legacy.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/lightning
      creating build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/__init__.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/remote.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/datamodule.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/tensorflow.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/util.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/optimizer.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/bare.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      copying horovod/spark/keras/estimator.py -> build/lib.linux-x86_64-cpython-38/horovod/spark/keras
      running build_ext
      Running CMake in build/temp.linux-x86_64-cpython-38/RelWithDebInfo:
      cmake /tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/build/lib.linux-x86_64-cpython-38 -DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/envs/topk/bin/python
      cmake --build . --config RelWithDebInfo -- -j8 VERBOSE=1
      -- Could not find CCache. Consider installing CCache to speed up compilation.
      -- The CXX compiler identification is GNU 11.2.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /home/leo/miniconda3/envs/topk/bin/x86_64-conda-linux-gnu-c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build architecture flags: -mf16c -mavx -mfma
      -- Using command /home/leo/miniconda3/envs/topk/bin/python
      -- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
      -- Could NOT find MPI (missing: MPI_CXX_FOUND)
      -- Looking for a CUDA compiler
      -- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
      -- The CUDA compiler identification is NVIDIA 12.1.66
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.1.66")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Linking against static NCCL library
      -- Could not find static NCCL library. Trying to find shared lib instead.
      CMake Error at /home/leo/miniconda3/envs/topk/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
        Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)
      Call Stack (most recent call first):
        /home/leo/miniconda3/envs/topk/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
        cmake/Modules/FindNCCL.cmake:53 (find_package_handle_standard_args)
        CMakeLists.txt:223 (find_package)


      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/setup.py", line 213, in <module>
          setup(name='horovod',
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
          self.run_command(cmd)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/command/bdist_wheel.py", line 384, in run
          self.run_command("build")
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 98, in run
          _build_ext.run(self)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/setup.py", line 145, in build_extensions
          subprocess.check_call(command, cwd=cmake_build_dir)
        File "/home/leo/miniconda3/envs/topk/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-install-wwe5_bic/horovod_5ea236ba297f4a71abc6d48f62096b63/build/lib.linux-x86_64-cpython-38', '-DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/envs/topk/bin/python']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for horovod
  Running setup.py clean for horovod
Failed to build horovod
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (horovod)
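Most of this log is build_py copying files. The part that matters is the CMake output near the end. It reports "Could NOT find MPI (missing: MPI_CXX_FOUND)", which is not fatal on its own because configuration continues past it. It then hits the fatal "Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)", after which configuration stops. The log also shows that CMake picked up nvcc 12.1.66 from /usr/local/cuda rather than CUDA 11.8. To pull those lines out of a long log, a small helper like the one below may be handy. This is a sketch that assumes you saved pip's output to a file, for example with `pip install -v horovod 2>&1 | tee build.log`. The function names `missing_deps` and `detected_nvcc` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the packages CMake reported as missing in a saved log.
# Matches lines such as "Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY)"
# and prints only the package name (the 4th word), de-duplicated.
missing_deps() {
    grep -o 'Could NOT find [A-Za-z0-9_]*' "$1" | awk '{print $4}' | sort -u
}

# Hypothetical helper: print the CUDA compiler version CMake detected, e.g. 12.1.66.
detected_nvcc() {
    grep -o 'The CUDA compiler identification is NVIDIA [0-9.]*' "$1" | awk '{print $NF}'
}

# Usage (assumed log file name):
#   missing_deps build.log
#   detected_nvcc build.log
```

Run against the log above, `missing_deps` would list MPI, MPI_CXX and NCCL. Only NCCL actually stops the build, which is why NCCL gets installed explicitly in step 3 below.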

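The core problem: there is no wheel file

PyPI only hosts a source distribution of horovod, so pip has to compile it on your machine, and that is where the build above fails. Everything works once a wheel exists, but there is no prebuilt horovod-0.28.1-cp38-cp38-linux_x86_64.whl available, so you have to build it yourself.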

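1. To be safe, first upgrade setuptools and the other build tools.

# Activate your virtual environment first, e.g. conda activate topk (I'm not sure all of these packages are needed, but install them just in case)

sudo apt install -y cmake
pip install --upgrade pip setuptools
conda install gxx_linux-64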
conda install gcc_linux-64 gxx_linux-64 make cmake -c conda-forge
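Once these are installed, you can quickly confirm that the build tools are actually on PATH inside the activated environment. The POSIX `command -v` builtin does this. The sketch below loops over command names and reports the missing ones. `check_tools` is a hypothetical helper. The example tool list is an assumption about what your build needs: it is taken from the packages above and from the `x86_64-conda-linux-gnu-c++` compiler that CMake picked up in the log.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one line per command and returns 1 if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumed tool names for a conda-forge toolchain):
#   check_tools cmake make x86_64-conda-linux-gnu-gcc x86_64-conda-linux-gnu-c++
```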

2. Install CUDA and PyTorch

conda install cudatoolkit=11.8 -c nvidia
conda install -c nvidia cudnn
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
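It is safer if the CUDA version PyTorch was built for (`torch.version.cuda`) agrees, at least on major.minor, with the nvcc that Horovod's CMake picks up. The failing log above shows CMake found nvcc 12.1.66 under /usr/local/cuda even though this environment targets CUDA 11.8. Below is a minimal sketch of such a check, assuming you pass in the two version strings yourself. The names `major_minor`, `same_cuda_minor` and `nvcc_release` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only "major.minor" of a version string (12.1.66 -> 12.1).
major_minor() {
    echo "$1" | cut -d. -f1,2
}

# Hypothetical helper: succeed if two CUDA versions share major.minor.
same_cuda_minor() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number, e.g. "Cuda compilation tools, release 11.8, V11.8.89" -> 11.8
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

In practice you would combine them as `same_cuda_minor "$(python -c 'import torch; print(torch.version.cuda)')" "$(nvcc --version | nvcc_release)" || echo "CUDA version mismatch"`.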

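3. Install NCCL 2

Download NCCL from NVIDIA's website, and make sure you pick the build that matches your CUDA version!

NCCL download page: Log in | NVIDIA Developer (requires an NVIDIA Developer account)

Using my setup (CUDA 11.8 on Ubuntu 20.04) as an example, installed through NVIDIA's apt repository: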

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt install libnccl2=2.16.5-1+cuda11.8 libnccl-dev=2.16.5-1+cuda11.8
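The libnccl-dev package from NVIDIA's apt repository normally puts nccl.h in /usr/include and libnccl.so under /usr/lib/x86_64-linux-gnu, where CMake should find them. `dpkg -L libnccl-dev` lists exactly where the package put its files. If the "Could NOT find NCCL" error persists (for example with a tarball install in a custom prefix), Horovod's installation docs describe the `HOROVOD_NCCL_HOME`, `HOROVOD_NCCL_INCLUDE` and `HOROVOD_NCCL_LIB` environment variables for pointing the build at NCCL. The sketch below searches a few candidate directories for nccl.h. The default candidate list and the name `find_nccl_include` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory (from the arguments, or an
# assumed default list) that contains nccl.h. Returns 1 if none does.
find_nccl_include() {
    if [ "$#" -eq 0 ]; then
        # Assumed default locations; add your own NCCL prefix if it differs.
        set -- /usr/include /usr/local/include /usr/local/cuda/include
    fi
    for dir in "$@"; do
        if [ -f "$dir/nccl.h" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage (assumed library path for the apt packages):
#   export HOROVOD_NCCL_INCLUDE="$(find_nccl_include)"
#   export HOROVOD_NCCL_LIB=/usr/lib/x86_64-linux-gnu
```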

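4. Install Open MPI

Here is how I did it. Check the release date of the NCCL 2 version you installed, then find an Open MPI release from around the same time on the "Open MPI: Version 5.0" download page. Avoid Open MPI 3.1.3: Horovod's documentation warns that this release has an issue that can cause hangs.

I downloaded the openmpi-4.1.6 tarball and ran the following commands: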

tar -xvzf openmpi-4.1.6.tar.gz
cd openmpi-4.1.6
./configure --prefix=/usr/local
sudo make all install
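Open MPI release tarballs are served from download.open-mpi.org under a path built from the major.minor series, so the download step can be scripted. This is a sketch that assumes that URL layout, which is what the 4.1.x links on the Open MPI site use. `openmpi_url` and `check_openmpi_version` are hypothetical names. After `make install` into /usr/local, running `sudo ldconfig` usually refreshes the shared-library cache so that `mpirun` finds libmpi. Running `mpirun --version` then confirms which Open MPI is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: build the download URL for an Open MPI release, assuming
# the layout https://download.open-mpi.org/release/open-mpi/vX.Y/openmpi-X.Y.Z.tar.gz
openmpi_url() {
    version="$1"                                # e.g. 4.1.6
    series=$(echo "$version" | cut -d. -f1,2)   # e.g. 4.1
    echo "https://download.open-mpi.org/release/open-mpi/v${series}/openmpi-${version}.tar.gz"
}

# Hypothetical guard: refuse 3.1.3, which Horovod's docs warn against.
check_openmpi_version() {
    if [ "$1" = "3.1.3" ]; then
        echo "Open MPI 3.1.3 is known to be problematic; pick another release" >&2
        return 1
    fi
    return 0
}
```

For example: `v=4.1.6; check_openmpi_version "$v" && wget "$(openmpi_url "$v")"`, followed by the tar/configure/make steps above.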

5. Install the horovod framework

Many tutorials tell you to run

HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
or
HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod

But that produces the error shown at the start of this post. The fix: if there is no wheel, build one yourself!
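Whichever way you build, Horovod's setup reads its build options from environment variables. Besides `HOROVOD_GPU_OPERATIONS=NCCL`, Horovod's installation docs list `HOROVOD_WITH_PYTORCH=1` and `HOROVOD_WITHOUT_TENSORFLOW=1` / `HOROVOD_WITHOUT_MXNET=1`. The first makes the build fail if PyTorch support can't be built, instead of silently skipping it. The other two skip frameworks you don't have. The snippet below collects them in one place. The choice of flags is an assumption based on the PyTorch-only setup in this post, and `horovod_env` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: export Horovod build options for a PyTorch-only,
# NCCL-enabled build. Adjust to the frameworks you actually have installed.
horovod_env() {
    export HOROVOD_GPU_OPERATIONS=NCCL     # run GPU collective operations through NCCL
    export HOROVOD_WITH_PYTORCH=1          # fail loudly if the PyTorch extension can't build
    export HOROVOD_WITHOUT_TENSORFLOW=1    # TensorFlow isn't installed in this env (assumption)
    export HOROVOD_WITHOUT_MXNET=1         # neither is MXNet (assumption)
}

# Usage: call it in the same shell before building, e.g.
#   horovod_env
#   pip install --no-cache-dir horovod
```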

1. Download horovod from horovod · PyPI and extract it; this produces a horovod directory.
2. Enter the horovod directory and create a build folder inside it.
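3. Next, make sure nvcc is available by running
nvcc --version

If it prints version information, nothing else is needed. If it reports that the command is not found, edit your .bashrc:

nano ~/.bashrc

and add these two lines: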

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Save and exit, then run

source ~/.bashrc

and reactivate your virtual environment.
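Editing .bashrc by hand more than once can leave the same export lines in it several times. Below is a minimal sketch that appends a line only if that exact line is not already in the file. `append_once` is a hypothetical helper name. The target file is an argument, so you can try it on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH / $LD_LIBRARY_PATH unexpanded in the file):
#   append_once ~/.bashrc 'export PATH=/usr/local/cuda/bin:$PATH'
#   append_once ~/.bashrc 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH'
#   . ~/.bashrc
```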

4. Change to the directory that contains setup.py

You can try the following command here:

python setup.py install
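Before running it, check which interpreter will execute setup.py. The paths in the log below point to /home/leo/miniconda3/lib/python3.10, which is the conda base environment. They do not point to the python3.8 `topk` environment from the first log, which suggests the target environment was not active when this ran. Below is a minimal sketch of a check, assuming `conda activate` has set `CONDA_PREFIX` as it normally does. `check_env_python` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `python` on PATH belongs to the active conda env.
check_env_python() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no conda environment is active" >&2
        return 1
    fi
    py=$(command -v python) || { echo "python not found on PATH" >&2; return 1; }
    if [ "$py" != "$CONDA_PREFIX/bin/python" ]; then
        echo "python on PATH is $py, expected $CONDA_PREFIX/bin/python" >&2
        return 1
    fi
    echo "using $py"
}
```

The deprecation warnings in that log also recommend standards-based tools over `setup.py install`. With the right environment active, one such alternative is `pip wheel --no-deps -w dist .`, run from the same directory. It builds a horovod wheel into dist/ that you can then install with pip.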

but I got this error:

/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/__init__.py:81: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

        ********************************************************************************
        Requirements should be satisfied by a PEP 517 installer.
        If you are using pip, you can try pip install --use-pep517.
        ********************************************************************************

!!
  dist.fetch_build_eggs(dist.setup_requires)
running install
/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running `setup.py directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running `setup.py and easy_install.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
running bdist_egg
running egg_info
writing horovod.egg-info/PKG-INFO
writing dependency_links to horovod.egg-info/dependency_links.txt
writing entry points to horovod.egg-info/entry_points.txt
writing requirements to horovod.egg-info/requires.txt
writing top-level names to horovod.egg-info/top_level.txt
reading manifest file 'horovod.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'third_party/eigen/Eigen/Eigen'
warning: no previously-included files found matching 'third_party/eigen/Eigen/IterativeLinearSolvers'
warning: no previously-included files found matching 'third_party/eigen/Eigen/MetisSupport'
warning: no previously-included files found matching 'third_party/eigen/Eigen/Sparse'
warning: no previously-included files found matching 'third_party/eigen/Eigen/SparseCholesky'
warning: no previously-included files found matching 'third_party/eigen/Eigen/SparseLU'
warning: no previously-included files found matching 'third_party/eigen/Eigen/src/IterativeSolvers/*'
warning: no previously-included files found matching 'third_party/eigen/Eigen/src/OrderingMethods/Amd.h'
warning: no previously-included files found matching 'third_party/eigen/Eigen/src/SparseCholesky/*'
warning: no previously-included files found matching 'third_party/eigen/unsupported/test/mpreal/mpreal.h'
warning: no previously-included files found matching 'third_party/eigen/unsupported/Eigen/FFT'
warning: no previously-included files found matching 'third_party/eigen/unsupported/Eigen/MPRealSupport'
warning: no previously-included files found matching 'third_party/eigen/doc/PreprocessorDirectives.dox'
warning: no previously-included files found matching 'third_party/eigen/doc/UsingIntelMKL.dox'
warning: no previously-included files found matching 'third_party/eigen/doc/SparseLinearSystems.dox'
warning: no previously-included files found matching 'third_party/eigen/COPYING.GPL'
warning: no previously-included files found matching 'third_party/eigen/COPYING.LGPL'
warning: no previously-included files found matching 'third_party/eigen/COPYING.README'
adding license file 'LICENSE'
writing manifest file 'horovod.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/horovod
copying horovod/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod
creating build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/mpi_run.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/launch.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/task_fn.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/gloo_run.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/js_run.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
copying horovod/runner/run_task.py -> build/lib.linux-x86_64-cpython-310/horovod/runner
creating build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/gradient_aggregation.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/functions.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/compression.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/util.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/gradient_aggregation_eager.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow
creating build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/functions.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/compression.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
copying horovod/mxnet/mpi_ops.py -> build/lib.linux-x86_64-cpython-310/horovod/mxnet
creating build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/exceptions.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/util.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/process_sets.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/basics.py -> build/lib.linux-x86_64-cpython-310/horovod/common
copying horovod/common/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/common
creating build/lib.linux-x86_64-cpython-310/horovod/data
copying horovod/data/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/data
copying horovod/data/data_loader_base.py -> build/lib.linux-x86_64-cpython-310/horovod/data
creating build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/sync_batch_norm.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/functions.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/compression.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/optimizer.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-cpython-310/horovod/torch
creating build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/utils.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/elastic_v2.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/strategy.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/ray_logger.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/runner.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/adapter.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/worker.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
copying horovod/ray/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/ray
creating build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/mpi_run.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/conf.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/gloo_run.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
copying horovod/spark/runner.py -> build/lib.linux-x86_64-cpython-310/horovod/spark
creating build/lib.linux-x86_64-cpython-310/horovod/_keras
copying horovod/_keras/callbacks.py -> build/lib.linux-x86_64-cpython-310/horovod/_keras
copying horovod/_keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/_keras
copying horovod/_keras/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/_keras
creating build/lib.linux-x86_64-cpython-310/horovod/keras
copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-cpython-310/horovod/keras
copying horovod/keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/keras
copying horovod/keras/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/keras
creating build/lib.linux-x86_64-cpython-310/horovod/runner/common
copying horovod/runner/common/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common
creating build/lib.linux-x86_64-cpython-310/horovod/runner/driver
copying horovod/runner/driver/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/driver
copying horovod/runner/driver/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/driver
creating build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/registration.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/discovery.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/driver.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/settings.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/constants.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/rendezvous.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
copying horovod/runner/elastic/worker.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/elastic
creating build/lib.linux-x86_64-cpython-310/horovod/runner/http
copying horovod/runner/http/http_server.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/http
copying horovod/runner/http/http_client.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/http
copying horovod/runner/http/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/http
creating build/lib.linux-x86_64-cpython-310/horovod/runner/task
copying horovod/runner/task/task_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/task
copying horovod/runner/task/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/task
creating build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/cache.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/streams.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/threads.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/lsf.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
copying horovod/runner/util/network.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/util
creating build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/task_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/compute_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
copying horovod/runner/common/service/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/service
creating build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/timeout.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/config_parser.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/tiny_shell_exec.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/host_hash.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/settings.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/codec.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/env.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/safe_shell_exec.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/network.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/hosts.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
copying horovod/runner/common/util/secret.py -> build/lib.linux-x86_64-cpython-310/horovod/runner/common/util
creating build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
copying horovod/tensorflow/data/compute_worker.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
copying horovod/tensorflow/data/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
copying horovod/tensorflow/data/compute_service.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/data
creating build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
copying horovod/tensorflow/keras/callbacks.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
copying horovod/tensorflow/keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
copying horovod/tensorflow/keras/elastic.py -> build/lib.linux-x86_64-cpython-310/horovod/tensorflow/keras
creating build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
copying horovod/torch/elastic/state.py -> build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
copying horovod/torch/elastic/sampler.py -> build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
copying horovod/torch/elastic/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/torch/elastic
creating build/lib.linux-x86_64-cpython-310/horovod/spark/data_loaders
copying horovod/spark/data_loaders/pytorch_data_loaders.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/data_loaders
copying horovod/spark/data_loaders/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/data_loaders
creating build/lib.linux-x86_64-cpython-310/horovod/spark/tensorflow
copying horovod/spark/tensorflow/compute_worker.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/tensorflow
copying horovod/spark/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/tensorflow
creating build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/cache.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/_namedtuple_fix.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/params.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/serialization.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/backend.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/store.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/constants.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
copying horovod/spark/common/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/common
creating build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/host_discovery.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/job_id.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/mpirun_rsh.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/rendezvous.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/driver_service.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
copying horovod/spark/driver/rsh.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/driver
creating build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
copying horovod/spark/torch/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/torch
creating build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/gloo_exec_fn.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/task_info.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/mpirun_exec_fn.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/task_service.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
copying horovod/spark/task/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/task
creating build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
copying horovod/spark/lightning/legacy.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/lightning
creating build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/__init__.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/remote.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/datamodule.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/tensorflow.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/util.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/optimizer.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/bare.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
copying horovod/spark/keras/estimator.py -> build/lib.linux-x86_64-cpython-310/horovod/spark/keras
running build_ext
Running CMake in build/temp.linux-x86_64-cpython-310/RelWithDebInfo:
cmake /home/leo/horovod-0.28.1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/home/leo/horovod-0.28.1/build/lib.linux-x86_64-cpython-310 -DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/bin/python
cmake --build . --config RelWithDebInfo -- -j8 VERBOSE=1
-- Could not find CCache. Consider installing CCache to speed up compilation.
-- The CXX compiler identification is GNU 11.2.0
-- Check for working CXX compiler: /home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++
-- Check for working CXX compiler: /home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build architecture flags: -mf16c -mavx -mfma
-- Using command /home/leo/miniconda3/bin/python
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_CXX_FOUND)
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- Looking for a CUDA host compiler - /home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++
-- The CUDA compiler identification is unknown
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- broken
CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
  The CUDA compiler

    "/usr/local/cuda/bin/nvcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make cmTC_0ffd3/fast && /usr/bin/make -f CMakeFiles/cmTC_0ffd3.dir/build.make CMakeFiles/cmTC_0ffd3.dir/build
    make[1]: Entering directory '/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_0ffd3.dir/main.cu.o
    /usr/local/cuda/bin/nvcc -ccbin=/home/leo/miniconda3/bin/x86_64-conda-linux-gnu-c++    -x cu -c /home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_0ffd3.dir/main.cu.o
    Linking CUDA executable cmTC_0ffd3
    /usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_0ffd3.dir/link.txt --verbose=1
    "" -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/leo/miniconda3/lib -Wl,-rpath-link,/home/leo/miniconda3/lib -L/home/leo/miniconda3/lib  CMakeFiles/cmTC_0ffd3.dir/main.cu.o -o cmTC_0ffd3
    Error running link command: No such file or directory
    make[1]: *** [CMakeFiles/cmTC_0ffd3.dir/build.make:87: cmTC_0ffd3] Error 2
    make[1]: Leaving directory '/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeTmp'
    make: *** [Makefile:121: cmTC_0ffd3/fast] Error 2





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:177 (enable_language)


-- Configuring incomplete, errors occurred!
See also "/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeOutput.log".
See also "/home/leo/horovod-0.28.1/build/temp.linux-x86_64-cpython-310/RelWithDebInfo/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
  File "/home/leo/horovod-0.28.1/setup.py", line 213, in <module>
    setup(name='horovod',
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
    return distutils.core.setup(**attrs)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
    return run_commands(dist)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
    dist.run_commands()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/install.py", line 87, in run
    self.do_egg_install()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/install.py", line 139, in do_egg_install
    self.run_command('bdist_egg')
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 167, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 153, in call_command
    self.run_command(cmdname)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/command/install_lib.py", line 110, in build
    self.run_command('build_ext')
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
    super().run_command(command)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
    _build_ext.run(self)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
    self.build_extensions()
  File "/home/leo/horovod-0.28.1/setup.py", line 145, in build_extensions
    subprocess.check_call(command, cwd=cmake_build_dir)
  File "/home/leo/miniconda3/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/leo/horovod-0.28.1', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/home/leo/horovod-0.28.1/build/lib.linux-x86_64-cpython-310', '-DPYTHON_EXECUTABLE:FILEPATH=/home/leo/miniconda3/bin/python']' returned non-zero exit status 1.

Taking a different approach, I ran the following commands in the horovod source directory:

pip install build
python -m build
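If you prefer to script this step, here is a minimal sketch. It assumes the conda environment is already activated and the script runs from the horovod source directory. It pins both commands above to `sys.executable`, so `build` is installed into, and run by, the same interpreter that will later install the wheel. The helper names `build_commands` and `run_build` are hypothetical. One reason to pin the interpreter: the failed log above shows CMake being handed `/home/leo/miniconda3/bin/python` (`cpython-310`). That is the base Miniconda interpreter, not the Python 3.8 environment used elsewhere in this post.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the two commands from the post, pinned to one interpreter."""
    return [
        [python, "-m", "pip", "install", "build"],  # equivalent of `pip install build`
        [python, "-m", "build"],                    # builds sdist + wheel into ./dist
    ]


def run_build(source_dir, python=sys.executable):
    """Run both commands inside source_dir; raises CalledProcessError on failure."""
    for cmd in build_commands(python):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, cwd=source_dir, check=True)
```

Usage (path taken from the log above; substitute your own): `run_build("/home/leo/horovod-0.28.1")`.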

This time the build produced horovod-0.28.1-cp38-cp38-linux_x86_64.whl, and a fix was finally in sight. The key question now was how to install the .whl file.
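The `cp38` in that filename is the CPython tag. pip will only accept this wheel for a Python 3.8 interpreter and reports it as unsupported on any other. The sketch below parses a wheel filename according to the PEP 427 naming convention and checks whether it matches the current interpreter. The function names are hypothetical, and the check only looks at the Python tag, not the ABI or platform tags.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 parts.

    Format: {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def matches_interpreter(filename, version_info=sys.version_info):
    """True if the wheel's Python tag includes this interpreter's cpXY tag."""
    wanted = f"cp{version_info[0]}{version_info[1]}"
    # Compressed tag sets such as "cp38.cp39" are dot-separated.
    return wanted in parse_wheel_filename(filename)["python"].split(".")
```

For example, `matches_interpreter("horovod-0.28.1-cp38-cp38-linux_x86_64.whl")` should return `True` only when run by the Python 3.8 interpreter of the target environment.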

First, go to the folder that contains the .whl file (it should be in the dist folder), then run:

conda activate your_env_name  # make sure your virtual environment is activated
python -m pip install horovod-0.28.1-cp38-cp38-linux_x86_64.whl
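The same step as a small sketch. It assumes the wheel was built into `dist/` and the target conda environment is active. It finds the newest horovod wheel there and builds the `python -m pip install` command for the current interpreter. Going through `python -m pip`, as the post does, ensures the pip that runs belongs to the activated environment. `find_wheels` and `install_command` are hypothetical helpers.

```python
import sys
from pathlib import Path


def find_wheels(dist_dir="dist", name="horovod"):
    """Return wheel files for `name` in dist_dir, newest first."""
    wheels = Path(dist_dir).glob(f"{name}-*.whl")
    return sorted(wheels, key=lambda p: p.stat().st_mtime, reverse=True)


def install_command(wheel, python=sys.executable):
    """`python -m pip install <wheel>`, pinned to the given interpreter."""
    return [python, "-m", "pip", "install", str(wheel)]
```

Usage: `subprocess.run(install_command(find_wheels()[0]), check=True)`, after `import subprocess`.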

Done, the installation succeeded! Next, a quick check that it really worked:

python
import horovod

If no error is raised, the installation succeeded!!!
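A slightly more informative check, as a sketch. Besides `horovod`, it also tries `horovod.torch`, since this setup uses PyTorch and a successful `import horovod` alone does not show that the PyTorch extension was compiled. `try_import` is a hypothetical helper. Horovod's launcher also offers `horovodrun --check-build`, which lists the frameworks and controllers the installed build supports.

```python
import importlib


def try_import(name):
    """Return (True, module) on success, or (False, error text) on failure."""
    try:
        return True, importlib.import_module(name)
    except Exception as exc:  # ImportError, or e.g. OSError from a missing .so
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for name in ("horovod", "horovod.torch"):
        ok, result = try_import(name)
        if ok:
            print(f"{name}: OK (version {getattr(result, '__version__', 'unknown')})")
        else:
            print(f"{name}: FAILED ({result})")
```

If `horovod` imports but `horovod.torch` does not, build isolation is one thing worth checking, though this post did not verify it. By default, `python -m build` builds inside a fresh isolated environment, so packages such as torch from your conda environment may not be visible during the build. The `build` tool's `--no-isolation` flag turns this off.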

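This post only reflects my own experience, and I can't guarantee that following my steps will work for you!!! Adapt as needed!!! Finally I can get back to my paper and my code!!!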
