Win10 + CUDA 10 + TensorFlow + tensorflow-gpu installation tutorial, with a fix for CUDA 10 downloads from within China being only 42 B

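This article describes how a beginner, starting from zero and working through the book 《深度学习之TensorFlow:入门、原理与进阶实战》 (Deep Learning with TensorFlow: Introduction, Principles, and Advanced Practice), picked up the basics of deep learning and set up a deep learning environment on a personal computer. It covers the installation and configuration of Anaconda, TensorFlow, CUDA 10.0, and CUDNN.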

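Background

I have long been very curious about deep learning but never managed to get started. A few days ago I finally came across the book 《深度学习之TensorFlow:入门、原理与进阶实战》, and with its help some of it finally made sense, so I decided to try things out hands-on on my own computer.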

Software to install

This assumes a brand-new machine.

  • Anaconda
  • TensorFlow and tensorflow-gpu (most of the book's examples target v1, so v1.15 is installed here)
  • CUDA 10.0
  • CUDNN

Anaconda

Download: https://repo.anaconda.com/archive/

The latest version at the time of writing: https://repo.anaconda.com/archive/Anaconda3-2020.07-Windows-x86_64.exe

Installing Anaconda is close to foolproof: click Next, Next, then Finish.
I do not need Anaconda to be usable from any arbitrary cmd window, so I left the add-to-PATH option unchecked here.

CUDA 10.0

So what is CUDA? The official CUDA documentation describes it like this:

a general purpose parallel computing platform and programming model that leverages the parallel compute engine in NVIDIA GPUs to solve many complex computational problems in a more efficient way than on a CPU.

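In other words, CUDA is a parallel computing framework that NVIDIA released for its own GPUs. That means CUDA only runs on NVIDIA GPUs, and it only pays off when the problem being solved can be broken into a large amount of parallel computation.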
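To make the "large amount of parallel computation" condition concrete, here is a small pure-Python sketch. No GPU is involved, and the function names are illustrative, not part of CUDA. In the first function every output element depends only on its own inputs, so the elements can be computed in any order; this is the kind of work that maps well onto many GPU threads. In the second, each step needs the previous result, so as written it has to run one step at a time.

```python
import random


def scale_and_add(a, x, y, order=None):
    """Compute a * x[i] + y[i] for every i, visiting indices in any order."""
    order = list(range(len(x))) if order is None else order
    out = [0.0] * len(x)
    for i in order:
        # Element i uses only x[i] and y[i]: no dependence on other elements.
        out[i] = a * x[i] + y[i]
    return out


def logistic_orbit(x0, r, steps):
    """Iterate x <- r * x * (1 - x); each step needs the previous value."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1.0 - values[-1]))
    return values


if __name__ == "__main__":
    x = [float(i) for i in range(8)]
    y = [1.0] * 8
    shuffled = list(range(8))
    random.shuffle(shuffled)
    # Same result regardless of the order in which elements are computed.
    print(scale_and_add(2.0, x, y) == scale_and_add(2.0, x, y, shuffled))
    print(logistic_orbit(0.5, 2.0, 3))
```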

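Download: https://developer.nvidia.com/cuda-downloads

I chose version 10.0. The official site shows that the latest TensorFlow supports CUDA 10.1, but that is listed for v2. I was worried v1 would not support it, so I played it safe and did not install 10.1.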

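  1. The site offers the latest version by default. Since we are installing 10.0, first click the Legacy Releases button to reach the earlier versions.
  2. Then select the version we need to download.
  3. Select the matching platform and start the download. A direct download from the browser does not work here (this is where the 42 B file comes from); instead, right-click the download button, copy the link address, and hand it to Xunlei (Thunder).
  4. Open Xunlei, create a new download task, and wait for it to finish.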
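Before running the installer, it can be worth confirming that the file is not the 42-byte stub. The following is a minimal sketch, not an official check: the function name and the 1 MiB threshold are assumptions chosen only because a real CUDA installer is far larger than that.

```python
import os

# Assumed threshold: a real CUDA installer is far larger than 1 MiB, while the
# broken download described in the title is only 42 bytes.
MIN_PLAUSIBLE_BYTES = 1024 * 1024


def looks_like_stub(path, min_bytes=MIN_PLAUSIBLE_BYTES):
    """Return True if the file at `path` is too small to be a real installer."""
    return os.path.getsize(path) < min_bytes


if __name__ == "__main__":
    import sys

    # Pass one or more downloaded installer paths on the command line.
    for installer in sys.argv[1:]:
        size = os.path.getsize(installer)
        if looks_like_stub(installer):
            verdict = "too small, download again"
        else:
            verdict = "size looks plausible"
        print(f"{installer}: {size} bytes ({verdict})")
```

Run it with the path of the downloaded file as the argument. A size check only catches the stub case; it does not verify that a large file is intact.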

CUDNN

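Download: https://developer.nvidia.com/rdp/cudnn-download

CUDNN can be thought of as an add-on for CUDA: a library that accelerates some of the operations used in deep learning and is optimized specifically for that purpose. Downloading it requires logging in. Some posts online say things keep working without it, so whether to install it is up to you; I registered an account, then downloaded and installed it. Note that the GPU build of TensorFlow 1.15 expects to load cuDNN 7 at runtime, so skipping it will most likely leave TensorFlow running on the CPU only.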
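One way to see whether the CUDA and CUDNN libraries are discoverable is to search the directories on PATH for their DLLs. This is a rough sketch under stated assumptions: the helper name is made up for illustration, and the two DLL names are what CUDA 10.0 and cuDNN 7 are expected to ship on Windows, so adjust them for other versions.

```python
import os

# DLL names the GPU build of TensorFlow 1.15 is expected to load on Windows
# (assumption based on CUDA 10.0 and cuDNN 7; adjust for other versions).
EXPECTED_DLLS = ("cudart64_100.dll", "cudnn64_7.dll")


def find_on_path(filename, path_value=None):
    """Return the first directory in a PATH-style string that holds `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


if __name__ == "__main__":
    for dll in EXPECTED_DLLS:
        location = find_on_path(dll)
        print(f"{dll}: {location if location else 'not found on PATH'}")
```

If a DLL is reported as not found, the directory that contains it probably still needs to be added to PATH.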

Tensorflow && Tensorflow-gpu

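With Anaconda, installing TensorFlow is straightforward.

  1. First, open a console (the Anaconda Prompt, since Anaconda was not added to PATH above).
  2. Run the commands below in order. Most installation failures reported online come down to network problems or a mismatched Python version.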
# Add the Tsinghua mirror channel and create a new environment
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
conda create -n tensorflow python=3.7
conda activate tensorflow

# Install tensorflow
pip install -i https://mirrors.aliyun.com/pypi/simple/ tensorflow==1.15
pip install -i https://mirrors.aliyun.com/pypi/simple/ tensorflow-gpu==1.15
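
Once the commands finish, a short script run inside the activated environment can confirm the result. The sketch below is one possible check, not part of the original steps: the function names are illustrative, and the Python 3.5 to 3.7 range is what the TensorFlow 1.15 Windows wheels are understood to target. `tf.test.is_built_with_cuda()` and `tf.test.is_gpu_available()` are existing TensorFlow 1.15 calls.

```python
import sys


def python_ok_for_tf115(version_info=sys.version_info):
    """TensorFlow 1.15 wheels for Windows target Python 3.5 to 3.7."""
    return (3, 5) <= tuple(version_info[:2]) <= (3, 7)


def gpu_report():
    """Collect basic facts about the TensorFlow install as a dict."""
    report = {"python_ok": python_ok_for_tf115(), "installed": False}
    try:
        import tensorflow as tf
    except Exception as exc:  # missing package, DLL load errors, and so on
        report["error"] = str(exc)
        return report
    report["installed"] = True
    report["version"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # Deprecated in TensorFlow 2.x but available in 1.15.
    report["gpu_available"] = tf.test.is_gpu_available()
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `gpu_available` comes back False while `built_with_cuda` is True, the usual suspects are the CUDA or CUDNN libraries not being found, or an outdated GPU driver.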
