When training a model with Ultralytics' open-source YOLOv8, dual-GPU training was launched with the following command:
yolo train data=D:/YOLO_V8/ultralytics-main/ultralytics-main/ultralytics/cfg/datasets/mydata.yaml model=yolov8n.pt epochs=650 imgsz=640 batch=256 workers=0 patience=200 device=0,1
The following exception was thrown:
torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 141
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 340) of binary: /root/miniconda3/envs/llama/bin/python
torch.distributed.elastic.multiprocessing.errors.ChildFailedError
subprocess.CalledProcessError: Command '['D:\\Anaconda\\envs\\YOLO8\\python.exe', '-m', 'torch.distributed.run', '--nproc_per_node', '2', '--master_port', '58127', 'C:\\Users\\amax\\AppData\\Roaming\\Ultralytics\\DDP\\_temp_8gd8 22v32514268826352.py']' returned non-zero exit status 1.
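One useful clue in the log is `failed (exitcode: -7)`: torch.distributed.elastic reports a negative exit code when a worker process is killed by a signal, and the absolute value is the signal number. On Linux, signal 7 is SIGBUS, which with DataLoader workers is commonly caused by an undersized `/dev/shm` shared-memory mount (for example inside a Docker container). A minimal sketch to decode the exit code:

```python
import signal

# torch.distributed.elastic encodes "killed by signal N" as exitcode -N
exitcode = -7  # value reported in the error log above
sig = signal.Signals(-exitcode)
print(sig.name)  # SIGBUS on Linux: often an exhausted /dev/shm
```

If SIGBUS is indeed the culprit, enlarging the container's shared memory (e.g. `--shm-size` in Docker) or keeping `workers=0` are the usual remedies.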

This article discusses the torch.distributed exception encountered when training a YOLOv8 model on dual GPUs, covering process management, environment configuration, and restart strategies.
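As the traceback shows, Ultralytics relaunches itself under `torch.distributed.run` with `--nproc_per_node 2` when `device=0,1` is passed, and the global `batch=256` is split evenly across the two worker processes. A hypothetical helper illustrating that split (the function name is mine, not Ultralytics code):

```python
def per_device_batch(global_batch: int, world_size: int) -> int:
    """Each of the world_size DDP processes receives an equal share of the batch."""
    if global_batch % world_size:
        raise ValueError("global batch must be divisible by the number of GPUs")
    return global_batch // world_size

# batch=256 with device=0,1 -> 128 images per GPU per step
print(per_device_batch(256, 2))  # 128
```

If the crash turns out to be memory-related, halving `batch` (and thus the per-GPU share) is a quick first experiment.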