Traceback (most recent call last):
  File "/root/.conda/envs/LLm/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/.conda/envs/LLm/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/root/.conda/envs/LLm/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1153, in launch_command
    deepspeed_launcher(args)
  File "/root/.conda/envs/LLm/lib/python3.10/site-packages/accelerate/commands/launch.py", line 846, in deepspeed_launcher
    distrib_run.run(args)
  File "/root/.conda/envs/LLm/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/root/.conda/envs/LLm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/.conda/envs/LLm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
mini_qwen_pt.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-02-05_14:36:22
host : intern-studio-153939
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 1083)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
BUG--torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
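The log above shows only that rank 1 exited with code 1. With `error_file: <N/A>`, the worker's actual exception never reaches the launcher. The page linked in the `traceback :` line describes the `record` decorator from `torch.distributed.elastic.multiprocessing.errors`, which is meant to capture the child's exception so it can be reported. Below is a minimal sketch of applying it to a training entrypoint. How `mini_qwen_pt.py` is actually structured is unknown, so `main` and its `fail` parameter are hypothetical stand-ins, and the `ImportError` fallback exists only so the sketch also runs where PyTorch is not installed.

```python
# Sketch: wrap the training entrypoint with @record so the elastic launcher
# can surface the worker's real exception instead of "error_file: <N/A>".
try:
    from torch.distributed.elastic.multiprocessing.errors import record
except ImportError:
    # Assumption: no-op fallback so this sketch runs without PyTorch installed.
    def record(fn):
        return fn


@record
def main(fail: bool = False) -> str:
    # Hypothetical stand-in for the body of the training script.
    if fail:
        raise RuntimeError("simulated failure in a worker rank")
    return "ok"


if __name__ == "__main__":
    main()
```

With the entrypoint decorated this way, re-running the same `accelerate launch` command should make the root-cause section more informative. The underlying error (for example an out-of-memory condition or a bad path) still has to be read from the per-rank output printed above the `ChildFailedError` summary.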