ConnectionError: Tried to launch distributed communication on port `xxxxx`, but another process is u

Latest recommended article published on 2025-10-07 21:16:45

License: CC 4.0 BY-SA

Tags: #transformers #accelerate #multiprocessing

诸神缄默不语: personal 优快云 blog post index

I hit this bug while running code with accelerate. The full error message is:

ConnectionError: Tried to launch distributed communication on port `xxxxx`, but another process is utilizing it. Please specify a different port (such as using the `----main_process_port` flag or specifying a different `main_process_port` in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to `0`.

In practice, setting it to 0 did not work for me. The fix that did work was changing the port to the reported port number plus 1.
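Before picking "port + 1" by hand, you can check which port is actually free. The following is a minimal sketch, assuming a single node and assuming that "free" means a TCP bind on 127.0.0.1 succeeds. `is_port_free` and `next_free_port` are hypothetical helper names for illustration and are not part of accelerate.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


def next_free_port(start: int, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Scan upward from `start` and return the first port that can be bound."""
    for port in range(start, min(start + max_tries, 65536)):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"no free port found in {max_tries} tries starting at {start}")
```

Calling `next_free_port(12023)` returns 12023 if it is free and otherwise the first bindable port above it. The check is only a snapshot: another process could still grab the port between the check and the launch.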

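The default config file lives at .cache/huggingface/accelerate/default_config.yaml (by default under your home directory), and you can edit it directly. If you are worried that changing it will affect other code, create a new config file and add this line at the end: main_process_port: 12023 (the port number)
Then pass the new config file on the command line: accelerate launch --config_file {path/to/config/my_config_file.yaml} {script_name.py} {--arg1} {--arg2} ...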
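The "copy the config and add one line" step can also be scripted. This is a rough sketch using only the standard library, assuming `main_process_port` is a plain top-level `key: value` line in the YAML file. `with_main_process_port` and `write_custom_config` are hypothetical helper names, not accelerate APIs.

```python
from pathlib import Path


def with_main_process_port(config_text: str, port: int) -> str:
    """Return the config text with a top-level main_process_port set to `port`."""
    new_line = f"main_process_port: {port}"
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only touch an unindented (top-level) key, as assumed above.
        if line.startswith("main_process_port:"):
            lines[i] = new_line
            replaced = True
    if not replaced:
        # Key not present yet: add it as the last line, as described in the post.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def write_custom_config(src: Path, dst: Path, port: int) -> None:
    """Copy the config at `src` to `dst` with main_process_port overridden."""
    dst.write_text(with_main_process_port(src.read_text(), port))
```

For example, `write_custom_config(Path.home() / ".cache/huggingface/accelerate/default_config.yaml", Path("my_config_file.yaml"), 12023)` leaves the default config untouched and writes a copy that can be passed to `accelerate launch --config_file my_config_file.yaml`.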

Reference: Launching your 🤗 Accelerate scripts

As expected, asking ChatGPT about this kind of problem was no help. I still had to search the documentation myself.