Multi-GPU training error: RuntimeError: The server socket has failed to listen on any local network address. The server socket
Problem description


RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use). The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already

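This article describes a RuntimeError encountered during multi-GPU training. The cause is a port conflict: the two GPU cards try to bind to the same local network port (29500 in the message above, errno 98 "Address already in use"). The fix is to change the port number used by the second card, or to fall back to single-device training (DP mode).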
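The port change described above can be sketched as follows. This is a minimal illustration, not the article's own code: the helper names `find_free_port`, `is_port_in_use`, and `choose_master_port` are hypothetical, and the sketch assumes the job is launched with PyTorch's `env://` initialization, which reads the `MASTER_ADDR` and `MASTER_PORT` environment variables. It uses only the Python standard library to check whether the default port 29500 is taken and, if so, asks the operating system for a free one.

```python
import os
import socket

DEFAULT_MASTER_PORT = 29500  # the port that appears in the error message


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick any free port
        return s.getsockname()[1]


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False


def choose_master_port(default: int = DEFAULT_MASTER_PORT) -> int:
    """Keep the default port when it is free, otherwise pick another one."""
    return find_free_port() if is_port_in_use(default) else default


if __name__ == "__main__":
    port = choose_master_port()
    # Assumption: the training script uses env:// initialization,
    # so these two variables decide where the rendezvous socket binds.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)
    print(f"MASTER_PORT set to {port}")
```

Note that the check is not race-free: another process could take the port between the check and the moment training starts, so this is a convenience rather than a guarantee. When launching with `torchrun`, the same effect can be had by passing a different port on the command line, for example `torchrun --nproc_per_node=2 --master_port=29501 train.py`, where `train.py` stands in for your own script.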




