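When training on your own dataset you run into all kinds of problems, and building PyTorch from source is a particularly troublesome one.
The zip archive downloaded from GitHub does not include the git repository (and none of the submodules), so the source has to be cloned with git clone in a terminal, and on an unstable network that produces many errors.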
When I cloned it, four submodules failed to clone:
- Cloning into '/home/**/pytorch/third_party/FP16'... fatal: unable to access 'https://github.com/Maratyszcza/FP16.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated. fatal: clone of 'https://github.com/Maratyszcza/FP16.git' into submodule path '/home/**/pytorch/third_party/FP16' failed Failed to clone 'third_party/FP16'. Retry scheduled
- Cloning into '/home/**/pytorch/third_party/FXdiv'... remote: Enumerating objects: 275, done. remote: Counting objects: 100% (53/53), done. remote: Compressing objects: 100% (27/27), done. error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function. fatal: The remote end hung up unexpectedly fatal: early EOF fatal: index-pack failed fatal: clone of 'https://github.com/Maratyszcza/FXdiv.git' into submodule path '/home/**/pytorch/third_party/FXdiv' failed Failed to clone 'third_party/FXdiv'. Retry scheduled
- Cloning into '/home/**/pytorch/third_party/fbgemm'... fatal: unable to access 'https://github.com/pytorch/fbgemm/': GnuTLS recv error (-54): Error in the pull function. fatal: clone of 'https://github.com/pytorch/fbgemm' into submodule path '/home/**/pytorch/third_party/fbgemm' failed Failed to clone 'third_party/fbgemm'. Retry scheduled
- Cloning into '/home/**/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... fatal: unable to access 'https://github.com/libcpr/cpr.git/': Operation timed out after 0 milliseconds with 0 out of 0 bytes received fatal: clone of 'https://github.com/libcpr/cpr.git' into submodule path '/home/**/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' failed Failed to clone 'third_party/cpr'. Retry scheduled
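The log above is long and easy to misread. A quicker way to see which submodules are still missing may be `git submodule status --recursive`, which marks uninitialized submodules with a leading `-`. Below is a minimal sketch that filters that output; `list_uninitialized` is a hypothetical helper name, not part of Git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `git submodule status` output on stdin and print
# the path of every submodule that is not initialized (lines starting with '-').
list_uninitialized() {
  local line rest
  # Each status line looks like "<flag><sha1> <path>[ (<describe>)]",
  # where <flag> is ' ', '-', '+' or 'U'.
  while IFS= read -r line; do
    case "$line" in
      -*)
        rest=${line#-}              # drop the leading '-'
        rest=${rest#* }             # drop the SHA-1 and the space after it
        printf '%s\n' "${rest%% *}" # keep only the path
        ;;
    esac
  done
}
```

Usage, from the pytorch root: `git submodule status --recursive | list_uninitialized`. Nested submodules such as cpr only show up once their parent (dynolog) has been checked out, since `--recursive` only descends into submodules that are already present.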
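The steps are as follows:
1. Clone the failed submodules individually:
For each submodule that failed, clone it manually into its directory (these commands are run from the pytorch root directory). For example, for the FP16 submodule: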
git clone https://github.com/Maratyszcza/FP16.git third_party/FP16
git clone https://github.com/Maratyszcza/FXdiv.git third_party/FXdiv
git clone https://github.com/pytorch/fbgemm.git third_party/fbgemm
git clone https://github.com/libcpr/cpr.git third_party/kineto/libkineto/third_party/dynolog/third_party/cpr
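Since the errors above (TLS handshake failures, `RPC failed`, timeouts) look transient, simply retrying a clone a few times may be enough. A sketch, assuming a POSIX-compatible shell; `clone_with_retry` and the `RETRY_DELAY` variable are my own illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry `git clone URL DEST` up to ATTEMPTS times
# (default 3). RETRY_DELAY is an assumed knob: seconds to wait between
# attempts, default 5.
clone_with_retry() {
  local url="$1" dest="$2" attempts="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    if git clone "$url" "$dest"; then
      return 0
    fi
    echo "clone of $url failed (attempt $i/$attempts)" >&2
    # Only wait if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "${RETRY_DELAY:-5}"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, from the pytorch root: `clone_with_retry https://github.com/Maratyszcza/FP16.git third_party/FP16 5`.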
- Problem when running these: fatal: destination path 'third_party/FP16' already exists and is not an empty directory.
Method 1: delete the directory and clone again (these commands are run from inside pytorch/third_party)
- Delete the existing directory:
rm -rf FP16
- Clone it again:
git clone https://github.com/Maratyszcza/FP16.git FP16
The other submodules are handled the same way:
rm -rf FXdiv
git clone https://github.com/Maratyszcza/FXdiv.git FXdiv
rm -rf fbgemm
git clone https://github.com/pytorch/fbgemm.git fbgemm
rm -rf kineto/libkineto/third_party/dynolog/third_party/cpr
git clone https://github.com/libcpr/cpr.git kineto/libkineto/third_party/dynolog/third_party/cpr
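The Method 1 commands can be wrapped so that an empty argument never turns into a stray `rm -rf`, and so that the sequence stops at the first clone that fails. A sketch; `reclone` and `reclone_all` are hypothetical names, and the relative paths assume the working directory is pytorch/third_party, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Method 1: delete DEST, then clone URL into it again.
reclone() {
  local url="$1" dest="$2"
  # Refuse to run with a missing argument rather than calling rm -rf on it.
  if [ -z "$url" ] || [ -z "$dest" ]; then
    echo "usage: reclone URL DEST" >&2
    return 2
  fi
  rm -rf -- "$dest"
  git clone "$url" "$dest"
}

# The four submodules from the log above; stops at the first failure.
# Paths are relative to pytorch/third_party.
reclone_all() {
  reclone https://github.com/Maratyszcza/FP16.git FP16 &&
  reclone https://github.com/Maratyszcza/FXdiv.git FXdiv &&
  reclone https://github.com/pytorch/fbgemm.git fbgemm &&
  reclone https://github.com/libcpr/cpr.git kineto/libkineto/third_party/dynolog/third_party/cpr
}
```

Running `reclone_all` from pytorch/third_party performs the same steps as the commands above, in the same order.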
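Method 2: enter the existing directory and continue from there
- Enter the directory (from the pytorch root directory):
cd third_party/FP16
- Check whether it has already been initialized as a git repository:
git status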
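One caveat with Method 2: if third_party/FP16 is not a repository of its own, `git status` there still succeeds, because Git searches upward and reports on the enclosing pytorch repository instead. A stricter check compares `git rev-parse --show-toplevel` with the directory itself. A sketch; `is_own_repo` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if DIR is the top level of its own Git
# repository, not just a subdirectory of an enclosing repo such as pytorch.
is_own_repo() {
  local dir="$1" top abs
  top=$(git -C "$dir" rev-parse --show-toplevel 2>/dev/null) || return 1
  # Resolve symlinks on both sides so the paths are comparable.
  abs=$(cd "$dir" && pwd -P) || return 1
  top=$(cd "$top" && pwd -P) || return 1
  [ "$top" = "$abs" ]
}
```

For example, from the pytorch root: `is_own_repo third_party/FP16 || echo "not a clone of its own, use Method 1"`.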
2. Check and update the submodules:
- After the clones finish, go to the pytorch root directory and update the submodules:
cd /home/**/pytorch
git submodule update --init --recursive
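- If the network is unstable, telling Git not to abort slow transfers may help prevent the connection from being dropped (with http.lowSpeedLimit set to 0, a transfer is never considered too slow):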
git config --global http.lowSpeedLimit 0
git config --global http.lowSpeedTime 999999
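Two further tweaks, sketched under my own assumptions (the function names `git_patient` and `retry` are hypothetical): pass the same low-speed settings with `git -c` for a single invocation instead of writing them to ~/.gitconfig, and rerun the submodule update a few times, since one transient failure makes the whole command fail. As far as I know, values given with `-c` are also passed down to the Git processes that clone the submodules.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the same low-speed settings as above, applied only
# to this one Git command instead of the global ~/.gitconfig.
git_patient() {
  git -c http.lowSpeedLimit=0 -c http.lowSpeedTime=999999 "$@"
}

# Hypothetical helper: run a command up to N times, pausing RETRY_DELAY
# seconds (default 10, an arbitrary choice) between failed attempts.
retry() {
  local attempts="$1" i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $i/$attempts failed: $*" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "${RETRY_DELAY:-10}"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage, from the pytorch root: `retry 5 git_patient submodule update --init --recursive`. Submodules that are already present are not cloned again, so each rerun mostly retries the ones still missing. The global settings can be removed afterwards with `git config --global --unset http.lowSpeedLimit` and `git config --global --unset http.lowSpeedTime`.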