1 Problem Description
While training the Bert-VITS2 model, the process crashed with the following error:
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f938874c4d7 in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f938871636b in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f93887f0fa8 in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0xdf9cde (0x7f9389615cde in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x4cce56 (0x7f93c7c8fe56 in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x3ee77 (0x7f9388731e77 in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #6: c10::TensorImpl::~TensorImpl() + 0x1be (0x7f938872a69e in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f938872a7b9 in /root/anaconda3/envs/vits2/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #8: <unknown function> + 0x751fb8 (0x7f93c7f14fb8 in /roo

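This article describes a CUDA illegal memory access error encountered while training the Bert-VITS2 model. The problem was resolved by upgrading the torch version and adjusting the code, and the article also shows how Conda is used to manage the project environment.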
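The error message itself suggests passing CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported at the call that actually triggered it. A minimal sketch of one way to do that from Python is shown below. The helper names `blocking_env` and `run_with_blocking` are hypothetical, and the training script name in the usage comment is an assumption; substitute the real entry point of your project.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this variable set, CUDA kernel launches run synchronously, so the
    # stack trace points at the operation that really failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking(cmd):
    """Run cmd (a list of arguments) with CUDA_LAUNCH_BLOCKING=1; return its exit code."""
    return subprocess.run(cmd, env=blocking_env()).returncode


# Hypothetical usage (script name is an assumption, not taken from the article):
# exit_code = run_with_blocking([sys.executable, "train_ms.py"])
```

Synchronous launches slow training down noticeably, so this setting is best used only while reproducing the crash. After upgrading torch, `python -c "import torch; print(torch.__version__, torch.version.cuda)"` confirms which build is active in the Conda environment.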



