Server: Ubuntu 18
Environment: Python 3.6, PyTorch
Error message:
/opt/conda/conda-bld/pytorch_1591914838379/work/aten/src/ATen/native/cuda/Loss.cu:97: operator(): block: [0,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
  File "trainning.py", line 126, in <module>
    loss, outputs = model(imgs, targets)
  File "/.conda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/models.py", line 262, in forward
    x, layer_loss = module[0](x, targets, img_dim)
  File "/.conda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/models.py", line 200, in forward
    loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
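The assertion at Loss.cu:97 is a per-element range check on the input to binary cross-entropy. The plain-Python sketch below mirrors that check and shows why a nan prediction trips it just as an out-of-range value does. The helper names `bce_input_ok` and `bce` are hypothetical illustrations, not PyTorch APIs. The eps clamp is one common way to avoid log(0) for valid inputs.

```python
import math


def bce_input_ok(p: float) -> bool:
    """Mirror of the per-element condition in Loss.cu: input_val >= 0 && input_val <= 1."""
    return p >= 0.0 and p <= 1.0


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one element (hypothetical helper, not a PyTorch API)."""
    if not bce_input_ok(p):
        # The CUDA kernel asserts here; this sketch raises ValueError instead
        raise ValueError(f"BCE input must lie in [0, 1], got {p}")
    # Clamp a valid probability away from exactly 0 and 1 so log() stays finite
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# Every comparison with nan is False, so a nan prediction fails the range check
print(bce_input_ok(0.3))       # True
print(bce_input_ok(1.2))       # False
print(bce_input_ok(math.nan))  # False
```

Clamping does not rescue a nan input: the prediction has to be kept finite upstream, which is why the nan loss and this assertion point to the same underlying problem.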
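While training a deep-learning model with Python 3.6 and PyTorch on Ubuntu 18, the run failed with the CUDA error 'Loss.cu:97' / 'cudaErrorAssert: device-side assert triggered'. The error is raised at line 97 of aten/src/ATen/native/cuda/Loss.cu.

The failing assertion, input_val >= zero && input_val <= one, checks that every input to binary cross-entropy lies in [0, 1]. Here that input is pred_conf[noobj_mask], passed to self.bce_loss. A nan prediction also fails the check, because every comparison with nan evaluates to false. This ties the crash to the fact that the loss had at times become nan during training.

The attempted fixes were:
- lowering the learning rate;
- changing how optimizer.step() and optimizer.zero_grad() are called in each iteration.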
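Below is a minimal PyTorch sketch of one training iteration with the usual order: zero_grad, forward, backward, step. It uses a lowered learning rate and adds a guard that stops before a nan loss reaches the weights.

Everything here is an assumption for illustration:
- The toy `nn.Linear` model, the `train_step` helper and the learning-rate and max_norm values are hypothetical.
- It swaps in `nn.BCEWithLogitsLoss`, which applies the sigmoid internally in a numerically stable way, in place of sigmoid plus `nn.BCELoss`. It is not the original post's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model standing in for the detector; it outputs raw logits (no sigmoid)
model = nn.Linear(4, 1)
# BCEWithLogitsLoss takes logits, so its input does not have to lie in [0, 1] like nn.BCELoss's
criterion = nn.BCEWithLogitsLoss()
# Lowered learning rate; the value is illustrative only
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)


def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()            # clear gradients left over from the previous iteration
    logits = model(x).squeeze(1)
    loss = criterion(logits, y)
    if not torch.isfinite(loss):     # stop before a nan/inf loss corrupts the weights
        raise FloatingPointError(f"non-finite loss: {loss.item()}")
    loss.backward()
    # Optional: cap the gradient norm so one bad batch cannot blow up the weights
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()                 # update weights with the freshly computed gradients
    return loss.item()


x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,)).float()
print(train_step(x, y))
```

Raising on a non-finite loss surfaces the first bad batch on the Python side. Without the guard, the nan would propagate into the weights and only show up later as a device-side assert.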
