Cost Suddenly Becomes NaN During Deep Learning Training

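While training a deep learning model, the author found that the cost suddenly became NaN. Checking the data, lowering the learning rate, inspecting the gradients, and trying gradient clipping (Clip Gradient) did not make the problem go away. Running with NanGuardMode suggested that overly large gradients might be involved. In the end, the root cause turned out to be a logic error: the mask was not handled correctly in the softmax, which led to a division by zero. The fix was to first rescale the input so that the energy values cannot grow too large, and then apply the mask inside the softmax computation so that zero probabilities, and the division by zero they cause, no longer occur. This eliminated the NaN.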
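The fix described above can be sketched in NumPy. This is a minimal illustration, not the author's Theano code: the function names are hypothetical, and "rescaling" is assumed to mean the usual trick of subtracting the row maximum (taken over unmasked positions) before `exp`. The naive version shows two ways a masked softmax can produce NaN: an overflowed `exp` multiplied by a zero mask (`inf * 0`), and a fully masked row (`0 / 0`).

```python
import numpy as np


def naive_masked_softmax(x, mask):
    """Exponentiate raw energies, apply the mask, then normalize."""
    e = np.exp(x) * mask                      # large x overflows to inf; inf * 0 -> nan
    return e / e.sum(axis=-1, keepdims=True)  # fully masked row: 0 / 0 -> nan


def stable_masked_softmax(x, mask, eps=1e-8):
    """Shift by the max over unmasked positions, then mask and normalize."""
    valid = mask > 0
    # Row max over unmasked positions only; fully masked rows get a shift of 0.
    row_max = np.where(valid, x, -np.inf).max(axis=-1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    # Unmasked exponents are now <= 0, so exp() cannot overflow.
    # Masked positions are replaced by 0 before exp() and zeroed out afterwards.
    e = np.exp(np.where(valid, x - row_max, 0.0)) * mask
    # eps (an assumption) keeps the denominator away from 0 for fully masked rows.
    return e / (e.sum(axis=-1, keepdims=True) + eps)


x = np.array([[1000.0, 1.0, 2.0],   # huge energy at a padded position
              [3.0, 4.0, 5.0]])     # this row is entirely padded
mask = np.array([[0.0, 1.0, 1.0],
                 [0.0, 0.0, 0.0]])

with np.errstate(over="ignore", invalid="ignore"):
    print(np.isnan(naive_masked_softmax(x, mask)).any())  # True
print(stable_masked_softmax(x, mask))  # row 0 about [0, 0.269, 0.731], row 1 all zeros
```

Adding `eps` to the denominator is only one assumed way to avoid the zero denominator; another is to guarantee that every row keeps at least one unmasked position.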

Problem: during training, the cost was normal for the earlier batches, then suddenly became NaN on one particular batch.
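To pin down which batch first goes wrong, each batch's values can be checked the way NanGuardMode does, looking for NaN, inf, and very large magnitudes. The sketch below is a NumPy stand-in, not Theano's API: `nan_guard` and `find_first_bad_batch` are hypothetical helpers, the 1e10 "big value" threshold is an assumption, and the batch values are made up for illustration.

```python
import numpy as np


def nan_guard(name, value, big_threshold=1e10):
    """Raise if value holds NaN, inf, or magnitudes above big_threshold.

    Loosely mirrors NanGuardMode's nan_is_error / inf_is_error / big_is_error
    checks; the threshold is an assumption.
    """
    value = np.asarray(value, dtype=np.float64)
    if np.isnan(value).any():
        raise FloatingPointError(f"{name}: contains NaN")
    if np.isinf(value).any():
        raise FloatingPointError(f"{name}: contains inf")
    if value.size and np.abs(value).max() > big_threshold:
        raise FloatingPointError(f"{name}: value above {big_threshold:g}")


def find_first_bad_batch(batches):
    """Return (batch index, message) for the first batch that fails nan_guard."""
    for i, arrays in enumerate(batches):
        try:
            for name, value in arrays.items():
                nan_guard(name, value)
        except FloatingPointError as err:
            return i, str(err)
    return None


# Made-up per-batch values: the gradient explodes one batch before the cost is NaN.
history = [
    {"cost": 0.71, "grad": np.array([0.1, -0.2])},
    {"cost": 0.69, "grad": np.array([3e12, 0.5])},
    {"cost": float("nan"), "grad": np.array([np.nan, 0.0])},
]
print(find_first_bad_batch(history))  # (1, 'grad: value above 1e+10')
```

In this made-up history the oversized gradient is flagged at batch 1, one batch before the cost itself turns NaN, so checking gradients as well as the cost can point toward the cause earlier.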

Material found by searching online:

1. How to avoid that Theano computing gradient going toward NaN https://stackoverflow.com/questions/40405334/how-to-avoid-that-theano-computing-gradient-going-toward-nan

2. What causes NaN when training a deep learning network, and how can it be avoided? https://www.zhihu.com/question/49346370

3. Theano debugging tips https://zhuanlan.zhihu.com/p/24857032
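The summary notes that gradient clipping did not make the NaN go away. A small NumPy sketch of clipping by global norm suggests one plausible reason: clipping rescales finite values, but a single NaN makes the norm NaN, the `total > max_norm` comparison is then False, and the NaN passes through untouched. `clip_by_global_norm` is a hypothetical helper, not a Theano function.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient arrays so their joint L2 norm is at most max_norm."""
    total = float(np.sqrt(sum(float(np.sum(g * g)) for g in grads)))
    # If any entry is NaN, total is NaN and this comparison is False,
    # so NaN gradients are returned unchanged.
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads, total


grads = [np.array([30.0, 40.0]), np.array([0.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm, clipped[0])  # norm is 50.0; clipped[0] is about [3, 4]

nan_grads = [np.array([np.nan, 1.0])]
still_nan, _ = clip_by_global_norm(nan_grads, max_norm=5.0)
print(np.isnan(still_nan[0]).any())  # True: clipping does not remove NaN
```

Clipping can tame gradients that are large but finite; it cannot repair a NaN produced upstream, such as a 0 / 0 inside a masked softmax.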


Actually, the explanation in reference 1 is quite good:
