[debug] Expected to have finished reduction in the prior iteration before starting a new one.

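This post covers a common PyTorch training problem: a runtime error raised when the forward pass does not use all the outputs of some modules. The error goes away once every defined module's output is used, or the unused modules are removed.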


Problem description

The following error occurs while training the network:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.

Cause analysis

See this issue:

I met the same issue.
But I solved it.
The reason is that in my model class, I define a fpn module with 5 level output feature maps in the init function,
but in forward function I only use 4 of them.
When I use all of them, the problem was solved.
This is my supposed conclusion: you should use all output of each module in forward function.

In short, a module is defined in the network but never used.
The module can be anything from a single layer to a more complex structure.

Solution

Find the module that is defined but not used, then either delete it or use it.
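One possible way to find such a module is to attach forward hooks and see which modules never run. The sketch below is only an illustration: ToyNet, its layer names, and the helper find_uncalled_modules are hypothetical and do not come from the original issue. It reports only modules that are never called at all. A module that is called but whose output is then discarded (as in the FPN case quoted above) will not show up here.

```python
import torch
from torch import nn


class ToyNet(nn.Module):
    """Hypothetical model: `extra` is defined in __init__ but never called."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 16)
        self.head = nn.Linear(16, 2)
        self.extra = nn.Linear(16, 16)  # defined but unused in forward

    def forward(self, x):
        return self.head(torch.relu(self.backbone(x)))


def find_uncalled_modules(model, example_input):
    """Return names of parameter-owning leaf modules that did not run in one forward pass."""
    called = set()
    tracked = []
    handles = []
    for name, module in model.named_modules():
        is_leaf = len(list(module.children())) == 0
        has_params = len(list(module.parameters(recurse=False))) > 0
        if is_leaf and has_params:
            tracked.append(name)
            # Bind `name` as a default argument so each hook records its own module.
            hook = lambda _mod, _inp, _out, name=name: called.add(name)
            handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return sorted(n for n in tracked if n not in called)


uncalled = find_uncalled_modules(ToyNet(), torch.randn(4, 8))
print(uncalled)  # ['extra']
```

The check only covers the code path taken for the given example input, so a model with data-dependent branches may need several representative inputs.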

update 20220716:

If the unused parameters are hard to find, you can locate all of them by checking which parameters have no gradient:

        # A parameter whose grad is still None after backward() did not contribute to the loss.
        for name, param in runner.model.named_parameters():
            if param.grad is None:
                print(name)

Add this code between loss.backward() and optimizer.step().
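To see the gradient check in action without a full training setup, here is a minimal self-contained sketch. ToyFPN and params_without_grad are hypothetical names made up for illustration. The model mimics the situation from the quoted issue (five levels defined, only four used in forward) and runs in a single process on CPU, so the RuntimeError itself, which comes from PyTorch's distributed data-parallel training, is not reproduced. Only the detection step is shown.

```python
import torch
from torch import nn


class ToyFPN(nn.Module):
    """Hypothetical stand-in for the FPN case: 5 levels defined, only 4 used."""

    def __init__(self):
        super().__init__()
        self.levels = nn.ModuleList(
            [nn.Conv2d(3, 4, kernel_size=1) for _ in range(5)]
        )

    def forward(self, x):
        feats = [conv(x) for conv in self.levels]
        # Only the first four feature maps contribute to the result;
        # the fifth is computed and then dropped.
        return sum(f.mean() for f in feats[:4])


def params_without_grad(model):
    """Names of trainable parameters that received no gradient in the last backward()."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


model = ToyFPN()
loss = model(torch.randn(2, 3, 8, 8))
loss.backward()

missing = params_without_grad(model)
print(missing)  # ['levels.4.weight', 'levels.4.bias']
```

The fifth level is called in forward, yet its parameters still get no gradient because its output never reaches the loss. This is why the gradient check catches cases that inspecting the forward code by eye can miss.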
