多显卡导致的报错

Traceback (most recent call last):
  File "train.py", line 450, in <module>
    fit_one_epoch(model, train_util, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir)
  File "/home/master1/Ernie/two-stage copy/utils/utils_fit.py", line 27, in fit_one_epoch
    rpn_loc, rpn_cls, roi_loc, roi_cls, total = train_util.train_step(images, boxes, labels, 1, fp16, scaler)
  File "/home/master1/Ernie/two-stage copy/nets/frcnn_training.py", line 321, in train_step
    losses = self.forward(imgs, bboxes, labels, scale)
  File "/home/master1/Ernie/two-stage copy/nets/frcnn_training.py", line 243, in forward
    base_feature = self.model_train(imgs, mode = 'extractor')
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
TypeError: Caught TypeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'

重点是:TypeError: Caught TypeError in replica 1 on device 1.

这一句。

这个问题主要是因为使用了多个显卡,导致在某阶段,模型和数据不在同一个卡,后面同时报了:

TypeError: forward() missing 1 required positional argument: 'x'

没有数据的错误

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值