Traceback (most recent call last):
  File "train.py", line 450, in <module>
    fit_one_epoch(model, train_util, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir)
  File "/home/master1/Ernie/two-stage copy/utils/utils_fit.py", line 27, in fit_one_epoch
    rpn_loc, rpn_cls, roi_loc, roi_cls, total = train_util.train_step(images, boxes, labels, 1, fp16, scaler)
  File "/home/master1/Ernie/two-stage copy/nets/frcnn_training.py", line 321, in train_step
    losses = self.forward(imgs, bboxes, labels, scale)
  File "/home/master1/Ernie/two-stage copy/nets/frcnn_training.py", line 243, in forward
    base_feature = self.model_train(imgs, mode = 'extractor')
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
TypeError: Caught TypeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/master1/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'
The key line is: TypeError: Caught TypeError in replica 1 on device 1.
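This mainly happens because multiple GPUs are in use: at some stage of the forward pass, a model replica and its data are not on the same card. The wrapped error reported right after it shows the symptom:
TypeError: forward() missing 1 required positional argument: 'x'
In other words, replica 1 was called with no input data. Since DataParallel splits the inputs along the batch dimension, a likely trigger is a batch with fewer samples than GPUs (for example, a final batch of one image on two GPUs), while the keyword argument mode = 'extractor' is still copied to every replica.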
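
The mechanism can be illustrated without any GPU. The following is a minimal sketch in plain Python, not PyTorch's actual code: simulate_scatter, run and the stand-in forward function are hypothetical names introduced here, and the splitting rule is a simplified assumption about how the inputs and keyword arguments are distributed across devices.

```python
import math


def simulate_scatter(batch, kwargs, num_devices):
    """Simplified, hypothetical model of splitting one call across devices."""
    # Positional inputs are chunked along the batch dimension, so a short
    # batch produces fewer chunks than there are devices.
    chunk = max(1, math.ceil(len(batch) / num_devices))
    inputs = [(batch[i:i + chunk],) for i in range(0, len(batch), chunk)]
    # Non-tensor keyword arguments are copied to every device.
    per_device_kwargs = [dict(kwargs) for _ in range(num_devices)]
    # The shorter list is padded with empty tuples so both lists line up.
    inputs += [()] * (len(per_device_kwargs) - len(inputs))
    return inputs, per_device_kwargs


def forward(x, mode="forward"):
    """Stand-in for the model's forward(x, mode); x is required."""
    return f"{mode}: {len(x)} sample(s)"


def run(batch, num_devices):
    """Call forward once per simulated replica and collect results or errors."""
    inputs, per_device_kwargs = simulate_scatter(
        batch, {"mode": "extractor"}, num_devices
    )
    results = []
    for device, (args, kw) in enumerate(zip(inputs, per_device_kwargs)):
        try:
            results.append(forward(*args, **kw))
        except TypeError as err:
            results.append(f"replica {device}: {err}")
    return results


print(run(["img0", "img1"], num_devices=2))  # every replica receives data
print(run(["img0"], num_devices=2))          # replica 1 receives only kwargs
```

With two samples on two devices, both replicas get an input. With a single sample, replica 1 is called with only mode='extractor' and fails in the same way as in the traceback. If this is indeed the cause, keeping every batch at least as large as the number of GPUs should avoid it, for instance by dropping the incomplete final batch with drop_last=True on torch.utils.data.DataLoader.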