Epoch 1/100: 0%| | 0/1763 [00:00<?, ?it/s<class 'dict'>]D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\autograd\__init__.py:200: UserWarning: Error detected in ConvolutionBackward0. Traceback of forward call that caused the error:
  File "D:\dyh\ultralytics\unet-pytorch-main\train_mda.py", line 500, in <module>
    fit_one_epoch(model_train, model, loss_history, eval_callback, optimizer, epoch,
  File "D:\dyh\ultralytics\unet-pytorch-main\utils\utils_fit.py", line 39, in fit_one_epoch
    outputs = model_train(imgs)
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\dyh\ultralytics\unet-pytorch-main\nets\unet_mda.py", line 113, in forward
    up3 = self.mda_256_72(up3)
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\dyh\ultralytics\unet-pytorch-main\nets\MultiDilatelocalAttention_block.py", line 52, in forward
    qkv = self.qkv(x).reshape(B, 3, self.num_dilation, C//self.num_dilation, H, W).permute(2, 1, 0, 3, 4, 5)
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
 (Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\autograd\python_anomaly_mode.cpp:119.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "D:\dyh\ultralytics\unet-pytorch-main\train_mda.py", line 500, in <module>
    fit_one_epoch(model_train, model, loss_history, eval_callback, optimizer, epoch,
  File "D:\dyh\ultralytics\unet-pytorch-main\utils\utils_fit.py", line 58, in fit_one_epoch
    loss.backward()
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "D:\dyh\yolov8_back\yolov8\lib\site-packages\torch\autograd\__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [6, 72, 64, 64]], which is output 0 of AsStridedBackward0, is at version 3; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Epoch 1/100: 0%| | 0/1763 [00:27<?, ?it/s<class 'dict'>]
The specific problem: a variable needed for gradient computation was modified by an in-place operation before its gradient could be computed.
Following the hint in the error message, I set torch.autograd.set_detect_anomaly(True) at the very start of the training script.
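For reference, enabling anomaly detection is a single call placed at the top of the training script (a minimal sketch; the rest of train_mda.py is omitted here):

import torch

# Record the forward-pass call stack of every autograd op, so the RuntimeError raised
# in loss.backward() comes with the forward traceback shown above.
torch.autograd.set_detect_anomaly(True)

Anomaly detection adds noticeable overhead, so it is best switched off again once the offending line has been located.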
The anomaly trace finally pointed to a convolution, a perfectly ordinary Conv2d: self.qkv = nn.Conv2d(dim, dim * 3, 1, bias=qkv_bias).
According to GPT's analysis, the variable needs to be cloned so that it cannot be modified.
Below is the modified code that runs successfully: a clone of x, stored in a new variable xa, is used for the subsequent qkv operations. GPT's suggested x = x.clone() still raised the same error; simply assigning the clone to a different variable name made it run.
xa = x.clone()  # run the qkv conv on a clone so the original x is left alone
qkv_temp1 = self.qkv(xa)
qkv_temp1 = qkv_temp1.reshape(B, 3, self.num_dilation, C//self.num_dilation, H, W)
qkv_temp2 = qkv_temp1.permute(2, 1, 0, 3, 4, 5)
# num_dilation, 3, B, C//num_dilation, H, W
x = x.reshape(B, self.num_dilation, C//self.num_dilation, H, W).permute(1, 0, 3, 4, 2)
# num_dilation, B, H, W, C//num_dilation
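A natural worry when inserting a clone() into forward is whether it breaks gradient flow. It does not: clone() is differentiable, so gradients still propagate from xa back to x. A tiny standalone check (illustrative only, not project code):

import torch

x = torch.randn(3, requires_grad=True)
xa = x.clone()                  # clone() is recorded in the autograd graph
(xa * 2).sum().backward()
print(x.grad)                   # tensor([2., 2., 2.]) -- gradients flow through the clone back to x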
Postscript
After making this change, I noticed there is another line of code right below it: x = x.reshape(B, self.num_dilation, C//self.num_dilation, H, W).permute(1, 0, 3, 4, 2)
Presumably the earlier qkv operation changed the value of x, so the gradient for this line could no longer be computed. .reshape and .permute both appear to return a new tensor rather than modifying the original feature map in place, which brings us back to the hint in the error message: the problem lies with the convolution.
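To make the mechanism concrete, here is a small self-contained sketch that reproduces the same RuntimeError with an ordinary Conv2d and shows why feeding the conv a clone avoids it. The shapes and names are made up, and the in-place write is made explicit; in the real model the in-place modification happens somewhere later in the forward pass.

import torch
import torch.nn as nn

conv = nn.Conv2d(4, 12, 1)

feat = torch.randn(2, 4, 8, 8, requires_grad=True)
x = feat * 1.0                                        # a non-leaf feature map, as inside a network

qkv = conv(x)                                         # conv2d saves x for ConvolutionBackward0
v = x.reshape(2, 2, 2, 8, 8).permute(1, 0, 3, 4, 2)   # reshape/permute return views sharing x's storage
v[0] = 0.0                                            # in-place write -> also modifies x, bumping its version
qkv.sum().backward()                                  # RuntimeError: ... modified by an inplace operation

# Replacing the conv call with qkv = conv(x.clone()) keeps the saved input in separate
# storage, so the later in-place write no longer invalidates it and backward() succeeds.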