Loss and Its Gradient
Typical Loss
The examples below work through mean squared error; the last section looks at the gradient of softmax, whose output typically feeds a cross-entropy loss.
Mean Squared Error (MSE)
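For reference, with predictions f_\theta(x_i) and targets y_i (\theta standing for the trainable parameters, e.g. w and b), the loss is

loss = \sum_i \bigl(y_i - f_\theta(x_i)\bigr)^2

PyTorch's F.mse_loss averages rather than sums by default (reduction='mean'), which only rescales the loss by a constant; with the single-element tensors used below the two coincide.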
Derivative
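Differentiating with respect to a parameter \theta gives

\frac{\partial\, loss}{\partial \theta} = 2 \sum_i \bigl(f_\theta(x_i) - y_i\bigr)\, \frac{\partial f_\theta(x_i)}{\partial \theta}

In the sessions below f_\theta(x) = w x with x = 1, w = 2 and target 1, so \partial loss / \partial w = 2 \cdot (2 - 1) \cdot 1 = 2, which is exactly the tensor([2.]) that autograd reports.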
autograd.grad
torch.autograd.grad(loss, [w1, w2, ...]) returns the gradients of a scalar loss with respect to the listed tensors. The session below also shows a common pitfall: the computation graph is recorded during the forward pass, so calling w.requires_grad_() after mse has already been computed does not help; grad only succeeds once the loss is recomputed.
>>> import torch
>>> import torch.nn.functional as F
>>> x=torch.ones(1)
>>> w=torch.full([1],2.)
>>> mse=F.mse_loss(x*w,torch.ones(1))
>>> mse
tensor(1.)
>>> torch.autograd.grad(mse,[w])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\torch\autograd\__init__.py", line 226, in grad
return Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>> w.requires_grad_()
tensor([2.], requires_grad=True)
>>> torch.autograd.grad(mse,[w])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\torch\autograd\__init__.py", line 226, in grad
return Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>> mse=F.mse_loss(x*w,torch.ones(1))
>>> torch.autograd.grad(mse,[w])
(tensor([2.]),)
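A minimal sketch of the cleaner pattern (variable names are mine, not from the session above): declare requires_grad when the tensor is created, so the graph already tracks w the first time the loss is computed and no recomputation is needed.

import torch
import torch.nn.functional as F

x = torch.ones(1)
w = torch.full([1], 2., requires_grad=True)   # tracked from the start

mse = F.mse_loss(x * w, torch.ones(1))        # graph is recorded here
grads = torch.autograd.grad(mse, [w])         # succeeds on the first try
print(grads)                                  # (tensor([2.]),)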
loss.backward
Calling loss.backward() walks the same graph but, instead of returning the gradients, accumulates them into each leaf tensor's .grad attribute (here w.grad).
>>> x=torch.ones(1)
>>> w=torch.full([1],2.)
>>> w.requires_grad_()
tensor([2.], requires_grad=True)
>>> mse=F.mse_loss(x*w,torch.ones(1))
>>> mse.backward()
>>> w.grad
tensor([2.])
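One practical consequence of backward(), shown in a short sketch (again a sketch, not part of the session above): gradients are accumulated into .grad across calls, so they must be zeroed between iterations, e.g. with tensor.grad.zero_() or optimizer.zero_grad().

import torch
import torch.nn.functional as F

x = torch.ones(1)
w = torch.full([1], 2., requires_grad=True)

for step in range(2):
    mse = F.mse_loss(x * w, torch.ones(1))
    mse.backward()
    print(w.grad)    # tensor([2.]) after the first pass, tensor([4.]) after the second

w.grad.zero_()       # reset before the next update
print(w.grad)        # tensor([0.])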
Softmax
Softmax turns a score vector into a probability distribution, so every output p[i] depends on every input a[j]. backward() can only be called implicitly on a scalar output, which is why p.backward() fails below; instead, a single component such as p[1] is differentiated, with retain_graph=True so the graph survives for a second call on p[2].
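For an input vector a, softmax and its Jacobian are

p_i = \frac{e^{a_i}}{\sum_k e^{a_k}}, \qquad \frac{\partial p_i}{\partial a_j} = p_i\,(\delta_{ij} - p_j)

so the diagonal entries (i = j) are positive and the off-diagonal ones negative, which matches the sign pattern of the gradients printed below.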
>>> a=torch.rand(3)
>>> a.requires_grad_()
tensor([0.7876, 0.1037, 0.9175], requires_grad=True)
>>> p=F.softmax(a,dim=0)
>>> p.backward()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Python\lib\site-packages\torch\_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "D:\Python\lib\site-packages\torch\autograd\__init__.py", line 143, in backward
grad_tensors_ = _make_grads(tensors, grad_tensors_)
File "D:\Python\lib\site-packages\torch\autograd\__init__.py", line 50, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
>>> p=F.softmax(a,dim=0)
>>> torch.autograd.grad(p[1],[a],retain_graph=True)
(tensor([-0.0722, 0.1545, -0.0822]),)
>>> torch.autograd.grad(p[2],[a])
(tensor([-0.1630, -0.0822, 0.2452]),)
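As a sanity check (a sketch with my own variable names, not part of the original session), the closed-form Jacobian p_i(\delta_{ij} - p_j) can be compared against the one autograd computes:

import torch
import torch.nn.functional as F

a = torch.rand(3, requires_grad=True)
p = F.softmax(a, dim=0).detach()

# Row i holds dp[i]/da, computed by autograd.
jac = torch.autograd.functional.jacobian(lambda t: F.softmax(t, dim=0), a)

# Closed form: diag(p) - p p^T, i.e. p_i * (delta_ij - p_j).
closed = torch.diag(p) - torch.outer(p, p)

print(torch.allclose(jac, closed, atol=1e-6))   # True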