Li Mu (李沐), Dive into Deep Learning (Multilayer Perceptron): an error when running the code

Article link: https://blog.youkuaiyun.com/m0_68604571/article/details/142741317

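Today I was working through the multilayer perceptron section of Li Mu's Dive into Deep Learning, and running the code raised the following error: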

Traceback (most recent call last):
  File "D:\zmm\pycharm project\pythonProject\study1\gzj1.py", line 28, in <module>
    d2l.train_ch3(net,train_iter,test_iter,loss,num_epochs,updater)
  File "D:\zmm\pycharm project\pythonProject\study1\d2l\torch.py", line 335, in train_ch3
    train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\zmm\pycharm project\pythonProject\study1\d2l\torch.py", line 273, in train_epoch_ch3
    l = loss(y_hat, y)
        ^^^^^^^^^^^^^^
  File "D:\Environment\python\Lib\site-packages\torch\nn\modules\loss.py", line 1183, in __init__
    super().__init__(weight, size_average, reduce, reduction)
  File "D:\Environment\python\Lib\site-packages\torch\nn\modules\loss.py", line 30, in __init__
    super().__init__(size_average, reduce, reduction)
  File "D:\Environment\python\Lib\site-packages\torch\nn\modules\loss.py", line 23, in __init__
    self.reduction: str = _Reduction.legacy_get_string(size_average, reduce)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Environment\python\Lib\site-packages\torch\nn\_reduction.py", line 35, in legacy_get_string
    if size_average and reduce:
       ^^^^^^^^^^^^
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
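
Reading the frames from the bottom up, the call loss(y_hat, y) ends inside __init__ in torch/nn/modules/loss.py, not in forward. That pattern appears when loss is bound to the class nn.CrossEntropyLoss rather than to an instance of it, so the two tensors are taken as the constructor arguments weight and size_average. The failing script itself is not shown here, so this is an inference from the traceback. The sketch below reproduces the same kind of message on made-up tensors; the names loss_class, y_hat and y and their shapes are illustrative only.

```python
import torch
from torch import nn

# Illustrative stand-ins for one batch: 4 samples, 10 classes.
y_hat = torch.randn(4, 10)
y = torch.tensor([1, 0, 3, 9])

# Assumed mistake: the class itself is used as the loss (no parentheses).
loss_class = nn.CrossEntropyLoss
try:
    # This runs __init__(weight=y_hat, size_average=y), not forward().
    loss_class(y_hat, y)
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# An instance, by contrast, runs forward(y_hat, y) and returns the loss.
loss = nn.CrossEntropyLoss(reduction='none')
per_sample = loss(y_hat, y)
print(per_sample.shape)  # one loss value per sample
```

The exception comes from the constructor trying to evaluate the label tensor as a boolean flag, which is why the message talks about the boolean value of a tensor with more than one element.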

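Two things in the code needed attention:

First, the loss was too large. The weights w should not be initialized from the standard normal distribution, whose variance is too large. Drawing them from a normal distribution with mean 0 and standard deviation 0.01 (std=0.01 in the code) brings the loss down. This affects how training behaves rather than the exception itself.

Second, the loss function. The traceback ends inside __init__ in torch/nn/modules/loss.py, which means that loss(y_hat, y) was constructing nn.CrossEntropyLoss instead of calling an instance of it, so y_hat and y were read as the weight and size_average arguments. The loss therefore has to be created with parentheses. In nn.CrossEntropyLoss(), the reduction parameter defaults to "mean", which averages the loss over all samples and returns a single value. With reduction set to "none", the loss of every sample is kept. Because the training curve is plotted from the recorded losses, the per-sample losses are needed, so reduction should be set to "none".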
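
Both points can be checked on synthetic data without downloading Fashion-MNIST. This is only a sketch: the batch X, the labels y, and the helper initial_loss are made up for illustration, and the network is an untrained one-hidden-layer MLP with the same layer sizes as the code in this post. With small weights the logits are close to zero, the softmax is close to uniform, and the initial loss should sit near ln(10), about 2.30, for 10 classes.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
num_inputs, num_hiddens, num_outputs = 784, 256, 10

# Synthetic batch: pixel values in [0, 1) and random labels.
X = torch.rand(64, num_inputs)
y = torch.randint(0, num_outputs, (64,))

def initial_loss(std):
    """Mean cross-entropy of an untrained MLP whose weights have the given std."""
    w1 = torch.randn(num_inputs, num_hiddens) * std
    w2 = torch.randn(num_hiddens, num_outputs) * std
    # Biases start at zero, so they are left out here.
    logits = torch.relu(X @ w1) @ w2
    return nn.CrossEntropyLoss()(logits, y).item()

loss_std_1 = initial_loss(1.0)     # standard normal weights
loss_std_001 = initial_loss(0.01)  # small weights, logits near zero
print(loss_std_1, loss_std_001, math.log(num_outputs))

# Shape of the result under each reduction mode.
logits = torch.randn(64, num_outputs)
mean_loss = nn.CrossEntropyLoss()(logits, y)                         # scalar
per_sample_loss = nn.CrossEntropyLoss(reduction='none')(logits, y)   # shape (64,)
print(mean_loss.shape, per_sample_loss.shape)
```

Averaging the per-sample losses gives the same number as the default "mean" reduction, so reduction="none" loses nothing; it just leaves the choice of sum or mean to the caller.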

The corrected code is below:

import torch
from torch import nn
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
num_inputs, num_outputs, num_hiddens = 784, 10, 256

# Weights are drawn from a normal distribution with mean 0 and std 0.01; biases start at zero
w1 = nn.Parameter(torch.normal(mean=0, std=0.01, size=(num_inputs, num_hiddens), requires_grad=True))
b1 = nn.Parameter(torch.zeros(num_hiddens, requires_grad=True))
w2 = nn.Parameter(torch.normal(mean=0, std=0.01, size=(num_hiddens, num_outputs), requires_grad=True))
b2 = nn.Parameter(torch.zeros(num_outputs, requires_grad=True))

params = [w1, b1, w2, b2]

def relu(x):
    # Element-wise max(x, 0)
    a = torch.zeros_like(x)
    return torch.max(x, a)

def net(x):
    # Flatten each image to a vector of length num_inputs
    x = x.reshape((-1, num_inputs))
    h = relu(x @ w1 + b1)
    return h @ w2 + b2

# Create an instance (note the parentheses) and keep one loss value per sample
loss = nn.CrossEntropyLoss(reduction='none')

num_epochs, lr = 10, 0.1
updater = torch.optim.SGD(params, lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)
d2l.plt.show()
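
To see why the per-sample loss matters, here is a rough sketch of what one training epoch does with it. It is modelled on my understanding of the PyTorch version of d2l.train_epoch_ch3 (backward on the mean, running total from the sum) and is not a copy of that function. The synthetic batches and the nn.Sequential network are stand-ins chosen so that the sketch runs without the d2l package or the dataset.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in for one epoch of data: 3 batches of 32 samples.
batches = [(torch.rand(32, 784), torch.randint(0, 10, (32,))) for _ in range(3)]

# Stand-in network with the same layer sizes as the hand-written MLP.
net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
loss = nn.CrossEntropyLoss(reduction='none')
updater = torch.optim.SGD(net.parameters(), lr=0.1)

total_loss, num_samples = 0.0, 0
for X, y in batches:
    l = loss(net(X), y)            # shape (32,): one loss per sample
    updater.zero_grad()
    l.mean().backward()            # backward() needs a scalar, so average first
    updater.step()
    total_loss += float(l.sum())   # per-sample losses summed into the running total
    num_samples += y.numel()

avg_loss = total_loss / num_samples  # average loss per sample over the epoch
print(avg_loss)
```

Under this accumulation scheme, a loss created with the default "mean" reduction would make l.sum() equal to the batch mean, and dividing by the sample count again would shrink the reported loss by roughly the batch size.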

Running the corrected code above gives the result shown in the figure: