[Beginner pitfall] tensorboardX -- Did you forget to call .eval() on your model?

This post walks through a "nondeterministic nodes" TracerWarning raised while recording a neural network's training with tensorboardX's SummaryWriter, points out that it stems from forgetting to call .eval() on the model, and stresses that add_graph should be called after training finishes rather than inside every epoch iteration.


I hit an error while using tensorboardX's SummaryWriter to save the network graph during training.

Error message

TracerWarning: Trace had nondeterministic nodes. Did you forget call .eval() on your model? Nodes:
	%x.3 : Float(16, 16, 500, 1, strides=[8000, 500, 1, 1], requires_grad=1, device=cpu) = aten::dropout(%263, %88, %89) # C:\Users\lenovo\.conda\envs\EEGnet\lib\site-packages\torch\nn\functional.py:1266:0
	%input.17 : Float(16, 4, 16, 500, strides=[32000, 1, 2000, 4], requires_grad=1, device=cpu) = aten::dropout(%266, %133, %134) # C:\Users\lenovo\.conda\envs\EEGnet\lib\site-packages\torch\nn\functional.py:1266:0
	%input.29 : Float(16, 4, 4, 125, strides=[2000, 1, 500, 4], requires_grad=1, device=cpu) = aten::dropout(%270, %186, %187) # C:\Users\lenovo\.conda\envs\EEGnet\lib\site-packages\torch\nn\functional.py:1266:0
This may cause errors in trace checking. To disable trace checking, pass check_trace=False to torch.jit.trace()
  _check_trace(
C:\Users\lenovo\.conda\envs\EEGnet\lib\site-packages\torch\jit\_trace.py:1093: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Tensor-likes are not close!

Mismatched elements: 16 / 16 (100.0%)
Greatest absolute difference: 0.05825650691986084 at index (1, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.1222612684738262 at index (15, 0) (up to 1e-05 allowed)
  _check_trace(

The key line to extract:

 Trace had nondeterministic nodes. Did you forget call .eval() on your model? 
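What the warning actually means: add_graph records the graph via torch.jit.trace, whose trace checker runs the forward pass again and compares outputs (the "Tensor-likes are not close!" report above is that comparison failing). With the model still in train mode, every Dropout layer draws a fresh random mask on each call, so two runs on the same input disagree. A minimal standalone sketch of the effect, on a toy model rather than the post's EEGNet:

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(2, 4)

net.train()                           # dropout active: a new random mask per call
print(torch.equal(net(x), net(x)))    # almost certainly False

net.eval()                            # dropout becomes the identity
print(torch.equal(net(x), net(x)))    # True: the forward pass is deterministic

In eval mode the trace checker sees identical outputs, so the warning goes away.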

Solution

  1. The offending line was traced to:
writer.add_graph(net, (inputs,))
  2. The full code:
# Assumes X_train, y_train, X_val, y_val, X_test, y_test, batch_size,
# net, optimizer, criterion, and evaluate() are defined elsewhere in the project.
import numpy as np
import torch
from torch.autograd import Variable
from tensorboardX import SummaryWriter

writer = SummaryWriter('./Result')
# training loop
for epoch in range(200):
    print("\nEpoch ", epoch)

    running_loss = 0.0
    for i in range(len(X_train) // batch_size - 1):
        s = i * batch_size
        e = i * batch_size + batch_size

        inputs = torch.from_numpy(X_train[s:e])
        labels = torch.FloatTensor(np.array([y_train[s:e]]).T * 1.0)

        # wrap them in Variable (a no-op on modern PyTorch; plain tensors work)
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()

        optimizer.step()

        running_loss += loss.item()
        # if i == 1:
        #     with SummaryWriter(comment='Net') as w:
        #         w.add_graph(net, (inputs,))

    # validation
    params = ["acc", "auc", "fmeasure"]
    print(params)
    print("Training Loss ", running_loss)
    train_acc, train_auc, train_fmeasure = evaluate(net, X_train, y_train, params)
    print("Train - ", train_acc, train_auc, train_fmeasure)
    # the original evaluated val/test on X_train/y_train, evidently a copy-paste slip
    val_acc, val_auc, val_fmeasure = evaluate(net, X_val, y_val, params)
    print("Validation - ", val_acc, val_auc, val_fmeasure)
    test_acc, test_auc, test_fmeasure = evaluate(net, X_test, y_test, params)
    print("Test - ", test_acc, test_auc, test_fmeasure)
    # net.eval()
    writer.add_graph(net, (inputs,))  # <-- the offending call: traced every epoch, in train mode
    tags = ["data/accuracy", "data/learning_rate"]  # plot tags (the original listed "data/train_val_acc" twice)
    writer.add_scalars(tags[0], {'trainACC': train_acc, 'valACC': val_acc, 'testACC': test_acc}, epoch)  # train/val/test accuracy on one chart
    writer.add_scalar(tags[1], optimizer.param_groups[0]["lr"], epoch)  # learning-rate curve
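(The evaluate helper above is the author's own and isn't shown in the post. For completeness, a plausible sketch of what it could look like, assuming binary labels and scikit-learn metrics; everything beyond the call signature is an assumption:)

import numpy as np
import torch
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

def evaluate(model, X, y, params):
    # hypothetical reconstruction: the real helper may differ
    model.eval()                              # metrics should use inference mode
    with torch.no_grad():
        probs = model(torch.from_numpy(X)).numpy().ravel()
    preds = (probs >= 0.5).astype(int)
    results = []
    for p in params:
        if p == "acc":
            results.append(accuracy_score(y, preds))
        elif p == "auc":
            results.append(roc_auc_score(y, probs))
        elif p == "fmeasure":
            results.append(f1_score(y, preds))
    model.train()                             # restore training mode
    return results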
  3. Analysis: add_graph(net, (inputs,)) should be called once after training completes, not inside the epoch loop; and since the warning points at the dropout nodes, the model should be put into inference mode with net.eval() before tracing, as shown in the sketch below.
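Concretely, the fix is to strip writer.add_graph out of the epoch body and trace once at the end, with the model switched to inference mode first. A minimal sketch, reusing net, writer, and inputs from the loop above:

# ... training loop as above, with writer.add_graph removed from the epoch body ...

net.eval()                          # disable dropout so the trace is deterministic
writer.add_graph(net, (inputs,))    # record the graph once, after training
writer.close()

Calling net.eval() first is exactly what the warning suggests; if you keep training afterwards, remember to switch back with net.train().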