Gradients not updating in a PyTorch BP network
I've recently been learning the PyTorch framework and wrote a small BP (backpropagation) network as an exercise. After the backward pass, grad_fn is None for every parameter. Everyone I could find online with this problem had a device mismatch, but I'm only using the CPU, no GPU at all. Could someone please take a look and tell me where I went wrong?
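To make the symptom concrete, here is a minimal sketch on the CPU (the tensor w is just a hypothetical stand-in for my parameters, not part of the code below):

import torch

w = torch.rand(3, 3, requires_grad=True)   # a leaf tensor, like my w1..w4
loss = (w @ torch.ones(3)).sum()
loss.backward()
print(w.grad_fn)   # None -- this is what I see on every parameter
print(w.grad)      # yet the gradient tensor itself is filled in here

My full code follows.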
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
def load_mnist():
    path = r'./dataset/mnist.npz'  # directory where mnist.npz is placed; mind the slashes
    f = np.load(path)
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()
    return (x_train, y_train), (x_test, y_test)
(x_train, y_train), (x_test, y_test) = load_mnist()
x_train = torch.from_numpy(x_train).float()
x_train = x_train.reshape(-1, 784)
y_train = torch.from_numpy(y_train).float()
x_test = torch.from_numpy(x_test).float()
x_test = x_test.reshape(-1, 784)
y_test = torch.from_numpy(y_test).float()
# one-hot encode the training labels
y_train_onehot = torch.zeros(y_train.shape[0], 10)
for i in range(y_train.shape[0]):
    y_train_onehot[i][int(y_train[i])] = 1
input_layer = 784
hidden_layer1 = 256
hidden_layer2 = 128
hidden_layer3 = 64
output_layer = 10
batch_size = 25
epoch = 10
# build the network: three hidden layers plus an output layer
w1 = torch.rand(hidden_layer1, input_layer, requires_grad=True)
b1 = torch.rand(hidden_layer1, requires_grad=True)
w2 = torch.rand(hidden_layer2, hidden_layer1, requires_grad=True)
b2 = torch.rand(hidden_layer2, requires_grad=True)
w3 = torch.rand(hidden_layer3, hidden_layer2, requires_grad=True)
b3 = torch.rand(hidden_layer3, requires_grad=True)
w4 = torch.rand(output_layer, hidden_layer3, requires_grad=True)
b4 = torch.rand(output_layer, requires_grad=True)
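# note: w1..w4 and b1..b4 are leaf tensors (created directly with
# requires_grad=True), so autograd leaves their .grad_fn as None by design
# and writes their gradients to .grad instead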
loss_func = nn.MSELoss()
optimizer = torch.optim.SGD([w1,w2,w3,w4,b1,b2,b3,b4], lr = 0.005)
def forward(input_x):
    x = input_x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)
    x = x @ w4.t() + b4
    x = F.softmax(x, dim=1)
    return x
for j in range(epoch):
    for i in range(60000 // batch_size):
        y_pred = forward(x_train[i * batch_size:(i + 1) * batch_size])
        loss = loss_func(y_pred, y_train_onehot[i * batch_size:(i + 1) * batch_size])
        # print(loss)
        optimizer.zero_grad()
        loss.backward()
        # print(w4.grad_fn)
        optimizer.step()
        # print(w1[0][0])
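And this is the check I run right after a backward pass to see whether gradients actually reach the parameters (a sketch reusing the names defined above, nothing beyond one extra forward/backward pass):

# one extra forward/backward pass purely for inspection
y_pred = forward(x_train[:batch_size])
loss = loss_func(y_pred, y_train_onehot[:batch_size])
optimizer.zero_grad()
loss.backward()
print(w1.grad is None)        # False: a gradient tensor was written to w1.grad
print(w1.grad.abs().mean())   # its average magnitude
print(loss.grad_fn)           # the loss, a non-leaf tensor, does have a grad_fn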