On the official PyTorch tutorial "What is torch.nn really?" (Part 2)

Last modified 2022-12-20 11:46:27

License: CC 4.0 BY-SA

Tags: #pytorch #deep-learning #python

First published 2022-12-17 13:06:31

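The original tutorial is here: What is torch.nn really?
I had not planned to split this write-up into parts, but it grew long as I wrote, so I split it.
Next, we replace the hand-written neural network from the previous part, step by step, with the functions and classes in torch.nn.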

Using `torch.nn.functional`

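Using PyTorch's nn makes our code shorter, easier to understand, and more flexible.
The first step is to replace our hand-written activation and loss functions with the ones provided by torch.nn.functional.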

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

torch.nn.functional is usually imported as F; this is only a convention.

For comparison, here are the earlier definitions:

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)
    
def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll

def model(xb):
    return log_softmax(xb @ weights + bias)

The differences are:

F.cross_entropy is now the loss function, so the definitions of log_softmax and nll are removed.
model no longer calls log_softmax.

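As mentioned in the previous part, in PyTorch cross_entropy is equivalent to log_softmax followed by nll; see the documentation of F.cross_entropy for details.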

The way it is called, however, has not changed:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

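The result is also the same. Note that no training has happened yet at this point.
This lets us check the relationship between nll, log_softmax and cross_entropy:
the new code effectively computes cross_entropy(xb @ weights + bias, yb),
while the original code computes nll(log_softmax(xb @ weights + bias), yb).
Since both print the same loss, the two formulations are equivalent.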
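
The equivalence can also be checked by hand on a tiny example. The following is a minimal sketch in plain Python (no tensors), assuming two samples and three classes; the function names mirror the ones above, but the bodies are illustrative re-implementations on lists of floats, not PyTorch's own code.

```python
import math

def log_softmax(row):
    # log(exp(x_i) / sum_j exp(x_j)) = x_i - log(sum_j exp(x_j))
    log_sum_exp = math.log(sum(math.exp(v) for v in row))
    return [v - log_sum_exp for v in row]

def nll(log_probs, targets):
    # Mean of the negative log-probability assigned to the correct class.
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)

def cross_entropy(logits, targets):
    # Computed directly from the raw scores, without the two helpers above.
    total = 0.0
    for row, t in zip(logits, targets):
        total += math.log(sum(math.exp(v) for v in row)) - row[t]
    return total / len(targets)

# Made-up scores for 2 samples and 3 classes, plus their true class indices.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]

two_step = nll([log_softmax(r) for r in logits], targets)
one_step = cross_entropy(logits, targets)
print(two_step, one_step)
```

Both printed values agree up to floating-point rounding, which is the same relationship the two versions of the model and loss_func rely on.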

Refactor using `nn.Module`

Next we use nn.Module and nn.Parameter. Note that Module here is a class in nn, with a capital M; do not confuse it with the Python concept of a module.

import math

import torch
from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

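Mnist_Logistic as defined here is a subclass of nn.Module. The class holds the parameters weights and bias. One thing to note: unlike before, requires_grad is not set explicitly,
because Parameter is declared with requires_grad defaulting to True:
class torch.nn.parameter.Parameter(data=None, requires_grad=True)
The magic numbers 784 and 10 above are still an eyesore, but we will leave them for now.
Besides holding the parameters, the class defines a forward method whose body is equivalent to the model function defined earlier.
So the definition of model changes again, from a function to an object, which is a natural step toward object-oriented design: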

model = Mnist_Logistic()

The way it is called, however, is still unchanged:

print(loss_func(model(xb), yb))

Do not be surprised by this call syntax: nn.Module defines __call__, so model(xb) ends up invoking model.forward(xb), along with any registered hooks.
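
The dispatch from a call to forward can be illustrated without PyTorch. This is a minimal sketch, assuming a hypothetical TinyModule base class and a Scale subclass invented for this example; the real nn.Module.__call__ does more work (for instance, running hooks), so this shows only the core idea.

```python
class TinyModule:
    """Hypothetical stand-in for nn.Module, not PyTorch's real implementation."""

    def __call__(self, *args, **kwargs):
        # Calling the object forwards the arguments to forward().
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Scale(TinyModule):
    """Hypothetical example module: multiplies every input by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, xs):
        return [self.factor * x for x in xs]


model = Scale(3)
via_call = model([1, 2])             # goes through TinyModule.__call__
via_forward = model.forward([1, 2])  # calls forward directly
print(via_call, via_forward)
```

With real modules it is still preferable to write model(xb) rather than model.forward(xb), since only the former goes through __call__.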

Next, we define a function that wraps the training loop:

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

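The change is inside the with torch.no_grad() block: instead of updating weights and bias by name and zeroing each gradient separately, the loop iterates over model.parameters() and then calls model.zero_grad() once.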
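
To make the role of parameters() and zero_grad() concrete, here is a minimal sketch in plain Python. TinyParam, TinyModel and sgd_step are hypothetical names made up for this illustration, and the gradients are filled in by hand rather than computed by backward(); this is not how nn.Module is implemented internally.

```python
class TinyParam:
    """Hypothetical stand-in for nn.Parameter: a value plus its gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class TinyModel:
    """Hypothetical model that exposes its parameters like nn.Module does."""

    def __init__(self):
        self.weight = TinyParam(1.0)
        self.bias = TinyParam(0.0)

    def parameters(self):
        # Yield every attribute that is a TinyParam, whatever its name.
        for attr in vars(self).values():
            if isinstance(attr, TinyParam):
                yield attr

    def zero_grad(self):
        # Clear all gradients in one call (here: reset them to 0.0).
        for p in self.parameters():
            p.grad = 0.0


def sgd_step(model, lr):
    # Same shape as the loop in fit(): update every parameter, then clear grads.
    for p in model.parameters():
        p.value -= p.grad * lr
    model.zero_grad()


model = TinyModel()
model.weight.grad = 2.0   # pretend backward() produced these gradients
model.bias.grad = -4.0
sgd_step(model, lr=0.5)
print(model.weight.value, model.bias.value)
```

The update code never mentions weight or bias by name, so adding another parameter to the model would not require touching the training loop.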

Now a quick check: train first, then print the loss to see whether it has gone down:

fit()
print(loss_func(model(xb), yb))

Refactor using `nn.Linear`

Next we improve the definition of Mnist_Logistic:

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

The nn.Linear class builds a linear layer, which replaces the hand-written self.weights and self.bias as well as the expression xb @ self.weights + self.bias.
nn.Linear is declared as follows:

class torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
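
One detail worth noting: nn.Linear stores its weight with shape (out_features, in_features) and computes x @ weight.T + bias, whereas the hand-written version stored weights as (784, 10) and computed xb @ weights. The following is a minimal sketch of that computation on nested Python lists, assuming a hypothetical helper named linear and made-up numbers; it is not PyTorch's implementation.

```python
def linear(x, weight, bias):
    # x:      list of samples, each with in_features values
    # weight: out_features rows, each with in_features values
    # bias:   out_features values
    out = []
    for sample in x:
        row = []
        for w_row, b in zip(weight, bias):
            # One output feature = dot(sample, w_row) + b, i.e. x @ weight.T + bias
            row.append(sum(xi * wi for xi, wi in zip(sample, w_row)) + b)
        out.append(row)
    return out


in_features, out_features = 3, 2
weight = [[1.0, 0.0, -1.0],
          [0.5, 0.5, 0.5]]   # shape (out_features, in_features)
bias = [0.0, 1.0]
x = [[1.0, 2.0, 3.0]]        # one sample with in_features values

y = linear(x, weight, bias)
print(y)
```

Each row of weight produces one output feature, so a layer built with nn.Linear(784, 10) holds a weight of shape (10, 784).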

Its definition contains the following statements:

def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(torch.empty