Deep Learning: Deep Learning Computation (Part 5)

Article link: https://blog.youkuaiyun.com/qq_52952281/article/details/144477601

Layers and Blocks

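Generate a network containing one fully connected hidden layer with 256 units and a ReLU activation, followed by a fully connected output layer with 10 output units and no activation function.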

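import torch
import torch.nn as nn
from torch.nn import functional as F

num_inputs = 20
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(in_features=num_inputs, out_features=256),
                    nn.ReLU(),
                    nn.Linear(256, 10))
x = torch.rand(2, 20)   # a batch of 2 examples with 20 features each
print(x)
print(net(x).shape)     # torch.Size([2, 10])
# net.parameters() returns a generator; list the shapes to inspect it
print([p.shape for p in net.parameters()])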
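nn.Sequential simply calls its child layers one after another, feeding each output into the next layer. As a quick sanity check (a sketch; the variable name `manual` is only for illustration), chaining the same layers by hand reproduces the Sequential output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(20, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))
x = torch.rand(2, 20)

# Apply each child layer in registration order, by hand
manual = x
for layer in net:
    manual = layer(manual)

out = net(x)
print(out.shape)                  # torch.Size([2, 10])
print(torch.allclose(out, manual))  # True
```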

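1. Custom blocks: a block is written like an ordinary Python class (a subclass of nn.Module), but it also stores the parameters used in forward propagation, and backpropagation through it is handled automatically.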

class MLP(nn.Module):
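    # Declare the layers that hold model parameters. Here we declare two fully connected layers
    def __init__(self):
        # Call the constructor of the parent class Module to perform the necessary initialization.
        # This way, other function arguments can also be specified when the class is instantiated,
        # such as the model parameters params (introduced later)
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        # Note that we use the functional version of ReLU, defined in the nn.functional module
        return self.out(F.relu(self.hidden(x)))
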
net = MLP()
net(x)
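To see what "storing the parameters" means in practice, the sketch below re-declares the same MLP and lists its parameters. Because nn.Module registers any sub-module assigned as an attribute, no extra bookkeeping is needed, and calling backward() fills in the gradients. (The variable names are illustrative.)

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        return self.out(F.relu(self.hidden(x)))

net = MLP()
# Sub-modules assigned as attributes are registered automatically,
# so their parameters appear under dotted names
names = [name for name, _ in net.named_parameters()]
print(names)  # ['hidden.weight', 'hidden.bias', 'out.weight', 'out.bias']

# Backpropagation populates .grad for every registered parameter
loss = net(torch.rand(2, 20)).sum()
loss.backward()
print(net.hidden.weight.grad.shape)  # torch.Size([256, 20])
```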

2. Sequential blocks

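# Implement forward propagation by passing the input through each block in turn
class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # Here, module is an instance of a Module subclass. We save it in the member
            # variable _modules of the Module class; _modules is an ordered mapping
            # (historically an OrderedDict) from names to sub-modules
            self._modules[str(idx)] = module

    def forward(self, x):
        # Insertion order is preserved, so the blocks are visited in the order they were added
        for block in self._modules.values():
            x = block(x)
        return x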

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(x)
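Why store the blocks in `_modules` rather than in an ordinary Python list? A list attribute is invisible to nn.Module, so its layers' parameters are never registered. The sketch below contrasts the two; `ListSequential` is a hypothetical counter-example, not something from the original code:

```python
import torch
import torch.nn as nn

class ListSequential(nn.Module):
    """Hypothetical counter-example: keeps layers in a plain Python list."""
    def __init__(self, *args):
        super().__init__()
        self.layers = list(args)  # NOT registered as sub-modules

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module  # registered sub-modules

    def forward(self, x):
        for block in self._modules.values():
            x = block(x)
        return x

def count_params(model):
    return sum(p.numel() for p in model.parameters())

layers = (nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
print(count_params(ListSequential(*layers)))  # 0: an optimizer would see nothing
print(count_params(MySequential(*layers)))    # 7946 = 20*256 + 256 + 256*10 + 10
```

In everyday code, nn.ModuleList or nn.Sequential is the usual way to hold a variable number of sub-modules so they are registered properly.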

3. Executing code in the forward propagation function

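In practice, however, our models are not always simple sequential stacks of layers, and we need more flexibility. For example, we may want to incorporate terms that are neither the result of the previous layer nor updatable parameters; we call these constant parameters.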

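class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights that do not require gradients, so they stay constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, x):
        x = self.linear(x)
        # Use the constant parameter created above together with the relu and mm functions
        x = F.relu(torch.mm(x, self.rand_weight) + 1)
        # Reuse the fully connected layer; this is equivalent to two layers sharing parameters
        x = self.linear(x)
        # Control flow: halve x until its absolute sum is at most 1
        while x.abs().sum() > 1:
            x /= 2
        return x.sum()

net = FixedHiddenMLP()
net(x)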
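Because `rand_weight` is a plain tensor attribute, it is not a parameter (so no optimizer updates it), but it is also not included in `state_dict()` and does not move with `.to(device)`. A common alternative, shown here as a sketch and not what the code above does, is `register_buffer`: buffers are saved and moved together with the module but are not trained. The class name `FixedHiddenMLPBuffer` is hypothetical.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class FixedHiddenMLPBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # A buffer is part of the module's state but not a trainable parameter
        self.register_buffer('rand_weight', torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, x):
        x = self.linear(x)
        x = F.relu(x @ self.rand_weight + 1)
        x = self.linear(x)          # reuse the same layer (shared parameters)
        while x.abs().sum() > 1:
            x /= 2
        return x.sum()

net = FixedHiddenMLPBuffer()
print([name for name, _ in net.named_parameters()])  # ['linear.weight', 'linear.bias']
print(list(net.state_dict().keys()))  # the buffer 'rand_weight' is saved too
print(net(torch.rand(2, 20)))
```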

4. Combining and nesting blocks

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)
    def forward(self, x):
        return self.linear(self.net(x))
chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
chimera(x)
chimera[0].state_dict()

Parameter Management

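Accessing parameters, for debugging, diagnostics, and visualization;
Initializing parameters;
Sharing parameters across different model components.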

Accessing network parameters

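We start by accessing parameters from an existing model. When a model is defined with the Sequential class, we can access any of its layers by indexing, as if the model were a list; each layer's parameters are stored in its attributes.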

print(chimera[1].state_dict())   # chimera[1] is the nn.Linear(16, 20) layer

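From the output we can see that this fully connected layer contains two parameters: the layer's weight and its bias. Both are stored as single-precision floating-point numbers (float32). Note that parameter names uniquely identify each parameter, even in a network with hundreds of layers.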
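Each entry in `state_dict()` is backed by an `nn.Parameter` object, which wraps the values (`.data`) together with their gradient (`.grad`). A small sketch on an assumed toy network:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
bias = net[2].bias

print(type(bias))         # <class 'torch.nn.parameter.Parameter'>
print(bias.data)          # the underlying tensor values
print(bias.grad is None)  # True: no backward pass has run yet

net(torch.rand(2, 4)).sum().backward()
print(bias.grad)          # tensor([2.]): d(sum)/d(bias) equals the batch size
```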

# Access all parameters at once
print(*[(name, param.shape) for name, param in chimera[0].named_parameters()])
print(*[(name, param.shape) for name, param in chimera.named_parameters()])
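The dotted names printed above also describe how to reach a parameter inside nested blocks. The sketch below (the variable name `outer` is illustrative) reaches the same weight once by indexing and attribute access, and once by its name:

```python
import torch.nn as nn

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, x):
        return self.linear(self.net(x))

outer = nn.Sequential(NestMLP(), nn.Linear(16, 20))

# Index/attribute access walks the nesting step by step ...
w_by_index = outer[0].net[2].weight
# ... and the dotted parameter name encodes the same path
w_by_name = dict(outer.named_parameters())['0.net.2.weight']
print(w_by_index.shape)         # torch.Size([32, 64])
print(w_by_index is w_by_name)  # True
```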

Parameter initialization

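By default, PyTorch initializes weight and bias matrices by sampling uniformly from a range computed from the layer's dimensions (for nn.Linear, from its number of inputs). PyTorch's nn.init module provides a variety of preset initialization methods.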
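For nn.Linear, the default range depends only on `in_features`: weights and biases are drawn from U(-1/sqrt(in_features), 1/sqrt(in_features)), which is what the `kaiming_uniform_(..., a=sqrt(5))` call in `nn.Linear.reset_parameters` works out to. The sketch below simply checks that bound empirically; the layer sizes are arbitrary.

```python
import math
import torch.nn as nn

layer = nn.Linear(in_features=100, out_features=50)
bound = 1 / math.sqrt(layer.in_features)   # 0.1 for 100 inputs

w_max = layer.weight.detach().abs().max().item()
b_max = layer.bias.detach().abs().max().item()
# Both maxima should stay within the bound (small tolerance for float rounding)
print(bound, w_max, b_max)
```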

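# 1. Built-in initialization
# A small Sequential net, so that individual layers can be indexed below
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def init_normal(m):
    if type(m) == nn.Linear:
        # Weights ~ Gaussian with mean 0 and standard deviation 0.01
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)
net.apply(init_normal)
print(net[0].weight.data[0], net[0].bias.data[0])

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)          # set every weight to the constant 1
        nn.init.zeros_(m.bias)
net.apply(init_constant)

# Apply different initializers to specific layers
def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)
net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)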

# 2. Custom initialization
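When no built-in initializer fits, we can write our own function and pass it to `apply`. The sketch below uses an illustrative scheme, not a recommended one: it draws weights from U(-10, 10) and then zeroes every weight whose magnitude is below 5. The name `my_init` is hypothetical.

```python
import torch
import torch.nn as nn

def my_init(m):
    """Hypothetical custom initializer: keep only weights with |w| >= 5."""
    if isinstance(m, nn.Linear):
        nn.init.uniform_(m.weight, -10, 10)
        with torch.no_grad():
            # Multiply by a 0/1 mask so small-magnitude weights become 0
            m.weight *= (m.weight.abs() >= 5)
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
net.apply(my_init)
print(net[0].weight[:2])
```

Wrapping the in-place update in `torch.no_grad()` keeps autograd from recording it, since it modifies a leaf parameter directly.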

Tied parameters

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))
x = torch.rand(6, 4)
net(x)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure they are actually the same object, not just equal in value
print(net[2].weight.data[0] == net[4].weight.data[0])
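Because the same `nn.Linear` object appears twice, PyTorch registers its parameters only once, and during backpropagation the gradient contributions from both uses accumulate into that single `.grad`. A sketch that checks the registration side of this:

```python
import torch
import torch.nn as nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# parameters() de-duplicates: 3 distinct Linear layers -> 6 tensors, not 8
print(len(list(net.parameters())))  # 6
print(net[2] is net[4])             # True

net(torch.rand(6, 4)).sum().backward()
# A single .grad tensor holds the summed contributions of both uses
print(shared.weight.grad.shape)     # torch.Size([8, 8])
```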

Deferred initialization

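Deferred initialization means the framework waits until data first passes through the model before it creates the parameters. This lets the code work even though:
we defined the network architecture without specifying the input dimensionality;
we added layers without specifying the output dimension of the previous layer;
when initializing parameters, we did not even have enough information to determine how many parameters the model should contain.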

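This technique is especially convenient for convolutional neural networks, because the input dimensionality (i.e., the image resolution) affects the dimensionality of every subsequent layer. Being able to set up parameters without knowing the dimensions when we write the code greatly simplifies the task of defining and modifying models.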
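In PyTorch, deferred initialization is exposed through lazy modules such as `nn.LazyLinear` (available in recent PyTorch versions; constructing one may print a warning that lazy modules are still under development). Only the output size is given, and the input size is inferred on the first forward pass, as in this sketch:

```python
import torch
import torch.nn as nn

# No in_features anywhere: only output sizes are specified
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
print(isinstance(net[0].weight, torch.nn.parameter.UninitializedParameter))  # True

x = torch.rand(2, 20)
net(x)                      # the first call infers in_features = 20
print(net[0].weight.shape)  # torch.Size([256, 20])
print(net[2].weight.shape)  # torch.Size([10, 256])
```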

File I/O and Saving Models

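So far we have discussed how to process data and how to build, train, and test deep learning models. Sometimes, however, we want to save a trained model for later use in various environments (for example, to make predictions in deployment). Moreover, when running a long training process, the best practice is to save intermediate results periodically, so that we do not lose several days' worth of computation if the server's power is accidentally cut. It is therefore time to learn how to load and store both individual weight vectors and entire models.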

# 1. Loading and saving tensors

import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
torch.save(x, 'x-file')

x2 = torch.load('x-file')
x2
y = torch.zeros(4)
torch.save([x, y],'x-files')
x2, y2 = torch.load('x-files')
(x2, y2)


mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
mydict2

# 2. Loading and saving models
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)
    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))
    
net = MLP()
x = torch.randn(size=(2, 20))
y = net(x)
torch.save(net.state_dict(), 'mlp.params')
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()

Y_clone = clone(x)
Y_clone == y
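For the long-running training jobs mentioned earlier, the model weights alone are often not enough to resume: the optimizer state (e.g. momentum buffers) and the epoch counter matter too. A minimal checkpoint sketch, assuming an SGD optimizer, an arbitrary epoch number, and a hypothetical file name `checkpoint.pt`:

```python
import torch
import torch.nn as nn

net = nn.Linear(20, 10)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

# One training step so the optimizer has some state (momentum buffers)
loss = net(torch.randn(4, 20)).pow(2).mean()
loss.backward()
optimizer.step()

# Save everything needed to resume training in a single dict
torch.save({'epoch': 5,
            'model': net.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pt')

# Later: rebuild the objects, then restore their state
net2 = nn.Linear(20, 10)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.1, momentum=0.9)
ckpt = torch.load('checkpoint.pt')
net2.load_state_dict(ckpt['model'])
optimizer2.load_state_dict(ckpt['optimizer'])
start_epoch = ckpt['epoch'] + 1
print(start_epoch)  # 6
```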

GPU Computation

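import torch
from torch import nn
# Device objects for the CPU, the default GPU, and the second GPU (index 1)
torch.device('cpu'), torch.device('cuda'), torch.device('cuda:1')
print(torch.cuda.device_count())   # number of visible GPUs
print(torch.cuda.is_available())

# Jupyter shell escape: show GPU status
!nvidia-smi
print(torch.__version__)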
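A tensor records the device it lives on, and by default it is created on the CPU. The sketch below (the helper name `pick_device` is an assumption) places tensors on the first GPU when one is available and falls back to the CPU otherwise, so it also runs on machines without CUDA:

```python
import torch

def pick_device():
    """Hypothetical helper: first GPU if available, else CPU."""
    return torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')

x = torch.tensor([1, 2, 3])
print(x.device)                       # cpu

device = pick_device()
y = torch.ones(2, 3, device=device)   # create directly on the device
z = x.to(device)                      # or copy an existing tensor
print(y.device, z.device)
```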

Multi-GPU Computation

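import time
import torch
from torch import nn

def try_gpu(i=0):  #@save
    # Return gpu(i) if it exists, otherwise return cpu()
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():  #@save
    # Return all available GPUs, or [cpu(),] if there is no GPU
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

# Falls back to the CPU when fewer than two GPUs are present
x = torch.ones(2, 3, device=try_gpu(0))
y = torch.rand((2, 3), device=try_gpu(1))

# x + y raises a RuntimeError when x and y live on different devices,
# so copy x to y's device first (same as x.cuda(1) on a 2-GPU machine)
z = x.to(y.device)
# print(x, '\n', y, '\n', z)

def sync(device):
    # CUDA kernels run asynchronously; wait for them before reading the clock
    if device.type == 'cuda':
        torch.cuda.synchronize(device)

sync(y.device)
ztime_start = time.time()
zy = z + y
sync(y.device)
ztime_end = time.time()
print(ztime_end - ztime_start)

# A model's parameters and its input must be on the same device
net = nn.Sequential(nn.Linear(3, 1))
net = net.to(device=try_gpu(0))
print(net(x))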