PyTorch Neural Network Toolbox nn Explained: From Basic Modules to Practical Applications - 优快云 Blog

Article link: https://blog.youkuaiyun.com/gitblog_00956/article/details/148392909

pytorch-book: PyTorch tutorials and fun projects including neural talk, neural style, poem writing, anime generation (companion to the book 《深度学习框架PyTorch：入门与实战》). Project URL: https://gitcode.com/gh_mirrors/py/pytorch-book

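Introduction

PyTorch is widely used in deep learning for its flexibility and ease of use. This chapter examines nn, PyTorch's core neural network module, to help readers master the key techniques for building deep learning models. We start with the basic modules, work up to more complex network structures, and finish with optimization techniques and practical usage.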

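1. nn.Module: The Building Block of Neural Networks

1.1 Module Basics

nn.Module is the base class of every neural network module in PyTorch and provides the basic functionality needed to build complex networks. Understanding how Module works is key to mastering PyTorch.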

import torch
from torch import nn

# Example: a custom fully connected layer
class Linear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()  # required: sets up parameter and submodule registration
        self.W = nn.Parameter(torch.randn(in_features, out_features))
        self.b = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        # the bias broadcasts over the batch dimension, so no expand_as is needed
        return x @ self.W + self.b
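
As a quick check of how such a layer behaves, the sketch below restates the custom layer and runs it on a random batch. The sizes (4 input features, 3 output features, batch of 2) are arbitrary values chosen for illustration, not taken from the original.

```python
import torch
from torch import nn

class Linear(nn.Module):
    """Restatement of the custom layer; the bias is added by broadcasting."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.W = nn.Parameter(torch.randn(in_features, out_features))
        self.b = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        return x @ self.W + self.b

layer = Linear(4, 3)           # illustrative sizes
x = torch.randn(2, 4)          # batch of 2 samples, 4 features each
y = layer(x)                   # calling the module invokes forward()

# named_parameters() lists every nn.Parameter registered on the module
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(y.shape)
print(param_shapes)
```

Calling `layer(x)` rather than `layer.forward(x)` is the usual convention, since `__call__` also runs any registered hooks.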

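1.2 Key Features of Module

Parameter management: automatically tracks every nn.Parameter assigned as an attribute
Submodule nesting: supports hierarchical organization of modules
Automatic differentiation: no need to implement backpropagation by hand
Device movement: easy switching between CPU and GPU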
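
The four features listed above can be seen together in a small sketch. `TinyNet` and its layer sizes are hypothetical names and values introduced only for this illustration.

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical example network
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)   # submodules nest inside the parent
        self.out = nn.Linear(8, 2)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

net = TinyNet()

# Parameter management: parameters of nested submodules are found automatically
names = [name for name, _ in net.named_parameters()]

# Device movement: .to() moves every registered parameter and buffer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = net.to(device)

# Automatic differentiation: backward() fills .grad without a hand-written backward pass
loss = net(torch.randn(3, 4, device=device)).sum()
loss.backward()
has_grads = all(p.grad is not None for p in net.parameters())
print(names, has_grads)
```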

2. Common Neural Network Layers in Detail

2.1 Convolutional Layers

2.1.1 2D Convolution

# 3x3 kernel, 3 input channels, 64 output channels
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)

Output size formula (assuming dilation = 1): $H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} \right\rfloor + 1$
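
The formula can be checked against an actual layer. In the sketch below, `conv_out_size` is a hypothetical helper written for this illustration (it assumes dilation = 1), and the 32x32 input size is an arbitrary choice.

```python
import torch
from torch import nn

def conv_out_size(size_in, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution, assuming dilation = 1."""
    return (size_in + 2 * padding - kernel_size) // stride + 1

x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)

# The layer from the text: 3x3 kernel, stride 1, padding 1 preserves the spatial size
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
y = conv(x)
expected = conv_out_size(32, kernel_size=3, stride=1, padding=1)

# A stride-2 variant roughly halves the spatial size
conv_s2 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
y_s2 = conv_s2(x)
expected_s2 = conv_out_size(32, kernel_size=3, stride=2, padding=1)

print(y.shape, expected, y_s2.shape, expected_s2)
```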

2.1.2 Transposed Convolution

# Transposed convolution used for upsampling
trans_conv = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
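
To see the upsampling effect, the sketch below feeds a 16x16 feature map (an arbitrary size chosen for illustration) through the same layer. With dilation 1 and no output padding, the output size is (H_in - 1) x stride - 2 x padding + kernel_size, so this configuration doubles the spatial size.

```python
import torch
from torch import nn

trans_conv = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 64, 16, 16)      # (batch, channels, height, width)
y = trans_conv(x)

# (16 - 1) * 2 - 2 * 1 + 4 = 32
expected = (16 - 1) * 2 - 2 * 1 + 4
print(y.shape, expected)
```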

2.2 Recurrent Layers

2.2.1 LSTM

# Input size 100, hidden size 256, 2 stacked LSTM layers
lstm = nn.LSTM(input_size=100, hidden_size=256, num_layers=2)
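
The tensor layout is a common source of confusion with recurrent layers, so here is a small shape check. It assumes the default `batch_first=False`, meaning inputs are laid out as (seq_len, batch, input_size); the sequence length 5 and batch size 3 are arbitrary.

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=100, hidden_size=256, num_layers=2)

x = torch.randn(5, 3, 100)            # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)          # initial states default to zeros

# output: top-layer hidden state at every time step
# h_n, c_n: final hidden and cell state of every layer
print(output.shape, h_n.shape, c_n.shape)

# For a unidirectional LSTM, the last time step of output matches the top layer of h_n
last_step_matches = torch.allclose(output[-1], h_n[-1])
```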

2.2.2 GRU

# A simplified variant of LSTM: fewer gates and parameters, so it is usually cheaper to compute
gru = nn.GRU(input_size=100, hidden_size=256, num_layers=1)
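
One way to make the efficiency claim concrete is to count parameters. The sketch below compares a single-layer GRU with a single-layer LSTM of the same sizes (the single-layer LSTM is introduced here only for a like-for-like comparison). A GRU has three gate blocks where an LSTM has four, so it carries three quarters of the parameters.

```python
from torch import nn

def count_params(module):
    """Total number of scalar parameters in a module."""
    return sum(p.numel() for p in module.parameters())

gru = nn.GRU(input_size=100, hidden_size=256, num_layers=1)
lstm = nn.LSTM(input_size=100, hidden_size=256, num_layers=1)  # for comparison only

gru_params = count_params(gru)
lstm_params = count_params(lstm)
print(gru_params, lstm_params, gru_params / lstm_params)
```

Parameter count is only a proxy for cost; actual speed also depends on sequence length, batch size, and the backend in use.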

2.3 Attention Layers

# Multi-head attention
attention = nn.MultiheadAttention(embed_dim=512, num_heads=8)
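
A minimal self-attention call looks like the sketch below. It assumes the default `batch_first=False`, so tensors are (seq_len, batch, embed_dim); the sequence length 10 and batch size 2 are arbitrary. With default settings the returned weights are averaged over the heads.

```python
import torch
from torch import nn

attention = nn.MultiheadAttention(embed_dim=512, num_heads=8)

x = torch.randn(10, 2, 512)                      # (seq_len, batch, embed_dim)
attn_output, attn_weights = attention(x, x, x)   # self-attention: query = key = value

# attn_output keeps the input layout; attn_weights is (batch, target_len, source_len)
print(attn_output.shape, attn_weights.shape)

# Each row of weights is a softmax distribution over source positions
rows_sum_to_one = torch.allclose(
    attn_weights.sum(dim=-1), torch.ones(2, 10), atol=1e-5
)
```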

3. Advanced Network Construction Techniques

3.1 Sequential and ModuleList

# Use Sequential for a network whose layers run one after another
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)

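# Use ModuleList to hold the branches of a more complex network
class ComplexNet(nn.Module):
    def __init__(self):
        super().__init__()
        # ModuleList registers each branch so its parameters are tracked;
        # it defines no forward() of its own, so the forward pass must call the branches explicitly
        self.branches = nn.ModuleList([
            nn.Linear(32, 64),
            nn.Conv2d(3, 32, 3)
        ])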
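
The reason to prefer ModuleList over a plain Python list is parameter registration. The sketch below uses two hypothetical classes, `WithModuleList` and `WithPlainList`, to show the difference, and also shows a forward pass that iterates over the list explicitly.

```python
import torch
from torch import nn

class WithModuleList(nn.Module):  # hypothetical example
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])

    def forward(self, x):
        # ModuleList has no forward(); the loop decides how the layers are combined
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

class WithPlainList(nn.Module):  # hypothetical counter-example
    def __init__(self):
        super().__init__()
        # A plain list is not registered: these layers are invisible to
        # parameters(), state_dict(), and .to(device)
        self.layers = [nn.Linear(8, 8) for _ in range(3)]

n_registered = len(list(WithModuleList().parameters()))   # 3 weights + 3 biases
n_plain = len(list(WithPlainList().parameters()))
out = WithModuleList()(torch.randn(2, 8))
print(n_registered, n_plain, out.shape)
```

An optimizer built from `WithPlainList().parameters()` would have nothing to update, which is why the plain-list version silently fails to train.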

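3.2 Parameter Initialization Strategies

def init_weights(m):
    # apply() visits every submodule, so filter for the layer types to initialize
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.constant_(m.bias, 0.01)  # in-place init without touching .data

model.apply(init_weights)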
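
To confirm that the initialization took effect, the sketch below rebuilds the Sequential model from section 3.1 and inspects it after `apply`. Xavier uniform initialization draws weights from a uniform range bounded by sqrt(6 / (fan_in + fan_out)), which gives a simple bound to check against.

```python
import math
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.constant_(m.bias, 0.01)

model.apply(init_weights)   # recursively calls init_weights on every submodule

first = model[0]
bound = math.sqrt(6.0 / (784 + 256))          # Xavier uniform bound for the first layer
within_bound = bool(first.weight.abs().max() <= bound)
bias_ok = torch.allclose(first.bias, torch.full_like(first.bias, 0.01))
print(bound, within_bound, bias_ok)
```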

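4. Loss Functions and Optimization

4.1 Common Loss Functions

# Classification (expects raw logits and integer class labels)
criterion = nn.CrossEntropyLoss()

# Regression
mse_loss = nn.MSELoss()

# Custom loss function: mean absolute error
class CustomLoss(nn.Module):
    def forward(self, input, target):
        return (input - target).abs().mean()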
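
The sketch below works through both kinds of loss on small hand-picked tensors (the numbers are illustrative only). It checks that CrossEntropyLoss equals the mean negative log-softmax of the true class, and that the custom loss matches the built-in nn.L1Loss.

```python
import torch
from torch import nn

# Classification: raw logits for 2 samples over 3 classes, plus integer labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, labels)

# The same value computed by hand: log-softmax, pick the true class, negate, average
log_probs = torch.log_softmax(logits, dim=1)
manual_ce = -log_probs[torch.arange(2), labels].mean()

class CustomLoss(nn.Module):
    def forward(self, input, target):
        return (input - target).abs().mean()

pred = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.5, 2.0, 2.0])
custom = CustomLoss()(pred, target)     # (0.5 + 0.0 + 2.0) / 3
builtin = nn.L1Loss()(pred, target)
print(ce.item(), manual_ce.item(), custom.item(), builtin.item())
```

Because CrossEntropyLoss applies log-softmax internally, the model's last layer should output logits, not probabilities.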

4.2 Optimizer Configuration

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

# Learning rate scheduler: multiply the learning rate by 0.1 every 30 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
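
How the optimizer and scheduler fit into a training loop can be sketched as follows. The one-layer model and random data are placeholders for illustration; the point is the call order, with `scheduler.step()` once per epoch after `optimizer.step()`.

```python
import torch
from torch import nn

model = nn.Linear(4, 1)                       # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

x, y = torch.randn(16, 4), torch.randn(16, 1)  # placeholder data

lrs = []
for epoch in range(60):
    optimizer.zero_grad()                      # clear old gradients
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()                           # update the weights first
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()                           # then advance the schedule

# Epochs 0-29 train at 1e-3, epochs 30-59 at 1e-4
print(lrs[0], lrs[29], lrs[30], lrs[59])
```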

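5. Practical Tips and Best Practices

5.1 Saving and Loading Models

# Save the entire model (pickles the module object, including a reference to its class)
torch.save(model, 'model.pth')

# Save only the parameters
torch.save(model.state_dict(), 'params.pth')

# Load the entire model. Recent PyTorch releases default to weights_only=True,
# which rejects pickled module objects, so pass weights_only=False for trusted files
loaded_model = torch.load('model.pth', weights_only=False)

# Load saved parameters into an existing model with the same architecture
model.load_state_dict(torch.load('params.pth'))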
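
Saving only the state_dict is generally the more portable option, since the file then holds tensors only and does not depend on the class definition being importable at the same path. Below is a minimal round-trip sketch. The temporary directory is used only to keep the example self-contained, and the `weights_only=True` argument assumes PyTorch 1.13 or newer.

```python
import os
import tempfile
import torch
from torch import nn

def build_model():
    """Architecture must be recreated in code before loading parameters."""
    return nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

model = build_model()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.pth")
    torch.save(model.state_dict(), path)

    restored = build_model()                   # fresh, randomly initialized copy
    state = torch.load(path, map_location="cpu", weights_only=True)
    restored.load_state_dict(state)            # overwrite with the saved parameters

restored.eval()                                # switch to inference mode before use
x = torch.randn(2, 784)
same_output = torch.equal(model(x), restored(x))
print(same_output)
```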

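5.2 Mixed-Precision Training

# Newer PyTorch releases also expose these classes under the torch.amp namespace
scaler = torch.cuda.amp.GradScaler()

optimizer.zero_grad()  # clear gradients left over from the previous step

with torch.cuda.amp.autocast():  # forward pass runs in reduced precision where safe
    output = model(input)
    loss = criterion(output, target)

scaler.scale(loss).backward()  # scale the loss to avoid gradient underflow in float16
scaler.step(optimizer)         # unscales gradients; skips the step if they contain inf/NaN
scaler.update()                # adjust the scale factor for the next iteration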
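
For reference, a complete single training step might look like the sketch below. The model, batch size, and random data are placeholders chosen for illustration. The `enabled` flags are an assumption added so the same code also runs on a CPU-only machine, where it falls back to ordinary full-precision training.

```python
import torch
from torch import nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)   # no-op when disabled

inputs = torch.randn(8, 784, device=device)            # placeholder batch
targets = torch.randint(0, 10, (8,), device=device)

weights_before = model[0].weight.detach().clone()

optimizer.zero_grad()
with torch.autocast(device_type=device.type, enabled=use_cuda):
    output = model(inputs)
    loss = criterion(output, targets)

scaler.scale(loss).backward()   # backward runs outside the autocast block
scaler.step(optimizer)
scaler.update()

weights_changed = not torch.equal(weights_before, model[0].weight.detach())
print(loss.item(), weights_changed)
```

Only the forward pass and loss computation sit inside the autocast context; the backward pass and optimizer step stay outside it.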