What is torch.nn, really? (Condensed version)

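This article builds a neural network from scratch and then introduces PyTorch's higher-level features step by step (torch.nn, torch.optim, Dataset, and DataLoader) to show how to train models efficiently.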


This article is a condensed version of "What is torch.nn really?".

We first create a basic neural network and then incrementally add features from torch.nn, torch.optim, Dataset, and DataLoader, showing exactly what each piece does.

1. Setting up the MNIST data

We use the classic MNIST dataset, which consists of black-and-white images of hand-drawn digits (0 to 9).

We use pathlib (part of the Python 3 standard library) to handle paths and requests to download the data.

from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"

PATH.mkdir(parents=True, exist_ok=True)

URL = "http://deeplearning.net/data/mnist/"
FILENAME = "mnist.pkl.gz"

if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    # write_bytes opens the file, writes the content, and closes it again
    (PATH / FILENAME).write_bytes(content)

The dataset is stored as NumPy arrays and serialized with pickle.

import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

Each image is 28x28 and is stored as a flattened row of length 784 (= 28x28).
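
As a small illustration of this layout, the sketch below uses a synthetic array (an assumption for demonstration, not real MNIST pixels) to show how a flat row of 784 values maps onto a 28x28 grid in row-major order:

```python
import numpy as np

# Synthetic stand-in for one flattened image: values 0..783 (not real MNIST pixels).
flat = np.arange(784, dtype=np.float32)

# Row-major reshape: pixel (row, col) lives at flat index row * 28 + col.
image = flat.reshape(28, 28)

row, col = 3, 5
print(image.shape)                              # (28, 28)
print(image[row, col] == flat[row * 28 + col])  # True

# Flattening again recovers the original row.
restored = image.reshape(-1)
```

The same reshape is what the plotting code relies on to turn a stored row back into an image.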

Let's look at one of the images:

from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)

Output (the image is displayed, followed by the shape):

(50000, 784)

PyTorch uses tensors rather than NumPy arrays, so we need to convert the data.

import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape  # n = number of samples, c = features per sample (784)
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

Output:

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]) tensor([5, 0, 4,  ..., 8, 4, 8])
torch.Size([50000, 784])
tensor(0) tensor(9)
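
One detail of the conversion that may be worth knowing: torch.tensor copies the data of a NumPy array, whereas torch.from_numpy returns a tensor that shares memory with it. The sketch below demonstrates the difference on a toy array (the array and variable names are illustrative only):

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)  # toy array, not MNIST data

copied = torch.tensor(arr)      # copies the data
shared = torch.from_numpy(arr)  # shares memory with arr

arr[0] = 1.0  # modify the NumPy array in place

print(copied)  # unaffected by the change
print(shared)  # reflects the change
```

Copying, as done in this article, is the safer default when the original arrays might be modified later.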

2. Building a neural network from scratch (without torch.nn)

We first create a model using nothing but PyTorch tensor operations.

# Initialize the weights with Xavier initialization (scaling by 1/sqrt(n)).

import math

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()
    
loss_func = nll

bs = 64  # batch size

xb = x_train[0:bs]  # a mini-batch from x
yb = y_train[0:bs]

preds = model(xb)  # predictions

print(preds[0], preds.shape)
print(loss_func(preds, yb))
print(accuracy(preds, yb))

Output:

tensor([-1.7022, -3.0342, -2.4138, -2.6452, -2.7764, -2.0892, -2.2945, -2.5480,
        -2.3732, -1.8915], grad_fn=<SelectBackward>) torch.Size([64, 10])

tensor(2.3783, grad_fn=<NegBackward>)
tensor(0.0938)
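
The printed loss of about 2.38 and accuracy of about 0.09 are close to what a model with no information would give on 10 classes. As a rough sanity check (a worked calculation, not part of the original code), a uniform prediction has a negative log-likelihood of ln(10):

```python
import math

num_classes = 10

# A model with no information assigns probability 1/10 to every class.
uniform_log_prob = math.log(1.0 / num_classes)

# Negative log-likelihood of the true class under that uniform prediction.
expected_loss = -uniform_log_prob
expected_accuracy = 1.0 / num_classes

print(round(expected_loss, 4))  # 2.3026
print(expected_accuracy)        # 0.1
```

A randomly initialized model landing near these values suggests the loss and accuracy functions are wired up sensibly.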

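We can now run a training loop. In each iteration we will:

  • select a mini-batch of data
  • use the model to make predictions
  • calculate the loss
  • call loss.backward(), which updates the gradients of the model, in this case those of weights and bias

from IPython.core.debugger import set_trace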

lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()  # uncomment to step through the loop in a debugger
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Output:

tensor(0.0806, grad_fn=<NegBackward>) tensor(1.)
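
The loop bound (n - 1) // bs + 1 is an integer ceiling division, so the last mini-batch may be smaller than bs. The following sketch works through the numbers for the sizes used above; the helper variable names are illustrative:

```python
n, bs = 50000, 64  # sizes taken from the text above

num_batches = (n - 1) // bs + 1  # ceiling of n / bs using integer arithmetic
last_start = (num_batches - 1) * bs
last_size = min(last_start + bs, n) - last_start  # slicing past the end is clipped

print(num_batches)  # 782
print(last_size)    # 16
```

Because xb and yb still hold the last mini-batch after the loop ends, the loss and accuracy printed above are measured on those 16 samples only, not on the whole training set.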

3. Using torch.nn.functional

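If you use negative log-likelihood loss together with a log softmax activation, PyTorch provides a single function, F.cross_entropy, that combines the two. We can therefore even remove the activation function from our model.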

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

Note that we no longer call log_softmax in the model function. Let's confirm that the loss and accuracy are the same as before:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

Output:

tensor(0.0806, grad_fn=<NllLossBackward>) tensor(1.)
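
The equivalence can also be checked directly. The sketch below, using made-up logits and labels, compares F.cross_entropy with the two-step combination of F.log_softmax and F.nll_loss:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # made-up raw model outputs
targets = torch.tensor([1, 0, 9, 3])  # made-up class labels

two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)
one_step = F.cross_entropy(logits, targets)

print(torch.allclose(two_step, one_step))  # True
```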

4. Refactoring with nn.Module

We subclass nn.Module (which is itself a class and is able to keep track of state) and instantiate the model:

from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias
        
model = Mnist_Logistic()

print(loss_func(model(xb), yb))

Output:

tensor(2.3558, grad_fn=<NllLossBackward>)

We wrap the training loop in a fit function so that we can run it again later.

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

print(loss_func(model(xb), yb))

Output:

tensor(0.0826, grad_fn=<NllLossBackward>)
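
To see what "keeping track of state" means in practice, the sketch below defines a smaller, hypothetical module (TinyLogistic, not part of the original article) and lists the parameters that nn.Module registered automatically:

```python
import math
import torch
from torch import nn

class TinyLogistic(nn.Module):  # hypothetical, smaller than Mnist_Logistic
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(4, 3) / math.sqrt(4))
        self.bias = nn.Parameter(torch.zeros(3))

    def forward(self, xb):
        return xb @ self.weights + self.bias

tiny = TinyLogistic()
for name, p in tiny.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

out = tiny(torch.randn(2, 4))  # calling the module runs forward()
```

This registration is what lets fit iterate over model.parameters() without naming each tensor by hand.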

5. Refactoring with nn.Linear

We use PyTorch's nn.Linear class to create a linear layer, instead of manually defining and initializing self.weights and self.bias and computing xb @ self.weights + self.bias.

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

model = Mnist_Logistic()
print(loss_func(model(xb), yb))

Output:

tensor(2.3156, grad_fn=<NllLossBackward>)

We can still use the same fit function as before:

fit()

print(loss_func(model(xb), yb))

Output:

tensor(0.0809, grad_fn=<NllLossBackward>)
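
One difference from the hand-written version is worth noting: nn.Linear stores its weight with shape (out_features, in_features), so the layer computes xb @ weight.T + bias. A minimal check on a random stand-in batch (not MNIST data):

```python
import torch
from torch import nn

torch.manual_seed(0)
lin = nn.Linear(784, 10)
xb = torch.randn(5, 784)  # random stand-in for a mini-batch

print(lin.weight.shape)  # torch.Size([10, 784])
print(lin.bias.shape)    # torch.Size([10])

# The layer output matches the manual matrix product with the transposed weight.
manual = xb @ lin.weight.T + lin.bias
same = torch.allclose(lin(xb), manual, atol=1e-5)
print(same)
```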

6. Refactoring with optim

We define a function that creates the model and the optimizer so that we can reuse it later.

from torch import optim

def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)

model, opt = get_model()
print(loss_func(model(xb), yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Output:

tensor(2.2861, grad_fn=<NllLossBackward>)
tensor(0.0815, grad_fn=<NllLossBackward>)
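
For plain SGD without momentum, opt.step() performs the same update as the manual p -= p.grad * lr used earlier. The sketch below compares the two on a toy loss (toy_loss and the two-element tensors are assumptions made purely for illustration):

```python
import torch
from torch import optim

lr = 0.5
w_manual = torch.tensor([1.0, -2.0], requires_grad=True)
w_optim = torch.tensor([1.0, -2.0], requires_grad=True)
opt = optim.SGD([w_optim], lr=lr)

def toy_loss(w):  # hypothetical loss, chosen only for illustration
    return (w ** 2).sum()

# Manual update, as in the earlier sections.
toy_loss(w_manual).backward()
with torch.no_grad():
    w_manual -= w_manual.grad * lr
    w_manual.grad.zero_()

# The same update through the optimizer.
toy_loss(w_optim).backward()
opt.step()
opt.zero_grad()

print(w_manual, w_optim)  # both end up at [0., 0.]
```

Here the gradient is 2w = [2, -4], so one step with lr = 0.5 moves both tensors exactly to zero.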

7. Refactoring with Dataset

from torch.utils.data import TensorDataset

train_ds = TensorDataset(x_train, y_train)
model, opt = get_model()

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Output:

tensor(0.0800, grad_fn=<NllLossBackward>)
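
The convenience of TensorDataset is that indexing or slicing it returns the inputs and labels together. A small sketch on toy tensors (the data here is made up):

```python
import torch
from torch.utils.data import TensorDataset

x = torch.arange(12, dtype=torch.float32).reshape(6, 2)  # 6 toy samples
y = torch.tensor([0, 1, 0, 1, 0, 1])                     # 6 toy labels

ds = TensorDataset(x, y)
print(len(ds))  # 6

xi, yi = ds[0]    # one sample and its label
xb, yb = ds[2:5]  # a slice gives a mini-batch of both tensors
print(xb.shape, yb.shape)  # torch.Size([3, 2]) torch.Size([3])
```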

8. Refactoring with DataLoader

from torch.utils.data import DataLoader

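train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

# start from a fresh model instead of reusing the one already trained in section 7
model, opt = get_model()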

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

Output:

tensor(0.0821, grad_fn=<NllLossBackward>)
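
The DataLoader takes over the batch index arithmetic, including the shorter final batch. A minimal sketch on a toy dataset of 10 samples (sizes chosen only to make the last batch visible):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))
dl = DataLoader(ds, batch_size=4)  # no shuffling, so the order is deterministic

batch_sizes = [len(xb) for xb, yb in dl]
print(batch_sizes)  # [4, 4, 2]
print(len(dl))      # 3
```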

9. Adding validation

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

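We will compute and print the validation loss at the end of each epoch. (Note that we always call model.train() before training and model.eval() before inference, because layers such as nn.BatchNorm2d and nn.Dropout rely on these calls to behave appropriately in each phase.)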

model, opt = get_model()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))

Output:

0 tensor(0.2981)
1 tensor(0.3033)
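
The logistic model here has no such layers, so train() and eval() make no difference to it. To illustrate why the calls matter in general, the sketch below applies a standalone nn.Dropout layer (not used anywhere in this article's model) in both modes:

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
train_out = drop(x)  # each element is zeroed or scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)   # dropout is a no-op in eval mode

print(train_out)
print(eval_out)
```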

10. Creating fit() and get_data()

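The loss_batch function computes the loss for one batch. When an optimizer is passed, it also runs the backward pass and the update step.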

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

fit runs the operations needed to train our model and reports the validation loss after each epoch.

import numpy as np

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)
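
fit weights each batch loss by its batch size because the last validation batch can be smaller than the others, which a plain average over batches (as in section 9) ignores. A worked example with made-up numbers:

```python
import numpy as np

losses = [0.2, 0.4]  # made-up per-batch mean losses
nums = [128, 16]     # batch sizes; the last batch is smaller

weighted = np.sum(np.multiply(losses, nums)) / np.sum(nums)
unweighted = np.mean(losses)

print(round(float(weighted), 4))    # 0.2222
print(round(float(unweighted), 4))  # 0.3
```

The weighted value equals the mean loss per sample, so it does not depend on how the data happens to be split into batches.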

get_data returns DataLoaders for the training and validation sets.

def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

Now the whole process of obtaining the DataLoaders and fitting the model can be run in 3 lines of code:

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

Output:

0 0.3055081913471222
1 0.31777948439121245

11. Summary

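We now have a general data pipeline and training loop that you can use to train many types of PyTorch models. The role of each component is summarized below: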

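  • torch.nn
    • Module: creates a callable object that behaves like a function but can also contain state (such as neural network layer weights). It knows which Parameters it contains and can zero all their gradients, loop through them to update the weights, and so on.
    • Parameter: a wrapper for a tensor that tells a Module that it holds weights that need updating during backpropagation. Only tensors with the requires_grad attribute set are updated.
    • functional: a module (conventionally imported into the F namespace) that contains activation functions, loss functions, and so on, as well as non-stateful versions of layers such as convolutional and linear layers.
  • torch.optim: contains optimizers such as SGD, which update the weights of Parameters in the optimizer step, using the gradients computed in the backward pass.
  • Dataset: an abstract interface for objects with __len__ and __getitem__, including classes provided by PyTorch such as TensorDataset.
  • DataLoader: takes any Dataset and creates an iterator that returns batches of data.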
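
As an illustration of the Dataset interface described above, the sketch below defines a hypothetical SquaresDataset (not from the original article) with only __len__ and __getitem__, and feeds it to a DataLoader:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SquaresDataset(Dataset):  # hypothetical example dataset
    """Yields (i, i * i) pairs for i in range(size)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        x = torch.tensor(float(idx))
        return x, x * x

dl = DataLoader(SquaresDataset(5), batch_size=2)
batches = [(xb.tolist(), yb.tolist()) for xb, yb in dl]
print(batches)
```

Any object that implements these two methods can be batched this way, which is why the training loop never needs to know how the data is stored.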