torch.nn到底是什么?(精简版)

本文从零开始构建神经网络,逐步引入PyTorch的高级功能,如torch.nn、torch.optim、Dataset和DataLoader,展示如何高效地训练模型。

此文为《torch.nn到底是什么?》的总结版。

首先创建一个基本的神经网络,然后逐步引入 torch.nn、torch.optim、Dataset 和 DataLoader 的功能,以展示每一部分的具体作用。

1、设置MNIST数据

使用经典的 MNIST 数据集,该数据集由手写数字(0-9)的黑白图像组成。

使用 pathlib(Python 3 标准库的一部分)来处理路径,并用 requests 下载数据集。

from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"

PATH.mkdir(parents=True, exist_ok=True)

URL = "http://deeplearning.net/data/mnist/"
FILENAME = "mnist.pkl.gz"

if not (PATH / FILENAME).exists():
        content = requests.get(URL + FILENAME).content
        (PATH / FILENAME).open("wb").write(content)

该数据集的格式为NumPy array,使用 pickle 存储。

import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
        ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

每张图片的大小为 28x28,展平后存储为长度为 784(=28×28)的一行。

查看其中一张图片:

from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)

输出为:
(图:训练集中第一张手写数字的灰度图像)

(50000, 784)

PyTorch使用 tensor 而不是 NumPy array,所以我们需要将其转换。

import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

输出:

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]]) tensor([5, 0, 4,  ..., 8, 4, 8])
torch.Size([50000, 784])
tensor(0) tensor(9)

2、从头构建神经网络(不使用 torch.nn)

首先只使用PyTorch tensor 操作创建一个模型。

import math

# Initialize the weights with Xavier initialisation (by multiplying with 1/sqrt(n))
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

def log_softmax(x):
    # 等价于 log(softmax(x)),在最后一维上做归一化
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    # @ 表示矩阵乘法:一个线性层加上 log_softmax 激活
    return log_softmax(xb @ weights + bias)

def nll(input, target):
    # 负对数似然:取每个样本在正确类别上的对数概率,取负后求均值
    return -input[range(target.shape[0]), target].mean()

def accuracy(out, yb):
    # 取概率最大的类别作为预测结果,与标签比较后求均值
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

loss_func = nll

bs = 64  # batch size

xb = x_train[0:bs]  # a mini-batch from x
yb = y_train[0:bs]

preds = model(xb)  # predictions

print(preds[0], preds.shape)
print(loss_func(preds, yb))
print(accuracy(preds, yb))

输出:

tensor([-1.7022, -3.0342, -2.4138, -2.6452, -2.7764, -2.0892, -2.2945, -2.5480,
        -2.3732, -1.8915], grad_fn=<SelectBackward>) torch.Size([64, 10])

tensor(2.3783, grad_fn=<NegBackward>)
tensor(0.0938)

现在我们可以进行训练。对于每次迭代,将会做以下几件事:

  • 选择一批数据(mini-batch)
  • 使用模型进行预测
  • 计算损失
  • loss.backward() 计算模型参数(权重和偏置)的梯度
  • 在 torch.no_grad() 下用这些梯度手动更新权重和偏置,并将梯度清零

from IPython.core.debugger import set_trace

lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

输出:

tensor(0.0806, grad_fn=<NegBackward>) tensor(1.)

3、使用 torch.nn.functional

由于我们使用的是负对数似然损失函数和 log softmax 激活函数,而 PyTorch 提供的 F.cross_entropy 将两者结合在一起,所以可以直接用它作为损失函数,并从模型中移除激活函数。

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

注意,在 model 函数中我们不再需要调用 log_softmax。让我们确认一下,损失和准确率与之前计算的一致:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

输出:

tensor(0.0806, grad_fn=<NllLossBackward>) tensor(1.)
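
作为补充,下面用一个最小的示意(非原文代码,其中 logits 与 targets 为假设的示例数据)验证 F.cross_entropy 确实等价于 log_softmax 加负对数似然:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # 假设的模型输出:4 个样本、10 个类别
targets = torch.tensor([3, 0, 7, 1])  # 假设的标签

# 手动组合:log_softmax + 负对数似然
manual = -F.log_softmax(logits, dim=-1)[range(4), targets].mean()
# PyTorch 内置的交叉熵
builtin = F.cross_entropy(logits, targets)

print(torch.allclose(manual, builtin))  # 预期输出 True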

4、使用 nn.Module 重构

通过继承 nn.Module(它本身是一个类,并且能够跟踪状态)来建立子类,然后实例化模型:

from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias
        
model = Mnist_Logistic()

print(loss_func(model(xb), yb))

输出:

tensor(2.3558, grad_fn=<NllLossBackward>)

将训练循环包装到一个 fit 函数中,以便我们以后运行。

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

print(loss_func(model(xb), yb))

输出:

tensor(0.0826, grad_fn=<NllLossBackward>)

5、使用 nn.Linear 重构

使用 PyTorch 的 nn.Linear 类建立一个线性层,以替代手动定义和初始化 self.weights、self.bias,以及计算 xb @ self.weights + self.bias 等工作。

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

model = Mnist_Logistic()
print(loss_func(model(xb), yb))

输出:

tensor(2.3156, grad_fn=<NllLossBackward>)

我们仍然能够像之前那样调用 fit 函数:

fit()

print(loss_func(model(xb), yb))

输出:

tensor(0.0809, grad_fn=<NllLossBackward>)

6、使用 optim 重构

定义一个函数来创建模型和优化器,以便将来可以重用它。

from torch import optim

def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)

model, opt = get_model()
print(loss_func(model(xb), yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

输出:

tensor(2.2861, grad_fn=<NllLossBackward>)
tensor(0.0815, grad_fn=<NllLossBackward>)

7、使用 Dataset 重构

TensorDataset 是一个包装 tensor 的 Dataset,它允许我们沿第一个维度同时对输入和标签进行索引和切片,因此可以用一行代码取出一个 mini-batch 的 x 和 y。

from torch.utils.data import TensorDataset

train_ds = TensorDataset(x_train, y_train)
model, opt = get_model()

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

输出:

tensor(0.0800, grad_fn=<NllLossBackward>)

8、使用 DataLoader 重构

DataLoader 负责批次的管理:它可以基于任何 Dataset 自动生成 mini-batch,我们不再需要手动写 train_ds[i * bs: i * bs + bs] 这样的切片。

from torch.utils.data import DataLoader

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

输出:

tensor(0.0821, grad_fn=<NllLossBackward>)

9、增加验证

对训练数据进行打乱(shuffle=True)可以避免批次之间的相关性并减轻过拟合;验证集则不需要打乱。由于验证时不需要反向传播、占用内存更少,我们为验证集使用两倍的批大小。

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

我们将在每个 epoch 结束时计算并打印验证损失。(注意,我们总是在训练之前调用 model.train(),在推理之前调用 model.eval(),因为 nn.BatchNorm2d、nn.Dropout 等层会依据这两种模式表现出不同的行为,从而保证各阶段的正确性。)

model, opt = get_model()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))

输出:

0 tensor(0.2981)
1 tensor(0.3033)
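
作为一个简单的补充说明(非原文代码),下面用 nn.Dropout 演示 model.train() 与 model.eval() 带来的行为差异,其中张量 x 为假设的输入:

import torch
from torch import nn

drop = nn.Dropout(p=0.5)  # 以 0.5 的概率随机丢弃元素
x = torch.ones(1, 8)      # 假设的输入

drop.train()   # 训练模式:保留下来的元素会被放大为 1/(1-p) = 2.0,其余置 0
print(drop(x))

drop.eval()    # 推理模式:Dropout 不生效,原样输出
print(drop(x))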

10、创建 fit() 和 get_data()

loss_batch 函数计算每个批次的损失;如果传入了优化器,它还会执行反向传播并更新参数。它同时返回该批次的样本数,以便后面对验证损失做加权平均。

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

fit 执行训练模型所需的操作,并在每个 epoch 结束时计算并打印验证损失。

import numpy as np

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)

get_data 为训练集和验证集返回对应的 DataLoader。

def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

现在,获取 DataLoader 并拟合模型的整个过程只需 3 行代码:

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

输出:

0 0.3055081913471222
1 0.31777948439121245

11、总结

我们现在有了一个通用的数据流水线和训练循环,可以用它来训练多种类型的 PyTorch 模型。各部分的功能总结如下:

  • torch.nn:
    • Module:创建一个可调用的对象,其行为类似于函数,但也可以包含状态(例如神经网络层的权重)。它知道自己包含哪些 Parameter,可以将所有梯度清零、遍历参数以更新权重等。
    • Parameter:对 tensor 的包装器(wrapper),它告诉 Module 这些权重需要在反向传播期间更新。只有设置了 requires_grad 属性的 tensor 才会被更新。
    • functional:一个模块(按惯例通常导入为 F 命名空间),包含激活函数、损失函数等,以及卷积层、线性层等层的无状态(non-stateful)版本。
  • torch.optim:包含 SGD 等优化器,在反向传播之后的更新步骤中更新 Parameter 的权重。
  • Dataset:带有 __len__ 和 __getitem__ 的对象的抽象接口,包括 PyTorch 提供的类,如 TensorDataset。
  • DataLoader:接收任何 Dataset 并创建一个按批次返回数据的迭代器。
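
作为总结的补充,下面给出一个最小的自定义 Dataset 草图(假设性示例,非原文代码):只要实现 __len__ 和 __getitem__,任何对象都可以交给 DataLoader 按批次迭代。

import torch
from torch.utils.data import Dataset, DataLoader

class RandomDigits(Dataset):
    """一个假设的数据集:随机特征与随机标签,仅用于演示接口。"""
    def __init__(self, n):
        self.x = torch.randn(n, 784)
        self.y = torch.randint(0, 10, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

ds = RandomDigits(100)
dl = DataLoader(ds, batch_size=32, shuffle=True)

for xb, yb in dl:
    print(xb.shape, yb.shape)  # torch.Size([32, 784]) torch.Size([32])
    break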