Day 49: CBAM Attention

@浙大疏锦行

Today's tasks:

  1. Review the channel attention module
  2. The spatial attention module
  3. The definition of CBAM

Assignment: check the parameter count of today's model and use TensorBoard to inspect the training process.

Reference: 【深度学习注意力机制系列】—— CBAM注意力机制(附pytorch实现) (from a Chinese series on attention mechanisms, with a PyTorch implementation)

Paper: CBAM: Convolutional Block Attention Module

CBAM Attention

Unlike the SE module, which attends only to the channel dimension, CBAM (Convolutional Block Attention Module) attends to both the channel and spatial dimensions: channel attention and spatial attention are applied in sequence to the input feature map.

  • Channel attention: identifies which channels carry the most important features (e.g., color and texture channels in an image) — it answers "what"
  • Spatial attention: locates where the key features sit in the image (e.g., the region containing an object) — it answers "where"

Advantages of the CBAM attention module:

  • Lightweight: adds only a small amount of computation (global pooling + a simple convolution) and very few parameters (see the parameter-count sketch after the CBAM definition below), so it can be embedded in a wide range of CNN architectures
  • Plug-and-play: requires no changes to the backbone; it is inserted directly between convolutional layers as a standalone module
  • Dual refinement: improves feature quality along both the channel and spatial dimensions, which helps in complex scenarios (small-object detection, semantic segmentation).

(1) Channel Attention Module

The channel attention in CBAM has the same overall structure as the SE module, but adds global max pooling alongside the global average pooling step:

  • Global average pooling: captures the overall statistics of the image; its receptive field is global
  • Global max pooling: captures the most discriminative features in the image, such as object edges and textures

Steps: parallel pooling → shared-weight MLP → add the two outputs and apply sigmoid → reweight

    # Channel attention
    import torch.nn as nn
    import torch
    class ChannelAttention(nn.Module):
        def __init__(self,in_channels,reduction_ratio=16):
            super(ChannelAttention,self).__init__()
            # 1. Global pooling (two kinds, in parallel)
            self.avg_pool = nn.AdaptiveAvgPool2d(1) # global average pooling
            self.max_pool = nn.AdaptiveMaxPool2d(1) # global max pooling
            # 2. Shared MLP (fully connected layers)
            self.fc = nn.Sequential(
                nn.Linear(in_channels,in_channels//reduction_ratio,bias=False),
                nn.ReLU(inplace=True),
                nn.Linear(in_channels//reduction_ratio,in_channels,bias=False)
            )
            self.sigmoid = nn.Sigmoid() # applied after the two pooled branches are summed, so kept separate
        def forward(self,x):
            b,c,h,w = x.shape
            avg_out = self.fc(self.avg_pool(x).view(b,c)) # flatten to (B, C): nn.Linear expects 2-D input
            max_out = self.fc(self.max_pool(x).view(b,c)) # same flattening for the max-pooled branch
            add_out = avg_out + max_out # element-wise sum
            attention_weight = self.sigmoid(add_out).view(b,c,1,1) # reshape to (B, C, 1, 1) for broadcasting
            return x * attention_weight # multiply the weights with the input features (broadcasting)
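A quick shape check (a sketch, using the ChannelAttention class above): the module returns a tensor with the same shape as its input, because the (B, C, 1, 1) weights are broadcast over H and W.

    # Sanity check: channel attention preserves the input shape
    x = torch.randn(2, 64, 32, 32)        # (B, C, H, W)
    ca = ChannelAttention(in_channels=64)
    print(ca(x).shape)                    # torch.Size([2, 64, 32, 32])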

(2) Spatial Attention Module

Core idea: pool along the channel axis, compressing multi-channel information into a single-channel feature map that highlights the important locations.

Note: the pooling in the spatial attention module runs along the channel dimension (many channels → one channel) and preserves the spatial dimensions (H, W); this differs from the global pooling used in the channel attention module.

Steps:

1. Channel-wise pooling: on the output F' of the channel attention module, apply average pooling along the channel dimension (the mean response over all channels at each position) and max pooling along the channel dimension (the strongest response over all channels at each position) → two H × W × 1 feature maps.
2. Concatenation: concatenate the two H × W × 1 feature maps along the channel dimension → one H × W × 2 feature map.
3. Convolution and activation: apply a standard 7 × 7 (the default size) convolution to the concatenated feature map, reducing the channel count from 2 to 1, then apply a sigmoid to obtain the spatial attention weights Ms, of shape H × W × 1.
4. Reweighting: multiply Ms with the input feature map F' position by position.
    # Spatial attention
    class SpatialAttention(nn.Module):
        def __init__(self,kernel_size=7):
            super().__init__()
            self.conv = nn.Conv2d(2,1,kernel_size,padding=kernel_size//2,bias=False) # padding keeps the output size equal to the input
            self.sigmoid = nn.Sigmoid()

        def forward(self,x):
            avg_out = torch.mean(x,dim=1,keepdim=True) # channel-wise average pooling, (B,1,H,W)
            max_out,_ = torch.max(x,dim=1,keepdim=True) # channel-wise max pooling, (B,1,H,W); torch.max returns (values, indices)
            out = torch.cat([avg_out,max_out],dim=1) # concatenate, (B,2,H,W)
            attention_weight = self.sigmoid(self.conv(out))  # convolve to extract spatial features, then activate
            return x * attention_weight # multiply the weights with the input features (broadcasting)
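The same kind of check for spatial attention (a sketch, using the SpatialAttention class above): the (B, 1, H, W) weight map is broadcast across all channels, so shapes are again preserved.

    # Sanity check: spatial attention also preserves the input shape
    x = torch.randn(2, 64, 32, 32)
    sa = SpatialAttention(kernel_size=7)
    print(sa(x).shape)                    # torch.Size([2, 64, 32, 32])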

(3) Combined Attention Module

CBAM: input features → channel attention → spatial attention → refined feature map

    # Combined (channel + spatial) attention
    class CBAM(nn.Module):
        def __init__(self,in_channels,reduction_ratio=16,kernel_size=7):
            super().__init__()
            self.channel = ChannelAttention(in_channels=in_channels,reduction_ratio=reduction_ratio)
            self.spatial = SpatialAttention(kernel_size=kernel_size)

        def forward(self,x):
            channel_out = self.channel(x)  # refine channels first ("what")
            spatial_out = self.spatial(channel_out)  # then refine locations ("where")
            return spatial_out
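To back up the "lightweight" claim above, a sketch (using the classes just defined) that counts the parameters one CBAM module adds for a 64-channel feature map: the shared MLP contributes 2 × 64 × (64/16) = 512 weights and the 7×7 spatial convolution 2 × 7 × 7 = 98, about 610 parameters in total.

    # CBAM is cheap: count the extra parameters it introduces
    cbam = CBAM(in_channels=64, reduction_ratio=16, kernel_size=7)
    print(sum(p.numel() for p in cbam.parameters()))  # 610 for a 64-channel input

    # It is also shape-preserving, so it can sit between any two layers
    x = torch.randn(2, 64, 16, 16)
    print(cbam(x).shape)                  # torch.Size([2, 64, 16, 16])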

Where to insert CBAM in a CNN architecture: at the end of each convolutional block (recommended).

    # CNN model with CBAM
    class CBAM_CNN(nn.Module):
        def __init__(self):
            super(CBAM_CNN, self).__init__()
            
            # ---------------------- First conv block (with CBAM) ----------------------
            self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(32) # batch normalization
            self.relu1 = nn.ReLU()
            self.pool1 = nn.MaxPool2d(kernel_size=2)
            self.cbam1 = CBAM(in_channels=32)  # CBAM after the first conv block
            
            # ---------------------- Second conv block (with CBAM) ----------------------
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(64)
            self.relu2 = nn.ReLU()
            self.pool2 = nn.MaxPool2d(kernel_size=2)
            self.cbam2 = CBAM(in_channels=64)  # CBAM after the second conv block
            
            # ---------------------- Third conv block (with CBAM) ----------------------
            self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
            self.bn3 = nn.BatchNorm2d(128)
            self.relu3 = nn.ReLU()
            self.pool3 = nn.MaxPool2d(kernel_size=2)
            self.cbam3 = CBAM(in_channels=128)  # CBAM after the third conv block
            
            # ---------------------- Fully connected layers ----------------------
            self.fc1 = nn.Linear(128 * 4 * 4, 512)  # 32x32 input halved by three poolings -> 4x4
            self.dropout = nn.Dropout(p=0.5)
            self.fc2 = nn.Linear(512, 10)
    
        def forward(self, x):
            # first conv block
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu1(x)
            x = self.pool1(x)
            x = self.cbam1(x)  # apply CBAM
            
            # second conv block
            x = self.conv2(x)
            x = self.bn2(x)
            x = self.relu2(x)
            x = self.pool2(x)
            x = self.cbam2(x)  # apply CBAM
            
            # third conv block
            x = self.conv3(x)
            x = self.bn3(x)
            x = self.relu3(x)
            x = self.pool3(x)
            x = self.cbam3(x)  # apply CBAM
            
            # fully connected layers
            x = x.view(-1, 128 * 4 * 4)
            x = self.fc1(x)
            x = self.relu3(x)  # ReLU has no parameters, so reusing relu3 here is harmless
            x = self.dropout(x)
            x = self.fc2(x)
            
            return x
    
    # Instantiate the model and move it to the device
    model = CBAM_CNN().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=3, factor=0.5)
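Before launching training, a quick sanity check (a sketch; it assumes the model and device objects from the setup code in this post): one dummy CIFAR-10-sized batch should come out as 10 logits per image.

    # Dummy forward pass: 32x32 RGB in, 10 class logits out
    model.eval()                          # avoid updating BatchNorm statistics
    with torch.no_grad():
        dummy = torch.randn(2, 3, 32, 32).to(device)
        print(model(dummy).shape)         # torch.Size([2, 10])
    model.train()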

Training (CNN + CBAM)

Reuse the data processing and training code from before:

    # Setup
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Configure fonts so matplotlib can render Chinese characters
    plt.rcParams["font.family"] = ["SimHei"]
    plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
    
    # Use the GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")
    
    # Data preprocessing (same as the original code)
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.RandomRotation(15),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    ])
    
    test_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    ])
    
    # Load the datasets (same as the original code)
    train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
    test_dataset = datasets.CIFAR10(root='./data', train=False, transform=test_transform)
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
    # Training
    def train(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs):
        all_iter_losses = []
        iter_indices = []
        train_acc_history = []
        test_acc_history = []
        train_loss_history = []
        test_loss_history = []
        
        for epoch in range(epochs):
            model.train()  # re-enable train mode each epoch (eval mode is set during testing below)
            running_loss = 0.0
            correct = 0
            total = 0
            
            for batch_idx, (data, target) in enumerate(train_loader):
                data, target = data.to(device), target.to(device)
                optimizer.zero_grad()
                output = model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
                
                iter_loss = loss.item()
                all_iter_losses.append(iter_loss)
                iter_indices.append(epoch * len(train_loader) + batch_idx + 1)
                
                running_loss += iter_loss
                _, predicted = output.max(1)
                total += target.size(0)
                correct += predicted.eq(target).sum().item()
                
                if (batch_idx + 1) % 100 == 0:
                    print(f'Epoch: {epoch+1}/{epochs} | Batch: {batch_idx+1}/{len(train_loader)} '
                          f'| Batch loss: {iter_loss:.4f} | Running avg loss: {running_loss/(batch_idx+1):.4f}')
            
            epoch_train_loss = running_loss / len(train_loader)
            epoch_train_acc = 100. * correct / total
            train_acc_history.append(epoch_train_acc)
            train_loss_history.append(epoch_train_loss)
            
            # Evaluation phase
            model.eval()
            test_loss = 0
            correct_test = 0
            total_test = 0
            
            with torch.no_grad():
                for data, target in test_loader:
                    data, target = data.to(device), target.to(device)
                    output = model(data)
                    test_loss += criterion(output, target).item()
                    _, predicted = output.max(1)
                    total_test += target.size(0)
                    correct_test += predicted.eq(target).sum().item()
            
            epoch_test_loss = test_loss / len(test_loader)
            epoch_test_acc = 100. * correct_test / total_test
            test_acc_history.append(epoch_test_acc)
            test_loss_history.append(epoch_test_loss)
            
            scheduler.step(epoch_test_loss)
            
            print(f'Epoch {epoch+1}/{epochs} done | Train acc: {epoch_train_acc:.2f}% | Test acc: {epoch_test_acc:.2f}%')
        
        plot_iter_losses(all_iter_losses, iter_indices)
        plot_epoch_metrics(train_acc_history, test_acc_history, train_loss_history, test_loss_history)
        
        return epoch_test_acc
    
    # Plotting helpers
    def plot_iter_losses(losses, indices):
        plt.figure(figsize=(10, 4))
        plt.plot(indices, losses, 'b-', alpha=0.7, label='Iteration Loss')
        plt.xlabel('Iteration')
        plt.ylabel('Loss')
        plt.title('The Loss of Each Iteration')
        plt.legend()
        plt.grid(True)
        plt.tight_layout()
        plt.show()
    
    # Plot per-epoch accuracy and loss curves
    def plot_epoch_metrics(train_acc, test_acc, train_loss, test_loss):
        epochs = range(1, len(train_acc) + 1)
        
        plt.figure(figsize=(12, 4))
        
        # accuracy curves
        plt.subplot(1, 2, 1)
        plt.plot(epochs, train_acc, 'b-', label='Train Accuracy')
        plt.plot(epochs, test_acc, 'r-', label='Test Accuracy')
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy (%)')
        plt.title('Accuracy Curve')
        plt.legend()
        plt.grid(True)
        
        # loss curves
        plt.subplot(1, 2, 2)
        plt.plot(epochs, train_loss, 'b-', label='Train Loss')
        plt.plot(epochs, test_loss, 'r-', label='Test Loss')
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.title('Loss Curve')
        plt.legend()
        plt.grid(True)
        
        plt.tight_layout()
        plt.show()
    
    # Run training
    epochs = 50
    print("Starting training of the CNN with CBAM...")
    final_accuracy = train(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs)
    print(f"Training complete! Final test accuracy: {final_accuracy:.2f}%")
    
    # # Save the model
    # torch.save(model.state_dict(), 'cifar10_cbam_cnn_model.pth')
    # print("Model saved as: cifar10_cbam_cnn_model.pth")

Assignment: CNN + CBAM + TensorBoard

1. TensorBoard monitoring

Reuse the earlier code:

    # Training (with TensorBoard)
    from torch.utils.tensorboard import SummaryWriter  
    import torchvision
    import time
    import os 
    # ======================== TensorBoard setup ========================
    # Create the TensorBoard log directory (auto-versioned to avoid collisions)
    log_dir = "runs/cifar10_cnn_cbam"
    if os.path.exists(log_dir):
        version = 1
        while os.path.exists(f"{log_dir}_v{version}"):
            version += 1
        log_dir = f"{log_dir}_v{version}"
    writer = SummaryWriter(log_dir)  # initialize the SummaryWriter
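    # Tip (an aside, assuming the standard TensorBoard Jupyter integration):
    # inside a notebook the dashboard can be embedded inline instead of
    # launching it from a terminal:
    #   %load_ext tensorboard
    #   %tensorboard --logdir runs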
    
    # Train the model (with TensorBoard logging)
    def train(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs, writer):
        global_step = 0  # global step counter for TensorBoard scalars

        # (Optional) log the model graph: run one real batch through the model so TensorBoard can trace it
        dataiter = iter(train_loader)
        images, labels = next(dataiter)
        images = images.to(device)
        writer.add_graph(model, images)  # write the model graph to TensorBoard

        # (Optional) log training images to visualize the augmented inputs
        img_grid = torchvision.utils.make_grid(images[:8].cpu())  # first 8 images
        writer.add_image('Training Images (augmented)', img_grid, global_step=0)
    
        for epoch in range(epochs):
            model.train()  # re-enable train mode each epoch (set to eval during testing below)
            running_loss = 0.0
            correct = 0
            total = 0

            # start timing this epoch
            epoch_start = time.time()
            
            for batch_idx, (data, target) in enumerate(train_loader):
                data, target = data.to(device), target.to(device)
                
                optimizer.zero_grad()
                output = model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
    
                # accumulate accuracy statistics
                running_loss += loss.item()
                _, predicted = output.max(1)
                total += target.size(0)
                correct += predicted.eq(target).sum().item()
    
                # ======================== TensorBoard scalar logging ========================
                # Print console logs every 100 batches (same as the earlier code)
                if (batch_idx + 1) % 100 == 0:
                    batch_loss = loss.item()
                    batch_acc = 100. * correct / total

                    # log scalars (loss / accuracy)
                    writer.add_scalar('Train/Batch Loss', batch_loss, global_step)
                    writer.add_scalar('Train/Batch Accuracy', batch_acc, global_step)
                    # log the learning rate (optional)
                    writer.add_scalar('Train/Learning Rate', optimizer.param_groups[0]['lr'], global_step)
                    
                    print(f'Epoch: {epoch+1}/{epochs} | Batch: {batch_idx+1}/{len(train_loader)} '
                            f'| Batch loss: {batch_loss:.4f} | Running avg loss: {running_loss/(batch_idx+1):.4f}')
    
                    # log parameter histograms every 200 batches (optional; slightly slower)
                    if (batch_idx + 1) % 200 == 0:
                        for name, param in model.named_parameters():
                            writer.add_histogram(f'Weights/{name}', param, global_step)
                            if param.grad is not None:
                                writer.add_histogram(f'Gradients/{name}', param.grad, global_step)
    
                global_step += 1  # increment the global step
    
            # epoch-level training metrics
            epoch_train_loss = running_loss / len(train_loader)
            epoch_train_acc = 100. * correct / total
            # ======================== TensorBoard epoch-level logging ========================
            writer.add_scalar('Train/Epoch Loss', epoch_train_loss, epoch)
            writer.add_scalar('Train/Epoch Accuracy', epoch_train_acc, epoch)
    
            # Evaluation phase
            model.eval()
            test_loss = 0
            correct_test = 0
            total_test = 0
            wrong_images = []  # store misclassified samples (for visualization)
            wrong_labels = []
            wrong_preds = []
    
            with torch.no_grad():
                for data, target in test_loader:
                    data, target = data.to(device), target.to(device)
                    output = model(data)
                    test_loss += criterion(output, target).item()
                    _, predicted = output.max(1)
                    total_test += target.size(0)
                    correct_test += predicted.eq(target).sum().item()
    
                    # collect misclassified samples (for visualization)
                    wrong_mask = (predicted != target)
                    if wrong_mask.sum() > 0:
                        wrong_batch_images = data[wrong_mask][:8].cpu()  # keep at most 8 per batch
                        wrong_batch_labels = target[wrong_mask][:8].cpu()
                        wrong_batch_preds = predicted[wrong_mask][:8].cpu()
                        wrong_images.extend(wrong_batch_images)
                        wrong_labels.extend(wrong_batch_labels)
                        wrong_preds.extend(wrong_batch_preds)
    
            # epoch-level test metrics
            epoch_test_loss = test_loss / len(test_loader)
            epoch_test_acc = 100. * correct_test / total_test
    
            # ======================== TensorBoard test-set logging ========================
            writer.add_scalar('Test/Epoch Loss', epoch_test_loss, epoch)
            writer.add_scalar('Test/Epoch Accuracy', epoch_test_acc, epoch)
    
            # compute per-epoch throughput
            epoch_end = time.time()
            epoch_duration = epoch_end - epoch_start
            samples_per_epoch = len(train_loader.dataset) # total samples processed per epoch
            epoch_speed = samples_per_epoch / epoch_duration
            # log the throughput
            writer.add_scalar('Train/Epoch_Speed', epoch_speed, epoch)
    
            # (Optional) visualize misclassified samples
            # if wrong_images:
            #     wrong_img_grid = torchvision.utils.make_grid(wrong_images)
            #     writer.add_image('Misclassified Samples', wrong_img_grid, epoch)
            #     # write the misclassified labels as text (optional)
            #     wrong_text = [f"true: {classes[wl]}, pred: {classes[wp]}" 
            #                  for wl, wp in zip(wrong_labels, wrong_preds)]
            #     writer.add_text('Misclassified Labels', '\n'.join(wrong_text), epoch)
    
            # step the learning-rate scheduler on the test loss
            scheduler.step(epoch_test_loss)

            print(f'Epoch {epoch+1}/{epochs} done | Train acc: {epoch_train_acc:.2f}% | Test acc: {epoch_test_acc:.2f}%')
    
        # close the TensorBoard writer
        writer.close()
        return epoch_test_acc
    
    # (Optional) CIFAR-10 class names
    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    
    # Run training (passing the TensorBoard writer)
    epochs = 50
    print("Starting training of the CNN with CBAM...")
    print(f"TensorBoard log directory: {log_dir}")
    print("After training, run: tensorboard --logdir=runs to view the dashboards")
    
    final_accuracy = train(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs, writer)
    print(f"Training complete! Final test accuracy: {final_accuracy:.2f}%")

2. Checking the parameter count

Use torchsummary.summary only after training is complete.

Note: checking the parameter count before training can disturb the run, because summary performs a forward pass and, for example, updates the BatchNorm running statistics.

    # Check the model's parameter count
    from torchsummary import summary
    model = CBAM_CNN().to(device)
    summary(model, input_size=(3, 32, 32))  # CIFAR-10 input size

The model has 1,153,584 parameters in total.
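If you want a count before training without touching the BatchNorm statistics, a minimal sketch that sums parameter sizes directly (no forward pass, pure PyTorch; assumes the model instance from above):

    # Count parameters without a forward pass, so no BatchNorm stats change
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"total params: {total:,} | trainable: {trainable:,}")

    # Per-module breakdown, e.g. to see how few parameters each CBAM adds
    for name, module in model.named_children():
        print(f"{name}: {sum(p.numel() for p in module.parameters()):,}")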
