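YOLOv11 Model Improvement - Module - Introducing the Spatially-Adaptive Feature Modulation (SAFM) Module to Improve Multi-Scale and Small-Object Detection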

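This article introduces a new improvement mechanism, the Spatially-Adaptive Feature Modulation (SAFM) module, and explains how to apply it to YOLOv11 with the aim of improving model performance. SAFM splits the normalized input features along the channel dimension, processes each part differently to generate multi-scale features, aggregates them with a convolution, and normalizes the result through a GELU function to obtain an attention map. It then uses this map to adaptively modulate the original features, so that long-range dependencies are learned from multi-scale feature representations, which helps image reconstruction. After that, we look at its structure in detail and at how to combine the SAFM module with YOLOv11 to improve object detection.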

Code: YOLOv8_improve/YOLOv11.md at master · tgf123/YOLOv8_improve

Video walkthrough: YOLOv11 model improvement explained, showing how to modify YOLOv11 (bilibili)

Original YOLOv11 model

Improved model

  

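1. SAFM (Spatially-Adaptive Feature Modulation) structure

SAFM is a spatially adaptive feature modulation mechanism that was originally designed for image super-resolution. By dynamically modulating the input features, SAFM adaptively selects representative features and so recovers image detail and sharpness better. At its core it processes features at multiple scales, taking both local and global information into account, which makes the resulting features more discriminative and robust.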

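  1. Channel split and initial processing
    • The normalized input features are first split along the channel dimension into four parts.
    • One part is processed by a depth-wise convolution.
    • The other three parts are fed into the multi-scale feature generation unit, which produces multi-scale features through downsampling, convolution, and upsampling.
  2. Multi-scale feature aggregation
    • Adaptive max pooling is applied to the input features to generate the additional multi-scale features.
    • These multi-scale features are then concatenated and passed through a convolution that aggregates local and global relationships.
  3. Feature modulation
    • The aggregated features are normalized through the GELU non-linearity to estimate the attention map.
    • Finally, the original input features are adaptively modulated by the estimated attention.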
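
The three steps above can be sketched as a single PyTorch layer. This is a minimal illustration rather than the code used later in this article: the class name `SAFMSketch`, the `n_levels=4` default, the halving of resolution per level, and nearest-neighbour upsampling are all assumptions, and the input is assumed to be normalized already.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SAFMSketch(nn.Module):
    """Illustrative four-way SAFM layer (names and defaults are assumptions)."""

    def __init__(self, dim, n_levels=4):
        super().__init__()
        assert dim % n_levels == 0, "dim must split evenly across levels"
        self.n_levels = n_levels
        chunk_dim = dim // n_levels
        # One depth-wise 3x3 convolution per channel group
        self.mfr = nn.ModuleList(
            [nn.Conv2d(chunk_dim, chunk_dim, 3, 1, 1, groups=chunk_dim)
             for _ in range(n_levels)]
        )
        self.aggr = nn.Conv2d(dim, dim, 1, 1, 0)  # 1x1 aggregation across groups
        self.act = nn.GELU()

    def forward(self, x):
        h, w = x.shape[-2:]
        chunks = x.chunk(self.n_levels, dim=1)  # step 1: channel split
        feats = []
        for i, (conv, c) in enumerate(zip(self.mfr, chunks)):
            if i == 0:
                s = conv(c)  # full-resolution depth-wise branch
            else:
                # Pool to 1/2**i resolution, filter, then upsample back to (h, w)
                size = (max(h // 2 ** i, 1), max(w // 2 ** i, 1))
                s = F.adaptive_max_pool2d(c, size)
                s = conv(s)
                s = F.interpolate(s, size=(h, w), mode="nearest")
            feats.append(s)
        # Step 2: aggregate; step 3: GELU gives the attention map
        attn = self.act(self.aggr(torch.cat(feats, dim=1)))
        return attn * x  # modulate the original input features


x = torch.randn(1, 64, 32, 32)
y = SAFMSketch(64)(x)
print(y.shape)
```

The layer keeps the input shape, which is what lets it be dropped between other blocks without changing the surrounding channel counts.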

2. SAFM code

import torch
import torch.nn as nn
import torch.nn.functional as F

# https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Spatially-Adaptive_Feature_Modulation_for_Efficient_Image_Super-Resolution_ICCV_2023_paper.pdf
class SimpleSAFM(nn.Module):
    def __init__(self, dim, ratio=4):
        super().__init__()
        self.dim = dim
        self.chunk_dim = dim // ratio

        self.proj = nn.Conv2d(dim, dim, 3, 1, 1, bias=False)
        self.dwconv = nn.Conv2d(self.chunk_dim, self.chunk_dim, 3, 1, 1, groups=self.chunk_dim, bias=False)
        self.out = nn.Conv2d(dim, dim, 1, 1, 0, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
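        h, w = x.size()[-2:]

        # Project, then split channels: x0 (dim // ratio channels) is modulated, x1 passes through
        x0, x1 = self.proj(x).split([self.chunk_dim, self.dim - self.chunk_dim], dim=1)

        # Coarse branch: pool x0 to 1/8 resolution, filter it depth-wise, upsample back.
        # This assumes h and w are at least 8; otherwise the pooled size is zero.
        x2 = F.adaptive_max_pool2d(x0, (h // 8, w // 8))
        x2 = self.dwconv(x2)
        x2 = F.interpolate(x2, size=(h, w), mode='bilinear')
        # GELU of the coarse features acts as an attention map on x0
        x2 = self.act(x2) * x0

        # Recombine both parts and mix channels with a 1x1 convolution
        x = torch.cat([x1, x2], dim=1)
        x = self.out(self.act(x))
        return x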


# Convolutional Channel Mixer
class CCM(nn.Module):
    def __init__(self, dim, ffn_scale, use_se=False):
        super().__init__()
        self.use_se = use_se
        hidden_dim = int(dim * ffn_scale)

        self.conv1 = nn.Conv2d(dim, hidden_dim, 3, 1, 1, bias=False)
        self.conv2 = nn.Conv2d(hidden_dim, dim, 1, 1, 0, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.conv2(x)
        return x


class AttBlock(nn.Module):
    def __init__(self, dim, ffn_scale, use_se=False):
        super().__init__()

        self.conv1 = SimpleSAFM(dim, ratio=3)

        self.conv2 = CCM(dim, ffn_scale, use_se)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        return out + x


class SAFMNPP(nn.Module):
    def __init__(self, input_dim, dim, n_blocks=6, ffn_scale=1.5, use_se=False, upscaling_factor=2):
        super().__init__()
        self.scale = upscaling_factor

        self.to_feat = nn.Conv2d(input_dim, dim, 3, 1, 1, bias=False)

        self.feats = nn.Sequential(*[AttBlock(dim, ffn_scale, use_se) for _ in range(n_blocks)])

        self.to_img = nn.Sequential(
            nn.Conv2d(dim, input_dim * upscaling_factor ** 2, 3, 1, 1, bias=False),
            nn.PixelShuffle(upscaling_factor)
        )

    def forward(self, x):
        res = F.interpolate(x, scale_factor=self.scale, mode='bilinear', align_corners=False)
        x = self.to_feat(x)
        x = self.feats(x)
        return self.to_img(x) + res


if __name__ == '__main__':
    #############Test Model Complexity #############
    # from fvcore.nn import flop_count_table, FlopCountAnalysis, ActivationCountAnalysis

    x = torch.randn(1, 256, 8, 8)

    model = SAFMNPP(256,dim=256, n_blocks=6, ffn_scale=1.5, upscaling_factor=2)
    print(model)
    # print(flop_count_table(FlopCountAnalysis(model, x), activations=ActivationCountAnalysis(model, x)))
    output = model(x)
    print(output.shape)  # torch.Size([1, 256, 16, 16]): same channels, 2x spatial size

3. Adding SAFM to YOLOv11

First: copy the core code from Section 2 into the directory D:\model\yolov11\ultralytics\change_model\, as shown in the figure below.

 

Second: import SAFM in tasks.py (ultralytics/nn/tasks.py).

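Third: add the SAFM module to the model-configuration part of tasks.py (the parse_model function), so that the YAML parser recognizes it.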
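
What such a branch has to do can be sketched without touching the library. The helper below is hypothetical: it is not part of Ultralytics and not necessarily the author's code. It only mirrors the channel bookkeeping needed for a YAML row such as `[-1, 1, SAFMNPP, [512]]`, assuming the constructor signature `SAFMNPP(input_dim, dim, ...)` from Section 2 and assuming the `512` is passed through without width scaling.

```python
def resolve_safmnpp_args(ch, f, yaml_args):
    """Hypothetical helper: channel bookkeeping for a SAFMNPP row.

    ch        -- output channels of the layers built so far
    f         -- 'from' index of the row (e.g. -1 for the previous layer)
    yaml_args -- args list from the YAML row (e.g. [512])
    Returns (c2, module_args).
    """
    c1 = ch[f]                      # channels arriving from the source layer
    module_args = [c1, *yaml_args]  # SAFMNPP(input_dim=c1, dim=yaml_args[0])
    c2 = c1                         # PixelShuffle brings the output back to input_dim channels
    return c2, module_args


c2, module_args = resolve_safmnpp_args(ch=[64, 128], f=-1, yaml_args=[512])
print(c2, module_args)
```

The point of the sketch is that the module's output width equals its input width, so the layer that follows SAFMNPP sees the same channel count it would have seen after a plain upsampling layer.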

Fourth: copy the following model configuration into the YOLOv11 YAML file.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]

  n: [ 0.50, 0.25, 1024 ] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [ 0.50, 0.50, 1024 ] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [ 0.50, 1.00, 512 ] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [ 1.00, 1.00, 512 ] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [ 1.00, 1.50, 512 ] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs


# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, SAFMNPP, [512]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
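
To check that the SAFMNPP row fits where an upsampling layer would normally sit, it helps to trace channel widths for the `n` scale. The sketch below assumes the usual Ultralytics width rule (cap the YAML value at max_channels, multiply by the width factor, round up to a multiple of 8); the helper names are illustrative and are not imported from the library.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_width(c, width, max_channels):
    """Channel count after applying a scale's width factor and cap (assumed rule)."""
    return make_divisible(min(c, max_channels) * width, 8)


# 'n' scale from the YAML above: [depth, width, max_channels] = [0.50, 0.25, 1024]
width, max_channels = 0.25, 1024

c_layer4 = scaled_width(512, width, max_channels)   # backbone C3k2 at P3 (layer 4)
c_layer13 = scaled_width(512, width, max_channels)  # head C3k2 at P4 (layer 13)

# SAFMNPP ends in PixelShuffle, so it returns its input channel count at 2x resolution
c_safm_out = c_layer13
c_concat = c_safm_out + c_layer4  # width fed into the C3k2 at layer 16

print(c_layer4, c_layer13, c_concat)
```

Under these assumptions the concat with backbone P3 sees 128 + 128 channels, and the spatial sizes match because SAFMNPP doubles the P4 resolution to P3.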

Fifth: run the training script; the model builds and trains successfully.


from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld

if __name__=="__main__":

    # Build the model from your own YOLO11 YAML file and load pretrained weights for training
    model = YOLO(r"D:\model\yolov11\ultralytics\cfg\models\11\yolo11_SAFM.yaml")\
        .load(r'D:\model\yolov11\yolo11n.pt')  # build from YAML and transfer weights

    results = model.train(data=r'D:\model\yolov11\ultralytics\cfg\datasets\VOC_my.yaml',
                          epochs=300,
                          imgsz=640,
                          batch=64,
                          # cache = False,
                          # single_cls = False,  # whether to train as single-class detection
                          # workers = 0,
                          # resume=r'D:/model/yolov8/runs/detect/train/weights/last.pt',
                          # amp = True
                          )
