This article introduces a new improvement mechanism, the Spatially-Adaptive Feature Modulation (SAFM) module, and explains how to apply it to YOLOv11 to improve model performance. In brief, SAFM splits the normalized input features along the channel dimension, processes each part differently to generate multi-scale features, aggregates them with a convolution, and normalizes the result with the GELU function to obtain an attention map, which is then used to adaptively modulate the original features; this lets the module learn long-range dependencies from multi-scale feature representations, which aids image reconstruction. We then discuss its structure in detail and show how to combine the SAFM module with YOLOv11 to improve object detection performance.
Code: YOLOv8_improve/YOLOv11.md at master · tgf123/YOLOv8_improve
Video tutorial: YOLOv11模型改进讲解,教您如何修改YOLOv11_哔哩哔哩_bilibili


1. SAFM (Spatially-Adaptive Feature Modulation) Structure
SAFM is a spatially-adaptive feature modulation mechanism originally proposed for image super-resolution. By dynamically modulating the input features, SAFM adaptively selects representative feature representations, which helps recover image detail and sharpness. The core idea is to process features from a multi-scale perspective while taking both local and global information into account, so that the resulting features are more discriminative and robust.
- Channel split and initial processing
  - The normalized input features are first split along the channel dimension into four components.
  - One component is processed with a depth-wise convolution.
  - The remaining three components are fed into a multi-scale feature generation unit, where downsampling, convolution, and upsampling operations produce multi-scale features.
- Multi-scale feature aggregation
  - Adaptive max pooling is applied to the input features to generate further multi-scale features.
  - These multi-scale features are then fused by a convolution, aggregating local and global relationships.
- Feature modulation
  - The aggregated features are passed through the GELU non-linearity, which acts as the normalization, to estimate an attention map.
  - Finally, the original input features are adaptively modulated according to the estimated attention (see the shape walk-through right after this list).
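To make the data flow concrete, here is a minimal, self-contained sketch of the split → downsample → attention → modulate pipeline. It follows the simplified two-branch variant that the code in section 2 implements (the paper's full version uses the four-way split described above); the 64-channel input and all sizes are illustrative assumptions:

import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)                        # normalized input features
x0, x1 = x.split([16, 48], dim=1)                     # channel split (ratio 4)

s = F.adaptive_max_pool2d(x0, (4, 4))                 # downsample the modulation branch
# (a depth-wise convolution would process `s` here in the real module)
s = F.interpolate(s, size=(32, 32), mode='bilinear')  # upsample back to input size
attn = F.gelu(s)                                      # GELU acts as the normalization -> attention map
x0 = attn * x0                                        # adaptively modulate the original features

out = torch.cat([x1, x0], dim=1)                      # re-assemble both branches
print(out.shape)                                      # torch.Size([1, 64, 32, 32])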
2. SAFM Code
import torch
import torch.nn as nn
import torch.nn.functional as F


# https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Spatially-Adaptive_Feature_Modulation_for_Efficient_Image_Super-Resolution_ICCV_2023_paper.pdf
class SimpleSAFM(nn.Module):
    """Simplified SAFM: one modulation branch, one identity branch."""

    def __init__(self, dim, ratio=4):
        super().__init__()
        self.dim = dim
        self.chunk_dim = dim // ratio  # channels assigned to the modulation branch

        self.proj = nn.Conv2d(dim, dim, 3, 1, 1, bias=False)
        self.dwconv = nn.Conv2d(self.chunk_dim, self.chunk_dim, 3, 1, 1, groups=self.chunk_dim, bias=False)
        self.out = nn.Conv2d(dim, dim, 1, 1, 0, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
        h, w = x.size()[-2:]
        x0, x1 = self.proj(x).split([self.chunk_dim, self.dim - self.chunk_dim], dim=1)

        # Modulation branch: downsample -> depth-wise conv -> upsample -> GELU attention
        x2 = F.adaptive_max_pool2d(x0, (h // 8, w // 8))
        x2 = self.dwconv(x2)
        x2 = F.interpolate(x2, size=(h, w), mode='bilinear')
        x2 = self.act(x2) * x0  # adaptively modulate the original branch

        x = torch.cat([x1, x2], dim=1)
        x = self.out(self.act(x))
        return x


# Convolutional Channel Mixer
class CCM(nn.Module):
    def __init__(self, dim, ffn_scale, use_se=False):
        super().__init__()
        self.use_se = use_se
        hidden_dim = int(dim * ffn_scale)

        self.conv1 = nn.Conv2d(dim, hidden_dim, 3, 1, 1, bias=False)
        self.conv2 = nn.Conv2d(hidden_dim, dim, 1, 1, 0, bias=False)
        self.act = nn.GELU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.conv2(x)
        return x


class AttBlock(nn.Module):
    """SimpleSAFM + channel mixer, wrapped in a residual connection."""

    def __init__(self, dim, ffn_scale, use_se=False):
        super().__init__()
        self.conv1 = SimpleSAFM(dim, ratio=3)
        self.conv2 = CCM(dim, ffn_scale, use_se)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        return out + x


class SAFMNPP(nn.Module):
    def __init__(self, input_dim, dim, n_blocks=6, ffn_scale=1.5, use_se=False, upscaling_factor=2):
        super().__init__()
        self.scale = upscaling_factor

        self.to_feat = nn.Conv2d(input_dim, dim, 3, 1, 1, bias=False)
        self.feats = nn.Sequential(*[AttBlock(dim, ffn_scale, use_se) for _ in range(n_blocks)])
        self.to_img = nn.Sequential(
            nn.Conv2d(dim, input_dim * upscaling_factor ** 2, 3, 1, 1, bias=False),
            nn.PixelShuffle(upscaling_factor),  # restores input_dim channels at upscaled resolution
        )

    def forward(self, x):
        res = F.interpolate(x, scale_factor=self.scale, mode='bilinear', align_corners=False)
        x = self.to_feat(x)
        x = self.feats(x)
        return self.to_img(x) + res  # long residual: upsampled input + reconstructed detail


if __name__ == '__main__':
    ############# Test Model Complexity #############
    # from fvcore.nn import flop_count_table, FlopCountAnalysis, ActivationCountAnalysis
    x = torch.randn(1, 256, 8, 8)
    model = SAFMNPP(256, dim=256, n_blocks=6, ffn_scale=1.5, upscaling_factor=2)
    print(model)
    # print(flop_count_table(FlopCountAnalysis(model, x), activations=ActivationCountAnalysis(model, x)))

    output = model(x)
    print(output.shape)  # expected: torch.Size([1, 256, 16, 16])
3. Integrating SAFM into YOLOv11
Step 1: Copy the core code above into the D:\model\yolov11\ultralytics\change_model\ directory.
Step 2: Import the SAFM module in task.py.
Step 3: Register the module in the model-configuration (parse_model) section of task.py; a sketch of steps 2 and 3 follows.
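The original post shows these two edits as screenshots. As a stand-in, here is a minimal sketch of what they typically look like; the file name SAFM.py and the exact placement inside parse_model are assumptions that must be adapted to your local copy of the repository:

# Step 2 -- near the top of ultralytics/nn/tasks.py (assuming the code was
# saved as ultralytics/change_model/SAFM.py; adjust the import to your file name):
from ultralytics.change_model.SAFM import SAFMNPP

# Step 3 -- inside parse_model(), alongside the other per-module branches.
# A sketch of the usual pattern, kept as comments because it only makes
# sense in the context of that function:
#     elif m is SAFMNPP:
#         c1 = ch[f]          # input channels from the preceding layer
#         c2 = c1             # PixelShuffle restores input_dim channels on output
#         args = [c1, *args]  # becomes SAFMNPP(c1, 512) for the YAML below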
Step 4: Copy the model configuration below into a YOLOv11 YAML file (the training script in step 5 expects yolo11_SAFM.yaml).
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13
  - [-1, 1, SAFMNPP, [512]] # 14
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
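Note how the SAFMNPP layer (14) sits where the standard YOLOv11 head has its second nn.Upsample: with the default upscaling_factor=2 it doubles the P4 feature map to P3 resolution while preserving the channel count, so the following Concat with backbone layer 4 still lines up. A quick standalone check of that shape behavior, using the SAFMNPP class from section 2 (the 40x40 input size and n_blocks=1 are arbitrary assumptions for the test):

import torch
# assumes SAFMNPP from section 2 is importable, e.g. from the file created in step 1
m = SAFMNPP(input_dim=512, dim=512, n_blocks=1, ffn_scale=1.5, upscaling_factor=2)
x = torch.randn(1, 512, 40, 40)  # a P4-sized feature map (example size)
print(m(x).shape)                # torch.Size([1, 512, 80, 80]): channels kept, resolution doubled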
Step 5: Run the training script below; if the integration succeeded, training starts without errors.
from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld

if __name__ == "__main__":
    # Build the model from the custom YAML file and load the pretrained weights
    model = YOLO(r"D:\model\yolov11\ultralytics\cfg\models\11\yolo11_SAFM.yaml") \
        .load(r"D:\model\yolov11\yolo11n.pt")  # build from YAML and transfer weights

    results = model.train(data=r"D:\model\yolov11\ultralytics\cfg\datasets\VOC_my.yaml",
                          epochs=300,
                          imgsz=640,
                          batch=64,
                          # cache=False,
                          # single_cls=False,  # whether this is single-class detection
                          # workers=0,
                          # resume=r"D:/model/yolov8/runs/detect/train/weights/last.pt",
                          # amp=True
                          )
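Once training finishes, a quick way to check the improved model is to evaluate the best checkpoint on the validation split. A short sketch; runs/detect/train is the Ultralytics default save location and an assumption here, so adjust it to your actual run directory:

from ultralytics.models import YOLO

model = YOLO(r"runs/detect/train/weights/best.pt")  # default save path (assumption)
metrics = model.val(data=r"D:\model\yolov11\ultralytics\cfg\datasets\VOC_my.yaml")
print(metrics.box.map50)  # mAP@0.5 on the validation set
print(metrics.box.map)    # mAP@0.5:0.95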