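Introduction
To improve the feature representation of YOLOv8 in object detection, this article borrows the Mix Aggregation Network (MANet) module proposed in Hyper-YOLO (TPAMI 2025) to improve YOLOv8's C2f module. MANet fuses three classic convolution variants: 1x1 convolution, depthwise separable convolution, and the C2f module, which strengthens the backbone's feature extraction. The experimental results are shown below (evaluated on the VOC dataset with 100 epochs, batch size 32, and image size 640x640):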
Model | mAP50-95 | mAP50 | run time (h) | params (M) | inference time (ms) |
---|---|---|---|---|---|
YOLOv8 | 0.549 | 0.760 | 1.051 | 3.01 | 0.2+0.3(postprocess) |
YOLO11 | 0.553 | 0.757 | 1.142 | 2.59 | 0.2+0.3(postprocess) |
YOLOv8_C2f-MANet | 0.572 | 0.782 | 1.402 | 3.59 | 0.4+0.3(postprocess) |
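
To put the table in perspective, the short sketch below computes the relative change of YOLOv8_C2f-MANet against the YOLOv8 baseline. The numbers are copied from the table; the helper name `relative_change` is only illustrative.

```python
# Rows copied from the results table above (baseline vs. modified model).
results = {
    "YOLOv8": {"mAP50-95": 0.549, "mAP50": 0.760, "run time (h)": 1.051, "params (M)": 3.01},
    "YOLOv8_C2f-MANet": {"mAP50-95": 0.572, "mAP50": 0.782, "run time (h)": 1.402, "params (M)": 3.59},
}


def relative_change(baseline, improved):
    """Percent change of each metric relative to the baseline, rounded to 0.1."""
    return {
        key: round((improved[key] - baseline[key]) / baseline[key] * 100, 1)
        for key in baseline
    }


change = relative_change(results["YOLOv8"], results["YOLOv8_C2f-MANet"])
for metric, pct in change.items():
    print(f"{metric}: {pct:+.1f}%")
```

In other words, the accuracy gain of a few percent comes with noticeably more parameters and a longer training run, so the trade-off is worth checking on your own data.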
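Important note: the effect of this modification depends on the dataset. If it brings no gain on one dataset, such as the one used here, it may still be effective on others.
The purpose of this article is to lower the difficulty of porting code from recent research into YOLO, as a reference for readers interested in the latest work.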
Code migration
Key points
Step 1: Port the code
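The module code of the ultralytics framework lives mainly under the ultralytics/nn folder. To keep our changes separate from the official code, you can create a new extra_modules folder there and put the custom code in it.
The code below can also be added to the block.py file, since it is essentially a modification of the C2f module.
The code is as follows: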
# Assumes the surrounding block.py already provides torch, nn, Conv and Bottleneck.
class MANet(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=False, p=1, kernel_size=3, g=1, e=0.5):
        super().__init__()
        self.c = int(c2 * e)  # hidden channels per branch
        self.cv_first = Conv(c1, 2 * self.c, 1, 1)
        self.cv_final = Conv((4 + n) * self.c, c2, 1)  # fuses 4 branches + n bottleneck outputs
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))
        self.cv_block_1 = Conv(2 * self.c, self.c, 1, 1)  # 1x1 convolution branch
        dim_hid = int(p * 2 * self.c)
        # NOTE: GroupConv must be defined or imported separately;
        # the C2f_MANet variant in Step 2 uses DWConv in this position.
        self.cv_block_2 = nn.Sequential(
            Conv(2 * self.c, dim_hid, 1, 1),
            GroupConv(dim_hid, dim_hid, kernel_size, 1),
            Conv(dim_hid, self.c, 1, 1),
        )

    def forward(self, x):
        y = self.cv_first(x)
        y0 = self.cv_block_1(y)
        y1 = self.cv_block_2(y)
        y2, y3 = y.chunk(2, 1)  # C2f-style channel split
        y = list((y0, y1, y2, y3))
        y.extend(m(y[-1]) for m in self.m)  # each bottleneck consumes the previous output
        return self.cv_final(torch.cat(y, 1))
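The channel bookkeeping in MANet is easy to get wrong when changing n, e or p, so here is a small sketch that reproduces it in plain Python without torch. The function name `manet_channels` is hypothetical; the formulas are read directly from the class above.

```python
def manet_channels(c2, n=1, e=0.5, p=1):
    """Return (c, dim_hid, concat_channels) for the MANet block above."""
    c = int(c2 * e)             # width of every branch
    dim_hid = int(p * 2 * c)    # hidden width of the cv_block_2 branch
    branches = {
        "y0 (cv_block_1)": c,
        "y1 (cv_block_2)": c,
        "y2 (first half of split)": c,
        "y3 (second half of split)": c,
    }
    for i in range(n):          # one extra c-wide tensor per bottleneck
        branches[f"bottleneck_{i}"] = c
    concat_channels = sum(branches.values())
    # This must equal the input width of cv_final / cv2.
    assert concat_channels == (4 + n) * c
    return c, dim_hid, concat_channels


print(manet_channels(128, n=3, p=2))
```

The concatenated width is always (4 + n) * c, which is why the final 1x1 convolution is built with that many input channels.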
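Step 2: Create the module and import it
To stay consistent with the C2f variants defined in earlier articles, the code above is lightly rewritten as shown below. Create a new block.py file in the current directory (extra_modules) to manage the custom C2f modules in one place (alternatively, add the class directly to ultralytics/nn/modules/block.py). Its content is: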
import torch
import torch.nn as nn

from ..modules import C2f, Conv, DWConv


class C2f_MANet(C2f):
    def __init__(self, c1, c2, n=1, shortcut=False, p=1, kernel_size=3, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        # Replace C2f's output conv: 4 branches + n bottleneck outputs are concatenated.
        self.cv2 = Conv((4 + n) * self.c, c2, 1)
        self.cv_block_1 = Conv(2 * self.c, self.c, 1, 1)
        dim_hid = int(p * 2 * self.c)
        self.cv_block_2 = nn.Sequential(
            Conv(2 * self.c, dim_hid, 1, 1),
            DWConv(dim_hid, dim_hid, kernel_size, 1),
            Conv(dim_hid, self.c, 1, 1),
        )

    def forward(self, x):
        y = self.cv1(x)
        y0 = self.cv_block_1(y)
        y1 = self.cv_block_2(y)
        y2, y3 = y.chunk(2, 1)
        y = list((y0, y1, y2, y3))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        raise NotImplementedError
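To see where the extra parameters come from, the sketch below estimates the parameter count of a single C2f block and a single C2f_MANet block. It assumes that an ultralytics Conv is a bias-free Conv2d followed by BatchNorm2d (weight and bias), that DWConv uses one group per channel here, and that the bottlenecks use g=1; the helper names are hypothetical.

```python
def conv_params(c1, c2, k=1, g=1):
    """Bias-free Conv2d weights plus BatchNorm2d weight and bias (assumed Conv layout)."""
    return (c1 // g) * c2 * k * k + 2 * c2


def bottleneck_params(c, k=3):
    """Bottleneck(c, c, e=1.0): two k x k Conv blocks of width c."""
    return 2 * conv_params(c, c, k)


def c2f_params(c1, c2, n=1, e=0.5):
    c = int(c2 * e)
    return conv_params(c1, 2 * c) + conv_params((2 + n) * c, c2) + n * bottleneck_params(c)


def c2f_manet_params(c1, c2, n=1, p=1, kernel_size=3, e=0.5):
    c = int(c2 * e)
    dim_hid = int(p * 2 * c)
    cv1 = conv_params(c1, 2 * c)
    cv2 = conv_params((4 + n) * c, c2)           # wider concat than C2f
    cv_block_1 = conv_params(2 * c, c)           # extra 1x1 branch
    cv_block_2 = (
        conv_params(2 * c, dim_hid)
        + conv_params(dim_hid, dim_hid, kernel_size, g=dim_hid)  # depthwise conv
        + conv_params(dim_hid, c)
    )
    return cv1 + cv2 + n * bottleneck_params(c) + cv_block_1 + cv_block_2


base = c2f_params(64, 64, n=1)
manet = c2f_manet_params(64, 64, n=1, p=2, kernel_size=3)
print(base, manet, f"{manet / base:.2f}x")
```

Most of the increase comes from the two pointwise convolutions around the depthwise layer, so lowering p is the most direct way to trade accuracy for size under these assumptions.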
Once that is done, add an __init__.py file and import the new module in it, so that callers can simply write from extra_modules import *. The __init__.py file should contain:
from .block import C2f_MANet
The resulting directory structure is:
nn/
└── extra_modules/
    ├── __init__.py
    └── block.py
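If you prefer to create this layout from a script, the following is a minimal sketch using only the standard library. The function name `scaffold_extra_modules` is hypothetical, and the demo runs in a temporary directory; point it at your real ultralytics/nn folder when you use it.

```python
import tempfile
from pathlib import Path

INIT_CONTENT = "from .block import C2f_MANet\n"


def scaffold_extra_modules(nn_dir):
    """Create nn_dir/extra_modules with __init__.py and block.py if they are missing."""
    pkg = Path(nn_dir) / "extra_modules"
    pkg.mkdir(parents=True, exist_ok=True)
    init_file = pkg / "__init__.py"
    if not init_file.exists():      # never overwrite existing files
        init_file.write_text(INIT_CONTENT, encoding="utf-8")
    block_file = pkg / "block.py"
    if not block_file.exists():     # block.py is left empty; paste C2f_MANet into it
        block_file.touch()
    return pkg


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_extra_modules(Path(tmp) / "ultralytics" / "nn")
    print(sorted(p.name for p in created.iterdir()))
```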
Step 3: Modify the tasks.py file
First, add the following import to tasks.py:
from ultralytics.nn.extra_modules import *
Then locate the parse_model() function and find the following lines in it:
if m in base_modules:
    c1, c2 = ch[f], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
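In older ultralytics versions this condition may not use base_modules but an inline collection of the relevant modules; in that case, add the new module to that collection directly. Otherwise, find the set that base_modules refers to and add the module there, as follows: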
base_modules = frozenset(
    {
        Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck,
        SPP, SPPF, C2fPSA, C2PSA, DWConv, Focus, BottleneckCSP, C1, C2, C2f, C3k2,
        RepNCSPELAN4, ELAN1, ADown, AConv, SPPELAN, C2fAttn, C3, C3TR, C3Ghost,
        torch.nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB,
        A2C2f,
        # custom module
        C2f_MANet,
    }
)
Next, still inside parse_model(), find the following lines:
if m in repeat_modules:
    args.insert(2, n)  # number of repeats
    n = 1
As with base_modules, add the module to the set as follows:
repeat_modules = frozenset(  # modules with 'repeat' arguments
    {
        BottleneckCSP, C1, C2, C2f, C3k2, C2fAttn, C3, C3TR, C3Ghost, C3x, RepC3,
        C2fPSA, C2fCIB, C2PSA, A2C2f,
        # custom module
        C2f_MANet,
    }
)
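To make the role of the two sets concrete, here is a simplified stand-in for the argument handling quoted above. It is not the library's parse_model(); the function `resolve_args` and its `in_base` / `in_repeat` flags are hypothetical, and the depth rule `max(round(n * depth), 1)` and the `make_divisible` helper are reproduced here as assumptions about the ultralytics behaviour.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_args(c1, n, args, depth, width, max_channels, nc=80, in_base=True, in_repeat=True):
    """Mimic how YAML args become constructor args for one layer."""
    args = list(args)
    n = max(round(n * depth), 1) if n > 1 else n   # depth scaling of repeats
    if in_base:
        c2 = args[0]
        if c2 != nc:
            c2 = make_divisible(min(c2, max_channels) * width, 8)  # width scaling
        args = [c1, c2, *args[1:]]                 # prepend input/output channels
        if in_repeat:
            args.insert(2, n)                      # pass repeats into the module itself
            n = 1
    return n, args


# YAML entry [-1, 3, C2f_MANet, [128, True, 2, 3]] at scale 'n', with 32 input channels.
print(resolve_args(32, 3, [128, True, 2, 3], depth=0.33, width=0.25, max_channels=1024))
# Same entry if the module were not registered in either set.
print(resolve_args(32, 3, [128, True, 2, 3], depth=0.33, width=0.25, max_channels=1024, in_base=False))
```

With both registrations the constructor receives (c1, c2, n, shortcut, p, kernel_size) in the order C2f_MANet expects. Without them, the raw YAML list is passed through, so 128 would land in c1 and True in c2.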
Step 4: Modify the configuration file
Use C2f_MANet at the corresponding positions in the model YAML, as shown below.
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 129 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPS
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 129 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPS
  m: [0.67, 0.75, 768] # YOLOv8m summary: 169 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPS
  l: [1.00, 1.00, 512] # YOLOv8l summary: 209 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPS
  x: [1.00, 1.25, 512] # YOLOv8x summary: 209 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPS

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f_MANet, [128, True, 2, 3]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f_MANet, [256, True, 2, 3]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f_MANet, [512, True, 2, 3]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f_MANet, [1024, True, 2, 3]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f_MANet, [512, False, 2, 3]] # 12

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f_MANet, [256, False, 2, 3]] # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f_MANet, [512, False, 2, 3]] # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
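Because the config swaps modules in place, the layer indices used by Concat and Detect must still point at feature maps of the right scale. The sketch below is a plain-Python sanity check of that layout; the layer list is transcribed by hand from the YAML above, and the stride rules (every Conv here downsamples by 2, nn.Upsample upsamples by 2, everything else keeps the resolution) are simplifying assumptions for this particular config.

```python
# (from, module) pairs transcribed from the config above, in layer order.
LAYERS = [
    (-1, "Conv"), (-1, "Conv"), (-1, "C2f_MANet"), (-1, "Conv"), (-1, "C2f_MANet"),   # 0-4
    (-1, "Conv"), (-1, "C2f_MANet"), (-1, "Conv"), (-1, "C2f_MANet"), (-1, "SPPF"),   # 5-9
    (-1, "nn.Upsample"), ([-1, 6], "Concat"), (-1, "C2f_MANet"),                      # 10-12
    (-1, "nn.Upsample"), ([-1, 4], "Concat"), (-1, "C2f_MANet"),                      # 13-15
    (-1, "Conv"), ([-1, 12], "Concat"), (-1, "C2f_MANet"),                            # 16-18
    (-1, "Conv"), ([-1, 9], "Concat"), (-1, "C2f"),                                   # 19-21
    ([15, 18, 21], "Detect"),                                                         # 22
]


def next_stride(module, stride):
    if module == "Conv":          # every Conv in this config uses stride 2
        return stride * 2
    if module == "nn.Upsample":   # scale factor 2
        return stride // 2
    return stride


def check_layout(layers):
    """Return (per-layer strides, Detect input strides); raise on mismatched Concat inputs."""
    strides, detect_inputs = [], None
    for i, (src, module) in enumerate(layers):
        sources = [src] if isinstance(src, int) else src
        # Negative indices are relative to the current layer; the image itself has stride 1.
        indices = [i + s if s < 0 else s for s in sources]
        inputs = [1 if idx < 0 else strides[idx] for idx in indices]
        if module == "Detect":
            detect_inputs = inputs
            break
        if module == "Concat" and len(set(inputs)) != 1:
            raise ValueError(f"layer {i}: Concat inputs have strides {inputs}")
        strides.append(next_stride(module, inputs[0]))
    return strides, detect_inputs


strides, detect_inputs = check_layout(LAYERS)
print("Detect input strides:", detect_inputs)
```

If you insert or remove a layer while experimenting, rerunning a check like this quickly shows whether the Concat and Detect indices need to shift.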