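Introduction
To improve YOLOv8's multi-scale feature fusion, this post borrows HyperC2Net, the scale-fusion scheme proposed in Hyper-YOLO (TPAMI 2025), to rework the Neck of YOLOv8. HyperC2Net helps pass high-order messages across both semantic levels and spatial positions, which strengthens the Neck's ability to extract high-order features. It fuses the feature maps from five stages to build a hypergraph structure, merges that structure into $B_3$, $B_4$ and $B_5$ respectively, and finally propagates the high-order information through a Bottom-Up path. The experimental results are shown below (validated on the VOC dataset; 100 epochs, batch size 32, image size 640×640):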
Model | mAP50-95 | mAP50 | run time (h) | params (M) | inference time (ms) |
---|---|---|---|---|---|
YOLOv8 | 0.549 | 0.760 | 1.051 | 3.01 | 0.2+0.3(postprocess) |
YOLO11 | 0.553 | 0.757 | 1.142 | 2.59 | 0.2+0.3(postprocess) |
yolov8_HyperC2Net | 0.561 | 0.770 | 1.229 | 3.29 | 0.4+0.3(postprocess) |
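Important note: the effect of this modification is dataset-dependent. It may simply not suit the dataset I used, yet still be effective on other datasets.
The purpose of this post is to lower the difficulty of porting code from recent research into YOLO, as a reference for readers interested in the latest work.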
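Code Migration
Key Points
Step 1: Port the code
In the ultralytics framework, module code mostly lives under the ultralytics/nn folder. To keep our code separate from the official code, create a new extra_modules folder there and put our code in it, in a file named hyper_yolo.py.
The code is as follows: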
import torch
import torch.nn as nn

__all__ = ['HyperComputeModule']


class MessageAgg(nn.Module):
    """Aggregate messages along a 0/1 incidence matrix, by mean or by sum."""

    def __init__(self, agg_method="mean"):
        super().__init__()
        self.agg_method = agg_method

    def forward(self, X, path):
        """
        X: [batch, n_node, dim]
        path: [batch, n_target, n_source], col(source) -> row(target)
        """
        X = torch.matmul(path, X)
        if self.agg_method == "mean":
            # Divide by the number of sources per target; targets with no source get 0.
            norm_out = 1 / torch.sum(path, dim=2, keepdim=True)
            norm_out[torch.isinf(norm_out)] = 0
            X = norm_out * X
            return X
        elif self.agg_method == "sum":
            pass
        return X


class HyPConv(nn.Module):
    """Hypergraph convolution: linear projection, then v -> e and e -> v mean aggregation."""

    def __init__(self, c1, c2):
        super().__init__()
        self.fc = nn.Linear(c1, c2)
        self.v2e = MessageAgg(agg_method="mean")
        self.e2v = MessageAgg(agg_method="mean")

    def forward(self, x, H):
        x = self.fc(x)
        # v -> e
        E = self.v2e(x, H.transpose(1, 2).contiguous())
        # e -> v
        x = self.e2v(E, H)
        return x


class HyperComputeModule(nn.Module):
    def __init__(self, c1, c2, threshold):
        super().__init__()
        self.threshold = threshold
        self.hgconv = HyPConv(c1, c2)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        b, c, h, w = x.shape[0], x.shape[1], x.shape[2], x.shape[3]
        # Flatten the map so every spatial position becomes one node: [b, h*w, c].
        x = x.view(b, c, -1).transpose(1, 2).contiguous()
        feature = x.clone()
        # Nodes closer than the threshold in feature space share a hyperedge.
        distance = torch.cdist(feature, feature)
        hg = distance < self.threshold
        hg = hg.float().to(x.device).to(x.dtype)
        # Residual connection; this sum and the reshape below require c2 == c1.
        x = self.hgconv(x, hg).to(x.device).to(x.dtype) + x
        x = x.transpose(1, 2).contiguous().view(b, c, h, w)
        x = self.act(self.bn(x))
        return x
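To make the two aggregation steps concrete, here is a minimal sketch on three toy nodes. It is not part of the module above: the `mean_agg` helper is a hypothetical stand-in for `MessageAgg` that avoids the division by zero with a clamp instead of masking infinities, and the threshold of 2.0 is chosen only for illustration.

```python
import torch

# Three toy nodes with 2-D features: the first two are close, the third is far away.
X = torch.tensor([[[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]])  # [batch=1, n_node=3, dim=2]
threshold = 2.0  # illustrative value, not the one used in the config

# Incidence matrix H[b, v, e]: node v joins the hyperedge centred on node e
# when their Euclidean distance is below the threshold.
H = (torch.cdist(X, X) < threshold).float()


def mean_agg(X, path):
    """Average the rows of X selected by each row of `path` (empty rows give zeros)."""
    out = torch.matmul(path, X)
    deg = path.sum(dim=2, keepdim=True)
    return torch.where(deg > 0, out / deg.clamp(min=1), torch.zeros_like(out))


E = mean_agg(X, H.transpose(1, 2))  # vertex -> hyperedge
V = mean_agg(E, H)                  # hyperedge -> vertex
print(H[0])
print(V[0])
```

Nodes 0 and 1 end up in the same hyperedges, so after one round of message passing both carry their shared mean, while the isolated node 2 keeps its own feature.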
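Step 2: Create the package and import the module
Next, add an __init__.py file to the current directory (extra_modules) and import the new module in it, so that callers can simply write from extra_modules import *. The __init__.py file should contain the following: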
from .hyper_yolo import HyperComputeModule
The resulting directory structure is shown below:
nn/
└── extra_modules/
    ├── __init__.py
    └── hyper_yolo.py
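If it is unclear what the star-import actually exposes, the following sketch rebuilds the same layout in a temporary directory and inspects the result. Everything here is a stand-in: the package name `demo_extra_modules` and the empty classes are hypothetical and only mimic the structure above.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the layout above (names are stand-ins).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_extra_modules"
pkg.mkdir()
(pkg / "hyper_yolo.py").write_text(
    "__all__ = ['HyperComputeModule']\n"
    "class HyperComputeModule:\n"
    "    pass\n"
    "class HyPConv:\n"
    "    pass\n"
)
(pkg / "__init__.py").write_text("from .hyper_yolo import HyperComputeModule\n")

sys.path.insert(0, str(root))

# Run the star-import in a scratch namespace and list what it brought in.
ns = {}
exec("from demo_extra_modules import *", ns)
exported = sorted(name for name in ns if not name.startswith("__"))
print(exported)
```

Because this `__init__.py` defines no `__all__` of its own, the star-import picks up every public name in the package namespace: the re-exported class and the `hyper_yolo` submodule itself, but not `HyPConv`, which was never imported into `__init__.py`.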
Step 3: Modify tasks.py
First, add the following import to tasks.py:
from ultralytics.nn.extra_modules import *
Then find the parse_model() function and locate the following code inside it:
if m in base_modules:
    c1, c2 = ch[f], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
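On older ultralytics versions, this condition may not use base_modules at all but an inline set listing the relevant modules; in that case, add the new module directly to that set. Otherwise, find the set that base_modules refers to and add it there, as follows: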
base_modules = frozenset(
    {
        Classify, Conv, ConvTranspose, GhostConv, Bottleneck, GhostBottleneck,
        SPP, SPPF, C2fPSA, C2PSA, DWConv, Focus, BottleneckCSP, C1, C2, C2f, C3k2,
        RepNCSPELAN4, ELAN1, ADown, AConv, SPPELAN, C2fAttn, C3, C3TR, C3Ghost,
        torch.nn.ConvTranspose2d, DWConvTranspose2d, C3x, RepC3, PSA, SCDown, C2fCIB,
        A2C2f,
        # custom module
        HyperComputeModule,
    }
)
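Registering the module in base_modules matters because of how the YAML arguments are then rewritten. The sketch below traces that for an entry such as `[-1, 1, HyperComputeModule, [512, 6]]`. It rests on two assumptions: `make_divisible` is a local re-implementation that rounds up to a multiple of the divisor, and `constructor_args` is a hypothetical helper reflecting my understanding that parse_model prepends the input channels and keeps the remaining arguments unchanged. The `c2 != nc` check is skipped because 512 is not the class count.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor (local stand-in for the ultralytics helper)."""
    return math.ceil(x / divisor) * divisor


def scale_channels(c2, width, max_channels):
    """Apply the width-scaling line from the parse_model excerpt."""
    return make_divisible(min(c2, max_channels) * width, 8)


def constructor_args(c1, yaml_args, width, max_channels):
    """Assumed rewrite for base_modules entries: [c2, *rest] -> [c1, scaled_c2, *rest]."""
    c2 = scale_channels(yaml_args[0], width, max_channels)
    return [c1, c2, *yaml_args[1:]]


# Scale "n": width 0.25, max_channels 1024; the preceding 1x1 Conv outputs 128 channels.
args_n = constructor_args(c1=128, yaml_args=[512, 6], width=0.25, max_channels=1024)
# Scale "m": width 0.75, max_channels 768; the preceding 1x1 Conv outputs 384 channels.
args_m = constructor_args(c1=384, yaml_args=[512, 6], width=0.75, max_channels=768)
print(args_n, args_m)
```

Under these assumptions the module is constructed with equal input and output channels at every scale, and the trailing 6 reaches the constructor untouched as the distance threshold.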
Step 4: Modify the config file
Add the following to the model's YAML config file in the corresponding places.
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 129 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPS
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 129 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPS
  m: [0.67, 0.75, 768] # YOLOv8m summary: 169 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPS
  l: [1.00, 1.00, 512] # YOLOv8l summary: 209 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPS
  x: [1.00, 1.25, 512] # YOLOv8x summary: 209 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPS

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-B1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1
  - [-1, 3, C2f, [128, True]] # 2-B2/4
  - [-1, 1, Conv, [256, 3, 2]] # 3
  - [-1, 6, C2f, [256, True]] # 4-B3/8
  - [-1, 1, Conv, [512, 3, 2]] # 5
  - [-1, 6, C2f, [512, True]] # 6-B4/16
  - [-1, 1, Conv, [1024, 3, 2]] # 7
  - [-1, 3, C2f, [1024, True]] # 8
  - [-1, 1, SPPF, [1024, 5]] # 9-B5/32

# YOLOv8.0n head
head:
  # Semantic Collecting
  - [0, 1, nn.AvgPool2d, [8, 8, 0]] # 10
  - [2, 1, nn.AvgPool2d, [4, 4, 0]] # 11
  - [4, 1, nn.AvgPool2d, [2, 2, 0]] # 12
  - [9, 1, nn.Upsample, [None, 2, 'nearest']] # 13
  - [[10, 11, 12, 6, 13], 1, Concat, [1]] # cat 14

  # Hypergraph Computation
  - [-1, 1, Conv, [512, 1, 1]] # 15
  - [-1, 1, HyperComputeModule, [512, 6]] # 16
  - [-1, 3, C2f, [512, True]] # 17

  # Semantic Collecting
  - [-1, 1, nn.AvgPool2d, [2, 2, 0]] # 18
  - [[-1, 9], 1, Concat, [1]] # cat 19
  - [-1, 1, Conv, [1024, 1, 1]] # 20 P5

  - [[17, 6], 1, Concat, [1]] # cat 21
  - [-1, 3, C2f, [512]] # 22 P4

  - [17, 1, nn.Upsample, [None, 2, 'nearest']] # 23
  - [[-1, 4], 1, Concat, [1]] # cat 24
  - [-1, 3, C2f, [256]] # 25 P3/N3

  - [-1, 1, Conv, [256, 3, 2]] # 26
  - [[-1, 22], 1, Concat, [1]] # 27 cat
  - [-1, 3, C2f, [512]] # 28 N4

  - [-1, 1, Conv, [512, 3, 2]] # 29
  - [[-1, 20], 1, Concat, [1]] # 30 cat
  - [-1, 3, C2f, [1024]] # 31 N5

  - [[25, 28, 31], 1, Detect, [nc]] # Detect(N3, N4, N5)
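A quick sanity check on the first Semantic Collecting block is to confirm that all five branches reach the same resolution before the Concat at layer 14. The sketch below does that arithmetic for scale n at a 640×640 input. The branch table is transcribed by hand from the config above, and `scaled` is a hypothetical helper that applies the same round-up-to-a-multiple-of-8 rule as parse_model.

```python
import math

IMG_SIZE = 640
WIDTH, MAX_CHANNELS = 0.25, 1024  # scale "n" from the config above


def scaled(c):
    """Width-scale a nominal channel count and round up to a multiple of 8."""
    return math.ceil(min(c, MAX_CHANNELS) * WIDTH / 8) * 8


# (source layer, backbone stride, nominal channels, resize op, factor)
branches = [
    (0, 2, 64, "pool", 8),    # layer 10: AvgPool2d(8, 8)
    (2, 4, 128, "pool", 4),   # layer 11: AvgPool2d(4, 4)
    (4, 8, 256, "pool", 2),   # layer 12: AvgPool2d(2, 2)
    (6, 16, 512, "keep", 1),  # layer 6 is used as-is
    (9, 32, 1024, "up", 2),   # layer 13: Upsample(scale_factor=2)
]

sizes, channels = [], []
for layer, stride, ch, op, k in branches:
    side = IMG_SIZE // stride
    if op == "pool":
        side //= k
    elif op == "up":
        side *= k
    sizes.append(side)
    channels.append(scaled(ch))

concat_channels = sum(channels)   # input channels of the 1x1 Conv at layer 15
n_nodes = sizes[0] * sizes[0]     # nodes seen by HyperComputeModule per image
print(sizes, channels, concat_channels, n_nodes)
```

Every branch lands on a 40×40 map (stride 16), so HyperComputeModule works on 1600 nodes per image and its pairwise distance matrix has 1600×1600 entries. That cost grows quickly with input resolution, which may partly explain the higher inference time in the results table.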