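Introduction
To improve YOLOv8's feature representation in object detection, we borrow GatedCNNBlock, the core module proposed by MambaOut (CVPR 2025). The block removes the RNN-like state space model (SSM) and keeps only the efficient gated convolution structure, using a gating mechanism to strengthen cross-layer feature fusion. Compared with the conventional Bottleneck block, GatedCNNBlock adaptively selects key features, which makes the information flow more responsive across channels. The experimental results are shown below (evaluated on the VOC dataset with 100 epochs, batch size 32, and image size 640×640):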
| Model | mAP50-95 | mAP50 | run time (h) | params (M) | inference time (ms) |
|---|---|---|---|---|---|
| YOLOv8 | 0.549 | 0.760 | 1.051 | 3.01 | 0.2+0.3(postprocess) |
| YOLO11 | 0.553 | 0.757 | 1.142 | 2.59 | 0.2+0.3(postprocess) |
| YOLOv8_C2f-MambaOut | 0.536 | 0.753 | 1.183 | 2.79 | 0.3+0.3(postprocess) |
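
To make the table easier to read, the short sketch below computes the differences between each model and the YOLOv8 baseline. The numbers are copied from the table above; the dictionary layout and the helper name `relative_change` are assumptions made for illustration only.

```python
# Values copied from the results table; nothing here is newly measured.
results = {
    "YOLOv8": {"map50_95": 0.549, "map50": 0.760, "params_m": 3.01},
    "YOLO11": {"map50_95": 0.553, "map50": 0.757, "params_m": 2.59},
    "YOLOv8_C2f-MambaOut": {"map50_95": 0.536, "map50": 0.753, "params_m": 2.79},
}


def relative_change(new: float, base: float) -> float:
    """Return the change from base to new as a percentage of base."""
    return (new - base) / base * 100.0


baseline = results["YOLOv8"]
summary = {}
for name, row in results.items():
    summary[name] = {
        # Absolute difference in mAP50-95 against the YOLOv8 baseline.
        "delta_map50_95": row["map50_95"] - baseline["map50_95"],
        # Parameter count change in percent against the YOLOv8 baseline.
        "params_change_pct": relative_change(row["params_m"], baseline["params_m"]),
    }

for name, row in summary.items():
    print(f"{name}: dmAP50-95={row['delta_map50_95']:+.3f}, "
          f"params={row['params_change_pct']:+.1f}%")
```

On these numbers, YOLOv8_C2f-MambaOut has about 7.3% fewer parameters than YOLOv8 but a 0.013 lower mAP50-95, which is the trade-off the note below refers to.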

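Important note: the modified code may simply be a poor fit for the dataset I used; it may still be effective on other datasets.
This modification is meant to lower the difficulty of porting code from the latest research into YOLO, and to serve as a reference for readers interested in recent work.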
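Code migration
Key points
Step 1: Port the code
In the ultralytics framework, module code lives mainly under the ultralytics/nn folder. To keep it separate from the official code, you can create a new extra_modules folder there and add our code to it.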
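
As a minimal sketch of this step, the snippet below creates the `extra_modules` package under `ultralytics/nn` and an empty module file to paste the code into. The helper name `ensure_extra_modules` and the file name `mambaout.py` are hypothetical; the article only specifies the folder name. The demo runs in a temporary directory, so point `repo_root` at your own checkout when using it for real.

```python
import tempfile
from pathlib import Path


def ensure_extra_modules(repo_root: Path, module_filename: str = "mambaout.py") -> Path:
    """Create ultralytics/nn/extra_modules (if missing) and return the module path.

    `module_filename` is a hypothetical name chosen for illustration.
    """
    pkg_dir = repo_root / "ultralytics" / "nn" / "extra_modules"
    pkg_dir.mkdir(parents=True, exist_ok=True)
    # An __init__.py makes the folder importable as a regular package.
    (pkg_dir / "__init__.py").touch(exist_ok=True)
    module_path = pkg_dir / module_filename
    module_path.touch(exist_ok=True)
    return module_path


# Demo in a throwaway directory so nothing in a real checkout is modified.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_extra_modules(Path(tmp))
    demo_relative = created.relative_to(tmp).as_posix()
    init_exists = (created.parent / "__init__.py").exists()
    print(demo_relative)
```

Keeping custom blocks in their own package makes it easier to tell them apart from upstream files when the official code is updated.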
The module code is as follows:
import torch
import torch.nn as nn
from functools import partial
from timm.models.layers import DropPath, trunc_normal_


class GatedCNNBlock_BCHW(nn.Module):
    r""" Our implementation of Gated CNN Block: https://arxiv.org/pdf/1612.08083
    Args:
        conv_ratio: control the number of channels to conduct depthwise convolution.
            Conduct convolution on partial channels can improve practical efficiency.
            The idea of partial channels is from ShuffleNet V2 (https://arxiv.org/abs/1807.11164) and
            also used by InceptionNeXt (https://arxiv.org/abs/2303.16900) and FasterNet (https://arxiv.org/abs/2303.03667)
    """
    def __init__(self, dim, expansion_ratio=8/3, kernel_size=7, conv_ratio=1.0,
                 norm_layer=partial(nn.LayerNorm, eps=1e-6),
                 act_layer=nn.GELU,
                 drop_path=0.,
                 **kwargs):
        super().__init__()
        self.norm = norm_layer(dim)
        # Hidden width of the gated MLP; fc1 produces both the gate and the value branch.
        hidden = int(expansion_ratio * dim)
        self.fc1 = nn.Linear(dim, hidden * 2)
        self.act = act_layer()
        # Only conv_channels of the value branch go through the depthwise convolution.
        conv_channels = int(conv_ratio * dim)
        self.split_indices = (hidden, hidden - conv_channels, conv_channels)
        self.conv = nn.Conv2d(conv_channels, conv_channels, kernel_size=kernel_size, padding=kernel_size//2, groups=conv_channels)
        # self.conv = nn.Sequential(
        #     nn.Conv2d(conv_channels, conv_channels, kernel_size=kernel_size, padding=kernel_size//2, groups=conv_channels),
        #     nn.Conv2d(conv_channels, conv_channels, kernel_size=1)
        # )
        self.fc2 = nn.Linear(hidden, dim)
        # The listing is cut off at this point in the source; the rest of the class is not shown.
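
The listing above stops before the forward pass. As a hedged sketch only, the block below shows how a gated CNN block of this shape could be completed for `[B, C, H, W]` inputs. It is not the article's hidden code: the class name `GatedCNNBlockSketch` is hypothetical, the forward logic follows the general gated-MLP pattern (split into gate, identity, and convolution branches), GELU is assumed as the activation, and stochastic depth (`DropPath`) is left out so the example depends only on PyTorch.

```python
import torch
import torch.nn as nn


class GatedCNNBlockSketch(nn.Module):
    """Illustrative gated CNN block for [B, C, H, W] tensors (assumed layout)."""

    def __init__(self, dim, expansion_ratio=8 / 3, kernel_size=7, conv_ratio=1.0,
                 act_layer=nn.GELU):
        super().__init__()
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        hidden = int(expansion_ratio * dim)
        self.fc1 = nn.Linear(dim, hidden * 2)
        self.act = act_layer()
        conv_channels = int(conv_ratio * dim)
        # Gate branch, untouched value channels, and channels sent to the conv.
        self.split_indices = (hidden, hidden - conv_channels, conv_channels)
        self.conv = nn.Conv2d(conv_channels, conv_channels, kernel_size=kernel_size,
                              padding=kernel_size // 2, groups=conv_channels)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        shortcut = x                                   # [B, C, H, W]
        x = x.permute(0, 2, 3, 1)                      # -> [B, H, W, C] for LayerNorm/Linear
        x = self.norm(x)
        g, i, c = torch.split(self.fc1(x), self.split_indices, dim=-1)
        c = c.permute(0, 3, 1, 2)                      # -> [B, C', H, W] for depthwise conv
        c = self.conv(c)
        c = c.permute(0, 2, 3, 1)                      # -> [B, H, W, C']
        # The activated gate modulates the concatenated value branch.
        x = self.fc2(self.act(g) * torch.cat((i, c), dim=-1))
        x = x.permute(0, 3, 1, 2)                      # back to [B, C, H, W]
        return x + shortcut                            # residual connection


block = GatedCNNBlockSketch(dim=32)
dummy = torch.randn(2, 32, 16, 16)
out = block(dummy)
print(out.shape)
```

Because the block keeps the input shape, it could in principle stand in for a Bottleneck inside a C2f-style module, although how the article wires it in is not shown in the visible part of the post.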
