YOLOv8 Improvement | Improving the C2f Module with the EfficientViMBlock from CVPR 2025 EfficientViM

Improving YOLOv8's C2f module with EfficientViMBlock

Latest recommended article published on 2025-11-17 23:59:55

Original article

Licensed under CC 4.0 BY-SA

Tags:

#YOLO #DeepLearning #MachineLearning #ArtificialIntelligence

Contents

Introduction
Code Migration

Introduction

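To improve the YOLOv8 framework's ability to capture global dependencies, this post uses the EfficientViMBlock module proposed in EfficientViM (CVPR 2025) to modify YOLOv8's C2f module. EfficientViM builds on state space models (SSMs) and introduces a new HSM-SSD structure that captures global dependencies efficiently while keeping the computational cost low. Specifically, HSM-SSD performs channel mixing on the compressed hidden states and combines this with the proposed multi-stage hidden-state fusion strategy, which gives it favorable inference throughput and model accuracy. The experimental results are listed below (evaluated on the VOC dataset with 100 epochs, a batch size of 32, and an image size of 640×640):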

Model	mAP50-95	mAP50	run time (h)	params (M)	inference time (ms)
YOLOv8	0.549	0.760	1.051	3.01	0.2+0.3(postprocess)
YOLO11	0.553	0.757	1.142	2.59	0.2+0.3(postprocess)
yolov8_C2f-EfficientViM	0.521	0.740	1.081	2.81	0.2+0.3(postprocess)
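
The efficiency argument from the introduction (channel mixing on the compressed hidden states rather than on every token) can be made concrete with a small arithmetic sketch. It counts only the multiply-accumulates of one dense channel projection and ignores the cost of building the hidden states and projecting them back. The sizes used (400 tokens, 64 hidden states, 128 channels) are illustrative assumptions, not values from EfficientViM or from the experiments in this post.

```python
def channel_mixing_macs(num_vectors: int, dim: int) -> int:
    """Multiply-accumulates for one dense dim -> dim projection applied to num_vectors vectors."""
    return num_vectors * dim * dim


# Illustrative sizes only (assumptions, not taken from EfficientViM or this post).
num_tokens = 20 * 20   # L: a 20x20 feature map flattened into 400 tokens
num_states = 64        # N: number of compressed hidden states
dim = 128              # D: channel width

macs_on_tokens = channel_mixing_macs(num_tokens, dim)  # mix channels for every token
macs_on_states = channel_mixing_macs(num_states, dim)  # mix channels for the hidden states only
ratio = macs_on_tokens / macs_on_states

print(macs_on_tokens, macs_on_states, ratio)
```

Under these assumed sizes the projection cost scales with the number of vectors being mixed, so the saving is simply L / N. The actual speed-up of a full block depends on the rest of the layer as well.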

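Important note: the modification did not improve accuracy on the dataset I used (see the table above). This may only mean that it does not suit this particular dataset; it may still be effective on other datasets.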

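The purpose of this post is to make it easier to port recent research into YOLO, and so to provide a reference for readers interested in the latest work.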

Code Migration

Key content

Step 1: Port the code

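In the ultralytics framework, module code lives mainly under the ultralytics/nn folder. To keep our code separate from the official code, we can create a new extra_modules folder there and add our code to it.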
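
As a minimal sketch of the folder setup described above, the helper below creates ultralytics/nn/extra_modules with an `__init__.py` and an empty module file. The helper name `create_extra_modules` and the file name `efficientvim.py` are hypothetical choices for illustration, and the demo runs against a temporary directory rather than a real ultralytics checkout.

```python
import tempfile
from pathlib import Path


def create_extra_modules(repo_root: Path, module_file: str = "efficientvim.py") -> Path:
    """Create ultralytics/nn/extra_modules under repo_root and return the module file path.

    Both this helper and the default file name are hypothetical, for illustration only.
    """
    pkg_dir = repo_root / "ultralytics" / "nn" / "extra_modules"
    pkg_dir.mkdir(parents=True, exist_ok=True)
    # An __init__.py makes the folder importable as a regular Python package.
    (pkg_dir / "__init__.py").touch(exist_ok=True)
    module_path = pkg_dir / module_file
    module_path.touch(exist_ok=True)
    return module_path


# Demo in a throwaway directory; in practice repo_root would be your ultralytics checkout.
with tempfile.TemporaryDirectory() as tmp:
    path = create_extra_modules(Path(tmp))
    created = path.relative_to(tmp).as_posix()
    has_init = (path.parent / "__init__.py").exists()

print(created)
```

Creating the folder by hand works just as well; the only point is that the new module sits in its own package next to the official ultralytics/nn code.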

The module code is as follows:

import torch
import torch.nn as nn

__all__ = ['EfficientViMBlock']

class LayerNorm1D(nn.Module):
    """LayerNorm for channels of 1D tensor(B C L)"""
    def __init__(self, num_channels, eps=1e-5, affine=True):
        super(LayerNorm1D, self).__init__()
        self.num_channels = num_channels
        self.eps = eps
        self.affine = affine

        if self.affine:
            self.weight = nn.Parameter(torch.ones(1, num_channels, 1))
            self.bias = nn.Parameter(torch.zeros(1, num_channels, 1))
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)

    def forward(self, x):
        # Normalize over the channel dimension of a (B, C, L) tensor
        mean = x.mean(dim=1, keepdim=True)  # (B, 1, L)
        var = x.var(dim=1, keepdim=True, unbiased=False)  # (B, 1, L)

        x_normalized = (x - mean) / torch.sqrt(var + self.eps)  # (B, C, L)

        if self.affine:
            x_normalized = x_normalized * self.weight + self.bias

        return x_normalized

class ConvLayer2D(nn.Module):
    def __init__(self, in_dim, out_dim, kernel_size=3, stride=1, padding=0, dilation=1, groups=1, norm=nn.BatchNorm2d, act_layer=nn.ReLU, bn_weight_init=1):
        super(ConvLayer2D, self).__init__()
        self.conv = nn.Conv2d(
            in_dim,
            out_dim,
            kernel_size=(kernel_size, kernel_size),
            stride=(stride, stride),
            padding=(padding, padding),
            dilation=(dilation, dilation),
            groups=groups,
            bias=False
        )
        self.norm = norm(num_features=out_dim) if norm else None
        self.act = act_layer() if act_layer else None
        
        if self.norm:
            torch.nn.init.constant_(self.norm.weight, bn_weight_init)
            torch.nn.init.constant_(self.norm.bias, 0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        if self.norm:
            x = self.norm(x)
        if self.act:
            x = self.act(x)
        return x
    
    
class ConvLayer1D(nn.Module)