YOLOv8 Improvement | Effective Accuracy Gain | Improving the C2f Module with RepConvBlock from CVPR2025 OverLock

Improving the C2f module of YOLOv8 with RepConvBlock

Original

Last modified on 2025-06-12 16:41:14

Copyright: CC 4.0 BY-SA

Article tags:

#YOLO #DeepLearning #ArtificialIntelligence

First published on 2025-06-10 09:29:32

Contents

Introduction
Code Migration

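Introduction

To strengthen the feature representation of YOLOv8 in object detection, this article improves YOLOv8's C2f module with RepConvBlock, the core module of Base-Net in the Deep-stage Decomposition Strategy proposed by OverLock (CVPR2025). By combining dilated convolutions, channel attention, and other components, RepConvBlock captures a large receptive field while enhancing the feature representation of the convolutional network. The results are as follows (validated on the VOC dataset with 100 epochs, batch size 32, and image size 640×640):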

Model	mAP50-95	mAP50	run time (h)	params (M)	inference time (ms)
YOLOv8	0.549	0.760	1.051	3.01	0.2 + 0.3 (postprocess)
YOLO11	0.553	0.757	1.142	2.59	0.2 + 0.3 (postprocess)
YOLOv8_C2f-RepConvBlock	0.557	0.765	1.481	2.54	0.4 + 0.3 (postprocess)
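Reading the table relative to the YOLOv8 baseline makes the trade-off explicit. The short script below only recomputes differences from the numbers in the table; it adds no new measurements, and the dictionary keys are names chosen here for illustration.

```python
# Values copied from the results table (VOC, 100 epochs, batch size 32, 640x640).
baseline = {"map50_95": 0.549, "map50": 0.760, "run_h": 1.051, "params_m": 3.01, "infer_ms": 0.2}
repconv = {"map50_95": 0.557, "map50": 0.765, "run_h": 1.481, "params_m": 2.54, "infer_ms": 0.4}


def pct_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


# Absolute mAP differences, relative cost differences, inference-time ratio (postprocess excluded).
d_map50_95 = repconv["map50_95"] - baseline["map50_95"]
d_map50 = repconv["map50"] - baseline["map50"]
params_pct = pct_change(repconv["params_m"], baseline["params_m"])
run_pct = pct_change(repconv["run_h"], baseline["run_h"])
infer_ratio = repconv["infer_ms"] / baseline["infer_ms"]

print(f"mAP50-95: {d_map50_95:+.3f}, mAP50: {d_map50:+.3f}")
print(f"params: {params_pct:+.1f}%, run time: {run_pct:+.1f}%, inference: x{infer_ratio:.1f}")
```

On these numbers, the RepConvBlock variant gains 0.008 mAP50-95 with about 15.6% fewer parameters, at the cost of roughly 40.9% longer run time and twice the inference time before postprocessing.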


Important note: the modified code may simply not suit the dataset I used, and it may still prove effective on other datasets.

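The purpose of this article is to make it easier to port recent research into YOLO code and to provide a reference for readers interested in the latest work.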

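Code Migration

Key Points

Step 1: Migrate the code

In the ultralytics framework, module code lives mainly in the ultralytics/nn folder. To keep our code separate from the official code, create a new extra_modules folder under ultralytics/nn and add our code to it.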
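The folder setup can be scripted. The sketch below assumes the module file is named RepConvBlock.py and that extra_modules/__init__.py re-exports it with a star import; both are choices made here for illustration, not something ultralytics requires. The helper create_extra_modules is hypothetical, and it only creates files: it does not register the block with ultralytics.

```python
import tempfile
from pathlib import Path


def create_extra_modules(nn_dir: str, module_file: str = "RepConvBlock.py") -> Path:
    """Create <nn_dir>/extra_modules/ with an __init__.py that star-imports module_file.

    nn_dir should point at the ultralytics/nn folder of your installation.
    The file name and the star-import convention are assumptions for illustration.
    """
    pkg = Path(nn_dir) / "extra_modules"
    pkg.mkdir(parents=True, exist_ok=True)

    init_file = pkg / "__init__.py"
    export_line = f"from .{Path(module_file).stem} import *\n"
    existing = init_file.read_text() if init_file.exists() else ""
    if existing and not existing.endswith("\n"):
        existing += "\n"  # avoid gluing the new line onto the last existing one
    if export_line not in existing:  # keep the script safe to re-run
        init_file.write_text(existing + export_line)

    (pkg / module_file).touch(exist_ok=True)  # paste the module code into this file
    return pkg


if __name__ == "__main__":
    # Dry run in a throwaway directory instead of a real ultralytics checkout.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = create_extra_modules(tmp)
        print(sorted(p.name for p in pkg.iterdir()))
```

Because the module defines __all__ = ['RepConvBlock'], the star import exposes only RepConvBlock from the package.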

The code is as follows:

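import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from torch.utils.checkpoint import checkpoint

try:  # timm >= 0.9 moved these helpers to timm.layers
    from timm.layers import DropPath, to_2tuple
except ImportError:  # fall back for older timm releases
    from timm.models.layers import DropPath, to_2tuple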

__all__ = ['RepConvBlock']

def get_conv2d(in_channels, 
               out_channels, 
               kernel_size, 
               stride, 
               padding, 
               dilation, 
               groups, 
               bias,
               attempt_use_lk_impl=True):
    
    kernel_size = to_2tuple(kernel_size)
    if padding is None:
        padding = (kernel_size[0] // 2, kernel_size[1] // 2)
    else:
        padding = to_2tuple(padding)
    need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)

    if attempt_use_lk_impl and need_large_impl:
        print('---------------- trying to import iGEMM implementation for large-kernel conv')
        try:
            from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM
            print('---------------- found iGEMM implementation ')
        except Exception:  # package missing or its CUDA extension failed to load
            DepthWiseConv2dImplicitGEMM = None
            print('---------------- found no iGEMM. use original conv. follow https://github.com/AILab-CVC/UniRepLKNet to install it.')
        # iGEMM only handles stride-1, non-dilated depthwise convs (in == out == groups)
        if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \
                and out_channels == groups and stride == 1 and dilation == 1:
            print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')
            return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)
    
    return nn.Conv2d(in_channels, out_channels, 
                     kernel_size=kernel_size, 
                     stride=stride,
                     padding=padding, 
                     dilation=dilation, 
                     groups=groups, 
                     bias=bias)
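
# --- Explanatory sketch, not part of the original OverLock code ---
# With attempt_use_lk_impl=True, get_conv2d returns the iGEMM depthwise kernel only when
# every condition below holds (and depthwise_conv2d_implicit_gemm imports); otherwise it
# builds a plain nn.Conv2d. _uses_igemm is a hypothetical, torch-free helper that restates
# that rule so a layer config can be checked by hand. It is not exported (see __all__).
def _uses_igemm(in_channels, out_channels, kernel_size, stride, padding, dilation, groups):
    k = kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size)
    if padding is None:
        pad = (k[0] // 2, k[1] // 2)
    else:
        pad = padding if isinstance(padding, tuple) else (padding, padding)
    # square kernel larger than 5x5 with "same" padding
    large_same_pad = k[0] == k[1] and k[0] > 5 and pad == (k[0] // 2, k[1] // 2)
    depthwise = in_channels == out_channels == groups
    return large_same_pad and depthwise and stride == 1 and dilation == 1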


def get_bn(dim, use_sync_bn=False):
    if use_sync_bn:
        return nn.SyncBatchNorm(dim)
    else:
        return nn.BatchNorm2d(dim)


def fuse_bn(conv, bn):
    conv_bias = 0 if conv.bias is None else conv.bias
    std = (bn.running_var + bn.eps).sqrt()
    return conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1), bn.bias + (conv_bias - bn.running_mean) * bn.weight / std
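
# --- Explanatory sketch, not part of the original OverLock code ---
# fuse_bn folds an eval-mode BatchNorm into the conv before it:
#   std = sqrt(running_var + eps)
#   W'  = W * gamma / std
#   b'  = beta + (b - running_mean) * gamma / std
# _fuse_bn_scalar_check is a hypothetical, torch-free check of that identity for a
# 1x1 single-channel conv (y = w * x + b). It returns (conv -> BN output, fused output),
# which should agree up to floating-point rounding.
def _fuse_bn_scalar_check(x, w, b, gamma, beta, running_mean, running_var, eps=1e-5):
    std = (running_var + eps) ** 0.5
    conv_then_bn = gamma * ((w * x + b) - running_mean) / std + beta
    w_fused = w * gamma / std
    b_fused = beta + (b - running_mean) * gamma / std
    return conv_then_bn, w_fused * x + b_fused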

def convert_dilated_to_nondilated(kernel, dilate_rate):
    identity_kernel = torch.ones((1, 1, 1, 1)).to(kernel.device)
    if kernel.size(1) == 1:
        #   This is a DW kernel
        dilated = F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)
        return dilated
    else:
        #   This is a dense or group-wise (but not DW) kernel
        slices = []
        for i in range(kernel.size(1)):
            dilated = F.conv_transpose2d(kernel[:,i:i+1,:,:], identity_kernel, stride=dilate_rate)
            slices.append(dilated)
        return torch.cat(slices, dim=1)
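
# --- Explanatory sketch, not part of the original OverLock code ---
# conv_transpose2d with a 1x1 all-ones kernel and stride=dilate_rate places the k x k
# weights dilate_rate pixels apart and fills the gaps with zeros, i.e. it builds the dense
# kernel of size dilate_rate * (k - 1) + 1 that the dilated conv is equivalent to.
# _dilate_kernel_2d is a hypothetical pure-Python version for one square 2-D kernel.
def _dilate_kernel_2d(kernel, dilate_rate):
    k = len(kernel)
    size = dilate_rate * (k - 1) + 1
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * dilate_rate][j * dilate_rate] = kernel[i][j]
    return out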

def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilated_r):
    large_k = large_kernel.size(2)
    dilated_k = dilated_kernel.size(2)
    equivalent_kernel_size = dilated_r * (dilated_k - 1) + 1
    equivalent_kernel = convert_dilated_to_nondilated(dilated_kernel, dilated_r)
    rows_to_pad = large_k // 2 - equivalent_kernel_size // 2
    merged_kernel = large_kernel + F.pad(equivalent_kernel, [rows_to_pad] * 4)
    return merged_kernel
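
# --- Explanatory sketch, not part of the original OverLock code ---
# merge_dilated_into_large_kernel zero-pads the equivalent dense kernel by
# rows_to_pad = large_k // 2 - equivalent_kernel_size // 2 on every side so both kernels
# share the same center, then adds them. Example: a 3x3 kernel with dilation 2 is
# equivalent to 5x5, so inside a 7x7 large kernel it is padded by 7 // 2 - 5 // 2 = 1.
# _merge_centered is a hypothetical pure-Python version for square, odd-sized 2-D lists;
# it returns a new kernel and leaves large_kernel unchanged.
def _merge_centered(large_kernel, equivalent_kernel):
    pad = len(large_kernel) // 2 - len(equivalent_kernel) // 2
    merged = [row[:] for row in large_kernel]
    for i, row in enumerate(equivalent_kernel):
        for j, value in enumerate(row):
            merged[i + pad][j + pad] += value
    return merged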

class SEModule(nn.Module):
    def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
        super().__init__()