Introduction
To improve the feature representation capability of YOLOv8 for object detection, this article draws on the DepthBottleneckUni block from the Reparameterized Heterogeneous Efficient Layer Aggregation Network (RepHELAN) module proposed in MAF-YOLO (PRCV 2024 Oral) to improve the C2f module of YOLOv8. DepthBottleneckUni lets both the overall architecture and the convolution design exploit heterogeneous large convolution kernels, so the model obtains multi-scale receptive fields while preserving the information relevant to small objects. The experimental results are as follows (performance is validated on the VOC dataset, with 100 epochs, a batch size of 32, and an image size of 640×640):
| Model | mAP50-95 | mAP50 | run time (h) | params (M) | inference time (ms) |
|---|---|---|---|---|---|
| YOLOv8 | 0.549 | 0.760 | 1.051 | 3.01 | 0.2+0.3(postprocess) |
| YOLO11 | 0.553 | 0.757 | 1.142 | 2.59 | 0.2+0.3(postprocess) |
| YOLOv8_C2f-DepthBottleneckUni | 0.557 | 0.765 | 1.481 | 2.54 | 0.4+0.3(postprocess) |

Important note: the improved code in this article may simply not be a good fit for the dataset I used; it may still be effective on other datasets.
The goal of this modification is to lower the barrier to migrating the latest research advances into YOLO code, so as to provide a reference for readers interested in recent research.
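As a rough intuition only (this toy snippet is my own illustration, not the MAF-YOLO implementation, and the name ToyMultiScaleDW is a placeholder): running parallel depthwise convolutions with heterogeneous kernel sizes lets a large-kernel branch cover a wide receptive field while a small-kernel branch keeps the fine detail that matters for small objects, and the branches are then fused.

```python
import torch
import torch.nn as nn

class ToyMultiScaleDW(nn.Module):
    """Toy illustration: parallel depthwise convs with heterogeneous kernel sizes."""
    def __init__(self, channels):
        super().__init__()
        # large kernel -> large receptive field; small kernel -> preserves local detail
        self.dw_large = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.dw_small = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.fuse = nn.Conv2d(channels, channels, 1)  # pointwise fusion of the two branches

    def forward(self, x):
        return self.fuse(self.dw_large(x) + self.dw_small(x))

x = torch.randn(1, 64, 80, 80)
print(ToyMultiScaleDW(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```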
Code Migration
Key Points
Step 1: Migrate the Code
In the ultralytics framework, module code mainly lives under the ultralytics/nn folder. To keep our code separate from the official code, we can create a new extra_modules folder there and place our code inside it, for example with a layout like the one sketched below.
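One possible layout (the file name rep_helan.py is my own placeholder, not something fixed by the framework):

```
ultralytics/
└── nn/
    ├── modules/          # official ultralytics modules
    └── extra_modules/    # new folder for migrated code
        ├── __init__.py   # re-export the new modules for easy import
        └── rep_helan.py  # paste the code below into this file
```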
The code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from itertools import repeat
import collections.abc
# from ..modules import Conv
class Conv(nn.Module):
    '''Normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))
def _ntuple(n):
    def parse(x):
        if isinstance(x, collections.abc.Iterable) and not isinstance(x, str):
            return tuple(x)
        return tuple(repeat(x, n))
    return parse


to_1tuple = _ntuple(1)
to_2tuple = _ntuple(2)
to_3tuple = _ntuple(3)
to_4tuple = _ntuple(4)
to_ntuple = _ntuple
def get_conv2d_uni(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias,
                   attempt_use_lk_impl=True):
    kernel_size = to_2tuple(kernel_size)
    if padding is None:
        padding = (kernel_size[0] // 2, kernel_size[1] // 2)
    else:
        padding = to_2tuple(padding)
    need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)

    # if attempt_use_lk_impl and need_large_impl:
    #     print('---------------- trying to import iGEMM implementation for large-kernel conv')
    #     try:
    #         from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM
    #         print('---------------- found iGEMM implementation ')
    #     except:
    #         DepthWiseConv2dImplicitGEMM = None
    #         print('---------------- found no iGEMM. use original conv. follow https://github.com/AILab-CVC/UniRepLKNet to install it.')
    #     if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \
    #             and out_channels == groups and stride == 1 and dilation == 1:
    #         print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')
    #         return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)
    return nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
                     padding=padding, dilation=dilation, groups=groups, bias=bias)
def get_bn(channels):
    return nn.BatchNorm2d(channels)
def fuse_bn(conv, bn):
    # Fold the BatchNorm statistics into the conv kernel and return the
    # equivalent (kernel, bias) pair used during reparameterization.
    kernel = conv.weight
    running_mean = bn.running_mean
    running_var = bn.running_var
    gamma = bn.weight
    beta = bn.bias
    eps = bn.eps
    std = (running_var + eps).sqrt()
    t = (gamma / std).reshape(-1, 1, 1, 1)
    return kernel * t, beta - running_mean * gamma / std
def convert_dilated_to_nondilated(kernel, dilate_rate):
    # Standard UniRepLKNet helper: expand a dilated small-kernel conv into an
    # equivalent non-dilated sparse kernel by spreading its weights with a
    # transposed convolution against a 1x1 identity kernel.
    identity_kernel = torch.ones((1, 1, 1, 1), dtype=kernel.dtype, device=kernel.device)
    if kernel.size(1) == 1:
        # depthwise kernel
        return F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)
    # dense or group-wise (non-depthwise) kernel: process input channels one by one
    slices = [F.conv_transpose2d(kernel[:, i:i + 1, :, :], identity_kernel, stride=dilate_rate)
              for i in range(kernel.size(1))]
    return torch.cat(slices, dim=1)
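After pasting the code into the new file, a quick sanity check can confirm the helpers were migrated correctly. The snippet below is only an illustrative test of mine (the tensor shapes and tolerance are arbitrary choices, and it assumes get_conv2d_uni, get_bn and fuse_bn from the file above are importable in the current session): it verifies that fuse_bn folds BatchNorm into the convolution kernel without changing the output.

```python
import torch
import torch.nn.functional as F

# build a small depthwise conv + BN pair via the migrated helpers
conv = get_conv2d_uni(8, 8, kernel_size=7, stride=1, padding=None, dilation=1, groups=8, bias=False)
bn = get_bn(8)
bn.eval()  # use running statistics so the fusion is exact

# give BN non-trivial running statistics
bn.running_mean.uniform_(-0.5, 0.5)
bn.running_var.uniform_(0.5, 1.5)

x = torch.randn(1, 8, 32, 32)
with torch.no_grad():
    y_ref = bn(conv(x))                       # conv followed by BN
    k, b = fuse_bn(conv, bn)                  # fold BN into kernel and bias
    y_fused = F.conv2d(x, k, b, stride=1, padding=3, groups=8)

print(torch.allclose(y_ref, y_fused, atol=1e-5))  # expected: True
```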
