[Object Detection Experiment Series] Improving YOLOv5: Introducing the Lightweight Omni-Dimensional Dynamic Convolution ODConv to Reduce Computation While Keeping Accuracy Stable or Slightly Improved! (Source Code Included, with a Detailed Step-by-Step Modification Walkthrough)

1. Main Content

This post shows how to integrate the lightweight omni-dimensional dynamic convolution ODConv into the YOLOv5 model, reducing computation while keeping accuracy stable or slightly improved. (Reading the whole post takes about 7 minutes.)

2. Introduction

ODConv learns richer attention along four dimensions of the kernel space: the spatial dimension, the input-channel dimension, the output-channel dimension, and the kernel-number dimension. Because it can get by with fewer candidate kernels, it achieves better performance while also reducing computation.
(Figure from the original post: illustration of ODConv's multi-dimensional attention; image not reproduced.)
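
For reference, the idea can be written compactly as follows (my own paraphrase of the ODConv formulation; $\alpha_{s}$, $\alpha_{c}$, $\alpha_{f}$ and $\alpha_{w}$ denote the spatial, input-channel, output-channel (filter) and kernel-wise attentions, and $W_1, \dots, W_n$ are the $n$ candidate kernels):

$$y = \Big(\sum_{i=1}^{n} \alpha_{w i} \odot \alpha_{f i} \odot \alpha_{c i} \odot \alpha_{s i} \odot W_i\Big) * x$$

Here $\odot$ broadcasts each attention along its own dimension of the kernel and $*$ is the ordinary convolution. In the implementation below, the channel attention is multiplied into the input, the spatial and kernel attentions are folded into the weights, and the filter attention scales the output, which is mathematically equivalent but cheaper to compute.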

3. Detailed Code Modification Steps

3.1 ODConv source code

The source code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd

# Drop-in replacement for YOLOv5's standard Conv block: ODConv2d followed by BatchNorm and SiLU.
class ODConv(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=nn.BatchNorm2d,
                 reduction=0.0625, kernel_num=1):
        padding = (kernel_size - 1) // 2
        super(ODConv, self).__init__(
            ODConv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups,
                     reduction=reduction, kernel_num=kernel_num),
            norm_layer(out_planes),
            nn.SiLU()
        )

class Attention(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625,
                 kernel_num=4, min_channel=16):
        super(Attention, self).__init__()
        attention_channel = max(int(in_planes * reduction), min_channel)
        self.kernel_size = kernel_size
        self.kernel_num = kernel_num
        self.temperature = 1.0

        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)
        self.bn = nn.BatchNorm2d(attention_channel)
        self.relu = nn.ReLU(inplace=True)

        self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)
        self.func_channel = self.get_channel_attention

        if in_planes == groups and in_planes == out_planes:  # depth-wise convolution
            self.func_filter = self.skip
        else:
            self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)
            self.func_filter = self.get_filter_attention

        if kernel_size == 1:  # point-wise convolution
            self.func_spatial = self.skip
        else:
            self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)
            self.func_spatial = self.get_spatial_attention

        if kernel_num == 1:
            self.func_kernel = self.skip
        else:
            self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)
            self.func_kernel = self.get_kernel_attention
        # forward() normalizes with this LayerNorm instead of self.bn (BatchNorm2d above),
        # so the attention branch also works when the effective batch size is 1.
        self.bn_1 = nn.LayerNorm([attention_channel, 1, 1])
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def update_temperature(self, temperature):
        self.temperature = temperature

    @staticmethod
    def skip(_):
        return 1.0

    def get_channel_attention(self, x):
        channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
        return channel_attention

    def get_filter_attention(self, x):
        filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)
        return filter_attention

    def get_spatial_attention(self, x):
        spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)
        spatial_attention = torch.sigmoid(spatial_attention / self.temperature)
        return spatial_attention

    def get_kernel_attention(self, x):
        kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)
        kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)
        return kernel_attention

    def forward(self, x):
        x = self.avgpool(x)
        x = self.fc(x)
        x = self.bn_1(x)
        x = self.relu(x)
        return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)

class ODConv2d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, padding=0, dilation=1,
                 groups=1, reduction=0.0625, kernel_num=1):
        super(ODConv2d, self).__init__()
        self.in_planes = in_planes
        self.out_planes = out_planes
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.kernel_num = kernel_num
        self.attention = Attention(in_planes, out_planes, kernel_size, groups=groups,
                                   reduction=reduction, kernel_num=kernel_num)
        self.weight = nn.Parameter(torch.randn(kernel_num, out_planes, in_planes//groups, kernel_size, kernel_size),
                                   requires_grad=True)
        self._initialize_weights()

        if self.kernel_size == 1 and self.kernel_num == 1:
            self._forward_impl = self._forward_impl_pw1x
        else:
            self._forward_impl = self._forward_impl_common

    def _initialize_weights(self):
        for i in range(self.kernel_num):
            nn.init.kaiming_normal_(self.weight[i], mode='fan_out', nonlinearity='relu')

    def update_temperature(self, temperature):
        self.attention.update_temperature(temperature)

    def _forward_impl_common(self, x):
        # Apply channel attention to the input, fold spatial/kernel attention into the weights,
        # and fold the batch dimension into the channel dimension so that a single grouped
        # convolution applies a different aggregated kernel to every sample in the batch.
        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)
        batch_size, in_planes, height, width = x.size()
        x = x * channel_attention
        x = x.reshape(1, -1, height, width)
        aggregate_weight = spatial_attention * kernel_attention * self.weight.unsqueeze(dim=0)
        aggregate_weight = torch.sum(aggregate_weight, dim=1).view(
            [-1, self.in_planes // self.groups, self.kernel_size, self.kernel_size])
        output = F.conv2d(x, weight=aggregate_weight, bias=None, stride=self.stride, padding=self.padding,
                          dilation=self.dilation, groups=self.groups * batch_size)
        output = output.view(batch_size, self.out_planes, output.size(-2), output.size(-1))
        output = output * filter_attention  # output-channel (filter) attention applied to the result
        return output

    def _forward_impl_pw1x(self, x):
        # Fast path for 1x1 kernels with a single candidate kernel: the spatial and kernel
        # attentions are identity (skip), so only the channel and filter scalings are applied.
        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)
        x = x * channel_attention
        output = F.conv2d(x, weight=self.weight.squeeze(dim=0), bias=None, stride=self.stride, padding=self.padding,
                          dilation=self.dilation, groups=self.groups)
        output = output * filter_attention
        return output

    def forward(self, x):
        return self._forward_impl(x)
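
Before wiring the module into YOLOv5, a quick standalone sanity check (my own snippet, not part of the original post) confirms that it runs and produces the expected output shape:

# Smoke test for the ODConv block above: 64 -> 128 channels, 3x3 kernel, stride 1
if __name__ == '__main__':
    block = ODConv(64, 128, kernel_size=3, stride=1)
    dummy = torch.randn(2, 64, 40, 40)       # (batch, channels, height, width)
    out = block(dummy)
    print(out.shape)                          # expected: torch.Size([2, 128, 40, 40])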

3.2 Create a yolov5-odconv.yaml file

Note that here ODConv directly replaces the standard convolution in the layer before the Head's P5 branch (in fact, any standard convolution in the architecture can be replaced in the same way). Also remember to change nc to the number of classes in your own dataset.

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 10  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8  (small objects)
  - [30,61, 62,45, 59,119]  # P4/16 (medium objects)
  - [116,90, 156,198, 373,326]  # P5/32 (large objects)

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2  output_channel, kernel_size, stride, padding
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, ODConv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

3.3 Register ODConv in yolo.py

Simply import the relevant classes at the position circled in red in the figure below.
(Figure from the original post: screenshot of models/yolo.py with the import location circled in red; image not reproduced.)
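
In text form, the change amounts to the sketch below. It assumes the ODConv code from Section 3.1 was pasted into models/common.py and follows the standard Ultralytics YOLOv5 layout; the exact contents of the module tuple inside parse_model vary between releases, so keep the entries that already exist and only append ODConv:

# models/yolo.py (abbreviated sketch)

# 1) Make ODConv importable. If the code from Section 3.1 lives in models/common.py,
#    the existing wildcard import already picks it up:
from models.common import *
# If it lives in its own file instead (hypothetical name), import it explicitly:
# from models.odconv import ODConv

# 2) Inside parse_model(), append ODConv to the tuple of modules whose yaml args are
#    interpreted as (output channels, kernel size, stride, ...), next to Conv, C3, SPPF, etc.:
#        if m in (Conv, GhostConv, Bottleneck, ..., C3, SPPF, ODConv):
#            c1, c2 = ch[f], args[0]

After registering it, running python models/yolo.py --cfg models/yolov5-odconv.yaml from the repository root (adjust the path to wherever you saved the yaml) should build the model without errors, which is a quick way to confirm both the yaml and the registration.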

3.4 Modify the train.py launch file

Simply set the model configuration file to yolov5-odconv.yaml, as shown in the figure below:
(Figure from the original post: screenshot of the --cfg setting in train.py; image not reproduced.)
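
Concretely, this means either changing the default value of the --cfg option in train.py's argument parser or passing the yaml on the command line. A minimal sketch (the data and weights paths are placeholders for your own setup):

# train.py (excerpt of the argument parser; only the --cfg default is changed)
parser.add_argument('--cfg', type=str, default='models/yolov5-odconv.yaml', help='model.yaml path')

# Equivalent, without editing train.py:
#   python train.py --cfg models/yolov5-odconv.yaml --data data/my_dataset.yaml --weights yolov5s.pt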

4. Summary

This post introduced the lightweight omni-dimensional dynamic convolution ODConv, which attends to features along multiple dimensions and reduces computation while keeping accuracy stable or slightly improved. If you run into any problems while making the changes, feel free to discuss them in the comments. If this post helped you, please give it a like and bookmark it. I will keep sharing ideas that prove useful in my experiments, so follow along if you are interested. Thanks, everyone!
