YOLOv5 Improvement (CoordConv)

aidem_brown

Tags: YOLO, deep learning, machine learning

Original link: https://blog.youkuaiyun.com/eagleflying_cau/article/details/131150638

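This article describes how to integrate the CoordConv module into the YOLOv5 model. CoordConv appends i and j coordinate channels to the input features, with the aim of helping the convolution layers localize more precisely. The model structure and an example yaml configuration file are also given, showing how CoordConv is applied to feature fusion at different levels.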

1. CoordConv paper:

https://arxiv.org/pdf/1807.03247v2.pdf

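2. Improvement strategy: two extra channels holding the i and j coordinates are concatenated to the input of the existing convolution.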
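
To make the i and j channels concrete, here is a minimal pure-Python sketch of the values they hold for a small feature map. The function name `coord_channels` is made up for this illustration, and the guard for a dimension of size 1 is my own assumption (the module code in this post divides by `dim - 1` without such a guard).

```python
def coord_channels(height, width):
    """Return (i_channel, j_channel) as nested lists scaled to [-1, 1]."""

    def scale(k, n):
        # Map index k in [0, n - 1] to [-1, 1]; size-1 axes get 0.0 (assumption).
        return 2.0 * k / (n - 1) - 1.0 if n > 1 else 0.0

    # i is constant along a row and changes from row to row.
    i_channel = [[scale(i, height) for _ in range(width)] for i in range(height)]
    # j is constant along a column and changes from column to column.
    j_channel = [[scale(j, width) for j in range(width)] for _ in range(height)]
    return i_channel, j_channel


i_ch, j_ch = coord_channels(3, 3)
for row in i_ch:
    print(row)
for row in j_ch:
    print(row)
```

Each position in the feature map therefore carries its own normalized (i, j) location, which the following convolution can read like any other input channel.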

3. Method:

(1) Put the following code in models/common.py

# torch, nn and Conv are already available inside models/common.py
class AddCoords(nn.Module):
    """Append normalized coordinate channels (and optionally a radius channel)."""

    def __init__(self, with_r=False):
        super().__init__()
        self.with_r = with_r

    def forward(self, input_tensor):
        """
        Args:
            input_tensor: shape (batch, channel, x_dim, y_dim)
        """
        batch_size, _, x_dim, y_dim = input_tensor.size()

        # Index grids of shape (1, y_dim, x_dim)
        xx_channel = torch.arange(x_dim).repeat(1, y_dim, 1)
        yy_channel = torch.arange(y_dim).repeat(1, x_dim, 1).transpose(1, 2)

        # Normalize indices to [0, 1]
        xx_channel = xx_channel.float() / (x_dim - 1)
        yy_channel = yy_channel.float() / (y_dim - 1)

        # Rescale to [-1, 1]
        xx_channel = xx_channel * 2 - 1
        yy_channel = yy_channel * 2 - 1

        # Expand to (batch, 1, x_dim, y_dim)
        xx_channel = xx_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)
        yy_channel = yy_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)

        ret = torch.cat([
            input_tensor,
            xx_channel.type_as(input_tensor),
            yy_channel.type_as(input_tensor)], dim=1)

        if self.with_r:
            # Optional radius channel (the 0.5 offset is kept as in the original code)
            rr = torch.sqrt(torch.pow(xx_channel.type_as(input_tensor) - 0.5, 2)
                            + torch.pow(yy_channel.type_as(input_tensor) - 0.5, 2))
            ret = torch.cat([ret, rr], dim=1)

        return ret


class CoordConv(nn.Module):
    """YOLOv5 Conv applied to the input plus its coordinate channels."""

    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, with_r=False):
        super().__init__()
        self.addcoords = AddCoords(with_r=with_r)
        in_channels += 2  # i and j channels
        if with_r:
            in_channels += 1  # radius channel
        self.conv = Conv(in_channels, out_channels, k=kernel_size, s=stride)

    def forward(self, x):
        x = self.addcoords(x)
        x = self.conv(x)
        return x
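
As a quick sanity check of the shapes involved, the following standalone sketch appends the two coordinate channels and passes the result through a 1x1 convolution. It assumes PyTorch is installed; `add_coords` is a hypothetical helper written for this illustration with `torch.linspace` (an equivalent formulation for the case without the radius channel, not the module code itself), and `nn.Conv2d` stands in for YOLOv5's `Conv`.

```python
import torch
import torch.nn as nn


def add_coords(x: torch.Tensor) -> torch.Tensor:
    """Append i and j channels in [-1, 1] to a (B, C, H, W) tensor."""
    b, _, h, w = x.shape
    # Evenly spaced values from -1 to 1 along each spatial axis
    ii = torch.linspace(-1.0, 1.0, h, device=x.device, dtype=x.dtype)
    jj = torch.linspace(-1.0, 1.0, w, device=x.device, dtype=x.dtype)
    # i varies along dim 2, j varies along dim 3
    i_chan = ii.view(1, 1, h, 1).expand(b, 1, h, w)
    j_chan = jj.view(1, 1, 1, w).expand(b, 1, h, w)
    return torch.cat([x, i_chan, j_chan], dim=1)


x = torch.zeros(2, 64, 20, 20)
x_aug = add_coords(x)                          # 64 + 2 channels
conv = nn.Conv2d(64 + 2, 128, kernel_size=1)   # stand-in for YOLOv5's Conv
y = conv(x_aug)
print(tuple(x_aug.shape), tuple(y.shape))
```

The convolution has to be built with `in_channels + 2` inputs, which is why CoordConv increments `in_channels` before creating its `Conv`.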
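(2) Add CoordConv to parse_model in models/yolo.py, alongside Conv, so that its input and output channel arguments are filled in the same way.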
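
Why this step matters can be sketched as follows. In the YOLOv5 versions I am familiar with, parse_model keeps a collection of module classes (Conv, C3, SPPF, ...) for which it prepends the input channel count and scales the output channel count by width_multiple; a module missing from that collection gets its yaml args passed through unchanged. The code below is a simplified stand-in for that logic, not the actual YOLOv5 source: `resolve_args` and `num_outputs` are names made up for this illustration, and `make_divisible` is re-implemented here in reduced form.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_args(prev_channels, args, width_multiple, num_outputs):
    """Turn yaml args [c2, ...] into constructor args [c1, scaled c2, ...]."""
    c1, c2 = prev_channels, args[0]
    if c2 != num_outputs:  # the detection output width is never scaled
        c2 = make_divisible(c2 * width_multiple, 8)
    return [c1, c2, *args[1:]]


# Head layer 10 in the yaml of this post: [-1, 1, CoordConv, [512, 1, 1]]
# The preceding SPPF outputs 1024 * 0.50 = 512 channels; nc = 1 gives 3 * (1 + 5) = 18 outputs.
print(resolve_args(512, [512, 1, 1], width_multiple=0.50, num_outputs=18))
```

With these inputs CoordConv would be constructed as `CoordConv(512, 256, 1, 1)`, i.e. in_channels, out_channels, kernel_size, stride.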

(3) Create the yaml file

# Parameters
nc: 1  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, CoordConv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, CoordConv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [17, 1, CoordConv, [256, 1]],  # 24
   [20, 1, CoordConv, [512, 1]],  # 25
   [23, 1, CoordConv, [1024, 1]],  # 26

   [[24, 25, 26], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
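
Because the three extra CoordConv layers shift the layer numbering, it can help to check that every `from` entry points at the intended layer. The sketch below is an illustrative check in plain Python, not part of YOLOv5: the `head` list mirrors the yaml above by hand, and `resolve` is a hypothetical helper that converts relative indices such as -1 into absolute layer numbers.

```python
BACKBONE_LAYERS = 10  # layers 0-9 in the backbone above

head = [
    (-1, "CoordConv"),         # 10
    (-1, "nn.Upsample"),       # 11
    ([-1, 6], "Concat"),       # 12
    (-1, "C3"),                # 13
    (-1, "CoordConv"),         # 14
    (-1, "nn.Upsample"),       # 15
    ([-1, 4], "Concat"),       # 16
    (-1, "C3"),                # 17
    (-1, "Conv"),              # 18
    ([-1, 14], "Concat"),      # 19
    (-1, "C3"),                # 20
    (-1, "Conv"),              # 21
    ([-1, 10], "Concat"),      # 22
    (-1, "C3"),                # 23
    (17, "CoordConv"),         # 24
    (20, "CoordConv"),         # 25
    (23, "CoordConv"),         # 26
    ([24, 25, 26], "Detect"),  # 27
]


def resolve(frm, index):
    """Convert a yaml 'from' entry into a list of absolute layer indices."""
    refs = frm if isinstance(frm, list) else [frm]
    return [index + r if r < 0 else r for r in refs]


layers = {}
for offset, (frm, name) in enumerate(head):
    index = BACKBONE_LAYERS + offset
    layers[index] = (resolve(frm, index), name)

for index, (sources, name) in layers.items():
    # Every source must be an earlier layer
    assert all(0 <= s < index for s in sources)
    print(index, name, sources)
```

In this configuration Detect reads from layers 24, 25 and 26, which are the CoordConv layers placed on top of the P3, P4 and P5 outputs (layers 17, 20 and 23).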
(4) Run