[tricks] CoordConv


Licensed under CC 4.0 BY-SA



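Convolutional neural networks have excellent properties such as weight sharing, local connectivity, and translation equivariance, which have made them highly successful across many vision tasks. In tasks that involve coordinate modeling (such as object detection and image generation), however, these strengths turn into weaknesses and can hurt final model performance. In this work, Uber shows that the root of the problem is the translation equivariance of convolution and proposes a corresponding fix, CoordConv, which appends coordinate channels to the input of a convolution. According to the paper, on the coordinate transform problem CoordConv generalizes better, trains 150 times faster, and needs 10 to 100 times fewer parameters than plain convolution. The authors also report large gains on a variety of vision tasks.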
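To make the equivariance argument concrete, here is a toy 1-D sketch in pure Python. It is only an illustration under stated assumptions: wrap-around (circular) padding, arbitrary kernel weights, and hypothetical helper names (`correlate_circular`, `shift`, `coord_filter`) that do not come from the paper. It shows that a plain filter responds identically wherever the input sits, and that adding a fixed coordinate channel breaks that symmetry.

```python
def correlate_circular(signal, kernel):
    """1-D cross-correlation with wrap-around padding (pure Python)."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Circularly shift a list to the right by `steps` positions."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps] if steps else list(signal)


signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # a single "pixel" at index 2
kernel = [1.0, 2.0, -1.0]                 # arbitrary illustrative weights

# Plain filter: filtering a shifted input equals shifting the filtered output,
# so the response pattern says nothing about absolute position.
equivariant = (
    shift(correlate_circular(signal, kernel), 3)
    == correlate_circular(shift(signal, 3), kernel)
)

# Coordinate channel in [-1, 1]; it stays fixed when the signal moves.
n = len(signal)
coords = [2 * i / (n - 1) - 1 for i in range(n)]
coord_kernel = [0.5, 0.0, 0.0]            # arbitrary weights for the extra channel


def coord_filter(sig):
    """Two-channel filter: signal response plus coordinate response."""
    a = correlate_circular(sig, kernel)
    b = correlate_circular(coords, coord_kernel)
    return [x + y for x, y in zip(a, b)]


# With the coordinate channel the output depends on where the pixel is.
coord_equivariant = shift(coord_filter(signal), 3) == coord_filter(shift(signal, 3))

print(equivariant, coord_equivariant)
```

The first flag reflects the position-blindness of ordinary convolution. The second shows that once a coordinate channel is mixed in, the network can learn position-dependent responses.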

Here is the code, feel free to use it:

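import torch
import torch.nn as nn


class AddCoords(nn.Module):
    """Append normalized x/y coordinate channels (and optionally a radius channel)."""

    def __init__(self, with_r=False):
        super().__init__()
        self.with_r = with_r

    def forward(self, input_tensor):
        """
        Args:
            input_tensor: shape (batch, channel, x_dim, y_dim)

        Returns:
            Tensor of shape (batch, channel + 2, x_dim, y_dim),
            or (batch, channel + 3, x_dim, y_dim) when with_r is True.
        """
        batch_size, _, x_dim, y_dim = input_tensor.size()

        # Index grids of shape (1, y_dim, x_dim).
        xx_channel = torch.arange(x_dim).repeat(1, y_dim, 1)
        yy_channel = torch.arange(y_dim).repeat(1, x_dim, 1).transpose(1, 2)

        # Normalize indices to [0, 1]; this assumes x_dim > 1 and y_dim > 1.
        xx_channel = xx_channel.float() / (x_dim - 1)
        yy_channel = yy_channel.float() / (y_dim - 1)

        # Rescale to [-1, 1].
        xx_channel = xx_channel * 2 - 1
        yy_channel = yy_channel * 2 - 1

        # Expand to (batch, 1, x_dim, y_dim) to match the input layout.
        xx_channel = xx_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)
        yy_channel = yy_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)

        # Match the input's tensor type before concatenating along channels.
        xx_channel = xx_channel.type_as(input_tensor)
        yy_channel = yy_channel.type_as(input_tensor)

        ret = torch.cat([input_tensor, xx_channel, yy_channel], dim=1)

        if self.with_r:
            # Radius channel. The 0.5 offset is kept from the original code;
            # with coordinates in [-1, 1] the grid center is at 0, not 0.5.
            rr = torch.sqrt(torch.pow(xx_channel - 0.5, 2) + torch.pow(yy_channel - 0.5, 2))
            ret = torch.cat([ret, rr], dim=1)

        return ret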
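
AddCoords only appends the coordinate channels; to get a full CoordConv layer it still has to be followed by an ordinary convolution whose input channel count accounts for the extra channels. The following is a minimal sketch of that wiring, not the authors' reference code. The class name `CoordConv2d` is hypothetical, and `AddCoords` is restated in compact form (using `torch.linspace`, which should give the same [-1, 1] grid up to float rounding) only so the block runs on its own.

```python
import torch
import torch.nn as nn


class AddCoords(nn.Module):
    """Compact restatement of AddCoords so this block is self-contained."""

    def __init__(self, with_r=False):
        super().__init__()
        self.with_r = with_r

    def forward(self, x):
        b, _, h, w = x.shape
        # Coordinates in [-1, 1] along each spatial axis, on x's device and dtype.
        xs = torch.linspace(-1, 1, h, device=x.device, dtype=x.dtype)
        ys = torch.linspace(-1, 1, w, device=x.device, dtype=x.dtype)
        xx = xs.view(1, 1, h, 1).expand(b, 1, h, w)
        yy = ys.view(1, 1, 1, w).expand(b, 1, h, w)
        out = torch.cat([x, xx, yy], dim=1)
        if self.with_r:
            # Same 0.5 offset as the listing above (kept, not corrected).
            rr = torch.sqrt((xx - 0.5) ** 2 + (yy - 0.5) ** 2)
            out = torch.cat([out, rr], dim=1)
        return out


class CoordConv2d(nn.Module):
    """Hypothetical wrapper: AddCoords followed by a standard nn.Conv2d."""

    def __init__(self, in_channels, out_channels, with_r=False, **conv_kwargs):
        super().__init__()
        self.addcoords = AddCoords(with_r=with_r)
        # Two extra channels for x/y, plus one more if the radius is used.
        extra = 3 if with_r else 2
        self.conv = nn.Conv2d(in_channels + extra, out_channels, **conv_kwargs)

    def forward(self, x):
        return self.conv(self.addcoords(x))


layer = CoordConv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(2, 3, 16, 20)
y = layer(x)
print(y.shape)
```

Because the wrapper takes the same keyword arguments as `nn.Conv2d`, it can stand in for an existing convolution wherever position information is expected to help, at the cost of two or three extra input channels.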