Paper Reading Notes - Dilated Convolution

This post takes a close look at dilated convolution, which aggregates multi-scale contextual information while preserving full resolution, making it especially well suited to semantic segmentation. With dilated convolutions, a network can grow its receptive field exponentially without reducing resolution. The post covers the Caffe definition of dilated convolution and walks through the accompanying paper, which shows how a context module built from dilated convolutions improves the accuracy of dense prediction.


Dilated Convolution

[Paper]: Multi-scale Context Aggregation by Dilated Convolutions

[Caffe-Code]

1. Definition in Caffe

Dilated convolution is available out of the box in Caffe, via the official convolution layer parameters.

```
message ConvolutionParameter {
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting holes. 
  // (Kernel dilation is sometimes referred to by its use in the
  //  algorithme à trous from Holschneider et al. 1987.)
  repeated uint32 dilation = 18; // The dilation; defaults to 1
}
```
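The `dilation` field inflates the effective kernel extent to `dilation * (kernel_size - 1) + 1`, and Caffe's output-shape arithmetic uses that inflated extent. A quick sanity check of the shape formula in plain Python (not Caffe code, just the arithmetic):

```python
def conv_output_size(input_size, kernel_size, pad=0, stride=1, dilation=1):
    # Effective kernel extent once the dilation holes are zero-filled.
    kernel_ext = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * pad - kernel_ext) // stride + 1

# A 3x3 kernel with dilation 2 spans a 5x5 extent, so pad=2 keeps the size:
assert conv_output_size(64, kernel_size=3, pad=2, dilation=2) == 64
```

The layers below define the paper's context module: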
```
layer {
  name: "ct_conv1_1"
  type: "Convolution"
  bottom: "fc-final"
  top: "ct_conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  convolution_param {
    num_output: 42
    pad: 33
    kernel_size: 3
  }
}
layer {
  name: "ct_relu1_1"
  type: "ReLU"
  bottom: "ct_conv1_1"
  top: "ct_conv1_1"
}
layer {
  name: "ct_conv1_2"
  type: "Convolution"
  bottom: "ct_conv1_1"
  top: "ct_conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  convolution_param {
    num_output: 42
    pad: 0
    kernel_size: 3
  }
}
layer {
  name: "ct_relu1_2"
  type: "ReLU"
  bottom: "ct_conv1_2"
  top: "ct_conv1_2"
}
layer {
  name: "ct_conv2_1"
  type: "Convolution"
  bottom: "ct_conv1_2"
  top: "ct_conv2_1"
  convolution_param {
    num_output: 84
    kernel_size: 3
    dilation: 2
  }
}
layer {
  name: "ct_relu2_1"
  type: "ReLU"
  bottom: "ct_conv2_1"
  top: "ct_conv2_1"
}
layer {
  name: "ct_conv3_1"
  type: "Convolution"
  bottom: "ct_conv2_1"
  top: "ct_conv3_1"
  convolution_param {
    num_output: 168
    kernel_size: 3
    dilation: 4
  }
}
layer {
  name: "ct_relu3_1"
  type: "ReLU"
  bottom: "ct_conv3_1"
  top: "ct_conv3_1"
}
layer {
  name: "ct_conv4_1"
  type: "Convolution"
  bottom: "ct_conv3_1"
  top: "ct_conv4_1"
  convolution_param {
    num_output: 336
    kernel_size: 3
    dilation: 8
  }
}
layer {
  name: "ct_relu4_1"
  type: "ReLU"
  bottom: "ct_conv4_1"
  top: "ct_conv4_1"
}
layer {
  name: "ct_conv5_1"
  type: "Convolution"
  bottom: "ct_conv4_1"
  top: "ct_conv5_1"
  convolution_param {
    num_output: 672
    kernel_size: 3
    dilation: 16
  }
}
layer {
  name: "ct_relu5_1"
  type: "ReLU"
  bottom: "ct_conv5_1"
  top: "ct_conv5_1"
}
layer {
  name: "ct_fc1"
  type: "Convolution"
  bottom: "ct_conv5_1"
  top: "ct_fc1"
  convolution_param {
    num_output: 672
    kernel_size: 3
  }
}
layer {
  name: "ct_fc1_relu"
  type: "ReLU"
  bottom: "ct_fc1"
  top: "ct_fc1"
}
layer {
  name: "ct_final"
  type: "Convolution"
  bottom: "ct_fc1"
  top: "ct_final"
  convolution_param {
    num_output: 21
    kernel_size: 1
  }
}
```
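As a cross-check on this architecture, the receptive field of each layer can be reproduced with a few lines of Python (a sketch assuming stride 1 throughout, which matches the definition above):

```python
# (name, kernel_size, dilation) for each convolution in the module above.
layers = [
    ("ct_conv1_1", 3, 1), ("ct_conv1_2", 3, 1),
    ("ct_conv2_1", 3, 2), ("ct_conv3_1", 3, 4),
    ("ct_conv4_1", 3, 8), ("ct_conv5_1", 3, 16),
    ("ct_fc1", 3, 1), ("ct_final", 1, 1),
]

rf = 1  # receptive field of a single input pixel
for name, k, d in layers:
    rf += d * (k - 1)  # each stride-1 layer widens the field by d * (k - 1)
    print(f"{name}: {rf}x{rf}")
```

This yields 3, 5, 9, 17, 33, 65, 67, 67, matching the receptive fields reported for the basic context module in the paper.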

2. Paper - Multi-scale Context Aggregation by Dilated Convolutions

Semantic segmentation is a dense prediction problem, unlike image classification.

Dilated convolutions aggregate multi-scale contextual information without losing resolution, and they support exponential expansion of the receptive field.
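To see the exponential expansion concretely: stack 3×3 convolutions so that layer $i$ uses dilation $2^{i-1}$ (dilations 1, 2, 4, ...). Each layer widens the receptive field by $2 \cdot 2^{i-1}$ pixels, so after $n$ layers

$$\mathrm{RF}_n = 1 + \sum_{i=1}^{n} 2 \cdot 2^{i-1} = 2^{n+1} - 1,$$

i.e. a 4-layer stack (dilations 1, 2, 4, 8) already sees a $31 \times 31$ window, while the number of parameters per layer stays constant.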

Image classification networks aggregate multi-scale contextual information through successive pooling and subsampling layers, reducing the image resolution until a global prediction is produced.

Dense prediction, in contrast, calls for combining multi-scale contextual reasoning with full-resolution output.

Existing ways of resolving the conflict between multi-scale reasoning and full-resolution dense prediction:

  • Apply repeated up-convolutions to reconstruct the lost resolution while carrying over the global perspective from the downsampled layers.
  • Feed multiple rescaled versions of the image to the network and combine the resulting predictions; however, it is unclear which rescaled inputs are actually needed.

Dilated convolutions aggregate multi-scale contextual information without downsampling the image or analyzing rescaled inputs, and they can be plugged into existing network architectures at any resolution, as the sketch below illustrates.
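For illustration only (a minimal PyTorch sketch, not the paper's Caffe implementation): with a 3×3 kernel, setting the padding equal to the dilation keeps the spatial resolution unchanged for any dilation factor.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)  # a full-resolution feature map

for d in (1, 2, 4, 8):
    # For a 3x3 kernel, padding = dilation preserves the spatial size
    # while the layer's receptive field grows with d.
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=d, dilation=d)
    print(d, conv(x).shape)  # torch.Size([1, 64, 56, 56]) every time
```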

2.1 Dilated Convolution

Define a discrete function $F: \mathbb{Z}^2 \rightarrow \mathbb{R}$, and let $\Omega_r = [-r, r]^2 \cap \mathbb{Z}^2$ and $k: \Omega_r \rightarrow \mathbb{R}$ be a discrete filter of size $(2r+1)^2$. The discrete convolution operator $*$ is then

$$(F * k)(\mathbf{p}) = \sum_{\mathbf{s} + \mathbf{t} = \mathbf{p}} F(\mathbf{s}) \, k(\mathbf{t})$$

Generalizing this operator with a dilation factor $l$ gives the $l$-dilated convolution $*_l$:

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s} + l\mathbf{t} = \mathbf{p}} F(\mathbf{s}) \, k(\mathbf{t})$$

The familiar discrete convolution is simply the $1$-dilated convolution.
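To make the definition concrete, here is a direct, unoptimized NumPy transcription of the $l$-dilated convolution above (my own sketch; $F$ is treated as zero outside its support):

```python
import numpy as np

def dilated_conv(F, k, l=1):
    # (F *_l k)(p) = sum over t in Omega_r of F(p - l*t) * k(t),
    # with F implicitly zero-padded outside its support.
    r = k.shape[0] // 2
    H, W = F.shape
    out = np.zeros((H, W))
    for py in range(H):
        for px in range(W):
            acc = 0.0
            for ty in range(-r, r + 1):
                for tx in range(-r, r + 1):
                    sy, sx = py - l * ty, px - l * tx
                    if 0 <= sy < H and 0 <= sx < W:
                        acc += F[sy, sx] * k[ty + r, tx + r]
            out[py, px] = acc
    return out

# l = 1 recovers ordinary discrete convolution; l = 2 samples F with holes.
F = np.arange(25, dtype=float).reshape(5, 5)
print(dilated_conv(F, np.ones((3, 3)), l=2))
```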
