OBELISK
Abstract
Deep networks have set the state of the art in most image analysis tasks by replacing handcrafted features with learned convolution filters within end-to-end trainable architectures. Still, the specification of a convolutional network is subject to much manual design: the shape and size of the receptive field of the convolutional operations is a very sensitive choice that has to be tuned for different image analysis applications. 3D fully convolutional multi-scale architectures with skip connections (3D-UNet), which excel at semantic segmentation and landmark localisation, have huge memory requirements and rely on large annotated datasets, an important limitation for wider adoption in medical image analysis.
Extensive validation experiments indicate that the performance of sparse deformable convolutions is due to their ability to capture large spatial context with few expressive filter parameters, and that network depth is not always necessary to learn complex shape and appearance features. A combination with conventional CNNs further improves the delineation of small organs with large shape variations, and the fast inference time using flexible image sampling may offer new potential use cases for deep networks in computer-assisted, image-guided interventions.
Introduction and Motivation
To address the computational cost of patch-based classification, so-called fully convolutional networks (FCNs) have been proposed (Long et al., 2015; Ronneberger et al., 2015). They use an intrinsic multi-scale approach within an encoder-decoder architecture, together with residual or skip connections, to obtain a good trade-off between accuracy and computational demand, while still relying on dozens of convolutional layers. However, we note that most object and medical segmentation tasks deal with fewer than a dozen classes of anatomy, which raises the question of whether very deep networks with many spatial convolution layers are indeed required to obtain high-quality results.
In this work, we present an alternative concept to dilated convolutions in which both the spatial filter offsets and the coefficients (weights) of a large and sparse convolutional kernel are learned in a continuous, differentiable space. We strongly believe that these spatially deformable kernels, which automatically adapt their filter layout, are an extremely important clue for deepening the understanding of the processes within convolutional networks and for improving their applicability to medical volumetric data.
Method

Conventional convolutions in deep networks are implemented using the im2col operator to extract overlapping patches, followed by a matrix multiplication with a filter bank and reshaping to the expected feature map dimensions. The deformable convolutions in OBELISK follow a similar principle, but replace the rectangular patch extraction with a continuously sampled spatial filter offset layout, which is added to the feature map coordinates (here a 3×5 grid) and evaluated by bilinear interpolation using the gridsample operator. The subsequent hardware-optimised single-precision matrix multiplication with a filter bank is comparably efficient in terms of computation time, but may capture much more spatial context within a single layer and with few trainable parameters.
This concept can easily be integrated into the commonly used U-Net or V-Net architectures.
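The conventional half of the figure description (patch extraction with im2col, then a matrix multiplication with a filter bank, then a reshape) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: the function names `im2col` and `conv2d_im2col`, the single-channel input, and the stride-1, no-padding setting are all assumptions made for brevity.

```python
import numpy as np

def im2col(image, kh, kw):
    """Collect every overlapping kh x kw patch (stride 1, no padding) as one row."""
    H, W = image.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = image[i:i + kh, j:j + kw].ravel()
    return cols, (out_h, out_w)

def conv2d_im2col(image, filters):
    """filters: (C_out, kh, kw). Returns the (C_out, out_h, out_w) cross-correlation."""
    c_out, kh, kw = filters.shape
    cols, (out_h, out_w) = im2col(image, kh, kw)
    # One matrix multiplication applies the whole filter bank to all patches.
    out = cols @ filters.reshape(c_out, -1).T      # (out_h * out_w, C_out)
    # Reshape back to the expected feature map dimensions.
    return out.T.reshape(c_out, out_h, out_w)

# Hypothetical example: a 3x5 feature map and a single 3x3 box filter.
image = np.arange(15, dtype=np.float64).reshape(3, 5)
filters = np.ones((1, 3, 3))
result = conv2d_im2col(image, filters)             # shape (1, 1, 3)
```

The deformable variant keeps the matrix multiplication and reshape, and only changes how the rows of the patch matrix are gathered.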
To improve generality, we now describe OBELISK for multi-channel inputs (in the 2D case) and define a more general deformable convolution operation: given an input tensor (feature map) of size $B \times C_{in} \times H \times W$, a spatial sampling tensor of size $B \times 1 \times S_{sp} \times 2$, a deformable offset tensor of size $1 \times K \times 1 \times 2$, and a weight tensor of size $1 \times C_{out} \times C_{in} \cdot K$. Here, $B$ is the batch size, $C_{in}$ is the number of input channels, $C_{out}$ …
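The tensor shapes listed above can be checked with a small NumPy sketch of the 2D operation. It is an illustration under stated assumptions, not the published implementation: coordinates are taken to be normalised to $[-1, 1]$ in (x, y) order with zeros outside the feature map, the bilinear sampler is written out by hand instead of calling the gridsample operator, the sampled values are flattened channel-major to match $C_{in} \cdot K$, and the names `bilinear_sample` and `obelisk_2d` are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, coords):
    """Sample feat (C, H, W) at continuous coords (..., 2), given as (x, y)
    in normalised [-1, 1] units (assumed convention); outside points read 0."""
    C, H, W = feat.shape
    x = (coords[..., 0] + 1) * 0.5 * (W - 1)       # to pixel units
    y = (coords[..., 1] + 1) * 0.5 * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    out = np.zeros((C,) + x.shape)
    for dy in (0, 1):                              # four neighbouring pixels
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            w = (1 - np.abs(x - xi)) * (1 - np.abs(y - yi))
            inside = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
            vals = feat[:, np.clip(yi, 0, H - 1), np.clip(xi, 0, W - 1)]
            out += vals * (w * inside)
    return out

def obelisk_2d(x, sample, offsets, weight):
    """x: (B, C_in, H, W), sample: (B, 1, S_sp, 2), offsets: (1, K, 1, 2),
    weight: (1, C_out, C_in * K). Returns (B, C_out, S_sp)."""
    B, C_in = x.shape[:2]
    grid = sample + offsets                        # broadcasts to (B, K, S_sp, 2)
    K, S_sp = grid.shape[1], grid.shape[2]
    sampled = np.stack([bilinear_sample(x[b], grid[b]) for b in range(B)])
    sampled = sampled.reshape(B, C_in * K, S_sp)   # (B, C_in * K, S_sp)
    return weight @ sampled                        # batched matmul -> (B, C_out, S_sp)

# Hypothetical sizes, chosen only to exercise the shapes.
rng = np.random.default_rng(0)
B, C_in, C_out, H, W, K, S_sp = 2, 3, 5, 8, 8, 4, 6
x = rng.standard_normal((B, C_in, H, W))
sample = rng.uniform(-1, 1, (B, 1, S_sp, 2))
offsets = rng.uniform(-0.3, 0.3, (1, K, 1, 2))
weight = rng.standard_normal((1, C_out, C_in * K))
out = obelisk_2d(x, sample, offsets, weight)
```

In a trainable setting the offsets and weights would be learned parameters and the sampling would go through a differentiable operator; here they are random arrays so that only the shape bookkeeping is shown.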

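This paper proposes OBELISK, a novel convolution approach that learns the spatial filter offsets of large, sparse convolutional kernels in place of conventional fixed kernels. The approach performs well in 3D medical image analysis, particularly for small organs and structures with large shape variation. Combined with conventional CNNs, OBELISK captures more spatial context with fewer parameters, improving the efficiency and accuracy of the network. On limited training datasets, the combination of OBELISK with a shallow U-Net showed performance close to the state of the art, with little dependence on preprocessing and data augmentation.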