《An attempt at beating the 3D U-Net》 2019

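This post covers the application of a 3D U-Net to the 2019 Kidney and Kidney Tumor Segmentation Challenge and the attempt to improve it by adding residual and pre-activation residual blocks. Cross-validation showed only marginal improvements, but the residual 3D U-Net was chosen for test set prediction because of its slightly higher Dice scores. With a Composite Dice score of 91.23 on the test set, the method outperformed all 105 competing teams and won the KiTS2019 challenge.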

一、Abstract
二、Method
三、References
四、Code
- （1）readme.md

一、Abstract

The U-Net is arguably the most successful segmentation architecture in the medical domain. Here we apply a 3D U-Net to the 2019 Kidney and Kidney Tumor Segmentation Challenge and attempt to improve upon it by augmenting it with residual and pre-activation residual blocks. Cross-validation results on the training cases suggest only very minor, barely measurable improvements. Due to marginally higher Dice scores, the residual 3D U-Net is chosen for test set prediction. With a Composite Dice score of 91.23 on the test set, our method outperformed all 105 competing teams and won the KiTS2019 challenge by a small margin.

二、Method

Based on the success of the U-Net architecture, we develop and train three different U-Net inspired architectures: a 3D 'plain' U-Net (no residual/dense connections), a residual [4] 3D U-Net and a pre-activation [5] residual 3D U-Net.

As stated previously, we employ three 3D U-Net architectures for our experiments. All U-Nets use 3D convolutions, ReLU/LReLU nonlinearities and instance normalization. Upsampling is done via transposed convolutions and downsampling via strided convolutions. All networks start with some number of feature maps at the highest resolution. This number is doubled with each downsampling operation (up to a maximum of 320 feature maps) in the encoder and halved with each transposed convolution in the decoder. We always downsample by a factor of 2. Downsampling is repeated until further downsampling would result in a spatial feature map size < 4.
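The doubling rule and the stopping criterion above can be sketched in a few lines of plain Python. This is only an illustration: the function name `unet_schedule` is hypothetical, and the choice to stop halving each axis independently is an assumption (the text does not say how anisotropic patches are handled). The patch size 80 × 160 × 160 is the one given in the training description.

```python
def unet_schedule(patch_size, base_features, max_features=320, min_size=4):
    """Return (spatial size per stage, feature maps per stage) for the encoder.

    Assumption: each axis is halved independently and stops being halved
    once another halving would push it below `min_size`.
    """
    sizes = [tuple(patch_size)]
    features = [base_features]
    while True:
        current = sizes[-1]
        # Halve an axis only if the result stays at or above min_size.
        nxt = tuple(s // 2 if s / 2 >= min_size else s for s in current)
        if nxt == current:
            break  # no axis can be downsampled any further
        sizes.append(nxt)
        # Double the feature maps, capped at max_features.
        features.append(min(features[-1] * 2, max_features))
    return sizes, features


sizes, features = unet_schedule((80, 160, 160), base_features=30)
print(sizes)     # spatial size at every encoder stage
print(features)  # feature maps at every encoder stage
```

Under these assumptions, 30 initial feature maps give six encoder stages with 30, 60, 120, 240, 320 and 320 feature maps, ending at a 5 × 5 × 5 bottleneck.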

Plain 3D U-Net. For both the encoder and decoder we use two conv-instnorm-LReLU blocks between successive downsampling/upsampling operations. This architecture uses 30 feature maps at the highest resolution.

Residual 3D U-Net. This architecture uses residual blocks in the encoder as opposed to a simple sequence of convolutions. The residual blocks are implemented similarly to [4]: conv-instnorm-ReLU-conv-instnorm-ReLU (where the addition of the residual takes place before the last ReLU activation). We start with just one residual block at the highest resolution and increase the number of residual blocks after each downsampling operation. The decoder uses only one conv-instnorm-ReLU per resolution. To accommodate the larger memory footprint of residual networks, we reduce the initial number of feature maps from 30 to 24.
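A minimal PyTorch sketch of the residual block ordering described above (conv-instnorm-ReLU-conv-instnorm, add the residual, then the last ReLU). The class name `ResidualBlock3D`, the 3×3×3 kernels, and the 1×1×1 projection on the shortcut are assumptions made for illustration; they are not taken from the authors' code.

```python
import torch
from torch import nn


class ResidualBlock3D(nn.Module):
    """conv-instnorm-ReLU-conv-instnorm, add the shortcut, then the final ReLU."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, out_channels, 3, stride=stride, padding=1)
        self.norm1 = nn.InstanceNorm3d(out_channels, affine=True)
        self.conv2 = nn.Conv3d(out_channels, out_channels, 3, padding=1)
        self.norm2 = nn.InstanceNorm3d(out_channels, affine=True)
        self.relu = nn.ReLU()
        # Assumption: a 1x1x1 projection aligns the shortcut whenever the
        # channel count or the resolution changes.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv3d(in_channels, out_channels, 1, stride=stride)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        # The residual is added before the last ReLU.
        return self.relu(out + self.shortcut(x))


block = ResidualBlock3D(24, 48, stride=2)   # strided conv halves every axis
y = block(torch.randn(1, 24, 8, 16, 16))
print(y.shape)
```

With `stride=2` the block doubles as the downsampling step, which matches the statement that downsampling is done with strided convolutions.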

Pre-activation residual 3D U-Net. Inspired by [5], we also use a variant of the residual 3D U-Net that uses pre-activation residual blocks: instnorm-ReLU-conv-instnorm-ReLU-conv.
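For comparison, the pre-activation ordering (instnorm-ReLU-conv-instnorm-ReLU-conv) might look like the following PyTorch sketch. The class name `PreActResidualBlock3D` and the 3×3×3 kernels are assumptions, and the block keeps the channel count fixed for simplicity.

```python
import torch
from torch import nn


class PreActResidualBlock3D(nn.Module):
    """instnorm-ReLU-conv-instnorm-ReLU-conv, with the shortcut added at the end."""

    def __init__(self, channels):
        super().__init__()
        self.norm1 = nn.InstanceNorm3d(channels, affine=True)
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1)
        self.norm2 = nn.InstanceNorm3d(channels, affine=True)
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Normalization and activation come before each convolution.
        out = self.conv1(self.relu(self.norm1(x)))
        out = self.conv2(self.relu(self.norm2(out)))
        # No activation after the addition: the shortcut stays an identity path.
        return x + out


block = PreActResidualBlock3D(24)
y = block(torch.randn(1, 24, 8, 16, 16))
print(y.shape)
```

The difference from the post-activation block is that nothing is applied after the addition, so the input passes through the shortcut unchanged.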

All networks are trained with stochastic gradient descent and a batch size of 2. We found that a patch size of 80 × 160 × 160 yields sufficient contextual information while retaining necessary fine-grained image information. We define an epoch as an iteration over 250 batches and train for a total of 1000 epochs. The sum of cross-entropy and Dice loss is used as the training objective, and we use supervision at different resolutions to encourage gradient flow deeper into the network. The training of a single network utilizes 12 GB of VRAM and runs for about 5 days. Training was done on Nvidia Titan Xp GPUs (single GPU training). All networks were implemented with the PyTorch framework [12] (version 1.1). We base our implementation on nnU-Net [7].
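The training objective described above, cross-entropy plus Dice loss applied at several resolutions, can be sketched in PyTorch as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the soft Dice is computed over the whole batch and averaged over all classes, and the per-resolution weights are passed in by the caller because the text does not specify them.

```python
import torch
import torch.nn.functional as F


def soft_dice_loss(logits, target, eps=1e-5):
    """1 - mean soft Dice over classes.

    logits: (B, C, D, H, W) raw network outputs
    target: (B, D, H, W) integer class labels (int64)
    """
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).to(probs.dtype)
    dims = (0, 2, 3, 4)  # sum over batch and spatial axes, keep the class axis
    intersection = (probs * onehot).sum(dims)
    denominator = probs.sum(dims) + onehot.sum(dims)
    dice = (2 * intersection + eps) / (denominator + eps)
    return 1 - dice.mean()


def ce_plus_dice(logits, target):
    """Sum of cross-entropy and soft Dice loss."""
    return F.cross_entropy(logits, target) + soft_dice_loss(logits, target)


def deep_supervision_loss(outputs, targets, weights):
    """Weighted sum of the loss over several decoder resolutions.

    Assumption: `targets` are already resampled to match each output.
    """
    return sum(w * ce_plus_dice(o, t) for w, o, t in zip(weights, outputs, targets))


target = torch.randint(0, 3, (2, 8, 16, 16))
outputs = [torch.randn(2, 3, 8, 16, 16), torch.randn(2, 3, 4, 8, 8)]
targets = [target, target[:, ::2, ::2, ::2]]  # naive subsampling for the demo
loss = deep_supervision_loss(outputs, targets, weights=[1.0, 0.5])
print(loss.item())
```

Supervising lower-resolution decoder outputs gives the early decoder stages their own gradient signal, which is the motivation stated in the paragraph above.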

Dice scores for kidney were computed by treating both the actual kidney label and the tumor label as foreground and everything else as background. This is the same setup that is used in the challenge evaluation. The Dice computation for the tumors is done on the tumor labels alone. No other metrics are considered, as the challenge is evaluated on the average of the kidney and tumor Dice scores.
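The two Dice scores described above can be computed with a short NumPy sketch. The function names are hypothetical, and the label convention (0 = background, 1 = kidney, 2 = tumor) is an assumption stated here rather than something given in the text.

```python
import numpy as np


def dice(pred_mask, ref_mask):
    """Dice overlap of two boolean masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    ref_mask = np.asarray(ref_mask, dtype=bool)
    total = pred_mask.sum() + ref_mask.sum()
    if total == 0:
        return float("nan")  # undefined when both masks are empty
    return 2.0 * np.logical_and(pred_mask, ref_mask).sum() / total


def kits_scores(pred, ref, kidney_label=1, tumor_label=2):
    """Return (kidney Dice, tumor Dice) for two integer label maps."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    # Kidney score: kidney and tumor voxels together count as foreground.
    kidney = dice((pred == kidney_label) | (pred == tumor_label),
                  (ref == kidney_label) | (ref == tumor_label))
    # Tumor score: tumor voxels only.
    tumor = dice(pred == tumor_label, ref == tumor_label)
    return kidney, tumor


ref = np.array([0, 1, 1, 2, 2, 0])
pred = np.array([0, 1, 2, 2, 0, 0])
kidney, tumor = kits_scores(pred, ref)
print(kidney, tumor)  # kidney = 6/7, tumor = 0.5
```

In the toy example, the voxel predicted as tumor instead of kidney does not hurt the kidney score, because both labels are merged into one foreground class for that metric.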

三、References

2、《3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation》2016
4、《Deep Residual Learning for Image Recognition》2016
11、《Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation》2016

四、Code

（1）readme.md

In 3D biomedical image segmentation, dataset properties like imaging modality, image sizes, voxel spacings, class ratios etc. vary drastically. For example, images in the Liver and Liver Tumor Segmentation Challenge dataset are computed tomography (CT) scans of about 512x512x512 voxels, have isotropic voxel spacings, and their intensity values are quantitative (Hounsfield Units). The Automated Cardiac Diagnosis Challenge (ACDC) dataset, on the other hand, shows cardiac structures in cine MRI with a typical image shape of 10x320x320 voxels, highly anisotropic voxel spacings and qualitative intensity values. In addition, the ACDC dataset suffers from slice misalignments and a heterogeneity of out-of-plane spacings, which can cause severe interpolation artifacts if not handled properly.

In current research practice, segmentation pipelines are designed manually and with one specific dataset in mind. Many pipeline settings depend directly or indirectly on the properties of the dataset and display a complex co-dependence: image size, for example, affects the patch size, which in turn affects the required receptive field of the network, a factor that itself influences several other hyperparameters in the pipeline. As a result, pipelines that were developed on one (type of) dataset are inherently incompatible with other datasets in the domain.

nnU-Net is the first segmentation method that is designed to deal with the dataset diversity found in the domain. It condenses and automates the key decisions for designing a successful segmentation pipeline for any given dataset.

Questions to think about:
1、What does "segmentation pipelines" mean?
2、What does "pipeline settings" mean?
3、What does "patch size" mean?

nnU-Net makes the following contributions to the field:
1、Standardized baseline: nnU-Net is the first standardized deep learning benchmark in biomedical segmentation. Without manual effort, researchers can compare their algorithms against nnU-Net on an arbitrary number of datasets to provide meaningful evidence for proposed improvements.
2、Out-of-the-box segmentation method: nnU-Net is the first plug-and-play tool for state-of-the-art biomedical segmentation. Inexperienced users can use nnU-Net out of the box for their custom 3D segmentation problem without need for manual intervention.
3、Framework: nnU-Net is a framework for fast and effective development of segmentation methods. Due to its modular structure, new architectures and methods can easily be integrated into nnU-Net. Researchers can then benefit from its generic nature to roll out and evaluate their modifications on an arbitrary number of datasets in a standardized environment.