Paper reading report: 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

Abstract. This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases of this method: (1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. (2) In a fully-automated setup, we assume that a representative, sparsely annotated training set exists. Trained on this data set, the network densely segments new volumetric images. The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. The implementation performs on-the-fly elastic deformations for efficient data augmentation during training. It is trained end-to-end from scratch, i.e., no pre-trained network is required. We test the performance of the proposed method on a complex, highly variable 3D structure, the Xenopus kidney, and achieve good results for both use cases.

Çiçek, Abdulkadir, Lienkamp, Brox, & Ronneberger - Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 Lecture Notes in Computer Science - 2016
Title: 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

Author: Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger
**Published:** Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 Lecture Notes in Computer Science - 2016

Introduction:

Annotating complete 3D volumes slice by slice is neither an effective nor an efficient way to build 3D training sets.

The researchers propose a deep network that generates dense volumetric segmentations from only partly annotated 2D slices. The architecture is based on the previous 2D U-Net, with every operation replaced by its 3D counterpart. The researchers also move the channel doubling to before max pooling and add batch normalization. The former avoids bottlenecks and the latter gives faster convergence.

In many biomedical applications, only very few images are required to train a network that generalizes reasonably well, especially in the 3D case. This is because each image already comprises repetitive structures with corresponding variation.

Related Work:

Milletari et al. [9] present a CNN combined with a Hough voting approach for 3D segmentation. However, their method is not end-to-end and only works for compact blob-like structures.

The approach of Kleesiek et al. [6] is one of the few end-to-end 3D CNN approaches for 3D segmentation. However, their network is not deep and has only one max-pooling after the first convolutions; therefore, it is unable to analyze structures at multiple scales. In the work by Tran et al., the U-net architecture is applied to videos, with full annotation available for training. The highlight of the present paper is that it can be trained from scratch on sparsely annotated volumes and can work on arbitrarily large volumes due to its seamless tiling strategy.

Network Architecture:

The whole architecture is very similar to the 2D version. The researchers moved the doubling of the channel count to before max pooling, which helps avoid bottlenecks according to Szegedy et al. They also introduced batch normalization (BN) before each ReLU.
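Benefits of BN:
(1) It reduces the dependence on parameter initialization, which makes tuning easier.
(2) Training is faster, and a higher learning rate can be used.
(3) BN improves generalization to some extent, so techniques such as dropout can be removed.
Source: answer by 言有三 at https://www.zhihu.com/question/38102762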
I need to read more to understand BN.
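
To make the two architectural changes concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The function names, the input channel count of 3, the base width of 32, and the four resolution levels are assumptions chosen for illustration. The BN function normalizes one channel to zero mean and unit variance and leaves out the learned scale and shift that a real BN layer has.

```python
import math


def analysis_path_channels(in_channels=3, base=32, levels=4, double_before_pool=True):
    """Return the (in, out) channel pairs of the two convolutions at each level.

    With double_before_pool=True the second convolution of a level already
    doubles the channels, so the widened feature map is what gets pooled.
    With False the doubling happens after pooling, as in the 2D U-Net layout.
    All default values are illustrative assumptions.
    """
    plan = []
    c_in = in_channels
    width = base
    for _ in range(levels):
        first = width
        second = width * 2 if double_before_pool else width
        plan.append([(c_in, first), (first, second)])
        c_in = second
        # Width used by the first convolution of the next level.
        width = second if double_before_pool else width * 2
    return plan


def batch_norm_relu(values, eps=1e-5):
    """Normalize one channel over the batch, then apply ReLU.

    Simplified: no learned scale/shift and no running statistics.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [max(0.0, (v - mean) / math.sqrt(var + eps)) for v in values]


if __name__ == "__main__":
    for level, convs in enumerate(analysis_path_channels()):
        print("level", level, convs)
    print(batch_norm_relu([1.0, 2.0, 3.0, 4.0]))
```

Calling `analysis_path_channels(base=64, double_before_pool=False)` gives the other ordering for comparison: both convolutions of a level keep the same width, and the channel count only grows after pooling.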

Another important part is the weighted softmax loss function. Setting the weights of unlabeled pixels to zero makes it possible to learn from only the labelled ones in the partly annotated volumes.
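
The idea of zero weights for unlabeled voxels can be sketched as follows. This is a small pure-Python illustration, not the paper's code. The function names, the flat list layout of voxels, the use of -1 as a placeholder label, and the normalization by the sum of weights are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over the class scores of one voxel."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_softmax_loss(logits, labels, weights):
    """Weighted cross-entropy over voxels.

    logits:  list of per-voxel class score lists
    labels:  list of class indices (any placeholder for unlabeled voxels)
    weights: list of per-voxel weights; 0.0 marks an unlabeled voxel
    Dividing by the sum of weights is an assumption of this sketch.
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for scores, label, w in zip(logits, labels, weights):
        if w == 0.0:
            # Unlabeled voxel: contributes nothing to the loss.
            continue
        loss_sum += -w * math.log(softmax(scores)[label])
        weight_sum += w
    return loss_sum / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    logits = [[2.0, 0.5], [0.1, 1.5], [3.0, -1.0]]
    labels = [0, 1, -1]          # third voxel is unlabeled
    weights = [1.0, 1.0, 0.0]    # so its weight is zero
    print(weighted_softmax_loss(logits, labels, weights))
```

Because the zero-weight voxels are skipped, changing the network output at those positions leaves the loss unchanged, so only the annotated voxels drive the training signal.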
The training and experiment parts are omitted.