[Paper Close Reading] U-Net: Convolutional Networks for Biomedical Image Segmentation

①In medicine, what is expected of machine learning and deep learning is often not image-level classification accuracy but localization, i.e. tasks such as region segmentation in which every pixel is assigned a class label

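②They regard the sliding-window network of Ciresan et al. as slow, because it must be run separately for every patch and overlapping patches cause redundant computation, and as limited in localization accuracy, because larger patches require more max-pooling layers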

③⭐In the expanding path of U-Net, pooling operators are replaced by upsampling operators, which increases the resolution of the output

④What exactly is the overlap-tile strategy? I could not work out why it makes prediction possible

⑤They augment their data with elastic deformations, so that the network learns invariance to such deformations
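
On the overlap-tile question in ④, my reading is this: because the convolutions are unpadded, the output map is smaller than the input tile (572×572 in, 388×388 out in the paper's figure), so each output tile is predicted from a larger input tile that carries the surrounding context, and where that context would fall outside the image it is extrapolated by mirroring. Below is a minimal NumPy sketch under those assumptions. The function name `padded_tiles`, its parameters, and the choice of `np.pad(..., mode="reflect")` as the mirroring rule are illustrative assumptions of mine, not the authors' code.

```python
import numpy as np

def padded_tiles(image, out_tile=388, margin=92):
    """Yield (row, col, input_tile) for a 2-D image (hypothetical helper).

    Each input tile has side out_tile + 2 * margin, so an unpadded network
    that trims `margin` pixels per side can predict an out_tile x out_tile
    block whose top-left corner sits at (row, col) in the original image.
    """
    h, w = image.shape
    # Extra padding so the image is covered by a whole number of output tiles.
    extra_h = (-h) % out_tile
    extra_w = (-w) % out_tile
    # Assumption: missing context is filled by mirroring the image content.
    padded = np.pad(
        image,
        ((margin, margin + extra_h), (margin, margin + extra_w)),
        mode="reflect",
    )
    side = out_tile + 2 * margin
    for r in range(0, h, out_tile):
        for c in range(0, w, out_tile):
            yield r, c, padded[r:r + side, c:c + side]
```

With the default values, a 512×512 image is cut into four 572×572 input tiles, and the outputs predicted from them tile the image seamlessly once the part beyond the image border is cropped away.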

1.3. Network Architecture

①The whole framework:

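②The 3×3 convolutions are unpadded, so each one trims one pixel from every border of the feature map

③The 2×2 max pooling uses stride 2

④The number of feature channels is doubled at each downsampling step

⑤Each 2×2 up-convolution halves the number of feature channels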
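
To see how these rules interact, the sketch below traces the spatial size and channel count through the network. It assumes the configuration in the paper's figure (572×572 input, 64 channels at the first level, four downsampling steps, two 3×3 convolutions per level); the function name `unet_shapes` and the tuple layout are illustrative only.

```python
def unet_shapes(size=572, base_channels=64, depth=4):
    """Trace (stage, spatial size, channels) after the two convs of each level."""
    trace = []
    channels = base_channels
    # Contracting path: two unpadded 3x3 convs trim 4 pixels in total,
    # then 2x2 max pooling with stride 2 halves the size.
    for _ in range(depth):
        size -= 4
        trace.append(("down", size, channels))
        size //= 2
        channels *= 2  # channels double at each downsampling step
    # Lowest level: two more unpadded 3x3 convs.
    size -= 4
    trace.append(("bottom", size, channels))
    # Expanding path: the 2x2 up-conv doubles the size and halves the channels;
    # concatenating the cropped skip features doubles the channels again, and
    # the two 3x3 convs bring them back down while trimming 4 pixels.
    for _ in range(depth):
        size *= 2
        channels //= 2
        size -= 4
        trace.append(("up", size, channels))
    return trace

for stage, size, channels in unet_shapes():
    print(f"{stage:>6}: {size} x {size}, {channels} channels")
```

The last entry is 388×388 with 64 channels, which is why the segmentation map is smaller than the input tile.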

1.4. Training

①Momentum: 0.99, chosen high because the batch is reduced to a single image in favor of large input tiles

②Softmax function:

$p_{k}(\mathbf{x})=\exp(a_{k}(\mathbf{x}))/\left(\sum_{k^{\prime}=1}^{K}\exp(a_{k^{\prime}}(\mathbf{x}))\right)$

where $a_{k}(\mathbf{x})$ is the activation in the $k$-th feature channel at pixel position $\mathbf{x}$, and $K$ is the number of classes

③Cross entropy function:

$E=\sum_{\mathbf{x}\in\Omega}w(\mathbf{x})\log(p_{\ell(\mathbf{x})}(\mathbf{x}))$

where $\ell:\Omega\to\left \{ 1,...,K \right \}$ gives the true label of each pixel and $w:\Omega\to\mathbb{R}$ is a weight map that makes some pixels count more during training
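
A small NumPy sketch of the pixel-wise softmax and the weighted cross entropy may make the indexing concrete. Two things here are my assumptions rather than anything stated in these notes: labels are 0-indexed (`0..K-1` instead of $1..K$), and the sum is negated so that the value can be minimized as a loss, whereas the formula above is written without a minus sign. The function names are illustrative.

```python
import numpy as np

def pixelwise_softmax(a):
    """a: activations of shape (K, H, W). Returns p of the same shape,
    where p[:, i, j] sums to 1 over the K channels."""
    # Subtracting the per-pixel maximum does not change the result
    # but avoids overflow in exp.
    e = np.exp(a - a.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def weighted_cross_entropy(p, labels, w):
    """p: (K, H, W) softmax output, labels: (H, W) ints in 0..K-1,
    w: (H, W) weight map. Returns a scalar loss."""
    rows, cols = np.indices(labels.shape)
    # Pick, at every pixel x, the probability of its true class l(x).
    p_true = p[labels, rows, cols]
    # Assumption: negated so that lower is better.
    return -np.sum(w * np.log(p_true))
```

For example, with two classes, all-zero activations and unit weights, every pixel contributes $\log 2$ to the loss.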

④Weight map:

$w(\mathbf{x})=w_c(\mathbf{x})+w_0\cdot\exp\left(-\frac{(d_1(\mathbf{x})+d_2(\mathbf{x}))^2}{2\sigma^2}\right)$

where $w_{c}$ is the weight map that balances the class frequencies, $d_{1}$ denotes the distance to the border of the nearest cell, and $d_{2}$ denotes the distance to the border of the second nearest cell

⑤Weight-map parameters used in their experiments: $w_{0}=10$ and $\sigma \approx 5$ pixels
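
The weight map can be sketched for small images with brute-force distances, as below. This is only an illustration under stated assumptions: cells are given as an instance map (0 for background, 1..n for cells), distances are measured to the nearest pixel of each cell (which matches the border distance for background pixels but is 0 inside a cell), and $w_c$ is supplied by the caller. The function name `border_weight_map` is hypothetical; a practical version would use a distance transform instead of the quadratic-cost loop.

```python
import numpy as np

def border_weight_map(instances, w_c, w0=10.0, sigma=5.0):
    """instances: (H, W) ints, 0 = background, 1..n = individual cells.
    w_c: (H, W) class-balancing weights. Returns w of shape (H, W)."""
    ids = [i for i in np.unique(instances) if i != 0]
    if len(ids) < 2:
        # Fewer than two cells: d2 is undefined, keep only the balancing term.
        return w_c.astype(float)
    height, width = instances.shape
    rows, cols = np.indices((height, width))
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    per_cell = []
    for i in ids:
        cell = np.argwhere(instances == i).astype(float)
        # Distance from every pixel to the closest pixel of cell i.
        diff = pixels[:, None, :] - cell[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        per_cell.append(d.reshape(height, width))
    # Sort along the cell axis: smallest is d1, second smallest is d2.
    ordered = np.sort(np.stack(per_cell), axis=0)
    d1, d2 = ordered[0], ordered[1]
    return w_c + w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
```

Pixels in a narrow gap between two cells have small $d_1 + d_2$ and therefore receive the largest weights, which pushes the network to learn the thin separating borders.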

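⑥Weight initialization: the initial weights are drawn from a Gaussian distribution with standard deviation $\sqrt{\frac{2}{N}}$, where $N$ is the number of incoming nodes of one neuron; for a 3×3 convolution with 64 feature channels in the previous layer, $N=9\cdot 64=576$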
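
As a quick sketch of this initialization rule in NumPy, assuming a zero-mean Gaussian and a weight tensor laid out as (out_channels, in_channels, kernel, kernel); the function name and the layout are illustrative choices, not taken from the paper:

```python
import numpy as np

def init_conv_weights(out_channels, in_channels, kernel=3, rng=None):
    """Draw conv weights from N(0, 2 / N), N = incoming nodes per neuron."""
    rng = np.random.default_rng() if rng is None else rng
    # N for a k x k convolution: k * k * in_channels (e.g. 3 * 3 * 64 = 576).
    n_in = kernel * kernel * in_channels
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(out_channels, in_channels, kernel, kernel))

weights = init_conv_weights(128, 64, rng=np.random.default_rng(0))
print(weights.shape, weights.std())
```

The empirical standard deviation printed here should be close to $\sqrt{2/576}\approx 0.059$.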

1.4.1. Data Augmentation

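①The network needs shift and rotation invariance as well as robustness to deformations and gray-value variations; random elastic deformations of the training samples are especially important when a segmentation network is trained on very few annotated images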

②They generate smooth deformations using random displacement vectors on a coarse 3×3 grid, with the displacements sampled from a Gaussian distribution with a standard deviation of 10 pixels

③Per-pixel displacements are then computed by bicubic interpolation
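
The augmentation steps above can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes SciPy is available, uses cubic spline interpolation (`order=3` in `scipy.ndimage.map_coordinates`) as a stand-in for bicubic interpolation, samples displacements with the 10-pixel standard deviation reported in the paper, and resamples the image bilinearly. The function name `elastic_deform` and its parameters are hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def elastic_deform(image, grid=3, sigma=10.0, rng=None):
    """Warp a 2-D float image with a smooth random displacement field.
    Returns (warped_image, (dy, dx))."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Random displacement vectors (dy, dx) on the coarse grid, in pixels.
    coarse = rng.normal(0.0, sigma, size=(2, grid, grid))
    # Position of every pixel expressed in coarse-grid coordinates.
    gy, gx = np.meshgrid(
        np.linspace(0, grid - 1, h), np.linspace(0, grid - 1, w), indexing="ij"
    )
    # Assumption: cubic spline interpolation approximates the bicubic step.
    dy = map_coordinates(coarse[0], [gy, gx], order=3, mode="nearest")
    dx = map_coordinates(coarse[1], [gy, gx], order=3, mode="nearest")
    rows, cols = np.indices((h, w))
    # Sample the image at the displaced positions (bilinear, mirrored borders).
    warped = map_coordinates(image, [rows + dy, cols + dx], order=1, mode="reflect")
    return warped, (dy, dx)
```

The same `(dy, dx)` field would have to be applied to the segmentation mask as well, with nearest-neighbor sampling (`order=0`) so that the labels stay integer-valued.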