[Week 20] U-Net: Convolutional Networks for Biomedical Image Segmentation

L-含光承影

Last modified 2025-01-19 21:11:39

First published 2025-01-19 21:04:37

Article link: https://blog.youkuaiyun.com/m0_59510256/article/details/145184860

Contents

Abstract
Article Information
Motivation
U-Net Architecture
Building the U-Net
Data Augmentation
Loss Function
Transposed Convolution
Innovations and Limitations
- Innovations:
- Limitations:
Summary

Abstract

U-Net (Convolutional Networks for Biomedical Image Segmentation) is a deep learning network designed for image segmentation, initially developed for medical image segmentation tasks. Its core structure consists of a symmetric encoder-decoder architecture: the encoder gradually extracts abstract features from the image and reduces resolution through convolutional and pooling operations, capturing the global semantic information of the target; the decoder gradually restores resolution through upsampling and convolutional operations, combining low-level feature maps from the encoder (via skip connections) to reconstruct detailed information of the target, thereby achieving precise segmentation. To address the issue of detail loss in deep networks, U-Net introduces skip connections, which concatenate low-level feature maps from the encoder with high-level feature maps from the decoder, preserving detail information and improving segmentation accuracy. To tackle the limited availability of annotated medical image data, U-Net employs data augmentation techniques such as elastic deformation, enhancing the model’s generalization ability with small datasets. However, U-Net also has some limitations: for example, its ability to detect small targets is limited, its adaptability to multi-scale targets is insufficient, and it heavily relies on data augmentation. To address these issues, many methods have been proposed, such as introducing attention mechanisms (e.g., CBAM, SE Block) to enhance feature representation, using multi-scale feature fusion (e.g., dilated convolution, pyramid pooling) to improve adaptability to multi-scale targets, and leveraging self-supervised learning to reduce dependence on annotated data. These improvements not only enhance the performance of U-Net but also further expand its application scope.

Article Information

Title: U-Net: Convolutional Networks for Biomedical Image Segmentation
Authors: Olaf Ronneberger, Philipp Fischer, and Thomas Brox
Source: https://arxiv.org/abs/1505.04597

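Motivation

Since AlexNet was introduced in 2012, convolutional neural networks have been widely used in computer vision. Their typical use is classification, where the output for an image is a single class label. In many vision tasks, however, and especially in biomedical image processing, the desired output must include localization: a class label should be assigned to every pixel. Training data are also scarce in biomedical tasks. The paper therefore builds a fully convolutional network for image segmentation that needs very few training images yet still produces precise segmentations.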

U-Net Architecture

U-Net is a fully convolutional network with an encoder-decoder structure that is essentially symmetric.
[Figure: U-Net architecture]
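The U-Net architecture is shown in the figure above. It can be divided into three parts:
The first part is the backbone feature extractor. It follows the classic convolutional network design, a stack of convolutions and max pooling, and yields five preliminary feature maps, one per resolution level.
The second part is the enhanced feature extractor. It progressively upsamples the five preliminary feature maps and fuses them by concatenation. The result is a feature map with the same number of channels as the first preliminary feature map.
The third part is the prediction head. It applies a convolution to the feature map from the second part to classify every pixel, which produces the segmentation map.
The individual convolution and upsampling layers are described below: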

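conv $3\times 3$, ReLU: every $3\times 3$ convolution uses $\text{stride}=1$ and $\text{padding}=0$, so each one shrinks the width and height by 2 pixels. Each convolution is followed by a ReLU, and two of them are stacked at every level of both the contracting and the expanding path. The first convolution applied to the input has 64 output channels. At each subsequent encoder level, the first convolution doubles the number of channels (64, 128, 256, 512, 1024) and the second keeps it unchanged; the doubling compensates for the information lost by downsampling.
copy and crop: the first four preliminary feature maps from the backbone are cropped so they can be concatenated with the upsampled feature maps. Cropping is needed because every unpadded convolution loses border pixels.
max pool $2\times 2$: $2\times 2$ max pooling (downsampling) with $\text{stride}=2$ and $\text{padding}=0$. The number of channels is unchanged and the width and height are halved.
up-conv $2\times 2$: upsampling, implemented with a transposed convolution, bilinear interpolation, or similar. In the paper it is a $2\times 2$ up-convolution that doubles the width and height and halves the number of channels. Interpolation alone doubles the size but leaves the channel count unchanged, so a separate convolution is needed if the channels are to be halved.
conv $1\times 1$: a convolution with a $1\times 1$ kernel, $\text{stride}=1$ and $\text{padding}=0$, used in the prediction head. Afterwards, the number of channels equals the number of classes (including the background class).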
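
Because every $3\times 3$ convolution above is unpadded, the feature-map sizes can be worked out by hand. The sketch below traces them for the 572×572 input used in the paper's architecture figure. `unet_valid_sizes` is a hypothetical helper written only for this illustration, and it assumes the four-level layout described above.

```python
def unet_valid_sizes(input_size: int, depth: int = 4) -> dict:
    """Trace the spatial size through a U-Net with unpadded (padding=0) 3x3 convs.

    Hypothetical helper: returns the output size and, for each skip
    connection, the (encoder size, size it is cropped to) pair.
    """
    size = input_size
    skips = []
    for _ in range(depth):
        size -= 4                  # two 3x3 valid convs, each removes 2 pixels
        skips.append(size)         # this map is copied to the decoder
        if size % 2:
            raise ValueError(f"odd size {size} before 2x2 max pooling")
        size //= 2                 # 2x2 max pool, stride 2
    size -= 4                      # two 3x3 valid convs at the bottom level
    crops = []
    for skip in reversed(skips):
        size *= 2                  # 2x2 up-conv doubles height and width
        crops.append((skip, size)) # encoder map cropped from `skip` to `size`
        size -= 4                  # two 3x3 valid convs after concatenation
    return {"output": size, "crops": crops}


print(unet_valid_sizes(572))
```

For a 572×572 input this gives a 388×388 output. The skip maps are cropped from 568 to 392, 280 to 200, 136 to 104, and 64 to 56, which is why the prediction covers only the central part of the input.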

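Viewed as an encoder-decoder, the encoder is the downsampling (contracting) path, which extracts features through convolution and pooling. The decoder is the upsampling (expanding) path, which restores resolution through upsampling and skip connections.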
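
The pooling, up-conv, and copy-and-crop steps at the deepest level can be sketched in PyTorch as follows. The tensor shapes follow the paper's unpadded network: a 512-channel 64×64 encoder map and a 1024-channel 28×28 bottleneck output. `center_crop` is a hypothetical helper, center cropping is an assumption, and the tensors are random placeholders rather than real features.

```python
import torch
import torch.nn as nn


def center_crop(skip: torch.Tensor, target_hw) -> torch.Tensor:
    """Crop an (N, C, H, W) encoder map around its center to target_hw = (h, w)."""
    _, _, H, W = skip.shape
    h, w = target_hw
    top = (H - h) // 2
    left = (W - w) // 2
    return skip[:, :, top:top + h, left:left + w]


enc4 = torch.randn(1, 512, 64, 64)         # deepest encoder map, before pooling
bottleneck = torch.randn(1, 1024, 28, 28)  # output of the bottom conv block

pool = nn.MaxPool2d(kernel_size=2, stride=2)                  # "max pool 2x2"
up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)   # "up-conv 2x2"

pooled = pool(enc4)                                # H, W halved; channels unchanged
upsampled = up(bottleneck)                         # H, W doubled; channels halved
cropped = center_crop(enc4, upsampled.shape[-2:])  # "copy and crop"
merged = torch.cat([cropped, upsampled], dim=1)    # skip connection by concatenation

print(pooled.shape, upsampled.shape, merged.shape)
```

Concatenation along the channel dimension is what lets the decoder combine the high-resolution detail from the encoder with the semantic features coming up from the bottleneck.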

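Building the U-Net

First, to make the network easier to build, we define a standard convolution block. Note that this network does not strictly follow the settings in the paper. Here, the 3×3 convolutions use padding = 1, so each convolution keeps the width and height unchanged. As a result, no cropping is needed when concatenating with the upsampled features (upsampling doubles the width and height, as in the paper). The final segmentation map also has the same size as the input image, rather than covering only its central region.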
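
With padding = 1, the sizes line up without cropping only if every pooling step sees an even width and height. For four pooling steps, that means the input side length must be divisible by $2^4 = 16$. The hypothetical helper below is a plain-Python illustration that checks this condition by tracing the sizes. It assumes the same four-level layout.

```python
def padded_unet_sizes(input_size: int, depth: int = 4) -> list:
    """Trace sizes in a U-Net whose 3x3 convs use padding=1 (hypothetical helper).

    Returns the encoder (skip) sizes; raises ValueError if an upsampled map
    would not match its skip connection, i.e. cropping would be required.
    """
    skips = []
    size = input_size
    for _ in range(depth):
        skips.append(size)   # padding=1: the two 3x3 convs keep H and W unchanged
        size //= 2           # 2x2 max pool with stride 2 (floors odd sizes)
    for skip in reversed(skips):
        size *= 2            # 2x2 up-conv doubles H and W
        if size != skip:
            raise ValueError(f"upsampled size {size} != skip size {skip}")
    return skips


print(padded_unet_sizes(256))
```

A 256×256 input works. A 250×250 input fails at the deepest level (30 vs 31) because 31 is odd before pooling. Implementations that accept arbitrary sizes usually pad or resize the input, or pad the upsampled map before concatenation.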

import torch.nn as nn


def _block(in_channels, features, name):
    """
    Define a standard convolution block.

    Args:
        in_channels (int): number of input channels.
        features (int): number of output channels.
        name (str): name of the block (for debugging only; not used below).

    Returns:
        nn.Sequential: two convolution layers, each followed by batch normalization and ReLU.
    """
    return nn.Sequential(
        nn.Conv2d(
            in_channels=in_channels,
            out_channels=features,
            kernel_size=3,
            padding=1,  # unlike the paper, padding=1 keeps width and height unchanged
            bias=False,  # bias is redundant because BatchNorm follows
        ),
        nn.BatchNorm2d(num_features=features),  # batch normalization
        nn.ReLU(inplace=True),  # ReLU activation
        nn.Conv2d(
            in_channels=features,
            out_channels=features,
            kernel_size=3,
            padding=1,
            bias=False,
        ),
        nn.BatchNorm2d(num_features=features),  # batch normalization
        nn.ReLU(inplace=True),  # ReLU activation
    )

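Parameters of the standard convolution block:
in_channels: number of input channels of the block, int;
features: number of output channels of the block, int;
name: name of the block, useful for debugging, str.
The function returns an nn.Sequential module that applies convolution, batch normalization, and ReLU twice in a row.
Next, we build the encoder.
First, the parameters of the network:
in_channels: number of channels of the input image (e.g., 1 for grayscale, 3 for RGB), int.
out_channels: number of output channels (e.g., 1 for binary segmentation, the number of classes for multi-class segmentation), int.
init_features: number of feature channels in the first layer (determines the width of the network), int.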
        features = init_features  # number of channels in the first encoder block

        # Encoder (downsampling path)
        # First encoder block
        self
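
One way to assemble the whole network under the same conventions is sketched below. It uses padding=1 convolutions, 2×2 max pooling, 2×2 transposed convolutions for upsampling, and a 1×1 output convolution. It is an independent illustration, not the author's code. The names `conv_block` and `UNetSketch` are hypothetical, and the layer layout is an assumption based on the architecture section.

```python
import torch
import torch.nn as nn


def conv_block(in_channels: int, features: int) -> nn.Sequential:
    # Two 3x3 convs with padding=1 (size-preserving), each followed by BN + ReLU.
    return nn.Sequential(
        nn.Conv2d(in_channels, features, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(features),
        nn.ReLU(inplace=True),
        nn.Conv2d(features, features, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(features),
        nn.ReLU(inplace=True),
    )


class UNetSketch(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, init_features=64):
        super().__init__()
        f = init_features
        # Encoder: f, 2f, 4f, 8f channels; bottleneck: 16f channels.
        self.enc1 = conv_block(in_channels, f)
        self.enc2 = conv_block(f, 2 * f)
        self.enc3 = conv_block(2 * f, 4 * f)
        self.enc4 = conv_block(4 * f, 8 * f)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bottleneck = conv_block(8 * f, 16 * f)
        # Decoder: each transposed conv doubles H, W and halves the channels;
        # the following block sees the upsampled map concatenated with the skip.
        self.up4 = nn.ConvTranspose2d(16 * f, 8 * f, kernel_size=2, stride=2)
        self.dec4 = conv_block(16 * f, 8 * f)
        self.up3 = nn.ConvTranspose2d(8 * f, 4 * f, kernel_size=2, stride=2)
        self.dec3 = conv_block(8 * f, 4 * f)
        self.up2 = nn.ConvTranspose2d(4 * f, 2 * f, kernel_size=2, stride=2)
        self.dec2 = conv_block(4 * f, 2 * f)
        self.up1 = nn.ConvTranspose2d(2 * f, f, kernel_size=2, stride=2)
        self.dec1 = conv_block(2 * f, f)
        # 1x1 conv maps f channels to out_channels (one per class).
        self.head = nn.Conv2d(f, out_channels, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))
        d4 = self.dec4(torch.cat([self.up4(b), e4], dim=1))
        d3 = self.dec3(torch.cat([self.up3(d4), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # raw logits, shape (N, out_channels, H, W)


model = UNetSketch(in_channels=3, out_channels=2, init_features=8).eval()
with torch.no_grad():
    y = model(torch.randn(1, 3, 64, 64))
print(y.shape)
```

The forward pass returns raw logits. For a single-channel binary mask you would apply `torch.sigmoid`; for several classes, take `argmax` over the channel dimension. The input side length must be divisible by 16 for the skip shapes to match.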
