[Paper Reading] Augmented Transformer network for MRI brain tumor segmentation

Zhang M, Liu D, Sun Q, et al. Augmented transformer network for MRI brain tumor segmentation[J]. Journal of King Saud University-Computer and Information Sciences, 2024: 101917. [Open source]
IF 6.9 · SCIE · JCI 1.58 · Q1 · CAS Computer Science, Tier 2

[Core Idea]

This paper proposes a new MRI brain tumor segmentation method, the Augmented Transformer Network (AugTransU-Net), which aims to address the limitations of existing transformer-based U-Net models in capturing long-range dependencies and global context. The innovation lies in an improved augmented transformer module that incorporates Augmented Shortcuts into the standard transformer block; these modules are strategically placed at the bottleneck of the segmentation network to preserve feature diversity and strengthen feature interaction.

[Method]


  1. Architecture design: AugTransU-Net adopts a hierarchical 3D U-Net as the backbone, introduces paired attention modules into the encoder and decoder layers, and places improved transformer layers with Augmented Shortcuts at the bottleneck.

  2. Feature enhancement: Augmented Shortcuts add extra parallel branches to the multi-head self-attention block to preserve feature diversity and enrich feature representations. A conventional shortcut merely copies the input features to the output, which limits its ability to increase feature diversity; transformer models equipped with augmented shortcuts have been shown to avoid feature collapse and produce more diverse features. The augmented shortcut is formulated as (a runnable sketch follows the equation):

    $$
    \mathrm{Aug\text{-}S}(Z_l) = \sum_{i=1}^{T} T_{li}(Z_l;\, \theta_{li}), \qquad l \in [1, 2, \ldots, L]
    $$

    where $Z_l$ denotes the input features of layer $l$, $T_{li}(\cdot)$ is the $i$-th shortcut transformation with learnable parameters $\theta_{li}$, $T$ is the number of augmented branches, and $L$ is the number of transformer layers.
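To make the formula concrete, below is a minimal PyTorch sketch (an assumption-laden illustration, not the authors' released code) of a self-attention block with augmented shortcuts: each branch $T_{li}$ is modeled as a lightweight linear projection with its own parameters $\theta_{li}$, and the branch sum is added to the attention output alongside the ordinary identity shortcut. The class name `AugmentedShortcutMHSA`, the choice of linear branches, and `num_branches` are all illustrative; the paper's actual branch design may differ.

```python
import torch
import torch.nn as nn


class AugmentedShortcutMHSA(nn.Module):
    """Self-attention block with augmented shortcuts (illustrative sketch).

    Computes z + MHSA(z) + Aug-S(z), where
    Aug-S(z) = sum_i T_li(z; theta_li) and each branch T_li is a
    lightweight learnable projection (an assumption for this sketch).
    """

    def __init__(self, dim, num_heads=8, num_branches=2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # T parallel branches T_li, each with its own parameters theta_li.
        self.branches = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_branches)]
        )

    def forward(self, z):
        # z: (batch, tokens, dim) -- the layer input Z_l.
        h = self.norm(z)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        aug_s = sum(branch(z) for branch in self.branches)  # Aug-S term
        # Identity shortcut + attention output + augmented shortcuts.
        return z + attn_out + aug_s
```

A quick check: `AugmentedShortcutMHSA(dim=64)(torch.randn(2, 16, 64))` returns a `(2, 16, 64)` tensor. Summing several learnable branches, rather than copying the input alone, is what lets the shortcut contribute new, diverse feature directions.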

### Multi-Level Attention Network for Retinal Vessel Segmentation

Retinal vessel segmentation plays a crucial role in diagnosing various eye-related diseases, such as diabetic retinopathy and glaucoma. A multi-level attention network is an advanced deep learning approach that enhances the accuracy and precision of vessel segmentation by leveraging attention mechanisms at multiple levels of the neural network architecture.

#### Architecture Overview

The architecture typically follows an encoder-decoder structure with attention modules strategically placed to emphasize relevant features during decoding. This design allows the model to focus on salient regions while suppressing irrelevant background information, which is particularly important in retinal images where vessels are thin and may blend into the background.

1. **Encoder Path**: The encoder consists of convolutional layers followed by pooling operations to extract hierarchical features from the input image. Commonly used architectures like U-Net or ResNet can serve as the backbone for feature extraction.
2. **Attention Modules**: Multiple attention blocks are integrated at different levels (shallow, intermediate, and deep) within the decoder path. These blocks compute attention maps that dynamically weigh the importance of each feature-map channel and spatial location. For instance:
   - **Channel Attention**: Computes the significance of each channel across the entire feature map.
   - **Spatial Attention**: Focuses on critical spatial locations within each channel.
3. **Decoder Path**: As the feature maps pass through upsampling layers, attention-weighted features from the encoder are combined to refine the segmentation output progressively. Skip connections help preserve spatial details lost during encoding.
4. **Final Segmentation Layer**: A 1x1 convolution layer is applied at the end to produce the final binary segmentation mask indicating vessel and non-vessel pixels.

#### Implementation Details

Implementing a multi-level attention network involves defining both the encoder and the attention-augmented decoder.
Below is a simplified example using PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.conv2 = nn.Conv2d(in_channels // 8, 1, kernel_size=1)

    def forward(self, x):
        # Squeeze channels, then produce a single-channel sigmoid map that
        # rescales every spatial location of the input features.
        att = F.relu(self.conv1(x))
        att = torch.sigmoid(self.conv2(att))
        return x * att


class MultiLevelAttentionUNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        # Encoder: two convolutional stages separated by 2x downsampling.
        self.enc1 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.enc2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Attention at two levels: on the skip feature and the deep feature.
        self.att_skip = AttentionBlock(64)
        self.att_deep = AttentionBlock(128)
        # Decoder: upsample, fuse the attention-weighted skip, predict mask.
        self.up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv2d(64 + 64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, x):
        skip = self.enc1(x)                # (B, 64, H, W)
        deep = self.enc2(self.pool(skip))  # (B, 128, H/2, W/2)
        deep = self.att_deep(deep)
        up = self.up(deep)                 # (B, 64, H, W)
        fused = torch.cat((up, self.att_skip(skip)), dim=1)
        return self.dec(fused)             # (B, out_channels, H, W)
```

This implementation provides a foundational framework that can be enhanced with more complex attention mechanisms or pre-trained backbones for better performance. Training should involve loss functions tailored to segmentation tasks, such as Dice loss or binary cross-entropy loss.
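Since the note above mentions Dice loss, here is a short usage sketch pairing the model with a common soft-Dice formulation; the function name `dice_loss` and the smoothing constant `eps` are illustrative choices, not from a specific library, and the random tensors stand in for fundus patches and vessel masks.

```python
import torch


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary masks; pred and target lie in [0, 1]."""
    inter = (pred * target).sum(dim=(2, 3))
    union = pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()


# Assumes MultiLevelAttentionUNet from the block above is in scope.
model = MultiLevelAttentionUNet(in_channels=3, out_channels=1)
x = torch.randn(2, 3, 64, 64)                 # dummy RGB fundus patches
y = (torch.rand(2, 1, 64, 64) > 0.5).float()  # dummy binary vessel masks

pred = model(x)               # (2, 1, 64, 64), probabilities in (0, 1)
loss = dice_loss(pred, y)
loss.backward()
print(pred.shape, loss.item())
```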