【Python/Pytorch - 网络模型】-- 手把手搭建Swin-Transformer模型 - MLP模型

最新推荐文章于 2025-02-26 21:25:49 发布

电科_银尘

最新推荐文章于 2025-02-26 21:25:49 发布

阅读量874

点赞数 25

分类专栏：人工智能网络模型文章标签： python pytorch transformer

本文链接：https://blog.youkuaiyun.com/wangshuqian1314/article/details/145888498

版权

人工智能网络模型专栏收录该内容

11 篇文章

订阅专栏

在这里插入图片描述
文章目录

文章目录

写在前面
MLP模块
论文下载

写在前面

根据博主文章，学习Transformer、Vit、SwinTransformer、SwinUnetr原理，原理博主文章已讲解较为详细，本文主要从代码角度学习各个模块的原理。

图解Vit 1：Vision Transformer——图像与Transformer基础 - 知乎
 图解Vit 2：Vision Transformer——视觉问题中的注意力机制 - 知乎
 图解Vit 3：Vision Transformer——ViT模型全流程拆解（Layer Normalization, Position Embedding） - 知乎
 图解+代码 Swin Transformer 1: W-MSA和Patch Merging - 知乎
 图解+代码 Swin Transformer 2: SW-MSA(Shifted Window Multi-head Self Attention) - 知乎

MLP模块

# Copyright (c) MONAI Consortium
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import annotations

import torch.nn as nn

from monai.networks.layers import get_act_layer
from monai.utils import look_up_option

SUPPORTED_DROPOUT_MODE = {"vit", "swin"}


class MLPBlock(nn.Module):
    """
    A multi-layer perceptron block, based on: "Dosovitskiy et al.,
    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>"
    """

    def __init__(
        self, hidden_size: int, mlp_dim: int, dropout_rate: float = 0.0, act: tuple | str = "GELU", dropout_mode="vit"
    ) -> None:
        """
        Args:
            hidden_size: dimension of hidden layer.
            mlp_dim: dimension of feedforward layer. If 0, `hidden_size` will be used.
            dropout_rate: fraction of the input units to drop.
            act: activation type and arguments. Defaults to GELU. Also supports "GEGLU" and others.
            dropout_mode: dropout mode, can be "vit" or "swin".
                "vit" mode uses two dropout instances as implemented in
                https://github.com/google-research/vision_transformer/blob/main/vit_jax/models.py#L87
                "swin" corresponds to one instance as implemented in
                https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_mlp.py#L23


        """

        super().__init__()

        if not (0 <= dropout_rate <= 1):
            raise ValueError("dropout_rate should be between 0 and 1.")
        mlp_dim = mlp_dim or hidden_size
        self.linear1 = nn.Linear(hidden_size, mlp_dim) if act != "GEGLU" else nn.Linear(hidden_size, mlp_dim * 2)
        self.linear2 = nn.Linear(mlp_dim, hidden_size)
        self.fn = get_act_layer(act)
        self.drop1 = nn.Dropout(dropout_rate)
        dropout_opt = look_up_option(dropout_mode, SUPPORTED_DROPOUT_MODE)
        if dropout_opt == "vit":
            self.drop2 = nn.Dropout(dropout_rate)
        elif dropout_opt == "swin":
            self.drop2 = self.drop1
        else:
            raise ValueError(f"dropout_mode should be one of {SUPPORTED_DROPOUT_MODE}")

    def forward(self, x):
        x = self.fn(self.linear1(x))
        x = self.drop1(x)
        x = self.linear2(x)
        x = self.drop2(x)
        return x

第一点

class  MLPBlock(nn.Module):
def __init__( ):
super().__init__()

def forward(self, x):
return x

定义了一个MLPBlock类，该类继承父类nn.Module。def init():该函数是该类的构造函数，用于初始化实例。super().init():调用了父类nn.Module的构造函数，确保父类的初始化逻辑被执行。
def forward(self, x) 是该类的核心函数，定义了输入数据如何通过该层进行前向传播。

第二点

self.linear1 = nn.Linear(hidden_size, mlp_dim) if act != "GEGLU" else nn.Linear(hidden_size, mlp_dim * 2)

nn.Linear() 是一个线性层（全连接层），用于将输入张量从一个维度映射到另一个维度。
它有两个参数 in_features：输入张量的特征维度（hidden_size）。out_features：输出张量的特征维度（mlp_dim 或 mlp_dim * 2）。

第三点

self.drop1 = nn.Dropout(dropout_rate)

Dropout 是一种常用的正则化技术，用于防止神经网络的过拟合。其工作原理是：在训练过程中，随机将输入张量中的某些元素设置为零（即“丢弃”这些元素），同时按照比例缩放剩余元素的值。Dropout 能够减少神经元之间的共适应现象，从而提高模型的泛化能力。
nn.Dropout(dropout_rate)是 PyTorch 中的一个类，用于实现 Dropout 层。dropout_rate 是一个浮点数，表示丢弃概率。例如，如果 dropout_rate 设为 0.5，意味着在训练过程中，每个元素有 50% 的概率被丢弃。

更详细的理解：
在训练过程中，Dropout 会随机将输入张量中的某些元素设置为零（即“丢弃”这些元素），同时按照比例缩放剩余元素的值。这有助于减少过拟合，提高模型的泛化能力。以下是对这一过程的更详细解释：

工作原理

丢弃元素：
在训练时，Dropout 层会随机选择输入张量中的一部分元素，并将它们设置为零。这个过程模拟了神经元的随机失活。
丢弃的概率由 dropout_rate 参数控制。例如，如果 dropout_rate 设为 0.5，则每个元素有 50% 的概率被丢弃。
缩放剩余元素：
被保留下来的元素的值会被放大（缩放），以保持张量的整体期望值不变。缩放因子为 1/(1 - dropout_rate)。
例如，如果 dropout_rate 为 0.5，每个被保留的元素会乘以 2，以补偿丢弃元素所导致的和的减少。

数学表示假设输入张量为
x，形状为 [batch_size, num_features]，dropout_rate 为 p。Dropout 的输出 y 可以表示为：
Python复制
mask = (torch.rand_like(x) > p).float() # 随机生成掩码
y = x * mask / (1 - p) # 丢弃并缩放
训练与推理阶段的行为差异

训练阶段：
使用 Dropout，随机丢弃元素，以防止过拟合。
梯度仅通过未被丢弃的元素传播。
推理阶段：
Dropout 被关闭，所有元素都被保留，不进行缩放。
这样可以确保输出是稳定的，并且不会丢失信息。

示例代码以下代码展示了如何在 PyTorch 中使用 Dropout 层：

import torch
import torch.nn as nn

# 创建一个 Dropout 层，丢弃概率为 0.5
dropout = nn.Dropout(p=0.5)

# 输入张量
input_tensor = torch.randn(2, 4)  # 形状为 (2, 4)

# 在训练模式下应用 Dropout
output_tensor = dropout(input_tensor)

print("Input Tensor:")
print(input_tensor)
print("\nOutput Tensor:")
print(output_tensor)
输出示例
Input Tensor:
tensor([[-0.2419,  0.6929,  1.4928, -0.6680],
        [-0.4224, -0.8534, -0.2863,  0.9220]])

Output Tensor:
tensor([[ 0.0000,  1.3858,  2.9856, -0.0000],
        [-0.8448, -0.0000, -0.0000,  1.8440]])
在输出张量中，一些元素被设置为零，而被保留的元素被放大了两倍（因为 p=0.5）。

作用

减少过拟合：通过随机丢弃神经元，模型无法完全依赖某些特定的神经元，从而提高泛化能力。
增强鲁棒性：模型能够更好地处理输入数据中的噪声和不确定性。

第四点

激活层：激活层是神经网络中不可或缺的组成部分，其主要作用是为神经网络引入非线性。在没有激活层的情况下，神经网络仅由线性变换组成，无论网络有多深，其整体函数仍是一个线性函数。激活层的存在使得神经网络能够学习和表示复杂的非线性模式，从而能够解决复杂的分类和回归问题。
1.ReLU (Rectified Linear Unit)