The Vision Transformer (ViT) Backbone

O_o381

Last modified: 2024-11-29 19:29:46

Tags: python, transformer, pytorch, deep learning

First published: 2024-11-29 19:14:39

Article link: https://blog.youkuaiyun.com/qq_61706514/article/details/144142288

Diagram:

Code:

class VisionTransformer(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_c=3, num_classes=1000,
                 embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, qkv_bias=True,
                 qk_scale=None, representation_size=None, distilled=False, drop_ratio=0.,
                 attn_drop_ratio=0., drop_path_ratio=0., embed_layer=PatchEmbed, norm_layer=None,
                 act_layer=None):
        """
        Args:
            img_size (int, tuple): input image size
                # Size of the input image, typically 224 or another standard size.
            patch_size (int, tuple): patch size
                # Size of each patch, e.g. 16x16.
            in_c (int): number of input channels
                # Number of channels in the input image; 3 for RGB images.
            num_classes (int): number of classes for classification head
                # Number of output classes; defaults to 1000.
            embed_dim (int): embedding dimension
                # Embedding dimension, i.e. the length of the vector each patch is mapped to; defaults to 768.
            depth (int): depth of transformer
                # Depth of the Transformer, i.e. the number of stacked Blocks.
            num_heads (int): number of attention heads
                # Number of attention heads; defaults to 12.
            mlp_ratio (float): ratio of mlp hidden dim to embedding dim
                # Ratio of the MLP hidden dimension to the embedding dimension.
            qkv_bias (bool): enable bias for qkv if True
                # Whether to add a bias to the QKV (query, key, value) projection.
            qk_scale (float): override default qk scale of head_dim ** -0.5 if set
                # If set, overrides the default qk scaling factor.
            representation_size (Optional[int]): enable and set representation layer (pre-logits) to this value if set
                # If set, a representation (pre-logits) layer of this size is added.
            distilled (bool): model includes a distillation token and head as in DeiT models
                # This parameter can be ignored for plain ViT.
            drop_ratio (float): dropout rate
                # Dropout rate.
            attn_drop_ratio (float): attention dropout rate
                # Dropout rate inside the attention layers.
            drop_path_ratio (float): stochastic depth rate
                # DropPath (stochastic depth) rate.
            embed_layer (nn.Module): patch embedding layer
                # Layer used to embed the image; defaults to PatchEmbed.
            norm_layer: (nn.Module): normalizati
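
To make the arguments above more concrete, here is a small sketch of the sizes they imply with the default values. It is not part of the original class: `ViTShapes` and `vit_shapes` are hypothetical helper names made up for illustration, and the sketch assumes a square image that divides evenly into patches and a single class token prepended to the patch sequence (plus one distillation token when `distilled=True`).

```python
from dataclasses import dataclass


@dataclass
class ViTShapes:
    """Sizes derived from a ViT configuration (illustrative helper only)."""
    num_patches: int      # number of patches cut from the image
    seq_len: int          # patches plus extra tokens (class / distillation)
    head_dim: int         # per-head dimension inside attention
    qk_scale: float       # scaling applied to q @ k^T
    mlp_hidden_dim: int   # hidden width of the MLP in each Block


def vit_shapes(img_size=224, patch_size=16, embed_dim=768, num_heads=12,
               mlp_ratio=4.0, qk_scale=None, distilled=False):
    # Assumption: square images and square patches, as in the defaults.
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")

    grid = img_size // patch_size            # patches per side
    num_patches = grid * grid

    # Assumption: one class token, plus one distillation token if distilled.
    num_tokens = 2 if distilled else 1

    head_dim = embed_dim // num_heads
    # Default scale is head_dim ** -0.5 unless qk_scale overrides it.
    scale = qk_scale if qk_scale is not None else head_dim ** -0.5

    return ViTShapes(
        num_patches=num_patches,
        seq_len=num_patches + num_tokens,
        head_dim=head_dim,
        qk_scale=scale,
        mlp_hidden_dim=int(embed_dim * mlp_ratio),
    )


shapes = vit_shapes()
print(shapes)
```

With the defaults, a 224x224 image split into 16x16 patches gives a 14x14 grid, so each of the 12 heads works on a 64-dimensional slice of the 768-dimensional embedding.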
