Reproducing EfficientNet

Overall structure of EfficientNet-B0, where Conv = (Conv + BN + Swish). If a module is repeated 2 or more times, stride=2 applies only to the first repetition; every subsequent repetition of that module uses stride=1 (a short sketch of this rule follows the table).

+-------+--------------------------+------------+--------------+---------+--------+
| Stage |          Module          | input_size | out_channels | repeats | stride |
+-------+--------------------------+------------+--------------+---------+--------+
|   1   |        Conv(3x3)         |  224x224   |      32      |    1    |   2    |
|   2   |       MBConv1,k3x3       |  112x112   |      16      |    1    |   1    |
|   3   |       MBConv6,k3x3       |  112x112   |      24      |    2    |   2    |
|   4   |       MBConv6,k5x5       |   56x56    |      40      |    2    |   2    |
|   5   |       MBConv6,k3x3       |   28x28    |      80      |    3    |   2    |
|   6   |       MBConv6,k5x5       |   14x14    |     112      |    3    |   1    |
|   7   |       MBConv6,k5x5       |   14x14    |     192      |    4    |   2    |
|   8   |       MBConv6,k3x3       |    7x7     |     320      |    1    |   1    |
|   9   | Conv(1x1) & Pooling & FC |    7x7     |     1280     |    1    |  None  |
+-------+--------------------------+------------+--------------+---------+--------+
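
The stride rule above can be written compactly; this is a minimal illustrative sketch (the helper name per_repeat_strides is not part of the original code):

# Sketch: stride=2 is used only on the first repetition of a stage,
# every later repetition of the same module uses stride=1.
def per_repeat_strides(repeats, stride):
    return [stride if i == 0 else 1 for i in range(repeats)]

print(per_repeat_strides(repeats=2, stride=2))  # Stage 3: [2, 1]
print(per_repeat_strides(repeats=4, stride=2))  # Stage 7: [2, 1, 1, 1]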

MBConv Module

MBConv6: the first 1x1 expansion convolution has 6 times as many filters as the input tensor's channel count.

MBConv1: the first 1x1 expansion convolution is omitted, i.e. the MBConv block in Stage 2 has no 1x1 expansion layer.

Shortcut: present if and only if the input and output tensors of the MBConv block have the same shape (a small worked example follows the diagram below).

MBConv (an inverted residual block, as in MobileNetV3):
image ---> Conv(1x1,cin->cin*n) --> Depthwise --> SE --> ConvBN(1x1,cin*n->cout)-->dropout--> + --->output
        |       expand (1x1)                               reduce (1x1)                       |
        |                                                                                     |
        ---------------------------------------------------------------------------------------
BN + Swish follows both the expansion conv and the depthwise conv; the reduction conv is followed by BN only (no activation).
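
A small worked example of the shortcut condition, using channel counts from the table (a sketch; has_shortcut is just an illustrative helper, the real check is the use_residual flag in the code below):

# Shortcut exists iff input/output shapes match: same channels and stride 1.
def has_shortcut(in_channels, out_channels, stride):
    return in_channels == out_channels and stride == 1

print(has_shortcut(32, 16, 1))  # Stage 2, MBConv1: channels change -> no shortcut
print(has_shortcut(16, 24, 2))  # Stage 3, first repeat: stride 2 -> no shortcut
print(has_shortcut(24, 24, 1))  # Stage 3, second repeat: shortcut is used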

Configuration parameters

import torch 
import torch.nn as nn

from math import ceil

base_model=[
    # expand_ratio, channels, repeats, stride, kernel_size
    [1, 16, 1, 1, 3],
    [6, 24, 2, 2, 3],
    [6, 40, 2, 2, 5],
    [6, 80, 3, 2, 3],
    [6, 112,3, 1, 5],
    [6, 192,4, 2, 5],
    [6, 320,1, 1, 3],
]

phi_values = {
    # version: (phi_value, resolution, drop_rate)
    # compound scaling: depth = alpha ** phi, width = beta ** phi (alpha, beta from the paper)
    "b0": (0, 224, 0.2),
    "b1": (0.5, 240, 0.2), 
    "b2": (1, 260, 0.3),
    "b3": (2, 300, 0.3),
    "b4": (3, 380, 0.4),
    "b5": (4, 456, 0.4),
    "b6": (5, 528, 0.5),
    "b7": (6, 600, 0.5),
}
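
The comment above refers to the compound-scaling rule of the EfficientNet paper: depth scales as alpha ** phi and width as beta ** phi, with alpha = 1.2 and beta = 1.1 (the same constants used in calculate_factors below). A small sketch of that arithmetic:

# Compound-scaling sketch: depth/width factors derived from phi (paper constants).
alpha, beta = 1.2, 1.1
for version in ("b0", "b1", "b2"):
    phi, res, drop_rate = phi_values[version]
    print(version, round(alpha ** phi, 3), round(beta ** phi, 3), res)
# b0 1.0 1.0 224
# b1 1.095 1.049 240
# b2 1.2 1.1 260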

Conv + BN + Swish

# CBS :Conv+Bn+Silu
class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, groups=1):
        super(CNNBlock,self).__init__()
        self.cnn = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride, 
            padding,
            groups = groups,
            bias=False
        )
        # groups=1: normal conv, groups=in_channels: depthwise conv
        self.bn = nn.BatchNorm2d(out_channels)
        self.silu = nn.SiLU() # Silu <-> Swish
    def forward(self,x):
        x = self.cnn(x)
        x = self.bn(x)
        x = self.silu(x)
        return x
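
A quick shape check of CNNBlock (a sketch with arbitrary example sizes): groups=1 gives a normal convolution, while groups equal to the input channel count gives a depthwise convolution.

# Sanity check: normal vs. depthwise CNNBlock (example shapes only).
x = torch.randn(2, 32, 112, 112)
normal = CNNBlock(32, 64, kernel_size=3, stride=1, padding=1)
depthwise = CNNBlock(32, 32, kernel_size=3, stride=2, padding=1, groups=32)
print(normal(x).shape)     # torch.Size([2, 64, 112, 112])
print(depthwise(x).shape)  # torch.Size([2, 32, 56, 56])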

SE attention mechanism

# SE attention
class SqueezeExcitation(nn.Module):
    def __init__(self,in_channels ,reduced_dim):
        super(SqueezeExcitation,self).__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # [B, C, H, W] -> [B, C, 1, 1]
            nn.Conv2d(in_channels, reduced_dim, 1),  # [B, reduced_dim, 1, 1]
            nn.SiLU(),
            nn.Conv2d(reduced_dim, in_channels, 1),  # [B, C, 1, 1]
            nn.Sigmoid()
        )
    def forward(self,x):
        return x * self.se(x)       # [B,C,H,W]  * [B,C,1,1]
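
A quick check of the SE block (a sketch with arbitrary shapes): the attention weights have shape [B, C, 1, 1] and are broadcast over H and W, so the output shape equals the input shape.

# Sanity check: SE attention preserves the input shape.
se = SqueezeExcitation(in_channels=64, reduced_dim=16)  # reduced_dim = 64 / 4
x = torch.randn(2, 64, 28, 28)
print(se(x).shape)  # torch.Size([2, 64, 28, 28])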

MBConv

class InvertedResidualBlock(nn.Module):
    def __init__(self,
                 in_channels, out_channels, kernel_size,stride, padding, 
                 expand_ratio, 
                 reduction = 4, # squeeze excitation 
                 survival_prob = 0.8 # for stochastic depth
                 ):
        super(InvertedResidualBlock, self).__init__()
        self.survival_prob = survival_prob
        self.use_residual = in_channels == out_channels and stride == 1
        hidden_dim = in_channels * expand_ratio
        self.expand = in_channels != hidden_dim
        reduced_dim = int(in_channels / reduction)

        # 1x1 expansion conv; skipped when expand_ratio == 1 (MBConv1)
        if self.expand:
            self.expand_conv = CNNBlock(
                in_channels, hidden_dim, kernel_size=1, stride=1, padding=0,
            )
        self.conv = nn.Sequential(
            CNNBlock(
                hidden_dim,hidden_dim,kernel_size,stride, padding,groups=hidden_dim
            ),
            SqueezeExcitation(hidden_dim, reduced_dim),
            nn.Conv2d(hidden_dim, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels)
        )
    
    # stochastic depth: randomly drop the whole block during training;
    # it is only applied to blocks that have a shortcut connection
    def stochastic_depth(self,x):
        if not self.training:
            return x
        binary_tensor = torch.rand(x.shape[0],1,1,1, device=x.device) < self.survival_prob
        return torch.div(x, self.survival_prob) * binary_tensor
    
    def forward(self, inputs):
        x = self.expand_conv(inputs) if self.expand else inputs
        if self.use_residual:
            x = self.conv(x)
            x = self.stochastic_depth(x)
            x = x + inputs
            return x
        else:
            return self.conv(x)
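
A quick check of the MBConv block (a sketch using example Stage 3 parameters): when in/out channels differ or stride is 2 the shortcut is skipped, otherwise the residual path with stochastic depth is used.

# Sanity checks: InvertedResidualBlock with and without the shortcut.
x = torch.randn(2, 16, 112, 112)
down = InvertedResidualBlock(16, 24, kernel_size=3, stride=2, padding=1, expand_ratio=6)
same = InvertedResidualBlock(24, 24, kernel_size=3, stride=1, padding=1, expand_ratio=6)
print(down.use_residual, down(x).shape)        # False torch.Size([2, 24, 56, 56])
print(same.use_residual, same(down(x)).shape)  # True  torch.Size([2, 24, 56, 56])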

EfficientNet 

class EfficientNet(nn.Module):

    def __init__(self, version, num_classes):
        super(EfficientNet, self).__init__()
        width_factor, depth_factor, dropout_rate = self.calculate_factors(version)
        last_channels = ceil(1280 * width_factor)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.features = self.create_features(width_factor,depth_factor,last_channels)
        self.classifier = nn.Sequential(
            nn.Dropout(dropout_rate),
            nn.Linear(last_channels, num_classes),
        )

    def calculate_factors(self, version, alpha=1.2, beta=1.1):  # alpha, beta from the EfficientNet paper
        phi, res, drop_rate = phi_values[version]
        depth_factor = alpha ** phi
        width_factor = beta ** phi
        return width_factor, depth_factor, drop_rate
    
    def create_features(self, width_factor,depth_factor,last_channels):
        channels = int(32 * width_factor)
        features = [CNNBlock(3, channels, 3, stride=2, padding=1)]

        in_channels = channels

        for expand_ratio, channels, repeats, stride, kernel_size in base_model:
            out_channels = 4 * ceil( int(channels*width_factor) / 4)
            layer_repeats = ceil(repeats * depth_factor)

            for layer in range(layer_repeats):
                features.append(
                    InvertedResidualBlock(
                        in_channels,
                        out_channels,
                        expand_ratio = expand_ratio, 
                        stride = stride if layer == 0 else 1,
                        kernel_size = kernel_size,
                        padding = kernel_size // 2,
                    )
                )
                in_channels = out_channels
        features.append(
            CNNBlock(in_channels,last_channels,kernel_size=1, stride=1, padding=0)
        )
        return nn.Sequential( *features )
    
    def forward(self,x):
        x = self.features(x)
        x = self.pool(x)
        x = self.classifier( x.view( x.shape[0], -1 ) )
        return x

Test

def test():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    version = "b0"
    phi, res, drop_rate = phi_values[version]
    num_examples, num_classes = 4, 10
    x = torch.randn( (num_examples, 3, res, res) ).to(device)

    model = EfficientNet(version=version, num_classes=num_classes).to(device)
    print(model(x).shape)
test()
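
As an optional extra check (a sketch; the exact counts depend on this implementation and may differ slightly from the official models), the number of parameters per variant can be inspected:

# Optional: parameter count per variant (implementation-dependent numbers).
def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

for version in ("b0", "b1"):
    model = EfficientNet(version=version, num_classes=1000)
    print(version, count_parameters(model))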
