从零实现GoogLeNet/InceptionNet架构详解-优快云博客

本文链接：https://blog.youkuaiyun.com/gitblog_00400/article/details/148441331

从零实现GoogLeNet/InceptionNet架构详解

Machine-Learning-Collection A resource for learning about Machine learning & Deep Learning 项目地址: https://gitcode.com/gh_mirrors/ma/Machine-Learning-Collection

引言

GoogLeNet（也称为InceptionNet）是Google团队在2014年提出的深度卷积神经网络架构，在ImageNet竞赛中取得了优异成绩。本文将深入解析如何从零开始实现这一经典网络架构，帮助读者理解其核心设计理念和实现细节。

GoogLeNet架构概述

GoogLeNet的主要创新点在于引入了Inception模块，这种模块能够并行处理不同尺度的特征，同时控制计算复杂度。整个网络由多个Inception模块堆叠而成，深度达到22层（包括池化层）。

核心组件实现

基础卷积块

class conv_block(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(conv_block, self).__init__()
        self.relu = nn.ReLU()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.relu(self.batchnorm(self.conv(x)))

这个基础构建块包含卷积层、批归一化层和ReLU激活函数，是网络中最基本的单元。批归一化的加入有助于缓解梯度消失问题，使网络能够训练得更深。

Inception模块

Inception模块是GoogLeNet的核心创新，它通过四种并行的卷积操作来提取不同尺度的特征：

class Inception_block(nn.Module):
    def __init__(
        self, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool
    ):
        super(Inception_block, self).__init__()
        # 1x1卷积分支
        self.branch1 = conv_block(in_channels, out_1x1, kernel_size=1)
        
        # 1x1卷积降维后接3x3卷积
        self.branch2 = nn.Sequential(
            conv_block(in_channels, red_3x3, kernel_size=1),
            conv_block(red_3x3, out_3x3, kernel_size=(3, 3), padding=1),
        )
        
        # 1x1卷积降维后接5x5卷积
        self.branch3 = nn.Sequential(
            conv_block(in_channels, red_5x5, kernel_size=1),
            conv_block(red_5x5, out_5x5, kernel_size=5, padding=2),
        )
        
        # 3x3最大池化后接1x1卷积
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            conv_block(in_channels, out_1x1pool, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)], 1
        )

这种设计有三大优势：

多尺度特征提取：同时使用1x1、3x3、5x5卷积核
计算效率：通过1x1卷积先降维减少计算量
信息丰富性：池化操作保留原始特征信息

辅助分类器

GoogLeNet引入了辅助分类器来解决深度网络梯度消失问题：

class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.7)
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = conv_block(in_channels, 128, kernel_size=1)
        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.pool(x)
        x = self.conv(x)
        x = x.reshape(x.shape[0], -1)
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

这些辅助分类器在训练阶段提供额外的梯度信号，帮助底层参数更新，但在推理阶段会被移除。

完整网络架构

GoogLeNet的主体结构如下：

初始卷积和池化层
多个Inception模块堆叠
全局平均池化
全连接分类层

class GoogLeNet(nn.Module):
    def __init__(self, aux_logits=True, num_classes=1000):
        super(GoogLeNet, self).__init__()
        # 网络结构定义...
        
    def forward(self, x):
        # 前向传播过程...
        if self.aux_logits and self.training:
            return aux1, aux2, x
        else:
            return x

网络在训练时会输出三个结果：两个辅助分类器输出和主分类器输出，而在推理时只返回主分类器结果。

实现细节与技巧

参数初始化：虽然代码中没有显式展示，但实际应用中应正确初始化卷积层和全连接层的权重。
批归一化：每个卷积层后都跟随批归一化，这对训练深度网络至关重要。
Dropout：在全连接层前使用Dropout(p=0.4)防止过拟合。
输入尺寸：网络设计输入为224x224的RGB图像，这是ImageNet的标准尺寸。
辅助分类器控制：通过aux_logits参数可以灵活控制是否使用辅助分类器。

验证网络输出

代码最后包含了一个简单的验证测试：

if __name__ == "__main__":
    BATCH_SIZE = 5
    x = torch.randn(BATCH_SIZE, 3, 224, 224)
    model = GoogLeNet(aux_logits=True, num_classes=1000)
    print(model(x)[2].shape)
    assert model(x)[2].shape == torch.Size([BATCH_SIZE, 1000])

这段代码验证了网络能够正确处理输入数据，并输出预期的形状（批量大小×类别数）。