VGG: The Pinnacle of Traditional Neural Networks

License: CC 4.0 BY-SA

1. Introduction to VGG

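VGGNet is a deep convolutional network architecture proposed by the Visual Geometry Group (VGG) at the University of Oxford, from which the network takes its name. In the 2014 ILSVRC, the group finished second in the classification task with a top-5 error rate of 7.32% (GoogLeNet won with 6.67%) and first in the localization task with an error rate of 25.32% (GoogLeNet: 26.44%). VGGNet was among the first models to bring the image classification error rate below 10%, and its idea of building the network from $3\times3$ convolution kernels became the basis of many later models. The paper was presented at the 2015 International Conference on Learning Representations (ICLR) and has since been cited more than 14,000 times.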

2. Model Architecture

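[Figure: VGG16 network architecture]
The original paper describes six configurations of VGGNet: VGG11, VGG11-LRN, VGG13, VGG16-1, VGG16-3, and VGG19. The number in each name is the count of weight layers. VGG11-LRN is VGG11 with local response normalization (LRN) applied in the first layer. VGG16-1 uses a $1\times1$ kernel for the last convolution in each of the final three convolutional blocks, whereas VGG16-3 uses a $3\times3$ kernel there. The VGG16 described in this section is VGG16-3. The VGG16 in the figure illustrates the core idea of VGGNet: replace large convolution kernels with stacks of $3\times3$ convolutions (two stacked $3\times3$ convolutions have the same receptive field as one $5\times5$ convolution). The layer settings are listed in the table below.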

Layer	Input size	Kernel size / stride	Output size	Parameters
Conv $C_{11}$	$224\times224\times3$	$3\times3\times64/1$	$224\times224\times64$	$(3\times3\times3+1)\times64$
Conv $C_{12}$	$224\times224\times64$	$3\times3\times64/1$	$224\times224\times64$	$(3\times3\times64+1)\times64$
Max pool $S_{max1}$	$224\times224\times64$	$2\times2/2$	$112\times112\times64$	$0$
Conv $C_{21}$	$112\times112\times64$	$3\times3\times128/1$	$112\times112\times128$	$(3\times3\times64+1)\times128$
Conv $C_{22}$	$112\times112\times128$	$3\times3\times128/1$	$112\times112\times128$	$(3\times3\times128+1)\times128$
Max pool $S_{max2}$	$112\times112\times128$	$2\times2/2$	$56\times56\times128$	$0$
Conv $C_{31}$	$56\times56\times128$	$3\times3\times256/1$	$56\times56\times256$	$(3\times3\times128+1)\times256$
Conv $C_{32}$	$56\times56\times256$	$3\times3\times256/1$	$56\times56\times256$	$(3\times3\times256+1)\times256$
Conv $C_{33}$	$56\times56\times256$	$3\times3\times256/1$	$56\times56\times256$	$(3\times3\times256+1)\times256$
Max pool $S_{max3}$	$56\times56\times256$	$2\times2/2$	$28\times28\times256$	$0$
Conv $C_{41}$	$28\times28\times256$	$3\times3\times512/1$	$28\times28\times512$	$(3\times3\times256+1)\times512$
Conv $C_{42}$	$28\times28\times512$	$3\times3\times512/1$	$28\times28\times512$	$(3\times3\times512+1)\times512$
Conv $C_{43}$	$28\times28\times512$	$3\times3\times512/1$	$28\times28\times512$	$(3\times3\times512+1)\times512$
Max pool $S_{max4}$	$28\times28\times512$	$2\times2/2$	$14\times14\times512$	$0$
Conv $C_{51}$	$14\times14\times512$	$3\times3\times512/1$	$14\times14\times512$	$(3\times3\times512+1)\times512$
Conv $C_{52}$	$14\times14\times512$	$3\times3\times512/1$	$14\times14\times512$	$(3\times3\times512+1)\times512$
Conv $C_{53}$	$14\times14\times512$	$3\times3\times512/1$	$14\times14\times512$	$(3\times3\times512+1)\times512$
Max pool $S_{max5}$	$14\times14\times512$	$2\times2/2$	$7\times7\times512$	$0$
Fully connected $FC_{1}$	$7\times7\times512$	$(7\times7\times512)\times4096$	$1\times4096$	$(7\times7\times512+1)\times4096$
Fully connected $FC_{2}$	$1\times4096$	$4096\times4096$	$1\times4096$	$(4096+1)\times4096$
Fully connected $FC_{3}$	$1\times4096$	$4096\times1000$	$1\times1000$	$(4096+1)\times1000$
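
The per-layer formulas in the table can be summed to get the overall size of the network. The following is a minimal sketch in plain Python; the layer lists are transcribed from the table, and the helper names `conv_params` and `fc_params` are illustrative, not part of any library.

```python
# Recompute the parameter counts from the table (weights plus one bias per output).

# (name, in_channels, out_channels) for the 13 convolutional layers, all 3x3.
CONV_LAYERS = [
    ("C11", 3, 64), ("C12", 64, 64),
    ("C21", 64, 128), ("C22", 128, 128),
    ("C31", 128, 256), ("C32", 256, 256), ("C33", 256, 256),
    ("C41", 256, 512), ("C42", 512, 512), ("C43", 512, 512),
    ("C51", 512, 512), ("C52", 512, 512), ("C53", 512, 512),
]

# (name, in_features, out_features) for the three fully connected layers.
FC_LAYERS = [
    ("FC1", 7 * 7 * 512, 4096),
    ("FC2", 4096, 4096),
    ("FC3", 4096, 1000),
]


def conv_params(c_in, c_out, k=3):
    """(k*k*c_in + 1) * c_out, matching the table's formula."""
    return (k * k * c_in + 1) * c_out


def fc_params(n_in, n_out):
    """(n_in + 1) * n_out, matching the table's formula."""
    return (n_in + 1) * n_out


conv_total = sum(conv_params(c_in, c_out) for _, c_in, c_out in CONV_LAYERS)
fc_total = sum(fc_params(n_in, n_out) for _, n_in, n_out in FC_LAYERS)
total = conv_total + fc_total

print(f"conv layers: {conv_total:,}")
print(f"fc layers:   {fc_total:,}")
print(f"total:       {total:,}")
```

Most of the roughly 138 million parameters sit in the fully connected layers, with $FC_{1}$ alone accounting for more than 100 million.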

3. Model Characteristics

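The entire network uses the same $3\times3$ convolution kernel size and the same $2\times2$ max-pooling size.
The $1\times1$ convolution mainly acts as a linear transformation across channels; the numbers of input and output channels stay the same, so no dimensionality reduction occurs.
Two stacked $3\times3$ convolutional layers are equivalent to one $5\times5$ convolutional layer in the sense that their receptive field is $5\times5$. Likewise, three stacked $3\times3$ convolutional layers are equivalent to one $7\times7$ convolutional layer. Stacking small kernels in this way needs fewer parameters, and the extra activation functions between the layers strengthen the network's ability to learn features.
VGGNet uses a training trick: first train the shallow, simple VGG11, then reuse its weights to initialize VGG13, and repeat this process up to VGG19. This makes training converge faster.
During training, multi-scale transformations are applied to the original data as data augmentation, which makes the model less prone to overfitting.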
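
The claim about stacked $3\times3$ convolutions can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the function names and the channel count `C = 256` are chosen only for illustration.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions.

    Each extra layer widens the field by (kernel_size - 1).
    """
    return num_layers * (kernel_size - 1) + 1


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) when every layer maps `channels` -> `channels`."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 256  # illustrative channel count

rf_two = stacked_receptive_field(2)    # two 3x3 layers
rf_three = stacked_receptive_field(3)  # three 3x3 layers

two_3x3 = conv_weights(3, C, num_layers=2)    # 18 * C^2
one_5x5 = conv_weights(5, C)                  # 25 * C^2
three_3x3 = conv_weights(3, C, num_layers=3)  # 27 * C^2
one_7x7 = conv_weights(7, C)                  # 49 * C^2

print(f"2 x (3x3): receptive field {rf_two}, weights {two_3x3:,} vs 5x5: {one_5x5:,}")
print(f"3 x (3x3): receptive field {rf_three}, weights {three_3x3:,} vs 7x7: {one_7x7:,}")
```

For equal channel counts, three $3\times3$ layers need $27C^2$ weights against $49C^2$ for a single $7\times7$ layer, while also inserting two additional nonlinearities.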

4. PyTorch Implementation of VGG16

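import torch
import torch.nn as nn
# torchvision.models.utils is no longer available in recent torchvision releases;
# torch.hub provides the same function.
from torch.hub import load_state_dict_from_url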


#--------------------------------------#
#   VGG16 network structure
#--------------------------------------#
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        #--------------------------------------#
        #   Adaptive average pooling to 7x7
        #--------------------------------------#
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        #--------------------------------------#
        #   Classifier
        #--------------------------------------#
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        #--------------------------------------#
        #   Feature extraction
        #--------------------------------------#
        x = self.features(x)
        #--------------------------------------#
        #   Average pooling
        #--------------------------------------#
        x = self.avgpool(x)
        #--------------------------------------#
        #   Flatten to (batch, 512 * 7 * 7)
        #--------------------------------------#
        x = torch.flatten(x, 1)
        #--------------------------------------#
        #   Classification
        #--------------------------------------#
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # Kaiming (He) normal initialization for conv layers followed by ReLU
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                # BatchNorm starts as identity: scale 1, shift 0
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                # Small Gaussian weights and zero bias for fully connected layers
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

'''
Assume an input image of shape (600, 600, 3). As the loop over cfg proceeds, the feature map changes as follows:
600,600,3 -> 600,600,64 -> 600,600,64 -> 300,300,64 -> 300,300,128 -> 300,300,128 -> 150,150,128 -> 150,150,256 -> 150,150,256 -> 150,150,256
-> 75,75,256 -> 75,75,512 -> 75,75,512 -> 75,75,512 -> 37,37,512 -> 37,37,512 -> 37,37,512 -> 37,37,512
Just before the final max-pool in cfg (which decom_vgg16 drops), the feature map is 37,37,512.
'''

cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']

#--------------------------------------#
#   Feature extraction part
#--------------------------------------#
def make_layers(cfg, batch_norm=False):
    """Build the convolutional part: integers are 3x3 conv output channels, 'M' is a 2x2 max-pool."""
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            # padding=1 keeps the spatial size unchanged for a 3x3 kernel
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

def decom_vgg16(pretrained=False):
    model = VGG(make_layers(cfg))
    if pretrained:
        state_dict = load_state_dict_from_url("https://download.pytorch.org/models/vgg16-397923af.pth", model_dir="./model_data")
        model.load_state_dict(state_dict)
    #----------------------------------------------------------------------------#
    #   Keep the first 30 feature modules (this drops the final max-pool);
    #   for a 600x600 input this yields a 37,37,512 feature map
    #----------------------------------------------------------------------------#
    features    = list(model.features)[:30]
    #----------------------------------------------------------------------------#
    #   Take the classifier and remove the final Linear layer
    #   and the two Dropout layers
    #----------------------------------------------------------------------------#
    classifier  = list(model.classifier)
    del classifier[6]
    del classifier[5]
    del classifier[2]

    features    = nn.Sequential(*features)
    classifier  = nn.Sequential(*classifier)
    return features, classifier
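
The effect of truncating the feature extractor at 30 modules can be traced without running the network. The sketch below is plain Python and only mirrors the shape arithmetic of `make_layers` with `batch_norm=False`; `trace_shapes` is a hypothetical helper written for this illustration, and it assumes that each convolution contributes two modules (Conv2d and ReLU) and each `'M'` contributes one.

```python
CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']


def trace_shapes(cfg, height, width, in_channels=3):
    """Return a list of (modules_so_far, (H, W, C)) after each cfg entry."""
    shapes = []
    modules = 0
    channels = in_channels
    for v in cfg:
        if v == 'M':
            # 2x2 max-pool with stride 2 halves H and W, rounding down
            height, width = height // 2, width // 2
            modules += 1
        else:
            # 3x3 conv with padding=1 keeps H and W and sets the channel count
            channels = v
            modules += 2  # Conv2d + ReLU
        shapes.append((modules, (height, width, channels)))
    return shapes


shapes = trace_shapes(CFG, 600, 600)
total_modules = shapes[-1][0]

# Shape after the first 30 modules, i.e. what list(model.features)[:30] produces
truncated_shape = [shape for count, shape in shapes if count <= 30][-1]
# Shape after the full feature extractor, including the final max-pool
full_shape = shapes[-1][1]

print("modules in features:", total_modules)
print("after first 30 modules:", truncated_shape)
print("after all modules:", full_shape)
```

The full feature extractor has 31 modules, so slicing with `[:30]` removes only the last max-pool and leaves the feature map at 37,37,512 instead of 18,18,512 for a 600x600 input.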