VGG Paper Reading Notes

These notes give a close reading of the paper VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION (VGG). They cover the network's distinguishing features, including small convolution kernels and 1×1 conv layers that add nonlinearity; the architecture (224×224 input, 3×3 convolutions, 2×2 max pooling); the training strategy (initialization, batch size, learning-rate schedule); and the test-time procedure and evaluation results, which confirm the effect of network depth on performance and the value of multi-scale, multi-crop evaluation.


VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION (VGG)

Authors: Karen Simonyan & Andrew Zisserman

Original paper: [link]


I. Features: What's New in the Network

1. Very deep networks, large-scale image recognition: a deeper network trained on a larger dataset.

2. Using small receptive fields: smaller (3×3) convolution kernels.

A stack of two 3×3 conv. layers (without spatial pooling in between) has an effective receptive field of 5×5; three such layers have a 7×7 effective receptive field.

Stacking also decreases the number of parameters: with C channels in and out, a single 7×7 layer has 7²C² = 49C² weights, while three stacked 3×3 layers have 3·(3²C²) = 27C² (see the sketch after this list).

3. The incorporation of 1×1 conv. layers is a way to increase the nonlinearity of the decision function without affecting the receptive fields of the conv. layers; the paper finds that adding 1×1 convolutions improves results.
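
To make the 49C² vs. 27C² count concrete, here is a minimal PyTorch sketch (PyTorch is a choice of these notes, not something the paper specifies) counting the weights of a single 7×7 convolution against a stack of three 3×3 convolutions, at an arbitrary channel width C = 64:

```python
import torch.nn as nn

def n_weights(module):
    """Count learnable weights, ignoring biases."""
    return sum(p.numel() for name, p in module.named_parameters()
               if not name.endswith("bias"))

C = 64  # arbitrary channel width, for illustration only

# One 7x7 conv layer: 7^2 * C^2 = 49 C^2 weights.
single = nn.Conv2d(C, C, kernel_size=7, padding=3)

# Three stacked 3x3 conv layers (same 7x7 effective receptive field,
# plus two extra ReLU nonlinearities): 3 * 3^2 * C^2 = 27 C^2 weights.
stacked = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)

print(n_weights(single))   # 49 * 64**2 = 200704
print(n_weights(stacked))  # 27 * 64**2 = 110592
```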

 

II. Architecture

● input: 224×224 RGB images; the only preprocessing is subtracting the mean RGB value (computed on the training set) from each pixel

● small receptive fields: 3×3 kernels;

● 1×1 convolution filters are also used;

● conv stride: 1, with padding 1, so spatial resolution is preserved after convolution

● five max-pooling layers: 2×2 pixel window, with stride 2 (non-overlapping pooling)

● the configuration of the fully connected layers is the same in all networks: every variant ends with the same three FC layers (4096, 4096, 1000 units)

● all hidden layers are equipped with ReLU; none of the networks (except one, A-LRN) contain Local Response Normalization (LRN). The results show that adding LRN brings no clear improvement.

Comparison with AlexNet: VGG mimics AlexNet's 5-conv + 3-FC structure, but shrinks the kernels, adds layers, and uses more channels. [Figure: AlexNet architecture]
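
As a concrete reading of configuration D (VGG-16), here is a hedged PyTorch sketch; the layer widths and the 4096-4096-1000 classifier follow the paper, while the cfg-list convention is just a compact way to write it down, not the authors' code:

```python
import torch
import torch.nn as nn

# Configuration D (VGG-16): numbers are output channels of 3x3 convs,
# "M" marks a 2x2 / stride-2 max-pooling layer.
CFG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]

def make_features(cfg):
    layers, in_ch = [], 3  # RGB input
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = make_features(CFG_D)
        # The same three FC layers in all configurations: 4096-4096-1000.
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)  # 224x224 -> 7x7 after five poolings
        return self.classifier(torch.flatten(x, 1))

print(VGG16()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```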

III. Training

1. The training is carried out by optimizing the multinomial logistic regression objective using mini-batch gradient descent (based on back-propagation (LeCun et al., 1989)) with momentum. (I don't fully understand this part yet.) In short: multinomial logistic regression loss, mini-batch gradient descent, BP.

batch size: 256; momentum: 0.9; weight decay: L2 penalty multiplier 5·10⁻⁴

dropout regularization for the first two fully-connected layers

The learning rate was initially set to 10⁻², then decreased by a factor of 10 when the validation set accuracy stopped improving.
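
Taken together, these settings map onto a standard SGD configuration. A minimal sketch, assuming the VGG16 module sketched above; PyTorch's ReduceLROnPlateau stands in for the paper's manual rule of dividing the learning rate by 10 when validation accuracy stops improving:

```python
import torch
import torch.nn as nn

model = VGG16()                    # from the Architecture section
criterion = nn.CrossEntropyLoss()  # the multinomial logistic regression objective

# Mini-batch SGD with momentum 0.9 and L2 weight decay 5e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)

# Divide the learning rate by 10 when validation accuracy plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1)

# In the validation loop, after computing val_accuracy:
#   scheduler.step(val_accuracy)
```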

2. Initialization (weight initialization)

The authors began by training configuration A, which is shallow enough to be trained with random initialization. When training the deeper architectures, they initialized the first four convolutional layers and the last three fully connected layers with the layers of net A; the intermediate layers were initialized randomly.

In other words: train A first with random initial values (A is relatively shallow, so this works), then use A's trained weights to initialize the deeper networks.

It is also possible to initialize the weights without pre-training by using a random initialization procedure (the paper notes this but does not elaborate).
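
The paper's own random initialization (used for net A and, where applicable, elsewhere) samples weights from a zero-mean normal distribution with 10⁻² variance and zeroes the biases; a minimal PyTorch sketch:

```python
import torch.nn as nn

def init_weights(m):
    # Weights ~ N(0, variance 1e-2), i.e. std 0.1; biases set to zero.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.1)
        nn.init.zeros_(m.bias)

model = VGG16()            # the sketch from the Architecture section
model.apply(init_weights)  # applied recursively to every submodule
```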

3. Training image size (S: the smallest side of an isotropically rescaled training image)

Input images are first rescaled as a preprocessing step; there are two regimes, single-scale and multi-scale:

1) single-scale training: S = 256 or S = 384

2) multi-scale training: S ∈ [Smin, Smax]

Each time an image is fed to the network it is rescaled, with the short side S drawn at random from [256, 512]; a 224×224 crop is then taken (sketched below).
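
A hedged torchvision sketch of the multi-scale pipeline (the paper also applies random horizontal flips and a random RGB colour shift; the colour shift is omitted here):

```python
import random
from torchvision import transforms
import torchvision.transforms.functional as F

class RandomShortSideResize:
    """Rescale so the shortest side is a random S in [s_min, s_max]."""
    def __init__(self, s_min=256, s_max=512):
        self.s_min, self.s_max = s_min, s_max

    def __call__(self, img):
        s = random.randint(self.s_min, self.s_max)
        return F.resize(img, s)  # int size => shorter side rescaled to s

train_transform = transforms.Compose([
    RandomShortSideResize(256, 512),    # scale jittering
    transforms.RandomCrop(224),         # fixed-size training crop
    transforms.RandomHorizontalFlip(),  # as in the paper
    transforms.ToTensor(),
])
```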


IV. Testing

Pipeline: input image rescaled (to a test scale Q) → network applied densely over the whole image → class score map spatially averaged.

Multi-crop evaluation: each image can be cropped into multiple 224×224 images, whose predictions are averaged (the dense path is sketched below).
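
For the dense path, the paper converts the three FC layers into convolutions (a 7×7 conv followed by two 1×1 convs), so the network can be applied to an arbitrarily sized image and its class score map averaged. A hedged sketch, reusing the VGG16 module sketched above:

```python
import torch
import torch.nn as nn

def fc_to_conv(vgg):
    """Reinterpret the three FC layers as 7x7, 1x1, 1x1 convolutions,
    making the network fully convolutional (dropout is dropped at test time)."""
    fc1, fc2, fc3 = [m for m in vgg.classifier if isinstance(m, nn.Linear)]
    conv1 = nn.Conv2d(512, 4096, kernel_size=7)
    conv1.weight.data = fc1.weight.data.view(4096, 512, 7, 7)
    conv1.bias.data = fc1.bias.data
    conv2 = nn.Conv2d(4096, 4096, kernel_size=1)
    conv2.weight.data = fc2.weight.data.view(4096, 4096, 1, 1)
    conv2.bias.data = fc2.bias.data
    conv3 = nn.Conv2d(4096, 1000, kernel_size=1)
    conv3.weight.data = fc3.weight.data.view(1000, 4096, 1, 1)
    conv3.bias.data = fc3.bias.data
    return nn.Sequential(vgg.features, conv1, nn.ReLU(inplace=True),
                         conv2, nn.ReLU(inplace=True), conv3)

model = fc_to_conv(VGG16())
img = torch.randn(1, 3, 384, 384)    # any size >= 224 now works
score_map = model(img)               # (1, 1000, h, w) class score map
scores = score_map.mean(dim=(2, 3))  # spatial average -> (1, 1000)
```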


V. Evaluation (Results)

A vs. A-LRN shows little difference: adding LRN has no clear benefit.

Deeper networks perform better; on this dataset, accuracy saturates around the depth of VGG-19.

C outperforms B: adding 1×1 conv layers improves results.

D outperforms C: 3×3 convs work better than 1×1 convs.

Multi-scale training outperforms single-scale training.

Multi-crop combined with dense evaluation performs best.

 
