VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION


# VGG NETWORK

Here I'd like to share my thoughts on the VGG network paper and detail some problems I've encountered. I will mainly focus on the second part of the paper, which is CONVNET CONFIGURATIONS.

CONVNET CONFIGURATIONS

Here we have the architecture of the VGG net from the original paper:

[Figure: ConvNet configurations table from the original paper]

We have a few different architectures according to the image above; they are called VGG11, VGG13, VGG16, etc. The number in the name tells you how many learnable layers (convolution layers and fully connected layers) the network has (see the sketch after this list). From my point of view, part 2 of the paper proposes two approaches to enhance network performance:
First, using a stack of small convolution layers instead of a larger one.
Second, using 1 × 1 convolution layers.
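
As a quick check of the naming convention, here is a minimal sketch (plain Python; the per-block conv counts are taken from the configuration table in the paper, column D):

```python
# VGG16 (configuration D): 13 conv layers + 3 fully connected layers
convs_per_block = [2, 2, 3, 3, 3]  # conv layers in each of the five blocks
fc_layers = 3                      # the three FC layers at the end
print(sum(convs_per_block) + fc_layers)  # 16 learnable layers -> "VGG16"
```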

A STACK OF SMALLER CONVOLUTION FILTERS

Looking at the configurations, we can only find two filter sizes: 3 × 3 and 1 × 1. The reason VGG uses a stack of 3 × 3 filters is that it has fewer parameters without reducing the receptive field. A single 7 × 7 filter has a 7 × 7 receptive field; for a stack of three 3 × 3 filters with stride 1 and padding 1, we can conclude that it still has a 7 × 7 receptive field:

| filter in the stack | receptive field before | receptive field after |
| --- | --- | --- |
| 1st 3 × 3 | 1 | 3 |
| 2nd 3 × 3 | 3 | 5 |
| 3rd 3 × 3 | 5 | 7 |

The formula for computing the receptive field of a stride-1 convolution is:

R_out = R_in + (kernel size − 1)

where R_in is the input receptive field size and R_out the output receptive field size. So the receptive field is preserved, but the number of parameters is reduced; according to the paper:
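
As a sanity check, here is a minimal sketch (plain Python; the function name is mine) that applies this formula to the stack of three 3 × 3 filters:

```python
def receptive_field(kernel_sizes):
    """R_out = R_in + (k - 1) for each stride-1 convolution in the stack."""
    r = 1  # start from a single input pixel
    for k in kernel_sizes:
        r += k - 1
    return r

print(receptive_field([3, 3, 3]))  # 7 -- same as a single 7x7 filter
print(receptive_field([7]))        # 7
```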

> Second, we decrease the number of parameters: assuming that both the input and the output of a three-layer 3 × 3 convolution stack has C channels, the stack is parametrised by 3(3² × C²) = 27C² weights; at the same time, a single 7 × 7 conv. layer would require 7² × C² = 49C² parameters, i.e. 81% more. This can be seen as imposing a regularisation on the 7 × 7 conv. filters, forcing them to have a decomposition through the 3 × 3 filters (with non-linearity injected in between).

The C stands for the number of channels; in VGG, a layer's output channel count is basically the same as the next layer's input channel count, which is why C² appears in the paper. So we now know the pros of using a stack of smaller filters: fewer parameters without sacrificing network performance.
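
To make the arithmetic concrete, a minimal sketch (plain Python; C = 256 is just an example value):

```python
C = 256  # channel count, the same for input and output

stack_params  = 3 * (3 * 3 * C * C)  # three 3x3 conv layers: 27 * C^2
single_params = 7 * 7 * C * C        # one 7x7 conv layer:    49 * C^2

print(stack_params, single_params)
print(f"{single_params / stack_params - 1:.0%} more")  # ~81% more
```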

1 × 1 CONVOLUTION FILTERS

1 × 1 convolution filters were first proposed in the paper "Network in Network" [1]. Some would wonder why we use a 1 × 1 filter at all, since it looks like mere multiplication. In two dimensions they would be right: a 1 × 1 filter just scales the feature map by a number. But we are not dealing with the two-dimensional case; according to the VGG architecture, we are dealing with dozens or even hundreds of channels, and then it is not just multiplication:
Imagine we have a feature map of size 10 * 10 * 100 and a filter of size 1 * 1 * 100, and assume the filter slides through the feature map from top-left to bottom-right:

featuremap[0][0] // an array containing 100 elements: the first location the filter slides over
filter[0][0]     // the 1*1*100 filter, which is also an array with 100 elements

// now, let's do the dot product:
featuremap[0][0].dot(filter[0][0])

// the dot product above expands to:
// featuremap[0][0][0] * filter[0][0][0] + featuremap[0][0][1] * filter[0][0][1] + ... + featuremap[0][0][99] * filter[0][0][99]
// which is the same as:
// featuremap[0][0][0] * w0 + featuremap[0][0][1] * w1 + featuremap[0][0][2] * w2 + ... + featuremap[0][0][99] * w99

Do you see the pattern here? By sliding the 1 × 1 filter through the whole feature map, we get a fully connected layer applied at every spatial position! You can treat the filter's parameters as the weights of an FC layer, followed by an activation function, in this case a ReLU, and we have a simple FC network inside our CNN!
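
Here is a minimal sketch (NumPy; the shapes and variable names are mine) verifying that a 1 × 1 convolution is exactly a per-position fully connected layer across channels:

```python
import numpy as np

H, W, C = 10, 10, 100
fmap = np.random.randn(H, W, C)   # the 10 x 10 x 100 feature map
w = np.random.randn(C)            # one 1 x 1 x 100 filter

# slide the 1x1 filter: a dot product across channels at every location
out_conv = np.einsum('hwc,c->hw', fmap, w)

# the same computation viewed as an FC layer applied to each pixel
out_fc = (fmap.reshape(-1, C) @ w).reshape(H, W)

assert np.allclose(out_conv, out_fc)
out = np.maximum(out_conv, 0)     # the ReLU that follows in VGG
```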
The advantage of using 1 × 1 filters, according to the paper, is:

> The incorporation of 1 × 1 conv. layers is a way to increase the non-linearity of the decision function without affecting the receptive fields of the conv. layers.

In other words: we can use fewer parameters in our network while increasing its non-linearity, to enhance performance.

Reference:
[1] Lin, M., Chen, Q., and Yan, S. Network in Network. In Proc. ICLR, 2014.
