Paper Notes 2: VGG--Very Deep Convolutional Networks for Large-Scale Image Recognition

Architecture

[Figure: VGG network configurations]
input-224×224 RGB image
preprocessing-subtract the mean RGB value, computed on the training set, from each pixel
stride-fixed to 1 pixel
padding-chosen so that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv.
Max-pooling-performed over a 2×2 pixel window, with stride 2
All hidden layers are equipped with the rectification non-linearity (ReLU).
No Local Response Normalization (LRN) is used→it does not improve performance
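A minimal sketch of this layer pattern in PyTorch (the helper name and channel counts are illustrative, not from the paper): 3×3 convolutions with stride 1 and padding 1 preserve the spatial resolution, each is followed by ReLU, and a 2×2 max-pool with stride 2 closes the block.

```python
import torch
import torch.nn as nn

def vgg_block(in_channels: int, out_channels: int, num_convs: int) -> nn.Sequential:
    """A stack of 3x3 convs (stride 1, padding 1, spatial size preserved),
    each followed by ReLU, closed by a 2x2 max-pool with stride 2."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3,
                             stride=1, padding=1),
                   nn.ReLU(inplace=True)]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 224, 224)       # a 224x224 RGB input
y = vgg_block(3, 64, num_convs=2)(x)  # convs keep 224x224; the pool halves it
print(y.shape)                        # torch.Size([1, 64, 112, 112])
```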

  1. 3×3 conv.
    a stack of two 3×3 conv. layers has an effective receptive field of 5×5
    three such layers have a 7×7 effective receptive field
    why a stack of three 3×3 conv. layers instead of a single 7×7 layer?
    ① incorporate three non-linear rectification layers instead of a single one→make the decision function more discriminative
    ② decrease the number of parameters
    three-layer 3×3 conv. stack with C channels, parameters→ $3(3^2C^2)=27C^2$
    a single 7×7 conv. layer, parameters→ $7^2C^2=49C^2$ (checked in the sketch after this list)
  2. 1×1 conv.
    increase the non-linearity of the decision function without affecting the receptive fields of the conv. layers→an additional non-linearity is introduced by the rectification function
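A quick check of the parameter counts quoted above (bias terms are disabled so the numbers match the $27C^2$ vs. $49C^2$ formulas exactly; the channel count is arbitrary):

```python
import torch.nn as nn

C = 256  # channel count; any value works

# three stacked 3x3 convolutions vs. one 7x7 convolution, both C -> C channels
stack = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)
                        for _ in range(3)])
single = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

n_stack = sum(p.numel() for p in stack.parameters())
n_single = sum(p.numel() for p in single.parameters())
print(n_stack, 27 * C**2)   # 1769472 1769472
print(n_single, 49 * C**2)  # 3211264 3211264
```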

Training

  1. details
    batch size-256
    momentum-0.9
    weight decay-0.0005
    dropout ratio-0.5, for the first two fully-connected layers
    learning rate-0.01, decreased by a factor of 10 when the validation set accuracy stopped improving; the rate was decreased 3 times in total, and learning stopped after 370K iterations (74 epochs). See the configuration sketch after this list.
  2. initialization
    begin by training configuration A, which is shallow enough to be trained with random initialization
    when training deeper configurations, initialise the first four conv. layers and the last three fully-connected layers with the layers of net A
    random initialization→sample the weights from a normal distribution with zero mean and 0.01 variance; biases initialised to zero
  3. augment
    ① randomly crop from rescaled training images(one crop per image per SGD iteration)
    ② random horizontal flipping
    ③ random RGB colour shift (as in AlexNet); the cropping and flipping appear in the augmentation sketch after this list
  4. rescale
    S: smallest side of an isotropically-rescaled training image
    two approaches for setting S
    ① fix S (256 or 384)→single-scale training
    ② randomly sample S from a certain range $[S_{min}, S_{max}]$ (the paper uses $[256, 512]$)→multi-scale training
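A sketch of the training configuration and random initialization described above, in PyTorch. The paper predates PyTorch, so the optimizer and scheduler classes here are stand-ins, and the tiny model is a placeholder just to make the snippet run:

```python
import torch.nn as nn
import torch.optim as optim

def init_weights(m: nn.Module) -> None:
    # Random initialization as in the notes: weights ~ N(0, variance 0.01),
    # i.e. std = 0.1; biases set to zero.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.1)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Tiny placeholder model so the snippet runs; a real VGG config would go here,
# with deeper nets reusing net A's first four conv. layers and all three FC layers.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Dropout(p=0.5),  # dropout ratio 0.5, as used on the first two FC layers
    nn.Linear(8, 10),
)
model.apply(init_weights)

optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 whenever validation accuracy plateaus
# (the paper decreased it 3 times over 370K iterations).
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.1)
```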
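And a sketch of the scale-jittered augmentation: sample S, isotropically rescale so the smallest side equals S, then take one random 224×224 crop with a random horizontal flip. torchvision transforms are assumed for brevity, and the per-channel means in `Normalize` are torchvision's ImageNet statistics standing in for the paper's mean-RGB values:

```python
import random
from torchvision import transforms

def make_train_transform(s_min: int = 256, s_max: int = 512) -> transforms.Compose:
    # Multi-scale training: call this to sample a fresh S per image/pipeline build.
    S = random.randint(s_min, s_max)
    return transforms.Compose([
        transforms.Resize(S),               # isotropic rescale: smallest side -> S
        transforms.RandomCrop(224),         # one random 224x224 crop per SGD sample
        transforms.RandomHorizontalFlip(),  # random horizontal flip
        transforms.ToTensor(),
        # mean-RGB subtraction, folded into Normalize (means are an assumption)
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),
    ])
# The AlexNet-style random RGB colour shift is omitted here for brevity.
```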

Testing

  1. rescale
    Q: smallest side of an isotropically-rescaled image, not necessarily equal to the training scale S
    the input image size may differ→the FC layers are converted to conv. layers (see the conversion sketch after this list)
    first FC layer→7×7 conv. layer
    the last two FC layers→1×1 conv. layers
    why 7×7?
    padding makes sure that the spatial resolution is preserved after each conv. layer
    so the resolution is reduced only by max-pooling; specifically, each max-pool halves it
    i.e. configuration A has 5 max-pool layers→ $224/2^5=7$, so the output of the conv. stack is 7×7×512
    for the first FC layer, each output unit has 49 weights per input channel (one per spatial position)→to convert it to a conv. layer that reuses the same parameters, the kernel size must be 7×7
    with no FC layers, the input image size no longer needs to be 224×224→the network can be applied to the whole (uncropped) image
    result→a class score map whose number of channels equals the number of classes, with a variable spatial resolution dependent on the input image size
    the class score map is spatially averaged (sum-pooled)→a fixed-size vector of class scores for the image
  2. augment
    rescaling + horizontal flipping; the class scores of the original and flipped images are averaged→the final scores
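A sketch of the FC-to-conv conversion used for dense evaluation. The classifier sizes (4096-4096-1000) follow the paper; the `fc_to_conv` helper itself is illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

def fc_to_conv(fc: nn.Linear, kernel_size: int, in_channels: int) -> nn.Conv2d:
    """Reinterpret an FC layer's weight matrix as a convolution with identical parameters."""
    conv = nn.Conv2d(in_channels, fc.out_features, kernel_size=kernel_size)
    conv.weight.data = fc.weight.data.view(fc.out_features, in_channels,
                                           kernel_size, kernel_size)
    conv.bias.data = fc.bias.data
    return conv

fc1 = nn.Linear(7 * 7 * 512, 4096)  # first FC layer -> 7x7 conv.
fc2 = nn.Linear(4096, 4096)         # last two FC layers -> 1x1 conv.
fc3 = nn.Linear(4096, 1000)

head = nn.Sequential(
    fc_to_conv(fc1, kernel_size=7, in_channels=512), nn.ReLU(inplace=True),
    fc_to_conv(fc2, kernel_size=1, in_channels=4096), nn.ReLU(inplace=True),
    fc_to_conv(fc3, kernel_size=1, in_channels=4096),
)

feat = torch.randn(1, 512, 9, 9)     # conv features of an uncropped, larger image
class_map = head(feat)               # -> [1, 1000, 3, 3] class score map
scores = class_map.mean(dim=(2, 3))  # spatial average -> fixed-size class scores
print(class_map.shape, scores.shape)
```

On a 224×224 input the feature map is exactly 7×7, so the converted head produces a 1×1 class map, matching the original FC network; larger inputs simply yield a larger score map that is then averaged.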

Conclusion

The results confirm the importance of depth in visual representations.

References

Simonyan, K. and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.
