AlexNet
CONV - MAXPOOL - NORM
CONV - MAXPOOL - NORM
CONV - CONV - CONV - MAXPOOL
FC - FC - FC
NORM = local response normalization (no longer common)
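A minimal sketch (assuming PyTorch) of this layer pattern; the channel widths and the 10-class head are illustrative, not the exact AlexNet configuration:

```python
import torch
import torch.nn as nn

# CONV - MAXPOOL - NORM twice, then CONV x3 - MAXPOOL, then FC x3
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.LocalResponseNorm(size=5),                 # the NORM layer; rarely used today
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.LocalResponseNorm(size=5),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 10),                          # hypothetical 10-class head
)

x = torch.randn(1, 3, 227, 227)
print(alexnet_like(x).shape)                      # torch.Size([1, 10])
```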
VGG
- deeper (16-19 layers)
- small filters (3x3 CONV, stride 1, pad 1)
- 2x2 MAX POOL, stride 2
- a stack of three 3x3 convs has the same effective receptive field as one 7x7 conv
- more nonlinearity (one ReLU per conv)
- fewer parameters: 27C^2 vs 49C^2 for C input/output channels (see the sketch below)
- 2nd in classification, 1st in localization (ILSVRC'14)
- variants: VGG16, VGG19
FC7 features (4096-d) generalize well to other tasks
- detection: multiple instances
- localization: one instance
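A quick sketch (plain Python arithmetic) of the 3x3-stack argument above: three stacked stride-1 3x3 convs cover a 7x7 input window but use fewer parameters; C = 256 is an assumed channel count and biases are ignored.

```python
# Parameter count: three 3x3 convs vs one 7x7 conv, C channels in and out
C = 256
params_three_3x3 = 3 * (3 * 3 * C * C)   # 27 * C^2
params_one_7x7 = 7 * 7 * C * C           # 49 * C^2
print(params_three_3x3, params_one_7x7)  # 1769472 vs 3211264

# Effective receptive field grows by (k - 1) per stacked stride-1 conv: 3 -> 5 -> 7
rf = 1
for _ in range(3):
    rf += (3 - 1)
print(rf)  # 7
```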
GoogLeNet
- 22 layers
- No fc layers
- “Inception” module
- a good local network topology (a “network within a network”)
- stack inception modules on top of each other
- full network: a stem (conv - pool - conv - pool) followed by stacked inception modules
- auxiliary classification outputs to inject additional gradient at lower layers (AvgPool - 1x1 Conv - FC - FC - Softmax); see the sketch below
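A hedged sketch (assuming PyTorch) of one auxiliary classifier head with the AvgPool - 1x1 Conv - FC - FC - Softmax shape described above; the pooled size, channel widths, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AuxHead(nn.Module):
    def __init__(self, in_channels, num_classes=1000):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(4)          # average-pool to 4x4
        self.conv = nn.Conv2d(in_channels, 128, 1)   # 1x1 conv bottleneck
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(self.pool(x)))
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)                           # logits; softmax is applied in the loss

aux = AuxHead(in_channels=512)                       # attach to an intermediate feature map
print(aux(torch.randn(2, 512, 14, 14)).shape)        # torch.Size([2, 1000])
```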
Inception module
Naive version
- apply parallel filter operations (1x1, 3x3, 5x5 conv, pooling) and concatenate the outputs
- problem: the concatenated feature map gets too deep; compute cost blows up
With dimension reduction
- use bottleneck layers (1x1 conv) to reduce feature depth before the expensive convs; see the sketch below
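A minimal sketch (assuming PyTorch) of an inception module with dimension reduction: parallel 1x1 / 3x3 / 5x5 / pool branches, with 1x1 bottlenecks before the expensive convs; the branch widths are illustrative.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, 1)                              # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),    # 1x1 bottleneck
                                nn.Conv2d(96, 128, 3, padding=1))      # then 3x3
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),    # 1x1 bottleneck
                                nn.Conv2d(16, 32, 5, padding=2))       # then 5x5
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),  # 3x3 max pool
                                nn.Conv2d(in_ch, 32, 1))               # then 1x1 projection

    def forward(self, x):
        # concatenate the four branches along the channel (depth) dimension
        return torch.cat([b(x) for b in (self.b1, self.b2, self.b3, self.b4)], dim=1)

m = InceptionModule(192)
print(m(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28])
```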
ResNet
- 152 layers
- batchnorm
- Swept all classification and detection competitions in ILSVRC’15 and COCO’15
- plain deeper models do worse than shallower ones, but not because of overfitting: it is an optimization problem
- hypothesis: the residual is easier to learn than the full mapping; a block only has to learn the identity plus a small delta
- fit the residual F(x) = H(x) - x instead of the desired mapping H(x) directly, so the block computes H(x) = F(x) + x
- for deeper networks, use a “bottleneck” block to improve efficiency
- proj down (1x1 conv) -> transform (3x3 conv) -> proj up (1x1 conv); see the sketch below
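A sketch (assuming PyTorch) of a residual bottleneck block: a 1x1 conv projects channels down, a 3x3 conv transforms, a 1x1 conv projects back up, and the skip connection adds the input so the block computes F(x) + x; the widths are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, channels, bottleneck):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),               # project down
            nn.BatchNorm2d(bottleneck), nn.ReLU(),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),  # transform
            nn.BatchNorm2d(bottleneck), nn.ReLU(),
            nn.Conv2d(bottleneck, channels, 1, bias=False),               # project up
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.residual(x) + x)        # H(x) = F(x) + x via the skip connection

block = BottleneckBlock(channels=256, bottleneck=64)
print(block(torch.randn(1, 256, 56, 56)).shape)       # torch.Size([1, 256, 56, 56])
```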
Summary
- Inception-ResNet: combines residual connections with inception modules
Other architectures
- NiN (Network in Network)
- Identity Mappings in Deep Residual Networks (improved residual block design)
- Wide Residual Networks (wider residual blocks instead of extreme depth)
- stochastic depth (randomly drop layers during training)
- based on ResNet
- a dropped residual block reduces to the identity connection (the input is passed straight through); see the sketch after this list
- FractalNet (no residual blocks)
- Densely Connected Convolutional Networks (DenseNet)
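A hedged sketch (assuming PyTorch) of the stochastic-depth idea from the list above: each residual block is dropped with some probability during training, in which case it reduces to the identity; the survival probability and the test-time rescaling are assumptions of a common formulation.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    def __init__(self, residual_fn, survival_prob=0.8):
        super().__init__()
        self.residual_fn = residual_fn       # e.g. a conv-BN-ReLU residual branch
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() > self.survival_prob:
                return x                                  # block dropped: identity mapping
            return x + self.residual_fn(x)                # block kept: normal residual update
        # test time: keep every block, rescale the residual branch by its survival probability
        return x + self.survival_prob * self.residual_fn(x)

# hypothetical residual branch, just for illustration
branch = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                       nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64))
block = StochasticDepthBlock(branch, survival_prob=0.8)
print(block(torch.randn(1, 64, 32, 32)).shape)            # torch.Size([1, 64, 32, 32])
```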
Others
common tricks
- use “bottleneck” layers (1x1 conv) to improve efficiency (project down -> operation -> project up)
- residual modules: let the network go deeper
- learning rate: divided by 10 when validation error plateaus
- weight decay (L2 regularization); see the training sketch below
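A sketch of the last two tricks assuming PyTorch: SGD with weight decay, plus a scheduler that divides the learning rate by 10 when the validation error plateaus; the model, hyperparameter values, and placeholder validation error are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)   # weight decay = L2 penalty
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5)             # lr /= 10 on plateau

for epoch in range(100):
    # ... one epoch of training would go here ...
    val_error = 1.0 / (epoch + 1)      # placeholder for the measured validation error
    scheduler.step(val_error)          # lr is divided by 10 once val_error stops improving
```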