Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014). [Citations: 1986].
1 Motivation
[Ways to Improve Accuracy]
• Use a small filter size and a small stride in the first conv layer.
• Train and test the network densely over the whole image and at multiple scales.
• Increase the depth of the network.
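The filter-size and depth points can be made concrete: stacking 3 × 3 convolutions covers the same receptive field as one larger filter while using fewer weights and adding extra non-linearities in between. A minimal sketch (the helper names and the 256-channel figure are illustrative, not from the paper):

```python
# Sketch: three stacked 3x3 convs see the same 7x7 region as a single
# 7x7 conv, with substantially fewer parameters (channels held equal).

def stacked_receptive_field(kernel: int, depth: int) -> int:
    """Receptive field of `depth` stacked convs (stride 1, no pooling)."""
    return depth * (kernel - 1) + 1

def conv_params(kernel: int, channels: int) -> int:
    """Weight count of one conv layer with equal in/out channels (no bias)."""
    return kernel * kernel * channels * channels

channels = 256  # illustrative channel count
three_3x3 = 3 * conv_params(3, channels)  # three stacked 3x3 layers
one_7x7 = conv_params(7, channels)        # one 7x7 layer

assert stacked_receptive_field(3, 3) == stacked_receptive_field(7, 1) == 7
print(f"3x(3x3): {three_3x3:,} params vs 1x(7x7): {one_7x7:,}")
```

With 256 channels, the stacked version needs about 1.77M weights against 3.21M for a single 7 × 7 layer, which is the trade-off the paper exploits to go deeper.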
2 Architecture
[In a Nutshell (138M Parameters)]
• Input (3 × 224 × 224).
• conv1-1 (64@3 × 3, s1, p1), relu1-1.
• conv1-2 (64@3 × 3, s1, p1), relu1-2.
• pool1 (2 × 2, s2), output 64 × 112 × 112.
• conv2-1 (128@3 × 3, s1, p1), relu2-1.
• conv2-2 (128@3 × 3, s1, p1), relu2-2.
• pool2 (2 × 2, s2), output 128 × 56 × 56.
• conv3-1 (256@3 × 3, s1, p1), relu3-1.
• conv3-2 (256@3 × 3, s1, p1), relu3-2.
• conv3-3 (256@3 × 3, s1, p1), relu3-3.