Authors
Min Lin, Qiang Chen, Shuicheng Yan
Abstract
Instead of linear filters, we build micro neural networks with more complex structures to abstract the data within the receptive field.
1 Introduction
The convolution filter in a CNN is a generalized linear model (GLM).
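Concretely, for an input patch x_{i,j} centred at location (i, j), the linear filter followed by a ReLU computes each feature map value as (bias omitted, using the paper's notation):

```latex
f_{i,j,k} = \max\left( w_k^{\top} x_{i,j},\; 0 \right)
```

where k indexes the output feature map.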
Instead of adopting the traditional fully connected layers for classification in CNN, we directly output the spatial average of the feature map from the last mlpconv layer as the confidence of categories via a global average pooling layer.
Global average pooling enforces correspondence between feature maps and categories. Furthermore, the fully connected layers are prone to overfitting and heavily depend on dropout regularization, while global average pooling is itself a structural regularizer, which natively prevents overfitting for the overall structure.
2 Convolutional Neural Networks
In a CNN, filters from higher layers map to larger regions in the original input; a higher level concept is generated by combining the lower level concepts from the layer below. Therefore, we argue that it would be beneficial to do a better abstraction on each local patch before combining patches into higher level concepts.
Maxout network: the number of feature maps is reduced by max pooling over affine feature maps. The maximization makes maxout a piecewise linear approximator that can model arbitrary convex functions (formula below).
Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
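For reference, a maxout unit takes the maximum over M affine feature maps at each location, which restricts it to modelling convex functions of the input:

```latex
f_{i,j,k} = \max_{m \in [1, M]} \; w_{k,m}^{\top} x_{i,j}
```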
SMLP: applies a shared multilayer perceptron on different patches of the input image.
Çağlar Gülçehre and Yoshua Bengio. Knowledge matters: Importance of prior information for optimization. arXiv preprint arXiv:1301.4083, 2013.
3 Network in Network
3.1 MLP Convolution Layers
Maxout layers take the maximum over affine feature maps, which assumes the latent concepts are convex functions of the input; the mlpconv layer instead slides a micro MLP, a universal function approximator, over the input, as sketched below.
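Since the micro MLP slid over the input is equivalent to a k×k convolution followed by 1×1 convolutions (cross-channel parametric pooling), an mlpconv block can be built from standard layers. A minimal PyTorch sketch, assuming illustrative channel widths and kernel sizes rather than the paper's exact configuration:

```python
import torch.nn as nn

class MLPConv(nn.Module):
    """One mlpconv block: a k x k convolution followed by two 1 x 1
    convolutions, each with ReLU. The 1 x 1 convolutions realize the
    micro MLP that is slid over the input like a filter."""

    def __init__(self, in_channels, mid_channels, out_channels,
                 kernel_size, stride=1, padding=0):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size, stride, padding),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```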
3.2 Global Average Pooling
Traditionally, the feature maps of the last convolutional layer are vectorized and fed into fully connected layers followed by a softmax logistic regression layer.
However, fully connected layers are prone to overfitting. The idea is instead to generate one feature map for each category of the classification task in the last mlpconv layer, average each map spatially, and feed the resulting vector directly into softmax. This has several advantages:
1. It enforces correspondences between feature maps and categories, so the feature maps can easily be interpreted as category confidence maps.
2. There is no parameter to optimize in global average pooling, so overfitting is avoided at this layer.
3. Global average pooling sums out the spatial information, so it is more robust to spatial translations of the input.
Global average pooling is made possible by the mlpconv layers, as they make better approximations to the confidence maps than GLMs.
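A minimal PyTorch sketch of this head, assuming the last mlpconv layer ends with a 1×1 convolution that produces one feature map per class (the class name `NiNClassifier` and its layer names are illustrative):

```python
import torch.nn as nn

class NiNClassifier(nn.Module):
    """Maps the last mlpconv output to one feature map per class, then
    global-average-pools each map to a single category confidence; no
    fully connected layer is used."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.to_class_maps = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling, no parameters

    def forward(self, x):
        maps = self.to_class_maps(x)        # (N, num_classes, H, W)
        logits = self.gap(maps).flatten(1)  # (N, num_classes)
        return logits                       # feed to softmax / cross-entropy loss
```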
4 Experiments
Introducing dropout layers in between the mlpconv layers reduced the test error by more than 20%.
4.6 Global Average Pooling as a Regularizer
Global average pooling is effective as a regularizer, though its result is slightly worse than that of the dropout regularizer, and it is too demanding for linear convolution layers.
4.7 Visualization of NIN
This motivates the possibility of performing object detection via NIN.
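As a hedged sketch of that idea (not an implementation described in the paper), the confidence map of the predicted class could be thresholded to obtain a coarse object region; `class_maps` is assumed to hold the per-class feature maps produced by the last mlpconv layer:

```python
import torch

def coarse_object_region(class_maps: torch.Tensor, threshold: float = 0.5):
    """class_maps: (num_classes, H, W) confidence maps from the last mlpconv layer.
    Returns the predicted class index and a crude box (row0, row1, col0, col1)
    covering positions where that class's map exceeds `threshold` of its maximum."""
    scores = class_maps.mean(dim=(1, 2))   # global average pooling per class
    cls = int(scores.argmax())
    cmap = class_maps[cls]
    mask = cmap >= threshold * cmap.max()
    rows, cols = torch.nonzero(mask, as_tuple=True)
    box = (int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max()))
    return cls, box
```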
This paper introduces a new convolutional neural network architecture built on micro neural networks, which use more complex structures to better abstract the input data. By using a global average pooling layer in place of the traditional fully connected layers for classification, it effectively prevents overfitting and establishes a correspondence between feature maps and categories. Experimental results show that this approach improves the model's generalization ability.