Motivation
1、Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions.
2、The motivation behind this scale-decreased architecture design: "High resolution may be needed to detect the presence of a feature, while its exact position need not be determined with equally high precision."
3、Intuitively, a scale-decreased backbone throws away spatial information by down-sampling, making that information hard for a decoder network to recover.
4、While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection).
Proposal
We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.
Improvements
1、First, the scales of intermediate feature maps should be able to increase or decrease at any depth, so that the model can retain spatial information as it grows deeper.
2、Second, the connections between feature maps should be able to go across feature scales to facilitate multi-scale feature fusion.
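The two ideas above can be illustrated with a toy cross-scale fusion step. The `resample` and `fuse` helpers below are hypothetical (not the paper's actual resampling ops); they use simple nearest-neighbor resizing to show how feature maps at different scales can be brought to a common resolution and merged:

```python
import numpy as np

def resample(feat, target_hw):
    """Nearest-neighbor resize of an (H, W, C) feature map to target_hw.
    Upsampling repeats rows/columns; downsampling takes a strided subset."""
    h, w, _ = feat.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return feat[rows][:, cols]

def fuse(parents, target_hw):
    """Cross-scale fusion: resample each parent block's output to the
    target scale, then merge by element-wise summation."""
    return sum(resample(p, target_hw) for p in parents)

# Parent feature maps at different scales (e.g. 1/8 and 1/32 of the input):
p1 = np.ones((32, 32, 4))
p2 = np.ones((8, 8, 4))
fused = fuse([p1, p2], (16, 16))  # fuse at an intermediate 1/16 scale
print(fused.shape)  # (16, 16, 4)
```

The point is that connections are no longer restricted to adjacent scales: any parent, finer or coarser, can feed a block once it is resampled to the block's resolution.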
Search Space
The paper's search space has three components:
1、Scale permutations (decide the ordering of blocks): intermediate and output blocks are permuted separately, giving a search space of size (N − 5)!5!.
2、Cross-scale connections (decide the inputs for each block): the parent blocks can be any block with a lower ordering, or a block from the stem network. The search space has a size of
3、Block adjustments: each block may adjust its scale level and block type.
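The permutation count above can be checked numerically. The connection count below is a hypothetical illustration only (it assumes a 2-block stem and that every block picks exactly 2 parents from all earlier blocks), not the paper's exact formula:

```python
from math import factorial, comb, prod

def permutation_space(N):
    """Scale permutations: the (N - 5) intermediate blocks and the
    5 output blocks are permuted separately, giving (N - 5)! * 5!."""
    return factorial(N - 5) * factorial(5)

def connection_space(N, stem=2):
    """Hypothetical cross-scale connection count: block i (0-indexed)
    chooses 2 parents from the stem blocks plus the i earlier blocks."""
    return prod(comb(stem + i, 2) for i in range(N))

print(permutation_space(12))      # 7! * 5! = 604800
print(connection_space(3))        # C(2,2) * C(3,2) * C(4,2) = 18
```

Even these toy numbers show why the search space explodes factorially with depth, which is why the authors resort to Neural Architecture Search rather than manual design.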
Key takeaways
Hmm, the most borrowable idea here is the model's "degrees of freedom".
The architecture in this paper has very high degrees of freedom, but that also makes the search very expensive.
The paper does not report how long the search and training take, which suggests it is not short.
Still, this approach is what NAS ideally ought to look like.