Reads more like an application of the 2017 Encoding Network; the experiments section is strong.
18,17
1.FCN framework
global receptive field: stacked conv layers (with non-linearities) + downsampling
problem: downsampling loses spatial resolution; two common remedies:
- encoder side: dilated convolution
  - pro: enlarges the receptive field without further downsampling
  - con: isolates pixels from the global scene context, so some pixels get misclassified
- decoder side: upsampling (e.g. DeepLabv3+)
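To make the resolution trade-off concrete, here is a small 1-D receptive-field calculator. It is only a sketch: the two layer stacks are made-up examples (not taken from any paper), and it computes the theoretical receptive field only, so it says nothing about the "isolated pixels" drawback of dilation.

```python
from typing import List, Tuple

# (kernel_size, stride, dilation) for one conv/pool layer
Layer = Tuple[int, int, int]

def receptive_field(layers: List[Layer]) -> Tuple[int, int]:
    """Return (receptive field in input pixels, total output stride) of a 1-D stack."""
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent output units
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1   # dilation spreads the kernel taps apart
        rf += (k_eff - 1) * jump  # each layer widens the field by (k_eff - 1) input steps
        jump *= s                 # striding makes the output grid coarser
    return rf, jump

# Path A: grow the field by downsampling (stride 2), as in a plain FCN backbone
downsample = [(3, 2, 1), (3, 2, 1), (3, 1, 1)]
# Path B: keep stride 1 and grow the dilation instead
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(downsample))  # (15, 4): 15-pixel field, output 4x coarser
print(receptive_field(dilated))     # (15, 1): same field at full resolution
```

Both stacks see 15 input pixels, but only the dilated one keeps a stride-1 output, which is why dilation is the encoder-side fix for resolution loss.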
problem: objects appear at multiple scales
remedy: a multi-resolution pyramid-based representation, e.g. the SPP module
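A minimal NumPy sketch of a pyramid-pooling representation: the feature map is average-pooled onto several grids, each pooled map is resized back, and all levels are concatenated with the input along the channel axis. Assumptions: the bin sizes (1, 2, 3, 6) follow the grid popularized by PSPNet, nearest-neighbour resizing stands in for bilinear, and the 1x1 convolutions a real module would apply to each level are omitted.

```python
import numpy as np

def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def adaptive_avg_pool(x: np.ndarray, bins: int) -> np.ndarray:
    """Average-pool a (C, H, W) map onto a bins x bins grid (floor start, ceil end)."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        h0, h1 = (i * h) // bins, _ceil_div((i + 1) * h, bins)
        for j in range(bins):
            w0, w1 = (j * w) // bins, _ceil_div((j + 1) * w, bins)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def upsample_nearest(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, bh, bw) map to (C, h, w)."""
    _, bh, bw = x.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return x[:, rows[:, None], cols[None, :]]

def pyramid_pool(x: np.ndarray, bins=(1, 2, 3, 6)) -> np.ndarray:
    """Concatenate x with its pooled-then-upsampled versions along the channel axis."""
    _, h, w = x.shape
    levels = [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bins]
    return np.concatenate(levels, axis=0)

feat = np.random.default_rng(0).standard_normal((4, 12, 12))
print(pyramid_pool(feat).shape)  # (20, 12, 12): 4 original + 4 channels per level
```

The 1x1 level is a pure global-context channel (every pixel gets the image-wide mean), while finer grids keep coarse spatial layout.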
Q: Is capturing contextual information the same as increasing the receptive field size?
2.Architecture
Featuremap Attention
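The paper's core contribution: the dense feature map is passed through an Encoding Layer to obtain a context embedding, and an FC layer then predicts channel-wise scaling factors that reweight the feature map, emphasizing or de-emphasizing class-dependent channels (a distinctive form of attention).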
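A NumPy sketch of this reweighting step, assuming the FC layer is followed by a sigmoid so each factor lies in (0, 1). The parameters `w` and `b` are hypothetical (random here, learned in a real model), and the embedding `e` simply stands in for the Encoding Layer's output.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def featuremap_attention(x, e, w, b):
    """Reweight the channels of x using a context embedding e.

    x: (C, H, W) dense feature map
    e: (D,) context embedding (assumed to come from the Encoding Layer)
    w: (C, D), b: (C,) FC-layer parameters (hypothetical)
    """
    gamma = sigmoid(w @ e + b)              # one scaling factor in (0, 1) per channel
    return x * gamma[:, None, None], gamma  # broadcast each factor over its H x W map

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
e = rng.standard_normal(16)
w = rng.standard_normal((8, 16)) * 0.1
b = np.zeros(8)
y, gamma = featuremap_attention(x, e, w, b)
print(y.shape, gamma.round(2))
```

Unlike spatial attention, every pixel of a channel gets the same factor, so the global context decides *which* feature maps matter, not *where*.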
Semantic Encoding loss
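Essentially a multi-label classification loss (predict which classes are present in the image); adding a classification branch to a segmentation network can improve results.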
e.g. Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation [IJCAI 2017]
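A sketch of such a multi-label loss, assuming the targets are built from the ground-truth mask (class c is "present" if any pixel has label c) and the loss is binary cross-entropy on per-class logits. The logits would come from an FC layer on the encoded feature; the function names and `ignore_index=255` (a common "void" label convention) are assumptions for illustration.

```python
import numpy as np

def presence_targets(mask: np.ndarray, num_classes: int, ignore_index: int = 255) -> np.ndarray:
    """Multi-hot target: entry c is 1 if class c occurs anywhere in the ground-truth mask."""
    labels = np.unique(mask)
    labels = labels[labels != ignore_index]  # drop "void" pixels (assumed label 255)
    t = np.zeros(num_classes)
    t[labels] = 1.0
    return t

def se_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy between sigmoid(logits) and the multi-hot targets.

    Uses the numerically stable form max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    z = logits
    return float(np.mean(np.maximum(z, 0) - z * targets + np.log1p(np.exp(-np.abs(z)))))

mask = np.array([[0, 0, 2], [2, 255, 0]])
t = presence_targets(mask, num_classes=4)
print(t)                        # [1. 0. 1. 0.]
print(se_loss(np.zeros(4), t))  # log(2), about 0.693, when the branch has no opinion
```

Because the target ignores object size, small objects count as much as large ones in this loss, which is one plausible reason the extra branch helps.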
Encoding Layer
The foundation of this paper.
Comparison
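This method vs. traditional ones: earlier pipelines used bag-of-words or Fisher vectors, with the dictionary usually obtained offline by clustering or a GMM; the Encoding Layer instead learns its dictionary end-to-end together with the network.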
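A NumPy sketch of the Encoding Layer's forward pass as I understand it from the 2017 Encoding Network: each descriptor x_i is soft-assigned to K codewords d_k with weights softmax_k(-s_k ||x_i - d_k||^2), and the residuals are aggregated per codeword. Names and shapes are illustrative; in a real model `codewords` and `smoothing` are trained by backprop, and any normalization or further aggregation of the e_k into a single context embedding is left out.

```python
import numpy as np

def encoding_layer(x, codewords, smoothing):
    """x: (N, D) descriptors (e.g. an H*W feature map flattened to N = H*W rows);
    codewords: (K, D) learnable dictionary d_k; smoothing: (K,) learnable factors s_k.
    Returns (K, D): e_k = sum_i a_ik (x_i - d_k)."""
    residuals = x[:, None, :] - codewords[None, :, :]  # r_ik = x_i - d_k, shape (N, K, D)
    dist2 = np.sum(residuals ** 2, axis=2)             # ||r_ik||^2, shape (N, K)
    logits = -smoothing[None, :] * dist2               # -s_k ||r_ik||^2
    logits -= logits.max(axis=1, keepdims=True)        # stabilize the softmax
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)        # soft-assignment a_ik, rows sum to 1
    return np.einsum("nk,nkd->kd", assign, residuals)  # aggregate residuals per codeword

rng = np.random.default_rng(0)
feat = rng.standard_normal((6, 8, 8))    # (D, H, W) feature map
x = feat.reshape(6, -1).T                # N = 64 descriptors of dimension D = 6
codewords = rng.standard_normal((4, 6))  # K = 4 codewords
smoothing = np.ones(4)
E = encoding_layer(x, codewords, smoothing)
print(E.shape)  # (4, 6): one aggregated residual per codeword
```

If all s_k grow very large the soft assignment approaches a hard nearest-codeword assignment and the output approaches VLAD-style residual sums, which is one way to see the link to the classical dictionary encoders; the difference is that here the dictionary is a trainable parameter rather than a fixed clustering result.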
Steps