[Paper Notes] Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization

This paper proposes a new approach to visual localization that uses fine-grained semantic segmentation to improve localization accuracy and robustness, especially under seasonal and viewpoint changes. Self-supervised training is used to reduce the semantic annotation effort, and experiments demonstrate the method's effectiveness under varying conditions.

A semantic segmentation network for visual localization, ICCV 2019

Motivation of the work:

using more segmentation labels to create more discriminative, yet still robust, representations for semantic visual localization

This paper targets long-term visual localization and proposes a Fine-Grained Segmentation Network (FGSN) that partitions scenes at a finer granularity than existing semantic segmentation networks, producing a much richer set of segmentation labels.
At the same time, the network is robust to appearance changes such as seasonal variation: it outputs consistent labels for the same scene across different seasons. Experiments show that this improves visual localization performance.
In addition, to reduce the annotation workload, the authors train the network in a self-supervised manner.

Training data with a rich set of segmentation labels is generated by k-means clustering, and a dataset with 2D-2D correspondences is used so that the network outputs consistent labels for the same scene under different conditions; a sketch of this pipeline follows below.
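A minimal sketch of this two-part training signal, assuming a PyTorch backbone and scikit-learn's k-means; the helper names (`make_pseudo_labels`, `fgsn_loss`) and tensor layouts are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import MiniBatchKMeans

K = 100  # number of fine-grained clusters; the paper trains variants with different K

# Step 1: pseudo-labels from clustering. `feats` is assumed to be an
# (N, D) array of per-pixel descriptors sampled from a pretrained
# backbone over the training images.
def make_pseudo_labels(feats):
    km = MiniBatchKMeans(n_clusters=K).fit(feats)
    return km  # km.predict(descriptors) maps each pixel to a cluster index

# Step 2: loss combining pseudo-label supervision with a 2D-2D
# correspondence term. `logits_a`, `logits_b` are (K, H, W) outputs for
# two images of the same scene; `pts_a`, `pts_b` are (M, 2) matched
# pixel coordinates (x, y), e.g. from structure-from-motion.
def fgsn_loss(logits_a, labels_a, logits_b, pts_a, pts_b):
    # cross-entropy against the k-means pseudo-labels of image a
    ce = F.cross_entropy(logits_a.unsqueeze(0), labels_a.unsqueeze(0))
    # consistency: matched pixels in image b should take the labels
    # predicted at their correspondences in image a
    pa = logits_a[:, pts_a[:, 1], pts_a[:, 0]]   # (K, M)
    pb = logits_b[:, pts_b[:, 1], pts_b[:, 0]]   # (K, M)
    corr = F.cross_entropy(pb.t(), pa.argmax(dim=0))
    return ce + corr
```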

For visual localization, semantic meaning is not actually required: consistent, stable segmentation labels are enough. The aim of this paper is only to output finer-grained segmentation labels for localization, not to recover semantics. Under the paper's training scheme, the label information of the training data comes from k-means, which has no ability to extract semantics. The paper therefore examines the correlation between the network's output class IDs (cluster indices, horizontal axis) and the dataset's ground-truth semantic labels (vertical axis).
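Such a correlation can be visualized as a normalized co-occurrence matrix between cluster indices and ground-truth classes. A short sketch under assumed inputs (`pred`: per-pixel cluster indices, `gt`: ground-truth label map; both hypothetical names):

```python
import numpy as np

def cluster_class_cooccurrence(pred, gt, n_clusters, n_classes):
    """Count how often cluster index i coincides with GT class j,
    then normalize each cluster's column into a distribution."""
    m = np.zeros((n_classes, n_clusters), dtype=np.float64)
    np.add.at(m, (gt.ravel(), pred.ravel()), 1.0)
    return m / np.maximum(m.sum(axis=0, keepdims=True), 1.0)
```

A column that concentrates its mass on one row indicates a cluster that consistently falls inside a single semantic class, even though k-means was never given semantic labels.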
