[SRR-FSD] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection (CVPR 2021)

Ah丶Weii

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/weixin_43823854/article/details/119410816

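This post covers few-shot object detection (FSOD). It examines the limitations of relying on visual information alone and discusses how semantic relation reasoning can improve performance. The key observation is that the semantic relations between base classes and novel classes stay the same even when data is scarce. SRR-FSD is the first method to introduce semantic relation reasoning into FSOD, improving both the stability and the performance of the model, especially when data is extremely limited. The method builds on Faster R-CNN and models the relations between classes with a dynamic relation graph, reducing the domain gap between vision and language.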

1. Motivation

Why few-shot learning matters in the first place:

In other words, we are unable to alleviate the situation of scarce cases by simply spending more money on annotation even if big data is accessible.
Therefore, the study of few-shot learning is an imperative and long-lasting task.
The long-tail distribution is also a real-world problem: data for many classes is scarce (data scarcity).
Long-tail distribution is an inherent characteristic of the real world.
Its performance is largely affected by the data scarcity of novel classes.

FSOD (Few-Shot Object Detection) involves both base classes and novel classes: base classes come with plenty of bounding-box annotations, while novel classes have only a few annotated objects.

In FSOD, there are base classes in which sufficient objects are annotated with bounding boxes and novel classes in which very few labeled objects are available.

The goal of FSOD is to learn from the limited data of the novel classes with the help of the base classes, so that all novel objects can be detected at test time.

The few-shot detectors are expected to learn from limited data in novel classes with the aid of abundant data in base classes and to be able to detect all novel objects in a held-out testing set.
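To make the base/novel split concrete, here is a minimal sketch of how a k-shot training set could be assembled: every base-class object is kept, and at most k annotated objects per novel class are sampled. The tuple format `(image_id, class_name, bbox)` and the function name `build_few_shot_split` are hypothetical. The sketch samples individual objects rather than whole images, which is simpler than the benchmark protocols actually used for FSOD.

```python
import random
from collections import defaultdict


def build_few_shot_split(annotations, novel_classes, k, seed=0):
    """Keep all base-class objects and at most k objects per novel class.

    annotations: list of (image_id, class_name, bbox) tuples (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    base, novel_by_cls = [], defaultdict(list)
    for ann in annotations:
        cls = ann[1]
        if cls in novel_classes:
            novel_by_cls[cls].append(ann)
        else:
            base.append(ann)  # base classes keep all their annotations
    novel = []
    for cls in sorted(novel_by_cls):
        objs = novel_by_cls[cls]
        # A class with fewer than k objects contributes everything it has.
        novel.extend(rng.sample(objs, min(k, len(objs))))
    return base, novel


anns = (
    [(f"img{i}", "person", (0, 0, 10, 10)) for i in range(5)]
    + [(f"img{i}", "bird", (0, 0, 5, 5)) for i in range(5)]
    + [("img9", "cow", (0, 0, 3, 3))]
)
base, novel = build_few_shot_split(anns, {"bird", "cow"}, k=2)
print(len(base), len(novel))  # → 5 3
```

Here `person` plays the role of a base class, while `bird` and `cow` are treated as novel classes with k = 2 explicit shots each (`cow` only has one object available).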

Most current methods borrow ideas from meta-learning and metric learning and then apply them to conventional fully supervised detectors.

To achieve this, most recent few-shot detection methods adopt the ideas from meta-learning and metric learning for few-shot recognition and apply them to conventional detection frameworks, e.g. Faster R-CNN [35], YOLO [34].

The paper defines explicit shots and implicit shots as follows:

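Explicit shots are the k labeled shots of each novel class in FSOD, while implicit shots are novel-class examples contained in the large-scale dataset used to pretrain the backbone.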

The explicit shots refer to the available labeled objects from the novel classes.
In terms of implicit shots, initializing the backbone network with a model pretrained on a large-scale image classification dataset is a common practice for training an object detector.

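The problem with implicit shots is that the classes in the large-scale pretraining dataset may overlap with the novel classes. As Figure 1 shows, once these implicit shots are removed, most methods suffer a performance drop at the same number of explicit shots compared with keeping them.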

[Figure 1]
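The more realistic setting proposed in the paper removes implicit shots from the pretraining data. Below is a minimal sketch of that filtering step, assuming pretraining samples are `(image_id, class_name)` pairs and that overlaps are found by exact class-name matching plus a hand-written synonym map. Both `remove_implicit_shots` and the `synonyms` mapping are hypothetical; a real class hierarchy (for example the WordNet synsets behind ImageNet classes) would need more careful matching.

```python
def remove_implicit_shots(pretrain_samples, novel_classes, synonyms):
    """Drop pretraining samples whose class overlaps any novel class.

    pretrain_samples: list of (image_id, class_name) pairs (hypothetical format).
    synonyms: dict mapping a novel class to extra pretraining class names that
        should also count as that class (hand-written, hypothetical).
    """
    blocked = set(novel_classes)
    for cls in novel_classes:
        blocked.update(synonyms.get(cls, ()))
    # Keep only samples whose label is unrelated to every novel class.
    return [s for s in pretrain_samples if s[1] not in blocked]


samples = [("a", "goldfinch"), ("b", "tabby cat"), ("c", "school bus"), ("d", "bus")]
kept = remove_implicit_shots(
    samples, ["bird", "bus"], {"bird": ["goldfinch"], "bus": ["school bus"]}
)
print(kept)  # → [('b', 'tabby cat')]
```

Exact name matching alone would miss `goldfinch` and `school bus`, which is why the synonym map is needed: a pretraining class can be a fine-grained subclass of a novel class under a different name.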

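The authors attribute this problem to exclusive reliance on visual information: as training data shrinks, the available visual information becomes increasingly limited.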

We believe the reason for shot sensitivity is due to exclusive dependence on the visual information.
As a result, visual information becomes limited as image data becomes scarce.

However, the authors point out that one thing stays constant: the semantic relations between the base classes and the novel classes.
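Because these semantic relations do not depend on how many images are available, they can be injected through class word embeddings. The snippet below is a minimal, untrained sketch of that idea under stated assumptions: a visual feature is projected into a semantic space by a matrix `proj`, each class is scored by the dot product with its word embedding, and the embeddings are first mixed by one self-attention-style pass (`relation_augment`) as a crude stand-in for a relation graph. All names, the toy 2-D embeddings, and the fixed identity projection are hypothetical; the actual SRR-FSD model learns its projection and relation weights, and its exact formulation is not reproduced here.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relation_augment(word_emb):
    """Mix class embeddings with one untrained self-attention-style pass.

    Each class embedding is added to a similarity-weighted average of all class
    embeddings, so related base and novel classes share information.
    Hypothetical simplification of a relation graph: no learned weights.
    """
    dim = len(word_emb[0])
    out = []
    for wi in word_emb:
        weights = softmax([dot(wi, wj) for wj in word_emb])
        mixed = [sum(w * wj[d] for w, wj in zip(weights, word_emb)) for d in range(dim)]
        out.append([a + b for a, b in zip(wi, mixed)])
    return out


def classify(visual_feat, proj, class_emb):
    """Project a visual feature into semantic space and score every class."""
    z = [dot(row, visual_feat) for row in proj]  # proj has shape d_sem x d_vis
    return softmax([dot(z, e) for e in class_emb])


# Toy 2-D "word embeddings": classes 0 and 1 are semantically close, class 2 is not.
word_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
proj = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, purely for illustration
probs = classify([1.0, 0.0], proj, relation_augment(word_emb))
print([round(p, 3) for p in probs])
```

In this toy run class 0 receives the highest probability, with class 1 close behind because its embedding is similar. The point of scoring in semantic space is that a novel class with few images still has a word embedding placed near related base classes.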

2. Contribution

To our knowledge, our work is the first to investigate semantic relation reasoning for the few-shot detection task and show its potential to improve a strong baseline.
Our SRR-FSD achieves stable performance w.r.t the shot variation, outperforming state-of-the-art FSOD methods under several existing settings especially when the novel class data is extremely limited.
We suggest a more realistic FSOD setting in which implicit shots of novel classes are removed from the classification dataset for the pretrained model, and show that our SRR-FSD can maintain a more steady performance compared to previous methods if using the new pretrained model.