Overview of Towards Open World Object Detection (the OWOD paper)

First published 2025-06-04 18:29:12; last modified 2025-07-15 17:05:23

License: CC 4.0 BY-SA

Tags: object detection, artificial intelligence, computer vision

Paper: https://arxiv.org/abs/2103.02603
Code: https://github.com/JosephKJ/OWOD

Towards Open World Object Detection

Abstract

Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them when the corresponding knowledge eventually becomes available. This motivates us to propose a novel computer vision problem called ‘Open World Object Detection’, where a model is tasked to: 1) identify objects that have not been introduced to it as ‘unknown’, without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol and provide a novel solution, which we call ORE: Open World Object Detector, based on contrastive clustering and energy based unknown identification. Our experimental evaluation and ablation studies analyse the efficacy of ORE in achieving Open World objectives. As an interesting by-product, we find that identifying and characterising unknown instances helps to reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance with no extra methodological effort. We hope that our work will attract further research into this newly identified, yet crucial research direction.¹

1. Introduction

Deep learning has accelerated progress in Object Detection research ², ³, ⁴, ⁵, ⁶, where a model is tasked to identify and localise objects in an image. All existing approaches work under a strong assumption that all the classes that are to be detected are available during the training phase. Two challenging scenarios arise when we relax this assumption: 1) A test image might contain objects from unknown classes, which should be classified as unknown. 2) As and when information (labels) about such identified unknowns becomes available, the model should be able to incrementally learn the new class. Research in developmental psychology ⁷, ⁸ finds that the ability to identify what one does not know is key to sparking curiosity. Such curiosity fuels the desire to learn new things ⁹, ¹⁰. This motivates us to propose a new problem where a model should be able to identify instances of unknown objects as unknown and subsequently learn to recognise them when training data progressively arrives, in a unified way. We call this problem setting Open World Object Detection.
The number of classes that are annotated in standard vision datasets like Pascal VOC ¹¹ and MS-COCO ¹² is very low (20 and 80 respectively) when compared to the infinite number of classes that are present in the open world. Recognising an unknown as an unknown requires strong generalization. Scheirer et al. ¹³ formalise this as the Open Set classification problem. Since then, various methodologies (using 1-vs-rest SVMs and deep learning models) have been formulated to address this challenging setting. Bendale et al. ¹⁴ extend Open Set to an Open World classification setting by additionally updating the image classifier to recognise the identified new unknown classes. Interestingly, as seen in Fig. 1, Open World object detection is unexplored, owing to the difficulty of the problem setting.

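Figure 1: Open World Object Detection (F) is a new problem that has not yet been formally defined or addressed. Although it is related to Open Set and Open World classification, Open World Object Detection poses its own distinct challenges, and addressing them would make object detectors more practical.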

The advances in Open Set and Open World image classification cannot be trivially adapted to Open Set and Open World object detection, because of a fundamental difference in the problem setting: the object detector is trained to detect unknown objects as background. Instances of many unknown classes would have been already introduced to the object detector along with known objects. As they are not labelled, these unknown instances would be explicitly learned as background while training the detection model. Dhamija et al. ¹⁵ find that even with this extra training signal, state-of-the-art object detectors produce false positive detections, where the unknown objects end up being classified as one of the known classes, often with very high probability. Miller et al. ¹⁶ propose to use dropout sampling to get an estimate of the uncertainty of the object detection prediction. This is the only peer-reviewed research work in the open set object detection literature. Our proposed Open World Object Detection goes a step further to incrementally learn the new classes, once they are detected as unknown and an oracle provides labels for the objects of interest among all the unknowns. To the best of our knowledge, this has not been tried in the literature.
The Open World Object Detection setting is much more natural than the existing closed-world, static-learning setting. The world is diverse and dynamic in the number, type and configurations of novel classes. It would be naive to assume that all the classes to expect at inference are seen during training. Practical deployments of detection systems in robotics, self-driving cars, plant phenotyping, healthcare and surveillance are trained in-house and cannot assume complete knowledge of what classes to expect at inference time. The most natural and realistic behavior that one can expect from an object detection algorithm deployed in such settings would be to confidently predict an unknown object as unknown, and known objects into the corresponding classes. As and when more information about the identified unknown classes becomes available, the system should be able to incorporate them into its existing knowledge base. This would define a smart object detection system, and ours is an effort towards achieving this goal. The key contributions of our work are:

We introduce a novel problem setting, Open World Object Detection, which models the real world more closely.
We develop a novel methodology, called ORE, based on contrastive clustering, an unknown-aware proposal network and energy based unknown identification to address the challenges of open world detection.
We introduce a comprehensive experimental setting, which helps to measure the open world characteristics of an object detector, and benchmark ORE on it against competitive baseline methods.
As an interesting by-product, the proposed methodology achieves state-of-the-art performance on Incremental Object Detection, even though it was not primarily designed for it.
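
The energy based unknown identification named in the second contribution can be illustrated with a small sketch. It assumes the widely used free-energy score over classifier logits, E = -T * log(sum_i exp(g_i / T)), and a fixed decision threshold. The function names, the temperature and the threshold are hypothetical choices for illustration; this summary does not describe how ORE itself turns energies into a decision, so the code should not be read as the authors' procedure.

```python
import math


def free_energy(logits, temperature=1.0):
    """Free-energy score: -T * log(sum(exp(g / T))) over the class logits."""
    scaled = [g / temperature for g in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def label_with_unknown(logits, class_names, energy_threshold):
    """Return the arg-max class, or 'unknown' if the energy exceeds the threshold.

    The threshold is a hypothetical hyperparameter chosen for illustration.
    """
    if free_energy(logits) > energy_threshold:
        return "unknown"
    best = max(range(len(logits)), key=lambda i: logits[i])
    return class_names[best]


names = ["person", "car", "dog"]
confident = [9.0, 0.5, 0.2]  # one logit dominates, so the energy is low
diffuse = [0.3, 0.2, 0.1]    # no logit stands out, so the energy is higher

print(label_with_unknown(confident, names, energy_threshold=-3.0))
print(label_with_unknown(diffuse, names, energy_threshold=-3.0))
```

A single global threshold is the simplest possible decision rule; it is used here only to show how an energy value can separate confident known-class predictions from candidates for the unknown label.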

2. Related Work

Open Set Classification: The open set setting considers knowledge acquired through the training set to be incomplete, thus new unknown classes can be encountered during testing. Scheirer et al. ¹⁷ developed open set classifiers in a one-vs-rest setting to balance the performance and the risk of labeling a sample far from the known training examples (termed open space risk). Follow-up works ¹⁸, ¹⁹ extended the open set framework to a multi-class classifier setting with probabilistic models to account for the fading classifier confidences in the case of unknown classes.
Bendale and Boult ²⁰ identified unknowns in the feature space of deep networks and used a Weibull distribution to estimate the set risk (called the OpenMax classifier). A generative version of OpenMax was proposed in ²¹ by synthesizing novel class images. Liu et al. ²² considered a long-tailed recognition setting where majority, minority and unknown classes coexist. They developed a metric learning framework to identify unseen classes as unknown. In a similar spirit, several dedicated approaches target the detection of out-of-distribution samples ²³ or novelties ²⁴. Recently, self-supervised learning ²⁵ and unsupervised learning with reconstruction ²⁶ have been explored for open set recognition. However, while these works can recognize unknown instances, they cannot dynamically update themselves in an incremental fashion over multiple training episodes. Further, our energy based unknown detection approach has not been explored before.
Open World Classification: ¹⁴ first proposed the open world setting for image recognition. Instead of a static classifier trained on a fixed set of classes, they proposed a more flexible setting where knowns and unknowns coexist. The model can recognize both types of objects and adaptively improve itself when new labels for unknowns are provided. Their approach extends the Nearest Class Mean classifier to operate in an open world setting by re-calibrating the class probabilities to balance open space risk. ²⁷ studies open world face identity learning, while ²⁸ proposed to use an exemplar set of seen classes to match against a new sample, rejecting it in case of a low match with all previously known classes. However, they do not test on image classification benchmarks and instead study product classification in e-commerce applications.
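
The open world classification loop described above (classify knowns, reject unknowns, then absorb new labels) can be sketched with a toy Nearest Class Mean classifier. This is an illustration under simplifying assumptions: it replaces the probability re-calibration of ¹⁴ with a plain distance threshold, and the class name `OpenWorldNCM`, the threshold value and the 2-D feature vectors are all hypothetical.

```python
import math


class OpenWorldNCM:
    """Nearest Class Mean classifier with a distance-based reject option."""

    def __init__(self, reject_distance):
        self.reject_distance = reject_distance  # hypothetical threshold
        self.sums = {}    # class label -> running sum of feature vectors
        self.counts = {}  # class label -> number of samples seen

    def add_samples(self, label, features):
        """Incrementally add labelled samples; a new label creates a new class."""
        for vec in features:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(vec)
                self.counts[label] = 0
            self.sums[label] = [s + v for s, v in zip(self.sums[label], vec)]
            self.counts[label] += 1

    def class_mean(self, label):
        return [s / self.counts[label] for s in self.sums[label]]

    def predict(self, vec):
        """Return the nearest class, or 'unknown' if every mean is too far away."""
        best_label, best_dist = "unknown", float("inf")
        for label in self.sums:
            dist = math.dist(vec, self.class_mean(label))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label if best_dist <= self.reject_distance else "unknown"


ncm = OpenWorldNCM(reject_distance=1.5)
ncm.add_samples("cat", [[0.0, 0.0], [0.2, 0.1]])
ncm.add_samples("dog", [[4.0, 4.0], [4.2, 3.9]])

query = [9.0, 0.5]
before = ncm.predict(query)                        # far from both class means
ncm.add_samples("bird", [[9.1, 0.4], [8.9, 0.6]])  # labels arrive later
after = ncm.predict(query)
print(before, after)
```

Because each class is stored only as a running sum and a count, adding a class never touches the statistics of earlier classes, which is why mean-based classifiers are a natural starting point for incremental settings.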
Open Set Detection: Dhamija et al. ¹⁵ formally studied the impact of the open set setting on popular object detectors. They noticed that state-of-the-art object detectors often classify unknown classes as seen classes with high confidence. This is despite the fact that the detectors are explicitly trained with a background class ²⁹, ², ³⁰ and/or apply one-vs-rest classifiers to model each class ³¹, ⁵. A dedicated body of work ¹⁶, ³², ³³ focuses on developing measures of (spatial and semantic) uncertainty in object detectors to reject unknown classes. For example, ¹⁶, ³² use Monte Carlo Dropout ³⁴ sampling in an SSD detector to obtain uncertainty estimates. These methods, however, cannot incrementally adapt their knowledge in a dynamic world.
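
The idea behind Monte Carlo Dropout sampling can be shown on a toy model: keep dropout active at inference, run several stochastic forward passes, average the class probabilities, and use the entropy of the average as an uncertainty score. The sketch below substitutes a tiny linear classifier with dropout on its input features for the SSD detector used in the cited works; the weights, inputs and function names are hypothetical.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a linear classifier with dropout kept active."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob, rescale the rest.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, passes=200, drop_prob=0.5, seed=0):
    """Average class probabilities over stochastic passes and return them
    together with the entropy of the averaged distribution."""
    rng = random.Random(seed)
    mean_probs = [0.0] * len(weights)
    for _ in range(passes):
        probs = stochastic_forward(features, weights, drop_prob, rng)
        mean_probs = [m + p / passes for m, p in zip(mean_probs, probs)]
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy


# Two classes, four input features (toy weights).
weights = [[2.0, 2.0, 2.0, 2.0], [-2.0, -2.0, -2.0, -2.0]]
clear_input = [1.0, 1.0, 1.0, 1.0]          # every feature votes for class 0
ambiguous_input = [1.0, -1.0, 1.0, -1.0]    # features disagree with each other

clear_probs, clear_entropy = mc_dropout_predict(clear_input, weights)
ambig_probs, ambig_entropy = mc_dropout_predict(ambiguous_input, weights)
print(round(clear_entropy, 3), round(ambig_entropy, 3))
```

The ambiguous input flips between the two classes depending on which features survive dropout, so its averaged prediction has higher entropy than the clear input; a detector could reject detections whose uncertainty exceeds some chosen level.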

3. Open World Object Detection

Let us formalise the definition of Open World Object Detection in this section. At any time $t$, we consider the set of known object classes as $\mathcal{K}^t = \{1, 2, \dots, C\} \subset \mathcal{N}^+$, where $\mathcal{N}^+$ denotes the set of positive integers. In order to realistically model the dynamics of the real world, we also assume that there exists a set of unknown classes $\mathcal{U} = \{C + 1, \dots\}$, which may be encountered during inference. The known object classes $\mathcal{K}^t$ are assumed to be labeled in the dataset $D^t = \{X^t, Y^t\}$, where $X$ and $Y$ denote the input images and labels respectively. The input image set comprises $M$ training images, $X^t = \{I_1, \dots, I_M\}$, and the associated object labels for each image form the label set $Y^t = \{Y_1, \dots, Y_M\}$. Each $Y_i = \{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_K\}$ encodes a set of $K$ object instances with their class labels and locations, i.e., $\mathbf{y}_k = [l_k, x_k, y_k, w_k, h_k]$, where $l_k \in \mathcal{K}^t$ and $x_k, y_k, w_k, h_k$ denote the bounding box center coordinates, width and height respectively (the bold $\mathbf{y}_k$ is the whole instance tuple, distinct from the scalar coordinate $y_k$).
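
The notation above maps directly onto a small data structure. The following sketch is one possible encoding, not the format used by the OWOD code base: the type names, the `image_id` field and the helper functions are hypothetical, and only the tuple layout $[l_k, x_k, y_k, w_k, h_k]$ and the sets $\mathcal{K}^t$ and $\mathcal{U}$ come from the definition.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ObjectInstance:
    """One annotated object: [l_k, x_k, y_k, w_k, h_k]."""
    label: int  # l_k, a class index in the known set K^t
    x: float    # bounding-box centre, horizontal coordinate
    y: float    # bounding-box centre, vertical coordinate
    w: float    # bounding-box width
    h: float    # bounding-box height


@dataclass
class AnnotatedImage:
    image_id: str                    # stands in for the image I_i
    instances: List[ObjectInstance]  # the label set Y_i


def known_classes(num_known: int) -> Set[int]:
    """K^t = {1, ..., C}; every larger index belongs to the unknown set U."""
    return set(range(1, num_known + 1))


def is_unknown(class_index: int, num_known: int) -> bool:
    """True if the class index falls in U = {C + 1, ...}."""
    return class_index > num_known


def validate_dataset(dataset: List[AnnotatedImage], num_known: int) -> bool:
    """Check that every annotated label l_k lies in K^t, as the definition requires."""
    known = known_classes(num_known)
    return all(inst.label in known for img in dataset for inst in img.instances)


C = 20  # example number of known classes at time t
dataset_t = [
    AnnotatedImage("img_001", [ObjectInstance(3, 120.0, 80.0, 40.0, 60.0)]),
    AnnotatedImage("img_002", [ObjectInstance(20, 50.0, 50.0, 30.0, 30.0),
                               ObjectInstance(1, 200.0, 150.0, 80.0, 90.0)]),
]
print(validate_dataset(dataset_t, C), is_unknown(21, C))
```

Note that objects of unknown classes may well appear in the images; the definition only says that they carry no annotation in $D^t$, which is why the validation checks labels rather than image content.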
