- Blog (23)
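[Original] Paper: SOLO: Segmenting Objects by Locations
Abstract: We present a new, very simple approach to instance segmentation. Compared with many other dense prediction tasks (e.g., semantic segmentation), the arbitrary number of instances makes instance segmentation more challenging. To predict a mask for each instance, mainstream approaches either follow a "detect-then-segment" strategy (e.g., Mask R-CNN) or first predict embedding vectors and then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns a category to each pixel within an instance according to the instance's location and size, thus nicely converting instance segmentation into a problem solvable by single-shot classification. We demonstrate a simpler and more flexible instance segmentation framework with strong performance,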
2022-03-10 15:27:51
4415
1
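The "instance categories" idea in the SOLO abstract above ties each instance to a category derived from its location. The sketch below shows one way this can work, under an assumption that is not stated in the excerpt: the image is divided into an S×S grid, and an instance belongs to the grid cell that contains its center. The function name, parameters, and sample numbers are hypothetical, and the size-dependent part of the assignment is left out.

```python
def instance_category(center_x, center_y, image_w, image_h, grid_size):
    """Map an instance center to a (row, col) grid cell and a flat category index.

    Assumption: the image is split into grid_size x grid_size cells and the
    cell containing the instance center identifies the instance category.
    """
    # Scale the center into grid coordinates; clamp so a center that lies
    # exactly on the right or bottom border still falls in the last cell.
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    # Flatten (row, col) into one index so each cell is its own "category".
    return row, col, row * grid_size + col


# Hypothetical example: a 640x480 image, a 5x5 grid, an instance centered at (320, 100).
row, col, category = instance_category(320, 100, 640, 480, 5)
print(row, col, category)
```

With this assignment, predicting which instance a pixel belongs to becomes a classification over at most `grid_size * grid_size` location categories.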
[Original] Paper: Image Segmentation: YOLACT Real-time Instance Segmentation
Abstract: We present a simple, fully-convolutional model for real-time instance segmentation that achieves 29.8 mAP on MS COCO at 33.5 fps evaluated on a single Titan Xp, which is significantly faster than any previous competitive approach. Moreover, we obta
2022-03-07 16:34:52
3654
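[Original] A Survey of Single-Stage Instance Segmentation
This article gives a fairly comprehensive overview of progress in single-stage instance segmentation. It groups the methods into three categories (local-mask-based, global-mask-based, and segmentation by location), reviews 19 related papers, and describes their strengths and weaknesses. Instance segmentation is a challenging computer vision task that requires predicting object instances and their per-pixel segmentation masks, which makes it a hybrid of semantic segmentation and object detection. Since Mask R-CNN, the SOTA methods for instance segmentation have mainly been Mask RCNN and its variants (PANet, Mask Score RCNN, etc.). It adopts a detect-then-segment approach: object detection is performed first to extract a bounding box around each object instance, and then within each bounding box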
2022-03-04 15:19:31
1283
[Original] Paper: Language-Aware Fine-Grained Object Representation for Referring Expression Comprehension
Abstract: Referring expression comprehension expects to accurately locate an object described by a language expression, which requires precise language-aware visual object representations. However, existing methods usually use rectangular object repres
2022-02-27 11:01:10
546
1
[Original] Paper: YOLOX: Exceeding YOLO Series in 2021
Abstract: In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector: YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled h
2022-02-22 16:02:48
659
[Original] Paper: Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding
Abstract: An LBYL (‘Look Before You Leap’) Network is proposed for end-to-end trainable one-stage visual grounding. The idea behind LBYL-Net is intuitive and straightforward: we follow a language’s description to localize the target object based on its
2022-02-20 20:29:19
697
1
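[Original] End-to-End Semi-Supervised Object Detection with Soft Teacher
Abstract: This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous, more complex multi-stage methods. Over the course of the curriculum, end-to-end training gradually improves the quality of the pseudo labels, and the increasingly accurate pseudo labels in turn benefit object detection training. Within this framework, we also propose two simple yet effective techniques: a soft teacher mechanism, in which the classification loss of each unlabeled bounding box is weighted by the classification score produced by the teacher network; and a box jittering approach that selects reliable pseudo boxes for learning box regression. On the COCO benchmark, the proposed approach outperforms previous methods by a large margin under various labeling ratios, i.e., 1%, 5%, and 10%. Moreover, when the amount of labeled data is relatively large, our method also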
2022-02-17 13:58:01
1168
[Original] Paper: Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Abstract: Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence. Multimodal context of the sentence is crucial to distinguish the referent from the background. Existing methods either ins
2021-12-19 11:09:23
611
[Original] Paper: A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
Abstract: Referring expression comprehension aims to localize the object instance described by a natural language expression. Current referring expression methods have achieved good performance. However, none of them is able to achieve real-time infer
2021-12-17 00:46:26
528
[Original] Paper: Exploring Phrase Grounding without Training: Contextualisation and Extension to Text-Based Image
Abstract: Grounding phrases in images links the visual and the textual modalities and is useful for many image understanding and multimodal tasks. All known models heavily rely on annotated data and complex trainable systems to perform phrase grounding – exc
2021-12-14 00:45:03
727
[Original] Paper: Zero-Shot Grounding of Objects from Natural Language Queries
Abstract: A phrase grounding system localizes a particular object in an image referred to by a natural language query. In previous work, the phrases were restricted to have nouns that were encountered in training, we extend the task to Zero-Shot Grounding (ZS
2021-12-09 09:21:59
447
[Original] Paper: Real-Time Referring Expression Comprehension by Single-Stage Grounding Network
Abstract: In this paper, we propose a novel end-to-end model, namely Single-Stage Grounding network (SSG), to localize the referent given a referring expression within an image. Different from previous multi-stage models which rely on object proposals or de
2021-12-01 19:54:43
1133
[Original] Onestage Grounding
1. Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding (2018 CVPR). Paper: http://openaccess.thecvf.com/content_CVPR_2019/papers. Code: https://github.com/hassanhub/MultiGrounding.
2. Real-Time Referring Expression Comprehension by Single-Stage
2021-11-30 11:07:43
317
[Original] Paper: Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding (2018 CVPR)
Abstract: We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. This common space is instantiated at multiple layers of a Deep Convolutional Neural Network by exploiting its fe
2021-11-29 20:46:34
670
[Original] Paper: Improving One-stage Visual Grounding by Recursive Sub-query Construction
Abstract: We improve one-stage visual grounding by addressing current limitations on grounding long and complex queries. Existing one-stage methods encode the entire language query as a single sentence embedding vector, e.g., taking the embedding from BERT or
2021-11-22 20:42:30
720
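[Original] Paper: Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
Abstract: In this paper, we address the weakly supervised referring expression grounding task, which localizes the referent object in an image according to a query sentence when the mapping between image regions and queries is not available during training. Traditional methods first select the object region that best matches the referring expression and then reconstruct the query sentence from the selected region, using the reconstruction difference as the loss for back-propagation. However, existing methods ignore the fact that the correctness of the matching is unknown, so both matching and reconstruction are only approximate. To overcome this limitation, we design a discriminative triad as the basis of the solution, through which a query can be converted into one or more discriminative triads in a very scalable way. Based on the discriminative triad, we further propose the triad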
2021-11-17 00:26:03
527
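[Original] Paper: Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Abstract: The prevailing framework for referring expression grounding is a two-stage process: 1) detect proposals with an object detector, and 2) ground the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expression with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles that proposals play in the two stages: they generate proposals based solely on detection confidence (i.e., expression-agnostic), hoping that the proposals contain all the right instances in the expression (i.e., expression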
2021-11-09 20:06:22
954
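[Original] Paper: TransVG: End-to-End Visual Grounding with Transformers
Abstract: In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region of an image. State-of-the-art methods, whether two-stage or one-stage, rely on a complex module with manually designed mechanisms to perform query reasoning and multi-modal fusion. However, involving mechanisms such as query decomposition and image scene graphs in the fusion module design makes the models prone to overfitting datasets with specific scenarios and limits the full interaction between visual and linguistic context. To avoid this caveat, we propose to establish the multi-modal correspondence by leveraging Transformers, and empirically show that complex fusion modules (e.g., modular attention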
2021-11-02 10:34:47
3438
1
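[Original] Paper: Visual Grounding with Transformers
Abstract: In this paper, we propose a transformer-based approach for visual grounding. Unlike previous proposal-and-rank frameworks (which rely heavily on pretrained object detectors) or proposal-free frameworks (which upgrade an off-the-shelf one-stage detector by fusing textual embeddings), our approach is built on top of a transformer encoder-decoder and is independent of any pretrained detector or word embedding model. Termed VGTR (Visual Grounding with Transformers), our approach is designed to learn semantically discriminative visual features under the guidance of the textual description without harming their localization ability. This information flow gives our VGTR a strong capability in capturing the context-level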
2021-10-28 17:01:43
3414
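[Original] DETR: An End-to-End Object Detection Framework
Paper: https://arxiv.org/abs/2005.12872 Code: https://github.com/facebookresearch/detr DETR is the first object detection framework to successfully integrate the Transformer as a central building block of the detection pipeline. It performs end-to-end object detection based on Transformers, with no NMS post-processing step and truly no anchors, and its performance is on par with or better than Faster RCNN. DETR treats detection as a set prediction problem, which simplifies the overall detection pipeline. It has none of the anchor, label ass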
2021-10-25 19:53:58
989
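The DETR entry above describes detection as a set prediction problem. Training on a set requires deciding which prediction is compared with which ground-truth object. The sketch below illustrates that one-to-one matching step under stated assumptions: the cost is a simplified "negative class probability plus L1 box distance", and the search is brute force over permutations rather than the Hungarian algorithm normally used for this purpose. All names and sample values are hypothetical; this is not the official DETR implementation.

```python
from itertools import permutations


def match_cost(pred, target):
    """Simplified pairwise cost: lower means a better prediction/target pair."""
    class_cost = -pred["probs"][target["label"]]  # reward high probability of the true class
    box_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))  # L1 distance
    return class_cost + box_cost


def best_assignment(preds, targets):
    """Return a tuple a where a[j] is the index of the prediction matched to target j.

    Brute force over all one-to-one assignments; only practical for tiny sets.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best


# Hypothetical data: 3 predictions, 2 ground-truth objects, boxes as (cx, cy, w, h).
preds = [
    {"probs": [0.1, 0.9], "box": [0.5, 0.5, 0.2, 0.2]},
    {"probs": [0.8, 0.2], "box": [0.1, 0.1, 0.1, 0.1]},
    {"probs": [0.5, 0.5], "box": [0.9, 0.9, 0.1, 0.1]},
]
targets = [
    {"label": 0, "box": [0.1, 0.1, 0.1, 0.1]},
    {"label": 1, "box": [0.5, 0.5, 0.2, 0.2]},
]
assignment = best_assignment(preds, targets)
print(assignment)
```

Because each target is matched to exactly one prediction, duplicate predictions for the same object are penalized during training, which is consistent with the entry's point that no NMS post-processing step is needed.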
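[Original] From First Principles to Gradient Descent: How Gradient Descent Is Applied in Neural Networks
Building the basic block: the neuron. Before talking about neural networks, let us discuss neurons, the basic units of a neural network. A neuron takes inputs, performs some mathematical operations on them, and produces an output. Take a 2-input neuron as an example. In this neuron, the inputs go through three steps of mathematical operations. First, each input is multiplied by a weight: x1 → x1 × w1, x2 → x2 × w2. Next, the two results are added together, along with a bias: (x1 × w1) + (x2 × w2) + b. Finally, the sum is passed through an activation function to produce the output: y =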
2021-10-24 20:19:55
384
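The three steps of the 2-input neuron in the entry above (multiply by weights, add a bias, apply an activation function) can be sketched as follows. The excerpt is cut off before it names the activation function, so the sigmoid used here is an assumption, as are the function names and sample values.

```python
import math


def sigmoid(z):
    """Sigmoid activation (an assumed choice; the excerpt does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x1, x2, w1, w2, b, activation=sigmoid):
    """Forward pass of a 2-input neuron."""
    # Steps 1 and 2: weight each input, sum the results, and add the bias.
    weighted_sum = (x1 * w1) + (x2 * w2) + b
    # Step 3: pass the sum through the activation function.
    return activation(weighted_sum)


# Hypothetical values: the weighted sum is 2*0 + 3*1 + 4 = 7.
output = neuron(x1=2.0, x2=3.0, w1=0.0, w2=1.0, b=4.0)
print(round(output, 4))
```

Passing a different `activation` (for example, an identity function) leaves the first two steps unchanged and only alters the final mapping of the weighted sum to the output.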
[Original] Paper: MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
Abstract:
2021-10-24 11:16:25
824