Paper Notes: "YOLACT++: Better Real-time Instance Segmentation"

YOLACT++ is an improved version of YOLACT that raises instance segmentation performance at the cost of some speed. The main improvements are: 1) introducing deformable convolutions to strengthen the network's representational power; 2) designing a Fast Mask Re-Scoring Network to assess mask quality; 3) optimizing the prediction head by adjusting the anchor settings. On the COCO dataset, YOLACT++ reaches 34.1 mAP at 33.5 FPS.


Code: yolact

1. Overview

Overview: YOLACT++ is obtained by improving on YOLACT (for YOLACT itself, see my earlier post: link). Compared with the previous version, it delivers a large gain in instance segmentation performance; as the trade-off, speed drops slightly. The improvements are threefold: 1) deformable convolutions are introduced into the network, strengthening its representational power and yielding a better detector and better mask prototypes; 2) better anchor scales and aspect ratios are chosen for the targets; 3) a fast mask re-scoring branch, borrowed from Mask Scoring R-CNN, is added to the network to bring in a quality estimate for each mask. In the end, it reaches 34.1 mAP for instance segmentation on COCO at 33.5 FPS.

The figure below compares this method with some earlier algorithms:
[Figure: speed vs. accuracy comparison with prior instance segmentation methods]
PS: YOLACT's own implementation will not be covered again here; this post focuses on what differs in YOLACT++, i.e., the improvements the paper proposes.

2. Method Design

2.1 Fast Mask Re-Scoring Network

Taking inspiration from Mask Scoring R-CNN, the paper measures the quality of a segmentation mask with a metric better suited to mask evaluation than the classification confidence: the mask IoU computed against the ground-truth mask. The corresponding network structure (convolutions plus pooling) is shown below:
[Figure: Fast Mask Re-Scoring Network architecture (conv layers + pooling)]
This branch takes the cropped mask prediction (before thresholding) as input and outputs a mask IoU for every class. The final mask score is the product of the classification confidence and the predicted mask IoU.

Compared with Mask Scoring R-CNN, this method differs in the following respects:

  • 1) The input to mask scoring is different: this method uses only the cropped full-image mask prediction as input, whereas Mask Scoring R-CNN concatenates the pooled ROI features with the mask prediction; its corresponding structure is shown below:
    [Figure: MaskIoU head of Mask Scoring R-CNN]
    This change is made mainly for speed;
  • 2) The original FC layers are removed and replaced with global pooling, again to improve speed; see the sketch after this list.
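
To make the structure concrete, here is a minimal PyTorch sketch of such a branch. The channel widths, the depth, and the `forward` signature are my assumptions for illustration; only the overall shape follows the description above: a small FCN applied to the cropped, un-thresholded mask, global pooling in place of FC layers, and a final score equal to classification confidence times predicted mask IoU.

```python
import torch
import torch.nn as nn

class FastMaskReScoring(nn.Module):
    """Sketch of the re-scoring branch: a small FCN over the cropped,
    un-thresholded mask prediction, ending in global pooling (no FC)."""
    def __init__(self, num_classes=80):
        super().__init__()
        layers, in_ch = [], 1  # single-channel mask logits
        for out_ch in (16, 32, 64, 128, num_classes):  # hypothetical widths
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.fcn = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling replaces the FC layers

    def forward(self, mask_pred, class_conf, class_id):
        # mask_pred: (N, 1, H, W) cropped masks (before thresholding)
        # class_conf: (N,) classification confidences; class_id: (N,) labels
        iou_all = self.pool(self.fcn(mask_pred)).flatten(1)  # (N, num_classes)
        mask_iou = iou_all.gather(1, class_id.unsqueeze(1)).squeeze(1)
        return class_conf * mask_iou  # final mask score = confidence x mask IoU


# toy usage: 5 instances with 138x138 cropped mask predictions
scores = FastMaskReScoring()(
    torch.rand(5, 1, 138, 138), torch.rand(5), torch.randint(0, 80, (5,)))
```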

2.2 Deformable Convolution with Intervals

To increase the network's representational power, the paper introduces DCNv2 into the backbone; concretely, the standard 3×3 convolutions in stages $C_3$ through $C_5$ of the ResNet are replaced with deformable convolutions. To limit the speed cost, the replacement is applied at intervals (every few blocks) rather than in every block, hence the "with intervals" in the section title.
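
Below is a minimal sketch of this idea using torchvision's DeformConv2d. The `DCNv2` wrapper, the `deformify` helper, the interval of 3, and the mapping of $C_3$–$C_5$ onto torchvision's `layer2`–`layer4` are my assumptions for illustration, not the paper's exact implementation; in particular, the swapped-in layers here start from fresh weights.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2(nn.Module):
    """Modulated deformable 3x3 conv: offsets and modulation masks
    are predicted from the input by an ordinary 3x3 conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # 3x3 kernel = 9 sampling points -> 18 offset + 9 mask channels
        self.offset_mask = nn.Conv2d(in_ch, 27, 3, stride=stride, padding=1)
        nn.init.zeros_(self.offset_mask.weight)
        nn.init.zeros_(self.offset_mask.bias)  # zero offsets at init; modulation starts at 0.5
        self.conv = DeformConv2d(in_ch, out_ch, 3, stride=stride, padding=1)

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, :18], torch.sigmoid(om[:, 18:])
        return self.conv(x, offset, mask)


def deformify(resnet, stages=("layer2", "layer3", "layer4"), interval=3):
    """Swap the 3x3 conv of every `interval`-th bottleneck in C3-C5
    (torchvision's layer2-layer4) for a DCNv2 layer."""
    for name in stages:
        for i, block in enumerate(getattr(resnet, name)):
            if i % interval == 0:
                c = block.conv2  # the 3x3 conv inside a Bottleneck
                block.conv2 = DCNv2(c.in_channels, c.out_channels,
                                    stride=c.stride[0])
    return resnet


# toy usage on a standard torchvision backbone
from torchvision.models import resnet101
backbone = deformify(resnet101())
```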
