Stream Query Denoising for Vectorized HD Map Construction

This post covers a method built on StreamMapNet that borrows the denoising technique from DN-DETR to design a dynamic noising-and-denoising mechanism for local HD map perception. It walks through the noise generation process, including the use of the Chamfer distance to the previous frame's GT to determine matches and the choice of an adaptive matching threshold. Experiments on the nuScenes dataset show that the method improves perception stability, and its performance is compared against other algorithms.

Reference code: not open-sourced as of 2024.02

Motivation and Starting Point
This work builds on StreamMapNet. To improve temporal stability in the local map perception task, it borrows the denoising scheme from DN-DETR and proposes a noising scheme and denoising logic tailored to local map elements. Note that the DN (denoising) operation here works on the previous frame's GT rather than the previous frame's predictions: relative to GT, the previous frame's perception results carry greater uncertainty (prediction quality, map elements appearing and disappearing), so using GT keeps training stable. The noising mechanism does introduce several extra hyperparameters, but the paper provides suitable values for them via ablation studies.

Method Design
The overall structure of the method is shown in the figure below:
[Figure: overall architecture, adding a denoising branch on top of StreamMapNet]
The obvious difference from StreamMapNet is the added denoising branch. This branch must determine the correspondence between the map elements involved in denoising and the current frame (the Adaptive Temporal Matching block in the figure above), and then, given those correspondences, compute the noising strength from the distance between matched elements (the Dynamic Query Noising block in the figure above). Queries are then built from the noised elements and fed to the decoder for denoising.
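To make the pipeline concrete, here is a minimal pseudocode-style sketch of the training-time flow as I read it from the figure. Every function name in it (`transform_to_current_frame`, `adaptive_temporal_matching`, `dynamic_query_noising`, `build_denoise_queries`, the decoder and loss helpers) is a hypothetical placeholder, not the authors' API:

```python
# Hypothetical sketch of the SQD training-time flow; all helper names are placeholders.
def sqd_training_step(prev_gt, curr_gt, ego_motion, bev_feats, learnable_queries):
    # 1. Warp the previous frame's GT elements into the current frame (yields y_hat).
    prev_gt_warped = transform_to_current_frame(prev_gt, ego_motion)

    # 2. Adaptive Temporal Matching: pair warped prev-frame GT with current-frame GT
    #    by Chamfer distance, filtered with the adaptive threshold delta.
    matches = adaptive_temporal_matching(curr_gt, prev_gt_warped)

    # 3. Dynamic Query Noising: perturb the matched elements; the noise magnitude is
    #    scaled by R_decay so already-distant pairs receive gentler noise.
    noised = dynamic_query_noising(prev_gt_warped, matches)

    # 4. Encode the noised elements into denoising queries and decode them jointly
    #    with the ordinary learnable queries (with attention masks keeping the two
    #    groups separate, as in DN-DETR).
    dn_queries = build_denoise_queries(noised)
    preds, dn_preds = decode(bev_feats, learnable_queries, dn_queries)

    # 5. The denoising branch is supervised directly against the current GT (no
    #    Hungarian matching needed); the regular branch keeps its usual loss.
    return regular_loss(preds, curr_gt) + denoise_loss(dn_preds, curr_gt)
```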

Noise Generation
1. Determining the noise source and its corresponding GT
Why the previous frame's GT serves as the information source for the denoising step was explained above: per the paper, it is for training stability. How, then, is the previous frame's GT matched to the current frame's GT? With the Chamfer distance (CD): for each element of the current frame, it finds the best-matching index among the other frame's elements along with the corresponding distance value (naturally, before computing distances, the previous frame's elements must be transformed into the current frame via ego-motion, yielding $\hat{y}$):
$$D,\ idx=\min_{j\in[1,m]}\left(CD(y_i^t,\hat{y}_j^{t-1})\right)$$
With the matching between the two frames in hand, a matching threshold is determined for each element from its width and height (given by its bounding rectangle) and a hyperparameter $\alpha$ (a tolerance coefficient):
$$\delta=\alpha\,\frac{w+h}{2}$$
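A minimal numpy sketch of this matching step, under my reading of the two formulas above (each element is an (N, 2) array of ordered points; the symmetric form of the Chamfer distance, the choice of bounding box, and the value of `alpha` are illustrative assumptions, not the authors' code):

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a: (N, 2) and b: (M, 2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def adaptive_temporal_matching(curr_gt, prev_gt_warped, alpha=0.5):
    """For each current-frame element, find the closest warped prev-frame element
    and keep the pair only if its distance D is below the adaptive threshold."""
    matches = []
    for i, y_t in enumerate(curr_gt):
        dists = [chamfer_distance(y_t, y_prev) for y_prev in prev_gt_warped]
        j = int(np.argmin(dists))
        D = dists[j]
        w, h = y_t.max(axis=0) - y_t.min(axis=0)  # axis-aligned bounding box size
        delta = alpha * (w + h) / 2.0             # adaptive threshold
        if D < delta:
            matches.append((i, j, D, delta))
    return matches
```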
The impact of this adaptive matching threshold versus a fixed one on performance is shown in the table below:
[Table: adaptive vs. fixed matching threshold]

2. Adding noise
Here a map element is described as an ordered set of points enclosed by a bounding rectangle; adjusting the rectangle's center position and its width/height then moves the element's points accordingly, because each point's position relative to the rectangle is defined to stay fixed. This corresponds to the shift and scale operations on map elements in the figure below.
[Figure: shift and scale noising of a map element via its bounding rectangle]
How large should the shift and scale be? One idea is to derive it from the matching result: a decay factor $R_{decay}$ is applied on top of a base noise $\eta=\{\Delta x,\Delta y,\Delta w,\Delta h\}$:
$$R_{decay}=1-\frac{D}{\gamma\cdot\frac{\delta}{\alpha}}$$
where $\gamma$ controls the strength of the decay. With this factor, the noising process is described as:
$$B_{ins}=\{x,y,w,h\}+\{\Delta x,\Delta y,\Delta w,\Delta h\}\cdot R_{decay}$$
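A numpy sketch of the noising step under the assumptions above (points keep fixed positions relative to the bounding rectangle; the uniform sampling of the base noise $\eta$ and the `base_noise` range are my guesses, not the paper's exact scheme):

```python
import numpy as np

def noise_element(points, D, delta, alpha=0.5, gamma=2.0, base_noise=0.1):
    """Shift/scale a map element (ordered points, shape (N, 2)) via its bbox.
    D and delta come from the matching step; gamma controls how quickly the
    noise decays as the matched pair gets farther apart."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    x, y = (lo + hi) / 2.0                                   # bbox center
    w, h = hi - lo                                           # bbox width/height
    rel = (points - [x, y]) / [max(w, 1e-6), max(h, 1e-6)]   # fixed relative coords

    # R_decay = 1 - D / (gamma * delta / alpha); since delta = alpha*(w+h)/2,
    # this equals 1 - D / (gamma*(w+h)/2). Clamped at 0 as a guard (my addition).
    R_decay = max(0.0, 1.0 - D / (gamma * delta / alpha))

    # Base noise eta = {dx, dy, dw, dh}, sampled uniformly here (illustrative),
    # proportional to the element's size.
    dx, dy, dw, dh = np.random.uniform(-base_noise, base_noise, 4) * [w, h, w, h]

    # B_ins = {x, y, w, h} + eta * R_decay
    x, y = x + dx * R_decay, y + dy * R_decay
    w, h = w + dw * R_decay, h + dh * R_decay
    return rel * [w, h] + [x, y]                             # reconstruct noised points
```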
The choice of this hyperparameter and its effect on performance are shown in the table below:
[Table: effect of $\gamma$ on performance]

Noising query generation
For a point $p_i^m=\{x_i,y_i\}$ on a noised map element, its position is encoded as follows (here $PE(\cdot)$ denotes positional encoding and $MLP^{(pt)}$ is an MLP):
$$P_i^m=MLP^{(pt)}\left(Concat(PE(x_i^m),PE(y_i^m))\right)$$
The element's encoding is then the combination of the encodings of all the points on it:
$$Pos_q=MLP^{(pos)}\left(Concat(P_1^m,\dots,P_n^m)\right)$$
Adding a class encoding $C_q$ to this positional encoding yields the query for the noised map element:
$$Q_{denoise}=MLP^{(fuse)}\left(Concat(C_q,Pos_q)\right)$$
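A PyTorch sketch of this query construction. The embedding sizes, the sinusoidal form of $PE(\cdot)$ (the post's wording is ambiguous on this point), and the two-layer MLP shapes are all assumptions of mine:

```python
import torch
import torch.nn as nn

def sine_pe(x: torch.Tensor, dim: int = 128, temp: float = 10000.0) -> torch.Tensor:
    """Sinusoidal encoding of scalar coordinates: shape (...,) -> (..., dim)."""
    freqs = temp ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = x.unsqueeze(-1) / freqs
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

class DenoiseQueryBuilder(nn.Module):
    """Builds Q_denoise from the noised element points and the element's class."""
    def __init__(self, n_points=20, n_classes=3, d=256, pe_dim=128):
        super().__init__()
        self.pe_dim = pe_dim
        self.mlp_pt = nn.Sequential(nn.Linear(2 * pe_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.mlp_pos = nn.Sequential(nn.Linear(n_points * d, d), nn.ReLU(), nn.Linear(d, d))
        self.cls_emb = nn.Embedding(n_classes, d)  # class encoding C_q
        self.mlp_fuse = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, points: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """points: (B, n_points, 2) noised element points; labels: (B,) class ids."""
        pe = torch.cat([sine_pe(points[..., 0], self.pe_dim),
                        sine_pe(points[..., 1], self.pe_dim)], dim=-1)
        p = self.mlp_pt(pe)                 # P_i^m for every point: (B, n_points, d)
        pos_q = self.mlp_pos(p.flatten(1))  # Pos_q: (B, d)
        return self.mlp_fuse(torch.cat([self.cls_emb(labels), pos_q], dim=-1))
```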

The impact of the two components above on the baseline is shown in the table below:
[Table: ablation of Adaptive Temporal Matching and Dynamic Query Noising on the baseline]

Experimental Results
Performance comparison with other local map perception algorithms on nuScenes:
[Table: comparison with other local map perception methods on nuScenes]

