Dual-AEB: Combining Rule-Based Methods with Multimodal Large Language Models for Effective Emergency Braking (202410)

真诚的灰灰

License: CC 4.0 BY-SA

Tags: language models, artificial intelligence, natural language processing

Article link: https://blog.youkuaiyun.com/jch924583667/article/details/144291085

Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking

Abstract

Automatic Emergency Braking (AEB) systems are a crucial component in ensuring the safety of passengers in autonomous vehicles. Conventional AEB systems primarily rely on closed-set perception modules to recognize traffic conditions and assess collision risks. To enhance the adaptability of AEB systems in open scenarios, we propose Dual-AEB, a system that combines an advanced multimodal large language model (MLLM) for comprehensive scene understanding with a conventional rule-based rapid AEB to ensure quick response times. To the best of our knowledge, Dual-AEB is the first method to incorporate MLLMs within AEB systems. Through extensive experimentation, we have validated the effectiveness of our method. Code will be publicly available at https://github.com/ChipsICU/Dual-AEB.

I. INTRODUCTION

The Autonomous Emergency Braking (AEB) system is a critical safety feature in autonomous vehicles, designed to mitigate or prevent collisions by automatically activating the brakes when a potential collision is detected [1]. Numerous studies [1]–[5] have demonstrated the effectiveness of AEB systems, reporting reductions in rear-end collisions ranging from 25% to 50%.
Conventionally, AEB systems can be roughly categorized into two types: decision-making-only methods [6]–[15] and end-to-end methods [16], [17]. Decision-making-only methods take perception results for predefined categories (e.g., people, cars, bicycles) and apply rule-based techniques [10]–[12], [18] or deep reinforcement learning [13], [14] to make braking decisions. End-to-end methods [16], [17], meanwhile, process raw sensory data directly to inform AEB decisions, allowing the system to benefit from comprehensive sensory inputs. These methods generally ensure safety in most driving scenarios.

Two papers worth studying here:
"Emergency-braking distance prediction using deep learning", 2021
"Fully convolutional neural network for vehicle speed and emergency-brake prediction", 2024

However, their ability to handle complex driving situations is limited by a lack of comprehensive scene understanding. For example, Fig. 1 (a) shows a pedestrian positioned in the ego vehicle's blind spot, intending to cross at a green-light intersection. Typically, decision-making-only methods would not activate braking in this scenario because no pedestrian perception information is available, making it impossible to predict an impending collision. Similarly, while end-to-end methods process raw sensory data, they often lack the reasoning capacity to interpret indirect cues, such as the illuminated brake lights on the vehicle to the left of the ego vehicle, that may indicate a potential hazard ahead. In Fig. 1 (b), a truck carrying an advertisement that depicts a face is driving on the ego vehicle's left. Both decision-making-only and end-to-end methods may misinterpret the advertisement as a pedestrian, potentially triggering the AEB system and causing unnecessary braking. A truly effective AEB system should incorporate comprehensive scene understanding, enabling it to differentiate between real hazards and non-threatening elements, thereby ensuring appropriate braking responses.
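Fig. 1. Conventional AEB systems often fail in the following cases: (a) when a pedestrian must be detected early so that the vehicle can brake in advance and avoid danger, and (b) when erroneous perception triggers AEB unnecessarily. These scenarios are challenging for conventional AEB methods.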
To address these challenges, we propose the Dual-AEB system, which offers the following main advantages:
(1) Comprehensive Scene Understanding: The Dual-AEB system integrates advanced Multimodal Large Language Models (MLLMs) to achieve a deep understanding of the driving environment. By processing comprehensive data, including environmental conditions, critical perception information, and ego-vehicle states, MLLMs enhance overall situational awareness while reducing the risk of false positives and missed detections.
(2) Optimized Response Time: The Dual-AEB system leverages the strengths of both the conventional AEB module and the MLLM component. The conventional AEB ensures a quick initial response to imminent threats, while the MLLM component provides detailed analyses in complex scenarios. This synergistic approach minimizes response time and maximizes accuracy during critical moments.
(3) Flexible Modular Design: The Dual-AEB system's modular architecture facilitates seamless upgrades and component replacements as technology advances. This supports long-term efficacy and continuous improvement, preparing the system to meet future challenges.
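The division of labor in advantage (2) can be illustrated with a small sketch. It is not the authors' implementation: the function `dual_aeb_step`, the frame dictionary keys, the 1.5 s threshold, and the stub `fake_mllm` are all hypothetical, and the sequential "fast path first, then slow path" arbitration is an assumption made for illustration only. A real system would most likely run the MLLM at a lower rate than the rule-based check.

```python
from typing import Callable, Dict, Tuple

Frame = Dict[str, object]


def dual_aeb_step(
    frame: Frame,
    fast_rule_check: Callable[[Frame], bool],
    mllm_assess: Callable[[Frame], Tuple[bool, str]],
) -> Tuple[bool, str]:
    """Return (brake, reason) for one frame.

    Assumed policy: the cheap rule-based check runs first and can brake
    immediately; only if it stays silent is the slower scene-understanding
    path consulted.
    """
    if fast_rule_check(frame):
        return True, "rule-based AEB triggered"
    return mllm_assess(frame)


def ttc_rule(frame: Frame) -> bool:
    # Hypothetical fast path: brake when time to collision is under 1.5 s.
    return float(frame["ttc_s"]) < 1.5


def fake_mllm(frame: Frame) -> Tuple[bool, str]:
    # Stand-in for an MLLM call; a real model would reason over images/text.
    if frame.get("adjacent_brake_lights_on"):
        return True, "adjacent vehicle is braking; possible hidden hazard"
    return False, "no hazard described"


imminent = dual_aeb_step({"ttc_s": 1.0}, ttc_rule, fake_mllm)
indirect = dual_aeb_step(
    {"ttc_s": 5.0, "adjacent_brake_lights_on": True}, ttc_rule, fake_mllm
)
clear = dual_aeb_step({"ttc_s": 5.0}, ttc_rule, fake_mllm)
```

Passing the two checks in as callables mirrors the modular design in advantage (3): either component can be swapped without touching the arbitration logic. The sketch does not cover the false-positive case from Fig. 1 (b), where the slow path would need to veto the fast one.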
To summarize, our contributions are as follows:
• We present Dual-AEB, the first work that integrates MLLMs to enhance conventional AEB systems by leveraging their comprehensive scene understanding to improve braking decisions.
• Our method is validated through extensive experiments on both open-loop and closed-loop benchmarks, demonstrating its effectiveness.
• Qualitative analysis on our in-house real-world scenario dataset further confirms the practicality of deploying this system.

II. RELATED WORK

A. Autonomous Emergency Braking (AEB)

AEB systems are essential for vehicle safety: they autonomously detect risks and activate the brakes to mitigate or avoid collisions, significantly reducing traffic accident rates [1]–[5], [19]–[21]. Over time, AEB systems have evolved to use either decision-making-only or end-to-end methods, ensuring safety in general scenarios.
Decision-Making-Only Methods. These methods typically rely on a limited set of closed-set perception results, such as detected pedestrians, vehicles, and bicycles, to determine whether braking is necessary. The decisions are often based on metrics like Time To Collision (TTC) [10]–[12], [18] or are designed using control algorithms [22]–[25], and sometimes involve learning-based approaches [13]–[15]. While these methods are straightforward and computationally efficient, they suffer from significant limitations in complex, dynamic environments [6]–[9], [26], [27]. Relying on a predefined closed set of objects can result in the omission of vital environmental information, potentially leading to the failure of the AEB in critical situations.
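A minimal sketch of a TTC-based rule may make the closed-set limitation concrete. It is an illustration under stated assumptions, not a method from any of the cited works: the class list `KNOWN_CLASSES`, the `Detection` fields, and the 1.5 s threshold are hypothetical. TTC is computed here in its simplest form, the gap divided by the closing speed.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical closed set of classes the perception module can report.
KNOWN_CLASSES = {"pedestrian", "vehicle", "bicycle"}


@dataclass
class Detection:
    label: str                # class name reported by perception
    distance_m: float         # longitudinal gap to the ego vehicle, in metres
    closing_speed_mps: float  # positive when the gap is shrinking


def time_to_collision(det: Detection) -> float:
    """TTC = gap / closing speed; infinite when the gap is not shrinking."""
    if det.closing_speed_mps <= 0.0:
        return float("inf")
    return det.distance_m / det.closing_speed_mps


def should_brake(detections: Iterable[Detection], ttc_threshold_s: float = 1.5) -> bool:
    """Brake if any closed-set object has a TTC below the threshold."""
    for det in detections:
        if det.label not in KNOWN_CLASSES:
            continue  # objects outside the closed set are silently ignored
        if time_to_collision(det) < ttc_threshold_s:
            return True
    return False


near_vehicle = should_brake([Detection("vehicle", 10.0, 10.0)])      # TTC = 1.0 s
far_vehicle = should_brake([Detection("vehicle", 30.0, 10.0)])       # TTC = 3.0 s
unknown_object = should_brake([Detection("fallen_cargo", 5.0, 10.0)])  # TTC = 0.5 s
```

The third call is the failure mode the paragraph describes: the obstacle is closer than the one that triggers braking, but because its label is outside the predefined set, the rule never evaluates it.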
End-to-End Methods. End-to-end methods bypass traditional decision-making pipelines by directly using raw perception data for AEB decisions [14], [16], [17], [28]–[30]. They offer flexibility and can continuously improve with more data [31], enabling the detection and response to hazards that rule-based systems might miss. However, these approach