Parting with Misconceptions about Learning-based Vehicle Motion Planning (PDM-Closed)

License: CC 4.0 BY-SA


Parting with Misconceptions about Learning-based Vehicle Motion Planning


https://github.com/autonomousvision/tuplan_garage

Abstract

The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex real-world scenarios and the value of simple rule-based priors such as centerline selection through lane graph search algorithms. More surprisingly, for the open-loop sub-task, we observe that the best results are achieved when using only this centerline as scene context (i.e., ignoring all information regarding the map and other agents). Combining these insights, we propose an extremely simple and efficient planner which outperforms an extensive set of competitors, winning the nuPlan planning challenge 2023.

1 Introduction

Despite learning-based systems’ success in vehicle motion planning research [1, 2, 3, 4, 5], a lack of standardized large-scale datasets for benchmarking holds back their transfer from research to applications [6, 7, 8]. The recent release of the nuPlan dataset and simulator [9], a collection of 1300 hours of real-world vehicle motion data, has changed this, enabling the development of a new generation of learned motion planners, which promise reduced manual design effort and improved scalability. Equipped with this new benchmark, we perform the first rigorous empirical analysis on a large-scale, open-source, and data-driven simulator for vehicle motion planning, including a comprehensive set of state-of-the-art (SoTA) planners [10, 11, 12] using the official metrics. Our analysis yields several surprising findings:
Open- and closed-loop evaluation are misaligned. Most learned planners are trained through the supervised learning task of forecasting the ego vehicle’s future motion conditioned on a desired goal location. We refer to this setting as ego-forecasting [2, 3, 13, 14]. In nuPlan, planners can be evaluated in two ways: (1) in open-loop evaluation, which measures ego-forecasting accuracy using distance-based metrics or (2) in closed-loop evaluation, which assesses the actual driving performance in simulation with metrics such as progress or collision rates. Open-loop evaluation lacks dynamic feedback and can have little correlation with closed-loop driving, as previously shown on the simplistic CARLA simulator [15, 16]. Our primary contribution lies in uncovering a negative correlation between both evaluation schemes. Learned planners excel at ego-forecasting but struggle to make safe closed-loop plans, whereas rule-based planners exhibit the opposite trend.
Rule-based planning generalizes. We surprisingly find that an established rule-based planning baseline from over twenty years ago [17] surpasses all SoTA learning-based methods in terms of closed-loop evaluation metrics on our benchmark. This contradicts the prevalent motivating claim used in most research on learned planners that rule-based planning faces difficulties in generalization. This claim was previously only verified on simpler benchmarks [4, 10, 11]. As a result, most current work on learned planning only compares to other learned methods, ignoring rule-based baselines [3, 5, 18].
A centerline is all you need for ego-forecasting. We implement a naïve learned planning baseline which does not incorporate any input about other agents in the scene and merely extrapolates the ego state given a centerline representation of the desired route. This baseline sets the new SoTA for open-loop evaluation on our benchmark. It does not require intricate scene representations (e.g. lane graphs, vectorized maps, rasterized maps, tokenized objects), which have been the central subject of inquiry in previous work [10, 11, 12]. None of these prior studies considered a simple centerline-only representation as a baseline, perhaps due to its extraordinary simplicity.

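However, this also means that the ego vehicle can only accelerate and decelerate; it cannot perform lateral maneuvers such as lane changes or swerving around obstacles.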

Our contributions are as follows: (1) We demonstrate and analyze the misalignment between open- and closed-loop evaluation schemes in planning. (2) We propose a lightweight extension of IDM [17] with real-time capability that achieves state-of-the-art closed-loop performance. (3) We conduct experiments with an open-loop planner, which is only conditioned on the current dynamic state and a centerline, showing that it outperforms sophisticated models with complex input representations. (4) By combining both models into a hybrid planner, we establish a simple baseline that outperformed 24 other, often learning-based, competing approaches and claimed victory in the nuPlan challenge 2023.

2 Related Work

Rule-based planning. Rule-based planners offer a structured, interpretable decision-making framework [17, 19, 20, 21, 22, 23, 24, 25, 26]. They employ explicit rules to determine an autonomous vehicle’s behavior (e.g., brake when an object is straight ahead). A seminal approach in rule-based planning is the Intelligent Driver Model (IDM [17]), which is designed to follow a leading vehicle in traffic while maintaining a safe distance. There exist extensions of IDM [27] which focus on enabling lane changes on highways. However, this is not the goal of our work. Instead, we extend IDM by executing multiple policies with different hyperparameters, and scoring them to select the best option.
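To make the IDM idea concrete, the following is a minimal Python sketch of the standard IDM car-following acceleration, followed by a toy version of "run several policies with different hyperparameters and score them". It is not the authors' implementation: the parameter values, the function names `idm_acceleration`, `rollout`, and `select_policy`, the constant-speed leader, and the scoring rule (progress, rejected if the gap becomes too small) are all assumptions chosen for illustration. The `max(0, ...)` clamp on the desired gap is a common practical variant rather than part of the text above.

```python
import math


def idm_acceleration(v, gap, dv, v0=15.0, T=1.5, s0=2.0,
                     a_max=1.0, b=1.5, delta=4.0):
    """IDM acceleration for ego speed v (m/s), distance to the leader gap (m),
    and approach rate dv = v - v_leader (m/s). All defaults are illustrative."""
    # Desired gap: standstill distance + time headway + braking interaction term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term pulls v toward v0; interaction term brakes for the leader.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def rollout(v0, v_init=10.0, gap_init=30.0, v_leader=8.0, dt=0.1, horizon=8.0):
    """Forward-simulate one IDM policy behind a constant-speed leader.
    Returns (progress in metres, smallest gap observed)."""
    v, gap, progress, min_gap = v_init, gap_init, 0.0, gap_init
    for _ in range(int(horizon / dt)):
        a = idm_acceleration(v, max(gap, 0.1), v - v_leader, v0=v0)
        v = max(0.0, v + a * dt)          # no reversing
        progress += v * dt
        gap += (v_leader - v) * dt
        min_gap = min(min_gap, gap)
    return progress, min_gap


def select_policy(target_speeds, safe_gap=2.0):
    """Hypothetical scoring: maximise progress, reject unsafe rollouts."""
    def score(v0):
        progress, min_gap = rollout(v0)
        return progress if min_gap >= safe_gap else -1.0
    return max(target_speeds, key=score)


if __name__ == "__main__":
    print("selected target speed:", select_policy([5.0, 10.0, 15.0]))
```

The point of the sketch is only the structure: each candidate is a cheap, interpretable rule-based policy, and the selection step is a separate scoring function that can be changed without touching the policies.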
Prior work also combines rule-based decision-making with learned components, e.g., with learned agent forecasts [28], affordance indicators [23, 24], cost-based imitation learning [4, 29, 30, 31, 32], or learning-based planning with rule-based safety filtering [33]. These hybrid planners often forecast future environmental states, enabling informed and contingent driving decisions. This forecasting can either be agent-centric [34, 35, 36], where trajectories are determined for each actor, or environment-centric [4, 31, 30, 29, 37, 38], involving occupancy or cost maps. Additionally, forecasting can be conditioned on the ego-plan, modeling the ego vehicle’s influence on the scene’s future [39, 40, 41, 42]. We employ an agent-centric forecasting module that is considerably simpler than existing methods, allowing for its use as a starting point in the newly released nuPlan framework.
Ego-forecasting. Unlike predictive planning, ego-forecasting methods use observational data to directly determine the future trajectory. Ego-forecasting approaches include both end-to-end methods [43] that utilize LiDAR scans [44, 45], RGB images [46, 47, 48, 49, 14, 50] or both [13, 51, 5, 52], as well as modular methods involving lower-dimensional inputs like bird’s eye view (BEV) grids or state vectors [24, 53, 11, 54, 55, 56]. A concurrent study introduces a naive MLP inputting the current dynamic state, yielding competitive ego-forecasting results on the nuScenes dataset [57] with no scene context input [58]. Our findings complement these results, differing by evaluating long-term (8s) ego-forecasting in the challenging 2023 nuPlan challenge scenario test distribution [9]. We show that in this setting, completely removing scene context (as in [58]) is harmful, whereas a simple centerline representation of the context is sufficient for strong open-loop performance.

3 Ego-forecasting and Planning are Misaligned

In this section, we provide the relevant background regarding the data-driven simulator nuPlan [9]. We describe two baselines for a preliminary experiment to demonstrate that although ego-forecasting and planning are often considered related tasks, they are not well-aligned given their definitions on nuPlan. Improvements in one task can often lead to degradation in the other.

3.1 Background

nuPlan. The nuPlan simulator is the first publicly available real-world planning benchmark and enables rapid prototyping and testing of motion planners. nuPlan constructs a simulated environment as closely as possible to a real-world driving setting through data-driven simulation [59, 60, 61, 62, 63, 64, 65]. This method extracts road maps, traffic patterns, and object properties (positions, orientations, and speeds) from a pre-recorded dataset consisting of 1,300 hours of real-world driving. These elements are then used to initialize scenarios, which are 15-second simulations employed to assess open-loop and closed-loop driving performance. Hence, in simulation, our methods rely on access to detailed HD map information and ground-truth perception, i.e., no localization errors, map imperfections, or misdetections are considered. In open-loop simulation, the entire log is merely replayed (for both the ego vehicle and other actors). Conversely, in closed-loop simulation, the ego vehicle operates under the control of the planner being tested. There are two versions of closed-loop simulation: non-reactive, where all other actors are replayed along their original trajectory, and reactive, where other vehicles employ an IDM planner [17], which we detail in the following.
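The difference between the two simulation modes can be illustrated with a deliberately tiny 1D example. This is a sketch under strong assumptions, not nuPlan code: the state is a single position along the road, a "planner" is a hypothetical function returning the next-step displacement, and the names `open_loop_error` and `closed_loop_rollout` are made up for this illustration. In the open-loop function the state is reset to the log at every step, so only the one-step forecast is compared against the recording; in the closed-loop function the planner's own output moves the ego vehicle, so small errors accumulate.

```python
from typing import Callable, List

# Hypothetical planner interface: current position -> displacement for the next step.
Planner = Callable[[float], float]


def open_loop_error(planner: Planner, log: List[float]) -> float:
    """Replay the log; at each step compare the planner's forecast of the
    next position against the logged next position (mean absolute error)."""
    errors = [
        abs((log[t] + planner(log[t])) - log[t + 1])
        for t in range(len(log) - 1)
    ]
    return sum(errors) / len(errors)


def closed_loop_rollout(planner: Planner, start: float, steps: int) -> List[float]:
    """Let the planner drive: each planned displacement is applied to the
    simulated state, which then becomes the planner's next input."""
    states = [start]
    for _ in range(steps):
        states.append(states[-1] + planner(states[-1]))
    return states


if __name__ == "__main__":
    log = [float(i) for i in range(11)]      # expert advances 1 m per step
    slightly_fast = lambda position: 1.1     # planner always advances 1.1 m

    print("open-loop error per step:", open_loop_error(slightly_fast, log))
    final = closed_loop_rollout(slightly_fast, log[0], len(log) - 1)[-1]
    print("closed-loop drift after rollout:", final - log[-1])
```

In this toy setting the per-step open-loop error stays constant while the closed-loop deviation from the log grows with the number of steps, which is one simple way to see why a distance-based open-loop metric does not by itself predict closed-loop behavior.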
Metrics. nuPlan offers three official evaluation metrics: open-loop score (OLS), closed-loop score non-reactive (CLS-NR), and closed-loop score reactive (CLS-R). Although CLS-NR and CLS-R are computed identically, they differ in background traffic behavior. Each score is a weighted average of sub-scores that are multiplied by a set of penalties. In OLS