DiffusionDriveV2核心代码解析

转载于 2025-12-22 11:23:00 发布 · 10 阅读

CC 4.0 BY-SA版权

原文链接：https://mp.weixin.qq.com/s?__biz=Mzg2NzUxNTU1OA==&mid=2247688706&idx=1&sn=ddc844daaeb5353dec987a0973e8f3e0&chksm=cf147081cb974ca3ded9f579b66bba3e39e81f391a8a348d49575c0f067b596a7ab3074a0b72&scene=126&sessionid=0

作者 | wenbo9 编辑 | 自动驾驶之心

原文链接：https://zhuanlan.zhihu.com/p/1982760169126510973

点击下方卡片，关注“自动驾驶之心”公众号

戳我-> 领取自动驾驶近30个方向学习路线

>>自动驾驶前沿信息获取→自动驾驶之心知识星球

本文只做学术分享，如有侵权，联系删文

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

https://github.com/hustvl/DiffusionDrive

https://github.com/hustvl/DiffusionDriveV2

DiffusionDrive的整体架构

DiffusionDriveV2: Reinforcement Learning-Constrained Truncated DiffusionModeling in E2E AD

整体架构

环境编码（bev和自车状态）

★
参考文献：TransFuser代码
TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving

★
参考文献：GoalFlow代码
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

# bev_feature_upscale(bz,64,64,64), bev_feature(bz,512,8,8)
bev_feature_upscale, bev_feature, _ = self._backbone(camera_feature, lidar_feature)
bev_feature = self._bev_downscale(bev_feature).flatten(-2, -1).permute(0, 2, 1) # (bz,64,256)

#自车当前状态编码
status_encoding = self._status_encoding(status_feature) # (1,256)
keyval = torch.concatenate([bev_feature, status_encoding[:, None]], dim=1)  # (bz,64 + 1,256)
keyval += self._keyval_embedding.weight[None, ...]

query = self._query_embedding.weight[None, ...].repeat(batch_size, 1, 1)
# 参考detr目标检测，直接预测自车和多个障碍物
query_out = self._tf_decoder(query, keyval)
trajectory_query, agents_query = query_out.split(self._query_splits, dim=1)

Trajectory Planning Module

多尺度bev特征

concat_cross_bev = keyval[:,:-1].permute(0,2,1).contiguous().view(batch_size, -1, concat_cross_bev_shape[0], concat_cross_bev_shape[1])
# upsample to the same shape as bev_feature_upscale
concat_cross_bev = F.interpolate(concat_cross_bev, size=bev_spatial_shape, mode='bilinear', align_corners=False)

cross_bev_feature = bev_feature_upscale
cross_bev_feature = torch.cat([concat_cross_bev, cross_bev_feature], dim=1)
cross_bev_feature = self.bev_proj(cross_bev_feature.flatten(-2,-1).permute(0,2,1))
cross_bev_feature = cross_bev_feature.permute(0,2,1).contiguous().view(batch_size, -1, bev_spatial_shape[0], bev_spatial_shape[1])

planner anchor归一化+加噪+反归一化

获取anchors: 首先在训练数据集上对自车未来轨迹的真值进行K-Means聚类，得到一系列anchors ，每一个anchor 是一条轨迹线：。
在anchors上加入高斯噪声: 传统的扩散步骤是不断地在每一个都加入一些噪声，而在本文中是基于anchors的位置加入高斯噪声的。

其中表示第个step。

plan_anchor = np.load(plan_anchor_path) # [bs, 20,8,2]
self.plan_anchor = nn.Parameter(torch.tensor(plan_anchor, dtype=torch.float32), requires_grad=False,) 

plan_anchor = self.plan_anchor.unsqueeze(0).repeat(bs,1,1,1)
odo_info_fut = self.norm_odo(plan_anchor)
timesteps = torch.randint(0, 50, (bs,), device=device )
noise = torch.randn(odo_info_fut.shape, device=device)
noisy_traj_points = self.diffusion_scheduler.add_noise(original_samples=odo_info_fut, noise=noise, timesteps=timesteps,).float()
noisy_traj_points = torch.clamp(noisy_traj_points, min=-1, max=1)
noisy_traj_points = self.denorm_odo(noisy_traj_points)

投射带噪声的planner anchor轨迹点的pos embed为query

traj_pos_embed = gen_sineembed_for_position(noisy_traj_points,hidden_dim=64) # [bs, 20, 8, 64]
traj_pos_embed = traj_pos_embed.flatten(-2) # [bs, 20, 512]
traj_feature = self.plan_anchor_encoder(traj_pos_embed)
traj_feature = traj_feature.view(bs,noisy_traj_points.shape[1],-1)

对时间进行编码

time_embed = self.time_mlp(timesteps)
time_embed = time_embed.view(bs,1,-1)

轨迹预测（前向加噪）

轨迹和bev特征的cross attention

traj_feature = self.cross_bev_attention(traj_feature,noisy_traj_points,bev_feature,bev_spatial_shape)

attention_weights = self.attention_weights(queries)
attention_weights = attention_weights.view(bs, num_queries, num_points).softmax(-1)
attention_weights = attention_weights.unsqueeze(1)

value = self.value_proj(bev_feature)
grid = normalized_trajectory.view(bs, num_queries, num_points, 2)
# Sample features
sampled_features = torch.nn.functional.grid_sample(value, grid,  mode='bilinear', 
   padding_mode='zeros', align_corners=False) # bs, C, num_queries, num_points

out = (attention_weights * sampled_features).sum(dim=-1)
out = out.permute(0, 2, 1).contiguous()  # bs, num_queries, C
out = self.output_proj(out)

轨迹和agents_query特征的cross attention

self.cross_agent_attention(traj_feature, agents_query,agents_query)[0]

轨迹和ego_query特征的cross attention

ego_query=trajectory_query
self.cross_ego_attention(traj_feature, ego_query,ego_query)[0]

轨迹和时间t特征的尺度和偏移融合

# feedforward network
traj_feature = self.norm3(self.ffn(traj_feature))
traj_feature = self.time_modulation(traj_feature, time_embed,global_cond=None,global_img=global_img)

self.scale_shift_mlp = nn.Sequential(nn.Mish(), nn.Linear(condition_dims, embed_dims*2),)
scale_shift = self.scale_shift_mlp(global_feature)
scale,shift = scale_shift.chunk(2,dim=-1)
traj_feature = traj_feature * (1 + scale) + shift

轨迹分类分数和去噪轨迹预测

将扩散（加噪声）后得到的个带噪声的轨迹输入到diffusion decoder 中，预测分类分数和去噪后的轨迹，整个过程可以用公式表述为：

其中是conditional information，本文中用的是感知特征来进行控制。

poses_reg, poses_cls = self.task_decoder(traj_feature) #bs,20,8,3; bs,20

traj_feature = traj_feature.view(bs, ego_fut_mode,-1)
plan_cls = self.plan_cls_branch(traj_feature).squeeze(-1)
traj_delta = self.plan_reg_branch(traj_feature)
plan_reg = traj_delta.reshape(bs,ego_fut_mode, self.ego_fut_ts, 3)
# 最终轨迹 = 轨迹偏移 + 原始轨迹
poses_reg[...,:2] = poses_reg[...,:2] + noisy_traj_points
poses_reg[..., StateSE2Index.HEADING] = poses_reg[..., StateSE2Index.HEADING].tanh() * np.pi

Mode Selector

基本机制流程

轨迹生成（反向去燥）：根据plan_anchor，记录网络输出的所有中间去燥的结果，以及最终的去燥结果。

plan_anchor = self.plan_anchor.unsqueeze(0).unsqueeze(0).repeat(bs,num_groups, 1, 1, 1)  
plan_anchor = plan_anchor.view(bs, num_groups * self.ego_fut_mode, *plan_anchor.shape[3:]) # bs num_groups * 20, 8, 2
diffusion_output = self.norm_odo(plan_anchor)
trunc_timesteps = torch.ones((bs,), device=device, dtype=torch.long) * 8
diffusion_output = self.diffusionrl_scheduler.add_noise(original_samples=diffusion_output, noise=noise, timesteps=trunc_timesteps)

# roll_timesteps [18, 16, 14, 12, 10, 8, 6, 4, 2, 0]
roll_timesteps = (np.arange(0, step_num) * step_ratio).round()[::-1].copy().astype(np.int64)
roll_timesteps = torch.from_numpy(roll_timesteps).to(device)

for i, k in enumerate(roll_timesteps[:]):
    poses_reg_list, poses_cls_list = self.diff_decoder(
        traj_feature, noisy_traj_points, bev_feature, bev_spatial_shape,
        agents_query, ego_query, time_embed, status_encoding, global_img)

    prev_sample, log_prob, _ = self.diffusionrl_scheduler.step(
        model_output=x_start, timestep=k, sample=diffusion_output, eta=eta,)
    diffusion_output = prev_sample
    all_diffusion_output.append(prev_sample) # BG N 8 2

使用PDM评分器计算网络输出的所有模态的轨迹以及真值轨迹的cost：

scores_np - 主评分矩阵：基于PDM评分器对轨迹的多个维度进行评估后的综合分数

metric_cache - 指标缓存：每个场景计算过程中用到的中间结果和参考数据（避免重复计算）

sub_scores - 子项评分：轨迹在不同维度上的细分评分

# 安全性检查（碰撞检测、驶出道路等）
safety_score = check_safety(traj, scenario_data['agents'], scenario_data['map'])
# 舒适性评估（加速度、加加速度、曲率连续性）
comfort_score = check_comfort(traj)
# 规则遵守（交通灯、车道保持、速度限制）
rule_score = check_rules(traj, scenario_data['traffic_rules'])
# 进度评估（是否到达目标）
progress_score = check_progress(traj, scenario_data['goal'])
# 物理可行性（动力学约束）
feasibility_score = check_feasibility(traj, vehicle_dynamics)
# 综合评分（加权平均）
total_score = (
            safety_weight * safety_score +
            comfort_weight * comfort_score +
            rule_weight * rule_score +
            progress_weight * progress_score +
            feasibility_weight * feasibility_score
        )

我们提出锚点内GRPO（Intra-Anchor GRPO）。对于每个锚点，首先通过随机高斯噪声和探索噪声对其进行扩散，生成个轨迹变体组成的组；随后在该组内执行GRPO更新，而非跨不同锚点的组进行更新。该方法将策略优化约束在每个特定行为意图的状态空间内，引导模型生成更安全、更面向目标的轨迹，同时不损害其多模态能力。强化学习损失函数可表示为：

其中为折扣系数，用于缓解早期去噪步骤中的不稳定性；为优势函数，GRPO通过计算组相对优势进行估计，无需价值模型：

为基于最终去噪轨迹计算的单一奖励估计值，该奖励被应用于扩散链中的所有去噪步骤，且每个步骤的影响通过去噪折扣进行缩放。

# 计算生成轨迹的基础优势函数
reward_group = reward_group.view(bs, num_groups, self.ego_fut_mode)  # (B,G,N)
mean_grouped_rewards = reward_group.mean(dim=1)
std_grouped_rewards = reward_group.std(dim=1)
advantages = (reward_group - mean_grouped_rewards.unsqueeze(1)) / (std_grouped_rewards.unsqueeze(1) + 1e-4)

# 只保留 “好于 GT” 的正向样本，对比学习：通过与ground truth对比来定义"好"轨迹
mask_positive = (reward_group > (reward_gt-1e-6))                  # (B,G) bool
advantages = advantages.clamp(min=0) * mask_positive.float()       # 负 adv 归 0

锚点内GRPO虽能防止模式崩溃，但完全隔离不同模式会引发新问题：优势估计丧失全局可比性。例如，某一模式中次优但安全的轨迹可能获得负优势，而另一模式中危险且存在碰撞的轨迹若为其组内“最优”样本，则可能获得正优势。这种依赖局部组内比较的方式会向模型传递误导性学习信号。

具体实现方式为修改锚点内GRPO的优势估计：将所有负优势截断为0，并对存在碰撞的轨迹分配-1的强惩罚：

这一设计为模型提供了清晰且一致的学习信号。随后，该截断优势将替代公式(7)中的用于强化学习损失计算。

# 根据子奖励调节优势
if k == 'no_collision' or k == 'drivable_area':
    # 对于安全性指标: 如果不满足条件(值不为1)，优势设为-1
    zero_mask = (v != 1)
    advantages = torch.where(zero_mask, torch.full_like(advantages, -1.0), advantages)

# 创建时间折扣因子
# discount: [0.8^3, 0.8^2, 0.8^1, 0.8^0] = [0.512, 0.64, 0.8, 1.0]
discount = torch.tensor(
    [
        0.8 ** (step_num - i - 1)  # 指数衰减: 越远的未来折扣越大
        for i in range(step_num)
    ]
).to(advantages.device)  # 形状: (T,)
# 未来时间步的奖励权重降低
advantages = advantages * discount  # (B, M, T)

Loss

DiffusionDrive采用DDIM更新规则，将去噪步数大幅减少。该更新规则通常通过设置作为确定性采样器使用。为实现更广泛的探索并避免狄拉克分布下的似然计算问题，我们在训练阶段设置以引入探索噪声（等价于采用DDPM），而在验证阶段保持以实现确定性推理。然而，由于轨迹的近端段与远端段存在固有尺度不一致性，直接在每个点施加加法高斯噪声会破坏轨迹的结构完整性，降低探索质量。

如图3(a)所示，对标准化轨迹施加加法高斯噪声后，生成的探索路径通常呈锯齿状（类似折线），丧失了原始轨迹的平滑性。为保留轨迹连贯性，我们提出仅添加两个乘法高斯噪声（一个纵向噪声、一个横向噪声），其表达式为，其中。这种尺度自适应乘法噪声确保生成的探索路径保持平滑，如图3(b)所示。

在强化学习中，引入探索噪声是推动模型探索新行为的重要手段，

加性噪声：轨迹的近端（proximal segments）和远端（distal segments）存在固有的尺度不一致性。简单地在每个点上添加相同的加性高斯噪声，会导致轨迹的结构完整性被破坏，使其变得不连贯且“锯齿状”（jagged），如图 3(a) 所示。这会降低探索的质量
乘性噪声：它不是直接添加噪声，而是通过一个乘法因子来调整轨迹的每个点。具体来说，它只添加两个乘法高斯噪声：一个沿纵向（longitudinal），一个沿横向（lateral）。这种乘法形式使得噪声的大小能够根据轨迹自身的尺度自动调整。对于轨迹上距离较远、坐标值较大的点，相同的乘法噪声会产生更大的绝对扰动，反之亦然，从而自然地保持轨迹的连贯性和平滑性，如图 3(b) 所示。

if prev_sample is None:
    # 乘性噪声
    variance_noise_horizon = randn_tensor(
        [model_output.shape[0],model_output.shape[1],1,1], generator=generator, device=model_output.device, dtype=model_output.dtype
    ) * std_dev_t_mul + 1.0
    variance_noise_vert = randn_tensor(
        [model_output.shape[0],model_output.shape[1],1,1], generator=generator, device=model_output.device, dtype=model_output.dtype
    ) * std_dev_t_mul + 1.0

    variance_noise_mul = torch.cat((variance_noise_horizon,variance_noise_vert),dim=-1)
    variance_noise_mul = variance_noise_mul.repeat(1,1,model_output.shape[2],1)

    # 加性噪声
    variance_noise_x = randn_tensor(
        [model_output.shape[0],model_output.shape[1],1,1], generator=generator, device=model_output.device, dtype=model_output.dtype
    )
    variance_noise_y = randn_tensor(
        [model_output.shape[0],model_output.shape[1],1,1], generator=generator, device=model_output.device, dtype=model_output.dtype
    )
    variance_noise_add = torch.cat((variance_noise_x,variance_noise_y),dim=-1)
    variance_noise_add = variance_noise_add.repeat(1,1,model_output.shape[2],1)

    prev_sample = prev_sample_mean * variance_noise_mul + std_dev_t_add * variance_noise_add

根据计算的每一步加噪输出，计算log_prob？

std_dev_t_mul = torch.clip(std_dev_t, min=0.1)
log_prob = (
    -((prev_sample.detach() - prev_sample_mean) ** 2) / (2 * (std_dev_t_mul**2))
    - torch.log(std_dev_t_mul)
    - torch.log(torch.sqrt(2 * torch.as_tensor(math.pi)))
)   
log_prob = log_prob.sum(dim=(-2, -1))

此外，与原始GRPO通过添加策略模型与参考模型之间的KL散度实现正则化类似，我们引入额外的模仿学习损失，以防止模型过拟合并保障其通用驾驶能力。组合损失函数为

其中为权重系数。

真值监督：先用anchors和真值匹配，得到和真值最接近的anchor，然后这个anchor周围的“噪声轨迹”作为正样本( )，其他的则看作是负样本( )

target_traj = targets["trajectory"]
dist = torch.linalg.norm(target_traj.unsqueeze(1)[...,:2] - plan_anchor, dim=-1)
dist = dist.mean(dim=-1)
mode_idx = torch.argmin(dist, dim=-1)
mode_idx = mode_idx[...,None,None,None].repeat(1,1,ts,d)
best_reg = torch.gather(poses_reg, 1, mode_idx).squeeze(1)
target_classes_onehot = torch.zeros([bs, num_mode],
   dtype=poses_cls.dtype, layout=poses_cls.layout, device=poses_cls.device)
target_classes_onehot.scatter_(1, cls_target.unsqueeze(1), 1)

loss_cls = self.cls_loss_weight * py_sigmoid_focal_loss(
   poses_cls, target_classes_onehot, weight=None,
   gamma=2.0, alpha=0.25, reduction='mean', avg_factor=None)

loss由“轨迹恢复”和“分类置信度”两部分组成：是loss权重

reg_loss = self.reg_loss_weight * F.l1_loss(best_reg, target_traj)

自动驾驶之心

自动驾驶知识星球交流社区