Motivation:
in this paper that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy.
【CC】First, for 3D BB localization the paper abandons the 2D-BB RPN and instead uses keypoint estimation plus 3D variable regression. Second, the 3D BB is constructed in a disentangled, multi-step manner, which improves both training convergence and accuracy.
Related work:
Previous state-of-the-art monocular 3D object detection algorithms [25, 1, 21] heavily depend on region-based convolutional neural networks (R-CNN) or region proposal network (RPN) structures [28, 18, 7]. Based on the large number of learned 2D proposals, these approaches attach an additional network branch to either learn 3D information or to generate a pseudo point cloud and feed it into a point cloud detection network.
【CC】Older methods all start by proposing a large set of 2D BBs, then either 1) attach extra network layers to learn 3D information, or 2) generate a pseudo point cloud and feed it into a point cloud detection network.
In this paper, we propose an innovative single-stage 3D object detection method that pairs each object with a single keypoint. We transform these variables together with the projected keypoint into the 8-corner representation of the 3D box and regress them with a unified loss function. The second contribution of our work is a multi-step disentanglement approach for 3D bounding box regression.
【CC】Object detection is cast as keypoint estimation; the 3D BB is fundamentally represented by its 8 corner points in 3D, and the 3D-BB regression is disentangled.
Problem formulation:
Given a single RGB image I ∈ R W×H×3, with W being the width and H being the height of the image, find for each present object its category label C and its 3D bounding box B, where the latter is parameterized by 7 variables (h, w, l, x, y, z, θ). Here, (h, w, l) represent
the height, width, and length of each object in meters, and (x, y, z) are the coordinates (in meters) of the object center in the camera coordinate frame. Variable θ is the yaw orientation of the corresponding cubic box.
【CC】Input: image I; output: category C and 3D BB B. B is parameterized by 7 variables (h, w, l, x, y, z, θ), where (h, w, l) are the height/width/length, (x, y, z) is the object center in the camera coordinate frame (in practice, the ego-vehicle frame), and θ is the yaw angle.
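To make the 8-corner representation concrete, here is a minimal NumPy sketch (not the paper's code) that builds the 8 box corners from the 7 parameters (h, w, l, x, y, z, θ). It assumes the KITTI camera-frame convention: x right, y down, z forward, (x, y, z) at the center of the box's bottom face, and yaw θ rotating about the camera y-axis.

```python
import numpy as np

def box_corners(h, w, l, x, y, z, theta):
    """Return the 8 corners of a 3D box as a (3, 8) array in the camera frame.

    Assumes KITTI conventions: length l along x, height h along -y (up),
    width w along z, with (x, y, z) at the bottom-face center.
    """
    # Corner offsets in the object frame (before rotation/translation)
    x_c = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y_c = np.array([   0,    0,    0,    0,   -h,   -h,   -h,   -h])
    z_c = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    corners = np.stack([x_c, y_c, z_c])  # (3, 8)

    # Rotation about the camera y-axis by the yaw angle theta
    R = np.array([[ np.cos(theta), 0, np.sin(theta)],
                  [             0, 1,             0],
                  [-np.sin(theta), 0, np.cos(theta)]])

    # Rotate, then translate to the object position
    return R @ corners + np.array([[x], [y], [z]])
```

With θ = 0 the box axes align with the camera axes, so the corner spans along x, y, z recover l, h, w directly, which is a quick sanity check on the convention.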
Network architecture:
Figure 2. Network Structure of SMOKE. We leverage DLA-34 [41] to extract features from images. The feature map is 1/4 the size of the original image due to downsampling by 4. Two separate branches are attached to the feature map to perform keypoint classification (pink) and 3D box regression (green) jointly. The 3D bounding box is obtained by combining information from the two branches.
【CC】The backbone is DLA-34 with 1/4 downsampling; the two heads perform keypoint classification and 3D-BB regression, respectively.
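As a sketch of how the keypoint branch's 1/4-resolution output maps back to the image, here is a CenterNet-style decoding routine (function name, offset channel layout, and top-k decoding are assumptions for illustration, not verified SMOKE source): heatmap peaks are picked per class, then a regressed sub-pixel offset corrects the discretization error before scaling by the downsampling ratio.

```python
import numpy as np

def decode_keypoints(heatmap, offsets, down_ratio=4, k=3):
    """Decode top-k keypoints from a low-resolution heatmap.

    heatmap: (C, Hf, Wf) per-class keypoint scores on the downsampled map
    offsets: (2, Hf, Wf) regressed sub-pixel (dx, dy) discretization offsets
    Returns a list of (class_id, score, u, v) in full-resolution pixels.
    """
    C, Hf, Wf = heatmap.shape
    flat = heatmap.reshape(-1)
    top = np.argsort(flat)[::-1][:k]  # indices of the k highest scores
    results = []
    for idx in top:
        c, rem = divmod(int(idx), Hf * Wf)   # class and position on the map
        yf, xf = divmod(rem, Wf)
        dx, dy = offsets[:, yf, xf]
        # Add the sub-pixel offset, then scale back to image resolution
        u = (xf + dx) * down_ratio
        v = (yf + dy) * down_ratio
        results.append((c, float(flat[idx]), float(u), float(v)))
    return results
```

The regressed offset matters because a keypoint at image pixel (u, v) lands at the discretized cell (⌊u/4⌋, ⌊v/4⌋) on the feature map, and the lost fractional part must be recovered for an accurate projected center.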
Backbone
We use a hierarchi