[3D Object Detection] 2019 CVPR Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles

License: CC 4.0 BY-SA

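This work proposes a method for 3D object detection from 2D images in which a learned neural network lifts the 2D image to a 3D representation. Comparing detection methods that use only 2D images with those that use 3D data shows that the latter perform better. The authors use generative adversarial networks (GANs) to generate a bird's-eye view (BEV) from the 2D image, then feed it to existing 3D detection networks for 3D object detection and localization. Experiments show that the method achieves high performance even without real 3D input, and that it can serve as a plug-and-play module for existing 3D detection platforms.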

CVPR 2019

Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles

3D object detection

2D monocular images
autonomous driving scenarios

Proposal

lift the 2D images to 3D representations using a learned neural network
generate the 3D representations (BEV images) using state-of-the-art GANs
leverage existing networks working directly on 3D data to perform 3D object detection and localization
use the generated 3D data for ground plane estimation with recent 3D networks
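The BEV image that the GAN is trained to produce is the same kind of input that LiDAR-based detectors such as MV3D consume. As a rough illustration of what such a target looks like, the sketch below rasterizes a LiDAR point cloud into a simplified three-channel BEV (max height, max intensity, point density). The grid ranges, the 0.1 m resolution, the channel choice, and the function name `lidar_to_bev` are assumptions loosely modeled on MV3D-style inputs, not the exact encoding used in the paper.

```python
import numpy as np


def lidar_to_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), res=0.1):
    """Rasterize (N, 4) LiDAR points [x, y, z, intensity] into a (3, H, W) BEV image.

    Channels: max height, max intensity, log-normalized point density.
    Ranges, resolution and channel layout are illustrative assumptions.
    """
    points = np.asarray(points, dtype=np.float64)
    x, y, z, r = points.T
    # Keep only the points that fall inside the BEV grid.
    inside = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z, r = x[inside], y[inside], z[inside], r[inside]

    h = int(round((x_range[1] - x_range[0]) / res))
    w = int(round((y_range[1] - y_range[0]) / res))
    # Map metric coordinates to integer cell indices (clip guards float rounding at the edge).
    rows = np.clip(((x - x_range[0]) / res).astype(np.int64), 0, h - 1)
    cols = np.clip(((y - y_range[0]) / res).astype(np.int64), 0, w - 1)

    # Height channel: highest point per cell; empty cells are set to 0.
    height = np.full((h, w), -np.inf)
    np.maximum.at(height, (rows, cols), z)
    height[np.isinf(height)] = 0.0

    # Intensity channel: strongest return per cell.
    intensity = np.zeros((h, w))
    np.maximum.at(intensity, (rows, cols), r)

    # Density channel: number of points per cell, log-normalized and capped at 1.
    counts = np.zeros((h, w))
    np.add.at(counts, (rows, cols), 1.0)
    density = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))

    return np.stack([height, intensity, density]).astype(np.float32)
```

In the lifting setup, a generated BEV would have to match this shape and channel layout so that an existing BEV-based detector can consume it without modification.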

Results

results higher than many methods working on actual 3D inputs acquired from physical sensors
a late fusion of the outputs of the networks trained on
- generated 3D images
- real 3D images
improves performance
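The notes do not spell out the fusion rule. One common way to late-fuse two detectors is to pool their scored boxes and run non-maximum suppression (NMS); the sketch below does that with axis-aligned BEV boxes `[x1, y1, x2, y2]`. The function names, the box format, the axis-aligned simplification (real 3D detectors usually predict rotated boxes) and the 0.5 IoU threshold are all assumptions, not the paper's documented procedure.

```python
import numpy as np


def box_area(b):
    # Area of axis-aligned boxes in [x1, y1, x2, y2] format (works on 1-D or 2-D arrays).
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(box, boxes):
    """IoU between one box of shape (4,) and an array of boxes of shape (M, 4)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0.0, None) * np.clip(y2 - y1, 0.0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def nms(boxes, scores, iou_thr):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it above iou_thr."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thr]
    return np.array(keep, dtype=np.int64)


def late_fuse(boxes_gen, scores_gen, boxes_real, scores_real, iou_thr=0.5):
    """Pool detections from the generated-BEV and real-BEV networks, then suppress duplicates."""
    boxes = np.concatenate([boxes_gen, boxes_real])
    scores = np.concatenate([scores_gen, scores_real])
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```

With this rule, boxes found by only one of the two networks survive, while overlapping duplicates are resolved in favor of the higher-scoring one, which gives some intuition for why fusing the two outputs can help.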

Contents

Introduction
Related work
Approach
Experiments
Conclusion

Introduction

Two approaches are widely used for 3D object detection:

to detect objects in 2D using monocular images and then infer the objects in 3D
to use 3D data (e.g. LiDAR) to detect bounding boxes directly in 3D
- MV3D
  [CVPR, 2017] Multi-View 3D Object Detection Network for Autonomous Driving
Comparison of the two approaches
- methods based on 2D monocular images lag significantly behind methods that use 3D data
  - methods based on monocular images attempt to infer 3D information implicitly from the input
  - the availability of depth information (derived or explicit) greatly increases the performance of methods that use 3D data
- a monocular-image-based 3D object detection method would be highly practical if it closed the performance gap with methods requiring explicit 3D data
  - 2D cameras are cheap and light
  - 3D scanners are expensive and bulky
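To make the first approach concrete: once a 2D detector has found a box, a monocular method still has to supply a depth before the detection can be placed in 3D. The sketch below back-projects the 2D box center with the pinhole camera model, X = Z * K^-1 [u, v, 1]^T. The intrinsic matrix, the box, the depth value and the function name `lift_box_center` are hypothetical; real monocular methods estimate depth (as well as box size and orientation) in ways these notes do not describe.

```python
import numpy as np


def lift_box_center(box2d, depth, K):
    """Back-project the center of a 2D box [x1, y1, x2, y2] to camera coordinates.

    Pinhole model: X = depth * K^-1 [u, v, 1]^T, with depth measured along the optical axis.
    """
    u = 0.5 * (box2d[0] + box2d[2])
    v = 0.5 * (box2d[1] + box2d[3])
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # K^-1 [u, v, 1]^T; its third entry is 1
    return depth * ray


# Hypothetical intrinsics: focal length 700 px, principal point (600, 180).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
center = lift_box_center([640.0, 200.0, 700.0, 230.0], depth=20.0, K=K)
# center is approximately [2.0, 1.0, 20.0] (x right, y down, z forward, in metres)
```

Because every coordinate scales linearly with the depth, a 10% depth error moves the recovered 3D point by 10% of its distance along the viewing ray, which illustrates why explicit depth from a 3D sensor helps so much.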

Our results are important because:

(i) only monocular images are used at inference, so efforts directed towards collecting high-quality 3D data (for training) can help in scenarios where explicit 3D data cannot be acquired at test time.
(ii) the method can be used as a plug-and-play module with any existing 3D method that works with BEV images, allowing seamless switching between RGB cameras and 3D scanners while leveraging the same underlying object detection platform.
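The plug-and-play point boils down to an interface contract: any BEV source, whether a 3D scanner or an RGB image passed through a trained generator, must emit the same tensor shape, so the detector behind it stays unchanged. The toy sketch below shows that contract. Every name, the 64 x 64 single-channel grid, the rasterizer and the "detector" (which merely counts occupied cells) are hypothetical stand-ins, not components from the paper.

```python
from typing import Callable

import numpy as np

H, W = 64, 64  # hypothetical BEV grid shared by every source and the detector

BEVSource = Callable[[dict], np.ndarray]


def bev_from_scanner(frame: dict) -> np.ndarray:
    """Toy rasterizer: mark the cells hit by LiDAR points given as integer (row, col) pairs."""
    bev = np.zeros((1, H, W), dtype=np.float32)
    cells = np.asarray(frame["points"], dtype=np.int64)
    bev[0, cells[:, 0], cells[:, 1]] = 1.0
    return bev


def make_bev_from_rgb(generator: Callable[[np.ndarray], np.ndarray]) -> BEVSource:
    """Wrap an image-to-BEV generator (e.g. a trained GAN) as a BEV source."""
    def source(frame: dict) -> np.ndarray:
        return generator(frame["image"])
    return source


def toy_detector(bev: np.ndarray) -> int:
    """Stand-in for an existing BEV-based 3D detector: counts occupied cells."""
    return int((bev > 0.5).sum())


def detect(frame: dict, source: BEVSource, detector=toy_detector) -> int:
    bev = source(frame)
    # The only requirement on a source is that it matches the detector's input shape.
    if bev.shape != (1, H, W):
        raise ValueError(f"BEV shape {bev.shape} != expected {(1, H, W)}")
    return detector(bev)


def fake_generator(image: np.ndarray) -> np.ndarray:
    # Placeholder for a trained image-to-BEV GAN: always returns an empty BEV.
    return np.zeros((1, H, W), dtype=np.float32)


# Same detector, two interchangeable inputs.
lidar_frame = {"points": [[1, 2], [3, 4]]}
rgb_frame = {"image": np.zeros((128, 256, 3), dtype=np.uint8)}
print(detect(lidar_frame, bev_from_scanner))                  # 2
print(detect(rgb_frame, make_bev_from_rgb(fake_generator)))  # 0
```

Keeping the shape check at the boundary means switching between a scanner and a camera only changes which source is passed in; the detector code is untouched.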

This paper refers to the following methods