Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion
Cameras and LiDAR are important sensors in autonomous driving systems and provide complementary information. However, most LiDAR-only methods outperform fusion methods on the main benchmark datasets. Current studies attribute this to view misalignment and the difficulty of matching heterogeneous features. In particular, single-stage fusion methods struggle to fully fuse image and point-cloud features. In this work, we propose a 3D object detection network based on a multi-layer and multi-modal fusion (3DMMF) method. 3DMMF paints and encodes the point cloud within the frustum proposed by a 2D object detection network. The painted point cloud is then fed to a LiDAR-only object detection network with expanded channels and a self-attention module. Finally, the camera-LiDAR object candidates fusion for 3D object detection (CLOCs) method matches the geometric direction features and category semantic features of the 2D and 3D detection results. Experiments on the public KITTI dataset show that this fusion method significantly improves on the LiDAR-only baseline, with an average mAP gain of 6.3%.
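The frustum-painting step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper name `paint_points`, the single-box interface, and the choice of painting the 2D detector's confidence as an extra channel are all assumptions; the projection follows the standard KITTI-style pinhole model.

```python
import numpy as np

def paint_points(points, proj_matrix, box_2d, class_score):
    """Append a class-score channel to LiDAR points falling inside the
    frustum of one 2D detection box (hypothetical helper, for illustration).

    points      : (N, 3) points in the camera coordinate frame.
    proj_matrix : (3, 4) camera projection matrix (KITTI-style P2).
    box_2d      : (x1, y1, x2, y2) detection box in pixel coordinates.
    class_score : confidence of the 2D detection, painted onto points.
    """
    n = points.shape[0]
    # Homogeneous coordinates, then project to the image plane.
    pts_h = np.hstack([points, np.ones((n, 1))])   # (N, 4)
    uvw = pts_h @ proj_matrix.T                    # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                  # pixel coordinates
    x1, y1, x2, y2 = box_2d
    # A point lies in the frustum if it projects inside the 2D box
    # and sits in front of the camera (positive depth).
    in_box = (
        (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
        & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        & (uvw[:, 2] > 0)
    )
    # Extra channel: the class score for painted points, 0 elsewhere.
    scores = np.where(in_box, class_score, 0.0)[:, None]
    return np.hstack([points, scores])             # (N, 4)
```

The painted cloud then has one additional feature channel per 2D class, which is what motivates the "expanded channels" of the downstream LiDAR-only detector.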

