Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion
Cameras and LiDAR are important sensors in autonomous driving systems and provide complementary information. However, most LiDAR-only methods outperform fusion methods on the main benchmark datasets. Current studies attribute this to view misalignment and the difficulty of matching heterogeneous features. In particular, single-stage fusion methods struggle to fully fuse image and point-cloud features. In this work, we propose a 3D object detection network based on a multi-layer and multi-modal fusion (3DMMF) method. 3DMMF paints and encodes the point cloud within the frustum proposed by a 2D object detection network. The painted point cloud is then fed to a LiDAR-only object detection network with expanded channels and a self-attention module. Finally, the camera-LiDAR object candidates fusion for 3D object detection (CLOCs) method matches the geometric direction features and category semantic features of the 2D and 3D detection results. Experiments on the public KITTI dataset show that this fusion method significantly improves on the LiDAR-only baseline, with an average mAP gain of 6.3%.
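The frustum-painting step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper name `paint_points`, the single-box interface, and the choice of painting the 2D detector's confidence as an extra channel are all assumptions; the projection follows the standard KITTI-style pinhole model.

```python
import numpy as np

def paint_points(points, proj_matrix, box_2d, class_score):
    """Append a class-score channel to LiDAR points falling inside the
    frustum of one 2D detection box (hypothetical helper, for illustration).

    points      : (N, 3) points in the camera coordinate frame.
    proj_matrix : (3, 4) camera projection matrix (KITTI-style P2).
    box_2d      : (x1, y1, x2, y2) detection box in pixel coordinates.
    class_score : confidence of the 2D detection, painted onto points.
    """
    n = points.shape[0]
    # Homogeneous coordinates, then project to the image plane.
    pts_h = np.hstack([points, np.ones((n, 1))])   # (N, 4)
    uvw = pts_h @ proj_matrix.T                    # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                  # pixel coordinates
    x1, y1, x2, y2 = box_2d
    # A point lies in the frustum if it projects inside the 2D box
    # and sits in front of the camera (positive depth).
    in_box = (
        (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
        & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        & (uvw[:, 2] > 0)
    )
    # Extra channel: the class score for painted points, 0 elsewhere.
    scores = np.where(in_box, class_score, 0.0)[:, None]
    return np.hstack([points, scores])             # (N, 4)
```

The painted cloud then has one additional feature channel per 2D class, which is what motivates the "expanded channels" of the downstream LiDAR-only detector.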

