Journal of System Simulation

Abstract

To address the insufficient feature extraction and the inconsistency between classification and regression in point-cloud-based 3D object detection, this paper proposes ResCST, a novel architecture built on the SECOND network. ResCST adds residual connections to the 3D sparse convolutional layers and introduces a hybrid CNN-SwinTransformer model that combines SwinTransformer's ability to capture long-range dependencies with the convolutional neural network's ability to extract local features, thereby strengthening feature extraction. It also introduces the RCIoU method to jointly optimize the classification and regression tasks. Experiments on the KITTI dataset show that the model achieves 3D detection accuracies of 91.21%, 82.97%, and 80.28% for cars at the easy, moderate, and hard difficulty levels, respectively. The proposed method significantly improves the detection of hard-level targets while running at an inference speed of 25 frames per second, achieving a good balance between accuracy and efficiency.

First Page

2616

Last Page

2630

CLC

TP391.9

Recommended Citation

Lu Bin, Wang Minghan, Sun Yang, et al. Global-local Fusion for Efficient 3D Object Detection[J]. Journal of System Simulation, 2024, 36(11): 2616-2630.

Corresponding Author

Wang Minghan

DOI

10.16182/j.issn1004731x.joss.23-0926
