Journal of System Simulation
Abstract
Point-cloud-based 3D object detection suffers from insufficient feature extraction and from inconsistency between classification and regression. To address both problems, this paper proposes ResCST, a novel architecture built on the SECOND network. ResCST adds residual connections to the 3D sparse convolutional layers and introduces a CNN-SwinTransformer hybrid model that combines the SwinTransformer's ability to capture long-range dependencies with the convolutional neural network's ability to extract local features, strengthening feature extraction. It also introduces the RCIoU method to jointly optimize the classification and regression tasks. Experiments on the KITTI dataset show that the model reaches 3D detection accuracies of 91.21%, 82.97%, and 80.28% for cars at the easy, moderate, and hard levels, respectively. The method significantly improves detection of hard-level targets while running at an inference speed of 25 frames per second, achieving a good balance between accuracy and efficiency.
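The residual connection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `residual_block`, the dense NumPy arrays, and the pointwise linear map used in place of a real 3D sparse convolution are all assumptions made purely for illustration. It shows only the general pattern of adding a layer's input back to its output.

```python
import numpy as np


def residual_block(features, weight, bias):
    """Apply a toy residual block to per-voxel features.

    features: (num_voxels, channels) array of non-empty voxel features.
    weight:   (channels, channels) matrix standing in for a sparse 3D convolution.
    bias:     (channels,) vector.
    All names and shapes here are hypothetical, chosen for illustration.
    """
    # Branch F(x): a pointwise linear map plus ReLU (a stand-in, not a real sparse conv).
    branch = np.maximum(features @ weight + bias, 0.0)
    # Skip connection: add the unchanged input back, then apply the final ReLU.
    return np.maximum(features + branch, 0.0)


rng = np.random.default_rng(0)
voxel_features = rng.normal(size=(5, 4))

# With a zero branch the block reduces to ReLU of its input.
identity_like = residual_block(voxel_features, np.zeros((4, 4)), np.zeros(4))
```

Because the input is carried through the skip path, the block can fall back to passing features along nearly unchanged when the learned branch contributes little, which is the usual motivation for residual connections in deeper backbones.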
Recommended Citation
Lu, Bin; Wang, Minghan; Sun, Yang; and Yang, Zhenyu (2024) "Global-local Fusion for Efficient 3D Object Detection," Journal of System Simulation: Vol. 36: Iss. 11, Article 10.
DOI: 10.16182/j.issn1004731x.joss.23-0926
Available at: https://dc-china-simulation.researchcommons.org/journal/vol36/iss11/10
First Page
2616
Last Page
2630
CLC
TP391.9
Recommended Citation
Lu Bin, Wang Minghan, Sun Yang, et al. Global-local Fusion for Efficient 3D Object Detection[J]. Journal of System Simulation, 2024, 36(11): 2616-2630.
DOI
10.16182/j.issn1004731x.joss.23-0926
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Numerical Analysis and Scientific Computing Commons, Operations Research, Systems Engineering and Industrial Engineering Commons, Systems Science Commons