Transformer-based monocular 3D object detection methods have recently made significant progress. However, most existing methods struggle with fine-grained objects and complex scenes, particularly in capturing the features of occluded or small objects. To address these issues, we propose CU-DETR, a monocular 3D object detector built on the MonoDETR framework. CU-DETR introduces a local-global fusion encoder to enhance local feature extraction and fusion, and applies an uncertainty perturbation strategy in the position encoding to improve the model’s performance in complex scenes. Experimental results on the public KITTI dataset demonstrate that CU-DETR outperforms MonoDETR.