AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers

Zechuan Li1
Hongshan Yu1, 2, 3
Zhengeng Yang1
Tongjia Chen1
Naveed Akhtar4

1Hunan University
2National University of Defense Technology
3Quanzhou Institute of Industrial Design and Machine Intelligence Innovation, Hunan University
4The University of Western Australia

CVPR 2023







3D object detection techniques commonly follow a pipeline that aggregates predicted object central point features to compute candidate points. However, these candidate points contain only positional information and largely ignore object-level shape information, which leads to sub-optimal 3D object detection. In this work, we propose AShapeFormer, a semantics-guided object-level shape encoding module for 3D object detection. It is a plug-and-play module that leverages multi-head attention to encode object shape information. We also propose shape tokens and an object-scene positional encoding to ensure that the shape information is fully exploited. Moreover, we introduce a semantic guidance sub-module that samples more foreground points and suppresses the influence of background points for better object shape perception. We demonstrate a straightforward enhancement of multiple existing methods with AShapeFormer. Through extensive experiments on the popular SUN RGB-D and ScanNetV2 datasets, we show that our enhanced models outperform the baselines by a considerable absolute margin of up to 8.1%.
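The abstract describes the module only at a high level. As a rough illustration, here is a minimal, dependency-free Python sketch of the general idea for a single candidate center. It is written under stated assumptions rather than taken from the paper: the ball radius, top-k sampling by foreground score, the offset-plus-absolute-coordinate form of the object-scene positional encoding, the log-score attention bias used to suppress background points, and the identity (unlearned) attention projections are all hypothetical choices, and every function name is illustrative.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_guided_sample(points, fg_scores, center, radius, k):
    # Hypothetical sampler: keep points inside a ball around the candidate
    # center, then prefer those the semantic branch scores as foreground.
    inside = [i for i, p in enumerate(points) if math.dist(p, center) <= radius]
    inside.sort(key=lambda i: fg_scores[i], reverse=True)
    return inside[:k]


def object_scene_pe(point, center):
    # Hypothetical encoding: offset to the candidate center (object level)
    # concatenated with absolute coordinates (scene level).
    offset = [a - b for a, b in zip(point, center)]
    return offset + list(point)


def multi_head_pool(query, keys, values, num_heads, bias=None):
    # Scaled dot-product attention with a single query (the shape token),
    # split into heads. Identity projections keep the sketch short; a real
    # model would learn separate query/key/value projections per head.
    d = len(query)
    if d % num_heads != 0:
        raise ValueError("feature dim must be divisible by num_heads")
    hd = d // num_heads
    out = []
    for h in range(num_heads):
        lo, hi = h * hd, (h + 1) * hd
        logits = [
            sum(q * kk for q, kk in zip(query[lo:hi], key[lo:hi])) / math.sqrt(hd)
            for key in keys
        ]
        if bias is not None:
            logits = [z + b for z, b in zip(logits, bias)]
        weights = softmax(logits)
        for j in range(lo, hi):
            out.append(sum(w * v[j] for w, v in zip(weights, values)))
    return out


def encode_candidate(points, feats, fg_scores, center, shape_token,
                     radius=1.0, k=8, num_heads=2):
    # Returns a shape-aware feature vector for one candidate center.
    idx = semantic_guided_sample(points, fg_scores, center, radius, k)
    if not idx:
        return list(shape_token)
    tokens = [list(feats[i]) + object_scene_pe(points[i], center) for i in idx]
    # Assumed background suppression: add the log foreground score as an
    # attention bias so low-confidence points receive less weight.
    bias = [math.log(max(fg_scores[i], 1e-6)) for i in idx]
    return multi_head_pool(shape_token, tokens, tokens, num_heads, bias)


points = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (3.0, 3.0, 3.0), (0.1, -0.2, 0.1)]
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
fg_scores = [0.9, 0.8, 0.95, 0.1]
shape_token = [0.5] * 8  # 2 feature dims + 6 positional dims
encoded = encode_candidate(points, feats, fg_scores, (0.0, 0.0, 0.0), shape_token)
print(len(encoded))  # 8
```

In this toy input the far-away point is excluded by the ball query despite its high foreground score, and the low-score point near the center is down-weighted by the bias. In a trained network the shape token and projections would be learned parameters; here they are fixed so the arithmetic stays easy to follow.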


3D object detection results on the SUN RGB-D validation set with mAP@0.25

3D object detection results on the ScanNetV2 validation set


Comparisons between VoteNet


This work was supported by the NSFC (U2013203, 61973106, U1913202); the Natural Science Fund of Hunan Province (2021JJ10024, 2022JJ40100); the Project of Talent Innovation and Sharing Alliance of Quanzhou City under Grant 2021C062L; and the Key Research and Development Project of the Science and Technology Plan of Hunan Province under Grant 2022GK2014.
