Accurate 3D object detection from cameras alone remains a fundamental challenge in autonomous driving, particularly for precise localization and velocity estimation, two metrics critical for safe trajectory planning and collision avoidance. Existing camera-based methods lift image features into dense Bird’s-Eye View (BEV) grids, which struggle to capture fine-grained geometry and motion cues. We present GaussianDet3D, to the best of our knowledge the first method to apply 3D Gaussian Splatting from multiview images to 3D object detection in autonomous driving, treating predicted Gaussian primitives as a pseudo-LiDAR point cloud fed into a sparse LiDAR detector. Unlike a LiDAR point, which carries only coordinates and intensity, each Gaussian encodes parameters capturing geometry, orientation, opacity, and per-class semantic distributions. By aggregating Gaussian point clouds across multiple frames, GaussianDet3D captures temporal motion cues that enable precise velocity estimation. On the nuScenes benchmark, GaussianDet3D achieves state-of-the-art translation and velocity errors among all camera-based methods, improving on BEVFormer by 8.1% and 13.1%, respectively, while remaining competitive in overall detection score. These results demonstrate that Gaussian Splatting provides a geometrically precise, semantically rich representation that bridges the gap between image-based perception and LiDAR-quality spatial reasoning, particularly for the localization and motion estimation tasks most critical to autonomous driving safety.
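To make the pipeline concrete, the sketch below illustrates one way the two core steps described above could look in code: flattening predicted Gaussians into pseudo-LiDAR points whose extra channels carry scale, orientation, opacity, and per-class probabilities, and ego-motion-compensated multi-frame aggregation that tags each point with a relative timestamp. All function names, tensor shapes, and the timestamp channel are illustrative assumptions for exposition, not the paper's published interface.

```python
# Minimal sketch under assumed shapes; not the authors' released code.
import numpy as np

def gaussians_to_pseudo_points(means, scales, quats, opacities, class_probs):
    """Flatten predicted 3D Gaussians into a pseudo-LiDAR point cloud.

    Unlike a LiDAR point (x, y, z, intensity), each pseudo point keeps the
    full Gaussian parameterization as extra feature channels:
    (N, 3) centers + (N, 3) scales + (N, 4) orientation quaternions
    + (N,) opacities + (N, C) per-class semantic distributions.
    """
    return np.concatenate(
        [means, scales, quats, opacities[:, None], class_probs], axis=1
    )

def aggregate_frames(frame_points, ego_poses, timestamps):
    """Warp per-frame pseudo points into the current ego frame and tag each
    with its relative timestamp, so a downstream sparse detector can read
    temporal motion cues (and hence velocity) from the point features.

    ego_poses are assumed to be 4x4 ego-to-world transforms per frame,
    with the last entry corresponding to the current frame.
    """
    t_ref = timestamps[-1]
    ref_inv = np.linalg.inv(ego_poses[-1])  # world -> current ego frame
    out = []
    for pts, pose, t in zip(frame_points, ego_poses, timestamps):
        xyz_h = np.concatenate([pts[:, :3], np.ones((len(pts), 1))], axis=1)
        # past ego frame -> world -> current ego frame (row-vector convention)
        xyz = (xyz_h @ pose.T @ ref_inv.T)[:, :3]
        dt = np.full((len(pts), 1), t - t_ref)  # relative-timestamp channel
        out.append(np.concatenate([xyz, pts[:, 3:], dt], axis=1))
    return np.concatenate(out, axis=0)
```

Under these assumptions, a sparse 3D detector (e.g., a voxel- or pillar-based backbone) would consume the aggregated (N, 3 + F + 1) tensor in place of a raw LiDAR sweep.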