
VXP: Voxel-Cross-Pixel Large-Scale Camera-LiDAR Place Recognition

MCML Authors

Jim Li

→ Group Angela P. Schöllig
Learning Systems and Robotics

Mariia Gladkova

→ Group Daniel Cremers
Computer Vision & Artificial Intelligence

Yan Xia

Dr.

* Former Member

→ Group Daniel Cremers
Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Director

Computer Vision & Artificial Intelligence

Abstract

Recent works on global place recognition treat the task as a retrieval problem, where an off-the-shelf global descriptor is commonly designed for either the image-based or the LiDAR-based modality. However, accurate image-LiDAR global place recognition is non-trivial, since extracting consistent and robust global descriptors from different domains (2D images and 3D point clouds) is challenging. To address this issue, we propose a novel Voxel-Cross-Pixel (VXP) approach, which establishes voxel and pixel correspondences in a self-supervised manner and brings them into a shared feature space. Specifically, VXP is trained in a two-stage manner that first explicitly exploits local feature correspondences and then enforces similarity of global descriptors. Extensive experiments on three benchmarks (Oxford RobotCar, ViViD++ and KITTI) demonstrate that our method surpasses state-of-the-art cross-modal retrieval methods by a large margin.
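
As an illustration of the retrieval framing described in the abstract, the following minimal sketch ranks LiDAR database descriptors by cosine similarity to an image query descriptor. It is not the authors' implementation: the function names (`l2_normalize`, `retrieve`), the 3-dimensional toy descriptors, and the choice of cosine similarity are all assumptions made for illustration. It assumes that some encoder has already mapped both modalities into a shared feature space.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero descriptor")
    return [x / norm for x in vec]


def retrieve(query_desc, database_descs, top_k=1):
    """Return indices of the top_k database descriptors most similar to the query."""
    q = l2_normalize(query_desc)
    scored = []
    for idx, desc in enumerate(database_descs):
        d = l2_normalize(desc)
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, idx))
    # Highest similarity first; ties are broken by the lower database index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:top_k]]


# Hypothetical toy data: one image query descriptor and three LiDAR
# database descriptors, assumed to live in the same feature space.
lidar_database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
image_query = [0.1, 0.9, 0.0]

best_matches = retrieve(image_query, lidar_database, top_k=2)
print(best_matches)
```

In a place recognition benchmark, a query would typically count as correctly localized if one of the returned database entries lies within a distance threshold of the query's true position; that evaluation step is omitted here.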

inproceedings LGX+25

3DV 2025

12th International Conference on 3D Vision. Singapore, Mar 25-28, 2025.

Authors

Y.-J. Li • M. Gladkova • Y. Xia • R. Wang • D. Cremers

In Collaboration

Microsoft

Research Areas

B1 | Computer Vision

B3 | Multimodal Perception

BibTeXKey: LGX+25
