16.05.2025


MCML Researchers With Four Papers at ICRA 2025

IEEE International Conference on Robotics and Automation (ICRA 2025). Atlanta, GA, USA, 19.05.2025–23.05.2025

We are happy to announce that MCML researchers are represented with four papers at ICRA 2025. Congrats to our researchers!

Main Track (4 papers)

D. Huang, N. Navab and Z. Jiang.
Improving Probe Localization for Freehand 3D Ultrasound using Lightweight Cameras.
ICRA 2025 - IEEE International Conference on Robotics and Automation. Atlanta, GA, USA, May 19-23, 2025. DOI
Abstract

Ultrasound (US) probe localization relative to the examined subject is essential for freehand 3D US imaging, which offers significant clinical value due to its affordability and unrestricted field of view. However, existing methods often rely on expensive tracking systems or bulky probes, while recent US image-based deep learning methods suffer from accumulated errors during probe maneuvering. To address these challenges, this study proposes a versatile, cost-effective probe pose localization method for freehand 3D US imaging, utilizing two lightweight cameras. To eliminate accumulated errors during US scans, we introduce PoseNet, which directly predicts the probe’s 6D pose relative to a preset world coordinate system based on camera observations. We first jointly train pose and camera image encoders based on pairs of 6D pose and camera observations densely sampled in simulation. This encourages each pair of probe pose and its corresponding camera observation to share the same representation in latent space. To ensure the two encoders handle unseen images and poses effectively, we incorporate a triplet loss that enforces smaller differences in latent features between nearby poses compared to distant ones. Then, the pose decoder uses the latent representation of the camera images to predict the probe’s 6D pose. To bridge the sim-to-real gap, in the real world, we use the trained image encoder and pose decoder for initial predictions, followed by an additional MLP layer to refine the estimated pose, improving accuracy. The results obtained from an arm phantom demonstrate the effectiveness of the proposed method, which notably surpasses state-of-the-art techniques, achieving average positional and rotational errors of 2.03 mm and 0.37°, respectively.
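The triplet objective mentioned in the abstract can be illustrated with a minimal sketch: a hinge loss that pushes the latent feature of a nearby pose closer to the anchor than that of a distant pose. This is a toy NumPy illustration of the general triplet-loss idea, not the authors' implementation; the vectors and margin are made up for demonstration.

```python
import numpy as np

def triplet_latent_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive (nearby-pose) latent must be closer to the
    anchor than the negative (distant-pose) latent by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy latent vectors: the positive already lies closer than the negative,
# so the hinge is inactive and the loss is zero.
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])
loss = triplet_latent_loss(anchor, positive, negative)
```

Swapping the positive and negative arguments activates the hinge and yields a positive loss, which is the gradient signal that pulls nearby poses together in latent space.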


J. Jung, S. Boche, S. B. Laina and S. Leutenegger.
Uncertainty-Aware Visual-Inertial SLAM with Volumetric Occupancy Mapping.
ICRA 2025 - IEEE International Conference on Robotics and Automation. Atlanta, GA, USA, May 19-23, 2025. DOI
Abstract

We propose visual-inertial simultaneous localization and mapping that tightly couples sparse reprojection errors, inertial measurement unit pre-integrals, and relative pose factors with dense volumetric occupancy mapping. Here, depth predictions from a deep neural network are fused in a fully probabilistic manner. Specifically, our method is rigorously uncertainty-aware: first, we use depth and uncertainty predictions from a deep network not only from the robot’s stereo rig, but further probabilistically fuse motion stereo that provides depth information across a range of baselines, drastically increasing mapping accuracy. Next, predicted and fused depth uncertainty propagates not only into occupancy probabilities but also into alignment factors between generated dense submaps that enter the probabilistic nonlinear least squares estimator. This submap representation offers globally consistent geometry at scale. Our method is thoroughly evaluated on two benchmark datasets, resulting in localization and mapping accuracy that exceeds the state of the art, while simultaneously offering volumetric occupancy directly usable for downstream robotic planning and control in real time.
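Probabilistic fusion of several depth estimates with known uncertainties, as described above, can be sketched as inverse-variance weighting (the product of independent Gaussians). This is a generic illustration of the fusion principle, not the paper's pipeline; the numbers are hypothetical.

```python
import numpy as np

def fuse_depths(depths, variances):
    """Fuse independent depth estimates by inverse-variance weighting.
    Returns the fused depth and its (smaller) fused variance."""
    w = 1.0 / np.asarray(variances, dtype=float)
    fused_var = 1.0 / w.sum()
    fused_depth = fused_var * (w * np.asarray(depths, dtype=float)).sum()
    return fused_depth, fused_var

# A less certain stereo-rig depth fused with a more certain
# motion-stereo depth: the result leans toward the confident estimate.
d, v = fuse_depths([2.0, 2.2], [0.04, 0.01])
```

Note that the fused variance is always smaller than the smallest input variance, which is why adding motion-stereo observations across multiple baselines tightens the map rather than diluting it.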

MCML Authors

Stefan Leutenegger

Prof. Dr.

Principal Investigator

* Former Principal Investigator


J. Meier, L. Inchingolo, O. Dhaouadi, Y. Xia, J. Kaiser and D. Cremers.
MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models.
ICRA 2025 - IEEE International Conference on Robotics and Automation. Atlanta, GA, USA, May 19-23, 2025. DOI
Abstract

We tackle the problem of monocular 3D object detection across different sensors, environments, and camera setups. In this paper, we introduce a novel unsupervised domain adaptation approach, MonoCT, that generates highly accurate pseudo labels for self-supervision. Inspired by our observation that accurate depth estimation is critical to mitigating domain shifts, MonoCT introduces a novel Generalized Depth Enhancement (GDE) module with an ensemble concept to improve depth estimation accuracy. Moreover, we introduce a novel Pseudo Label Scoring (PLS) module by exploring inner-model consistency measurement and a Diversity Maximization (DM) strategy to further generate high-quality pseudo labels for self-training. Extensive experiments on six benchmarks show that MonoCT outperforms existing SOTA domain adaptation methods by large margins (~21% minimum for AP Mod.) and generalizes well to car, traffic camera and drone views.
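The idea of scoring pseudo labels by model consistency can be sketched as follows: predictions whose ensemble members agree get a score near 1 and are kept for self-training, while high-spread predictions are discarded. This is a simplified illustration of consistency-based filtering, not the actual PLS module; the function names, threshold, and toy depths are assumptions.

```python
import numpy as np

def consistency_score(ensemble_depths, sigma=1.0):
    """Map the spread of ensemble depth predictions to (0, 1]:
    perfect agreement -> 1.0, large disagreement -> near 0."""
    spread = np.std(ensemble_depths)
    return float(np.exp(-spread / sigma))

def filter_pseudo_labels(labels, ensembles, threshold=0.8):
    """Keep only pseudo labels whose ensemble predictions agree."""
    return [lab for lab, ens in zip(labels, ensembles)
            if consistency_score(ens) >= threshold]

# Toy example: the first detection's ensemble agrees, the second's does not.
kept = filter_pseudo_labels(
    ["car_a", "car_b"],
    [[10.0, 10.1, 9.9], [10.0, 14.0, 6.0]],
)
```

Only the consistent detection survives, which is the mechanism that keeps noisy pseudo labels out of the self-training loop.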


S. Papatheodorou, S. Boche, S. Laina and S. Leutenegger.
Efficient Submap-based Autonomous MAV Exploration using Visual-Inertial SLAM Configurable for LiDARs or Depth Cameras.
ICRA 2025 - IEEE International Conference on Robotics and Automation. Atlanta, GA, USA, May 19-23, 2025. DOI
Abstract

Autonomous exploration of unknown space is an essential component for the deployment of mobile robots in the real world. Safe navigation is crucial for all robotics applications and requires accurate and consistent maps of the robot’s surroundings. To achieve full autonomy and allow deployment in a wide variety of environments, the robot must rely on onboard state estimation which is prone to drift over time. We propose a Micro Aerial Vehicle (MAV) exploration framework based on local submaps to allow retaining global consistency by applying loop-closure corrections to the relative submap poses. To enable large-scale exploration we efficiently compute global, environment-wide frontiers from the local submap frontiers and use a sampling-based next-best-view exploration planner. Our method seamlessly supports using either a LiDAR sensor or a depth camera, making it suitable for different kinds of MAV platforms. We perform comparative evaluations in simulation against a state-of-the-art submap-based exploration framework to showcase the efficiency and reconstruction quality of our approach. Finally, we demonstrate the applicability of our method to real-world MAVs, one equipped with a LiDAR and the other with a depth camera.
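Frontier computation of the kind described above can be sketched on a 2D occupancy grid: a frontier cell is a free cell adjacent to unknown space. This is the textbook frontier definition as a minimal NumPy example, not the paper's submap-based implementation; the grid values and layout are invented for illustration.

```python
import numpy as np

FREE, UNKNOWN, OCCUPIED = 0, 1, 2

def frontier_cells(grid):
    """Return (row, col) coordinates of free cells that border
    unknown space in a 2D occupancy grid."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            # A free cell is a frontier if any 4-neighbor is unknown.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

# Toy map: free corridor on the left, an obstacle, unknown space on the right.
grid = np.array([
    [0, 0, 1],
    [0, 2, 1],
    [0, 0, 1],
])
cells = frontier_cells(grid)
```

In a submap-based system, this local computation would run per submap, with the per-submap frontiers then merged into environment-wide frontiers for the next-best-view planner.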

MCML Authors

Sotiris Papatheodorou

* Former Member

Stefan Leutenegger

Prof. Dr.

Principal Investigator

* Former Principal Investigator




