
Generative 6D Pose Estimation via Conditional Flow Matching


Abstract

Existing methods for instance-level 6D pose estimation typically rely on neural networks that either directly regress the pose in SE(3) or estimate it indirectly via local feature matching. The former struggle with object symmetries, while the latter fail in the absence of distinctive local features. To overcome these limitations, we propose a novel formulation of 6D pose estimation as a conditional flow matching problem in ℝ³. We introduce Flose, a generative method that infers object poses via a denoising process conditioned on local features. While prior approaches based on conditional flow matching perform denoising solely based on geometric guidance, Flose integrates appearance-based semantic features to mitigate ambiguities caused by object symmetries. We further incorporate RANSAC-based registration to handle outliers. We validate Flose on five datasets from the established BOP benchmark. Flose outperforms prior methods with an average improvement of +4.5 Average Recall.
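The abstract names two generic building blocks: denoising points in ℝ³ by integrating a flow-matching velocity field, and RANSAC-based rigid registration that rejects outliers before the pose in SE(3) is read off. The NumPy sketch below illustrates both under stated assumptions; it is not Flose's implementation. The learned, feature-conditioned velocity network is replaced by a hypothetical `oracle_velocity` that already knows the target points, the linear interpolation path and explicit Euler integrator are common choices assumed here, and `ransac_pose` (3-point Kabsch hypotheses, a 1 cm inlier threshold, 200 iterations) is an illustrative registration routine, not the paper's configuration.

```python
import numpy as np

def interpolate(x0, x1, t):
    # Linear probability path assumed here: x_t = (1 - t) * x0 + t * x1.
    return (1.0 - t) * x0 + t * x1

def cfm_target(x0, x1):
    # Conditional flow matching regression target along the linear path.
    return x1 - x0

def euler_integrate(velocity_fn, x0, steps=10):
    # Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps.
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def kabsch(src, dst):
    # Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t.
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(3)
    if np.linalg.det(Vt.T @ U.T) < 0:
        D[2, 2] = -1.0  # avoid returning a reflection
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def ransac_pose(src, dst, iters=200, thresh=0.01, seed=0):
    # Hypothetical RANSAC loop: fit on random 3-point samples, keep the
    # hypothesis with the most inliers, then refit on all of its inliers.
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("too few inliers to estimate a pose")
    R, t = kabsch(src[best], dst[best])
    return R, t, best

# --- Synthetic demo (all data below is made up for illustration) ---
rng = np.random.default_rng(42)
model = rng.uniform(-0.1, 0.1, size=(100, 3))  # object model points, meters
theta = np.deg2rad(30.0)
R_gt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta), np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
t_gt = np.array([0.05, -0.02, 0.5])
target = model @ R_gt.T + t_gt  # model points placed in the camera frame

def oracle_velocity(x, t):
    # Stand-in for a learned network: points straight at the known target.
    return (target - x) / (1.0 - t)

x0 = rng.normal(size=target.shape)  # Gaussian noise sample
denoised = euler_integrate(oracle_velocity, x0, steps=10)

# Corrupt the first 20 points to simulate outlier predictions.
denoised[:20] += rng.normal(scale=0.2, size=(20, 3))
R_est, t_est, inliers = ransac_pose(model, denoised)
```

On the linear path, the oracle velocity evaluated at x_t equals the conditional target x1 − x0, which is the quantity a flow-matching network is trained to regress; at inference a learned, conditioned network would take the oracle's place. RANSAC should then fit the pose on the 80 uncorrupted correspondences and leave the 20 perturbed ones out of the final estimate.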



Preprint

Feb. 2026

Authors

A. Hamza • D. Boscaini • W. Li • B. Busam • F. Poiesi

Links

arXiv • GitHub

Research Areas

 B1 | Computer Vision

 C1 | Medicine

BibTeXKey: HBL+26
