Motion2VecSets: Non-Rigid Shape Reconstruction and Tracking With 4D Latent Set Diffusion

MCML Authors

Prof. Dr. Matthias Nießner | Core PI | Visual Computing & Artificial Intelligence

Abstract

We introduce Motion2VecSets, a 4D diffusion model for dynamic surface mesh generation from various ambiguous observations, including a sequence of RGB images, sparse and partial point clouds, and low-resolution voxel grids. While recent methods using neural field representations have shown success in modeling non-rigid objects, conventional feed-forward architectures struggle with noisy, partial, or sparse observations due to their deterministic nature. To address the inherent one-to-many mapping problem, we introduce a diffusion model that explicitly learns the shape and motion distribution of non-rigid objects through an iterative denoising process of compressed latent representations. The diffusion-based priors provide more plausible and diverse reconstructions under ambiguous conditions. Instead of relying on global latent codes, we represent 4D dynamics using latent sets. This novel 4D representation captures local shape and deformation patterns, leading to more accurate non-linear motion capture and significantly improving generalization capacity to unseen motions and identities. For temporally coherent tracking, we jointly denoise latent sets across frames and enable cross-frame information exchange. To reduce computational cost, we design an interleaved spatial-temporal attention block that alternately aggregates deformation latents along spatial and temporal dimensions. Extensive experiments on datasets of humans, animals, and articulated objects demonstrate that Motion2VecSets outperforms prior methods in reconstructing and tracking non-rigid deformations from various imperfect observations.
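The interleaved spatial-temporal attention idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a latent set stored as `latents[t][m]` (T frames, M latent vectors per frame), uses weight-free single-head attention with residual connections instead of learned projections, and all function names (`self_attention`, `interleaved_block`, `attention_pairs`) are hypothetical. It only shows how alternating attention within a frame and across frames differs from attending jointly over all T·M latents.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Weight-free single-head self-attention over a list of vectors.

    Assumption for illustration: queries, keys and values are the tokens
    themselves (no learned projection matrices).
    """
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        weights = softmax(
            [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        )
        out.append(
            [sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(len(q))]
        )
    return out

def add(xs, ys):
    # Element-wise residual addition of two lists of vectors.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def interleaved_block(latents):
    """One spatial pass followed by one temporal pass.

    latents[t][m] is a C-dimensional vector: T frames, M latents per frame.
    The input is not modified; a new nested list is returned.
    """
    T, M = len(latents), len(latents[0])
    # Spatial pass: latents within the same frame attend to each other.
    x = [add(frame, self_attention(frame)) for frame in latents]
    # Temporal pass: the m-th latent attends to its counterparts in all frames.
    for m in range(M):
        track = [x[t][m] for t in range(T)]
        updated = add(track, self_attention(track))
        for t in range(T):
            x[t][m] = updated[t]
    return x

def attention_pairs(T, M):
    """Query-key pairs scored per block: (interleaved, full joint attention)."""
    return T * M * M + M * T * T, (T * M) ** 2
```

Under these assumptions, the interleaved scheme scores T·M² + M·T² query-key pairs per block, whereas joint attention over all latents of all frames scores (T·M)² pairs; `attention_pairs(T, M)` returns both counts so the gap can be checked for a given sequence length and latent set size.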

article TCZ+26

IEEE Transactions on Pattern Analysis and Machine Intelligence

Apr. 2026.

Authors

J. Tang • W. Cao • B. Zhang • C. Luo • Y. Liu • M. Nießner

Research Area

B1 | Computer Vision

BibTeXKey: TCZ+26