DisMo: Disentangled Motion Representations for Open-World Motion Transfer

MCML Authors

Björn Ommer

Prof. Dr.

Principal Investigator

Abstract

Recent advances in text-to-video (T2V) and image-to-video (I2V) models have enabled the creation of visually compelling and dynamic videos from simple textual descriptions or initial frames. However, these models often fail to provide an explicit representation of motion separate from content, limiting their applicability for content creators. To address this gap, we propose DisMo, a novel paradigm for learning abstract motion representations directly from raw video data via an image-space reconstruction objective. Our representation is generic and independent of static information such as appearance, object identity, or pose. This enables open-world motion transfer, allowing motion to be transferred across semantically unrelated entities without requiring object correspondences, even between vastly different categories. Unlike prior methods, which trade off motion fidelity against prompt adherence and either overfit to source structure or drift from the described action, our approach disentangles motion semantics from appearance, enabling accurate transfer and faithful conditioning. Furthermore, our motion representation can be combined with any existing video generator via lightweight adapters, allowing us to effortlessly benefit from future advancements in video models. We demonstrate the effectiveness of our method through a diverse set of motion transfer tasks. Finally, we show that the learned representations are well-suited for downstream motion understanding tasks, consistently outperforming state-of-the-art video representation models such as V-JEPA in zero-shot action classification on benchmarks including Something-Something v2 and Jester.

inproceedings RFA+25


NeurIPS 2025

39th Conference on Neural Information Processing Systems. San Diego, CA, USA, Nov 30-Dec 07, 2025. Spotlight Presentation. To be published. Preprint available.
A* Conference

Authors

T. Ressler-Antal • F. Fundel • M. B. Alaya • S. A. Baumann • F. Krause • M. Gui • B. Ommer

Links

URL

Research Area

 B1 | Computer Vision

BibTeXKey: RFA+25
