TwoSquared: 4D Generation From 2D Image Pairs

Abstract

Recovering 4D motion from sparse visual information, such as two temporal frames of a subject, is a significant challenge. While humans can hallucinate the missing information in a plausible way, generative AI struggles due to a lack of high-quality training data and heavy computing requirements. To overcome these limitations, we propose TwoSquared, a method that obtains a plausible 4D sequence from just two 2D RGB images corresponding to the beginning and the end of the action. We decompose the problem into two steps: 1) obtaining a 3D reconstruction of the initial and final states, and 2) modeling the intermediate sequence as a physically plausible deformation. Our method does not require templates or class-specific prior knowledge and can operate on arbitrary in-the-wild examples. We demonstrate its capabilities on a range of objects that are diverse in nature, class, and deformation, surpassing video-based alternatives, which cannot achieve the same level of consistency.
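To make the second step concrete, the sketch below shows the simplest possible way to fill in frames between two 3D shapes: straight-line blending of point positions. It is not the TwoSquared deformation model, which the abstract describes only as physically plausible. The function name `interpolate_shapes` is hypothetical, and the sketch assumes the two reconstructions already exist as point sets with the same number of points in corresponding order, which the abstract does not state.

```python
import numpy as np


def interpolate_shapes(start, end, num_frames):
    """Linearly blend two corresponded point sets into a sequence of frames.

    Naive stand-in for a deformation model (an assumption for illustration,
    not the method from the paper).

    start, end: arrays of shape (N, 3), where row i of `start` corresponds
                to row i of `end`.
    num_frames: number of frames to return, including both endpoints.
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    if start.shape != end.shape:
        raise ValueError("start and end must have the same shape")
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")

    # Interpolation weights from 0 (initial state) to 1 (final state).
    ts = np.linspace(0.0, 1.0, num_frames)
    # Each frame moves every point along a straight line between its
    # initial and final position.
    return np.stack([(1.0 - t) * start + t * end for t in ts])


if __name__ == "__main__":
    initial = np.zeros((4, 3))
    final = np.ones((4, 3))
    frames = interpolate_shapes(initial, final, num_frames=5)
    print(frames.shape)
```

Straight-line blending ignores physics entirely: it can shrink or distort a shape that rotates between the two frames, which is the kind of artifact a physically plausible deformation is meant to avoid.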

inproceedings SCC+26a


3DV 2026

13th International Conference on 3D Vision. Vancouver, Canada, Mar 20-23, 2026. To be published. Preprint available.

Authors

L. Sang • Z. Canfes • D. Cao • R. Marin • F. Bernard • D. Cremers

Research Areas

 B1 | Computer Vision

 B3 | Multimodal Perception

BibTeXKey: SCC+26a
