08.01.2026
High-Res Images, Less Wait: A Simple Flow for Image Generation
MCML Research Insight - With Johannes Schusterbauer, Pingchuan Ma, Vincent Tao Hu, and Björn Ommer
Image generation models today can create almost anything, like a futuristic city glowing at sunset, a classical painting of your cat, or a realistic spaceship made of glass. But when you ask them to go bigger and sharper, the magic slows down. The process takes longer and eats up more memory, and trying out new versions can feel endless. For a long time, high-resolution AI images have come with an annoying trade-off: do you want speed, or do you want perfection?
In their paper, “Boosting Latent Diffusion with Flow Matching”, MCML Junior Members Johannes Schusterbauer, Pingchuan Ma, and Vincent Tao Hu, together with MCML PI Björn Ommer and collaborators Ming Gui, Nick Stracke, and Stefan A. Baumann, break that trade-off. Their method teaches diffusion models to jump straight from a compact sketch to full high-res detail. The result: crisp, detailed images in a fraction of the time.
©Schusterbauer et al.
Figure 1: Samples synthesized at 1024² px. The method elevates Diffusion Models (DMs) and related architectures to a higher-resolution domain while maintaining exceptionally fast processing speeds. The authors use Latent Consistency Models (LCM) distilled from SD1.5 and SDXL, respectively. To match the resolution of LCM-SDXL, they boost LCM-SD1.5 with their proposed Coupling Flow Matching (CFM) model. The LCM-SDXL baseline fails to produce competitive results within the same short time window, highlighting the effectiveness of CFM in achieving both speed and image quality.
«Our approach uses flow matching to lift low-resolution diffusion outputs directly into high-resolution image space.»
Johannes Schusterbauer et al.
MCML Junior Members
The Intuition and Method Behind the Paper
Their approach balances two roles: a compact “artist” that sets the idea and a fast “elevator” that lifts this idea cleanly to high resolution, so we keep quality without the usual wait:
- Diffusion model (the artist): Think of starting with TV static and slowly revealing a picture by brushing away noise, step by tiny step. That’s what diffusion models do. They’re great at exploring lots of creative possibilities, but the many small steps make them slow, especially when the picture is very large. “Latent” diffusion speeds things up by doing the work in a smaller, hidden workspace (a compressed version of the image), but making really high-resolution images can still take time.
- Flow Matching (the elevator): Now imagine an express elevator that goes straight from “rough idea” to “finished detail” without stopping on every floor. Flow Matching learns that direct route. In their setup, the “elevator” still adds a bit of noise to the low-resolution sample before lifting it, so running it multiple times can give slightly different high-res images. But the path it follows is still short and simple: it only needs a handful of steps to turn the noisy, low-res sketch into a detailed, high-resolution result.
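To make the “elevator” picture concrete, here is a minimal toy sketch of flow matching on plain Python lists. It is not the authors' implementation: the straight-line path, the helper names (`interpolate`, `target_velocity`, `fm_loss`, `euler_sample`, `noisy_start`), the noise level, and the tiny four-value “latents” are all assumptions chosen for illustration. A real model would replace the oracle velocity with a trained network.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it is the same for every t."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(predicted, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)

def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def noisy_start(low_res, noise_std, rng):
    """Add a bit of Gaussian noise to the (already upsampled) low-res sample."""
    return [v + rng.gauss(0.0, noise_std) for v in low_res]

# Toy data (hypothetical values): a rough sample and its detailed counterpart.
rng = random.Random(0)
low_res = [0.0, 1.0, 0.5, -0.5]
high_res = [0.1, 0.9, 0.7, -0.6]

x0 = noisy_start(low_res, noise_std=0.1, rng=rng)

# Oracle velocity standing in for a trained network (assumption).
oracle = lambda x, t: target_velocity(x0, high_res)

# A handful of steps is enough because the path is a straight line.
result = euler_sample(x0, oracle, num_steps=4)
```

Because the toy path is a straight line with constant velocity, four Euler steps already land on the target up to floating-point error. During training, a network would be fitted by minimising `fm_loss` between its prediction at `interpolate(x0, x1, t)` and `target_velocity(x0, x1)` for randomly drawn `t`.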
Results
At 1024×1024 pixels, their approach keeps image quality high while cutting the wait. In head-to-head tests against strong baselines, it delivers similar or better visuals in the same or less time per image. It also scales cleanly to 2048×2048. By stacking the “elevator” step, the system moves from rough to very high-res without the usual slowdown that large images bring.
Compared to standard samplers, it reaches cleaner results with fewer steps, so upgrades from rough to detailed are faster and steadier.
© Schusterbauer et al.
Figure 2: Overview of the proposed Coupling Flow Matching (CFM) approach. (a) During training, both low- and high-resolution images are passed through a pretrained encoder to obtain their respective latent codes. The model then concatenates the low-res latent with a noisy version of itself and learns to regress a vector field within t ∈ [0, 1]. (b) During inference, any Latent Diffusion Model can be used to generate a low-res latent, which the CFM module then transforms into a higher-dimensional latent. Finally, a pretrained decoder projects this latent back to pixel space, producing the high-resolution output.
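The inference path in panel (b), and the stacking of stages mentioned under Results, can be sketched roughly as follows. Everything here is a hedged illustration rather than the released code: `cfm_lift`, `placeholder_velocity`, the 2× scale per stage, the nearest-neighbour upsampling, the noise level, the step count, and the single-channel latent grid are assumptions. The factor of 8 between pixels and latents is the downsampling of Stable Diffusion style autoencoders.

```python
import random

VAE_FACTOR = 8  # pixels per latent cell along one side (SD-style autoencoder)

def upsample_nearest(latent, scale):
    """Nearest-neighbour upsampling of a 2-D grid given as a list of rows."""
    out = []
    for row in latent:
        wide = [v for v in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out

def add_noise(latent, std, rng):
    """Perturb every latent value with Gaussian noise."""
    return [[v + rng.gauss(0.0, std) for v in row] for row in latent]

def placeholder_velocity(x, cond, t):
    """Stand-in for the trained network: pulls x toward the conditioning grid."""
    return [[c - xi for xi, c in zip(xr, cr)] for xr, cr in zip(x, cond)]

def cfm_lift(low_latent, velocity_fn, scale=2, noise_std=0.1, num_steps=4, rng=None):
    """Hypothetical CFM stage: upsample, add noise, integrate a few Euler steps."""
    rng = rng or random.Random(0)
    cond = upsample_nearest(low_latent, scale)   # clean conditioning latent
    x = add_noise(cond, noise_std, rng)          # noisy starting point
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, cond, i * dt)
        x = [[xi + dt * vi for xi, vi in zip(xr, vr)] for xr, vr in zip(x, v)]
    return x

# Pretend a latent diffusion model produced a 64x64 latent (512 px image).
low_latent = [[0.0] * 64 for _ in range(64)]

stage1 = cfm_lift(low_latent, placeholder_velocity)  # 128x128 latent
stage2 = cfm_lift(stage1, placeholder_velocity)      # 256x256 latent

pixels_stage1 = len(stage1) * VAE_FACTOR  # side length after decoding stage 1
pixels_stage2 = len(stage2) * VAE_FACTOR  # side length after decoding stage 2
```

Under these assumptions, one stage takes a 512 px latent to the 1024 px regime and a second stacked stage reaches 2048 px, with only a few integration steps per stage. A pretrained decoder (not shown) would then map the final latent back to pixel space.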
«By combining diffusion models with flow matching, we achieve high-resolution image synthesis that is both fast and computationally efficient.»
Johannes Schusterbauer et al.
MCML Junior Members
Why it Matters
Big images are expensive. This approach keeps the “creative” piece small (the diffusion model) and hands the heavy lifting to a fast, predictable flow in the latent space. It also works with common diffusion speed-ups, so teams can drop the flow matching module into existing systems instead of rebuilding everything.
Challenges Ahead
In this work, CFM acts as a fast, faithful upsampler: it takes the low-resolution diffusion output and lifts it to high resolution instead of adding new variation. Diversity therefore comes mainly from the low-res diffusion model, and balancing speed, faithfulness, and diversity through choices such as the noise level and when to upsample remains a tuning problem.
Interested in More?
Want to try Coupling Flow Matching (CFM) to upscale your diffusion models? Take a look at the paper, presented as an oral at ECCV 2024 (one of the most prestigious international conferences in computer vision), and check out the code and the project page.
FMBoost: Boosting Latent Diffusion with Flow Matching.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. Oral Presentation.