
FMBoost: Boosting Latent Diffusion With Flow Matching

MCML Authors


Prof. Dr. Björn Ommer, Principal Investigator

Abstract

Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which latent diffusion only partially alleviates. Flow matching is an appealing complement because its characteristics are the opposite: faster training and inference, but less diverse synthesis. We demonstrate our FMBoost approach, which introduces flow matching between a frozen diffusion model and a convolutional decoder, enabling high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping the small latent space to a high-dimensional one, producing high-resolution images. By combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, FMBoost achieves state-of-the-art high-resolution image synthesis at 1024² pixels with minimal computational cost. Cascading FMBoost optionally boosts this further to 2048² pixels. Importantly, this approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easy to integrate into various diffusion model frameworks.
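To make the latent-to-latent flow matching idea more concrete, here is a minimal NumPy sketch of a generic conditional flow matching setup with a straight-line path (as in rectified flow). It is not the FMBoost implementation. All names, shapes, and the choice of nearest-neighbour upsampling are hypothetical assumptions made for illustration: the small diffusion latent is upsampled so that both endpoints share a shape, a network would be trained to regress the velocity `x1 - x0` along the path, and sampling integrates that velocity with Euler steps.

```python
import numpy as np


def upsample_nearest(z, factor):
    # Hypothetical choice: nearest-neighbour upsampling of a (C, H, W) latent
    # to (C, H * factor, W * factor) so both flow endpoints share a shape.
    return z.repeat(factor, axis=1).repeat(factor, axis=2)


def flow_matching_pair(x0, x1, t):
    # Straight-line path x_t = (1 - t) * x0 + t * x1; its constant
    # time derivative u = x1 - x0 is the regression target for the network.
    xt = (1.0 - t) * x0 + t * x1
    u = x1 - x0
    return xt, u


def fm_loss(v_pred, u):
    # Mean squared error between predicted and target velocities.
    return float(np.mean((v_pred - u) ** 2))


def euler_sample(velocity_fn, x0, num_steps):
    # Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps.
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_small = rng.standard_normal((4, 32, 32))   # stand-in for a small diffusion latent
    x0 = upsample_nearest(z_small, 4)            # start of the flow, shape (4, 128, 128)
    x1 = rng.standard_normal((4, 128, 128))      # stand-in for a high-resolution latent
    t = rng.uniform()
    xt, u = flow_matching_pair(x0, x1, t)
    # With an oracle velocity, Euler integration lands exactly on x1.
    out = euler_sample(lambda x, t: x1 - x0, x0, num_steps=10)
    print(np.allclose(out, x1))
```

In a trained system the oracle lambda would be replaced by a learned network conditioned on `x_t` and `t`. Because the straight-line target velocity is constant along each path, few integration steps can suffice, which is one reason flow matching is attractive for the cheap upscaling stage described above.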

inproceedings


ECCV 2024

18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. Oral Presentation.
A* Conference

Authors

J. S. Fischer • M. Gui • P. Ma • N. Stracke • S. A. Baumann • B. Ommer


Research Area

 B1 | Computer Vision

BibTeXKey: FGM+24
