Research Group Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision

Björn Ommer heads the Computer Vision & Learning Group at LMU Munich.

His research interests include all aspects of semantic image and video understanding based on (deep) machine learning. His special focus is on generative approaches for visual synthesis (e.g. Stable Diffusion), invertible deep models for explainable AI, deep metric and representation learning, and self-supervised learning paradigms and their interdisciplinary applications in the digital humanities and neurosciences.

Team members @MCML

Olga Grebenkova

Machine Vision & Learning

B1 | Computer Vision

Tao Hu

Dr.

Machine Vision & Learning

B1 | Computer Vision

Pingchuan Ma

Machine Vision & Learning

B1 | Computer Vision

Publications @MCML

[7]
J. S. Fischer, M. Gui, P. Ma, N. Stracke, S. A. Baumann and B. Ommer.
FMBoost: Boosting Latent Diffusion with Flow Matching.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
Abstract

Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which is only partially alleviated by latent diffusion. To this end, flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate our FMBoost approach, which introduces flow matching between a frozen diffusion model and a convolutional decoder that enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping the small to a high-dimensional latent space, producing high-resolution images. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, state-of-the-art high-resolution image synthesis is achieved at 1024² pixels with minimal computational cost. Cascading FMBoost optionally boosts this further to 2048² pixels. Importantly, this approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easily integrable into the various diffusion model frameworks.
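
The flow matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the FMBoost implementation: the abstract does not state the probability path or the solver, so the straight-line path, the fixed-step Euler integrator, the 4-dimensional toy vectors, and all function names below are assumptions chosen for illustration. The sketch shows the regression target a velocity model would be trained on and how a sample is transported from a source point to a target point.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity field along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    pred = predict_velocity(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical toy stand-ins: x0 plays the role of a coarse latent,
# x1 the role of the corresponding high-resolution latent.
random.seed(0)
x0 = [random.gauss(0.0, 1.0) for _ in range(4)]
x1 = [random.gauss(0.0, 1.0) for _ in range(4)]

# Oracle velocity field for this single pair; a trained network would replace it.
def oracle(x, t):
    return target_velocity(x0, x1)

loss = fm_loss(oracle, x0, x1, t=0.3)         # zero for the oracle
sample = euler_sample(oracle, x0, steps=10)   # ends (up to rounding) at x1
```

With a learned velocity model in place of the oracle, the same loss would be averaged over many pairs and random times, and the same integrator would be used at inference.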

MCML Authors
Pingchuan Ma

Machine Vision & Learning

B1 | Computer Vision

Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision


[6]
V. T. Hu, S. A. Baumann, M. Gui, O. Grebenkova, P. Ma, J. S. Fischer and B. Ommer.
ZigMa: A DiT-style Zigzag Mamba Diffusion Model.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Olga Grebenkova

Machine Vision & Learning

B1 | Computer Vision

Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision


[5]
D. Kotovenko, O. Grebenkova, N. Sarafianos, A. Paliwal, P. Ma, O. Poursaeed, S. Mohan, Y. Fan, Y. Li, R. Ranjan and B. Ommer.
WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Olga Grebenkova

Machine Vision & Learning

B1 | Computer Vision

Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision


[4]
N. Stracke, S. A. Baumann, J. Susskind, M. A. Bautista and B. Ommer.
CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control and Altering of T2I Models.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision


[3]
A. Farshad, Y. Yeganeh, Y. Chi, C. Shen, B. Ommer and N. Navab.
SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis.
Workshops at the IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Junior Representative

C1 | Medicine

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

C1 | Medicine

Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

C1 | Medicine


[2]
D. Kotovenko, P. Ma, T. Milbich and B. Ommer.
Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision


[1]
A. Blattmann, R. Rombach, K. Oktay and B. Ommer.
Retrieval-Augmented Diffusion Models.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Björn Ommer

Prof. Dr.

Machine Vision & Learning

B1 | Computer Vision