
Research Group Almut Sophia Koepke

Dr. Almut Sophia Koepke

JRG Leader Multi-Modal Learning

Computer Vision & Artificial Intelligence

Almut Sophia Koepke leads the MCML Junior Research Group ‘Multi-Modal Learning’ at TU Munich.

She and her team conduct research into multi-modal learning from vision, sound, and text. They focus on advancing video understanding, with an emphasis on capturing temporal dynamics and cross-modal relationships. To achieve this, they aim to improve how information from different modalities is combined within learning frameworks. They also explore how to adapt large pre-trained models for audio-visual understanding tasks. Funded as a BMBF project, the group explores research areas that go beyond MCML's current focus while maintaining close collaboration with MCML.

Team members @MCML

PhD Students

Jianzhe Liu

Computer Vision & Artificial Intelligence

Daniil Zverev

Computer Vision & Artificial Intelligence

Publications @MCML

2026

[8]

A. Harrington • A. S. Koepke • S. Karthik • T. Darrell • A. A. Efros
It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models.
CVPR 2026 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Denver, CO, USA, Jun 03-07, 2026. To be published. Preprint available. arXiv GitHub

[7]

A. S. Koepke • D. Zverev • S. Ginosar • A. A. Efros
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale.
Preprint (Apr. 2026). arXiv GitHub

2025

[6]

D. Zverev • A. S. Koepke • J. F. Henriques
On the Dangers of Bootstrapping Generation for Continual Learning and Beyond.
GCPR 2025 - German Conference on Pattern Recognition. Freiburg, Germany, Oct 23-26, 2025. DOI

[5]

D. Zverev • T. Wiedemer • A. Prabhu • M. Bethge • W. Brendel • A. S. Koepke
VGGSounder: Audio-Visual Evaluations for Foundation Models.
ICCV 2025 - IEEE/CVF International Conference on Computer Vision. Honolulu, Hawai’i, Oct 19-23, 2025. DOI GitHub

[4]

S. Chen • J. Liu • Z. Han • Y. Xia • D. Cremers • P. Torr • V. Tresp • J. Gu
True Multimodal In-Context Learning Needs Attention to the Visual Context.
COLM 2025 - Conference on Language Modeling. Montreal, Canada, Oct 07-09, 2025. URL GitHub

[3]

D. Zverev • T. Wiedemer • A. Prabhu • M. Bethge • W. Brendel • A. S. Koepke
VGGSounder: Audio-Visual Evaluations for Foundation Models.
Sight and Sound @CVPR 2025 - Workshop Sight and Sound at the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. PDF

[2]

S. Chen • Z. Han • B. He • J. Liu • M. Buckley • Y. Qin • P. Torr • V. Tresp • J. Gu
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI URL

2024

[1]

H. Zhang • J. Liu • Z. Han • S. Chen • B. He • V. Tresp • Z. Xu • J. Gu
Visual Question Decomposition on Multimodal Large Language Models.
Findings @EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

2024-12-27 - Last modified: 2026-06-02

Recent News @MCML

MCML at CVPR 2026

Almut Sophia Koepke Awarded Gauss AI Compute Grant for Omni-Modal AI Research

MCML at ICCV 2025

MCML at CVPR 2025
