
Research Group Almut Sophia Koepke



Dr. Almut Sophia Koepke

JRG Leader Multi-Modal Learning

Almut Sophia Koepke leads the MCML Junior Research Group ‘Multi-Modal Learning’ at TU Munich.

She and her team research multi-modal learning from vision, sound, and text. They focus on advancing video understanding, with an emphasis on capturing temporal dynamics and cross-modal relationships. To this end, they aim to improve how information from different modalities is combined within learning frameworks. They are also exploring how to adapt large pre-trained models to audio-visual understanding tasks. Funded as a BMBF project, the group explores research areas beyond MCML's current focus while maintaining close collaboration with MCML.

Team members @MCML

PhD Students


Jianzhe Liu


Daniil Zverev

Recent News @MCML


02.06.2026

MCML at CVPR 2026

28 Accepted Papers (26 Main and 2 Workshops)


17.10.2025

MCML at ICCV 2025

29 Accepted Papers (23 Main and 6 Workshops)


10.06.2025

MCML at CVPR 2025

35 Accepted Papers (29 Main and 6 Workshops)


27.02.2025

MCML at WACV 2025

Eight Accepted Papers

Publications @MCML

2026


[8] A* Conference
A. Harrington • A. S. Koepke • S. Karthik • T. Darrell • A. A. Efros
It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models.
CVPR 2026 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Denver, CO, USA, Jun 03-07, 2026. To be published. Preprint available. arXiv GitHub

[7]
A. S. Koepke • D. Zverev • S. Ginosar • A. A. Efros
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale.
Preprint (Apr. 2026). arXiv GitHub

2025


[6]
D. Zverev • A. S. Koepke • J. F. Henriques
On the Dangers of Bootstrapping Generation for Continual Learning and Beyond.
GCPR 2025 - German Conference on Pattern Recognition. Freiburg, Germany, Oct 23-26, 2025. DOI

[5] A* Conference
D. Zverev • T. Wiedemer • A. Prabhu • M. Bethge • W. Brendel • A. S. Koepke
VGGSounder: Audio-Visual Evaluations for Foundation Models.
ICCV 2025 - IEEE/CVF International Conference on Computer Vision. Honolulu, Hawai’i, Oct 19-23, 2025. DOI GitHub

[4]
S. Chen • J. Liu • Z. Han • Y. Xia • D. Cremers • P. Torr • V. Tresp • J. Gu
True Multimodal In-Context Learning Needs Attention to the Visual Context.
COLM 2025 - Conference on Language Modeling. Montreal, Canada, Oct 07-09, 2025. URL GitHub

[3]
D. Zverev • T. Wiedemer • A. Prabhu • M. Bethge • W. Brendel • A. S. Koepke
VGGSounder: Audio-Visual Evaluations for Foundation Models.
Sight and Sound @CVPR 2025 - Workshop Sight and Sound at the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. PDF

[2] A Conference
S. Chen • Z. Han • B. He • J. Liu • M. Buckley • Y. Qin • P. Torr • V. Tresp • J. Gu
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI URL

2024


[1]
H. Zhang • J. Liu • Z. Han • S. Chen • B. He • V. Tresp • Z. Xu • J. Gu
Visual Question Decomposition on Multimodal Large Language Models.
Findings @EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI
