
Research Group Almut Sophia Koepke

Dr. Almut Sophia Koepke

JRG Leader Multi-Modal Learning

Almut Sophia Koepke leads the MCML Junior Research Group ‘Multi-Modal Learning’ at TU Munich.

She and her team conduct research into multi-modal learning from vision, sound, and text. They focus on advancing video understanding, with an emphasis on capturing temporal dynamics and cross-modal relationships. To this end, they aim to improve how information from different modalities is combined within learning frameworks. They are also exploring how to adapt large pre-trained models to audio-visual understanding tasks. Funded as a BMBF project, the group explores research areas beyond MCML’s current focus while maintaining a close collaboration with MCML.

Team members @MCML

PhD Students

Jianzhe Liu

Daniil Zverev

Recent News @MCML

17.10.2025

MCML at ICCV 2025

28 Accepted Papers (22 Main and 6 Workshops)

10.06.2025

MCML at CVPR 2025

35 Accepted Papers (29 Main and 6 Workshops)

27.02.2025

MCML at WACV 2025

8 Accepted Papers

11.11.2024

MCML at EMNLP 2024

22 Accepted Papers (6 Main, 14 Findings, and 2 Workshops)

Publications @MCML

2025


[6]
D. Zverev • A. S. Koepke • J. F. Henriques
On the Dangers of Bootstrapping Generation for Continual Learning and Beyond.
GCPR 2025 - German Conference on Pattern Recognition. Freiburg, Germany, Oct 23-26, 2025. DOI

[5] A* Conference
D. Zverev • T. Wiedemer • A. Prabhu • M. Bethge • W. Brendel • A. S. Koepke
VGGSounder: Audio-Visual Evaluations for Foundation Models.
ICCV 2025 - IEEE/CVF International Conference on Computer Vision. Honolulu, Hawai’i, Oct 19-23, 2025. To be published. Preprint available. URL

[4]
S. Chen • J. Liu • Z. Han • Y. Xia • D. Cremers • P. Torr • V. Tresp • J. Gu
True Multimodal In-Context Learning Needs Attention to the Visual Context.
COLM 2025 - Conference on Language Modeling. Montreal, Canada, Oct 07-09, 2025. URL GitHub

[3]
D. Zverev • T. Wiedemer • A. Prabhu • M. Bethge • W. Brendel • A. S. Koepke
VGGSounder: Audio-Visual Evaluations for Foundation Models.
Sight and Sound @CVPR 2025 - Workshop Sight and Sound at IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. PDF

[2] A Conference
S. Chen • Z. Han • B. He • J. Liu • M. Buckley • Y. Qin • P. Torr • V. Tresp • J. Gu
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI URL

2024


[1]
H. Zhang • J. Liu • Z. Han • S. Chen • B. He • V. Tresp • Z. Xu • J. Gu
Visual Question Decomposition on Multimodal Large Language Models.
Findings @EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI