Clinicians spend significant time reviewing medical images and transcribing findings. By integrating visual and textual data, foundation models have the potential to reduce workloads and boost efficiency, yet their practical clinical value remains uncertain. In this study, we find that OpenAI’s ChatGPT-4o and two medical vision-language models (VLMs) significantly underperform ophthalmologists on key tasks for age-related macular degeneration (AMD). To address this, we developed a dedicated training curriculum, designed by domain specialists, to optimize VLMs for tasks relevant to clinical decision-making. The resulting model, RetinaVLM-Specialist, significantly outperforms foundation medical VLMs and ChatGPT-4o in AMD disease staging (F1: 0.63 vs. 0.33) and referral (0.67 vs. 0.50), achieving performance comparable to that of junior ophthalmologists. In a reader study, two senior ophthalmologists confirmed that RetinaVLM’s reports were substantially more accurate than those written by ChatGPT-4o (64.3% vs. 14.3%). Overall, our curriculum-based approach offers a blueprint for adapting foundation models to real-world medical applications.