
Specialized Curricula for Training Vision Language Models in Retinal Image Analysis

MCML Authors


Dr. Martin Menten, JRG Leader AI for Vision

Abstract

Clinicians spend significant time reviewing medical images and transcribing findings. By integrating visual and textual data, foundation models have the potential to reduce workloads and boost efficiency, yet their practical clinical value remains uncertain. In this study, we find that OpenAI’s ChatGPT-4o and two medical vision-language models (VLMs) significantly underperform ophthalmologists in key tasks for age-related macular degeneration (AMD). To address this, we developed a dedicated training curriculum, designed by domain specialists, to optimize VLMs for tasks related to clinical decision making. The resulting model, RetinaVLM-Specialist, significantly outperforms foundation medical VLMs and ChatGPT-4o in AMD disease staging (F1: 0.63 vs. 0.33) and referral (0.67 vs. 0.50), achieving performance comparable to junior ophthalmologists. In a reader study, two senior ophthalmologists confirmed that RetinaVLM’s reports were substantially more accurate than those written by ChatGPT-4o (64.3% vs. 14.3%). Overall, our curriculum-based approach offers a blueprint for adapting foundation models to real-world medical applications.

Article

npj Digital Medicine 8.532. Aug. 2025.
Top Journal

Authors

R. Holland • T. R. P. Taylor • C. Holmes • S. Riedl • J. Mai • M. Patsiamanidi • D. Mitsopoulou • P. Hager • P. Müller • J. C. Paetzold • H. P. N. Scholl • H. Bogunović • U. Schmidt-Erfurth • D. Rückert • S. Sivaprasad • A. J. Lotery • M. J. Menten • On behalf of the PINNACLE consortium


Research Area

 C1 | Medicine

BibTeXKey: HTH+25
