
Independent Benchmarking of Prompt-Based Medical Segmentation Models

MCML Authors

Abstract

Medical image segmentation is rapidly shifting toward vision(-language) foundation models that unify diverse modalities and tasks within a single framework. In this work, we systematically benchmark high-impact vision-language and segment-anything-based architectures across multiple clinically relevant CT and MRI tasks. We show that while these models achieve strong performance, each comes with specific advantages and disadvantages. Non-3D models are highly flexible but require substantial user guidance and are prone to over- or under-detection. 3D architectures offer more reliable volumetric consistency overall, but can still suffer from detection problems. Vision-language models appear sensitive to the coverage of their training data, whereas click-prompted SAM-based models are more universal, with a limited ability to address zero-shot targets. When tested with more complex text prompts, most vision-language models exhibit a lack of semantic language understanding. Overall, these models hold considerable promise but still exhibit limitations. Our work highlights key areas where future research is needed to advance vision(-language) foundation models.

Preprint

Oct. 2025

Authors

A. C. Erdur • D. Scholz • J. A. Buchner • D. Bernhardt • S. E. Combs • B. Wiestler • D. Rückert • J. C. Peeken

Links

DOI

Research Area

C1 | Medicine

BibTeX Key: ESB+25
