
Independent Benchmarking of Prompt-Based Medical Segmentation Models

MCML Authors

Abstract

Medical image segmentation is rapidly shifting toward vision(-language) foundation models that unify diverse modalities and tasks within a single framework. In this work, we systematically benchmark high-impact vision-language and segment-anything-based architectures across multiple clinically relevant CT and MRI tasks. We show that while these models achieve strong performance, each comes with specific advantages and disadvantages. Non-3D models are highly flexible but require substantial user guidance and are prone to over- or under-detection. 3D architectures offer more reliable volumetric consistency overall, but can still suffer from detection problems. Vision-language models appear sensitive to the coverage of their training data, whereas click-prompted SAM-based models are more universal, with a limited ability to address zero-shot targets. When tested with more complex text prompts, most vision-language models exhibit a lack of semantic language understanding. Overall, these models hold considerable promise but still exhibit limitations. Our work highlights key areas where future research is needed to advance vision(-language) foundation models.

Preprint

Oct. 2025

Authors

A. C. Erdur • D. Scholz • J. A. Buchner • D. Bernhardt • S. E. Combs • B. Wiestler • D. Rückert • J. C. Peeken

Links

DOI

Research Area

C1 | Medicine

BibTeX Key: ESB+25
