In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Open-world recognition requires such systems to remain robust as ever-emerging, previously unknown categories appear and must be handled without retraining. Foundation and vision-language models are pre-trained on large and diverse datasets with the expectation of broad generalisation across domains, including medical imaging. However, benchmarking these models on test sets with only a few common outlier types silently collapses the evaluation back to a closed-set problem, masking failures on rare or truly novel conditions encountered in clinical use. We therefore present NOVA, a challenging, real-life, evaluation-only benchmark of 900 brain MRI scans spanning 281 rare pathologies and heterogeneous acquisition protocols. Each case includes rich clinical narratives and double-blinded expert bounding-box annotations, together enabling joint assessment of anomaly localisation, visual captioning, and diagnostic reasoning. Because NOVA is never used for training, it serves as an extreme stress test of out-of-distribution generalisation: models must bridge a distribution gap in both sample appearance and semantic space. Baseline results with leading vision-language models (GPT-4o, Gemini 2.0 Flash, and Qwen2.5-VL-72B) reveal substantial performance drops: approximately a 65% gap in localisation compared to natural-image benchmarks, and gaps of 40% and 20% in captioning and reasoning, respectively, compared to resident radiologists. NOVA thus establishes a testbed for advancing models that can detect, localise, and reason about truly unknown anomalies.
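As a rough illustration of how anomaly localisation could be scored against the double-blinded expert bounding boxes, the minimal sketch below computes an IoU-based hit rate per case. The box format, function names, and the 0.5 IoU threshold are illustrative assumptions, not the benchmark's official evaluation protocol.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localisation_hit_rate(predictions, ground_truths, threshold=0.5):
    """Fraction of cases where a predicted box matches any expert box at IoU >= threshold.

    predictions: one predicted box per case; ground_truths: list of expert boxes per case.
    """
    hits = sum(
        1
        for pred, expert_boxes in zip(predictions, ground_truths)
        if any(iou(pred, gt) >= threshold for gt in expert_boxes)
    )
    return hits / len(predictions)


# Hypothetical usage: one predicted box per scan, scored against expert annotations.
preds = [(30, 40, 80, 95)]
experts = [[(28, 42, 78, 90), (120, 130, 160, 170)]]
print(localisation_hit_rate(preds, experts))  # 1.0 for this toy case
```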