
AutoPET Challenge on Fully Automated Lesion Segmentation in Oncologic PET/CT Imaging, Part 2: Domain Generalization

MCML Authors

Jakob Dexl

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Katharina Jeblick

Dr.

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Andreas Mittermeier

Dr.

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Theresa Stüber

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Balthasar Schachtner

Dr.

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Johanna Topalis

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Michael Ingrisch

Prof. Dr.

Principal Investigator

Clinical Data Science in Radiology

Abstract

This article reports the results of the second iteration of the autoPET challenge on automated lesion segmentation in whole-body PET/CT, held in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention in 2023. In contrast to the first autoPET challenge, which served as a proof of concept, this study investigates whether machine learning-based segmentation models trained on data from a single source can maintain performance across clinically relevant variations in PET/CT data, reflecting the demands of real-world deployment.

Methods: A comprehensive biomedical segmentation challenge on PET/CT domain generalization was designed and conducted. Participants were tasked with training machine learning models on annotated whole-body 18F-FDG data (n = 1,014). These models were then evaluated on a test set of 200 samples from 5 clinically relevant domains, covering variations in institution, pathology, and population as well as a different tracer. Performance was measured in terms of average Dice similarity coefficient, average false-positive volume, and average false-negative volume. Awards were given to the best-performing teams in 3 categories. Furthermore, a detailed analysis was conducted after the challenge, examining results across domains and unique instances, along with a ranking analysis.

Results: Generalization from a single-source domain remains a significant challenge. Seventeen international teams successfully participated in the challenge. The best-performing team reached an average Dice similarity coefficient of 0.5038, a mean false-positive volume of 87.8388 mL, and a mean false-negative volume of 8.4154 mL on the test set. nnU-Net was the most commonly used framework, with most participants using a 3-dimensional U-Net. Despite competitive in-domain results, out-of-domain performance deteriorated substantially, particularly on pediatric and prostate-specific membrane antigen data. Detailed error analysis revealed frequent false positives due to physiologic uptake and decreased sensitivity in detecting small or low-uptake lesions. A majority-vote ensemble offered minimal performance gains, whereas an oracle ensemble indicated hypothetical gains. Ranking analysis showed that no single team consistently outperformed all others across ranking schemes.

Conclusion: The second autoPET challenge provides a comprehensive evaluation of the current state of automated PET/CT tumor segmentation, highlighting both progress and persistent challenges of single-source domain generalization and the need for diverse public datasets to enhance algorithm robustness.
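To make the three reported metrics concrete, the following is a minimal sketch of how they could be computed for a single case. The abstract does not define the volume metrics, so the code assumes a component-based reading: the false-positive volume is the volume of predicted connected components that share no voxel with the reference annotation, and the false-negative volume is the volume of reference components that share no voxel with the prediction. Masks are represented as sets of (z, y, x) voxel coordinates, and the function names, the 6-connectivity, and the `voxel_ml` parameter are illustrative choices, not taken from the challenge code.

```python
from collections import deque

# 6-connected neighbourhood in 3D. This connectivity is an assumption;
# an actual evaluation pipeline may use a different one.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(pred, truth):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention chosen here for two empty masks
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def components(mask):
    """Split a voxel set into connected components via breadth-first search."""
    remaining = set(mask)
    found = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        found.append(component)
    return found


def unmatched_volume(source, reference, voxel_ml):
    """Volume in mL of components of `source` that share no voxel with `reference`."""
    voxels = sum(len(c) for c in components(source) if not (c & reference))
    return voxels * voxel_ml


def false_positive_volume(pred, truth, voxel_ml):
    """Predicted components with no overlap with the reference annotation."""
    return unmatched_volume(pred, truth, voxel_ml)


def false_negative_volume(pred, truth, voxel_ml):
    """Reference components that the prediction misses entirely."""
    return unmatched_volume(truth, pred, voxel_ml)


# Toy case: one partially detected lesion, one missed lesion, one spurious finding.
truth = {(0, 0, 0), (0, 0, 1), (0, 5, 5)}
pred = {(0, 0, 1), (0, 0, 2), (3, 3, 3), (3, 3, 4)}
voxel_ml = 0.5  # hypothetical voxel volume in mL

case_dice = dice(pred, truth)
case_fpv = false_positive_volume(pred, truth, voxel_ml)
case_fnv = false_negative_volume(pred, truth, voxel_ml)
```

Under this reading, a prediction that only partially covers a lesion lowers the Dice score but adds nothing to either volume metric. Only entirely spurious or entirely missed components count toward the volumes, which is why the three metrics are reported together.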

article DGF+25

Journal of Nuclear Medicine

Dec. 2025.

Authors

J. Dexl • S. Gatidis • M. Früh • K. Jeblick • A. Mittermeier • A. T. Stüber • B. Schachtner • J. Topalis • M. P. Fabritius • S. Gu • G. K. Murugesan • J. VanOss • J. Ye • J. He • A. Alloula • B. W. Papież • Z. Mesbah • R. Modzelewski • M. Hadlich • Z. Marinov • R. Stiefelhagen • F. Isensee • K. H. Maier-Hein • A. Galdran • K. Nikolaou • C. la Fougère • M. Kim • N. Kallenberg • J. Kleesiek • K. Herrmann • R. Werner • M. Ingrisch • C. C. Cyran • T. Küstner

Links

DOI

In Collaboration

Siemens AG

Research Area

C1 | Medicine

BibTeXKey: DGF+25
