Artificial Intelligence for TNM Staging in NSCLC: A Critical Appraisal of Segmentation Utility in [¹⁸F]FDG PET/CT

MCML Authors

Jakob Dexl

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Katharina Jeblick

Dr.

→ Group Michael Ingrisch
Clinical Data Science in Radiology

Michael Ingrisch

Prof. Dr.

Principal Investigator

Clinical Data Science in Radiology

Abstract

Purpose: This study investigates whether a diagnostic AI model can effectively support lesion detection and staging in non-small cell lung cancer (NSCLC) [¹⁸F]FDG PET/CT studies, focusing on the distinction between technical segmentation accuracy and clinically meaningful performance.

Methods: In this retrospective single-centre study, [¹⁸F]FDG PET/CT scans from 306 treatment-naïve NSCLC patients were reviewed with reference to multidisciplinary team decisions. Tumour lesions were manually segmented for reference and compared with predictions from the top-performing algorithm of the autoPET III challenge. Quantitative segmentation metrics were calculated, and lesion-level errors were assessed for their impact on patient-level TNM and UICC staging.

Results: The algorithm achieved a mean Dice Similarity Coefficient (DSC) of 0.64. Lesion-level sensitivity was 95.8% across all patients, with a precision of 87.5%. False-positive M-category lesions (n = 196) were the most frequent error. Of all false positives, 35.7% were benign and 34.7% were non-oncologic pathologies. UICC staging matched ground truth in 207/306 patients, with most discordances due to upstaging (88/306).

Conclusion: Clinically driven metrics and cause-based error analysis offer valuable insight into AI segmentation performance. The evaluated model showed excellent lesion sensitivity but a tendency towards systematic overprediction across TNM categories. At the lesion level, M-stage false positives and undersegmentation in the hilar region emerged as the main drivers of clinically relevant upstaging. Despite promising lesion detection sensitivity, only 67.7% of UICC stagings were accurate when based on AI masks, indicating that diagnostic AI may support, though not yet replace, manual lesion evaluation in NSCLC [¹⁸F]FDG PET/CT.
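For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how a Dice Similarity Coefficient and lesion-level sensitivity and precision are commonly computed. It is not the study's evaluation code: the function names, the representation of masks as sets of voxel indices, and the toy values are all assumptions made for illustration, and the abstract does not state how predicted lesions were matched to reference lesions.

```python
# Illustrative sketch only; names and toy data are hypothetical,
# not taken from the study.

def dice_coefficient(reference, prediction):
    """DSC = 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    ref, pred = set(reference), set(prediction)
    if not ref and not pred:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2 * len(ref & pred) / (len(ref) + len(pred))


def lesion_sensitivity(true_positives, false_negatives):
    """Fraction of reference lesions that the model detected."""
    return true_positives / (true_positives + false_negatives)


def lesion_precision(true_positives, false_positives):
    """Fraction of predicted lesions that correspond to a reference lesion."""
    return true_positives / (true_positives + false_positives)


# Toy masks: 4 reference voxels, 3 predicted voxels, 2 shared.
reference_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
predicted_mask = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}

dsc = dice_coefficient(reference_mask, predicted_mask)  # 2 * 2 / (4 + 3)
print(f"DSC: {dsc:.3f}")

# Toy lesion counts, chosen only to exercise the functions.
print(f"Sensitivity: {lesion_sensitivity(9, 1):.2f}")
print(f"Precision: {lesion_precision(9, 3):.2f}")
```

The sketch shows why the two kinds of metric can diverge: a prediction can overlap every reference lesion, giving high lesion-level sensitivity, while still scoring a moderate DSC if the predicted volume is larger or smaller than the reference.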

article HDT+25

European Journal of Nuclear Medicine and Molecular Imaging

Nov. 2025.

Authors

M. M. Heimer • J. Dexl • J. Ta • R. Ebner • F. L. Herr • L. Orasanin • K. Jeblick • L. C. Adams • L. K. Shiyam Sundar • A. Tufman • R. A. Werner • G. Sheikh • J. Ricke • M. Ingrisch • M. P. Fabritius • C. C. Cyran