Purpose: This study investigates whether a diagnostic AI model can effectively support lesion detection and staging in non-small cell lung cancer (NSCLC) [¹⁸F]FDG PET/CT studies, focusing on the distinction between technical segmentation accuracy and clinically meaningful performance.<br>Methods: In this retrospective single-centre study, [¹⁸F]FDG PET/CT scans from 306 treatment-naïve NSCLC patients were reviewed with reference to multidisciplinary team decisions. Tumour lesions were manually segmented as the reference standard and compared with predictions from the top-performing algorithm of the autoPET III challenge. Quantitative segmentation metrics were calculated, and lesion-level errors were assessed for their impact on patient-level TNM and UICC staging.<br>Results: The algorithm achieved a mean Dice Similarity Coefficient (DSC) of 0.64. Lesion-level sensitivity was 95.8% across all patients, with a precision of 87.5%. False-positive M-category lesions (n = 196) were the most frequent error. Of all false positives, 35.7% were benign findings and 34.7% were non-oncologic pathologies. UICC staging matched ground truth in 207/306 patients, with most discordances due to upstaging (88/306).<br>Conclusion: Clinically driven metrics and cause-based error analysis offer valuable insight into AI segmentation performance. The evaluated model showed excellent lesion sensitivity but a tendency towards systematic overprediction across TNM categories. At the lesion level, M-stage false positives and undersegmentation in the hilar region emerged as the main drivers of clinically relevant upstaging. Despite promising lesion detection sensitivity, only 67.7% of UICC stagings were accurate when using AI masks, indicating that diagnostic AI may support, though not yet replace, manual lesion evaluation in NSCLC [¹⁸F]FDG PET/CT.
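The metrics named in the abstract (DSC, lesion-level sensitivity, precision) can be illustrated with a minimal Python sketch using their standard textbook definitions. The abstract does not say how lesions were matched between the reference and AI masks or how the DSC was averaged, so the function names, the flat 0/1 mask representation, and the toy inputs below are assumptions for illustration only, not the study's pipeline.

```python
def dice_coefficient(reference, prediction):
    """Dice Similarity Coefficient of two binary masks given as flat 0/1 sequences."""
    if len(reference) != len(prediction):
        raise ValueError("masks must contain the same number of voxels")
    ref_voxels = sum(1 for r in reference if r)
    pred_voxels = sum(1 for p in prediction if p)
    overlap = sum(1 for r, p in zip(reference, prediction) if r and p)
    if ref_voxels + pred_voxels == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / (ref_voxels + pred_voxels)


def lesion_detection_rates(true_positives, false_positives, false_negatives):
    """Return (sensitivity, precision) computed from lesion-level counts."""
    detected_or_missed = true_positives + false_negatives
    predicted = true_positives + false_positives
    sensitivity = true_positives / detected_or_missed if detected_or_missed else float("nan")
    precision = true_positives / predicted if predicted else float("nan")
    return sensitivity, precision


if __name__ == "__main__":
    # Toy masks (hypothetical): one overlapping voxel out of two in each mask.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
    # Toy lesion counts (hypothetical): 9 detected, 3 spurious, 1 missed.
    print(lesion_detection_rates(9, 3, 1))
```

A high lesion-level sensitivity can coexist with a moderate DSC: the first counts whether a lesion was found at all, the second measures voxel-wise overlap of the delineation. This is the distinction between detection and segmentation accuracy that the abstract draws.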
article HDT+25
BibTeXKey: HDT+25