Longitudinal medical image segmentation is fundamental for quantifying disease progression and evaluating treatment efficacy. However, two critical challenges persist. First, methods that jointly segment baseline and follow-up images remain underexplored, often missing the contextual benefits of simultaneous assessment and lacking longitudinal consistency. Second, real-world datasets typically exhibit severe class imbalance between stable and progressive scans, an issue frequently neglected by existing models. To address these limitations, we propose SegMaST, a novel Mamba-based spatio-temporal framework. Unlike conventional approaches that treat timepoints in isolation, SegMaST leverages cross-temporal information and spatial correspondences to jointly segment the baseline mask and explicitly localize new pathologies in follow-up scans. Additionally, we introduce an imbalance-aware loss accumulation strategy to enhance robustness in realistic clinical settings. On longitudinal Multiple Sclerosis (MS) and glioma cohorts, SegMaST outperforms established CNN- and attention-based baselines in follow-up segmentation (mean follow-up Dice: 0.536 on in-house MS, 0.620 on MSSEG-2, 0.631 on glioma) and lesion detection (F1: 0.688 in-house, 0.723 on MSSEG-2), while maintaining state-of-the-art accuracy in baseline segmentation (Dice: 0.617 for MS, 0.844 for glioma).
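The abstract does not specify how the imbalance-aware loss accumulation is formulated. As a rough illustration only, the sketch below assumes a simple inverse-frequency reweighting of per-scan losses between stable and progressive cases within a batch; the function name, weighting scheme, and PyTorch framing are assumptions, not the paper's actual method.

```python
import torch

def imbalance_aware_accumulated_loss(per_case_losses, is_progressive, progressive_weight=None):
    """Hypothetical sketch: accumulate per-scan segmentation losses with weights
    that upweight the minority class (scans showing progression).

    per_case_losses: 1D tensor of per-scan losses within a batch.
    is_progressive:  1D boolean tensor marking scans with new pathology.
    progressive_weight: optional fixed weight; if None, derived from the
        inverse frequency of progressive cases in the batch.
    """
    prog = is_progressive.float()
    n = per_case_losses.numel()
    n_prog = prog.sum().clamp(min=1.0)           # avoid division by zero
    n_stable = (n - prog.sum()).clamp(min=1.0)

    if progressive_weight is None:
        # Inverse-frequency weighting: rarer progressive cases count more.
        progressive_weight = n_stable / n_prog

    weights = torch.where(
        is_progressive.bool(),
        torch.full_like(per_case_losses, float(progressive_weight)),
        torch.ones_like(per_case_losses),
    )
    # Weighted average keeps the loss scale comparable across batches.
    return (weights * per_case_losses).sum() / weights.sum()

# Example: a batch of three scans, only one of which is progressive.
losses = torch.tensor([0.42, 0.38, 0.91])
prog_flags = torch.tensor([False, False, True])
loss = imbalance_aware_accumulated_loss(losses, prog_flags)
```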
BibTeXKey: VWL+25