Current prognostic and diagnostic AI models for healthcare often limit their informational input capacity by being time-agnostic and focusing on single modalities, and therefore lack the holistic perspective clinicians rely on. To address this, we introduce a Time-Aware Multi-Modal Transformer Encoder (TAMME) for longitudinal medical data. Unlike most state-of-the-art models, TAMME integrates longitudinal imaging, textual, numerical, and categorical data together with temporal information. Each element is represented as the sum of embeddings for its high-level categorical type, a further specification of that type, time-related data, and its value. This composition overcomes the limitations of a closed input vocabulary and enables generalization to novel data. Additionally, because the temporal context includes the delta to the preceding element, we eliminate the requirement for evenly sampled input sequences. For long-term electronic health records (EHRs), the model employs a novel summarization mechanism that processes sequences piecewise and prepends history representations to the recent data during end-to-end training. This allows self-attention to balance recent information against historical signals. We demonstrate TAMME's capabilities on clinical classification tasks, namely the prediction of triage acuity, length of stay, and readmission, using data from 431k+ hospital stays, 73k ICU stays, and 425k Emergency Department (ED) visits from the MIMIC dataset. We show superior performance over state-of-the-art approaches, with gains driven especially by long-term data. Overall, our approach provides versatile processing of entire patient trajectories as a whole and enhances predictive performance on clinical tasks.
inproceedings
BibTeXKey: SQB+25
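
A minimal sketch of the additive element embedding described in the abstract, assuming a PyTorch implementation; the module name, dimensions, and the use of linear projections for the time and value components are illustrative assumptions, not the authors' code:

import torch
import torch.nn as nn

class ElementEmbedding(nn.Module):
    # Each sequence element becomes the sum of four embeddings:
    # high-level type, subtype, time features, and value.
    def __init__(self, n_types, n_subtypes, d_model):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, d_model)        # e.g. lab / note / image
        self.subtype_emb = nn.Embedding(n_subtypes, d_model)  # e.g. which lab test
        self.value_proj = nn.Linear(1, d_model)               # numeric value -> vector
        self.time_proj = nn.Linear(2, d_model)                # (absolute time, delta to predecessor)

    def forward(self, type_id, subtype_id, value, t_abs, t_delta):
        time_feats = torch.stack([t_abs, t_delta], dim=-1)
        return (self.type_emb(type_id)
                + self.subtype_emb(subtype_id)
                + self.value_proj(value.unsqueeze(-1))
                + self.time_proj(time_feats))

For categorical or textual values, the value term could instead come from an embedding table or a text encoder; the abstract leaves these specifics open. Including the delta to the preceding element in the time features is what removes the need for evenly sampled sequences.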
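
Likewise, a hypothetical sketch of the piecewise summarization mechanism for long-term EHRs; the chunk length, the shared encoder, and the mean-pooled summary vectors are assumptions for illustration, not the paper's design:

import torch
import torch.nn as nn

class PiecewiseSummarizer(nn.Module):
    # Older chunks are compressed into one summary vector each; the summaries
    # are prepended to the most recent chunk so self-attention can balance
    # recent information against historical signals, trained end to end.
    def __init__(self, d_model=64, n_heads=4, chunk_len=128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.chunk_len = chunk_len

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        chunks = x.split(self.chunk_len, dim=1)
        summaries = [self.encoder(c).mean(dim=1, keepdim=True) for c in chunks[:-1]]
        recent = torch.cat(summaries + [chunks[-1]], dim=1) if summaries else chunks[-1]
        return self.encoder(recent)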