AI models are increasingly deployed in clinical practice to assist doctors in diagnostic or screening tasks. However, a critical concern arises from the inherent ability of modern AI models to memorise individual examples from their training datasets. Such memorisation could lead to inaccurate predictions when a model is later used on individuals whose historical data was (potentially unknowingly) used for model training or fine-tuning. In this study, we find evidence of memorisation bias in two large medical imaging datasets: CheXpert (chest radiography) and Kermany-OCT (optical coherence tomography). Our experiments reveal that, in both datasets, a small proportion of data-contributing patients exhibit significant changes in their predictions on (future) longitudinal evaluation data when their historical data is included in model training. Strikingly, we find that larger, more diagnostically accurate models exhibit increased memorisation bias: for Kermany-OCT, the number of data-contributing patients affected by memorisation increases substantially when model size is scaled from 1.5 to 80 million parameters. Together, our results raise the question of whether the future health outcomes of data-contributing patients could be adversely affected by memorisation bias, i.e., by predictions that are biased towards their previous health states.
inproceedings KMR+25