Memorisation Bias: AI Predictions for Data Contributors Are Biased Towards Their Health States in the Training Data

MCML Authors

Dr. Martin Menten, JRG Leader AI for Vision

Abstract

AI models are increasingly deployed in clinical practice to assist doctors in diagnostic or screening tasks. However, a critical concern arises from the inherent ability of modern AI models to memorise individual examples from their training datasets. Such memorisation could lead to inaccurate predictions when a model is later used on individuals whose historical data was (potentially unknowingly) used for model training or fine-tuning. In this study, we discover evidence for memorisation bias in two large medical imaging datasets: CheXpert (chest radiography) and Kermany-OCT (optical coherence tomography). Our experiments reveal that a small proportion of data-contributing patients (and for CheXpert/Kermany-OCT, respectively) exhibit significant changes in their predictions on (future) longitudinal evaluation data when their historical data is included for model training. Strikingly, we find that larger, more diagnostically accurate models exhibit increased memorisation bias: for Kermany-OCT, the number of data-contributing patients affected by memorisation increases substantially (from to) when scaling model size from 1.5 to 80 million parameters. Together, our results raise the question whether the future health outcomes of data-contributing patients could be adversely affected by memorisation bias, i.e., predictions which are biased towards their previous health states.

LMID @MICCAI 2025

Workshop on Learning with Longitudinal Medical Images and Data at the 28th International Conference on Medical Image Computing and Computer Assisted Intervention. Daejeon, Republic of Korea, Sep 23-27, 2025.

Authors

M. Knolle • M. J. Menten • D. Rückert • G. Kaissis • B. 

Links

DOI

Research Area

 C1 | Medicine

BibTeXKey: KMR+25
