Home  | Publications | SRN25

Adjustment for Confounding Using Pre-Trained Representations

MCML Authors

Abstract

There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data necessitates sophisticated feature extractors, often in combination with ideas of transfer learning. In this work, we investigate how latent features from pre-trained neural networks can be leveraged to adjust for sources of confounding. We formalize conditions under which these latent features enable valid adjustment and statistical inference in ATE estimation, demonstrating results along the example of double machine learning. In this context, we also discuss critical challenges inherent to latent feature learning and downstream parameter estimation using those. As our results are agnostic to the considered data modality, they represent an important first step towards a theoretical foundation for the usage of latent representation from foundation models in ATE estimation.

inproceedings


ICML 2025

42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025.
Conference logo
A* Conference

Authors

R. SchulteD. RügamerT. Nagler

Links

URL

Research Area

 A1 | Statistical Foundations & Explainability

BibTeXKey: SRN25

Back to Top