
Adjustment for Confounding Using Pre-Trained Representations

MCML Authors

Rickmer Schulte

Group David Rügamer
Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Principal Investigator

Statistics, Data Science and Machine Learning

Thomas Nagler

Prof. Dr.

Principal Investigator

Computational Statistics & Data Science

Abstract

There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data requires sophisticated feature extractors, often combined with transfer learning. In this work, we investigate how latent features from pre-trained neural networks can be leveraged to adjust for sources of confounding. We formalize conditions under which these latent features enable valid adjustment and statistical inference in ATE estimation, illustrating our results with double machine learning. In this context, we also discuss critical challenges inherent to latent feature learning and to downstream parameter estimation based on these features. As our results are agnostic to the data modality, they represent an important first step towards a theoretical foundation for the use of latent representations from foundation models in ATE estimation.
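To make the setting concrete, here is a minimal sketch of cross-fitted double machine learning (the partially linear "residual-on-residual" estimator) where confounding acts only through a latent feature vector. This is an illustration under stated assumptions, not the authors' procedure: the matrix `z` is simulated Gaussian data standing in for embeddings from a pre-trained network, the treatment effect is assumed homogeneous (so the partially linear coefficient equals the ATE), and both nuisance functions are fit with plain ridge regression. The helpers `ridge_fit_predict` and `dml_plr` are hypothetical names introduced here.

```python
import numpy as np


def ridge_fit_predict(x_train, y_train, x_test, lam=1.0):
    """Ridge regression with an unpenalized intercept; returns predictions on x_test."""
    # Center features and target so the intercept is not shrunk.
    x_mean = x_train.mean(axis=0)
    y_mean = y_train.mean()
    xc = x_train - x_mean
    p = xc.shape[1]
    coef = np.linalg.solve(xc.T @ xc + lam * np.eye(p), xc.T @ (y_train - y_mean))
    return y_mean + (x_test - x_mean) @ coef


def dml_plr(y, d, z, n_folds=2, lam=1.0, seed=0):
    """Cross-fitted DML estimate of theta in Y = theta * D + g(Z) + eps."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisances E[Y|Z] and E[D|Z] are fit on the other folds only.
        y_res[test] = y[test] - ridge_fit_predict(z[train], y[train], z[test], lam)
        d_res[test] = d[test] - ridge_fit_predict(z[train], d[train], z[test], lam)
    theta = np.sum(d_res * y_res) / np.sum(d_res**2)
    # Standard error from the Neyman-orthogonal score psi = (y_res - theta * d_res) * d_res.
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi**2)) / np.mean(d_res**2) / np.sqrt(n)
    return theta, se


# Hypothetical simulation: z plays the role of pre-trained latent features.
rng = np.random.default_rng(42)
n, p = 4000, 20
z = rng.normal(size=(n, p))
gamma = rng.normal(scale=0.3, size=p)
beta = 3.0 * gamma  # outcome and treatment depend on the same latent direction
propensity = 1.0 / (1.0 + np.exp(-(z @ gamma)))
d = rng.binomial(1, propensity).astype(float)
theta_true = 2.0
y = theta_true * d + z @ beta + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()  # ignores confounding through z
theta_hat, se = dml_plr(y, d, z)
print(f"naive difference in means: {naive:.3f}")
print(f"DML estimate: {theta_hat:.3f} (SE {se:.3f}), true effect {theta_true}")
```

In this simulated setup the naive difference in means is pulled well away from the true effect because `z` drives both treatment and outcome, while the cross-fitted estimator adjusts for it. Cross-fitting keeps each observation's residuals free of its own nuisance fit; in practice the ridge models would typically be replaced by more flexible learners, and the paper's conditions concern when such latent features make this kind of adjustment valid.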

inproceedings SRN25