Research Group Thomas Nagler

Thomas Nagler

Prof. Dr.

Principal Investigator

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

Thomas Nagler is Professor of Computational Statistics & Data Science at LMU Munich.

His research is at the intersection of mathematical and computational statistics. He develops statistical methods, derives theoretical guarantees and scalable algorithms, packages them in user-friendly software, and collaborates with domain experts to solve problems in diverse areas.

Team members @MCML

PhD Students

Tobias Brock

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Computational Statistics & Data Science

Nicolai Palm

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Computational Statistics & Data Science

Publications @MCML

2025

[26]

R. Schulte, D. Rügamer and T. Nagler.
Adjustment for Confounding using Pre-Trained Representations.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL

Abstract

There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data necessitates sophisticated feature extractors, often in combination with ideas of transfer learning. In this work, we investigate how latent features from pre-trained neural networks can be leveraged to adjust for sources of confounding. We formalize conditions under which these latent features enable valid adjustment and statistical inference in ATE estimation, demonstrating results along the example of double machine learning. In this context, we also discuss critical challenges inherent to latent feature learning and downstream parameter estimation using those. As our results are agnostic to the considered data modality, they represent an important first step towards a theoretical foundation for the usage of latent representation from foundation models in ATE estimation.

MCML Authors

Rickmer Schulte

A1 | Statistical Foundations & Explainability
→ Group David Rügamer

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[25]

E. Walter, T. Brock, P. Lahoud, N. Werner, F. Czaja, A. Tichy, C. Bumm, A. Bender, A. Castro, W. Teughels, F. Schwendicke and M. Folwaczny.
Predictive modeling for step II therapy response in periodontitis - model development and validation.
npj Digital Medicine 8.445 (Jul. 2025). DOI

Abstract

Steps I and II periodontal therapy is the first-line treatment for periodontal disease, but has varying success. This study aimed to develop machine learning models to predict changes in periodontal probing depth (PPD) after step II therapy using patient-, tooth-, and site-specific clinical covariates. Models accurately predicted that healthy sites stay healthy, but performed suboptimally for diseased sites. Tuning improved performance, with PPD, tooth-site, and tooth-type identified as key predictors. Pocket closure was predicted with fair accuracy, with baseline PPD as the most relevant covariate. Models predicted improving pockets well but underperformed for non-responding sites, with antibiotic treatment and tooth type being the most influential features. While predictive performance for step II periodontal therapy based on routine clinical data remains limited, models can stratify periodontal sites into meaningful categories and estimate the probability of pocket improvement. They provide a foundation for site-specific outcome prediction and may support patient communication and expectations.

MCML Authors

Tobias Brock

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Computational Statistics & Data Science

Andreas Bender

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Machine Learning Consulting Unit (MLCU)

[24]

T. Cheng, T. Vatter, T. Nagler and K. Chen.
Vine Copulas as Differentiable Computational Graphs.
Preprint (Jun. 2025). arXiv

Abstract

Vine copulas are sophisticated models for multivariate distributions and are increasingly used in machine learning. To facilitate their integration into modern ML pipelines, we introduce the vine computational graph, a DAG that abstracts the multilevel vine structure and associated computations. On this foundation, we devise new algorithms for conditional sampling, efficient sampling-order scheduling, and constructing vine structures for customized conditioning variables. We implement these ideas in torchvinecopulib, a GPU-accelerated Python library built upon PyTorch, delivering improved scalability for fitting, sampling, and density evaluation. Our experiments illustrate how gradient flowing through the vine can improve Vine Copula Autoencoders and that incorporating vines for uncertainty quantification in deep learning can outperform MC-dropout, deep ensembles, and Bayesian Neural Networks in sharpness, calibration, and runtime. By recasting vine copula models as computational graphs, our work connects classical dependence modeling with modern deep-learning toolchains and facilitates the integration of state-of-the-art copula methods in modern machine learning pipelines.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[23]

J. Min, H. Li, T. Nagler and S. Li.
Assessing Climate-Driven Mortality Risk: A Stochastic Approach with Distributed Lag Non-Linear Models.
Preprint (Jun. 2025). arXiv

Abstract

Assessing climate-driven mortality risk has become an emerging area of research in recent decades. In this paper, we propose a novel approach to explicitly incorporate climate-driven effects into both single- and multi-population stochastic mortality models. The new model consists of two components: a stochastic mortality model, and a distributed lag non-linear model (DLNM). The first component captures the non-climate long-term trend and volatility in mortality rates. The second component captures non-linear and lagged effects of climate variables on mortality, as well as the impact of heat waves and cold waves across different age groups. For model calibration, we propose a backfitting algorithm that allows us to disentangle the climate-driven mortality risk from the non-climate-driven stochastic mortality risk. We illustrate the effectiveness and superior performance of our model using data from three European regions: Athens, Lisbon, and Rome. Furthermore, we utilize future UTCI data generated from climate models to provide mortality projections into 2045 across these regions under two Representative Concentration Pathway (RCP) scenarios. The projections show a noticeable decrease in winter mortality alongside a rise in summer mortality, driven by a general increase in UTCI over time. Although we expect slightly lower overall mortality in the short term under RCP8.5 compared to RCP2.6, a long-term increase in total mortality is anticipated under the RCP8.5 scenario.

MCML Authors

Han Li

Dr.

C1 | Medicine
→ Group Peter Schüffler

Computational Pathology

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[22]

T. Nagler and T. Vatter.
Solving Estimating Equations With Copulas.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. DOI

Abstract

Thanks to their ability to capture complex dependence structures, copulas are frequently used to glue random variables into a joint model with arbitrary marginal distributions. More recently, they have been applied to solve statistical learning problems such as regression or classification. Framing such approaches as solutions of estimating equations, we generalize them in a unified framework. We can then obtain simultaneous, coherent inferences across multiple regression-like problems. We derive consistency, asymptotic normality, and validity of the bootstrap for corresponding estimators. The conditions allow for both continuous and discrete data as well as parametric, nonparametric, and semiparametric estimators of the copula and marginal distributions. The versatility of this methodology is illustrated by several theoretical examples, a simulation study, and an application to financial portfolio allocation. Supplementary materials for this article are available online.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[21]

D. Dold, J. Kobialka, N. Palm, E. Sommer, D. Rügamer and O. Dürr.
Paths and Ambient Spaces in Neural Loss Landscapes.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. To be published. URL

Abstract

Understanding the structure of neural network loss surfaces, particularly the emergence of low-loss tunnels, is critical for advancing neural network theory and practice. In this paper, we propose a novel approach to directly embed loss tunnels into the loss landscape of neural networks. Exploring the properties of these loss tunnels offers new insights into their length and structure and sheds light on some common misconceptions. We then apply our approach to Bayesian neural networks, where we improve subspace inference by identifying pitfalls and proposing a more natural prior that better guides the sampling procedure.

MCML Authors

Julius Kobialka

A1 | Statistical Foundations & Explainability
→ Group David Rügamer

Statistics, Data Science and Machine Learning

Nicolai Palm

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Computational Statistics & Data Science

Emanuel Sommer

A1 | Statistical Foundations & Explainability
→ Group David Rügamer

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

[20]

M. Arpogaus, T. Kneib, T. Nagler and D. Rügamer.
Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals.
Preprint (May 2025). arXiv

Abstract

Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple dimensions, interpreting the input-output relationship of such models is often difficult, due to the black-box character of deep learning models. In contrast, existing statistical methods for multivariate outcomes such as multivariate conditional transformation models (MCTM) are restricted in flexibility and are often not expressive enough to represent complex multivariate probability distributions. In this paper, we combine MCTM with state-of-the-art and autoregressive NF to leverage the transparency of MCTM for modeling interpretable feature effects on the marginal distributions in the first step and the flexibility of neural-network-based NF techniques to account for complex and non-linear relationships in the joint data distribution. We demonstrate our method’s versatility in various numerical experiments and compare it with MCTM and other NF models on both simulated and real-world data.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

[19]

T. Nagler and D. Rügamer.
Uncertainty Quantification for Prior-Fitted Networks using Martingale Posteriors.
AABI 2025 - 7th Symposium on Advances in Approximate Bayesian Inference collocated with the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 29, 2025. To be published. Preprint available. URL

Abstract

Prior-fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient method to construct Bayesian posteriors for such estimates based on Martingale Posteriors. Several simulated and real-world data examples are used to showcase the resulting uncertainty quantification of our method in inference applications.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

[18]

T. Nagler and D. Rügamer.
Uncertainty Quantification for Prior-Fitted Networks using Martingale Posteriors.
FPI @ICLR 2025 - Workshop on Frontiers in Probabilistic Inference: Learning meets Sampling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. arXiv URL

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

[17]

H. Funk, R. Ludwig, H. Küchenhoff and T. Nagler.
Modelling Climate Variables at High Temporal Resolution.
Preprint (Feb. 2025). DOI

Abstract

Large ensembles of climate models are indispensable for analyzing natural climate variability and estimating the occurrence of rare extreme events. Many hydrometeorological applications (such as compound event analysis, return period estimation, weather forecasting, downscaling, and bias correction) rely on an accurate representation of the multivariate distribution of climate variables. However, at high temporal resolutions, variables like precipitation often exhibit significant zero-inflation and heavy-tailed distributions. This inflation propagates through the entire multivariate dependence structure, complicating the relationships between zero-inflated and non-inflated variables. Inadequate modeling and correction of these dependencies can substantially degrade the reliability of hydrometeorological methodologies.
In an earlier work, we developed a novel multivariate density decomposition for zero-inflated variables based on vine copulas. This method has been integrated into multivariate Vine Copula Bias Correction for partially zero-inflated margins (VBC), with potential applications in other fields facing high-resolution climate data challenges. We summarize the idea behind VBC and illustrate its advantages over other bias correction methods. This highlights VBC’s interpretability and the advantages it offers for controlling and assessing its results.

MCML Authors

Henri Funk

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

Statistical Consulting Unit (StaBLab)

Helmut Küchenhoff

Prof. Dr.

C4 | Computational Social Sciences

Statistical Consulting Unit (StaBLab)

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

2024

[16]

T. Nagler, L. Schneider, B. Bischl and M. Feurer.
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model’s generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout become competitive with standard CV while being computationally cheaper.
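The difference between a fixed holdout split and a split that is reshuffled for every configuration can be illustrated with a toy example. This is only a minimal sketch, not the authors' implementation: the learner (a shrunken mean with hyperparameter `lam`), the helper names, and the seeding scheme are all hypothetical and chosen purely for illustration.

```python
import random

def holdout_split(n, valid_fraction, seed):
    """Return (train_idx, valid_idx) for a random holdout split of range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_valid = max(1, int(n * valid_fraction))
    return idx[n_valid:], idx[:n_valid]

def validation_loss(lam, y, train_idx, valid_idx):
    """Toy learner: a shrunken mean with hyperparameter lam, scored by validation MSE."""
    pred = sum(y[i] for i in train_idx) / (len(train_idx) + lam)
    return sum((y[i] - pred) ** 2 for i in valid_idx) / len(valid_idx)

def select_config(configs, y, reshuffle, base_seed=0, valid_fraction=0.2):
    """Pick the configuration with the lowest holdout loss.

    reshuffle=False: every configuration is scored on the same split.
    reshuffle=True:  every configuration gets its own freshly drawn split.
    """
    results, valid_sets = [], []
    for k, lam in enumerate(configs):
        seed = base_seed + k if reshuffle else base_seed
        train_idx, valid_idx = holdout_split(len(y), valid_fraction, seed)
        valid_sets.append(tuple(sorted(valid_idx)))
        results.append((validation_loss(lam, y, train_idx, valid_idx), lam))
    best_loss, best_lam = min(results)
    return best_lam, valid_sets

# Simulated data; the true mean is 1.0.
rng = random.Random(42)
y = [1.0 + rng.gauss(0.0, 1.0) for _ in range(50)]
configs = [0.0, 1.0, 5.0, 20.0]

best_fixed, sets_fixed = select_config(configs, y, reshuffle=False)
best_resh, sets_resh = select_config(configs, y, reshuffle=True)
```

With `reshuffle=False` all configurations share one validation set, so their loss estimates share the same split-specific noise. With `reshuffle=True` that noise is drawn independently per configuration.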

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistical Learning and Data Science

Matthias Feurer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistical Learning and Data Science

[15]

M. Koshil, T. Nagler, M. Feurer and K. Eggensperger.
Towards Localization via Data Embedding for TabPFN.
TLR @NeurIPS 2024 - 3rd Table Representation Learning Workshop at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Prior-data fitted networks (PFNs), especially TabPFN, have shown significant promise in tabular data prediction. However, their scalability is limited by the quadratic complexity of the transformer architecture’s attention across training points. In this work, we propose a method to localize TabPFN, which embeds data points into a learned representation and performs nearest neighbor selection in this space. We evaluate it across six datasets, demonstrating its superior performance over standard TabPFN when scaling to larger datasets. We also explore its design choices and analyze the bias-variance trade-off of this localization method, showing that it reduces bias while maintaining manageable variance. This work opens up a pathway for scaling TabPFN to arbitrarily large tabular datasets.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

Matthias Feurer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistical Learning and Data Science

[14]

J. Herbinger, M. N. Wright, T. Nagler, B. Bischl and G. Casalicchio.
Decomposing Global Feature Effects Based on Feature Interactions.
Journal of Machine Learning Research 25.381 (Dec. 2024). URL

Abstract

Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GADGET), which is a new framework based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects, namely partial dependence, accumulated local effects, and Shapley additive explanations (SHAP) dependence. Furthermore, we introduce and validate a new permutation-based interaction detection procedure that is applicable to any feature effect method that fits into our proposed framework. We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness.

MCML Authors

Julia Herbinger

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[13]

J. Gauss and T. Nagler.
Asymptotics for estimating a diverging number of parameters – with and without sparsity.
Preprint (Nov. 2024). arXiv

Abstract

We consider high-dimensional estimation problems where the number of parameters diverges with the sample size. General conditions are established for consistency, uniqueness, and asymptotic normality in both unpenalized and penalized estimation settings. The conditions are weak and accommodate a broad class of estimation problems, including ones with non-convex and group structured penalties. The wide applicability of the results is illustrated through diverse examples, including generalized linear models, multi-sample inference, and stepwise estimation procedures.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[12]

H. Funk, R. Ludwig, H. Küchenhoff and T. Nagler.
Towards more realistic climate model outputs: A multivariate bias correction based on zero-inflated vine copulas.
Preprint (Oct. 2024). arXiv

Abstract

Climate model large ensembles are an essential research tool for analysing and quantifying natural climate variability and providing robust information for rare extreme events. The models’ simulated representations of reality are susceptible to bias due to incomplete understanding of physical processes. This paper aims to correct the bias of five climate variables from the CRCM5 Large Ensemble over Central Europe at a 3-hourly temporal resolution. At this high temporal resolution, two variables, precipitation and radiation, exhibit a high share of zero inflation. We propose a novel bias-correction method, VBC (Vine copula bias correction), that models and transfers multivariate dependence structures for zero-inflated margins in the data from its error-prone model domain to a reference domain. VBC estimates the model and reference distribution using vine copulas and corrects the model distribution via (inverse) Rosenblatt transformation. To deal with the variables’ zero-inflated nature, we develop a new vine density decomposition that accommodates such variables and employs an adequately randomized version of the Rosenblatt transform. This novel approach allows for more accurate modelling of multivariate zero-inflated climate data. Compared with state-of-the-art correction methods, VBC is generally the best-performing correction and the most accurate method for correcting zero-inflated events.

MCML Authors

Henri Funk

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

Statistical Consulting Unit (StaBLab)

Helmut Küchenhoff

Prof. Dr.

C4 | Computational Social Sciences

Statistical Consulting Unit (StaBLab)

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[11]

H. Schulz-Kümpel, S. Fischer, T. Nagler, A.-L. Boulesteix, B. Bischl and R. Hornung.
Constructing Confidence Intervals for 'the' Generalization Error – a Comprehensive Benchmark Study.
Preprint (Sep. 2024). arXiv

Abstract

When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-validation and bootstrapping, with different variance estimation techniques. Unfortunately, however, there is currently no consensus on when any of these combinations may be most reliably employed and how they generally compare. In this work, we conduct the first large-scale study comparing CIs for the generalization error - empirically evaluating 13 different methods on a total of 18 tabular regression and classification problems, using four different inducers and a total of eight loss functions. We give an overview of the methodological foundations and inherent challenges of constructing CIs for the generalization error and provide a concise review of all 13 methods in a unified framework. Finally, the CI methods are evaluated in terms of their relative coverage frequency, width, and runtime. Based on these findings, we are able to identify a subset of methods that we would recommend. We also publish the datasets as a benchmarking suite on OpenML and our code on GitHub to serve as a basis for further studies.

MCML Authors

Hannah Schulz-Kümpel

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Sebastian Fischer

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistical Learning and Data Science

Roman Hornung

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[10]

D. Rügamer, C. Kolb, T. Weber, L. Kook and T. Nagler.
Generalizing orthogonalization for models with non-linearities.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms’ application. It was, for instance, shown that neural networks can deduce racial information solely from a patient’s X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the “orthogonalization” or “normalization” of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method’s effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

Chris Kolb

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Tobias Weber

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[9]

D. Rundel, J. Kobialka, C. von Crailsheim, M. Feurer, T. Nagler and D. Rügamer.
Interpretable Machine Learning for TabPFN.
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI GitHub

Abstract

The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning parameters or hyperparameter tuning. This makes TabPFN a very attractive option for a wide range of domain applications. However, a major drawback of the method is its lack of interpretability. Therefore, we propose several adaptations of popular interpretability methods that we specifically design for TabPFN. By taking advantage of the unique properties of the model, our adaptations allow for more efficient computations than existing implementations. In particular, we show how in-context learning facilitates the estimation of Shapley values by avoiding approximate retraining and enables the use of Leave-One-Covariate-Out (LOCO) even when working with large-scale Transformers. In addition, we demonstrate how data valuation methods can be used to address scalability challenges of TabPFN.

MCML Authors

David Rundel

A1 | Statistical Foundations & Explainability
→ Group Matthias Feurer

Statistical Learning and Data Science

Julius Kobialka

A1 | Statistical Foundations & Explainability
→ Group David Rügamer

Statistics, Data Science and Machine Learning

Matthias Feurer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability

Statistics, Data Science and Machine Learning

[8]

Y. Sale, P. Hofman, T. Löhr, L. Wimmer, T. Nagler and E. Hüllermeier.
Label-wise Aleatoric and Epistemic Uncertainty Quantification.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL

Abstract

We present a novel approach to uncertainty quantification in classification tasks based on label-wise decomposition of uncertainty measures. This label-wise perspective allows uncertainty to be quantified at the individual class level, thereby improving cost-sensitive decision-making and helping understand the sources of uncertainty. Furthermore, it allows total, aleatoric, and epistemic uncertainty to be defined on the basis of non-categorical measures such as variance, going beyond common entropy-based measures. In particular, variance-based measures address some of the limitations associated with established methods that have recently been discussed in the literature. We show that our proposed measures adhere to a number of desirable properties. Through empirical evaluation on a variety of benchmark data sets – including applications in the medical domain where accurate uncertainty quantification is crucial – we establish the effectiveness of label-wise uncertainty quantification.

MCML Authors

Yusuf Sale

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Paul Hofman

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Lisa Wimmer

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

Eyke Hüllermeier

Prof. Dr.

A3 | Computational Models

Artificial Intelligence and Machine Learning

[7]

N. Palm and T. Nagler.
An Online Bootstrap for Time Series.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
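As a rough illustration of the idea described in this abstract, the Python sketch below maintains several bootstrap replicates of a running mean and updates each one with an autoregressive multiplier weight whose dependence on its predecessor grows over time. The recursion, the Gaussian innovations, the schedule rho = 1 - t^(-beta), and all function and parameter names are assumptions made for illustration; they are not taken from the paper and may differ from its exact scheme.

```python
import math
import random


def online_bootstrap_means(stream, n_boot=200, beta=0.5, seed=0):
    """Bootstrap replicates of a running mean (illustrative sketch only)."""
    rng = random.Random(seed)
    weights = [1.0] * n_boot   # current multiplier weight of each replicate
    w_sums = [0.0] * n_boot    # running sum of weights
    wx_sums = [0.0] * n_boot   # running sum of weight * observation
    n, x_sum = 0, 0.0
    for x in stream:
        n += 1
        x_sum += x
        # Assumed schedule: dependence between consecutive weights grows with n.
        rho = 1.0 - n ** (-beta)
        scale = math.sqrt(1.0 - rho * rho)
        for b in range(n_boot):
            # AR(1)-type update that keeps each weight at mean 1 and variance 1.
            weights[b] = 1.0 + rho * (weights[b] - 1.0) + scale * rng.gauss(0.0, 1.0)
            w_sums[b] += weights[b]
            wx_sums[b] += weights[b] * x
    if n == 0:
        raise ValueError("stream must contain at least one observation")
    x_bar = x_sum / n
    # Replicate b equals x_bar + (1/n) * sum_i V_i * (x_i - x_bar).
    return [x_bar + (wx - x_bar * w) / n for wx, w in zip(wx_sums, w_sums)]
```

Each replicate only needs a constant amount of state (its current weight and two running sums), so observations can be discarded once processed. The spread of the returned values serves as an estimate of the sampling variability of the mean under these assumptions.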

MCML Authors

Nicolai Palm

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Computational Statistics & Data Science

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

2023

[6]

Y. Sale, P. Hofman, L. Wimmer, E. Hüllermeier and T. Nagler.
Second-Order Uncertainty Quantification: Variance-Based Measures.
Preprint (Dec. 2023). arXiv

Abstract

Uncertainty quantification is a critical aspect of machine learning models, providing important insights into the reliability of predictions and aiding the decision-making process in real-world applications. This paper proposes a novel way to use variance-based measures to quantify uncertainty on the basis of second-order distributions in classification problems. A distinctive feature of the measures is the ability to reason about uncertainties on a class-based level, which is useful in situations where nuanced decision-making is required. Recalling some properties from the literature, we highlight that the variance-based measures satisfy important (axiomatic) properties. In addition to this axiomatic approach, we present empirical results showing the measures to be effective and competitive with commonly used entropy-based measures.
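A class-wise variance decomposition of this kind can be illustrated with the law of total variance: for the indicator of class k, the total variance splits into the expected Bernoulli variance (aleatoric part) plus the variance of the class probability (epistemic part). The sketch below assumes the second-order distribution is represented by a finite sample of probability vectors, for example the outputs of ensemble members. The function name and this representation are assumptions for illustration, not the paper's implementation.

```python
def labelwise_variance_uncertainty(prob_samples):
    """Per-class total, aleatoric, and epistemic variance (illustrative sketch).

    prob_samples: list of probability vectors, one per sample drawn from the
    (assumed) second-order distribution.
    """
    n = len(prob_samples)
    n_classes = len(prob_samples[0])
    result = []
    for k in range(n_classes):
        theta = [p[k] for p in prob_samples]
        mean = sum(theta) / n
        # Expected variance of the class indicator given theta.
        aleatoric = sum(t * (1.0 - t) for t in theta) / n
        # Variance of the class probability across samples.
        epistemic = sum((t - mean) ** 2 for t in theta) / n
        # Variance of the class indicator under the averaged probability.
        total = mean * (1.0 - mean)
        result.append({"total": total, "aleatoric": aleatoric, "epistemic": epistemic})
    return result
```

For two samples [0.9, 0.1] and [0.5, 0.5], class 0 has mean probability 0.7, so the total is 0.21, which splits into an aleatoric part of 0.17 and an epistemic part of 0.04.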

MCML Authors

Yusuf Sale

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Paul Hofman

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Lisa Wimmer

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

A3 | Computational Models

Artificial Intelligence and Machine Learning

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[5]

J. Rodemann, J. Goschenhofer, E. Dorigatti, T. Nagler and T. Augustin.
Approximately Bayes-optimal pseudo-label selection.
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Abstract

Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS). This selection often depends on the initial model fit on labeled data. Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions, often referred to as confirmation bias. This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue. At its core lies a criterion for selecting instances to label: an analytical approximation of the posterior predictive of pseudo-samples. We derive this selection criterion by proving Bayes-optimality of the posterior predictive of pseudo-samples. We further overcome computational hurdles by approximating the criterion analytically. Its relation to the marginal likelihood allows us to come up with an approximation based on Laplace’s method and the Gaussian integral. We empirically assess BPLS on simulated and real-world data. When faced with high-dimensional data prone to overfitting, BPLS outperforms traditional PLS methods.

MCML Authors

Jann Goschenhofer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Emilio Dorigatti

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[4]

T. Nagler.
Statistical Foundations of Prior-Data Fitted Networks.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning. Instead of training the network to an observed training set, a fixed model is pre-trained offline on small, simulated training sets from a variety of tasks. The pre-trained model is then used to infer class probabilities in-context on fresh training sets with arbitrary size and distribution. Empirically, PFNs achieve state-of-the-art performance on tasks with similar size to the ones used in pre-training. Surprisingly, their accuracy further improves when passed larger data sets during inference. This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior. While PFNs are motivated by Bayesian ideas, a purely frequentist interpretation of PFNs as pre-tuned, but untrained predictors explains their behavior. A predictor’s variance vanishes if its sensitivity to individual training samples does and the bias vanishes only if it is appropriately localized around the test feature. The transformer architecture used in current PFN implementations ensures only the former. These findings should prove useful for designing architectures with favorable empirical behavior.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[3]

T. Nagler and T. Vatter.
Solving Estimating Equations With Copulas.
Journal of the American Statistical Association 119.546 (Mar. 2023). DOI

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

2022

[2]

K. Lotto, T. Nagler and M. Radic.
Modeling Stochastic Data Using Copulas for Applications in the Validation of Autonomous Driving.
Electronics 11.24 (Dec. 2022). DOI

Abstract

The verification and validation processes of fully automated vehicles are linked to an almost intractable challenge of reflecting the real world with all its interactions in a virtual environment. Influential stochastic parameters need to be extracted from real-world measurements and real-time data, capturing all interdependencies, for an accurate simulation of reality. A copula is a probability model that represents a multivariate distribution, examining the dependence between the underlying variables. This model is used on drone measurement data from a roundabout containing dependent stochastic parameters. With the help of the copula model, samples are generated that reflect the real-time data. The resulting applications and possible extensions are discussed and explored.
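To illustrate how a copula separates the dependence structure from the marginal distributions, the sketch below draws dependent pairs from a bivariate Gaussian copula and then maps them to arbitrary marginals through inverse CDFs. The Gaussian copula family, the correlation parameter, and the helper names are assumptions chosen for illustration; the model fitted in the article is not reproduced here.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u, v) in [0, 1]^2 from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlated second normal with correlation rho to the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform marginal.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def to_marginals(uniform_pairs, inv_cdf_x, inv_cdf_y):
    """Apply inverse CDFs to impose the desired marginal distributions."""
    return [(inv_cdf_x(u), inv_cdf_y(v)) for u, v in uniform_pairs]
```

For example, `to_marginals(sample_gaussian_copula(1000, 0.8), lambda p: -math.log(1.0 - p), lambda p: 10.0 + 5.0 * p)` yields pairs with an exponential first coordinate and a uniform second coordinate on [10, 15] that remain positively dependent.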

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability

Computational Statistics & Data Science

[1]

N. Palm, F. Stroebl and H. Palm.
Parameter Individual Optimal Experimental Design and Calibration of Parametric Models.
IEEE Access 10 (Oct. 2022). DOI GitHub

Abstract

Parametric models make it possible to reflect system behavior in general and to characterize individual system instances by specific parameter values. For a variety of scientific disciplines, model calibration by parameter quantification is therefore of central importance. As the time and cost of calibration experiments increase, the question of how to determine parameter values of required quality with a minimum number of experiments comes to the fore. In this paper, a methodology is introduced that makes it possible to quantify and optimize the achievable parameter extraction quality based on an experimental plan, including a process and methods for adapting the experimental plan to improve the estimation of individually selectable parameters. The resulting parameter-individual optimal design of experiments (pi-OED) enables experimenters to extract a maximum of parameter-specific information from a given number of experiments. We demonstrate how to minimize variance or covariances of individually selectable parameter estimators by model-based calculation of the experimental designs. Using the Fisher Information Matrix in combination with the Cramér-Rao inequality, the pi-OED plan is reduced to a global optimization problem. The pi-OED workflow is demonstrated using computer experiments to calibrate a model describing calendrical aging of lithium-ion battery cells. Applying bootstrapping methods also makes it possible to quantify parameter estimation distributions for further benchmarking. Comparing pi-OED based computer experimental results with those based on state-of-the-art designs of experiments reveals its efficiency improvement. All computer experimental results were obtained in Python and may be reproduced using a provided Jupyter Notebook along with the source code. Both are available under https://github.com/nicolaipalm/oed.

MCML Authors

Nicolai Palm

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Computational Statistics & Data Science
