
Research Group Vincent Fortuin

Dr. Vincent Fortuin
Associate, Bayesian Deep Learning

Vincent Fortuin is a tenure-track research group leader at Helmholtz AI in Munich, leading the group for Efficient Learning and Probabilistic Inference for Science (ELPIS), and a faculty member at TU Munich.

His research focuses on reliable and data-efficient AI approaches leveraging Bayesian deep learning, deep generative modeling, meta-learning, and PAC-Bayesian theory.

Publications @MCML

2024


[6]
R. Dhahri, A. Immer, B. Charpentier, S. Günnemann and V. Fortuin.
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint available. arXiv
Abstract

Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam’s razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior Hessian approximation used in the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.
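To make the re-use of the Laplace Hessian concrete, here is a minimal sketch of an OBD-style saliency criterion built from a diagonal Fisher approximation. The helper names (`diag_fisher`, `prune_by_saliency`) and the diagonal-Fisher stand-in for the Laplace posterior Hessian are illustrative assumptions, not the paper's released code.

```python
# Sketch: reuse a precomputed diagonal Hessian proxy as a cheap pruning criterion.
import torch
import torch.nn as nn

def diag_fisher(model: nn.Module, loader, loss_fn):
    """Diagonal Fisher proxy for the Laplace posterior Hessian:
    per-parameter averaged squared gradients over the data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2
    return {n: f / len(loader) for n, f in fisher.items()}

def prune_by_saliency(model: nn.Module, fisher: dict, sparsity: float):
    """Zero out the fraction `sparsity` of weights with the smallest
    OBD-style saliency w_i^2 * H_ii, reusing the precomputed Hessian."""
    saliency = torch.cat([(p.detach() ** 2 * fisher[n]).flatten()
                          for n, p in model.named_parameters()])
    threshold = torch.quantile(saliency, sparsity)
    with torch.no_grad():
        for n, p in model.named_parameters():
            p *= (p ** 2 * fisher[n] > threshold).float()
```

Because the Fisher approximation is already computed for the Laplace marginal likelihood, the saliency scores come essentially for free, which is what makes the criterion cheap relative to methods that require extra passes over the data.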

MCML Authors
Prof. Dr. Stephan Günnemann (Data Analytics & Machine Learning)
Dr. Vincent Fortuin (Bayesian Deep Learning)


[5]
K. Flöge, M. A. Moeed and V. Fortuin.
Stein Variational Newton Neural Network Ensembles.
Preprint (Nov. 2024). arXiv
Abstract

Deep neural network ensembles are powerful tools for uncertainty quantification, which have recently been re-interpreted from a Bayesian perspective. However, current methods inadequately leverage second-order information of the loss landscape, despite the recent availability of efficient Hessian approximations. We propose a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. Our approach uniquely integrates scalable modern Hessian approximations, achieving faster convergence and more accurate posterior distribution approximations. We validate the effectiveness of our method on diverse regression and classification tasks, demonstrating superior performance with a significantly reduced number of training epochs compared to existing ensemble-based methods, while enhancing uncertainty quantification and robustness against overfitting.
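As a rough illustration of the idea, the sketch below applies one Stein variational update with diagonal-Hessian preconditioning to an ensemble of flattened parameter vectors. The RBF kernel choice and the ensemble-averaged diagonal Hessian are simplifying assumptions for exposition, not the paper's exact algorithm.

```python
# Sketch: one Stein variational Newton step on n particles of dimension d.
import torch

def rbf_kernel(theta: torch.Tensor, h: float = 1.0):
    """RBF kernel matrix and its gradients w.r.t. the first argument."""
    diff = theta[:, None, :] - theta[None, :, :]          # (n, n, d)
    K = torch.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))    # (n, n)
    grad_K = -diff / h ** 2 * K[..., None]                # (n, n, d)
    return K, grad_K

def svn_step(theta, grad_logp, hess_diag, step=0.1):
    """theta, grad_logp, hess_diag: (n, d) tensors for n ensemble members."""
    n = theta.shape[0]
    K, grad_K = rbf_kernel(theta)
    # Kernel-weighted gradients plus the repulsive term, as in plain SVGD.
    phi = (K @ grad_logp + grad_K.sum(0)) / n
    # Newton-style preconditioning with an averaged diagonal Hessian approximation;
    # this second-order correction is what distinguishes SVN from SVGD.
    precond = 1.0 / (hess_diag.mean(0) + 1e-6)
    return theta + step * precond * phi
```

The preconditioning rescales the update along each parameter direction by the local curvature, which is the mechanism behind the faster convergence the abstract describes.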

MCML Authors
Dr. Vincent Fortuin (Bayesian Deep Learning)


[4]
K. Flöge, S. Udayakumar, J. Sommer, M. Piraud, S. Kesselheim, V. Fortuin, S. Günnemann, K. J. van der Weg, H. Gohlke, A. Bazarova and E. Merdivan.
OneProt: Towards Multi-Modal Protein Foundation Models.
Preprint (Nov. 2024). arXiv
Abstract

Recent AI advances have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of modality encoders along protein sequences. It demonstrates strong performance in retrieval tasks and surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction. This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.
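The ImageBind-style alignment can be sketched as a symmetric InfoNCE objective that pulls each modality embedding toward the sequence embedding of the same protein. The snippet below is a hypothetical illustration of that objective, not the released OneProt code.

```python
# Sketch: symmetric contrastive alignment of a modality encoder to the
# sequence encoder, with matching protein pairs on the batch diagonal.
import torch
import torch.nn.functional as F

def info_nce(z_seq: torch.Tensor, z_mod: torch.Tensor, tau: float = 0.07):
    """z_seq, z_mod: (batch, dim) embeddings of the same proteins."""
    z_seq = F.normalize(z_seq, dim=-1)
    z_mod = F.normalize(z_mod, dim=-1)
    logits = z_seq @ z_mod.T / tau                          # pairwise similarities
    labels = torch.arange(len(z_seq), device=logits.device) # positives on diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
```

Anchoring every modality to the shared sequence space means modalities never seen together at training time still land in a common embedding space, which is what enables the cross-modal retrieval results.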

MCML Authors
Dr. Vincent Fortuin (Bayesian Deep Learning)


[3]
K. Bouchiat, A. Immer, H. Yèche, G. Rätsch and V. Fortuin.
Improving Neural Additive Models with Bayesian Principles.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL
Abstract

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
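The additive structure that makes per-feature credible intervals possible is easy to sketch: each feature passes through its own sub-network and the outputs are summed, so a Laplace approximation can be applied sub-network by sub-network. The skeleton below is an illustrative assumption, not the authors' implementation.

```python
# Sketch: a neural additive model with one small MLP per input feature.
import torch
import torch.nn as nn

class NAM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features); each column goes through its own sub-network,
        # so every feature's contribution to the prediction stays interpretable.
        contributions = [net(x[:, i:i + 1]) for i, net in enumerate(self.subnets)]
        return torch.stack(contributions, dim=-1).sum(-1) + self.bias
```

Since the prediction decomposes into per-feature terms, a posterior over each sub-network's weights directly induces a credible interval on that feature's contribution curve.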

MCML Authors
Dr. Vincent Fortuin (Bayesian Deep Learning)


[2]
T. Papamarkou, M. Skoularidou, K. Palla, L. Aitchison, J. Arbel, D. Dunson, M. Filippone, V. Fortuin, P. Hennig, J. M. Hernández-Lobato, A. Hubin, A. Immer, T. Karaletsos, M. E. Khan, A. Kristiadi, Y. Li, S. Mandt, C. Nemeth, M. A. Osborne, T. G. J. Rudner, D. Rügamer, Y. W. Teh, M. Welling, A. G. Wilson and R. Zhang.
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL
Abstract

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

MCML Authors
Dr. Vincent Fortuin (Bayesian Deep Learning)
Prof. Dr. David Rügamer (Data Science Group)


[1]
F. Sergeev, P. Malsot, G. Rätsch and V. Fortuin.
Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information.
Preprint (Jul. 2024). arXiv
Abstract

Knowing which features of a multivariate time series to measure and when is a key task in medicine, wearables, and robotics. Better acquisition policies can reduce costs while maintaining or even improving the performance of downstream predictors. Inspired by the maximization of conditional mutual information, we propose an approach to train acquirers end-to-end using only the downstream loss. We show that our method outperforms a random acquisition policy and matches a model with an unrestrained budget, but does not yet overtake a static acquisition strategy. We highlight the assumptions and outline avenues for future work.
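One plausible way to train an acquirer end-to-end on the downstream loss is a straight-through Bernoulli relaxation over acquisition decisions, sketched below. The gating mechanism, the `Acquirer` network, and the cost penalty are assumptions for illustration, not necessarily the paper's construction.

```python
# Sketch: a differentiable feature-acquisition gate trained on downstream loss.
import torch
import torch.nn as nn

class Acquirer(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                   nn.Linear(hidden, n_features))

    def forward(self, observed: torch.Tensor, temp: float = 1.0):
        """Relaxed Bernoulli mask over features, given what is observed so far."""
        logits = self.score(observed)
        noise = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        relaxed = torch.sigmoid((logits + torch.log(noise)
                                 - torch.log1p(-noise)) / temp)
        hard = (relaxed > 0.5).float()
        return hard + relaxed - relaxed.detach()  # straight-through estimator

# Hypothetical training step (predictor, loss_fn, cost_weight are placeholders):
# mask = acquirer(x_observed)
# loss = loss_fn(predictor(x * mask), y) + cost_weight * mask.mean()
```

The cost penalty on the mask plays the role of the acquisition budget, so gradients from the downstream loss alone decide which measurements are worth their price.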

MCML Authors
Dr. Vincent Fortuin (Bayesian Deep Learning)