A | Foundations of Machine Learning

Algorithmic Machine Learning & Explainable AI

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Mathias Drton

Prof. Dr.

Mathematical Statistics

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Fabian Scheipl

PD Dr.

Functional Data Analysis

Volker Schmid

Prof. Dr.

Bayesian Imaging & Spatial Statistics

Andreas Döpp

Dr. habil

Associate

Data-driven methods in Physics and Optics

Vincent Fortuin

Dr.

Associate

Bayesian Deep Learning

Michael Schomaker

Prof. Dr.

Associate

Biostatistics


Publications in Research Area A1

[462]

S. Lumpp and M. Drton.
On weak convergence of Gaussian conditional distributions.
Statistics and Probability Letters 226.110497 (Nov. 2025). DOI

Abstract

Weak convergence of joint distributions generally does not imply convergence of conditional distributions. In particular, conditional distributions need not converge when joint Gaussian distributions converge to a singular Gaussian limit. Algebraically, this is due to the fact that at singular covariance matrices, Schur complements are not continuous functions of the matrix entries. Our results lay out special conditions under which convergence of Gaussian conditional distributions nevertheless occurs, and we exemplify how this allows one to reason about conditional independence in a new class of graphical models.
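The discontinuity of the Schur complement mentioned in the abstract can be seen in a small numerical sketch. The parametrization below (unit variance for X, variance eps for Y, fixed correlation rho) is an illustrative assumption and is not taken from the paper.

```python
import math


def conditional_variance(var_x, cov_xy, var_y):
    """Schur complement of var_y, i.e. Var(X | Y) for a bivariate Gaussian."""
    return var_x - cov_xy ** 2 / var_y


def covariance_entries(rho, eps):
    """Entries (var_x, cov_xy, var_y) with Var(X) = 1, Var(Y) = eps, Corr(X, Y) = rho."""
    return 1.0, rho * math.sqrt(eps), eps


# For every rho the covariance matrix tends to the same singular limit
# [[1, 0], [0, 0]] as eps -> 0, but Var(X | Y) stays at 1 - rho**2.
limits = {
    rho: [conditional_variance(*covariance_entries(rho, eps)) for eps in (1e-2, 1e-4, 1e-8)]
    for rho in (0.0, 0.5, 0.9)
}

for rho, values in limits.items():
    print(rho, values)
```

All three sequences share the same singular limit, yet their conditional variances differ, so the conditional variance cannot be a continuous function of the matrix entries at that limit.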

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[461]

E. Özeren, A. Ulbrich, S. Filimon, D. Rügamer and A. Bender.
Enhancing Traffic Accident Classifications: Application of NLP Methods for City Safety.
ECML-PKDD 2025 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Porto, Portugal, Sep 15-19, 2025. To be published. Preprint available. arXiv

Abstract

A comprehensive understanding of traffic accidents is essential for improving city safety and informing policy decisions. In this study, we analyze traffic incidents in Munich to identify patterns and characteristics that distinguish different types of accidents. The dataset consists of both structured tabular features, such as location, time, and weather conditions, and unstructured free-text descriptions detailing the circumstances of each accident. Each incident is categorized into one of seven predefined classes. To assess the reliability of these labels, we apply NLP methods, including topic modeling and few-shot learning, which reveal inconsistencies in the labeling process. These findings highlight potential ambiguities in accident classification and motivate a refined predictive approach. Building on these insights, we develop a classification model that achieves high accuracy in assigning accidents to their respective categories. Our results demonstrate that textual descriptions contain the most informative features for classification, while the inclusion of tabular data provides only marginal improvements. These findings emphasize the critical role of free-text data in accident analysis and highlight the potential of transformer-based models in improving classification reliability.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[460]

L. Schneider, B. Bischl and M. Feurer.
Overtuning in Hyperparameter Optimization.
AutoML 2025 - Methods Track at the International Conference on Automated Machine Learning. New York City, NY, USA, Sep 08-11, 2025. To be published. URL

Abstract

Hyperparameter optimization (HPO) aims to identify an optimal hyperparameter configuration (HPC) such that the resulting model generalizes well to unseen data. Since directly optimizing the expected generalization error is impossible, resampling techniques like holdout validation or cross-validation are used as proxy measures in HPO. However, this implicitly assumes that the HPC minimizing validation error will also yield the best true generalization performance. Given that our inner validation error estimate is inherently stochastic and depends on the resampling, we study: Can excessive optimization of the validation error lead to a similarly detrimental effect as excessive optimization of the empirical risk of an ML model? This phenomenon, which we refer to as overtuning, represents a form of overfitting at the HPO level. Despite its potential impact, overtuning has received limited attention in the HPO and automated machine learning (AutoML) literature. We first formally define overtuning and distinguish it from related concepts such as meta-overfitting. We then reanalyze large-scale HPO benchmark data, assessing how frequently overtuning occurs and its practical relevance. Our findings suggest that overtuning is more common than expected, although often mild. However, in 10% of cases, severe overtuning results in selecting an HPC whose generalization performance is worse than the default HPC. We further examine how factors such as the chosen performance metric, resampling method, dataset size, learning algorithm, and optimization strategy influence overtuning and discuss potential mitigation strategies. Our results highlight the need to raise awareness of overtuning, particularly in the small-data regime, indicating that further mitigation strategies should be studied.
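The effect described in the abstract can be illustrated with a toy random search in which the validation error is a noisy version of the true error. This is a sketch under made-up assumptions (uniform true errors, Gaussian validation noise, and a simplified overtuning measure); it does not reproduce the paper's formal definition or its benchmark data.

```python
import random


def incumbent_trajectory(n_configs, noise_sd, seed):
    """Random search that keeps the configuration with the lowest validation error.

    Returns the (validation_error, true_error) pairs of every incumbent in order.
    """
    rng = random.Random(seed)
    best_val = float("inf")
    trajectory = []
    for _ in range(n_configs):
        true_err = rng.uniform(0.10, 0.30)               # hypothetical true error
        val_err = true_err + rng.gauss(0.0, noise_sd)    # noisy validation estimate
        if val_err < best_val:
            best_val = val_err
            trajectory.append((val_err, true_err))
    return trajectory


def overtuning(trajectory):
    """True error of the final incumbent minus the best true error seen among incumbents."""
    true_errors = [true_err for _, true_err in trajectory]
    return true_errors[-1] - min(true_errors)


for noise_sd in (0.0, 0.02, 0.05):
    gaps = [overtuning(incumbent_trajectory(200, noise_sd, seed)) for seed in range(50)]
    print(noise_sd, sum(gaps) / len(gaps))
```

Without validation noise the gap is zero by construction; with noise, a later incumbent can have a worse true error than an earlier one even though its validation error is lower.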

MCML Authors

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[459]

T. Zehle, M. Schlager, T. Heiß and M. Feurer.
CAPO: Cost-Aware Prompt Optimization.
AutoML 2025 - International Conference on Automated Machine Learning. New York City, NY, USA, Sep 08-11, 2025. To be published. Preprint available. arXiv

Abstract

Large language models (LLMs) have revolutionized natural language processing by solving a wide range of tasks simply guided by a prompt. Yet their performance is highly sensitive to prompt formulation. While automated prompt optimization addresses this challenge by finding optimal prompts, current methods require a substantial number of LLM calls and input tokens, making prompt optimization expensive. We introduce CAPO (Cost-Aware Prompt Optimization), an algorithm that enhances prompt optimization efficiency by integrating AutoML techniques. CAPO is an evolutionary approach with LLMs as operators, incorporating racing to save evaluations and multi-objective optimization to balance performance with prompt length. It jointly optimizes instructions and few-shot examples while leveraging task descriptions for improved robustness. Our extensive experiments across diverse datasets and LLMs demonstrate that CAPO outperforms state-of-the-art discrete prompt optimization methods in 11/15 cases with improvements of up to 21 percentage points. Our algorithm already achieves better performance with smaller budgets, saves evaluations through racing, and decreases average prompt length via a length penalty, making it both cost-efficient and cost-aware. Even without few-shot examples, CAPO outperforms its competitors and generally remains robust to initial prompts. CAPO represents an important step toward making prompt optimization more powerful and accessible by improving cost-efficiency.

MCML Authors

Matthias Feurer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Mathias Drton

Statistical Learning and Data Science

[458]

D. Strieder and M. Drton.
Identifying total causal effects in linear models under partial homoscedasticity.
International Journal of Approximate Reasoning 183.109455 (Aug. 2025). DOI

Abstract

A fundamental challenge of scientific research is inferring causal relations based on observed data. One commonly used approach involves utilizing structural causal models that postulate noisy functional relations among interacting variables. A directed graph naturally represents these models and reflects the underlying causal structure. However, classical identifiability results suggest that, without conducting additional experiments, this causal graph can only be identified up to a Markov equivalence class of indistinguishable models. Recent research has shown that focusing on linear relations with equal error variances can enable the identification of the causal structure from mere observational data. Nonetheless, practitioners are often primarily interested in the effects of specific interventions, rendering the complete identification of the causal structure unnecessary. In this work, we investigate the extent to which less restrictive assumptions of partial homoscedasticity are sufficient for identifying the causal effects of interest. Furthermore, we construct mathematically rigorous confidence regions for total causal effects under structure uncertainty and explore the performance gain of relying on stricter error assumptions in a simulation study.

MCML Authors

David Strieder

Mathematical Statistics

Mathias Drton

Prof. Dr.

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Mathematical Statistics

[457]

B. Ma, B. Yoztyurk, A.-C. Haensch, X. Wang, M. Herklotz, F. Kreuter, B. Plank and M. Aßenmacher.
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

In recent research, large language models (LLMs) have been increasingly used to investigate public opinions. This study investigates the algorithmic fidelity of LLMs, i.e., the ability to replicate the socio-cultural context and nuanced opinions of human participants. Using open-ended survey data from the German Longitudinal Election Studies (GLES), we prompt different LLMs to generate synthetic public opinions reflective of German subpopulations by incorporating demographic features into the persona prompts. Our results show that Llama performs better than other LLMs at representing subpopulations, particularly when there is lower opinion diversity within those groups. Our findings further reveal that the LLM performs better for supporters of left-leaning parties like The Greens and The Left compared to other parties, and matches the least with the right-party AfD. Additionally, the inclusion or exclusion of specific variables in the prompts can significantly impact the models’ predictions. These findings underscore the importance of aligning LLMs to more effectively model diverse public opinions while minimizing political biases and enhancing robustness in representativeness.

MCML Authors

Bolei Ma

Social Data Science and AI

Anna-Carolina Haensch

Dr.

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Social Data Science and AI

Xinpeng Wang

B2 | Natural Language Processing
→ Group Barbara Plank

AI and Computational Linguistics

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

Barbara Plank

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

AI and Computational Linguistics

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[456]

S. Urchs, V. Thurner, M. Aßenmacher, C. Heumann and S. Thiemichen.
taz2024full: Analysing German Newspapers for Gender Bias and Discrimination across Decades.
ACL 2025 - Findings of the 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

Open-access corpora are essential for advancing natural language processing (NLP) and computational social science (CSS). However, large-scale resources for German remain limited, restricting research on linguistic trends and societal issues such as gender bias. We present taz2024full, the largest publicly available corpus of German newspaper articles to date, comprising over 1.8 million texts from taz, spanning 1980 to 2024. As a demonstration of the corpus’s utility for bias and discrimination research, we analyse gender representation across four decades of reporting. We find a consistent overrepresentation of men, but also a gradual shift toward more balanced coverage in recent years. Using a scalable, structured analysis pipeline, we provide a foundation for studying actor mentions, sentiment, and linguistic framing in German journalistic texts. The corpus supports a wide range of applications, from diachronic language analysis to critical media studies, and is freely available to foster inclusive and reproducible research in German-language NLP.

MCML Authors

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[455]

E. Garces Arias, H. Blocher, J. Rodemann, M. Li, C. Heumann and M. Aßenmacher.
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework.
GEM2 @ACL 2025 - 4th Workshop on Generation, Evaluation and Metrics at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). Vienna, Austria, Jul 27-Aug 01, 2025. To be published. Preprint available. arXiv

Abstract

Open-ended text generation has become a prominent task in natural language processing due to the rise of powerful (large) language models. However, evaluating the quality of these models and the employed decoding strategies remains challenging because of trade-offs among widely used metrics such as coherence, diversity, and perplexity. Decoding methods often excel in some metrics while underperforming in others, complicating the establishment of a clear ranking. In this paper, we present novel ranking strategies within this multicriteria framework. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric designed to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Furthermore, we discuss the alignment of these approaches with human judgments. Our experiments demonstrate that the proposed methods offer a robust way to compare decoding strategies, exhibit similarities with human preferences, and serve as valuable tools in guiding model selection for open-ended text generation tasks. Finally, we suggest future directions for improving evaluation methodologies in text generation. Our codebase, datasets, and models are publicly available.
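Ranking by partial orderings, as mentioned in the abstract, can be sketched with plain Pareto dominance. The strategy names and scores below are invented for illustration, and all criteria are assumed to be oriented so that larger is better (a metric such as perplexity would need to be negated first). This is not the authors' framework or their summary metric.

```python
def dominates(a, b):
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Names of all strategies that no other strategy dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical (coherence, diversity) scores, made up for this example.
scores = {
    "strategy_a": (0.80, 0.40),
    "strategy_b": (0.60, 0.70),
    "strategy_c": (0.55, 0.65),
}

front = pareto_front(scores)
print(front)
```

Here strategy_c is dominated by strategy_b, while strategy_a and strategy_b are incomparable, which is exactly the situation in which a total ranking cannot be read off the individual metrics.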

MCML Authors

Esteban Garces Arias

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[454]

M. Koshil, M. Feurer and K. Eggensperger.
In-Context Learning of Soft Nearest Neighbor Classifiers for Intelligible Tabular Machine Learning.
TRL @ACL 2025 - 4th Table Representation Learning Workshop at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

With in-context learning foundation models like TabPFN excelling on small supervised tabular learning tasks, it has been argued that ‘boosted trees are not the best default choice when working with data in tables’. However, such foundation models are inherently black-box models that do not provide interpretable predictions. We introduce a novel learning task to train ICL models to act as a nearest neighbor algorithm, which enables intelligible inference and does not decrease performance empirically.
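For orientation, a generic soft nearest neighbor classifier can be sketched as a softmax-weighted vote over training points. The function name, the Euclidean distance, and the temperature parameter are assumptions made here; the sketch does not show the in-context-learning model trained in the paper.

```python
import math


def soft_nn_predict(query, train_x, train_y, n_classes, temperature=1.0):
    """Return class probabilities and the per-neighbor weights behind them."""
    # Softmax over negative Euclidean distances (shifted by the max for stability).
    neg_dists = [-math.dist(query, x) / temperature for x in train_x]
    shift = max(neg_dists)
    raw = [math.exp(d - shift) for d in neg_dists]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Each training point votes for its label with its weight.
    probs = [0.0] * n_classes
    for weight, label in zip(weights, train_y):
        probs[label] += weight
    return probs, weights


# Tiny made-up dataset: two points of class 0 near the origin, one of class 1 far away.
train_x = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_y = [0, 0, 1]
probs, weights = soft_nn_predict((0.5, 0.5), train_x, train_y, n_classes=2)
print(probs, weights)
```

The returned weights show which training examples drove a prediction, which is the kind of intelligibility a nearest neighbor formulation offers.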

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[453]

M. Drton, M. Garrote-López, N. Nikov, E. Robeva and Y. S. Wang.
Causal Discovery for Linear Non-Gaussian Models with Disjoint Cycles.
UAI 2025 - 41st Conference on Uncertainty in Artificial Intelligence. Rio de Janeiro, Brazil, Jul 21-25, 2025. To be published. Preprint available. URL GitHub

Abstract

The paradigm of linear structural equation modeling readily allows one to incorporate causal feedback loops in the model specification. These appear as directed cycles in the common graphical representation of the models. However, the presence of cycles entails difficulties such as the fact that models need no longer be characterized by conditional independence relations. As a result, learning cyclic causal structures remains a challenging problem. In this paper, we offer new insights on this problem in the context of linear non-Gaussian models. First, we precisely characterize when two directed graphs determine the same linear non-Gaussian model. Next, we take up a setting of cycle-disjoint graphs, for which we are able to show that simple quadratic and cubic polynomial relations among low-order moments of a non-Gaussian distribution allow one to locate source cycles. Complementing this with a strategy of decorrelating cycles and multivariate regression allows one to infer a block-topological order among the directed cycles, which leads to a consistent and computationally efficient algorithm for learning causal structures with disjoint cycles.

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[452]

S. Müller, A. Reuter, N. Hollmann, D. Rügamer and F. Hutter.
Position: The Future of Bayesian Prediction Is Prior-Fitted.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. Preprint. arXiv

Abstract

Training neural networks on randomly generated artificial datasets yields Bayesian models that capture the prior defined by the dataset-generating distribution. Prior-data Fitted Networks (PFNs) are a class of methods designed to leverage this insight. In an era of rapidly increasing computational resources for pre-training and a near stagnation in the generation of new real-world data in many applications, PFNs are poised to play a more important role across a wide range of applications. They enable the efficient allocation of pre-training compute to low-data scenarios. Originally applied to small Bayesian modeling tasks, the field of PFNs has significantly expanded to address more complex domains and larger datasets. This position paper argues that PFNs and other amortized inference approaches represent the future of Bayesian inference, leveraging amortized learning to tackle data-scarce problems. We thus believe they are a fruitful area of research. In this position paper, we explore their potential and directions to address their current limitations.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[451]

T. Pielok, B. Bischl and D. Rügamer.
Revisiting Unbiased Implicit Variational Inference.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL

Abstract

Recent years have witnessed growing interest in semi-implicit variational inference (SIVI) methods due to their ability to rapidly generate samples from highly complicated distributions. However, since the likelihood of these samples is non-trivial to estimate in high dimensions, current research focuses on finding effective SIVI training routines. While unbiased implicit variational inference (UIVI) has largely been dismissed as imprecise and computationally prohibitive because of its inner MCMC loop, we revisit this method and identify key shortcomings. In particular, we show that UIVI’s MCMC loop can be effectively replaced via importance sampling and the optimal proposal distribution can be learned stably by minimizing an expected forward Kullback–Leibler divergence without bias. Our refined approach demonstrates superior performance or parity with state-of-the-art methods on established SIVI benchmarks.
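The idea of using importance sampling where an MCMC chain would otherwise be run can be illustrated with a generic self-normalized importance sampling estimator. The one-dimensional target and the fixed Gaussian proposal below are stand-ins chosen for this sketch; the learned proposal and the forward Kullback-Leibler objective from the paper are not reproduced.

```python
import math
import random


def log_unnormalized_target(x):
    # Unnormalized log density of N(1, 1); a stand-in target for illustration.
    return -0.5 * (x - 1.0) ** 2


def log_proposal(x, sd):
    # Log density of the N(0, sd^2) proposal, up to an additive constant.
    return -0.5 * (x / sd) ** 2


def snis_mean(n_samples, seed=0, sd=2.0):
    """Self-normalized importance sampling estimate of the target mean."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sd) for _ in range(n_samples)]
    log_w = [log_unnormalized_target(x) - log_proposal(x, sd) for x in xs]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)


estimate = snis_mean(20000)
print(estimate)  # should be close to the target mean of 1
```

Because the weights are normalized by their sum, neither density needs its normalizing constant, and the samples are drawn independently rather than sequentially as in a Markov chain.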

MCML Authors

Tobias Pielok

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[450]

A. Reuter, T. G. J. Rudner, V. Fortuin and D. Rügamer.
Can Transformers Learn Full Bayesian Inference in Context?
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL

Abstract

Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context – without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group David Rügamer

Statistics, Data Science and Machine Learning

[449]

R. Schulte, D. Rügamer and T. Nagler.
Adjustment for Confounding using Pre-Trained Representations.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL

Abstract

There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data necessitates sophisticated feature extractors, often in combination with ideas of transfer learning. In this work, we investigate how latent features from pre-trained neural networks can be leveraged to adjust for sources of confounding. We formalize conditions under which these latent features enable valid adjustment and statistical inference in ATE estimation, illustrating the results using the example of double machine learning. In this context, we also discuss critical challenges inherent to latent feature learning and downstream parameter estimation based on these features. As our results are agnostic to the considered data modality, they represent an important first step towards a theoretical foundation for the usage of latent representations from foundation models in ATE estimation.

MCML Authors

Rickmer Schulte

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Stefan Bauer

Computational Statistics & Data Science

[448]

A. Uselis, A. Dittadi and S. J. Oh.
Does Data Scaling Lead to Visual Compositional Generalization?
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL GitHub

Abstract

Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will improve out-of-distribution performance, including compositional generalization. We test this premise through controlled experiments that systematically vary data scale, concept diversity, and combination coverage. We find that compositional generalization is driven by data diversity, not mere data scale. Increased combinatorial coverage forces models to discover a linearly factored representational structure, where concepts decompose into additive components. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations. Evaluating pretrained models (DINO, CLIP), we find above-random yet imperfect performance, suggesting partial presence of this structure. Our work motivates stronger emphasis on constructing diverse datasets for compositional generalization, and considering the importance of representational structure that enables efficient compositional learning.

MCML Authors

Andrea Dittadi

Dr.

Algorithmic Machine Learning & Explainable AI

[447]

V. M. Singh, A. G. V. Asiares, L. S. Schuhmacher, K. Rendall, S. Weißbrod, D. Rügamer and I. Körte.
An Interpretable Representation Learning Approach for Diffusion Tensor Imaging.
MIDL 2025 - Medical Imaging with Deep Learning. Salt Lake City, UT, USA, Jul 09-11, 2025. To be published. Preprint available. arXiv

Abstract

Diffusion Tensor Imaging (DTI) tractography offers detailed insights into the structural connectivity of the brain, but presents challenges in effective representation and interpretation in deep learning models. In this work, we propose a novel 2D representation of DTI tractography that encodes tract-level fractional anisotropy (FA) values into a 9x9 grayscale image. This representation is processed through a Beta-Total Correlation Variational Autoencoder with a Spatial Broadcast Decoder to learn a disentangled and interpretable latent embedding. We evaluate the quality of this embedding using supervised and unsupervised representation learning strategies, including auxiliary classification, triplet loss, and SimCLR-based contrastive learning. Compared to the 1D Group deep neural network (DNN) baselines, our approach improves the F1 score in a downstream sex classification task by 15.74% and shows a better disentanglement than the 3D representation.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[446]

R. Debelak, T. K. Koch, M. Aßenmacher and C. Stachl.
From Embeddings to Explainability: A Tutorial on Large-Language-Model-Based Text Analysis for Behavioral Scientists.
Advances in Methods and Practices in Psychological Science 8.3 (Jul. 2025). DOI

Abstract

Large language models (LLMs) are transforming research in psychology and the behavioral sciences by enabling advanced text analysis at scale. Their applications range from the analysis of social media posts to infer psychological traits to the automated scoring of open-ended survey responses. However, despite their potential, many behavioral scientists struggle to integrate LLMs into their research because of the complexity of text modeling. In this tutorial, we aim to provide an accessible introduction to LLM-based text analysis, focusing on the Transformer architecture. We guide researchers through the process of preparing text data, using pretrained Transformer models to generate text embeddings, fine-tuning models for specific tasks such as text classification, and applying interpretability methods, such as Shapley additive explanations and local interpretable model-agnostic explanations, to explain model predictions. By making these powerful techniques more approachable, we hope to empower behavioral scientists to leverage LLMs in their research, unlocking new opportunities for analyzing and interpreting textual data.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[445]

B. Bischl, G. Casalicchio, T. Das, M. Feurer, S. Fischer, P. Gijsbers, S. Mukherjee, A. C. Müller, L. Németh, L. Oala, L. Purucker, S. Ravi, J. N. van Rijn, P. Singh, J. Vanschoren, J. van der Velde and M. Wever.
OpenML: Insights from 10 years and more than a thousand papers.
Patterns. In press, corrected proof (Jul. 2025). DOI

Abstract

OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science. Looking back, we detail and describe the platform’s impact by looking at usage and citations. We share lessons from a decade of building, maintaining, and expanding OpenML, highlighting how rich metadata, collaborative benchmarking, and open interfaces have enhanced research and interoperability. Looking ahead, we cover ongoing efforts to expand OpenML’s capabilities and integrate with other platforms, informing a broader vision for open-science infrastructure for machine learning.

MCML Authors

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Matthias Feurer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Sebastian Fischer

Statistical Learning and Data Science

[444]

M. M. Mandl, A.-L. Boulesteix, S. Burgess and V. Zuber.
Outlier Detection in Mendelian Randomization.
Statistics in Medicine 44.15-17 (Jul. 2025). DOI

Abstract

Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal effects of exposures on an outcome. One key assumption of MR is that the genetic variants used as instrumental variables are independent of the outcome conditional on the risk factor and unobserved confounders. Violations of this assumption, that is, the effect of the instrumental variables on the outcome through a path other than the risk factor included in the model (which can be caused by pleiotropy), are common phenomena in human genetics. Genetic variants, which deviate from this assumption, appear as outliers to the MR model fit and can be detected by the general heterogeneity statistics proposed in the literature, which are known to suffer from overdispersion, that is, too many genetic variants are declared as false outliers. We propose a method that corrects for overdispersion of the heterogeneity statistics in uni- and multivariable MR analysis by making use of the estimated inflation factor to correctly remove outlying instruments and therefore account for pleiotropic effects. Our method is applicable to summary-level data.

MCML Authors

Maximilian Mandl

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Biometry in Molecular Medicine

[443]

E. Walter, T. Brock, P. Lahoud, N. Werner, F. Czaja, A. Tichy, C. Bumm, A. Bender, A. Castro, W. Teughels, F. Schwendicke and M. Folwaczny.
Predictive modeling for step II therapy response in periodontitis - model development and validation.
npj Digital Medicine 8.445 (Jul. 2025). DOI

Abstract

Steps I and II periodontal therapy is the first-line treatment for periodontal disease, but has varying success. This study aimed to develop machine learning models to predict changes in periodontal probing depth (PPD) after step II therapy using patient-, tooth-, and site-specific clinical covariates. Models accurately predicted that healthy sites stay healthy, but performed suboptimally for diseased sites. Tuning improved performance, with PPD, tooth-site, and tooth-type identified as key predictors. Pocket closure was predicted with fair accuracy, with baseline PPD as the most relevant covariate. Models predicted improving pockets well but underperformed for non-responding sites, with antibiotic treatment and tooth type being the most influential features. While predictive performance for step II periodontal therapy based on routine clinical data remains limited, models can stratify periodontal sites into meaningful categories and estimate the probability of pocket improvement. They provide a foundation for site-specific outcome prediction and may support patient communication and expectations.

MCML Authors

Tobias Brock

Computational Statistics & Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[442]

C. Gruber, H. Alber, B. Bischl, G. Kauermann, B. Plank and M. Aßenmacher.
Revisiting Active Learning under (Human) Label Variation.
Preprint (Jul. 2025). arXiv

Abstract

Access to high-quality labeled data remains a limiting factor in applied supervised learning. While label variation (LV), i.e., differing labels for the same instance, is common, especially in natural language processing, annotation frameworks often still rest on the assumption of a single ground truth. This overlooks human label variation (HLV), the occurrence of plausible differences in annotations, as an informative signal. Similarly, active learning (AL), a popular approach to optimizing the use of limited annotation budgets in training ML models, often relies on at least one of several simplifying assumptions, which rarely hold in practice when acknowledging HLV. In this paper, we examine foundational assumptions about truth and label nature, highlighting the need to decompose observed LV into signal (e.g., HLV) and noise (e.g., annotation error). We survey how the AL and (H)LV communities have addressed – or neglected – these distinctions and propose a conceptual framework for incorporating HLV throughout the AL loop, including instance selection, annotator choice, and label representation. We further discuss the integration of large language models (LLM) as annotators. Our work aims to lay a conceptual foundation for HLV-aware active learning, better reflecting the complexities of real-world annotation.

MCML Authors

Helen Alber

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

Barbara Plank

Prof. Dr.

AI and Computational Linguistics

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[441]

B. Pulido, A. M. Franco-Pereira, R. E. Lillo and F. Scheipl.
Area-based epigraph and hypograph indices for functional outlier detection.
Preprint (Jul. 2025). arXiv

Abstract

Detecting outliers in Functional Data Analysis is challenging because curves can stray from the majority in many different ways. The Modified Epigraph Index (MEI) and Modified Hypograph Index (MHI) rank functions by the fraction of the domain on which one curve lies above or below another. While effective for spotting shape anomalies, their construction limits their ability to flag magnitude outliers. This paper introduces two new metrics, the Area-Based Epigraph Index (ABEI) and Area-Based Hypograph Index (ABHI), that quantify the area between curves, enabling simultaneous sensitivity to both magnitude and shape deviations. Building on these indices, we present EHyOut, a robust procedure that recasts functional outlier detection as a multivariate problem: for every curve, and for its first and second derivatives, we compute ABEI and ABHI and then apply multivariate outlier-detection techniques to the resulting feature vectors. Extensive simulations show that EHyOut remains stable across a wide range of contamination settings and often outperforms established benchmark methods. Moreover, applications to Spanish weather data and United Nations world population data further illustrate the practical utility and meaningfulness of this methodology.
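
To make the "fraction of the domain" idea concrete, here is a minimal sketch of MEI and MHI for curves sampled on a shared grid. It assumes one common formulation from the functional data literature, in which the domain measure is approximated by the share of grid points; the function names are hypothetical. The area-based indices (ABEI, ABHI) and EHyOut are not implemented, since the abstract does not give their definitions.

```python
import numpy as np

def modified_epigraph_index(curves):
    """MEI per curve; curves has shape (n_curves, n_gridpoints)."""
    curves = np.asarray(curves, dtype=float)
    # above[i, j] = share of grid points where curve j lies on or above curve i
    above = (curves[None, :, :] >= curves[:, None, :]).mean(axis=2)
    return 1.0 - above.mean(axis=1)

def modified_hypograph_index(curves):
    """MHI per curve; curves has shape (n_curves, n_gridpoints)."""
    curves = np.asarray(curves, dtype=float)
    # below[i, j] = share of grid points where curve j lies on or below curve i
    below = (curves[None, :, :] <= curves[:, None, :]).mean(axis=2)
    return below.mean(axis=1)

# Toy sample: a flat curve at 0, a flat curve at 1, and an oscillating curve.
curves = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
])
mei = modified_epigraph_index(curves)
mhi = modified_hypograph_index(curves)
print(mei, mhi)
```

Both indices only count where one curve is above or below another, not by how much, which is consistent with the abstract's remark that they are limited in flagging magnitude outliers.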

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

[440]

N. Sturma and M. Drton.
Trek-Based Parameter Identification for Linear Causal Models With Arbitrarily Structured Latent Variables.
Preprint (Jul. 2025). arXiv

Abstract

We develop a criterion to certify whether causal effects are identifiable in linear structural equation models with latent variables. Linear structural equation models correspond to directed graphs whose nodes represent the random variables of interest and whose edges are weighted with linear coefficients that correspond to direct causal effects. In contrast to previous identification methods, we do not restrict ourselves to settings where the latent variables constitute independent latent factors (i.e., to source nodes in the graphical representation of the model). Our novel latent-subgraph criterion is a purely graphical condition that is sufficient for identifiability of causal effects by rational formulas in the covariance matrix. To check the latent-subgraph criterion, we provide a sound and complete algorithm that operates by solving an integer linear program. While it targets effects involving observed variables, our new criterion is also useful for identifying effects between latent variables, as it allows one to transform the given model into a simpler measurement model for which other existing tools become applicable.

MCML Authors

Nils Sturma

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[439]

L. Bothmann, P. A. Boustani, J. M. Alvarez, G. Casalicchio, B. Bischl and S. Dandl.
Privilege Scores.
EWAF 2025 - 4th European Workshop on Algorithmic Fairness. Eindhoven, The Netherlands, Jun 30-Jul 02, 2025. To be published. Preprint available. arXiv

Abstract

Bias-transforming methods of fairness-aware machine learning aim to correct a non-neutral status quo with respect to a protected attribute (PA). Current methods, however, lack an explicit formulation of what drives non-neutrality. We introduce privilege scores (PS) to measure PA-related privilege by comparing the model predictions in the real world with those in a fair world in which the influence of the PA is removed. At the individual level, PS can identify individuals who qualify for affirmative action; at the global level, PS can inform bias-transforming policies. After presenting estimation methods for PS, we propose privilege score contributions (PSCs), an interpretation method that attributes the origin of privilege to mediating features and direct effects. We provide confidence intervals for both PS and PSCs. Experiments on simulated and real-world data demonstrate the broad applicability of our methods and provide novel insights into gender and racial privilege in mortgage and college admissions applications.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Philip Amir Boustani

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[438]

L. Bothmann, K. Peters and B. Bischl.
What Is Fairness? On the Role of Protected Attributes and Fictitious Worlds.
EWAF 2025 - 4th European Workshop on Algorithmic Fairness. Eindhoven, The Netherlands, Jun 30-Jul 02, 2025. To be published. Preprint available. arXiv

Abstract

A growing body of literature in fairness-aware machine learning (fairML) aims to mitigate machine learning (ML)-related unfairness in automated decision-making (ADM) by defining metrics that measure fairness of an ML model and by proposing methods to ensure that trained ML models achieve low scores on these metrics. However, the underlying concept of fairness, i.e., the question of what fairness is, is rarely discussed, leaving a significant gap between centuries of philosophical discussion and the recent adoption of the concept in the ML community. In this work, we try to bridge this gap by formalizing a consistent concept of fairness and by translating the philosophical considerations into a formal framework for the training and evaluation of ML models in ADM systems. We argue that fairness problems can arise even without the presence of protected attributes (PAs), and point out that fairness and predictive performance are not irreconcilable opposites, but that the latter is necessary to achieve the former. Furthermore, we argue why and how causal considerations are necessary when assessing fairness in the presence of PAs by proposing a fictitious, normatively desired (FiND) world in which PAs have no causal effects. In practice, this FiND world must be approximated by a warped world in which the causal effects of the PAs are removed from the real-world data. Finally, we achieve greater linguistic clarity in the discussion of fairML. We outline algorithms for practical applications and present illustrative experiments on COMPAS data.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[437]

C. Leininger, S. Rittel and L. Bothmann.
Overcoming Fairness Trade-offs via Pre-processing: A Causal Perspective.
EWAF 2025 - 4th European Workshop on Algorithmic Fairness. Eindhoven, The Netherlands, Jun 30-Jul 02, 2025. To be published. Preprint available. arXiv

Abstract

Training machine learning models for fair decisions faces two key challenges. First, the fairness-accuracy trade-off arises because enforcing fairness weakens a model's predictive performance relative to an unconstrained model. Second, the incompatibility of different fairness metrics poses another trade-off, also known as the impossibility theorem. Recent work identifies the bias within the observed data as a possible root cause and shows that fairness and predictive performance are in fact in accord when predictive performance is measured on unbiased data. We offer a causal explanation for these findings using the framework of the FiND (fictitious and normatively desired) world, a ‘fair’ world, where protected attributes have no causal effects on the target variable. We show theoretically that (i) classical fairness metrics deemed to be incompatible are naturally satisfied in the FiND world, while (ii) fairness aligns with high predictive performance. We extend our analysis by suggesting how one can benefit from these theoretical insights in practice, using causal pre-processing methods that approximate the FiND world. Additionally, we propose a method for evaluating the approximation of the FiND world via pre-processing in practical use cases where we do not have access to the FiND world. In simulations and empirical studies, we demonstrate that these pre-processing methods are successful in approximating the FiND world and resolve both trade-offs. Our results provide actionable solutions for practitioners to achieve fairness and high predictive performance simultaneously.

MCML Authors

Simon Rittel

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[436]

T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Preventing Sensitive Information Leakage via Post-hoc Orthogonalization with Application to Chest Radiograph Embeddings.
PAKDD 2025 - 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Sydney, Australia, Jun 10-13, 2025. DOI GitHub

Abstract

Deep learning has substantially advanced data analysis across various fields. However, research indicates that protected characteristics, such as age, sex, and race, are often implicitly encoded within the deep feature representations, or embeddings, generated by neural networks. This encoding can lead to inherent biases, which in turn may influence decision-making processes. In clinical settings, in particular, such biases risk leading to unfair treatment of certain subgroups, potentially resulting in serious consequences. After analyzing the sources of these biases in the field of radiology, we illustrate how embeddings of chest radiographs (CXRs) can be corrected to remove the influence of protected features. To showcase the harms of such incidents, we study the MIMIC and CheXpert datasets with three prominent pre-trained models: a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our experiments reveal a significant influence of protected features on predictions of pathologies in CXRs, demonstrating the potential harm of such practices. We then propose a correction method, removing these harmful effects while maintaining competitive predictive performance.

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[435]

T. Boege, M. Drton, B. Hollering, S. Lumpp, P. Misra and D. Schkoda.
Conditional independence in stationary distributions of diffusions.
Stochastic Processes and their Applications 184.104604 (Jun. 2025). DOI

Abstract

Stationary distributions of multivariate diffusion processes have recently been proposed as probabilistic models of causal systems in statistics and machine learning. Motivated by these developments, we study stationary multivariate diffusion processes with a sparsely structured drift. Our main result gives a characterization of the conditional independence relations that hold in a stationary distribution. The result draws on a graphical representation of the drift structure and pertains to conditional independence relations that hold generally as a consequence of the drift’s sparsity pattern.

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[434]

L. Gosch, M. Sabanayagam, D. Ghoshdastidar and S. Günnemann.
Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks.
Transactions on Machine Learning Research (Jun. 2025). URL

Abstract

Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data. This vulnerability has led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning attacks, including backdoors, targeting the node features of a given graph. Our certificates are white-box and based upon the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and a novel reformulation of the bilevel optimization problem describing poisoning as a mixed-integer linear program. Consequently, we leverage our framework to provide fundamental insights into the role of graph structure and its connectivity on the worst-case robustness behavior of convolution-based and PageRank-based GNNs. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for NNs, which can be of independent interest beyond graph-related tasks.

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[433]

C. Benjamins, H. Graf, S. Segel, D. Deng, T. Ruhkopf, L. Hennig, S. Basu, N. Mallik, E. Bergman, D. Chen, F. Clément, M. Feurer, K. Eggensperger, F. Hutter, C. Doerr and M. Lindauer.
carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks.
Preprint (Jun. 2025). arXiv URL

Abstract

Hyperparameter Optimization (HPO) is crucial to develop well-performing machine learning models. In order to ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies that allows evaluating N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important types of HPO tasks: blackbox, multi-fidelity, multi-objective and multi-fidelity-multi-objective. With 3,336 tasks from 5 community benchmark collections and 28 variants of 9 optimizer families, we offer the biggest go-to library to date to evaluate and compare HPO methods. The carps framework relies on a purpose-built, lightweight interface, gluing together optimizers and benchmark tasks. It also features an analysis pipeline, facilitating the evaluation of optimizers on benchmarks. However, navigating a huge number of tasks while developing and comparing methods can be computationally infeasible. To address this, we obtain a subset of representative tasks by minimizing the star discrepancy of the subset, in the space spanned by the full set. As a result, we propose an initial subset of 10 to 30 diverse tasks for each task type, and include functionality to re-compute subsets as more benchmarks become available, enabling efficient evaluations. We also establish a first set of baseline results on these tasks as a measure for future comparisons. With carps, we make an important step in the standardization of HPO evaluation.

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[432]

T. Cheng, T. Vatter, T. Nagler and K. Chen.
Vine Copulas as Differentiable Computational Graphs.
Preprint (Jun. 2025). arXiv

Abstract

Vine copulas are sophisticated models for multivariate distributions and are increasingly used in machine learning. To facilitate their integration into modern ML pipelines, we introduce the vine computational graph, a DAG that abstracts the multilevel vine structure and associated computations. On this foundation, we devise new algorithms for conditional sampling, efficient sampling-order scheduling, and constructing vine structures for customized conditioning variables. We implement these ideas in torchvinecopulib, a GPU-accelerated Python library built upon PyTorch, delivering improved scalability for fitting, sampling, and density evaluation. Our experiments illustrate how gradient flowing through the vine can improve Vine Copula Autoencoders and that incorporating vines for uncertainty quantification in deep learning can outperform MC-dropout, deep ensembles, and Bayesian Neural Networks in sharpness, calibration, and runtime. By recasting vine copula models as computational graphs, our work connects classical dependence modeling with modern deep-learning toolchains and facilitates the integration of state-of-the-art copula methods in modern machine learning pipelines.

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[431]

E. Garces Arias, H. Blocher, J. Rodemann, M. Aßenmacher and C. Jansen.
Statistical Multicriteria Evaluation of LLM-Generated Text.
Preprint (Jun. 2025). arXiv

Abstract

Assessing the quality of LLM-generated text remains a fundamental challenge in natural language processing. Current evaluation approaches often rely on isolated metrics or simplistic aggregations that fail to capture the nuanced trade-offs between coherence, diversity, fluency, and other relevant indicators of text quality. In this work, we adapt a recently proposed framework for statistical inference based on Generalized Stochastic Dominance (GSD) that addresses three critical limitations in existing benchmarking methodologies: the inadequacy of single-metric evaluation, the incompatibility between cardinal automatic metrics and ordinal human judgments, and the lack of inferential statistical guarantees. The GSD-front approach enables simultaneous evaluation across multiple quality dimensions while respecting their different measurement scales, building upon partial orders of decoding strategies, thus avoiding arbitrary weighting of the involved metrics. By applying this framework to evaluate common decoding strategies against human-generated text, we demonstrate its ability to identify statistically significant performance differences while accounting for potential deviations from the i.i.d. assumption of the sampling design.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[430]

K. Göbler, T. Windisch and M. Drton.
Nonlinear Causal Discovery for Grouped Data.
Preprint (Jun. 2025). arXiv

Abstract

Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather than individual scalar measurements. Motivated by these applications, we extend nonlinear additive noise models to handle random vectors, establishing a two-step approach for causal graph learning: First, infer the causal order among random vectors. Second, perform model selection to identify the best graph consistent with this order. We introduce effective and novel solutions for both steps in the vector case, demonstrating strong performance in simulations. Finally, we apply our method to real-world assembly line data with partial knowledge of causal ordering among variable groups.

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[429]

J. Min, H. Li, T. Nagler and S. Li.
Assessing Climate-Driven Mortality Risk: A Stochastic Approach with Distributed Lag Non-Linear Models.
Preprint (Jun. 2025). arXiv

Abstract

Assessing climate-driven mortality risk has become an emerging area of research in recent decades. In this paper, we propose a novel approach to explicitly incorporate climate-driven effects into both single- and multi-population stochastic mortality models. The new model consists of two components: a stochastic mortality model, and a distributed lag non-linear model (DLNM). The first component captures the non-climate long-term trend and volatility in mortality rates. The second component captures non-linear and lagged effects of climate variables on mortality, as well as the impact of heat waves and cold waves across different age groups. For model calibration, we propose a backfitting algorithm that allows us to disentangle the climate-driven mortality risk from the non-climate-driven stochastic mortality risk. We illustrate the effectiveness and superior performance of our model using data from three European regions: Athens, Lisbon, and Rome. Furthermore, we utilize future UTCI data generated from climate models to provide mortality projections into 2045 across these regions under two Representative Concentration Pathway (RCP) scenarios. The projections show a noticeable decrease in winter mortality alongside a rise in summer mortality, driven by a general increase in UTCI over time. Although we expect slightly lower overall mortality in the short term under RCP8.5 compared to RCP2.6, a long-term increase in total mortality is anticipated under the RCP8.5 scenario.

MCML Authors

Han Li

Dr.

Computational Pathology

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[428]

M. Schöffel, E. Garces Arias, M. Wiedner, P. Ruppert, M. Li, C. Heumann and M. Aßenmacher.
Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages.
Preprint (Jun. 2025). arXiv

Abstract

Part-of-speech (POS) tagging remains a foundational component in natural language processing pipelines, particularly critical for historical text analysis at the intersection of computational linguistics and digital humanities. Despite significant advancements in modern large language models (LLMs) for ancient languages, their application to Medieval Romance languages presents distinctive challenges stemming from diachronic linguistic evolution, spelling variations, and labeled data scarcity. This study systematically investigates the central determinants of POS tagging performance across diverse corpora of Medieval Occitan, Medieval Spanish, and Medieval French texts, spanning biblical, hagiographical, medical, and dietary domains. Through rigorous experimentation, we evaluate how fine-tuning approaches, prompt engineering, model architectures, decoding strategies, and cross-lingual transfer learning techniques affect tagging accuracy. Our results reveal both notable limitations in LLMs’ ability to process historical language variations and non-standardized spelling, as well as promising specialized techniques that effectively address the unique challenges presented by low-resource historical languages.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[427]

D. Strieder.
Structure Uncertainty in Causal Inference.
Dissertation 2025. URL

Abstract

In order to draw causal conclusions from available data, it is crucial to reason about the underlying causal structure that governs the data-generating process. In this publication-based thesis, we tackle the challenge of rigorously accounting for uncertainty in this underlying causal structure in causal inference. We present a framework based on test inversions to construct calibrated confidence regions for total causal effects that capture both sources of uncertainty: causal structure and numerical size of nonzero effects.

MCML Authors

David Strieder

Mathematical Statistics

[426]

T. Nagler and T. Vatter.
Solving Estimating Equations With Copulas.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. DOI

Abstract

Thanks to their ability to capture complex dependence structures, copulas are frequently used to glue random variables into a joint model with arbitrary marginal distributions. More recently, they have been applied to solve statistical learning problems such as regression or classification. Framing such approaches as solutions of estimating equations, we generalize them in a unified framework. We can then obtain simultaneous, coherent inferences across multiple regression-like problems. We derive consistency, asymptotic normality, and validity of the bootstrap for corresponding estimators. The conditions allow for both continuous and discrete data as well as parametric, nonparametric, and semiparametric estimators of the copula and marginal distributions. The versatility of this methodology is illustrated by several theoretical examples, a simulation study, and an application to financial portfolio allocation. Supplementary materials for this article are available online.

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[425]

R. Schulte and D. Rügamer.
Additive Model Boosting: New Insights and Path(ologie)s.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. Oral Presentation. To be published. Preprint available. URL

Abstract

Additive models (AMs) have sparked a lot of interest in machine learning recently, allowing the incorporation of interpretable structures into a wide range of model classes. Many commonly used approaches to fit a wide variety of potentially complex additive models build on the idea of boosting additive models. While boosted additive models (BAMs) work well in practice, certain theoretical aspects are still poorly understood, including general convergence behavior and what optimization problem is being solved when accounting for the implicit regularizing nature of boosting. In this work, we study the solution paths of BAMs and establish connections with other approaches for certain classes of problems. Along these lines, we derive novel convergence results for BAMs, which yield crucial insights into the inner workings of the method. While our results generally provide reassuring theoretical evidence for the practical use of BAMs, they also uncover some ‘pathologies’ of boosting for certain additive model classes concerning their convergence behavior that require caution in practice. We empirically validate our theoretical findings through several numerical experiments.

MCML Authors

Rickmer Schulte

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[424]

D. Dold, J. Kobialka, N. Palm, E. Sommer, D. Rügamer and O. Dürr.
Paths and Ambient Spaces in Neural Loss Landscapes.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. To be published. URL

Abstract

Understanding the structure of neural network loss surfaces, particularly the emergence of low-loss tunnels, is critical for advancing neural network theory and practice. In this paper, we propose a novel approach to directly embed loss tunnels into the loss landscape of neural networks. Exploring the properties of these loss tunnels offers new insights into their length and structure and sheds light on some common misconceptions. We then apply our approach to Bayesian neural networks, where we improve subspace inference by identifying pitfalls and proposing a more natural prior that better guides the sampling procedure.

MCML Authors

Julius Kobialka

Statistics, Data Science and Machine Learning

Nicolai Palm

Computational Statistics & Data Science

Emanuel Sommer

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[423]

H. Löwe, C. A. Scholbeck, C. Heumann, B. Bischl and G. Casalicchio.
fmeffects: An R Package for Forward Marginal Effects.
The R Journal 16.3 (May. 2025). DOI

Abstract

Forward marginal effects have recently been introduced as a versatile and effective model-agnostic interpretation method particularly suited for non-linear and non-parametric prediction models. They provide comprehensible model explanations of the form: if we change feature values by a pre-specified step size, what is the change in the predicted outcome? We present the R package fmeffects, the first software implementation of the theory surrounding forward marginal effects. The relevant theoretical background, package functionality and handling, as well as the software design and options for future extensions are discussed in this paper.
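
The question posed in the abstract can be written as a difference of two predictions. The sketch below is a minimal Python illustration of that idea for a single observation. It is not the fmeffects API (which is an R package), and the function name and the toy model are hypothetical.

```python
def forward_marginal_effect(predict, x, feature, step):
    """Change in prediction when one feature of x is moved by `step`.

    predict: callable mapping a list of feature values to a number
    x:       list of feature values for one observation
    feature: index of the feature to change
    step:    pre-specified step size
    """
    shifted = list(x)            # copy so the original observation is untouched
    shifted[feature] += step
    return predict(shifted) - predict(x)

def toy_model(x):
    # Hypothetical non-linear model with an interaction between two features.
    return 2.0 * x[0] + x[0] * x[1]

fme = forward_marginal_effect(toy_model, [1.0, 3.0], feature=0, step=0.5)
print(fme)
```

Because the toy model contains an interaction, the effect of the same step depends on the value of the second feature, which is the kind of observation-specific explanation the abstract describes.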

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[422]

M. Arpogaus, T. Kneib, T. Nagler and D. Rügamer.
Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals.
Preprint (May. 2025). arXiv

Abstract

Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple dimensions, interpreting the input-output relationship of such models is often difficult, due to the black-box character of deep learning models. In contrast, existing statistical methods for multivariate outcomes such as multivariate conditional transformation models (MCTM) are restricted in flexibility and are often not expressive enough to represent complex multivariate probability distributions. In this paper, we combine MCTM with state-of-the-art and autoregressive NF to leverage the transparency of MCTM for modeling interpretable feature effects on the marginal distributions in the first step and the flexibility of neural-network-based NF techniques to account for complex and non-linear relationships in the joint data distribution. We demonstrate our method’s versatility in various numerical experiments and compare it with MCTM and other NF models on both simulated and real-world data.

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[421]

J. Schroeder, S. Howard, C. Eberle, J. Esslinger, N. Leopold-Kerschbaumer, K. V. Kepesidis and A. Döpp.
Information-optimal measurement: From fixed sampling protocols to adaptive spectroscopy.
Preprint (May. 2025). arXiv

Abstract

All measurements of continuous signals rely on taking discrete snapshots, with the Nyquist-Shannon theorem dictating sampling paradigms. We present a broader framework of information-optimal measurement, showing that traditional sampling is optimal only when we are entirely ignorant about the system under investigation. This insight unlocks methods that efficiently leverage prior information to overcome long-held fundamental sampling limitations. We demonstrate this for optical spectroscopy, which is vital to research and medicine, and show how adaptively selected measurements yield higher information in medical blood analysis, optical metrology, and hyperspectral imaging. Through our rigorous statistical framework, performance never falls below conventional sampling while providing complete uncertainty quantification in real time. This establishes a new paradigm where measurement devices operate as information-optimal agents, fundamentally changing how scientific instruments collect and process data.

MCML Authors

Sunny Howard

Data-driven methods in Physics and Optics

Christoph Eberle

Data-driven methods in Physics and Optics

Andreas Döpp

Dr. habil

Data-driven methods in Physics and Optics

[420]

R. Sonabend, J. Zobolas, R. Bin, P. Kopper, L. Burk and A. Bender.
Examining marginal properness in the external validation of survival models with squared and logarithmic losses.
Preprint (May. 2025). arXiv

Abstract

Scoring rules promote rational and honest decision-making, which is important for model evaluation and becoming increasingly important for automated procedures such as ‘AutoML’. In this paper we survey common squared and logarithmic scoring rules for survival analysis, with a focus on their theoretical and empirical properness. We introduce a marginal definition of properness and show that both the Integrated Survival Brier Score (ISBS) and the Right-Censored Log-Likelihood (RCLL) are theoretically improper under this definition. We also investigate a new class of losses that may inform future survival scoring rules. Simulation experiments reveal that both the ISBS and RCLL behave as proper scoring rules in practice. The RCLL showed no violations across all settings, while ISBS exhibited only minor, negligible violations at extremely small sample sizes, suggesting one can trust results from historical experiments. As such we advocate for both the RCLL and ISBS in external validation of models, including in automated procedures. However, we note practical challenges in estimating these losses including estimation of censoring distributions and densities; as such further research is required to advance development of robust and honest evaluation in survival analysis.

MCML Authors

Lukas Burk

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[419]

C. Zhang, S. Wu, Y. Chen, M. Aßenmacher, C. Heumann, Y. Men, G. Fan and J. Gama.
OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery.
Preprint (May. 2025). arXiv GitHub

Abstract

Oracle Bone Inscription (OBI) is the earliest systematic writing system in China, and the identification of Oracle Bone (OB) duplicates is a fundamental issue in OBI research. In this work, we design a progressive OB duplicate discovery framework that combines unsupervised low-level keypoints matching with high-level text-centric content-based matching to refine and rank the candidate OB duplicates with semantic awareness and interpretability. We compare our approach with state-of-the-art content-based image retrieval and image matching methods, showing that our approach yields comparable recall performance and the highest simplified mean reciprocal rank scores for both Top-5 and Top-15 retrieval results, with significantly improved computational efficiency. We have discovered over 60 pairs of new OB duplicates in real-world deployment, which were missed by OBI researchers for decades.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[418]

J. Kobialka, E. Sommer, J. Kwon, D. Dold and D. Rügamer.
Approximate Posteriors in Neural Networks: A Sampling Perspective.
AABI 2025 - 7th Symposium on Advances in Approximate Bayesian Inference collocated with the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 29, 2025. To be published. Preprint available. URL

Abstract

The landscape of neural network loss functions is known to be highly complex, and the ability of gradient-based approaches to find well-generalizing solutions to such high-dimensional problems is often considered a miracle. Similarly, Bayesian neural networks (BNNs) inherit this complexity through the model’s likelihood. In applications where BNNs are used to account for weight uncertainty, recent advances in sampling-based inference (SAI) have shown promising results outperforming other approximate Bayesian inference (ABI) methods. In this work, we analyze the approximate posterior implicitly defined by SAI and uncover key insights into its success. Among other things, we demonstrate how SAI handles symmetries differently than ABI, and examine the role of overparameterization. Further, we investigate the characteristics of approximate posteriors with sampling budgets scaled far beyond previously studied limits and explain why the localized behavior of samplers does not inherently constitute a disadvantage.

MCML Authors

Julius Kobialka

Statistics, Data Science and Machine Learning

Emanuel Sommer

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[417]

T. Nagler and D. Rügamer.
Uncertainty Quantification for Prior-Fitted Networks using Martingale Posteriors.
AABI 2025 - 7th Symposium on Advances in Approximate Bayesian Inference collocated with the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 29, 2025. To be published. Preprint available. URL

Abstract

Prior-fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient method to construct Bayesian posteriors for such estimates based on Martingale Posteriors. Several simulated and real-world data examples are used to showcase the resulting uncertainty quantification of our method in inference applications.

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[416]

T. Rochussen and V. Fortuin.
Sparse Gaussian Neural Processes.
AABI 2025 - 7th Symposium on Advances in Approximate Bayesian Inference collocated with the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 29, 2025. To be published. Preprint available. arXiv

Abstract

Despite significant recent advances in probabilistic meta-learning, it is common for practitioners to avoid using deep learning models due to a comparative lack of interpretability. Instead, many practitioners simply use non-meta-models such as Gaussian processes with interpretable priors, and conduct the tedious procedure of training their model from scratch for each task they encounter. While this is justifiable for tasks with a limited number of data points, the cubic computational cost of exact Gaussian process inference renders this prohibitive when each task has many observations. To remedy this, we introduce a family of models that meta-learn sparse Gaussian process inference. Not only does this enable rapid prediction on new tasks with sparse Gaussian processes, but since our models have clear interpretations as members of the neural process family, it also allows manual elicitation of priors in a neural process for the first time. In meta-learning regimes for which the number of observed tasks is small or for which expert domain knowledge is available, this offers a crucial advantage.

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

[415]

M. Schöffel, M. Wiedner, E. Garces Arias, P. Ruppert, C. Heumann and M. Aßenmacher.
Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan.
NAACL 2025 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Albuquerque, NM, USA, Apr 29-May 04, 2025. To be published. Preprint available. arXiv

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing, yet their effectiveness in handling historical languages remains largely unexplored. This study examines the performance of open-source LLMs in part-of-speech (POS) tagging for Old Occitan, a historical language characterized by non-standardized orthography and significant diachronic variation. Through comparative analysis of two distinct corpora (hagiographical and medical texts), we evaluate how current models handle the inherent challenges of processing a low-resource historical language. Our findings demonstrate critical limitations in LLM performance when confronted with extreme orthographic and syntactic variability. We provide detailed error analysis and specific recommendations for improving model performance in historical language processing. This research advances our understanding of LLM capabilities in challenging linguistic contexts while offering practical insights for both computational linguistics and historical language studies.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[414]

T. Nagler and D. Rügamer.
Uncertainty Quantification for Prior-Fitted Networks using Martingale Posteriors.
FPI @ICLR 2025 - Workshop on Frontiers in Probabilistic Inference: Learning meets Sampling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. arXiv URL

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[413]

A. Reuter, T. G. J. Rudner, V. Fortuin and D. Rügamer.
Can Transformers Learn Full Bayesian Inference in Context?
FPI @ICLR 2025 - Workshop on Frontiers in Probabilistic Inference: Learning meets Sampling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. arXiv URL

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[412]

D. Rundel, E. Sommer, B. Bischl, D. Rügamer and M. Feurer.
Efficiently Warmstarting MCMC for BNNs.
FPI @ICLR 2025 - Workshop on Frontiers in Probabilistic Inference: Learning meets Sampling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

Markov Chain Monte Carlo (MCMC) algorithms are widely regarded as the gold standard for approximate inference in Bayesian neural networks (BNNs). However, they remain computationally expensive and prone to inefficiencies, such as dying samplers, frequently leading to substantial waste of computational resources. While prior work has presented warmstarting techniques as an effective method to mitigate these inefficiencies, we provide a more comprehensive empirical analysis of how initializations of samplers affect their behavior. Based on various experiments examining the dynamics of warmstarting MCMC, we propose novel warmstarting strategies that leverage performance predictors and adaptive termination criteria to achieve better-performing, yet more cost-efficient, models. In numerical experiments, we demonstrate that this approach provides a practical pathway to more resource-efficient approximate inference in BNNs.

MCML Authors

David Rundel

Statistical Learning and Data Science

Emanuel Sommer

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[411]

J. Kobialka, E. Sommer, J. Kwon, D. Dold and D. Rügamer.
Approximate Posteriors in Neural Networks: A Sampling Perspective.
FPI @ICLR 2025 - Workshop on Frontiers in Probabilistic Inference: Learning meets Sampling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. Spotlight Presentation. URL

MCML Authors

Julius Kobialka

Statistics, Data Science and Machine Learning

Emanuel Sommer

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[410]

C. Kolb, T. Weber, B. Bischl and D. Rügamer.
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the L1 norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of L1-penalized neural networks by adding differentiable L2 regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify important learning rate requirements necessary for training factorized networks. We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.
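The shallow (two-factor) case mentioned in the abstract above rests on a standard identity: for a scalar weight w, the minimum of (u² + v²)/2 over all factorizations w = u·v equals |w|, so a smooth L2 penalty on the factors acts like an L1 penalty on the product. The sketch below checks this numerically with a simple grid search. It is only an illustration under that assumption; the function name, the grid settings, and the 1/2 scaling are choices made for this example and are not taken from the paper.

```python
def min_l2_over_factorizations(
    w: float, grid_size: int = 20001, u_max: float = 10.0
) -> float:
    """Grid-search the smallest 0.5 * (u^2 + v^2) subject to u * v == w."""
    best = float("inf")
    for i in range(1, grid_size):        # skip u = 0 to avoid division by zero
        u = u_max * i / (grid_size - 1)  # u > 0 suffices: the penalty is symmetric in sign
        v = w / u                        # enforce the constraint u * v == w
        best = min(best, 0.5 * (u * u + v * v))
    return best


for w in (-2.0, 0.5, 3.0):
    # The two printed values agree up to the grid resolution.
    print(w, abs(w), min_l2_over_factorizations(w))
```

The minimum is attained where |u| = |v| = sqrt(|w|), which follows from the AM-GM inequality (u² + v²)/2 ≥ |u·v|.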

MCML Authors

Chris Kolb

Statistical Learning and Data Science

Tobias Weber

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[409]

E. Sommer, J. Robnik, G. Nozadze, U. Seljak and D. Rügamer.
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Despite recent advances, sampling-based inference for Bayesian Neural Networks (BNNs) remains a significant challenge in probabilistic deep learning. While sampling-based approaches do not require a variational distribution assumption, current state-of-the-art samplers still struggle to navigate the complex and highly multimodal posteriors of BNNs. As a consequence, sampling still requires considerably longer inference times than non-Bayesian methods even for small neural networks, despite recent advances in making software implementations more efficient. Besides the difficulty of finding high-probability regions, the time until samplers provide sufficient exploration of these areas remains unpredictable. To tackle these challenges, we introduce an ensembling approach that leverages strategies from optimization and a recently proposed sampler called Microcanonical Langevin Monte Carlo (MCLMC) for efficient, robust and predictable sampling performance. Compared to approaches based on the state-of-the-art No-U-Turn Sampler, our approach delivers substantial speedups up to an order of magnitude, while maintaining or improving predictive performance and uncertainty quantification across diverse tasks and data modalities. The suggested Microcanonical Langevin Ensembles and modifications to MCLMC additionally enhance the method’s predictability in resource requirements, facilitating easier parallelization. All in all, the proposed method offers a promising direction for practical, scalable inference for BNNs.

MCML Authors

Emanuel Sommer

Statistics, Data Science and Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[408]

H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
Efficient and Accurate Explanation Estimation with Distribution Compression.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. Spotlight Presentation. URL

Abstract

We discover a theoretical connection between explanation estimation and distribution compression that significantly improves the approximation of feature attributions, importance, and effects. While the exact computation of various machine learning explanations requires numerous model inferences and becomes impractical, the computational cost of approximation increases with an ever-increasing size of data and model parameters. We show that the standard i.i.d. sampling used in a broad spectrum of algorithms for post-hoc explanation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm of sample-efficient explainability. It relies on distribution compression through kernel thinning to obtain a data sample that best approximates its marginal distribution. CTE significantly improves the accuracy and stability of explanation estimation with negligible computational overhead. It often achieves an on-par explanation approximation error 2-3x faster by using fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[407]

M. Sabanayagam, L. Gosch, S. Günnemann and D. Ghoshdastidar.
Exact Certification of (Graph) Neural Networks Against Label Poisoning.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. Spotlight Presentation. URL GitHub

Abstract

Machine learning models are highly vulnerable to label flipping, i.e., the adversarial modification (poisoning) of training labels to compromise performance. Thus, deriving robustness certificates is important to guarantee that test predictions remain unaffected and to understand worst-case robustness behavior. However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. Our method leverages the Neural Tangent Kernel (NTK) to capture the training dynamics of wide networks enabling us to reformulate the bilevel optimization problem representing label flipping into a Mixed-Integer Linear Program (MILP). We apply our method to certify a broad range of GNN architectures in node classification tasks. Thereby, concerning the worst-case robustness to label flipping: (i) we establish hierarchies of GNNs on different benchmark graphs; (ii) quantify the effect of architectural choices such as activations, depth and skip-connections; and surprisingly, (iii) uncover a novel phenomenon of the robustness plateauing for intermediate perturbation budgets across all investigated datasets and architectures. While we focus on GNNs, our certificates are applicable to sufficiently wide NNs in general through their NTK. Thus, our work presents the first exact certificate to a poisoning attack ever derived for neural networks, which could be of independent interest.

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

[406]

Y. Li, D. Rügamer, B. Bischl and M. Rezaei.
Calibrating LLMs with Information-Theoretic Evidential Deep Learning.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL

Abstract

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach IB-EDL suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration.

MCML Authors

Yawei Li

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[405]

L. Wimmer, B. Bischl and L. Bothmann.
Trust Me, I Know the Way: Predictive Uncertainty in the Presence of Shortcut Learning.
SCSL @ICLR 2025 - Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

The correct way to quantify predictive uncertainty in neural networks remains a topic of active discussion. In particular, it is unclear whether the state-of-the-art entropy decomposition leads to a meaningful representation of model, or epistemic, uncertainty (EU) in the light of a debate that pits ignorance against disagreement perspectives. We aim to reconcile the conflicting viewpoints by arguing that both are valid but arise from different learning situations. Notably, we show that the presence of shortcuts is decisive for EU manifesting as disagreement.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[404]

C. Kolb, B. Bischl and D. Rügamer.
Differentiable Attention Sparsity via Structured D-Gating.
SLLM @ICLR 2025 - Workshop on Sparsity in LLMs at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

A core component of modern large language models is the attention mechanism, but its immense parameter count necessitates structured sparsity for resource-efficient optimization and inference. Traditional sparsity penalties, such as the group lasso, are non-smooth and thus incompatible with standard stochastic gradient descent methods. To address this, we propose a deep gating mechanism that reformulates the structured sparsity penalty into a fully differentiable optimization problem, allowing effective and principled norm-based group sparsification without requiring specialized non-smooth optimizers. Our theoretical analysis and empirical results demonstrate that this approach enables structured sparsity with simple stochastic gradient descent or variants while maintaining predictive performance.

MCML Authors

Chris Kolb

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[403]

A. Reuter, T. G. J. Rudner, V. Fortuin and D. Rügamer.
Can Transformers Learn Full Bayesian Inference in Context?
SynthData @ICLR 2025 - Workshop SynthData: Will Synthetic Data Finally Solve the Data Access Problem? at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[402]

L. Meynent, I. Melev, K. Schürholt, G. Kauermann and D. Borth.
Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction.
Weight Space Learning @ICLR 2025 - Workshop on Weight Space Learning at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. arXiv URL

Abstract

The weights of neural networks (NNs) have recently gained prominence as a new data modality in machine learning, with applications ranging from accuracy and hyperparameter prediction to representation learning or weight generation. One approach to leverage NN weights involves training autoencoders (AEs), using contrastive and reconstruction losses. This allows such models to be applied to a wide variety of downstream tasks, and they demonstrate strong predictive performance and low reconstruction error. However, despite the low reconstruction error, these AEs reconstruct NN models with deteriorated performance compared to the original ones, limiting their usability with regard to model weight generation. In this paper, we identify a limitation of weight-space AEs, specifically highlighting that a structural loss, that uses the Euclidean distance between original and reconstructed weights, fails to capture some features critical for reconstructing high-performing models. We analyze the addition of a behavioral loss for training AEs in weight space, where we compare the output of the reconstructed model with that of the original one, given some common input. We show a strong synergy between structural and behavioral signals, leading to increased performance in all downstream tasks evaluated, in particular NN weights reconstruction and generation.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[401]

H. A. Gündüz.
Designing and optimizing deep learning methods for genomic sequencing data.
Dissertation 2025. DOI

Abstract

This dissertation applies modern deep learning techniques to genomics, introducing new approaches for self-supervised learning, uncertainty quantification, and automated model design. A key focus is the effective use of unlabeled genomic data, highlighted by the development of Self-GenomeNet, a self-supervised method tailored to genomic sequences. The work also presents automated optimization strategies for model architectures and hyperparameters, achieving better results than expert-designed models. Finally, it contributes user-friendly software that supports various genomic data formats and integrates core methods developed in the thesis. (Shortened).

MCML Authors

Hüseyin Anil Gündüz

* Former Member

[400]

M. Mironov, A. Marquard, D. Racek, C. Heumann, P. W. Thurner and M. Aßenmacher.
A Geoparsing Pipeline for Multilingual Social Media Posts from Ukraine.
GeoExT @ECIR 2025 - 3rd International Workshop on Geographic Information Extraction from Texts at the 47th European Conference on Information Retrieval (ECIR 2025). Lucca, Italy, Apr 06-10, 2025. PDF

Abstract

The dynamics of contemporary social media communication, particularly on platforms like X (formerly Twitter), have significantly evolved, and this data is frequently used for scientific research. However, due to X’s API changes in 2019, a tweet’s precise geolocation is no longer present in the data, thus preventing a geographical assessment of tweets. This project aims to extract location mentions from tweets’ texts and to map them to Ukraine’s administrative regions. We have developed a specialized pipeline for geoparsing with specific prebuilt components for the Ukrainian, Russian, and English languages. The main advantage of our pipeline’s architecture is the interchangeability of all components, allowing for the integration of custom-developed solutions. Initial tests on our hand-labeled Ukrainian dataset show promising results in accurately identifying and mapping location mentions despite various challenges, such as declension and the presence of multiple languages in a single tweet. Additional experiments using publicly available benchmark data further indicate promising performance when transferring our pipeline to other geographical regions. Both our geoparsing pipeline and its online documentation have been made publicly available.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[399]

T. Weber.
Advancing Deep Learning in medical imaging through generative modeling and representation learning.
Dissertation 2025. DOI

Abstract

In recent years, deep learning (DL) has proven to be a disruptive enabler in many domains, including the realm of medical imaging. The application of neural networks and other learnable algorithms has substantially impacted the medical field, promising to improve diagnostic accuracy, enhance patient outcomes, and streamline clinical workflows. The advent of large-scale datasets and advancements in computational power have facilitated the development of sophisticated DL models capable of analyzing and interpreting complex medical images. The scope of this thesis concentrates on a subset of the full DL spectrum, specifically the emerging areas of generative modeling and representation learning, which are closely interleaved with each other. The proposed contributions aim to push the boundaries of established medical image DL methods, venturing into more experimental research areas. (Shortened)

MCML Authors

Tobias Weber

* Former Member

[398]

N. Santhanam, H. E. Kim, D. Rügamer, A. Bender, S. Muthers, C. G. Cho, A. Alonso, K. Szabo, F.-S. Centner, H. Wenz, T. Ganslandt, M. Platten, C. Groden, M. Neumaier, F. Siegel and M. E. Maros.
Machine learning-based forecasting of daily acute ischemic stroke admissions using weather data.
npj Digital Medicine 8.225 (Apr. 2025). DOI

Abstract

Background: In the midst of the emerging climate crisis, healthcare providers lack locally validated, disease-specific surveillance models. Stroke, a significant contributor to the global disease burden, has been linked to climate change. Therefore, we developed and benchmarked machine learning (ML) models based on locoregional weather systems to forecast the number of daily acute ischemic stroke (AIS) admissions.
Methods: AIS patients diagnosed between 2015 and 2021 at the tertiary University Medical Center (UMC) Mannheim, Germany were extracted from the local data integration center and geospatially matched to weather data from the German Weather Service (DWD) based on the clinic’s, patients’ home and closest tower’s locations at the time of admission. Statistical (Poisson), boosted generalized additive model (GAM), support vector regression (SVR), and tree-based models including random forest (RF) and extreme gradient boosting (XGB) were evaluated in regression settings within a time-stratified nested cross-validation setup (training-validation: 2015-2020, test set: 2021) to predict the number of daily AIS admissions.
Findings: The cohort included 7,914 AIS patients (4,244 male, 53·6%). XGB showed the best test performance with lowest mean absolute error (MAE) of 1·21 cases/day. Maximum air pressure was identified as the top predictive variable. Shapley additive explanations analyses revealed that temperature extremes of extended cold- (lag-3 minimum temperature <-2 °C; minimum perceived temperature <-1·4 °C) and hot stressors (lag-7 minimum temperature >15 °C), as well as stormy conditions (lag-1 and lag-2 maximum wind gust >14 m/s and speed >10·4 m/s), increased stroke incidences substantially with distinct seasonal associations.
Interpretation: ML models can sufficiently forecast AIS admissions based on weather patterns allowing for improved resource allocation and preparedness.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[397]

C. Sauer, F. J. D. Lange, M. Thurow, I. Dormuth and A.-L. Boulesteix.
Statistical parametric simulation studies based on real data.
Preprint (Apr. 2025). arXiv

Abstract

Simulation studies are indispensable for evaluating and comparing statistical methods. The most common simulation approach is parametric simulation, where the data-generating mechanism (DGM) corresponds to a predefined parametric model from which observations are drawn. Many statistical simulation studies aim to provide practical recommendations on a method’s suitability for a given application; however, parametric simulations in particular are frequently criticized for being too simplistic and not reflecting reality. To overcome this drawback, it is generally considered a sensible approach to employ real data for constructing the parametric DGMs. However, while the concept of real-data-based parametric DGMs is widely recognized, the specific ways in which DGM components are inferred from real data vary, and their implications may not always be well understood. Additionally, researchers often rely on a limited selection of real datasets, with the rationale for their selection often unclear. This paper addresses these issues by formally discussing how components of parametric DGMs can be inferred from real data and how dataset selection can be performed more systematically. By doing so, we aim to support researchers in conducting simulation studies with a lower risk of overgeneralization and misinterpretation. We illustrate the construction of parametric DGMs based on a systematically selected set of real datasets using two examples: one on ordinal outcomes in randomized controlled trials and one on differential gene expression analysis.

MCML Authors

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[396]

L. Bothmann, S. Dandl, J. M. Alvarez, P. A. Boustani and B. Bischl.
Privilege Scores for Fairness-Aware ML.
DAGStat 2025 - 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik. Berlin, Germany, Mar 24-28, 2025. Poster presentation. Full paper available. arXiv

Abstract

Bias-preserving methods in fairness-aware machine learning (fairML) focus on metrics that prioritize formal equality by balancing error rates across subgroups. These methods can perpetuate historical discrimination embedded in real-world data. In contrast, bias-transforming methods aim for substantive equality by actively addressing historical inequalities. As a contribution to bias-transforming methods, we introduce the concept of privilege scores, a novel approach to identifying and quantifying individual privilege in machine learning tasks. Privilege scores use causal inference techniques to compare real-world outcomes to those in a ‘fair’ world in which the protected attributes do not influence the target variable. This individual-level perspective provides actionable insights for applications such as affirmative action and beyond. Key contributions include (1) the formalization of privilege scores, (2) a methodological framework for estimation with uncertainty quantification via confidence intervals, (3) an interpretable machine learning approach for understanding privilege score contributions, and (4) a novel in-processing method, Multi-PrivScore, to mitigate model-level discrimination during model training. Experiments on simulated and real-world data demonstrate the usefulness of privilege scores. Overall, our work highlights privilege scores as a versatile tool for assessing and mitigating historical discrimination in various machine learning applications.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Philip Amir Boustani

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[395]

L. Zumeta-Olaskoaga, A. Bender and D.-J. Lee.
Flexible modelling of time-varying exposures in event history analysis.
DAGStat 2025 - 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik. Berlin, Germany, Mar 24-28, 2025. Poster presentation. Full paper available. DOI

Abstract

We present a flexible modelling approach to analyse time-varying exposures and recurrent events in team sports injuries. The approach is based on the piece-wise exponential additive mixed model where the effects of past exposures (i.e. high-intensity training loads) may accumulate over time and present complex forms of association. In order to identify a relevant time window at which past exposures have an impact on the current risk, we propose a penalty approach. We conduct a simulation study to evaluate the performance of the proposed model, under different true weight functions and different levels of heterogeneity between recurrent events. Finally, we illustrate the approach with a case study application involving an elite male football team participating in the Spanish LaLiga competition. The cohort includes time-loss injuries and external training load variables tracked by Global Positioning System devices, during the seasons 2017–2018 and 2018–2019.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[394]

L. Zumeta-Olaskoaga, A. Bender and D.-J. Lee.
Flexible modelling of time-varying exposures and recurrent events to analyse training load effects in team sports injuries.
Journal of the Royal Statistical Society. Series C (Applied Statistics) 74.2 (Mar. 2025). DOI

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[393]

M. Schneble and G. Kauermann.
Statistical modelling of on-street parking spot occupancy in smart cities.
Journal of the Royal Statistical Society. Series C (Applied Statistics).qlaf017 (Mar. 2025). DOI

Abstract

Many studies suggest that searching for parking is associated with significant direct and indirect costs. Therefore, it is appealing to reduce the time that car drivers spend on finding an available parking spot, especially in urban areas where the space for all road users is limited. The prediction of on-street parking spot occupancy can provide drivers with guidance on where clear parking spaces are likely to be found. This field of research has gained more and more attention in the last decade through the increasing availability of real-time parking spot occupancy data. In this paper, we pursue a statistical approach for the prediction of parking spot occupancy, where we make use of time-to-event models and semi-Markov process theory. The latter involves the employment of Laplace transformations as well as their inversion, which is an ambitious numerical task. We apply our methodology to data from the City of Melbourne in Australia. Our main result is that the semi-Markov model outperforms a Markov model in terms of both true negative rate and true positive rate, which is essentially achieved by taking into account how long a parking space has already spent in its initial state.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[392]

A. Tejada-Lapuerta, P. Bertin, S. Bauer, H. Aliee, Y. Bengio and F. J. Theis.
Causal machine learning for single-cell genomics.
Nature Genetics (Mar. 2025). DOI

Abstract

Advances in single-cell ‘-omics’ allow unprecedented insights into the transcriptional profiles of individual cells and, when combined with large-scale perturbation screens, enable measuring of the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes. In this Perspective, we delineate the application of causal machine learning to single-cell genomics and its associated challenges. We first present the causal model that is most commonly applied to single-cell biology and then identify and discuss potential approaches to three open problems: the lack of generalization of models to novel experimental conditions, the complexity of interpreting learned models, and the difficulty of learning cell dynamics.

MCML Authors

Stefan Bauer

Prof. Dr.

Algorithmic Machine Learning & Explainable AI

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

[391]

R. Hornung, M. Nalenz, L. Schneider, A. Bender, L. Bothmann, F. Dumpert, B. Bischl, T. Augustin and A.-L. Boulesteix.
Evaluating Machine Learning Models in Non-Standard Settings: An Overview and New Findings.
Statistical Science (Mar. 2025). To be published. Preprint available. arXiv

Abstract

Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
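For the clustered-data setting named in this abstract, the unifying principle can be illustrated with a grouped split: all observations from one cluster go to the same side of each train/test division, so the test data resemble observations from unseen clusters. The following is a minimal sketch under that assumption, not the procedure evaluated in the paper; the function name `grouped_kfold` and the toy cluster labels are hypothetical.

```python
import random
from collections import defaultdict


def grouped_kfold(groups, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which every cluster lies entirely on one side.

    Assumes k is no larger than the number of distinct clusters.
    """
    # Collect the row indices belonging to each cluster label.
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)

    # Shuffle the cluster labels (not the rows) and deal them into k folds.
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)

    for fold in range(k):
        test_labels = labels[fold::k]
        test_set = set(test_labels)
        test_idx = [i for g in test_labels for i in by_group[g]]
        train_idx = [i for i, g in enumerate(groups) if g not in test_set]
        yield train_idx, test_idx


# Toy example: ten observations from six clusters (hypothetical labels).
groups = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "f"]
folds = list(grouped_kfold(groups, k=3))
```

A plain random k-fold split would instead place rows of the same cluster in both the training and the test set, which is the kind of simple random data division the abstract warns may bias the GE estimate.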

MCML Authors

Roman Hornung

Dr.

Biometry in Molecular Medicine

Lennart Schneider

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[390]

P. Bertin, J. D. Viviano, A. Tejada-Lapuerta, W. Wang, S. Bauer, F. J. Theis and Y. Bengio.
A scalable gene network model of regulatory dynamics in single cells.
Preprint (Mar. 2025). arXiv

Abstract

Single-cell data provide high-dimensional measurements of the transcriptional states of cells, but extracting insights into the regulatory functions of genes, particularly identifying transcriptional mechanisms affected by biological perturbations, remains a challenge. Many perturbations induce compensatory cellular responses, making it difficult to distinguish direct from indirect effects on gene regulation. Modeling how gene regulatory functions shape the temporal dynamics of these responses is key to improving our understanding of biological perturbations. Dynamical models based on differential equations offer a principled way to capture transcriptional dynamics, but their application to single-cell data has been hindered by computational constraints, stochasticity, sparsity, and noise. Existing methods either rely on low-dimensional representations or make strong simplifying assumptions, limiting their ability to model transcriptional dynamics at scale. We introduce a Functional and Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions. Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale, provides improved functional insights into transcriptional mechanisms perturbed by gene knockouts, both in myeloid differentiation and K562 Perturb-seq experiments, and simulates single-cell trajectories of A549 cells following small-molecule perturbations.

MCML Authors

Stefan Bauer

Prof. Dr.

Algorithmic Machine Learning & Explainable AI

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

[389]

F. J. D. Lange, J. C. Wilcke, S. Hoffmann, M. Herrmann and A.-L. Boulesteix.
On 'confirmatory' methodological research in statistics and related fields.
Preprint (Mar. 2025). arXiv

Abstract

Empirical substantive research, such as in the life or social sciences, is commonly categorized into the two modes exploratory and confirmatory, both of which are essential to scientific progress. The former is also referred to as hypothesis-generating or data-contingent research, the latter is also called hypothesis-testing research. In the context of empirical methodological research in statistics, however, the exploratory-confirmatory distinction has received very little attention so far. Our paper aims to fill this gap. First, we revisit the concept of empirical methodological research through the lens of the exploratory-confirmatory distinction. Secondly, we examine current practice with respect to this distinction through a literature survey including 115 articles from the field of biostatistics. Thirdly, we provide practical recommendations towards more appropriate design, interpretation, and reporting of empirical methodological research in light of this distinction. In particular, we argue that both modes of research are crucial to methodological progress, but that most published studies – even if sometimes disguised as confirmatory – are essentially of exploratory nature. We emphasize that it may be adequate to consider empirical methodological research as a continuum between ‘pure’ exploration and ‘strict’ confirmation, recommend transparently reporting the mode of conducted research within the spectrum between exploratory and confirmatory, and stress the importance of study protocols written before conducting the study, especially in confirmatory methodological research.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[388]

M. M. Mandl, F. Weber, T. Wöhrle and A.-L. Boulesteix.
The impact of the storytelling fallacy on real data examples in methodological research.
Preprint (Mar. 2025). arXiv

Abstract

The term ‘researcher degrees of freedom’ (RDF), which was introduced in metascientific literature in the context of the replication crisis in science, refers to the extent of flexibility a scientist has in making decisions related to data analysis. These choices occur at all stages of the data analysis process. In combination with selective reporting, RDF may lead to over-optimistic statements and an increased rate of false positive findings. Even though the concept has been mainly discussed in fields such as epidemiology or psychology, similar problems affect methodological statistical research. Researchers who develop and evaluate statistical methods are left with a multitude of decisions when designing their comparison studies. This leaves room for an over-optimistic representation of the performance of their preferred method(s). The present paper defines and explores a particular RDF that has not been previously identified and discussed. When interpreting the results of real data examples that are most often part of methodological evaluations, authors typically tell a domain-specific ‘story’ that best supports their argumentation in favor of their preferred method. However, there are often plenty of other plausible stories that would support different conclusions. We define the ‘storytelling fallacy’ as the selective use of anecdotal domain-specific knowledge to support the superiority of specific methods in real data examples. While such examples fed by domain knowledge play a vital role in methodological research, if deployed inappropriately they can also harm the validity of conclusions on the investigated methods. The goal of our work is to create awareness for this issue, fuel discussions on the role of real data in generating evidence in methodological research and warn readers of methodological literature against naive interpretations of real data examples.

MCML Authors

Maximilian Mandl

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[387]

R. D. Paul, J. Seiffarth, D. Rügamer, H. Scharr and K. Nöh.
How To Make Your Cell Tracker Say 'I dunno!'.
Preprint (Mar. 2025). arXiv

Abstract

Cell tracking is a key computational task in live-cell microscopy, but fully automated analysis of high-throughput imaging requires reliable and, thus, uncertainty-aware data analysis tools, as the amount of data recorded within a single experiment exceeds what humans are able to oversee. We here propose and benchmark various methods to reason about and quantify uncertainty in linear assignment-based cell tracking algorithms. Our methods take inspiration from statistics and machine learning, leveraging two perspectives on the cell tracking problem explored throughout this work: considering it as a Bayesian inference problem and as a classification problem. Our methods admit a framework-like character in that they equip any frame-to-frame tracking method with uncertainty quantification. We demonstrate this by applying it to various existing tracking algorithms including the recently presented Transformer-based trackers. We demonstrate empirically that our methods yield useful and well-calibrated tracking uncertainties.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[386]

R. Rehms, N. Ellenbach, V. Deffner and S. Hoffmann.
Addressing complex structures of measurement error arising in the exposure assessment in occupational epidemiology using a Bayesian hierarchical approach.
Preprint (Mar. 2025). arXiv

Abstract

Exposure assessment in occupational epidemiology may involve multiple unknown quantities that are measured or reconstructed simultaneously for groups of workers and over several years. Additionally, exposures may be collected using different assessment strategies, depending on the period of exposure. As a consequence, researchers who are analyzing occupational cohort studies are commonly faced with challenging structures of exposure measurement error, involving complex dependence structures and multiple measurement error models, depending on the period of exposure. However, previous work has often made many simplifying assumptions concerning these errors. In this work, we propose a Bayesian hierarchical approach to account for a broad range of error structures arising in occupational epidemiology. The considered error structures may involve several unknown quantities that can be subject to mixtures of Berkson and classical measurement error. It is possible to account for different error structures, depending on the exposure period and the location of a worker. Moreover, errors can present complex dependence structures over time and between workers. We illustrate the proposed hierarchical approach on a subgroup of the German cohort of uranium miners to account for potential exposure uncertainties in the association between radon exposure and lung cancer mortality. The performance of the proposed approach and its sensitivity to model misspecification are evaluated in a simulation study. The results show that biases in estimates arising from very complex measurement errors can be corrected through the proposed Bayesian hierarchical approach.

MCML Authors

Nicole Ellenbach

Biometry in Molecular Medicine

[385]

A. Wuttke, M. Aßenmacher, C. Klamm, M. Lang, Q. Würschinger and F. Kreuter.
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers.
Preprint (Mar. 2025). arXiv

Abstract

Traditional methods for eliciting people’s opinions face a trade-off between depth and scale: structured surveys enable large-scale data collection but limit respondents’ ability to voice their opinions in their own words, while conversational interviews provide deeper insights but are resource-intensive. This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews. Our goal is to assess the performance of AI Conversational Interviewing and to identify opportunities for improvement in a controlled environment. We conducted a small-scale, in-depth study with university students who were randomly assigned to a conversational interview by either AI or human interviewers, both employing identical questionnaires on political topics. Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy. The findings indicate the viability of AI Conversational Interviewing in producing quality data comparable to traditional methods, with the added benefit of scalability. We publish our data and materials for re-use and present specific recommendations for effective implementation.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

[384]

L. Burk, A. Bender and M. N. Wright.
High-Dimensional Variable Selection With Competing Events Using Cooperative Penalized Regression.
Biometrical Journal 67.1 (Feb. 2025). DOI

Abstract

Variable selection is an important step in the analysis of high-dimensional data, yet there are limited options for survival outcomes in the presence of competing risks. Commonly employed penalized Cox regression considers each event type separately through cause-specific models, neglecting possibly shared information between them. We adapt the feature-weighted elastic net (fwelnet), an elastic net generalization, to survival outcomes and competing risks. For two causes, our proposed algorithm fits two alternating cause-specific models, where each model receives the coefficient vector of the complementary model as prior information. We dub this “cooperative penalized regression”, as it enables the modeling of competing risk data with cause-specific models while accounting for shared effects between causes. Coefficients that are shrunken toward zero in the model for the first cause will receive larger penalization weights in the model for the second cause and vice versa. Through multiple iterations, this process ensures stronger penalization of uninformative predictors in both models. We demonstrate our method’s variable selection capabilities on simulated genomics data and apply it to bladder cancer microarray data. We evaluate selection performance using the positive predictive value for the correct selection of informative features and the false positive rate for the selection of uninformative variables. The benchmark compares results with cause-specific penalized Cox regression, random survival forests, and likelihood-boosted Cox regression. Results indicate that our approach is more effective at selecting informative features and removing uninformative features. In settings without shared effects, variable selection performance is similar to cause-specific penalized Cox regression.

MCML Authors

Lukas Burk

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[383]

M. Wünsch, C. Sauer, M. Herrmann, L. C. Hinske and A.-L. Boulesteix.
To tweak or not to tweak. How exploiting flexibilities in gene set analysis leads to over-optimism.
Biometrical Journal 67.1 (Feb. 2025). DOI

Abstract

Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the “right” choice, further reinforced by a lack of evidence-based guidance. Especially when their statistical experience is scarce, this uncertainty might entice users to produce preferable results using a ‘trial-and-error’ approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of ‘cherry-picking’ and cause an optimistic bias, rendering the results nonreplicable on independent data. After this problem has attracted a lot of attention in the context of classical hypothesis testing, we now aim to raise awareness of such overoptimism in the different and more complex context of gene set analyses. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, thereby considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for overoptimism is particularly high for a group of methods frequently used despite being commonly criticized. We conclude by providing practical recommendations to counter overoptimism in research findings in gene set analysis and beyond.

MCML Authors

Milena Wünsch

Biometry in Molecular Medicine

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[382]

T. Willem, V. A. Shitov, M. D. Luecken, N. Kilbertus, S. Bauer, M. Piraud, A. Buyx and F. J. Theis.
Biases in machine-learning models of human single-cell data.
Nature Cell Biology (Feb. 2025). DOI

Abstract

Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising to provide valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss various biases that emerge along the pipeline of ML-based single-cell analysis, ranging from societal biases affecting whose samples are collected, to clinical and cohort biases that influence the generalizability of single-cell datasets, biases stemming from single-cell sequencing, ML biases specific to (weakly supervised or unsupervised) ML models trained on human single-cell samples and biases during the interpretation of results from ML models. We end by providing methods for single-cell data scientists to assess and mitigate biases, and call for efforts to address the root causes of biases.

MCML Authors

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Stefan Bauer

Prof. Dr.

Algorithmic Machine Learning & Explainable AI

[381]

M. Drton, A. Grosdos, I. Portakal and N. Sturma.
Algebraic Sparse Factor Analysis.
SIAM Journal on Applied Algebra and Geometry 9 (Feb. 2025). DOI

Abstract

Factor analysis is a statistical technique that explains correlations among observed random variables with the help of a smaller number of unobserved factors. In traditional full factor analysis, each observed variable is influenced by every factor. However, many applications exhibit interesting sparsity patterns; that is, each observed variable only depends on a subset of the factors. In this paper, we study such sparse factor analysis models from an algebro-geometric perspective. Under mild conditions on the sparsity pattern, we examine the dimension of the set of covariance matrices that corresponds to a given model. Moreover, we study algebraic relations among the covariances in sparse two-factor models. In particular, we identify cases in which a Gröbner basis for these relations can be derived via a 2-delightful term order and join of toric ideals of graphs.
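For orientation, the covariance structure that this abstract refers to can be written out as follows. The notation ($\Lambda$, $\Psi$, $m$ factors, $p$ observed variables) is the usual textbook one and is an assumption here; the paper's own notation and parametrization may differ.

```latex
\begin{align*}
  X &= \Lambda F + \varepsilon,
    \qquad \Lambda \in \mathbb{R}^{p \times m},\quad
    \operatorname{Cov}(F) = I_m,\quad
    \operatorname{Cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1,\dots,\psi_p),\quad
    \operatorname{Cov}(F,\varepsilon) = 0, \\
  \Sigma &= \operatorname{Cov}(X) = \Lambda\Lambda^{\top} + \Psi, \\
  \lambda_{ij} &= 0 \quad \text{whenever observed variable } i \text{ does not depend on factor } j .
\end{align*}
```

In full factor analysis no entry of $\Lambda$ is constrained to zero; a sparse model fixes a zero pattern in $\Lambda$, and the set of covariance matrices $\Sigma$ reachable under that pattern is the object whose dimension and algebraic relations the abstract describes.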

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

Nils Sturma

Mathematical Statistics

[380]

E. Banzato, M. Drton, K. Saraf-Poor and H. Shi.
Existence of Direct Density Ratio Estimators.
Preprint (Feb. 2025). arXiv

Abstract

Many two-sample problems call for a comparison of two distributions from an exponential family. Density ratio estimation methods provide ways to solve such problems through direct estimation of the differences in natural parameters. The term direct indicates that one avoids estimating both marginal distributions. In this context, we consider the Kullback–Leibler Importance Estimation Procedure (KLIEP), which has been the subject of recent work on differential networks. Our main result shows that the existence of the KLIEP estimator is characterized by whether the average sufficient statistic for one sample belongs to the convex hull of the set of all sufficient statistics for data points in the second sample. For high-dimensional problems it is customary to regularize the KLIEP loss by adding the product of a tuning parameter and a norm of the vector of parameter differences. We show that the existence of the regularized KLIEP estimator requires the tuning parameter to be no less than the dual norm-based distance between the average sufficient statistic and the convex hull. The implications of these existence issues are explored in applications to differential network analysis.

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[379]

H. Bao and M. Schomaker.
Addressing Positivity Violations in Continuous Interventions through Data-Adaptive Strategies.
Preprint (Feb. 2025). arXiv

Abstract

Positivity violations pose a key challenge in the estimation of causal effects, particularly for continuous interventions. Current approaches for addressing this issue include the use of projection functions or modified treatment policies. While effective in many contexts, these methods can result in estimands that potentially do not align well with the original research question, thereby leading to compromises in interpretability. In this paper, we introduce a novel diagnostic tool, the non-overlap ratio, to detect positivity violations. To address these violations while maintaining interpretability, we propose a data-adaptive solution, specifically a ‘most feasible’ intervention strategy. Our strategy operates on a unit-specific basis. For a given intervention of interest, we first assess whether the intervention value is feasible for each unit. For units with sufficient support, conditional on confounders, we adhere to the intervention of interest. However, for units lacking sufficient support, as identified through the assessment of the non-overlap ratio, we do not assign the actual intervention value of interest. Instead, we assign the closest feasible value within the support region. We propose an estimator for this new estimand that uses g-computation coupled with flexible conditional density estimation to identify high- and low-support regions. Through simulations, we demonstrate that our method effectively reduces bias across various scenarios by addressing positivity violations. Moreover, when positivity violations are absent, the method successfully recovers the standard estimand. We further validate its practical utility using real-world data from the CHAPAS-3 trial, which enrolled HIV-positive children in Zambia and Uganda.
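
The unit-specific assignment rule described above can be illustrated with a minimal Python sketch. It assumes that the feasible support of each unit has already been determined (in the paper this relies on the non-overlap ratio and conditional density estimation) and is supplied as a hypothetical list of feasible intervention values. The function name `most_feasible_value` and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def most_feasible_value(target, feasible_values):
    """Return `target` if it is feasible for the unit, else the closest feasible value.

    `feasible_values` is a hypothetical, pre-computed list of intervention
    values with sufficient support for this unit (an assumption of this sketch).
    """
    if not feasible_values:
        raise ValueError("unit has no feasible intervention value")
    if target in feasible_values:
        return target
    # Closest feasible value; ties resolve to the first minimum in the list.
    return min(feasible_values, key=lambda v: abs(v - target))


# Toy example: the intervention of interest is a dose of 5.0 for every unit.
support_by_unit = {
    "unit_a": [3.0, 4.0, 5.0, 6.0],  # 5.0 is supported, so it is kept
    "unit_b": [1.0, 2.0, 3.0],       # 5.0 is unsupported, so fall back to 3.0
}
assigned = {u: most_feasible_value(5.0, s) for u, s in support_by_unit.items()}
```

When every unit supports the target value, the rule assigns the target everywhere, which matches the abstract's remark that the standard estimand is recovered in the absence of positivity violations.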

MCML Authors

Michael Schomaker

Prof. Dr.

Biostatistics

[378]

H. Funk, R. Ludwig, H. Küchenhoff and T. Nagler.
Modelling Climate Variables at High Temporal Resolution.
Preprint (Feb. 2025). DOI

Abstract

Large ensembles of climate models are indispensable for analyzing natural climate variability and estimating the occurrence of rare extreme events. Many hydrometeorological applications (such as compound event analysis, return period estimation, weather forecasting, downscaling, and bias correction) rely on an accurate representation of the multivariate distribution of climate variables. However, at high temporal resolutions, variables like precipitation often exhibit significant zero-inflation and heavy-tailed distributions. This inflation propagates through the entire multivariate dependence structure, complicating the relationships between zero-inflated and non-inflated variables. Inadequate modeling and correction of these dependencies can substantially degrade the reliability of hydrometeorological methodologies.
In earlier work, we developed a novel multivariate density decomposition for zero-inflated variables based on vine copulas. This method has been integrated into multivariate Vine Copula Bias Correction for partially zero-inflated margins (VBC), with potential applications in other fields facing high-resolution climate data challenges. We summarize the idea behind VBC and illustrate its advantages over other bias correction methods. This highlights the interpretability of VBC and the degree of control over, and assessment of, the results it generates.

MCML Authors

Henri Funk

Statistical Consulting Unit (StaBLab)

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[377]

J. Rodemann, E. Garces Arias, C. Luther, C. Jansen and T. Augustin.
A Statistical Case Against Empirical Human-AI Alignment.
Preprint (Feb. 2025). arXiv

Abstract

Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While noble in its goals, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position paper thus advocates against naive empirical alignment, offering prescriptive alignment and a posteriori empirical alignment as alternatives. We substantiate our principled argument by tangible examples like human-centric decoding of language models.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

[376]

N. Sturma, M. Kranzlmueller, I. Portakal and M. Drton.
Matching Criterion for Identifiability in Sparse Factor Analysis.
Preprint (Feb. 2025). arXiv

Abstract

Factor analysis models explain dependence among observed variables by a smaller number of unobserved factors. A main challenge in confirmatory factor analysis is determining whether the factor loading matrix is identifiable from the observed covariance matrix. The factor loading matrix captures the linear effects of the factors and, if unrestricted, can only be identified up to an orthogonal transformation of the factors. However, in many applications the factor loadings exhibit an interesting sparsity pattern that may lead to identifiability up to column signs. We study this phenomenon by connecting sparse factor models to bipartite graphs and providing sufficient graphical conditions for identifiability of the factor loading matrix up to column signs. In contrast to previous work, our main contribution, the matching criterion, exploits sparsity by operating locally on the graph structure, thereby improving existing conditions. Our criterion is efficiently decidable in time that is polynomial in the size of the graph, when restricting the search steps to sets of bounded size.
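
The rotation problem mentioned in the abstract can be made explicit with the standard covariance parametrization of factor analysis. The notation below (loading matrix $\Lambda$, diagonal noise covariance $\Psi$, orthogonal matrix $Q$) follows the usual textbook convention and is an assumption of this note rather than a quotation from the paper.

```latex
\Sigma = \Lambda \Lambda^{\top} + \Psi,
\qquad
(\Lambda Q)(\Lambda Q)^{\top} = \Lambda Q Q^{\top} \Lambda^{\top} = \Lambda \Lambda^{\top}
\quad \text{for every orthogonal } Q .
```

Hence $\Lambda$ and $\Lambda Q$ induce the same covariance matrix when $\Lambda$ is unrestricted. A prescribed sparsity pattern can rule out most such $Q$, because $\Lambda Q$ generally loses the required zeros. Column sign changes (diagonal $Q$ with entries $\pm 1$) always preserve the pattern, which is why identifiability up to column signs is the natural target.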

MCML Authors

Nils Sturma

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[375]

M. Surner, A. Khelil and L. Bothmann.
Invariance Pair-Guided Learning: Enhancing Robustness in Neural Networks.
Preprint (Feb. 2025). arXiv

Abstract

Out-of-distribution generalization of machine learning models remains challenging since the models are inherently bound to the training data distribution. This becomes especially apparent when the learned models rely on spurious correlations. Most of the existing approaches apply data manipulation, representation learning, or learning strategies to achieve generalizable models. Unfortunately, these approaches usually require multiple training domains, group labels, specialized augmentation, or pre-processing to reach generalizable models. We propose a novel approach that addresses these limitations by providing a technique to guide the neural network through the training phase. We first establish input pairs, representing the spurious attribute and describing the invariance, a characteristic that should not affect the outcome of the model. Based on these pairs, we form a corrective gradient complementing the traditional gradient descent approach. We further make this correction mechanism adaptive based on a predefined invariance condition. Experiments on ColoredMNIST, Waterbird-100, and CelebA datasets demonstrate the effectiveness of our approach and the robustness to group shifts.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[374]

E. Garces Arias, M. Li, C. Heumann and M. Aßenmacher.
Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation.
COLING 2025 - The 31st International Conference on Computational Linguistics. Abu Dhabi, United Arab Emirates, Jan 19-24, 2025. URL

Abstract

Decoding strategies for large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Since LLMs produce probability distributions over the entire vocabulary, various decoding methods have been developed to transform these probabilities into coherent and fluent text, each with its own set of hyperparameters. In this study, we present a large-scale, comprehensive analysis of how hyperparameter selection affects text quality in open-ended text generation across multiple LLMs, datasets, and evaluation metrics. Through an extensive sensitivity analysis, we provide practical guidelines for hyperparameter tuning and demonstrate the substantial influence of these choices on text quality. Using three established datasets, spanning factual domains (e.g., news) and creative domains (e.g., fiction), we show that hyperparameter tuning significantly impacts generation quality, though its effects vary across models and tasks. We offer in-depth insights into these effects, supported by both human evaluations and a synthesis of widely-used automatic evaluation metrics.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[373]

M. Abrahamowicz, M.-E. Beauchamp, A.-L. Boulesteix, T. P. Morris, W. Sauerbrei and J. S. Kaufman, on behalf of the STRATOS Simulation Panel.
Data-driven simulations to assess the impact of study imperfections in time-to-event analyses.
American Journal of Epidemiology 194.1 (Jan. 2025). DOI

Abstract

Quantitative bias analysis (QBA) permits assessment of the expected impact of various imperfections of the available data on the results and conclusions of a particular real-world study. This article extends QBA methodology to multivariable time-to-event analyses with right-censored endpoints, possibly including time-varying exposures or covariates. The proposed approach employs data-driven simulations, which preserve important features of the data at hand while offering flexibility in controlling the parameters and assumptions that may affect the results. First, the steps required to perform data-driven simulations are described, and then two examples of real-world time-to-event analyses illustrate their implementation and the insights they may offer. The first example focuses on the omission of an important time-invariant predictor of the outcome in a prognostic study of cancer mortality, and permits separating the expected impact of confounding bias from noncollapsibility. The second example assesses how imprecise timing of an interval-censored event—ascertained only at sparse times of clinic visits—affects its estimated association with a time-varying drug exposure. The simulation results also provide a basis for comparing the performance of two alternative strategies for imputing the unknown event times in this setting. The R scripts that permit the reproduction of our examples are provided.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[372]

L. Schneider.
Advancing hyperparameter optimization: foundations, multiple objectives and algorithmic innovations informed through benchmarking.
Dissertation 2025. DOI

Abstract

Hyperparameter optimization (HPO) is a fundamental aspect of machine learning (ML), directly influencing model performance and adaptability. As a computationally expensive black-box optimization problem, HPO requires efficient algorithms to identify optimal hyperparameter configurations. This thesis advances the field of HPO along three key dimensions: foundational insights, HPO in the presence of more than one objective, and algorithmic innovations through benchmarking. (Shortened.)

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

[371]

L. Bothmann and K. Peters.
Fairness von KI – ein Brückenschlag zwischen Philosophie und Maschinellem Lernen.
Grenzen Künstlicher Intelligenz (Jan. 2025). DOI

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[370]

T. Weber, J. Dexl, D. Rügamer and M. Ingrisch.
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition.
Radiology: Artificial Intelligence 7.2 (Jan. 2025). DOI

Abstract

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model’s parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
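
The source of the savings can be seen from a small parameter count. The abstract does not spell out the exact factorization, so the sketch below assumes the common Tucker-2 form, which factorizes only the input and output channel modes of a 3D convolution kernel. The layer sizes and ranks are made-up values for illustration.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a dense 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def tucker2_conv3d_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 factorization of the two channel modes.

    The layer becomes three convolutions: 1x1x1 (c_in -> r_in),
    k x k x k core (r_in -> r_out), and 1x1x1 (r_out -> c_out).
    """
    return c_in * r_in + r_in * r_out * k ** 3 + r_out * c_out


dense = conv3d_params(64, 64, 3)                        # 110592
compressed = tucker2_conv3d_params(64, 64, 3, 16, 16)   # 8960
removed_fraction = 1 - compressed / dense
```

In this toy setting roughly 92% of the layer's weights are removed. The 88% figure in the abstract refers to the whole TotalSegmentator model and is not reproduced by this calculation.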

MCML Authors

Tobias Weber

* Former Member

Jakob Dexl

Clinical Data Science in Radiology

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

[369]

R. Schwank, A. McCormack and M. Drton.
Robust Score Matching.
Preprint (Jan. 2025). arXiv

Abstract

Proposed in Hyvärinen (2005), score matching is a parameter estimation procedure that does not require computation of distributional normalizing constants. In this work we utilize the geometric median of means to develop a robust score matching procedure that yields consistent parameter estimates in settings where the observed data has been contaminated. A special appeal of the proposed method is that it retains convexity in exponential family models. The new method is therefore particularly attractive for non-Gaussian, exponential family graphical models where evaluation of normalizing constants is intractable. Support recovery guarantees for such models when contamination is present are provided. Additionally, support recovery is studied in numerical experiments and on a precipitation dataset. We demonstrate that the proposed robust score matching estimator performs comparably to the standard score matching estimator when no contamination is present but greatly outperforms this estimator in a setting with contamination.
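
The robustness mechanism named in the abstract, the geometric median of means, can be sketched generically in Python. The example estimates a mean vector from contaminated data; it is not the paper's score matching procedure. The block construction, the Weiszfeld iteration count, and the toy data are assumptions of this sketch.

```python
import math


def block_means(points, n_blocks):
    """Split `points` (list of equal-length tuples) into contiguous blocks and average each."""
    size = len(points) // n_blocks
    means = []
    for b in range(n_blocks):
        block = points[b * size:(b + 1) * size]
        dim = len(block[0])
        means.append(tuple(sum(p[d] for p in block) / len(block) for d in range(dim)))
    return means


def geometric_median(points, n_iter=200, eps=1e-12):
    """Weiszfeld iterations for the point minimizing the sum of Euclidean distances."""
    dim = len(points[0])
    est = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    for _ in range(n_iter):
        # Inverse-distance weights; eps guards against division by zero.
        weights = [1.0 / max(math.dist(est, p), eps) for p in points]
        total = sum(weights)
        est = tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dim))
    return est


# 24 clean points with mean (0, 0), followed by 6 gross outliers.
data = [(1.0, -1.0), (-1.0, 1.0)] * 12 + [(100.0, 100.0)] * 6
plain_mean = tuple(sum(p[d] for p in data) / len(data) for d in range(2))
robust = geometric_median(block_means(data, n_blocks=5))
```

Here the outliers land in a single block, so four of the five block means are clean and the geometric median stays at the clean mean, while the plain mean is pulled to (20, 20). In general the estimator tolerates contamination only while fewer than half of the blocks are affected.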

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[368]

L. Gosch, M. Sabanayagam, D. Ghoshdastidar and S. Günnemann.
Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks.
AdvML-Frontiers @NeurIPS 2024 - 3rd Workshop on New Frontiers in Adversarial Machine Learning at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data. This vulnerability has led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning attacks, including backdoors, targeting the node features of a given graph. Our certificates are white-box and based upon (i) the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and (ii) a novel reformulation of the bilevel optimization describing poisoning as a mixed-integer linear program. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for NNs, which can be of independent interest beyond graph-related tasks.

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[367]

R. Dhahri, A. Immer, B. Charpentier, S. Günnemann and V. Fortuin.
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam’s razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior Hessian approximation used in the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Vincent Fortuin

Dr.

Bayesian Deep Learning

[366]

T. Nagler, L. Schneider, B. Bischl and M. Feurer.
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model’s generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout become competitive with standard CV while being computationally cheaper.
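
The difference between the two protocols compared in the abstract can be sketched for a holdout scheme. The helper names and the `validation_loss` callable are hypothetical; the sketch shows only how splits are drawn and is not the authors' implementation (which is linked via GitHub above).

```python
import random


def holdout_split(n, valid_fraction, rng):
    """Return (train_idx, valid_idx) for a random holdout split of range(n)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_valid = int(n * valid_fraction)
    return idx[n_valid:], idx[:n_valid]


def select_config(configs, validation_loss, n, reshuffle, seed=0, valid_fraction=0.2):
    """Pick the configuration with the lowest holdout validation loss.

    `validation_loss(config, train_idx, valid_idx)` is a user-supplied callable
    (hypothetical here). With reshuffle=False every configuration sees the same
    split; with reshuffle=True a fresh split is drawn for each configuration.
    """
    rng = random.Random(seed)
    fixed = holdout_split(n, valid_fraction, rng)
    best, best_loss, used_splits = None, float("inf"), []
    for config in configs:
        split = holdout_split(n, valid_fraction, rng) if reshuffle else fixed
        used_splits.append(split)
        loss = validation_loss(config, *split)
        if loss < best_loss:
            best, best_loss = config, loss
    return best, used_splits


# Stand-in for "train on train_idx, evaluate on valid_idx"; it ignores the split.
def toy_loss(config, train_idx, valid_idx):
    return (config - 0.3) ** 2


best_fixed, splits_fixed = select_config([0.1, 0.3, 0.5], toy_loss, n=50, reshuffle=False)
best_resh, splits_resh = select_config([0.1, 0.3, 0.5], toy_loss, n=50, reshuffle=True)
```

With a real learner, the fixed protocol shares one validation noise realization across all configurations, whereas reshuffling draws independent noise per configuration, which is the mechanism the paper analyzes.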

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Lennart Schneider

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[365]

D. Rügamer, B. X. W. Liew, Z. Altai and A. Stöcker.
A Functional Extension of Semi-Structured Networks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[364]

Y. Zhang, Y. Li, X. Wang, Q. Shen, B. Plank, B. Bischl, M. Rezaei and K. Kawaguchi.
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models.
NeurIPS 2024 - Workshop on Machine Learning and Compression at the 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters, making large compute a necessity while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alteration to the model’s output, contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% performance of Llama3-8B with 25% of layers removed, and 95% performance of Llama3-70B with 30% of layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly, we observe intriguing results with FinerCut: 42% (34 out of 80) of the self-attention layers in Llama3-70B can be removed while preserving 99% of its performance, without additional fine-tuning after removal. Moreover, FinerCut provides a tool to inspect the types and locations of pruned layers, allowing one to observe interesting pruning behaviors. For instance, we observe a preference for pruning self-attention layers, often at deeper consecutive decoder layers. We hope our insights inspire future efficient LLM architecture designs.

MCML Authors

Yawei Li

Statistical Learning and Data Science

Xinpeng Wang

AI and Computational Linguistics

Barbara Plank

Prof. Dr.

AI and Computational Linguistics

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[363]

M. Koshil, T. Nagler, M. Feurer and K. Eggensperger.
Towards Localization via Data Embedding for TabPFN.
TLR @NeurIPS 2024 - 3rd Table Representation Learning Workshop at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Prior-data fitted networks (PFNs), especially TabPFN, have shown significant promise in tabular data prediction. However, their scalability is limited by the quadratic complexity of the transformer architecture’s attention across training points. In this work, we propose a method to localize TabPFN, which embeds data points into a learned representation and performs nearest neighbor selection in this space. We evaluate it across six datasets, demonstrating its superior performance over standard TabPFN when scaling to larger datasets. We also explore its design choices and analyze the bias-variance trade-off of this localization method, showing that it reduces bias while maintaining manageable variance. This work opens up a pathway for scaling TabPFN to arbitrarily large tabular datasets.
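
The localization step can be sketched as a nearest neighbor lookup in embedding space. In the paper the embedding is learned; here the embeddings are hypothetical precomputed tuples, and the function name `nearest_context` is an assumption. The selected indices would then define the reduced context passed to TabPFN, which is not shown.

```python
import math


def nearest_context(query_emb, train_embs, k):
    """Indices of the k training points closest to the query in embedding space."""
    order = sorted(range(len(train_embs)),
                   key=lambda i: math.dist(query_emb, train_embs[i]))
    return order[:k]


# Toy embeddings: three points near the origin and two far away.
train_embs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (4.9, 5.1)]
context = nearest_context((0.0, 0.05), train_embs, k=3)
```

Because each query attends to only k context points instead of all training points, the attention cost no longer grows with the full training set size.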

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[362]

B. M. G. Nielsen, L. Gresele and A. Dittadi.
Challenges in Explaining Representational Similarity through Identifiability.
UniReps @NeurIPS 2024 - 2nd Workshop on Unifying Representations in Neural Models at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

The phenomenon of different deep learning models producing similar data representations has garnered significant attention, raising the question of why such representational similarity occurs. Identifiability theory offers a partial explanation: for a broad class of discriminative models, including many popular in representation learning, those assigning equal likelihood to the observations yield representations that are equal up to a linear transformation, if a suitable diversity condition holds. In this work, we identify two key challenges in applying identifiability theory to explain representational similarity. First, the assumption of exact likelihood equality is rarely satisfied by practical models trained with different initializations. To address this, we describe how the representations of two models deviate from being linear transformations of each other, based on their difference in log-likelihoods. Second, we demonstrate that even models with similar and near-optimal loss values can produce highly dissimilar representations due to an underappreciated difference between loss and likelihood. Our findings highlight key open questions and point to future research directions for advancing the theoretical understanding of representational similarity.

MCML Authors

Andrea Dittadi

Dr.

Algorithmic Machine Learning & Explainable AI

[361]

J. Herbinger, M. N. Wright, T. Nagler, B. Bischl and G. Casalicchio.
Decomposing Global Feature Effects Based on Feature Interactions.
Journal of Machine Learning Research 25.381 (Dec. 2024). URL

Abstract

Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GADGET), which is a new framework based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects, namely partial dependence, accumulated local effects, and Shapley additive explanations (SHAP) dependence. Furthermore, we introduce and validate a new permutation-based interaction detection procedure that is applicable to any feature effect method that fits into our proposed framework. We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness.
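
The claim that a global effect can hide interaction-driven local effects is easy to reproduce with a toy model. The sketch below computes a plain partial dependence curve and then recomputes it within two hand-picked regions. GADGET finds such regions by recursive partitioning; the manual split on the second feature, the toy model, and the function name are assumptions of this sketch.

```python
def partial_dependence(model, data, feature, grid):
    """Average prediction when `feature` is set to each grid value for all rows."""
    values = []
    for g in grid:
        preds = []
        for row in data:
            modified = list(row)
            modified[feature] = g
            preds.append(model(modified))
        values.append(sum(preds) / len(preds))
    return values


# Pure interaction: the effect of x[0] flips sign with x[1].
def model(x):
    return x[0] * x[1]


data = [[0.0, -1.0], [0.0, 1.0], [0.0, -1.0], [0.0, 1.0]]
grid = [-2.0, 0.0, 2.0]

pd_all = partial_dependence(model, data, 0, grid)
pd_neg = partial_dependence(model, [r for r in data if r[1] < 0], 0, grid)
pd_pos = partial_dependence(model, [r for r in data if r[1] > 0], 0, grid)
```

The global curve `pd_all` is flat at zero even though the first feature has a strong effect for every observation. Within each region the curves are linear with opposite slopes, so the interaction-related heterogeneity disappears after the split.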

MCML Authors

Julia Herbinger

Dr.

* Former Member

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[360]

F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer and J. Herbinger.
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory.
Preprint (Dec. 2024). arXiv

Abstract

Feature-based explanations, using perturbations or gradients, are a prevalent tool to understand decisions of black box machine learning models. Yet, differences between these methods still remain mostly unknown, which limits their applicability for practitioners. In this work, we introduce a unified framework for local and global feature-based explanations using two well-established concepts: functional ANOVA (fANOVA) from statistics, and the notion of value and interaction from cooperative game theory. We introduce three fANOVA decompositions that determine the influence of feature distributions, and use game-theoretic measures, such as the Shapley value and interactions, to specify the influence of higher-order interactions. Our framework combines these two dimensions to uncover similarities and differences between a wide range of explanation techniques for features and groups of features. We then empirically showcase the usefulness of our framework on synthetic and real-world datasets.
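
As background for the game-theoretic side of the framework, the Shapley value of a small cooperative game can be computed exactly from its definition. The sketch below is the textbook formula, not the paper's framework; the two-feature toy game with one pairwise interaction is a made-up example.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game `value(frozenset) -> float`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding player p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Toy game on two features: additive main effects plus one pairwise interaction.
def value(s):
    v = 0.0
    if "x1" in s:
        v += 1.0
    if "x2" in s:
        v += 2.0
    if {"x1", "x2"} <= s:
        v += 4.0
    return v


phi = shapley_values(["x1", "x2"], value)
```

The interaction worth 4 is split equally, giving 3 for `x1` and 4 for `x2`, and the values sum to the worth of the full coalition. The Shapley value alone therefore does not reveal how much of an attribution stems from interactions, which is what interaction indices add.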

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Julia Herbinger

Dr.

* Former Member

[359]

C. Sauer, A.-L. Boulesteix, L. Hanßum, F. Hodiamont, C. Bausewein and T. Ullmann.
Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications.
Preprint (Dec. 2024). arXiv

Abstract

Adequately generating and evaluating prediction models based on supervised machine learning (ML) is often challenging, especially for less experienced users in applied research areas. Special attention is required in settings where the model generation process involves hyperparameter tuning, i.e. data-driven optimization of different types of hyperparameters to improve the predictive performance of the resulting model. Discussions about tuning typically focus on the hyperparameters of the ML algorithm (e.g., the minimum number of observations in each terminal node for a tree-based algorithm). In this context, it is often neglected that hyperparameters also exist for the preprocessing steps that are applied to the data before it is provided to the algorithm (e.g., how to handle missing feature values in the data). As a consequence, users experimenting with different preprocessing options to improve model performance may be unaware that this constitutes a form of hyperparameter tuning - albeit informal and unsystematic - and thus may fail to report or account for this optimization. To illuminate this issue, this paper reviews and empirically illustrates different procedures for generating and evaluating prediction models, explicitly addressing the different ways algorithm and preprocessing hyperparameters are typically handled by applied ML users. By highlighting potential pitfalls, especially those that may lead to exaggerated performance claims, this review aims to further improve the quality of predictive modeling in ML applications.

MCML Authors

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Theresa Ullmann

Dr.

* Former Member

[358]

E. Garces Arias, J. Rodemann, M. Li, C. Heumann and M. Aßenmacher.
Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, sampling with temperature, k-sampling, nucleus p-sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence, diversity, as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks. Our code base, datasets, and models are publicly available.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[357]

T. Woehrle, F. Pfeiffer, M. M. Mandl, W. Sobtzick, J. Heitzer, A. Krstova, L. Kamm, M. Feuerecker, D. Moser, M. Klein, B. Aulinger, M. Dolch, A.-L. Boulesteix, D. Lanz and A. Choukér.
Point-of-care breath sample analysis by semiconductor-based E-Nose technology discriminates non-infected subjects from SARS-CoV-2 pneumonia patients: a multi-analyst experiment.
MedComm 5.11 (Nov. 2024). DOI

Abstract

Metal oxide sensor-based electronic nose (E-Nose) technology provides an easy-to-use method for breath analysis by detection of volatile organic compound (VOC)-induced changes of electrical conductivity. Resulting signal patterns are then analyzed by machine learning (ML) algorithms. This study aimed to establish breath analysis by E-Nose technology as a diagnostic tool for severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) pneumonia within a multi-analyst experiment. Breath samples of 126 subjects with (n = 63) or without SARS-CoV-2 pneumonia (n = 63) were collected using the ReCIVA® Breath Sampler, enriched and stored on Tenax sorption tubes, and analyzed using an E-Nose unit with 10 sensors. ML approaches were applied by three independent data analyst teams and included a wide range of classifiers, hyperparameters, training modes, and subsets of training data. Within the multi-analyst experiment, all teams successfully classified individuals as infected or uninfected with an averaged area under the curve (AUC) larger than 90% and misclassification error lower than 19%, and identified the same sensor as most relevant to classification success. This new method using VOC enrichment and E-Nose analysis combined with ML can yield results similar to polymerase chain reaction (PCR) detection and superior to point-of-care (POC) antigen testing. Reducing the sensor set to the most relevant sensor may prove interesting for developing targeted POC testing.

MCML Authors

Maximilian Mandl

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[356]

Y. Li, Y. Zhang, K. Kawaguchi, A. Khakzar, B. Bischl and M. Rezaei.
A Dual-Perspective Approach to Evaluating Feature Attribution Methods.
Transactions on Machine Learning Research (Nov. 2024). URL

Abstract

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model’s behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

MCML Authors

Yawei Li

Statistical Learning and Data Science

Ashkan Khakzar

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[355]

K. Flöge, M. A. Moeed and V. Fortuin.
Stein Variational Newton Neural Network Ensembles.
Preprint (Nov. 2024). arXiv

Abstract

Deep neural network ensembles are powerful tools for uncertainty quantification, which have recently been re-interpreted from a Bayesian perspective. However, current methods inadequately leverage second-order information of the loss landscape, despite the recent availability of efficient Hessian approximations. We propose a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. Our approach uniquely integrates scalable modern Hessian approximations, achieving faster convergence and more accurate posterior distribution approximations. We validate the effectiveness of our method on diverse regression and classification tasks, demonstrating superior performance with a significantly reduced number of training epochs compared to existing ensemble-based methods, while enhancing uncertainty quantification and robustness against overfitting.

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

[354]

K. Flöge, S. Udayakumar, J. Sommer, M. Piraud, S. Kesselheim, V. Fortuin, S. Günnemann, K. J. van der Weg, H. Gohlke, E. Merdivan and A. Bazarova.
OneProt: Towards Multi-Modal Protein Foundation Models.
Preprint (Nov. 2024). arXiv

Abstract

Recent AI advances have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of modality encoders along protein sequences. It demonstrates strong performance in retrieval tasks and surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction. This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

[353]

J. Gauss and T. Nagler.
Asymptotics for estimating a diverging number of parameters -- with and without sparsity.
Preprint (Nov. 2024). arXiv

Abstract

We consider high-dimensional estimation problems where the number of parameters diverges with the sample size. General conditions are established for consistency, uniqueness, and asymptotic normality in both unpenalized and penalized estimation settings. The conditions are weak and accommodate a broad class of estimation problems, including ones with non-convex and group structured penalties. The wide applicability of the results is illustrated through diverse examples, including generalized linear models, multi-sample inference, and stepwise estimation procedures.

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[352]

J. Nam, I. Chalkidis and M. Rezaei.
Hyperbolic Contrastive Learning for Document Representations – A Multi-View Approach with Paragraph-level Similarities.
ECAI 2024 - 27th European Conference on Artificial Intelligence. Santiago de Compostela, Spain, Oct 19-24, 2024. DOI

Abstract

Self-supervised learning (SSL) has gained prominence due to the increasing availability of unlabeled data and advances in computational efficiency, leading to revolutionized natural language processing with pre-trained language models like BERT and GPT. Representation learning, a core concept in SSL, aims to reduce data dimensionality while preserving meaningful aspects. Conventional SSL methods typically embed data in Euclidean space. However, recent research has revealed that alternative geometries can hold even richer representations, unlocking more meaningful insights from the data. Motivated by this, we propose two novel methods for integrating Hilbert geometry into self-supervised learning for efficient document embedding. First, we present a method directly incorporating Hilbert geometry into the standard Euclidean contrastive learning framework. Additionally, we propose a multi-view hyperbolic contrastive learning framework contrasting both documents and paragraphs. Our findings demonstrate that contrasting only paragraphs, rather than entire documents, can lead to superior efficiency and effectiveness.

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

[351]

M. Aßenmacher, L. Karrlein, P. Schiele and C. Heumann.
Introducing wwm-german-18k - Can LLMs Crack the Million? (Or Win at Least 500 Euros?).
ICNLSP 2024 - 7th International Conference on Natural Language and Speech Processing. Trento, Italy, Oct 19-20, 2024. URL

Abstract

Language-specific evaluation of large language models (LLMs) for multiple-choice question answering (MCQA) is an important means to test their abilities for a multitude of different dimensions. With a data set assembled from questions from the German variant of ‘Who Wants to Be a Millionaire?’ we evaluate a set of German models and ChatGPT concerning factual/commonsense knowledge, syntactic abilities, and logical reasoning, amongst others. We contribute this new MCQA data set, extracted from the show’s episodes and designed to evaluate the ability of models to answer this diverse range of questions. To ensure data quality, we describe our preprocessing, encompassing data cleaning, deduplication, and the creation of stratified splits. Furthermore, we fine-tune a set of German LLMs and prompt ChatGPT to provide baseline results. Our findings reveal that these models achieve (partly) satisfactory performance on questions of lower difficulty levels (≤ 1000 euros). As the difficulty increases, performance steadily declines, highlighting the challenging nature of the later stages of the game. We contribute to the ongoing efforts to advance the capabilities of LLMs in comprehending and answering questions by providing a valuable resource for German MCQA research as well as further insights into the limitations of current LLMs.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[350]

J. Gertheiss, D. Rügamer, B. X. Liew and S. Greven.
Functional Data Analysis: An Introduction and Recent Developments.
Biometrical Journal 66.7 (Oct. 2024). DOI GitHub

Abstract

Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry, and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a data set on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available on GitHub.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[349]

H. Funk, R. Ludwig, H. Küchenhoff and T. Nagler.
Towards more realistic climate model outputs: A multivariate bias correction based on zero-inflated vine copulas.
Preprint (Oct. 2024). arXiv

Abstract

Climate model large ensembles are an essential research tool for analysing and quantifying natural climate variability and providing robust information for rare extreme events. The models' simulated representations of reality are susceptible to bias due to incomplete understanding of physical processes. This paper aims to correct the bias of five climate variables from the CRCM5 Large Ensemble over Central Europe at a 3-hourly temporal resolution. At this high temporal resolution, two variables, precipitation and radiation, exhibit a high share of zero inflation. We propose a novel bias-correction method, VBC (Vine copula bias correction), that models and transfers multivariate dependence structures for zero-inflated margins in the data from its error-prone model domain to a reference domain. VBC estimates the model and reference distribution using vine copulas and corrects the model distribution via (inverse) Rosenblatt transformation. To deal with the variables’ zero-inflated nature, we develop a new vine density decomposition that accommodates such variables and employs an adequately randomized version of the Rosenblatt transform. This novel approach allows for more accurate modelling of multivariate zero-inflated climate data. Compared with state-of-the-art correction methods, VBC is generally the best-performing correction and the most accurate method for correcting zero-inflated events.

MCML Authors

Henri Funk

Statistical Consulting Unit (StaBLab)

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[348]

T. Pielok, B. Bischl and D. Rügamer.
Semi-Implicit Variational Inference via Kernelized Path Gradient Descent.
Preprint (Oct. 2024). arXiv

Abstract

Semi-implicit variational inference (SIVI) is a powerful framework for approximating complex posterior distributions, but training with the Kullback-Leibler (KL) divergence can be challenging due to high variance and bias in high-dimensional settings. While current state-of-the-art semi-implicit variational inference methods, particularly Kernel Semi-Implicit Variational Inference (KSIVI), have been shown to work in high dimensions, training remains moderately expensive. In this work, we propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing. To further reduce the bias, we introduce an importance sampling correction. We provide a theoretical connection to the amortized version of the Stein variational gradient descent, which estimates the score gradient via Stein’s identity, showing that both methods minimize the same objective, but our semi-implicit approach achieves lower gradient variance. In addition, our method’s bias in function space is benign, leading to more stable and efficient optimization. Empirical results demonstrate that our method outperforms or matches state-of-the-art SIVI methods in both performance and training efficiency.

MCML Authors

Tobias Pielok

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[347]

Y. Liang, O. Zadorozhnyi and M. Drton.
Kernel-Based Differentiable Learning of Non-Parametric Directed Acyclic Graphical Models.
PGM 2024 - 12th International Conference on Probabilistic Graphical Models. Nijmegen, The Netherlands, Sep 11-13, 2024. URL

Abstract

Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model. This model selection problem can be challenging due to its large combinatorial search space, particularly when dealing with non-parametric causal models. Recent research has sought to bypass the combinatorial search by reformulating causal discovery as a continuous optimization problem, employing constraints that ensure the acyclicity of the graph. In non-parametric settings, existing approaches typically rely on finite-dimensional approximations of the relationships between nodes, resulting in a score-based continuous optimization problem with a smooth acyclicity constraint. In this work, we develop an alternative approximation method by utilizing reproducing kernel Hilbert spaces (RKHS) and applying general sparsity-inducing regularization terms based on partial derivatives. Within this framework, we introduce an extended RKHS representer theorem. To enforce acyclicity, we advocate the log-determinant formulation of the acyclicity constraint and show its stability. Finally, we assess the performance of our proposed RKHS-DAGMA procedure through simulations and illustrative data analyses.

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[346]

D. Strieder and M. Drton.
Identifying Total Causal Effects in Linear Models under Partial Homoscedasticity.
PGM 2024 - 12th International Conference on Probabilistic Graphical Models. Nijmegen, The Netherlands, Sep 11-13, 2024. URL

Abstract

MCML Authors

David Strieder

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[345]

H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
On the Robustness of Global Feature Effect Explanations.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[344]

F. Stermann, I. Chalkidis, A. Vahidi, B. Bischl and M. Rezaei.
Attention-Driven Dropout: A Simple Method to Improve Self-supervised Contrastive Sentence Embeddings.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

Self-contrastive learning has proven effective for vision and natural language tasks. It aims to learn aligned data representations by encoding similar and dissimilar sentence pairs without human annotation. Therefore, data augmentation plays a crucial role in the learned embedding quality. However, in natural language processing (NLP), creating augmented samples for unsupervised contrastive learning is challenging since random editing may modify the semantic meanings of sentences and thus affect learning good representations. In this paper, we introduce a simple yet effective approach dubbed ADD (Attention-Driven Dropout) to generate better-augmented views of sentences to be used in self-contrastive learning. Given a sentence and a Pre-trained Transformer Language Model (PLM), such as RoBERTa, we use the aggregated attention scores of the PLM to remove the less “informative” tokens from the input. We consider two alternative algorithms based on NAIVEAGGREGATION across layers/heads and ATTENTIONROLLOUT [1]. Our approach significantly improves the overall performance of various self-supervised contrastive-based methods, including SIMCSE [14], DIFFCSE [10], and INFOCSE [33] by facilitating the generation of high-quality positive pairs required by these methods. Through empirical evaluations on multiple Semantic Textual Similarity (STS) and Transfer Learning tasks, we observe enhanced performance across the board.
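
The token-removal step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `attention_driven_dropout`, the `drop_ratio` parameter, and the example scores are all hypothetical, and the per-token scores are assumed to have been aggregated from a language model's attention beforehand.

```python
def attention_driven_dropout(tokens, scores, drop_ratio=0.2):
    """Return `tokens` with the lowest-scoring ones removed, keeping word order.

    `scores` holds one aggregated attention score per token (assumed given);
    `drop_ratio` is a hypothetical knob for the fraction of tokens to remove.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    n_drop = int(len(tokens) * drop_ratio)
    # Indices of the n_drop lowest scores; ties are broken by position.
    order = sorted(range(len(tokens)), key=lambda i: (scores[i], i))
    drop = set(order[:n_drop])
    return [tok for i, tok in enumerate(tokens) if i not in drop]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
scores = [0.05, 0.30, 0.25, 0.04, 0.06, 0.30]  # made-up values for illustration
augmented = attention_driven_dropout(tokens, scores, drop_ratio=0.34)
print(augmented)
```

In a contrastive setup, the original sentence and such an augmented view would form a positive pair; how the scores are aggregated across layers and heads is left open here.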

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[343]

A. Vahidi, L. Wimmer, H. A. Gündüz, B. Bischl, E. Hüllermeier and M. Rezaei.
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members, which is challenging for large, over-parameterized deep neural networks. Moreover, ensemble learning has not yet seen such widespread adoption for unsupervised learning and it remains a challenging endeavor for self-supervised or unsupervised representation learning. Motivated by these challenges, we present a novel self-supervised training regime that leverages an ensemble of independent sub-networks, complemented by a new loss function designed to encourage diversity. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty, all achieved with minimal computational overhead compared to traditional deep self-supervised ensembles. To evaluate the effectiveness of our approach, we conducted extensive experiments across various tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, and semi-supervised settings. The results demonstrate that our method significantly improves prediction reliability. Our approach not only achieves excellent accuracy but also enhances calibration, improving on important baseline performance across a wide range of self-supervised architectures in computer vision, natural language processing, and genomics data.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Hüseyin Anil Gündüz

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Mina Rezaei

Dr.

Statistical Learning and Data Science

[342]

C. Molnar, G. König, B. Bischl and G. Casalicchio.
Model-agnostic feature importance and effects with dependent features: a conditional subgroup approach.
Data Mining and Knowledge Discovery 38 (Sep. 2024). DOI

Abstract

The interpretation of feature importance in machine learning models is challenging when features are dependent. Permutation feature importance (PFI) ignores such dependencies, which can cause misleading interpretations due to extrapolation. A possible remedy is more advanced conditional PFI approaches that enable the assessment of feature importance conditional on all other features. Due to this shift in perspective and in order to enable correct interpretations, it is beneficial if the conditioning is transparent and comprehensible. In this paper, we propose a new sampling mechanism for the conditional distribution based on permutations in conditional subgroups. As these subgroups are constructed using tree-based methods such as transformation trees, the conditioning becomes inherently interpretable. This not only provides a simple and effective estimator of conditional PFI, but also local PFI estimates within the subgroups. In addition, we apply the conditional subgroups approach to partial dependence plots, a popular method for describing feature effects that can also suffer from extrapolation when features are dependent and interactions are present in the model. In simulations and a real-world application, we demonstrate the advantages of the conditional subgroup approach over existing methods: It allows computing conditional PFI that is more true to the data than existing proposals and enables a fine-grained interpretation of feature effects and importance within the conditional subgroups.
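
The idea of permuting a feature only within subgroups can be sketched as follows. This is a rough illustration under simplifying assumptions, not the paper's estimator: the subgroup labels are taken as given (the paper builds them with tree-based methods), the loss is mean squared error, and all function names are hypothetical.

```python
import random


def permute_within_subgroups(values, groups, rng):
    """Shuffle `values` only among observations sharing a subgroup label."""
    permuted = list(values)
    by_group = {}
    for idx, label in enumerate(groups):
        by_group.setdefault(label, []).append(idx)
    for indices in by_group.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, v in zip(indices, shuffled):
            permuted[i] = v
    return permuted


def mse(model, X, y):
    """Mean squared error of `model` (a callable on one row) over the data."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def subgroup_pfi(model, X, y, feature, groups, rng):
    """Loss increase after permuting column `feature` within each subgroup."""
    new_col = permute_within_subgroups([row[feature] for row in X], groups, rng)
    X_perm = [list(row) for row in X]
    for row, v in zip(X_perm, new_col):
        row[feature] = v
    return mse(model, X_perm, y) - mse(model, X, y)


rng = random.Random(0)
X = [[1, 5], [2, 6], [3, 7], [10, 8], [20, 9], [30, 4]]
groups = ["a", "a", "a", "b", "b", "b"]  # assumed subgroup labels
model = lambda row: 2 * row[0]           # toy model that ignores feature 1
y = [model(row) for row in X]

perm = permute_within_subgroups([row[0] for row in X], groups, rng)
unused_importance = subgroup_pfi(model, X, y, feature=1, groups=groups, rng=rng)
```

Because values are only exchanged inside a subgroup, the permuted feature stays within the range it actually takes there, which is what limits extrapolation; a feature the model ignores receives an importance of zero.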

MCML Authors

Gunnar König

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[341]

L. Barreñada, P. Dhiman, D. Timmerman, A.-L. Boulesteix and B. Van Calster.
Understanding overfitting in random forest for probability estimation: a visualization and simulation study.
Diagnostic and Prognostic Research 8.14 (Sep. 2024). DOI

Abstract

Background: Random forests have become popular for clinical risk prediction modeling. In a case study on predicting ovarian malignancy, we observed training AUCs close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behavior of random forests for probability estimation by (1) visualizing data space in three real-world case studies and (2) a simulation study.
Methods: For the case studies, multinomial risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data-generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true AUC, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 with binary outcomes were simulated, and random forest models were trained with minimum node size 2 or 20 using the ranger R package, resulting in 192 scenarios in total. Model performance was evaluated on large test datasets (N = 100,000).
Results: The visualizations suggested that the model learned “spikes of probability” around events in the training set. A cluster of events created a bigger peak or plateau (signal), isolated events local peaks (noise). In the simulation study, median training AUCs were between 0.97 and 1 unless there were 4 binary predictors or 16 binary predictors with a minimum node size of 20. The median discrimination loss, i.e., the difference between the median test AUC and the true AUC, was 0.025 (range 0.00 to 0.13). Median training AUCs had Spearman correlations of around 0.70 with discrimination loss. Median test AUCs were higher with higher events per variable, higher minimum node size, and binary predictors. Median training calibration slopes were always above 1 and were not correlated with median test slopes across scenarios (Spearman correlation − 0.11). Median test slopes were higher with higher true AUC, higher minimum node size, and higher sample size.
Conclusions: Random forests learn local probability peaks that often yield near perfect training AUCs without strongly affecting AUCs on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
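
The scenario count in the Methods paragraph follows from crossing the design factors, which a short enumeration confirms. The variable names below are illustrative only.

```python
from itertools import product

n_dgm = 48                  # logistic data-generating mechanisms
sample_sizes = (200, 4000)  # training set sizes
min_node_sizes = (2, 20)    # random forest minimum node sizes

# One scenario per (DGM, sample size, minimum node size) combination.
scenarios = list(product(range(n_dgm), sample_sizes, min_node_sizes))
print(len(scenarios))  # 48 * 2 * 2 = 192
```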

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[340]

Y. Li, T. Herold, U. Mansmann and R. Hornung.
Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study.
BMC Medical Informatics and Decision Making 24.244 (Sep. 2024). DOI

Abstract

Background: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.
Methods: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell’s C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.
Results: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.
Conclusions: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.
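The count of 31 combinations follows from taking every non-empty subset of the five data types (2^5 − 1 = 31). A short sketch that enumerates them is given below; the list name `DATA_TYPES`, the abbreviation "CNV" for copy number variation, and the helper `nonempty_combinations` are illustrative assumptions, not code from the study.

```python
from itertools import combinations

# The five data types named in the abstract ("CNV" abbreviates copy number variation).
DATA_TYPES = ["mRNA", "miRNA", "methylation", "DNAseq", "CNV"]


def nonempty_combinations(items):
    """Return every subset of `items` that contains at least one element."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = nonempty_combinations(DATA_TYPES)
```

Each tuple in `combos` corresponds to one candidate set of omics blocks that a benchmark of this kind would fit and evaluate.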

MCML Authors

Roman Hornung

Dr.

Biometry in Molecular Medicine

[339]

H. J. Coyle-Asbil, L. Burk, M. Brandes, B. Brandes, C. Buck, M. N. Wright and L. A. Vallis.
Energy Expenditure Prediction in Preschool Children: A Machine Learning Approach Using Accelerometry and External Validation.
Physiological Measurement 45.9 (Sep. 2024). DOI

Abstract

Objective. This study aimed to develop convolutional neural network (CNN) models to predict the energy expenditure (EE) of children from raw accelerometer data. Additionally, this study sought to externally validate the CNN models as well as the linear regression (LM), random forest (RF), and fully connected neural network (FcNN) models published in Steenbock et al (2019 J. Meas. Phys. Behav. 2 94–102). Approach. Included in this study were 41 German children (3.0–6.99 years) for the training and internal validation who were equipped with GENEActiv, GT3X+, and activPAL accelerometers. The external validation dataset consisted of 39 Canadian children (3.0–5.99 years) who were equipped with OPAL, GT9X, GENEActiv, and GT3X+ accelerometers. EE was recorded simultaneously in both datasets using a portable metabolic unit. The protocols consisted of semi-structured activities ranging from low to high intensities. The root mean square error (RMSE) values were calculated and used to evaluate model performances. Main results. (1) The CNNs outperformed the LM (13.17%–23.81% lower mean RMSE values), FcNN (8.13%–27.27% lower RMSE values) and the RF models (3.59%–18.84% lower RMSE values) in the internal dataset. (2) In contrast, it was found that when applied to the external Canadian dataset, the CNN models had consistently higher RMSE values compared to the LM, FcNN, and RF. Significance. Although CNNs can enhance EE prediction accuracy, their ability to generalize to new datasets and accelerometer brands/models is more limited compared to LM, RF, and FcNN models.
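The comparison above rests on RMSE and on statements of the form "x% lower RMSE". The sketch below shows one plausible way to compute both; the helper names `rmse` and `percent_lower` and all numbers are made up for illustration, and the exact percentage convention used in the paper is an assumption.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def percent_lower(candidate_rmse, baseline_rmse):
    """Relative reduction of the candidate's RMSE versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up energy expenditure values, for illustration only.
observed = [10.0, 12.0, 15.0, 20.0]
model_a = [11.0, 12.0, 14.0, 19.0]   # hypothetical candidate model
model_b = [12.0, 14.0, 13.0, 18.0]   # hypothetical baseline model
reduction = percent_lower(rmse(model_a, observed), rmse(model_b, observed))
```

A positive `reduction` means the candidate has the lower error on this data; repeating the calculation on an external dataset is what reveals the generalization gap the abstract reports.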

MCML Authors

Lukas Burk

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[338]

R. Hornung and A. Hapfelmeier.
Multi forests: Variable importance for multi-class outcomes.
Preprint (Sep. 2024). arXiv

Abstract

In prediction tasks with multi-class outcomes, identifying covariates specifically associated with one or more outcome classes can be important. Conventional variable importance measures (VIMs) from random forests (RFs), like permutation and Gini importance, focus on overall predictive performance or node purity, without differentiating between the classes. Therefore, they can be expected to fail to distinguish class-associated covariates from covariates that only distinguish between groups of classes. We introduce a VIM called multi-class VIM, tailored for identifying exclusively class-associated covariates, via a novel RF variant called multi forests (MuFs). The trees in MuFs use both multi-way and binary splitting. The multi-way splits generate child nodes for each class, using a split criterion that evaluates how well these nodes represent their respective classes. This setup forms the basis of the multi-class VIM, which measures the discriminatory ability of the splits performed in the respective covariates with regard to this split criterion. Alongside the multi-class VIM, we introduce a second VIM, the discriminatory VIM. This measure, based on the binary splits, assesses the strength of the general influence of the covariates, irrespective of their class-associatedness. Simulation studies demonstrate that the multi-class VIM specifically ranks class-associated covariates highly, unlike conventional VIMs which also rank other types of covariates highly. Analyses of 121 datasets reveal that MuFs often have slightly lower predictive performance compared to conventional RFs. This is, however, not a limiting factor given the algorithm’s primary purpose of calculating the multi-class VIM.

MCML Authors

Roman Hornung

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[337]

H. Schulz-Kümpel, S. Fischer, T. Nagler, A.-L. Boulesteix, B. Bischl and R. Hornung.
Constructing Confidence Intervals for 'the' Generalization Error – a Comprehensive Benchmark Study.
Preprint (Sep. 2024). arXiv

Abstract

When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-validation and bootstrapping, with different variance estimation techniques. Unfortunately, however, there is currently no consensus on when any of these combinations may be most reliably employed and how they generally compare. In this work, we conduct the first large-scale study comparing CIs for the generalization error, empirically evaluating 13 different methods on a total of 18 tabular regression and classification problems, using four different inducers and a total of eight loss functions. We give an overview of the methodological foundations and inherent challenges of constructing CIs for the generalization error and provide a concise review of all 13 methods in a unified framework. Finally, the CI methods are evaluated in terms of their relative coverage frequency, width, and runtime. Based on these findings, we are able to identify a subset of methods that we would recommend. We also publish the datasets as a benchmarking suite on OpenML and our code on GitHub to serve as a basis for further studies.

MCML Authors

Hannah Schulz-Kümpel

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Sebastian Fischer

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistical Learning and Data Science

Roman Hornung

Dr.

Biometry in Molecular Medicine

[336]

A. Stephan, D. Zhu, M. Aßenmacher, X. Shen and B. Roth.
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks.
Preprint (Sep. 2024). arXiv

Abstract

To reduce the need for human annotations, large language models (LLMs) have been proposed as judges of the quality of other candidate models. LLM judges are typically evaluated by measuring the correlation with human judgments on generation tasks such as summarization or machine translation. In contrast, we study LLM judges on mathematical reasoning tasks. These tasks require multi-step reasoning, and the correctness of their solutions is verifiable, enabling a more objective evaluation. We perform a detailed performance analysis and find that the judges used are mostly unable to improve task performance but are able to pick the better model. Our analysis uncovers a strong correlation between judgment performance and the candidate model task performance. We observe that judges tend to choose the model of higher quality even if its answer is incorrect. Further, we show that it is possible to use statistics, such as the task performances of the individual models, to predict judgment performance. In an ablation, we either swap or mask the candidate answers and observe that judges often keep the original judgment, providing evidence that judges incorporate writing style in their judgments. In summary, we find that regularities in the judgments are quantifiable using statistical measures and provide various angles on exploiting them.

MCML Authors

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[335]

S. Urchs, V. Thurner, M. Aßenmacher, C. Heumann and S. Thiemichen.
Detecting Gender Discrimination on Actor Level Using Linguistic Discourse Analysis.
GeBNLP 2024 - 5th Workshop on Gender Bias in Natural Language Processing. Bangkok, Thailand, Aug 16, 2024. URL

Abstract

With the usage of tremendous amounts of text data for training powerful large language models such as ChatGPT, the issue of analysing and securing data quality has become more pressing than ever. Any biases, stereotypes and discriminatory patterns that exist in the training data can be reproduced, reinforced or broadly disseminated by the models in production. Therefore, it is crucial to carefully select and monitor the text data that is used as input to train the model. Due to the vast amount of training data, this process needs to be (at least partially) automated. In this work, we introduce a novel approach for automatically detecting gender discrimination in text data on the actor level based on linguistic discourse analysis. Specifically, we combine existing information extraction (IE) techniques to partly automate the qualitative research done in linguistic discourse analysis. We focus on two important steps: Identifying the respective person-named-entity (an actor) and all forms it is referred to (Nomination), and detecting the characteristics it is ascribed (Predication). As a proof of concept, we integrate these two steps into a pipeline for automated text analysis. The separate building blocks of the pipeline could be flexibly adapted, extended, and scaled for bigger datasets to accommodate a wide range of usage scenarios and specific ML tasks or help social scientists with analysis tasks. We showcase and evaluate our approach on several real and simulated exemplary texts.

MCML Authors

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[334]

J. Pavlopoulos, V. Kougia, E. Garces Arias, P. Platanou, S. Shabalin, K. Liagkou, E. Papadatos, H. Essler, J.-B. Camps and F. Fischer.
Challenging Error Correction in Recognised Byzantine Greek.
ML4AL @ACL 2024 - 1st Workshop on Machine Learning for Ancient Languages at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI

Abstract

Automatic correction of errors in Handwritten Text Recognition (HTR) output poses persistent challenges yet to be fully resolved. In this study, we introduce a shared task aimed at addressing this challenge, which attracted 271 submissions, yielding only a handful of promising approaches. This paper presents the datasets, the most effective methods, and an experimental analysis in error-correcting HTRed manuscripts and papyri in Byzantine Greek, the language that followed Classical and preceded Modern Greek. By using recognised and transcribed data from seven centuries, the two best-performing methods are compared, one based on a neural encoder-decoder architecture and the other based on engineered linguistic rules. We show that the recognition error rate can be reduced by both, up to 2.5 points at the level of characters and up to 15 at the level of words, while also elucidating their respective strengths and weaknesses.

MCML Authors

Esteban Garces Arias

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[333]

M. Aßenmacher, A. Stephan, L. Weissweiler, E. Çano, I. Ziegler, M. Härttrich, B. Bischl, B. Roth, C. Heumann and H. Schütze.
Collaborative Development of Modular Open Source Educational Resources for Natural Language Processing.
TeachingNLP @ACL 2024 - 6th Workshop on Teaching NLP at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL

Abstract

In this work, we present a collaboratively and continuously developed open-source educational resource (OSER) for teaching natural language processing at two different universities. We shed light on the principles we followed for the initial design of the course and the rationale for ongoing developments, followed by a reflection on the inter-university collaboration for designing and maintaining teaching material. When reflecting on the latter, we explicitly emphasize the considerations that need to be made when facing heterogeneous groups and when having to accommodate multiple examination regulations within one single course framework. Relying on the fundamental principles of OSER developments as defined by Bothmann et al. (2023) proved to be an important guideline during this process. The final part pertains to open-sourcing our teaching material, coping with the increasing speed of developments in the field, and integrating the course digitally, also addressing conflicting priorities and challenges we are currently facing.

MCML Authors

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Leonie Weissweiler

B2 | Natural Language Processing
→ Group Hinrich Schütze

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Hinrich Schütze

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Computational Linguistics

[332]

J. G. Wiese, L. Wimmer, T. Papamarkou, B. Bischl, S. Günnemann and D. Rügamer.
Towards Efficient Posterior Sampling in Deep Neural Networks via Symmetry Removal (Extended Abstract).
IJCAI 2024 - 33rd International Joint Conference on Artificial Intelligence. Jeju, Korea, Aug 03-09, 2024. DOI

Abstract

Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
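The symmetries mentioned in this abstract can be made concrete on a tiny network: reordering hidden neurons, or flipping the sign of all weights attached to one tanh neuron, changes the parameter vector but not the function it computes. The sketch below assumes a one-hidden-layer tanh network with scalar input and output; all weights and helper names are made up for illustration, and it does not reproduce the authors' sampling approach.

```python
import math


def mlp(x, w1, b1, w2, b2):
    """One-hidden-layer tanh network with scalar input and scalar output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def permute_units(w1, b1, w2, order):
    """Reorder hidden units; incoming and outgoing weights move together."""
    return [w1[i] for i in order], [b1[i] for i in order], [w2[i] for i in order]


def flip_sign(w1, b1, w2, unit):
    """Negate all weights attached to one unit; valid because tanh(-z) = -tanh(z)."""
    w1, b1, w2 = list(w1), list(b1), list(w2)
    w1[unit], b1[unit], w2[unit] = -w1[unit], -b1[unit], -w2[unit]
    return w1, b1, w2


# Made-up parameters for a network with three hidden units.
W1, B1, W2, B2 = [0.5, -1.2, 2.0], [0.1, 0.0, -0.3], [1.5, -0.7, 0.4], 0.2

x = 0.7
original = mlp(x, W1, B1, W2, B2)
permuted = mlp(x, *permute_units(W1, B1, W2, [2, 0, 1]), B2)
flipped = mlp(x, *flip_sign(W1, B1, W2, 1), B2)
```

All three outputs agree up to floating-point rounding, so a posterior over parameters contains many equivalent modes that describe the same function, which is the redundancy a symmetry-free reference set is meant to remove.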

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[331]

L. Bothmann and K. Peters.
Fairness als Qualitätskriterium im Maschinellen Lernen – Rekonstruktion des philosophischen Konzepts und Implikationen für die Nutzung außergesetzlicher Merkmale bei qualifizierten Mietspiegeln.
AStA Wirtschafts- und Sozialstatistisches Archiv 18 (Aug. 2024). DOI

Abstract

With the increased use of machine learning (ML) models within automated decision-making systems, the demands on the quality of ML models are growing. Pure prediction quality is no longer the sole quality criterion; in particular, there is an increasing demand to consider fairness aspects. This paper pursues two goals. First, it summarizes the current fairness discussion in the field of ML (fairML) and describes the most recent developments, especially with respect to the philosophical foundations of the concept of fairness within ML. Second, it addresses the question of to what extent so-called ‘extra-legal’ characteristics may be used in the compilation of qualified rent indices. A recent proposal by Kauermann and Windmann (AStA Wirtschafts- und Sozialstatistisches Archiv, Volume 17, 2023) on using extra-legal features in qualified rent indices includes a model-based imputation method, which we contrast with the legal requirements. Finally, we show which alternatives from the field of fairML could be used and outline the different basic philosophical assumptions behind the various methods.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[330]

D. Schalk, R. Rehms, V. S. Hoffmann, B. Bischl and U. Mansmann.
Distributed non-disclosive validation of predictive models by a modified ROC-GLM.
BMC Medical Research Methodology 24.190 (Aug. 2024). DOI

Abstract

Distributed statistical analyses provide a promising approach for privacy protection when analyzing data distributed over several databases. Instead of directly operating on data, the analyst receives anonymous summary statistics, which are combined into an aggregated result. Further, in discrimination model (prognosis, diagnosis, etc.) development, it is key to evaluate a trained model w.r.t. its prognostic or predictive performance on new independent data. For binary classification, quantifying discrimination uses the receiver operating characteristics (ROC) and its area under the curve (AUC) as an aggregation measure. We are interested in calculating both as well as basic indicators of calibration-in-the-large for a binary classification task using a distributed and privacy-preserving approach…

MCML Authors

Daniel Schalk

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[329]

F. Drost, E. Dorigatti, A. Straub, P. Hilgendorf, K. I. Wagner, K. Heyer, M. López Montes, B. Bischl, D. H. Busch, K. Schober and B. Schubert.
Predicting T cell receptor functionality against mutant epitopes.
Cell Genomics 4.9 (Aug. 2024). DOI

Abstract

Cancer cells and pathogens can evade T cell receptors (TCRs) via mutations in immunogenic epitopes. TCR cross-reactivity (i.e., recognition of multiple epitopes with sequence similarities) can counteract such escape but may cause severe side effects in cell-based immunotherapies through targeting self-antigens. To predict the effect of epitope point mutations on T cell functionality, we here present the random forest-based model Predicting T Cell Epitope-Specific Activation against Mutant Versions (P-TEAM). P-TEAM was trained and tested on three datasets with TCR responses to single-amino-acid mutations of the model epitope SIINFEKL, the tumor neo-epitope VPSVWRSSL, and the human cytomegalovirus antigen NLVPMVATV, totaling 9,690 unique TCR-epitope interactions. P-TEAM was able to accurately classify T cell reactivities and quantitatively predict T cell functionalities for unobserved single-point mutations and unseen TCRs. Overall, P-TEAM provides an effective computational tool to study T cell responses against mutated epitopes.

MCML Authors

Emilio Dorigatti

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[328]

A. Mittermeier, M. Aßenmacher, B. Schachtner, S. Grosu, V. Dakovic, V. Kandratovich, B. Sabel and M. Ingrisch.
Automatische ICD-10-Codierung.
Die Radiologie 64 (Aug. 2024). DOI

Abstract

Background: Medical coding of radiological reports is essential for good quality of care and correct billing, but it is also a laborious and error-prone task.
Objective: To evaluate the applicability of natural language processing (NLP) for ICD-10 coding of German-language radiological reports by fine-tuning suitable language models.
Materials and methods: This retrospective study considered all magnetic resonance imaging (MRI) reports from our institute between 2010 and 2020. The ICD-10 codes at discharge were matched to the corresponding reports to build a dataset for multiclass classification. GermanBERT and flanT5 were fine-tuned on the full dataset (dstotal) with 1035 distinct ICD-10 codes and on two reduced datasets containing the 100 (ds100) and 50 (ds50) most frequent codes. Model performance was evaluated using top-k accuracy for k = 1, 3, 5. In an ablation study, both models were additionally trained on the associated metadata alone and on the report alone.
Results: The full dataset comprised 100,672 radiological reports; the reduced datasets ds100 and ds50 comprised 68,103 and 52,293 reports, respectively. Model performance improved when several of the model’s top predictions were considered, when the number of target classes was reduced, and when the metadata were combined with the report. FlanT5 outperformed GermanBERT on all datasets and metrics and is best suited as a medical coding assistant, achieving a top-3 accuracy of nearly 70% on the realistic dataset dstotal.
Conclusion: Fine-tuning language models promises reliable prediction of ICD-10 codes for German radiological MRI reports in various scenarios. As a coding assistant, flanT5 can help medical coders make informed decisions and potentially reduce their workload.
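Top-k accuracy, the metric used in this study for k = 1, 3, 5, counts a prediction as correct when the true code appears among the model's k highest-ranked codes. A minimal sketch follows; the function name `top_k_accuracy` and the toy rankings are illustrative assumptions and do not come from the study's data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy example (made up): each inner list holds codes ordered from most to
# least likely for one report.
ranked = [
    ["M51.1", "M54.5", "G35"],
    ["G35", "I63.9", "M51.1"],
    ["I63.9", "G35", "M54.5"],
    ["M54.5", "M51.1", "I63.9"],
]
truth = ["M51.1", "I63.9", "M54.5", "G35"]

accuracies = {k: top_k_accuracy(ranked, truth, k) for k in (1, 2, 3)}
```

Because the set of accepted predictions grows with k, the metric can only stay equal or increase as k increases, which matches the reported gain when several top predictions are considered.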

MCML Authors

Andreas Mittermeier

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Clinical Data Science in Radiology

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Michael Ingrisch

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Clinical Data Science in Radiology

[327]

F. Ott, L. Heublein, D. Rügamer, B. Bischl and C. Mutschler.
Fusing structure from motion and simulation-augmented pose regression from optical flow for challenging indoor environments.
Journal of Visual Communication and Image Representation 103 (Aug. 2024). DOI

Abstract

The localization of objects is essential in many applications, such as robotics, virtual and augmented reality, and warehouse logistics. Recent advancements in deep learning have enabled localization using monocular cameras. Traditionally, structure from motion (SfM) techniques predict an object’s absolute position from a point cloud, while absolute pose regression (APR) methods use neural networks to understand the environment semantically. However, both approaches face challenges from environmental factors like motion blur, lighting changes, repetitive patterns, and featureless areas. This study addresses these challenges by incorporating additional information and refining absolute pose estimates with relative pose regression (RPR) methods. RPR also struggles with issues like motion blur. To overcome this, we compute the optical flow between consecutive images using the Lucas–Kanade algorithm and use a small recurrent convolutional network to predict relative poses. Combining absolute and relative poses is difficult due to differences between global and local coordinate systems. Current methods use pose graph optimization (PGO) to align these poses. In this work, we propose recurrent fusion networks to better integrate absolute and relative pose predictions, enhancing the accuracy of absolute pose estimates. We evaluate eight different recurrent units and create a simulation environment to pre-train the APR and RPR networks for improved generalization. Additionally, we record a large dataset of various scenarios in a challenging indoor environment resembling a warehouse with transportation robots. Through hyperparameter searches and experiments, we demonstrate that our recurrent fusion method outperforms PGO in effectiveness.

MCML Authors

Felix Ott

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[326]

E. Bergman, M. Feurer, A. Bahram, A. R. Balef, L. Purucker, S. Segel, M. Lindauer, F. Hutter and K. Eggensperger.
AMLTK: A Modular AutoML Toolkit in Python.
The Journal of Open Source Software 9.100 (Aug. 2024). DOI

Abstract

Machine Learning is a core building block in novel data-driven applications. Practitioners face many ambiguous design decisions while developing practical machine learning (ML) solutions. Automated machine learning (AutoML) facilitates the development of machine learning applications by providing efficient methods for optimizing hyperparameters, searching for neural architectures, or constructing whole ML pipelines (Hutter et al., 2019). Thereby, design decisions such as the choice of modelling, pre-processing, and training algorithm are crucial to obtaining well-performing solutions. By automatically obtaining ML solutions, AutoML aims to lower the barrier to leveraging machine learning and reduce the time needed to develop or adapt ML solutions for new domains or data.
Highly performant software packages for automatically building ML pipelines given data, so-called AutoML systems, are available and can be used off-the-shelf. Typically, AutoML systems evaluate ML models sequentially to return a well-performing single best model or multiple models combined into an ensemble. Existing AutoML systems are typically highly engineered monolithic software developed for specific use cases to perform well and robustly under various conditions…

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

[325]

R. Baptista, B. Liew, S. Pizzocaro, X. Zhai, S. Galasso, D. Rügamer, T. Waterkeyn, I. Boukhennoufa, X. Zhu and A. M. Nunzio.
Motion Analysis in Neurological Rehabilitation: From the Lab to the Clinic.
Translational Neurorehabilitation (Aug. 2024). DOI

Abstract

Human motion analysis and biomechanics are fundamental in a clinical environment, and together, they provide relevant and precise information towards diagnosing numerous neurodegenerative conditions such as stroke, Parkinson’s disease, Alzheimer’s disease, multiple sclerosis, etc. In most neurological disorders, walking is commonly impacted, where performance, quantity, and quality are affected. Thus, motion analysis aims at understanding the cause of altered motion patterns, mainly assisting with prevention, identification, and rehabilitation. Usually, motion analysis assessment relies on the patient’s self-report and the practitioner’s visually assessed observations. Therefore, such assessments are often subjective and susceptible to human-induced error. In contrast, sophisticated devices can provide quantitative accuracy by equipping practitioners with precise, reliable, and objective measurements to simultaneously monitor an extensive set of parameters for gait analysis (e.g., 3D joint kinematics, muscle activation patterns, muscle forces, and coordination patterns). This book chapter addresses the challenges and describes the technological solutions considered when moving from lab conditions to real-world environments, in this case, the clinical setting.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[324]

D. Schkoda, E. Robeva and M. Drton.
Causal Discovery of Linear Non-Gaussian Causal Models with Unobserved Confounding.
Preprint (Aug. 2024). arXiv

Abstract

We consider linear non-Gaussian structural equation models that involve latent confounding. In this setting, the causal structure is identifiable, but, in general, it is not possible to identify the specific causal effects. Instead, a finite number of different causal effects result in the same observational distribution. Most existing algorithms for identifying these causal effects use overcomplete independent component analysis (ICA), which often suffers from convergence to local optima. Furthermore, the number of latent variables must be known a priori. To address these issues, we propose an algorithm that operates recursively rather than using overcomplete ICA. The algorithm first infers a source, estimates the effect of the source and its latent parents on their descendants, and then eliminates their influence from the data. For both source identification and effect size estimation, we use rank conditions on matrices formed from higher-order cumulants. We prove asymptotic correctness under the mild assumption that locally, the number of latent variables never exceeds the number of observed variables. Simulation studies demonstrate that our method achieves comparable performance to overcomplete ICA even though it does not know the number of latents in advance.

MCML Authors

Mathias Drton

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Mathematical Statistics

[323]

L. Bothmann, K. Peters, S. Dandl, M. Schomaker and B. Bischl.
Causal Fair Machine Learning.
GMDS/IBS-DR - 54. Arbeitstagung der Arbeitsgruppen Statistical Computing, Klassifikation und Datenanalyse in den Biowissenschaften. Günzburg, Germany, Jul 28-31, 2024. PDF

Abstract

A growing body of literature in fairness-aware ML aspires to mitigate machine learning (ML)-related unfairness in automated decision-making (ADM) by defining metrics that measure the fairness of an ML model and by proposing methods that ensure that trained ML models achieve low values in those metrics (see, e.g., Verma & Rubin, 2018, Caton & Haas, 2023). However, the underlying concept of fairness, i.e., the question of what fairness is, is rarely discussed, leaving a considerable gap between centuries of philosophical discussion and the recent adoption of the concept in the ML community. We bridge this gap by formalizing a consistent concept of fairness and translating the philosophical considerations into a formal framework for training and evaluating ML models in ADM systems (Bothmann et al., 2024). We argue why and how causal considerations are necessary when assessing fairness in the presence of protected attributes (PAs) by proposing a fictitious, normatively desired (FiND) world where the PAs have no (direct or indirect) causal effect on the target. In practice, this unknown FiND world must be approximated by a warped world, for which the causal effects of the PAs must be removed from the real-world data. We propose rank-preserving interventional distributions to define an estimand of this FiND world and a warping method for estimation (Bothmann et al., 2023). Evaluation criteria for both the method and the resulting ML model are presented. Experiments on simulated data show that our method effectively identifies the most discriminated individuals and mitigates unfairness. Experiments on real-world data showcase the practical application of our method.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Susanne Dandl

Dr.

* Former Member

Michael Schomaker

Prof. Dr.

Biostatistics

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[322]

L. Burk, J. Zobolas, B. Bischl, A. Bender, M. N. Wright and R. Sonabend.
A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data.
GMDS/IBS-DR - 54. Arbeitstagung der Arbeitsgruppen Statistical Computing, Klassifikation und Datenanalyse in den Biowissenschaften. Günzburg, Germany, Jul 28-31, 2024. PDF

Abstract

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on high-dimensional data. Additionally, they may lack appropriate tuning or evaluation procedures, or are qualitative reviews, rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable conclusions. We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets. The benchmark tunes for both a discrimination measure and a proper scoring rule to assess performance in different settings. Evaluating on 8 survival metrics, we assess discrimination, calibration, and overall predictive performance of the tested models. Using discrimination measures, we find that no method significantly outperforms the Cox model. However, (tuned) Accelerated Failure Time models were able to achieve significantly better results with respect to overall predictive performance as measured by the right-censored log-likelihood. Machine learning methods that performed comparably well include Oblique Random Survival Forests under discrimination, and Cox-based likelihood-boosting under overall predictive performance. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for practitioners.

MCML Authors

Lukas Burk

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[321]

M. Herrmann.
Dimensionality and Distance: Curse or Blessing? Geometrical Aspects of Nearest Neighbor Computation in High-Dimensional Data.
GMDS/IBS-DR - 54. Arbeitstagung der Arbeitsgruppen Statistical Computing, Klassifikation und Datenanalyse in den Biowissenschaften. Günzburg, Germany, Jul 28-31, 2024. PDF

Abstract

When it comes to computation, it is often said that high-dimensional data is particularly challenging, known as the curse of dimensionality. For example, in their seminal work, Beyer et al. [1] study the impact of high-dimensional data on nearest neighbor computation. They show that in a wide range of settings, including IID data, the difference between the distance to the nearest neighbor and the distance to the most distant neighbor vanishes as the dimension increases. However, it is arguably often overlooked that they also point out that this result does not hold in certain situations, in particular when the intrinsic dimension of the data is low and/or when the data is distributed in well-separable subsets. More generally, it is probably less well known that high dimensionality can make computation easier, to the extent that Kainen [2] even speaks of a blessing of dimensionality. Given these different aspects, a natural question to ask is: when is high dimensionality a curse and when is it not (or even a blessing)? In this talk we approach this question from a geometric point of view. Focusing on the aspect of nearest neighbor (and hence distance) computation, we show that high-dimensional data need not be more challenging than low-dimensional data in many practically relevant situations. In particular, using results from extensive experiments on synthetic and real data, we show that this can be the case for both outlier detection and cluster analysis, and for a range of different data types, including image and functional data [3, 4]. Moreover, based on concepts from manifold learning and topological data analysis, we show that these observations can be explained using a common conceptual foundation.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

[320]

K. Bouchiat, A. Immer, H. Yèche, G. Rätsch and V. Fortuin.
Improving Neural Additive Models with Bayesian Principles.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

[319]

M. Herrmann, F. J. D. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix and B. Bischl.
Position: Why We Must Rethink Empirical Research in Machine Learning.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical ML research is fashioned as confirmatory research while it should rather be considered exploratory.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Marcel Wever

Dr.

* Former Member

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[318]

F. Karl, M. Kemeter, G. Dax and P. Sierak.
Position: Embracing Negative Results in Machine Learning.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Publications proposing novel machine learning methods are often primarily rated by exhibited predictive performance on selected problems. In this position paper we argue that predictive performance alone is not a good indicator for the worth of a publication. Using it as such even fosters problems like inefficiencies of the machine learning research community as a whole and setting wrong incentives for researchers. We therefore put out a call for the publication of “negative” results, which can help alleviate some of these problems and improve the scientific output of the machine learning research community. To substantiate our position, we present the advantages of publishing negative results and provide concrete measures for the community to move towards a paradigm where their publication is normalized.

MCML Authors

Florian Karl

Statistical Learning and Data Science

[317]

M. Lindauer, F. Karl, A. Klier, J. Moosbauer, A. Tornede, A. C. Mueller, F. Hutter, M. Feurer and B. Bischl.
Position: A Call to Action for a Human-Centered AutoML Paradigm.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive performance. This focused progress, while substantial, raises questions about how well AutoML has met its broader, original goals. In this position paper, we argue that a key to unlocking AutoML’s full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems, including their diverse roles, expectations, and expertise. We envision a more human-centered approach in future AutoML research, promoting the collaborative design of ML systems that tightly integrates the complementary strengths of human expertise and AutoML methodologies.

MCML Authors

Florian Karl

Statistical Learning and Data Science

Julia Moosbauer

Dr.

* Former Member

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[316]

T. Papamarkou, M. Skoularidou, K. Palla, L. Aitchison, J. Arbel, D. Dunson, M. Filippone, V. Fortuin, P. Hennig, J. M. Hernández-Lobato, A. Hubin, A. Immer, T. Karaletsos, M. E. Khan, A. Kristiadi, Y. Li, S. Mandt, C. Nemeth, M. A. Osborne, T. G. J. Rudner, D. Rügamer, Y. W. Teh, M. Welling, A. G. Wilson and R. Zhang.
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

MCML Authors

Vincent Fortuin

Dr.

Bayesian Deep Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[315]

D. Rügamer, C. Kolb, T. Weber, L. Kook and T. Nagler.
Generalizing orthogonalization for models with non-linearities.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms’ application. It was, for instance, shown that neural networks can deduce racial information solely from a patient’s X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the “orthogonalization” or “normalization” of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method’s effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Chris Kolb

Statistical Learning and Data Science

Tobias Weber

* Former Member

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[314]

E. Sommer, L. Wimmer, T. Papamarkou, L. Bothmann, B. Bischl and D. Rügamer.
Connecting the Dots: Is Mode Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks’ parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a Bayesian deep ensemble approach as an effective solution with competitive performance and uncertainty quantification.

MCML Authors

Emanuel Sommer

Statistics, Data Science and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[313]

D. Tramontano, Y. Kivva, S. Salehkaleybar, M. Drton and N. Kiyavash.
Causal Effect Identification in LiNGAM Models with Latent Confounders.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance. Regularization is key in deep learning, especially when training complex models on relatively small datasets. In order to understand the inner workings of neural networks, attribution methods such as Layer-wise Relevance Propagation (LRP) have been extensively studied, particularly for interpreting the relevance of input features. We introduce Challenger, a module that leverages the explainable power of attribution maps in order to manipulate particularly relevant input patterns, thereby exposing and subsequently resolving regions of ambiguity towards separating classes on the ground-truth data manifold, an issue that arises particularly when training models on rather small datasets. Our Challenger module increases model performance by building more diverse filters within the network and can be applied to any input data domain. We demonstrate that our approach results in substantially better classification as well as calibration performance on datasets with only a few samples up to datasets with thousands of samples. In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.

MCML Authors

Daniele Tramontano

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[312]

S. Dandl, K. Blesch, T. Freiesleben, G. König, J. Kapar, B. Bischl and M. N. Wright.
CountARFactuals – Generating plausible model-agnostic counterfactual explanations with adversarial random forests.
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Abstract

Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. Giving insight into the model’s behavior, they hint users towards possible actions and give grounds for contesting decisions. As a crucial factor in achieving these goals, counterfactuals must be plausible, i.e., describing realistic alternative scenarios within the data manifold. This paper leverages a recently developed generative modeling technique – adversarial random forests (ARFs) – to efficiently generate plausible counterfactuals in a model-agnostic way. ARFs can serve as a plausibility measure or directly generate counterfactual explanations. Our ARF-based approach surpasses the limitations of existing methods that aim to generate plausible counterfactual explanations: It is easy to train and computationally highly efficient, handles continuous and categorical data naturally, and allows integrating additional desiderata such as sparsity in a straightforward manner.

MCML Authors

Susanne Dandl

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[311]

F. K. Ewald, L. Bothmann, M. N. Wright, B. Bischl, G. Casalicchio and G. König.
A Guide to Feature Importance Methods for Scientific Inference.
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Abstract

While machine learning (ML) models are increasingly used due to their high predictive power, their use in understanding the data-generating process (DGP) is limited. Understanding the DGP requires insights into feature-target associations, which many ML models cannot directly provide due to their opaque internal mechanisms. Feature importance (FI) methods provide useful insights into the DGP under certain conditions. Since the results of different FI methods have different interpretations, selecting the correct FI method for a concrete use case is crucial and still requires expert knowledge. This paper serves as a comprehensive guide to help understand the different interpretations of global FI methods. Through an extensive review of FI methods and providing new proofs regarding their interpretation, we facilitate a thorough understanding of these methods and formulate concrete recommendations for scientific inference. We conclude by discussing options for FI uncertainty estimation and point to directions for future research aiming at full statistical inference from black-box ML models.

MCML Authors

Fiona Katharina Ewald

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[310]

D. Rundel, J. Kobialka, C. von Crailsheim, M. Feurer, T. Nagler and D. Rügamer.
Interpretable Machine Learning for TabPFN.
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI GitHub

Abstract

The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning parameters or hyperparameter tuning. This makes TabPFN a very attractive option for a wide range of domain applications. However, a major drawback of the method is its lack of interpretability. Therefore, we propose several adaptations of popular interpretability methods that we specifically design for TabPFN. By taking advantage of the unique properties of the model, our adaptations allow for more efficient computations than existing implementations. In particular, we show how in-context learning facilitates the estimation of Shapley values by avoiding approximate retraining and enables the use of Leave-One-Covariate-Out (LOCO) even when working with large-scale Transformers. In addition, we demonstrate how data valuation methods can be used to address scalability challenges of TabPFN.

MCML Authors

David Rundel

Statistical Learning and Data Science

Julius Kobialka

Statistics, Data Science and Machine Learning

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[309]

C. A. Scholbeck, H. Funk and G. Casalicchio.
Algorithm-Agnostic Feature Attributions for Clustering.
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Abstract

Understanding how assignments of instances to clusters can be attributed to the features can be vital in many applications. However, research to provide such feature attributions has been limited. Clustering algorithms with built-in explanations are scarce. Common algorithm-agnostic approaches involve dimension reduction and subsequent visualization, which transforms the original features used to cluster the data; or training a supervised learning classifier on the found cluster labels, which adds additional and intractable complexity. We present FACT (feature attributions for clustering), an algorithm-agnostic framework that preserves the integrity of the data and does not introduce additional models. As the defining characteristic of FACT, we introduce a set of work stages: sampling, intervention, reassignment, and aggregation. Furthermore, we propose two novel FACT methods: SMART (scoring metric after permutation) measures changes in cluster assignments by custom scoring functions after permuting selected features; IDEA (isolated effect on assignment) indicates local and global changes in cluster assignments after making uniform changes to selected features.

MCML Authors

Henri Funk

Statistical Consulting Unit (StaBLab)

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[308]

S. Dandl, M. Becker, B. Bischl, G. Casalicchio and L. Bothmann.
mlr3summary: Concise and interpretable summaries for machine learning models.
xAI 2024 - Demo Track of the 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. arXiv

Abstract

This work introduces a novel R package for concise, informative summaries of machine learning models. We take inspiration from the summary function for (generalized) linear models in R, but extend it in several directions: First, our summary function is model-agnostic and provides a unified summary output also for non-parametric machine learning models; Second, the summary output is more extensive and customizable – it comprises information on the dataset, model performance, model complexity, model’s estimated feature importances, feature effects, and fairness metrics; Third, models are evaluated based on resampling strategies for unbiased estimates of model performances, feature importances, etc. Overall, the clear, structured output should help to enhance and expedite the model selection process, making it a helpful tool for practitioners and researchers alike.

MCML Authors

Susanne Dandl

Dr.

* Former Member

Marc Becker

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[307]

L. Kook, C. Kolb, P. Schiele, D. Dold, M. Arpogaus, C. Fritz, P. Baumann, P. Kopper, T. Pielok, E. Dorigatti and D. Rügamer.
How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL

Abstract

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.

MCML Authors

Chris Kolb

Statistical Learning and Data Science

Tobias Pielok

Statistical Learning and Data Science

Emilio Dorigatti

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[306]

Y. Sale, P. Hofman, T. Löhr, L. Wimmer, T. Nagler and E. Hüllermeier.
Label-wise Aleatoric and Epistemic Uncertainty Quantification.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL

Abstract

We present a novel approach to uncertainty quantification in classification tasks based on label-wise decomposition of uncertainty measures. This label-wise perspective allows uncertainty to be quantified at the individual class level, thereby improving cost-sensitive decision-making and helping understand the sources of uncertainty. Furthermore, it allows total, aleatoric, and epistemic uncertainty to be defined on the basis of non-categorical measures such as variance, going beyond common entropy-based measures. In particular, variance-based measures address some of the limitations associated with established methods that have recently been discussed in the literature. We show that our proposed measures adhere to a number of desirable properties. Through empirical evaluation on a variety of benchmark data sets – including applications in the medical domain where accurate uncertainty quantification is crucial – we establish the effectiveness of label-wise uncertainty quantification.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[305]

J. Piller, H. Küchenhoff and A. Bender.
Flexible additive models for multi-event survival analysis.
IWSM 2024 - 38th International Workshop on Statistical Modelling. Durham, UK, Jul 14-19, 2024. PDF

Abstract

Piecewise Exponential Additive Mixed Models (PAMMs) (Bender et al., 2018) have gained popularity in various domains due to their ability to tackle a wide variety of survival problems and their flexibility to model non-linear covariate effects, including time-varying effects and cumulative effects (Bender et al., 2019). One advantage of such reduction techniques is that they do not require any specialised software for the estimation of the model parameters. Thus, in the case of the PAMM, they can be conveniently estimated using generalized additive mixed modeling methodology or, for example, respective boosting or deep learning based approaches (Bender et al., 2022). Nevertheless, their use in practice requires pre-processing, which differs depending on the survival task at hand (e.g. left-truncation, competing risks, etc.) and post-processing (e.g. transforming estimated parameters to useful quantities like survival or transition probabilities). The R package pammtools facilitates the entire modeling process, so far, however, only for single-event data. Here we extend the framework and package capabilities to handle general multi-state models.

MCML Authors

Johannes Piller

Statistical Consulting Unit (StaBLab)

Helmut Küchenhoff

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Consulting Unit (StaBLab)

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[304]

S. Dandl, M. Becker, B. Bischl, G. Casalicchio and L. Bothmann.
mlr3summary: Concise and interpretable summaries for machine learning models.
useR! 2024 - International R User Conference. Salzburg, Austria, Jul 08-11, 2024. arXiv GitHub

MCML Authors

Susanne Dandl

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Marc Becker

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[303]

S. Fischer and M. Binder.
mlr3torch - Deep Learning in R.
useR! 2024 - International R User Conference. Salzburg, Austria, Jul 08-11, 2024. GitHub

Abstract

mlr3torch is a deep learning framework for the mlr3 ecosystem built on top of torch. It allows users to easily build, train and evaluate deep learning models in a few lines of code, without needing to worry about low-level details. Off-the-shelf learners are readily available, but custom architectures can be defined by connecting PipeOpTorch operators in an mlr3pipelines::Graph.

MCML Authors

Sebastian Fischer

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[302]

M. M. Mandl, A. S. Becker-Pennrich, L. C. Hinske, S. Hoffmann and A.-L. Boulesteix.
Addressing researcher degrees of freedom through minP adjustment.
BMC Medical Research Methodology 24.152 (Jul. 2024). DOI

Abstract

When different researchers study the same research question using the same dataset they may obtain different and potentially even conflicting results. This is because there is often substantial flexibility in researchers’ analytical choices, an issue also referred to as “researcher degrees of freedom”. Combined with selective reporting of the smallest p-value or largest effect, researcher degrees of freedom may lead to an increased rate of false positive and overoptimistic results. In this paper, we address this issue by formalizing the multiplicity of analysis strategies as a multiple testing problem. As the test statistics of different analysis strategies are usually highly dependent, a naive approach such as the Bonferroni correction is inappropriate because it leads to an unacceptable loss of power. Instead, we propose using the “minP” adjustment method, which takes potential test dependencies into account and approximates the underlying null distribution of the minimal p-value through a permutation-based procedure. This procedure is known to achieve more power than simpler approaches while ensuring a weak control of the family-wise error rate. We illustrate our approach for addressing researcher degrees of freedom by applying it to a study on the impact of perioperative paO2 on post-operative complications after neurosurgery. A total of 48 analysis strategies are considered and adjusted using the minP procedure. This approach allows one to selectively report the result of the analysis strategy yielding the most convincing evidence, while controlling the type 1 error – and thus the risk of publishing false positive results that may not be replicable.
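The permutation-based adjustment mentioned above can be sketched roughly as follows. This is a minimal, generic single-step minP computation and not the authors' code: the function name `minp_adjust`, the input layout, the `(1 + count) / (B + 1)` estimator, and the toy p-values are assumptions chosen for illustration. The sketch presumes that every analysis strategy has already been re-run on each label-permuted copy of the data, so that one row of p-values per permutation is available.

```python
def minp_adjust(observed_p, permuted_p):
    """Single-step minP adjusted p-values.

    observed_p: p-values of the m analysis strategies on the original data.
    permuted_p: B rows, each holding the m p-values obtained by re-running
                all strategies on one permuted data set (hypothetical input).
    """
    # Null distribution of the smallest p-value across all strategies.
    min_null = [min(row) for row in permuted_p]
    n_perm = len(min_null)
    # Share of permutations whose minimal p-value is at least as extreme;
    # the +1 terms keep the estimate strictly positive (an assumed convention).
    return [
        (1 + sum(m <= p for m in min_null)) / (n_perm + 1)
        for p in observed_p
    ]


# Toy example with three strategies and four permutations (made-up numbers).
observed = [0.01, 0.20, 0.04]
permuted = [
    [0.30, 0.50, 0.02],
    [0.60, 0.10, 0.40],
    [0.03, 0.70, 0.90],
    [0.25, 0.35, 0.45],
]
adjusted = minp_adjust(observed, permuted)
```

Because the minimum is taken over all strategies within each permutation, the dependence between strategies is reflected in the null distribution, which is what distinguishes this approach from a Bonferroni-style correction. In practice far more than four permutations would be needed.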

MCML Authors

Maximilian Mandl

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biometry in Molecular Medicine

[301]

B. Ronval, S. Nijssen and L. Bothmann.
Can generative AI-based data balancing mitigate unfairness issues in Machine Learning?
EWAF 2024 - 3rd European Workshop on Algorithmic Fairness. Mainz, Germany, Jul 01-03, 2024. PDF

Abstract

Data imbalance in the protected attributes can lead to machine learning models that perform better on the majority than on the minority group, giving rise to unfairness issues. While baseline methods like undersampling or SMOTE can balance datasets, we investigate how methods of generative artificial intelligence compare concerning classical fairness metrics. Using generated fake data, we propose different balancing methods and investigate the behavior of classification models in thorough benchmark studies using German credit and Berkeley admission data. While our experiments suggest that such methods may improve fairness metrics, further investigations are necessary to derive clear practical recommendations.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[300]

F. Karl, J. Thomas, J. Elstner, R. Gross and B. Bischl.
Automated Machine Learning.
Unlocking Artificial Intelligence (Jul. 2024). DOI

Abstract

In the past few years automated machine learning (AutoML) has gained a lot of traction in the data science and machine learning community. AutoML aims at reducing the partly repetitive work of data scientists and enabling domain experts to construct machine learning pipelines without extensive knowledge in data science. This chapter presents a comprehensive review of the current leading AutoML methods and sets AutoML in an industrial context. To this end, we present the typical components of an AutoML system, give an overview of the state-of-the-art and highlight challenges to industrial application by presenting several important topics such as AutoML for time series data, AutoML in unsupervised settings, AutoML with multiple evaluation criteria, or interactive human-in-the-loop methods. Finally, the connection to Neural Architecture Search (NAS) is presented and a brief review with special emphasis on hardware-aware NAS is given.

MCML Authors

Florian Karl

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[299]

D. Köhler, D. Rügamer and M. Schmid.
Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects.
Preprint (Jul. 2024). arXiv

Abstract

Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount to ensure user acceptance and fairness. The need for interpretable ML models has boosted research in the field of interpretable machine learning (IML). Here we propose a novel approach for the functional decomposition of black-box predictions, which is considered a core concept of IML. The idea of our method is to replace the prediction function by a surrogate model consisting of simpler subfunctions. Similar to additive regression models, these functions provide insights into the direction and strength of the main feature contributions and their interactions. Our method is based on a novel concept termed stacked orthogonality, which ensures that the main effects capture as much functional behavior as possible and do not contain information explained by higher-order interactions. Unlike earlier functional IML approaches, it is neither affected by extrapolation nor by hidden feature interactions. To compute the subfunctions, we propose an algorithm based on neural additive modeling and an efficient post-hoc orthogonalization procedure.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[298]

F. Sergeev, P. Malsot, G. Rätsch and V. Fortuin.
Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information.
Preprint (Jul. 2024). arXiv

Abstract

Knowing which features of a multivariate time series to measure and when is a key task in medicine, wearables, and robotics. Better acquisition policies can reduce costs while maintaining or even improving the performance of downstream predictors. Inspired by the maximization of conditional mutual information, we propose an approach to train acquirers end-to-end using only the downstream loss. We show that our method outperforms a random acquisition policy and matches a model with an unrestrained budget, but does not yet overtake a static acquisition strategy. We highlight the assumptions and outline avenues for future work.

MCML Authors

Vincent Fortuin

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Bayesian Deep Learning

[297]

B. Deiseroth, M. Meuer, N. Gritsch, C. Eichenberg, P. Schramowski, M. Aßenmacher and K. Kersting.
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. DOI

Abstract

Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach to assessing compressed LLMs, addressing the limitations of traditional perplexity or accuracy measures that fail to accurately reflect text generation quality. DTMs measure token divergences that allow deeper insights into the subtleties of model compression, in particular, when evaluating components’ impacts individually. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that 25% of all attention components can be pruned beyond 90% on the Llama-2 model family, still keeping SOTA performance. For quantization, FDTM suggests that more than 80% of parameters can be naively transformed to int8 without special outlier management. These evaluations indicate the necessity of choosing appropriate compressions for parameters individually—and that FDTM can identify those—while standard metrics result in deteriorated outcomes.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[296]

H. Chen, J. Büssing, D. Rügamer and E. Nie.
Leveraging (Sentence) Transformer Models with Contrastive Learning for Identifying Machine-Generated Text.
SemEval @NAACL 2024 - 18th International Workshop on Semantic Evaluation at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL

Abstract

This paper outlines our approach to SemEval-2024 Task 8 (Subtask B), which focuses on discerning machine-generated text from human-written content, while also identifying the text sources, i.e., from which Large Language Model (LLM) the target text is generated. Our detection system is built upon Transformer-based techniques, leveraging various pre-trained language models (PLMs), including sentence transformer models. Additionally, we incorporate Contrastive Learning (CL) into the classifier to improve the detection capabilities and employ Data Augmentation methods. Ultimately, our system achieves a peak accuracy of 76.96% on the test set of the competition, configured using a sentence transformer model integrated with CL methodology.

MCML Authors

David Rügamer

Prof. Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Statistics, Data Science and Machine Learning

Ercong Nie

Computational Linguistics

[295]

L. Mayer, C. Heumann and M. Aßenmacher.
Can OpenSource beat ChatGPT? - A Comparative Study of Large Language Models for Text-to-Code Generation.
SwissText 2024 - Swiss Text Analytics Conference. Chur, Switzerland, Jun 10-11, 2024. URL

Abstract

In recent years, large language models (LLMs) have emerged as powerful tools with potential applications in various fields, including software engineering. Within the scope of this research, we evaluate five different state-of-the-art LLMs - Bard, BingChat, ChatGPT, Llama2, and Code Llama - concerning their capabilities for text-to-code generation. In an empirical study, we feed prompts with textual descriptions of coding problems sourced from the programming website LeetCode to the models with the task of creating solutions in Python. Subsequently, the quality of the generated outputs is assessed using the testing functionalities of LeetCode. The results indicate large differences in performance between the investigated models. ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama. To gain further insights, we measure the runtime as well as the memory usage of the generated outputs and compare them to the other code submissions on LeetCode. A detailed error analysis, encompassing a comparison of the differences concerning correct indentation and form of the generated code as well as an assignment of the incorrectly solved tasks to certain error categories, allows us to obtain a more nuanced picture of the results and potential for improvement. The results also show a clear pattern of increasingly incorrect code when the models are facing a lot of context in the form of longer prompts.

MCML Authors

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[294]

B. Säfken and D. Rügamer.
Editorial special issue: Bridging the gap between AI and Statistics.
Advances in Statistical Analysis 108 (Jun. 2024). DOI

Abstract

This special issue aims to serve as a nexus for this vital interdisciplinary exchange. From theoretical advancements to innovative applications, this issue seeks to illuminate the synergy between AI and statistics and pave the way for a new era of discovery and innovation at the confluence of AI and statistics.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[293]

K. Göbler, M. Drton, S. Mukherjee and A. Miloschewski.
High-dimensional undirected graphical models for arbitrary mixed data.
Electronic Journal of Statistics 18.1 (Jun. 2024). DOI

Abstract

Graphical models are an important tool in exploring relationships between variables in complex, multivariate data. Methods for learning such graphical models are well-developed in the case where all variables are either continuous or discrete, including in high dimensions. However, in many applications, data span variables of different types (e.g., continuous, count, binary, ordinal, etc.), whose principled joint analysis is nontrivial. Latent Gaussian copula models, in which all variables are modeled as transformations of underlying jointly Gaussian variables, represent a useful approach. Recent advances have shown how the binary-continuous case can be tackled, but the general mixed variable type regime remains challenging. In this work, we make the simple but useful observation that classical ideas concerning polychoric and polyserial correlations can be leveraged in a latent Gaussian copula framework. Building on this observation, we propose a flexible and scalable methodology for data with variables of entirely general mixed type. We study the key properties of the approaches theoretically and empirically.

MCML Authors

Mathias Drton

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Mathematical Statistics

[292]

J. Ramjith, A. Bender, K. C. B. Roes and M. A. Jonker.
Recurrent events analysis with piece-wise exponential additive mixed models.
Statistical Modelling 24.3 (Jun. 2024). DOI

Abstract

Recurrent events analysis plays an important role in many applications, including the study of chronic diseases or recurrence of infections. Historically, many models for recurrent events have been variants of the Cox model. In this article we introduce and describe the application of the piece-wise exponential additive mixed model (PAMM) for recurrent events analysis and illustrate how PAMMs can be used to flexibly model the dependencies in recurrent events data. Simulations confirm that PAMMs provide unbiased estimates as well as equivalence to the Cox model when proportional hazards are assumed. Applications to recurrence of Staphylococcus aureus and malaria in children illustrate the estimation of seasonality, bivariate non-linear effects, multiple timescales and relaxation of the proportional hazards assumption via time-varying effects. The R package pammtools is extended to facilitate estimation and visualization of PAMMs for recurrent events data.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[291]

B. Felderer, L. Repke, W. Weber, J. Schweisthal and L. Bothmann.
Predicting the Validity and Reliability of Survey Questions.
Preprint (Jun. 2024). DOI

Abstract

The Survey Quality Predictor (SQP) is an open-access system to predict the quality, i.e., the reliability and validity, of survey questions based on the characteristics of the questions. The prediction is based on a meta-regression of many multitrait-multimethod (MTMM) experiments in which characteristics of the survey questions were systematically varied. The release of SQP 3.0, which is based on an expanded database compared to previous SQP versions, raised the need for a new meta-regression. To find the best method for analyzing the complex data structure of SQP (e.g., the existence of various uncorrelated predictors), we compared four suitable machine learning methods in terms of their ability to predict both survey quality indicators: LASSO, elastic net, boosting and random forest. The article discusses the performance of the models and illustrates the importance of the individual item characteristics in the random forest model, which was chosen for SQP 3.0.

MCML Authors

Jonas Schweisthal

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Artificial Intelligence in Management

Ludwig Bothmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[290]

E. Dorigatti.
Cancer immunotherapy design and analysis through discrete optimization, positive-unlabeled learning, and semi-structured regression models.
Dissertation 2024. DOI

Abstract

This thesis advances precision medicine by leveraging artificial intelligence to improve cancer immunotherapy development and tackle key challenges in clinical trials, where high failure rates often stem from insufficient understanding of patient and disease-specific factors. Through novel computational frameworks for cancer vaccine design, methods for handling imbalanced biological data, and hybrid modeling techniques that combine clinical data with imaging, this work demonstrates AI’s potential to personalize and accelerate therapeutic development. These contributions collectively pave the way for more effective, targeted treatments, potentially reducing the time and cost to bring new therapies to market. (Shortened.)

MCML Authors

Emilio Dorigatti

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[289]

C. A. Scholbeck.
Bridging gaps in interpretable machine learning: sensitivity analysis, marginal effects, and cluster explanations.
Dissertation 2024. DOI

Abstract

This thesis explores interpretable machine learning (IML) through six papers, bridging the gap between IML and model interpretation in other domains. It presents a generalized framework for model-agnostic interpretation methods, highlights potential pitfalls, and connects IML to sensitivity analysis used in fields like environmental modeling. A novel approach, forward marginal effects (FMEs), is introduced to interpret predictive models at multiple levels, supported by the R package fmeffects. The work also extends IML to unsupervised learning by proposing algorithm-agnostic cluster explanation methods, including two new techniques: SMART and IDEA, for analyzing feature contributions to clustering. (Shortened.)

MCML Authors

Christian Alexander Scholbeck

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[288]

K. Röck.
Stochastic processes as surrogate models for dynamical systems in magnetic confinement fusion.
Dissertation 2024. DOI

Abstract

This thesis focuses on incorporating domain-specific knowledge into machine learning (ML) models for scientific applications, ensuring they accurately reflect underlying physical systems.
The first part introduces physics-consistent Gaussian processes (GPs), embedding physical laws directly into the model. These models address data governed by partial differential equations (PDEs) and Hamiltonian systems, preserving physical properties like symplecticity and enabling faster, long-term simulations. Applications include classifying chaotic trajectories and computing Lyapunov exponents.
The second part tackles data scarcity in plasma physics by proposing robust surrogate models for multivariate time series. Using Student-$t$ process regression, these models handle outliers effectively and facilitate data imputation and augmentation, ensuring reliable predictions for multichannel sensor data.
This work advances ML approaches for surrogate modeling, chaos analysis, and plasma physics. (Shortened.)

MCML Authors

Katharina Röck (née Rath)

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[287]

R. Kohli, M. Feurer, B. Bischl, K. Eggensperger and F. Hutter.
Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning.
DMLR @ICLR 2024 - Workshop on Data-centric Machine Learning Research at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL

Abstract

Data in tabular form makes up a large part of real-world ML applications, and thus, there has been a strong interest in developing novel deep learning (DL) architectures for supervised learning on tabular data in recent years. As a result, there is a debate as to whether DL methods are superior to the ubiquitous ensembles of boosted decision trees. Typically, the advantage of one model class over the other is claimed based on an empirical evaluation, where different variations of both model classes are compared on a set of benchmark datasets that supposedly resemble relevant real-world tabular data. While the landscape of state-of-the-art models for tabular data changed, one factor has remained largely constant over the years: the datasets. Here, we examine 30 recent publications and the 187 different datasets they use, in terms of age, study size and relevance. We found that the average study used fewer than 10 datasets and that half of the datasets are older than 20 years. Our insights raise questions about the conclusions drawn from previous studies and urge the research community to develop and publish additional recent, challenging and relevant datasets and ML tasks for supervised learning on tabular data.

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[286]

A. Vahidi, S. Schosser, L. Wimmer, Y. Li, B. Bischl, E. Hüllermeier and M. Rezaei.
Probabilistic Self-supervised Representation Learning via Scoring Rules Minimization.
ICLR 2024 - 12th International Conference on Learning Representations. Vienna, Austria, May 07-11, 2024. URL GitHub

Abstract

In this paper, we propose a novel probabilistic self-supervised learning method via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks, the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN’s convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method’s optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments. On large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yawei Li

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Mina Rezaei

Dr.

Statistical Learning and Data Science

[285]

D. Dold, D. Rügamer, B. Sick and O. Dürr.
Bayesian Semi-structured Subspace Inference.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects. The structured model part is inspired by statistical models and can be used to infer the input-output relationship for features of particular importance. The complex unstructured part defines an arbitrary deep neural network and thereby provides enough flexibility to achieve competitive prediction performance. While these models can also account for aleatoric uncertainty, there is still a lack of work on accounting for epistemic uncertainty. In this paper, we address this problem by presenting a Bayesian approximation for semi-structured regression models using subspace inference. To this end, we extend subspace inference for joint posterior sampling from a full parameter space for structured effects and a subspace for unstructured effects. Apart from this hybrid sampling scheme, our method allows for tunable complexity of the subspace and can capture multiple minima in the loss landscape. Numerical experiments validate our approach’s efficacy in recovering structured effect parameter posteriors in semi-structured models and approaching the full-space posterior distribution of MCMC for increasing subspace dimension. Further, our approach exhibits competitive predictive performance across simulated and real-world datasets.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Thomas Nagler

Statistics, Data Science and Machine Learning

[284]

N. Palm and T. Nagler.
An Online Bootstrap for Time Series.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.

MCML Authors

Nicolai Palm

Computational Statistics & Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[283]

D. Rügamer.
Scalable Higher-Order Tensor Product Spline Models.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

In the current era of vast data and transparent machine learning, it is essential for techniques to operate at a large scale while providing a clear mathematical comprehension of the internal workings of the method. Although there already exist interpretable semi-parametric regression methods for large-scale applications that take into account non-linearity in the data, the complexity of the models is still often limited. One of the main challenges is the absence of interactions in these models, which are left out for the sake of better interpretability but also due to impractical computational costs. To overcome this limitation, we propose a new approach using a factorization method to derive a highly scalable higher-order tensor product spline model. Our method allows for the incorporation of all (higher-order) interactions of non-linear feature effects while having computational costs proportional to a model without interactions. We further develop a meaningful penalization scheme and examine the induced optimization problem. We conclude by evaluating the predictive and estimation performance of our method.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[282]

A. Solderer, S. P. Hicklin, M. Aßenmacher, A. Ender and P. R. Schmidlin.
Influence of an allogenic collagen scaffold on implant sites with thin supracrestal tissue height: a randomized clinical trial.
Clinical Oral Investigations 28.313 (May 2024). DOI

Abstract

Objectives: This randomized clinical trial focused on patients with thin peri-implant soft-tissue height (STH) (≤ 2.5 mm) and investigated the impact of an allogenic collagen scaffold (aCS) on supracrestal tissue height and marginal bone loss (MBL).
Material & methods: Forty patients received bone level implants and were randomly assigned to the test group with simultaneous tissue thickening with aCS or the control group. After three months, prosthetic restoration occurred. STH measurements were taken at baseline (T0) and reopening surgery (TR), with MBL assessed at 12 months (T1). Descriptive statistics were calculated for continuous variables, and counts for categorical variables (significance level, p = 0.05).
Results: At T1, 37 patients were available. At T0, control and test groups had mean STH values of 2.3 ± 0.3 mm and 2.1 ± 0.4 mm. TR revealed mean STH values of 2.3 ± 0.2 mm (control) and 2.6 ± 0.7 mm (test), with a significant tissue thickening of 0.5 ± 0.6 mm in the test group (p < 0.03). At T1, control and test groups showed MBL mean values of 1.1 ± 0.8 mm and 1.0 ± 0.6 mm, with a moderate but significant correlation with STH thickening (-0.34), implant position (0.43), history of periodontitis (0.39), and smoking status (0.27).
Conclusion: The use of an aCS protocol resulted in soft tissue thickening but did not reach a threshold to reliably reduce MBL compared to the control group within the study’s limitations.
Clinical relevance: Peri-implant STH is crucial for maintaining peri-implant marginal bone stability. Marginal bone stability represents a crucial factor in prevention of peri-implantitis development.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[281]

K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber, J. Topalis, T. Weber, P. Wesp, B. O. Sabel, J. Ricke and M. Ingrisch.
ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports.
European Radiology 34 (May. 2024). DOI

Abstract

Objectives: To assess the quality of simplified radiology reports generated with the large language model (LLM) ChatGPT and to discuss challenges and chances of ChatGPT-like LLMs for medical text simplification.
Methods: In this exploratory case study, a radiologist created three fictitious radiology reports which we simplified by prompting ChatGPT with “Explain this medical report to a child using simple language.” In a questionnaire, we tasked 15 radiologists to rate the quality of the simplified radiology reports with respect to their factual correctness, completeness, and potential harm for patients. We used Likert scale analysis and inductive free-text categorization to assess the quality of the simplified reports.
Results: Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed relevant medical information, and potentially harmful passages were reported.
Conclusion: While we see a need for further adaptation to the medical field, the initial insights of this study indicate a tremendous potential in using LLMs like ChatGPT to improve patient-centered care in radiology and other medical domains.
Clinical relevance statement: Patients have started to use ChatGPT to simplify and explain their medical reports, which is expected to affect patient-doctor interaction. This phenomenon raises several opportunities and challenges for clinical routine.

MCML Authors

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Jakob Dexl

Clinical Data Science in Radiology

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Theresa Stüber

Clinical Data Science in Radiology

Johanna Topalis

Clinical Data Science in Radiology

Tobias Weber

* Former Member

Philipp Wesp

Dr.

Clinical Data Science in Radiology

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

[280]

A. F. Thielmann, A. Reuter, T. Kneib, D. Rügamer and B. Säfken.
Interpretable Additive Tabular Transformer Networks.
Transactions on Machine Learning Research (May. 2024). URL

Abstract

Attention-based Transformer networks have not only revolutionized Natural Language Processing but have also achieved state-of-the-art results for tabular data modeling. The attention mechanism, in particular, has proven to be highly effective in accurately modeling categorical variables. Although deep learning models recently outperform tree-based models, they often lack a complete comprehension of the individual impact of features because of their opaque nature. In contrast, additive neural network structures have proven to be both predictive and interpretable. Within the context of explainable deep learning, we propose Neural Additive Tabular Transformer Networks (NATT), a modeling framework that combines the intelligibility of additive neural networks with the predictive power of Transformer models. NATT offers inherent intelligibility while achieving similar performance to complex deep learning models. To validate its efficacy, we conduct experiments on multiple datasets and find that NATT performs on par with state-of-the-art methods on tabular data and surpasses other interpretable approaches.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[279]

R. Debelak, T. Koch, M. Aßenmacher and C. Stachl.
From Embeddings to Explainability: A Tutorial on Transformer-Based Text Analysis for Social and Behavioral Scientists.
Preprint (May. 2024). DOI

Abstract

Large language models and their use for text analysis have had a significant impact on psychology and the social and behavioral sciences in general. Key applications include the analysis of texts, such as social media posts, to infer psychological characteristics, as well as survey and interview analysis. In this tutorial paper, we demonstrate the use of the Python-based natural language processing software package transformers (and related modules from the Hugging Face Ecosystem) that allow for the automated classification of text inputs in a practical exercise. In doing so, we rely on pretrained transformer models which can be fine-tuned to a specific task and domain. The first proposed application of this model class is to use it as a feature extractor, allowing for the transformation of written text into real-valued numerical vectors (called ‘embeddings’) that capture a text’s semantic meaning. These vectors can, in turn, be used as input for a subsequent machine-learning model. The second presented application of transformer models is the end-to-end training (so-called ‘fine-tuning’) of the model. This results in a direct prediction of the label within the same model that directly maps the text to the embeddings. While in the second case, results are usually better and training works more seamlessly, the model itself is often not directly interpretable. We showcase an alleviation of this issue via the application of post-hoc interpretability methods by calculating SHAP values and applying local interpretable model-agnostic explanations (LIME) in an attempt to explain the model’s inner workings.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[278]

P. Dettling, M. Drton and M. Kolar.
On the Lasso for Graphical Continuous Lyapunov Models.
CLeaR 2024 - 3rd Conference on Causal Learning and Reasoning. Los Angeles, CA, USA, Apr 01-03, 2024. URL

Abstract

Graphical continuous Lyapunov models offer a new perspective on modeling causally interpretable dependence structure in multivariate data by treating each independent observation as a one-time cross-sectional snapshot of a temporal process. Specifically, the models assume that the observations are cross-sections of independent multivariate Ornstein-Uhlenbeck processes in equilibrium. The Gaussian equilibrium exists under a stability assumption on the drift matrix, and the equilibrium covariance matrix is determined by the continuous Lyapunov equation. Each graphical continuous Lyapunov model assumes the drift matrix to be sparse, with a support determined by a directed graph. A natural approach to model selection in this setting is to use an ℓ1-regularization technique that, based on a given sample covariance matrix, seeks to find a sparse approximate solution to the Lyapunov equation. We study the model selection properties of the resulting lasso technique to arrive at a consistency result. Our detailed analysis reveals that the involved irrepresentability condition is surprisingly difficult to satisfy. While this may prevent asymptotic consistency in model selection, our numerical experiments indicate that even if the theoretical requirements for consistency are not met, the lasso approach is able to recover relevant structure of the drift matrix and is robust to aspects of model misspecification.
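For orientation, the continuous Lyapunov equation referred to in this abstract can be written out as below. The symbols are notation assumed here for illustration: M is the drift matrix, C a fixed positive definite matrix, Σ the equilibrium covariance, and Σ̂ the sample covariance. The second display is one natural way to formalize the "sparse approximate solution" described in the abstract; it is a sketch and not necessarily the exact estimator studied in the paper.

```latex
% Equilibrium covariance of a stable multivariate Ornstein-Uhlenbeck process
M\Sigma + \Sigma M^{\top} + C = 0,
\qquad \text{where all eigenvalues of } M \text{ have negative real part.}

% Assumed \ell_1-regularized least-squares form of the lasso problem
\hat{M} \in \arg\min_{M \in \mathbb{R}^{p \times p}}
  \tfrac{1}{2}\,\bigl\| M\hat{\Sigma} + \hat{\Sigma} M^{\top} + C \bigr\|_F^{2}
  + \lambda \sum_{i,j} \lvert M_{ij} \rvert
```

The penalty acts on the entries of M, so zeros in the estimate correspond to missing edges of the directed graph that determines the support of the drift matrix.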

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[277]

K. Göbler, T. Windisch, M. Drton, T. Pychynski, M. Roth and S. Sonntag.
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery.
CLeaR 2024 - 3rd Conference on Causal Learning and Reasoning. Los Angeles, CA, USA, Apr 01-03, 2024. URL

Abstract

Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real and complex data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To tackle these challenges, we introduce causalAssembly, a semisynthetic data generator designed to facilitate the benchmarking of causal discovery methods. The tool is built using a complex real-world dataset comprised of measurements collected along an assembly line in a manufacturing setting. For these measurements, we establish a partial set of ground truth causal relationships through a detailed study of the physics underlying the processes carried out in the assembly line. The partial ground truth is sufficiently informative to allow for estimation of a full causal graph by mere nonparametric regression. To overcome potential confounding and privacy concerns, we use distributional random forests to estimate and represent conditional distributions implied by the ground truth causal graph. These conditionals are combined into a joint distribution that strictly adheres to a causal model over the observed variables. Sampling from this distribution, causalAssembly generates data that are guaranteed to be Markovian with respect to the ground truth. Using our tool, we showcase how to benchmark several well-known causal discovery algorithms.

MCML Authors

Mathias Drton

Prof. Dr.

Mathematical Statistics

[276]

D. Strieder and M. Drton.
Dual Likelihood for Causal Inference under Structure Uncertainty.
CLeaR 2024 - 3rd Conference on Causal Learning and Reasoning. Los Angeles, CA, USA, Apr 01-03, 2024. URL

Abstract

Knowledge of the underlying causal relations is essential for inferring the effect of interventions in complex systems. In a widely studied approach, structural causal models postulate noisy functional relations among interacting variables, where the underlying causal structure is then naturally represented by a directed graph whose edges indicate direct causal dependencies. In the typical application, this underlying causal structure must be learned from data, and thus, the remaining structure uncertainty needs to be incorporated into causal inference in order to draw reliable conclusions. In recent work, test inversions provide an ansatz to account for this data-driven model choice and, therefore, combine structure learning with causal inference. In this article, we propose the use of dual likelihood to greatly simplify the treatment of the involved testing problem. Indeed, dual likelihood leads to a closed-form solution for constructing confidence regions for total causal effects that rigorously capture both sources of uncertainty: causal structure and numerical size of nonzero effects. The proposed confidence regions can be computed with a bottom-up procedure starting from sink nodes. To render the causal structure identifiable, we develop our ideas in the context of linear causal relations with equal error variances.

MCML Authors

David Strieder

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[275]

H. A. Gündüz, R. Mreches, J. Moosbauer, G. Robertson, X.-Y. To, E. A. Franzosa, C. Huttenhower, M. Rezaei, A. C. McHardy, B. Bischl, P. C. Münch and M. Binder.
Optimized model architectures for deep learning on genomic data.
Communications Biology 7.1 (Apr. 2024). DOI

Abstract

The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.

MCML Authors

Hüseyin Anil Gündüz

* Former Member

Julia Moosbauer

Dr.

* Former Member

Xiao-Yin To

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Martin Binder

Statistical Learning and Data Science

[274]

M. Herrmann, D. Kazempour, F. Scheipl and P. Kröger.
Enhancing cluster analysis via topological manifold learning.
Data Mining and Knowledge Discovery 38 (Apr. 2024). DOI

Abstract

We discuss topological aspects of cluster analysis and show that inferring the topological structure of a dataset before clustering it can considerably enhance cluster detection: we show that clustering embedding vectors representing the inherent structure of a dataset instead of the observed feature vectors themselves is highly beneficial. To demonstrate, we combine the manifold learning method UMAP for inferring the topological structure with the density-based clustering method DBSCAN. Synthetic and real data results show that this both simplifies and improves clustering in a diverse set of low- and high-dimensional problems including clusters of varying density and/or entangled shapes. Our approach simplifies clustering because topological pre-processing consistently reduces parameter sensitivity of DBSCAN. Clustering the resulting embeddings with DBSCAN can then even outperform complex methods such as SPECTACL and ClusterGAN. Finally, our investigation suggests that the crucial issue in clustering does not appear to be the nominal dimension of the data or how many irrelevant features it contains, but rather how separable the clusters are in the ambient observation space they are embedded in, which is usually the (high-dimensional) Euclidean space defined by the features of the data. The approach is successful because it performs the cluster analysis after projecting the data into a more suitable space that is optimized for separability, in some sense.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Fabian Scheipl

PD Dr.

Functional Data Analysis

[273]

G. S. Collins, K. G. M. Moons, P. Dhiman, R. D. Riley, A. L. Beam, B. Van Calster, M. Ghassemi, X. Liu, J. B. Reitsma, M. van Smeden, A.-L. Boulesteix, J. C. Camaradou, L. A. Celi, S. Denaxas, A. K. Denniston, B. Glocker, R. M. Golub, H. Harvey, G. Heinze, M. M. Hoffman, A. P. Kengne, E. Lam, N. Lee, E. W. Loder, L. Maier-Hein, B. A. Mateen, M. D. McCradden, L. Oakden-Rayner, J. Ordish, R. Parnell, S. Rose, K. Singh, L. Wynants and P. Logullo.
TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods.
The BMJ 385.e078378 (Apr. 2024). DOI

Abstract

The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement was published in 2015 to provide the minimum reporting recommendations for studies developing or evaluating the performance of a prediction model. Methodological advances in the field of prediction have since included the widespread use of artificial intelligence (AI) powered by machine learning methods to develop prediction models. An update to the TRIPOD statement is thus needed. TRIPOD+AI provides harmonised guidance for reporting prediction model studies, irrespective of whether regression modelling or machine learning methods have been used. The new checklist supersedes the TRIPOD 2015 checklist, which should no longer be used. This article describes the development of TRIPOD+AI and presents the expanded 27 item checklist with more detailed explanation of each reporting recommendation, and the TRIPOD+AI for Abstracts checklist. TRIPOD+AI aims to promote the complete, accurate, and transparent reporting of studies that develop a prediction model or evaluate its performance. Complete reporting will facilitate study appraisal, model evaluation, and model implementation.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[272]

V. Gkolemis, C. Diou, E. Ntoutsi, T. Dalamagas, B. Bischl, J. Herbinger and G. Casalicchio.
Effector: A Python package for regional explanations.
Preprint (Apr. 2024). arXiv GitHub

Abstract

Global feature effect methods explain a model outputting one plot per feature. The plot shows the average effect of the feature on the output, like the effect of age on the annual income. However, average effects may be misleading when derived from local effects that are heterogeneous, i.e., they significantly deviate from the average. To decrease the heterogeneity, regional effects provide multiple plots per feature, each representing the average effect within a specific subspace. For interpretability, subspaces are defined as hyperrectangles defined by a chain of logical rules, like age’s effect on annual income separately for males and females and different levels of professional experience. We introduce Effector, a Python library dedicated to regional feature effects. Effector implements well-established global effect methods, assesses the heterogeneity of each method and, based on that, provides regional effects. Effector automatically detects subspaces where regional effects have reduced heterogeneity. All global and regional effect methods share a common API, facilitating comparisons between them. Moreover, the library’s interface is extensible so new methods can be easily added and benchmarked.
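The difference between a global and a regional effect can be illustrated with a toy sketch. Everything below is hypothetical: the model, the data, and the function names are made up for illustration and do not use or mirror the Effector API. The sketch computes a partial-dependence-style average once over all data and once within two subspaces defined by a single logical rule.

```python
def model(x1, x2):
    """Toy black box: x1 has effect +1 per unit when x2 > 0, and -1 otherwise."""
    return x1 if x2 > 0 else -x1


def average_effect(x1_value, x2_values):
    """Average model output at a fixed x1 over the given x2 observations."""
    return sum(model(x1_value, x2) for x2 in x2_values) / len(x2_values)


x2_data = [-1.0, -0.5, 0.5, 1.0]   # hypothetical observations of the other feature
grid = [0.0, 1.0, 2.0]             # x1 values at which the effect is evaluated

# Global effect: one curve, averaged over all observations.
global_effect = [average_effect(v, x2_data) for v in grid]

# Regional effects: one curve per subspace, split by the rule "x2 > 0".
upper = [x2 for x2 in x2_data if x2 > 0]
lower = [x2 for x2 in x2_data if x2 <= 0]
regional_upper = [average_effect(v, upper) for v in grid]
regional_lower = [average_effect(v, lower) for v in grid]

print(global_effect)   # flat, because the opposite local effects cancel
print(regional_upper)  # increasing in x1
print(regional_lower)  # decreasing in x1
```

In this example the global curve suggests that x1 has no effect at all, while the two regional curves recover the opposite slopes, which is the kind of heterogeneity the abstract describes.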

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Julia Herbinger

Dr.

* Former Member

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[271]

M. Wünsch, M. Herrmann, E. Noltenius, M. Mohr, T. P. Morris and A.-L. Boulesteix.
On the handling of method failure in comparison studies.
Preprint (Apr. 2024). arXiv

Abstract

Comparison studies in methodological research are intended to compare methods in an evidence-based manner, offering guidance to data analysts to select a suitable method for their application. To provide trustworthy evidence, they must be carefully designed, implemented, and reported, especially given the many decisions made in planning and running. A common challenge in comparison studies is to handle the “failure” of one or more methods to produce a result for some (real or simulated) data sets, such that their performances cannot be measured in those instances. Despite an increasing emphasis on this topic in recent literature (focusing on non-convergence as a common manifestation), there is little guidance on proper handling and interpretation, and reporting of the chosen approach is often neglected. This paper aims to fill this gap and provides practical guidance for handling method failure in comparison studies. In particular, we show that the popular approaches of discarding data sets yielding failure (either for all or the failing methods only) and imputing are inappropriate in most cases. We also discuss how method failure in published comparison studies – in various contexts from classical statistics and predictive modeling – may manifest differently, but is often caused by a complex interplay of several aspects. Building on this, we provide recommendations derived from realistic considerations on suitable fallbacks when encountering method failure, hence avoiding the need for discarding data sets or imputation. Finally, we illustrate our recommendations and the dangers of inadequate handling of method failure through two illustrative comparison studies.

MCML Authors

Milena Wünsch

Biometry in Molecular Medicine

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[270]

C. Gruber, K. Hechinger, M. Aßenmacher, G. Kauermann and B. Plank.
More Labels or Cases? Assessing Label Variation in Natural Language Inference.
UnImplicit 2024 - 3rd Workshop on Understanding Implicit and Underspecified Language. Malta, Mar 21, 2024. URL

Abstract

In this work, we analyze the uncertainty that is inherently present in the labels used for supervised machine learning in natural language inference (NLI). In cases where multiple annotations per instance are available, neither the majority vote nor the frequency of individual class votes is a trustworthy representation of the labeling uncertainty. We propose modeling the votes via a Bayesian mixture model to recover the data-generating process, i.e., the “true” latent classes, and thus gain insight into the class variations. This will enable a better understanding of the confusion happening during the annotation process. We also assess the stability of the proposed estimation procedure by systematically varying the numbers of i) instances and ii) labels. Thereby, we observe that few instances with many labels can predict the latent class borders reasonably well, while the estimation fails for many instances with only a few labels. This leads us to conclude that multiple labels are a crucial building block for properly analyzing label uncertainty.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Barbara Plank

Prof. Dr.

AI and Computational Linguistics

[269]

S. Dandl, C. Haslinger, T. Hothorn, H. Seibold, E. Sverdrup, S. Wager and A. Zeileis.
What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?
Annals of Applied Statistics 18.1 (Mar. 2024). DOI

Abstract

Estimation of heterogeneous treatment effects (HTE) is of prime importance in many disciplines, from personalized medicine to economics among many others. Random forests have been shown to be a flexible and powerful approach to HTE estimation in both randomized trials and observational studies. In particular “causal forests” introduced by Athey, Tibshirani and Wager (Ann. Statist. 47 (2019) 1148–1178), along with the R implementation in package grf were rapidly adopted. A related approach, called ‘model-based forests’ that is geared toward randomized trials and simultaneously captures effects of both prognostic and predictive variables, was introduced by Seibold, Zeileis and Hothorn (Stat. Methods Med. Res. 27 (2018) 3104–3125) along with a modular implementation in the R package model4you. Neither procedure is directly applicable to the estimation of individualized predictions of excess postpartum blood loss caused by a cesarean section in comparison to vaginal delivery. Clearly, randomization is hardly possible in this setup, and thus model-based forests lack clinical trial data to address this question. On the other hand, the skewed and interval-censored postpartum blood loss observations violate assumptions made by causal forests. Here we present a tailored model-based forest for skewed and interval-censored data to infer possible predictive prepartum characteristics and their impact on excess postpartum blood loss caused by a cesarean section. As a methodological basis, we propose a unifying view on causal and model-based forests that goes beyond the theoretical motivations and investigates which computational elements make causal forests so successful and how these can be blended with the strengths of model-based forests. To do so, we show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss. 
This theoretical insight allows us to implement several flavors of ‘model-based causal forests’ and dissect their different elements in silico. The original causal forests and model-based forests are compared with the new blended versions in a benchmark study exploring both randomized trials and observational settings. In the randomized setting, both approaches performed akin. If confounding was present in the data-generating process, we found local centering of the treatment indicator with the corresponding propensities to be the main driver for good performance. Local centering of the outcome was less important and might be replaced or enhanced by simultaneous split selection with respect to both prognostic and predictive effects. This lays the foundation for future research combining random forests for HTE estimation with other types of models.

MCML Authors

Susanne Dandl

Dr.

* Former Member

[268]

F. Coens, N. Knops, I. Tieken, S. Vogelaar, A. Bender, J. J. Kim, K. Krupka, L. Pape, A. Raes, B. Tönshoff, A. Prytula and CERTAIN Registry.
Time-Varying Determinants of Graft Failure in Pediatric Kidney Transplantation in Europe.
Clinical Journal of the American Society of Nephrology 19.3 (Mar. 2024). DOI

Abstract

Little is known about the time-varying determinants of kidney graft failure in children. We performed a retrospective study of primary pediatric kidney transplant recipients (younger than 18 years) from the Eurotransplant registry (1990-2020). Piece-wise exponential additive mixed models were applied to analyze time-varying recipient, donor, and transplant risk factors. Primary outcome was death-censored graft failure.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[267]

W. H. Hartl, P. Kopper, L. Xu, L. Heller, M. Mironov, R. Wang, A. G. Day, G. Elke, H. Küchenhoff and A. Bender.
Relevance of Protein Intake for Weaning in the Mechanically Ventilated Critically Ill: Analysis of a Large International Database.
Critical Care Medicine 52.3 (Mar. 2024). DOI

Abstract

The association between protein intake and the need for mechanical ventilation (MV) is controversial. We aimed to investigate the associations between protein intake and outcomes in ventilated critically ill patients.

MCML Authors

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[266]

B. X. Liew, F. Pfisterer, D. Rügamer and X. Zhai.
Strategies to optimise machine learning classification performance when using biomechanical features.
Journal of Biomechanics 165 (Mar. 2024). DOI

Abstract

Building prediction models using biomechanical features is challenging because such models may require large sample sizes. However, collecting biomechanical data on large sample sizes is logistically very challenging. This study aims to investigate if modern machine learning algorithms can help overcome the issue of limited sample sizes on developing prediction models. This was a secondary data analysis of two biomechanical datasets – a walking dataset on 2295 participants, and a countermovement jump dataset on 31 participants. The input features were the three-dimensional ground reaction forces (GRFs) of the lower limbs. The outcome was the orthopaedic disease category (healthy, calcaneus, ankle, knee, hip) in the walking dataset, and healthy vs people with patellofemoral pain syndrome in the jump dataset. Different algorithms were compared: multinomial/LASSO regression, XGBoost, various deep learning time-series algorithms with augmented data, and with transfer learning. For the outcome of weighted multiclass area under the receiver operating curve (AUC) in the walking dataset, the three models with the best performance were InceptionTime with x12 augmented data (0.810), XGBoost (0.804), and multinomial logistic regression (0.800). For the jump dataset, the top three models with the highest AUC were the LASSO (1.00), InceptionTime with x8 augmentation (0.750), and transfer learning (0.653). Machine-learning-based strategies for managing the challenging issue of limited sample size in biomechanical ML-based problems could benefit the development of alternative prediction models in healthcare, especially when time-series data are involved.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[265]

N. Sturma, M. Drton and D. Leung.
Testing many constraints in possibly irregular models using incomplete U-statistics.
Journal of the Royal Statistical Society. Series B (Statistical Methodology) 86.4 (Mar. 2024). DOI

Abstract

We consider the problem of testing a null hypothesis defined by equality and inequality constraints on a statistical parameter. Testing such hypotheses can be challenging because the number of relevant constraints may be on the same order or even larger than the number of observed samples. Moreover, standard distributional approximations may be invalid due to irregularities in the null hypothesis. We propose a general testing methodology that aims to circumvent these difficulties. The constraints are estimated by incomplete U-statistics, and we derive critical values by Gaussian multiplier bootstrap. We show that the bootstrap approximation of incomplete U-statistics is valid for kernels that we call mixed degenerate when the number of combinations used to compute the incomplete U-statistic is of the same order as the sample size. It follows that our test controls type I error even in irregular settings. Furthermore, the bootstrap approximation covers high-dimensional settings making our testing strategy applicable for problems with many constraints. The methodology is applicable, in particular, when the constraints to be tested are polynomials in U-estimable parameters. As an application, we consider goodness-of-fit tests of latent-tree models for multivariate data.
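To make the notion of an incomplete U-statistic concrete, here is a minimal sketch under simplifying assumptions. The kernel (which estimates a variance), the uniform random choice of index pairs, and all function names are illustrative choices, not taken from the paper; the paper's sampling design and its mixed degenerate kernels may differ. The sketch only shows that the incomplete version averages the kernel over a random subset of pairs whose size is of the same order as the sample size.

```python
import random
from itertools import combinations


def kernel(x, y):
    """Degree-2 kernel with expectation equal to the variance: (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2


def complete_u_statistic(sample):
    """Average the kernel over all n-choose-2 unordered pairs."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(sample, n_pairs, rng):
    """Average the kernel over n_pairs randomly drawn pairs of distinct indices."""
    n = len(sample)
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices; pairs may repeat
        total += kernel(sample[i], sample[j])
    return total / n_pairs


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

full = complete_u_statistic(sample)  # uses 19900 pairs
partial = incomplete_u_statistic(sample, n_pairs=len(sample), rng=rng)  # uses 200
print(full, partial)
```

The complete statistic needs a number of kernel evaluations that grows quadratically in the sample size, whereas the incomplete one here uses only as many evaluations as there are observations, at the price of extra sampling variability.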

MCML Authors

Nils Sturma

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[264]

J. Gertheiss, D. Rügamer and S. Greven.
Methoden für die Analyse funktionaler Daten.
Moderne Verfahren der Angewandten Statistik (Mar. 2024). DOI

Abstract

Functional data arise as discrete measurements of inherently smooth functions such as movement profiles or infrared absorption spectra. Using concrete examples, this chapter covers some basic analysis methods for such data. The focus is on regression models in which at least some of the covariates and/or the response are functional. In addition, the chapter introduces further methods such as functional principal component analysis and cluster analysis for functional data.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[263]

S. Dandl, A. Bender and T. Hothorn.
Heterogeneous treatment effect estimation for observational data using model-based forests.
Statistical Methods in Medical Research 33.3 (Mar. 2024). DOI

Abstract

The estimation of heterogeneous treatment effects has attracted considerable interest in many disciplines, most prominently in medicine and economics. Contemporary research has so far primarily focused on continuous and binary responses where heterogeneous treatment effects are traditionally estimated by a linear model, which allows the estimation of constant or heterogeneous effects even under certain model misspecifications. More complex models for survival, count, or ordinal outcomes require stricter assumptions to reliably estimate the treatment effect. Most importantly, the noncollapsibility issue necessitates the joint estimation of treatment and prognostic effects. Model-based forests allow simultaneous estimation of covariate-dependent treatment and prognostic effects, but only for randomized trials. In this paper, we propose modifications to model-based forests to address the confounding issue in observational data. In particular, we evaluate an orthogonalization strategy originally proposed by Robinson (1988, Econometrica) in the context of model-based forests targeting heterogeneous treatment effect estimation in generalized linear models and transformation models. We found that this strategy reduces confounding effects in a simulated study with various outcome distributions. We demonstrate the practical aspects of heterogeneous treatment effect estimation for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect of Riluzole on the progress of Amyotrophic Lateral Sclerosis.

MCML Authors

Susanne Dandl

Dr.

* Former Member

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[262]

P. Kopper, D. Rügamer, R. Sonabend, B. Bischl and A. Bender.
On Training Survival Models with Scoring Rules.
Preprint (Mar. 2024). arXiv

Abstract

Survival Analysis provides critical insights for partially incomplete time-to-event data in various domains. It is also an important example of probabilistic machine learning. The probabilistic nature of the predictions can be exploited by using (proper) scoring rules in the model fitting process instead of likelihood-based optimization. Our proposal does so in a generic manner and can be used for a variety of model classes. We establish different parametric and non-parametric sub-frameworks that allow different degrees of flexibility. Incorporated into neural networks, it leads to a computationally efficient and scalable optimization routine, yielding state-of-the-art predictive performance. Finally, we show that using our framework, we can recover various parametric models and demonstrate that optimization works equally well when compared to likelihood-based methods.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[261]

A. Reuter, A. Thielmann, C. Weisser, S. Fischer and B. Säfken.
GPTopic: Dynamic and Interactive Topic Representations.
Preprint (Mar. 2024). arXiv GitHub

Abstract

Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora. However, deducing a topic from such a list of individual terms can require substantial expertise and experience, making topic modeling less accessible to people unfamiliar with the particularities and pitfalls of top-word interpretation. A topic representation limited to top words might further fall short of offering a comprehensive and easily accessible characterization of the various aspects, facets and nuances a topic might have. To address these challenges, we introduce GPTopic, a software package that leverages Large Language Models (LLMs) to create dynamic, interactive topic representations. GPTopic provides an intuitive chat interface for users to explore, analyze, and refine topics interactively, making topic modeling more accessible and comprehensive.

MCML Authors

Sebastian Fischer

Statistical Learning and Data Science

[260]

J. Rodemann, F. Croppi, P. Arens, Y. Sale, J. Herbinger, B. Bischl, E. Hüllermeier, T. Augustin, C. J. Walsh and G. Casalicchio.
Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration.
Preprint (Mar. 2024). arXiv

Abstract

Bayesian optimization (BO) with Gaussian processes (GP) has become an indispensable algorithm for black box optimization problems. Not without a dash of irony, BO is often considered a black box itself, lacking ways to provide reasons as to why certain parameters are proposed to be evaluated. This is particularly relevant in human-in-the-loop applications of BO, such as in robotics. We address this issue by proposing ShapleyBO, a framework for interpreting BO’s proposals by game-theoretic Shapley values. They quantify each parameter’s contribution to BO’s acquisition function. Exploiting the linearity of Shapley values, we are further able to identify how strongly each parameter drives BO’s exploration and exploitation for additive acquisition functions like the confidence bound. We also show that ShapleyBO can disentangle the contributions to exploration into those that explore aleatoric and epistemic uncertainty. Moreover, our method gives rise to a ShapleyBO-assisted human machine interface (HMI), allowing users to interfere with BO in case proposals do not align with human reasoning. We demonstrate this HMI’s benefits for the use case of personalizing wearable robotic devices (assistive back exosuits) by human-in-the-loop BO. Results suggest human-BO teams with access to ShapleyBO can achieve lower regret than teams without.
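The linearity property mentioned in the abstract can be illustrated with a small sketch. The code below computes exact Shapley values for a toy two-parameter game and checks that the attribution of an additive score (mean part plus `beta` times uncertainty part) equals the weighted sum of the separate attributions. The set functions `mean_part` and `sd_part`, the parameter names, and `beta` are hypothetical placeholders chosen for illustration; this is not the ShapleyBO implementation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List

ValueFn = Callable[[FrozenSet[str]], float]


def shapley_values(players: List[str], value: ValueFn) -> Dict[str, float]:
    """Exact Shapley values by enumerating all coalitions (small games only)."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical set functions: contribution of a coalition of parameters to the
# exploitation (mean) and exploration (uncertainty) parts of an additive score.
def mean_part(s: FrozenSet[str]) -> float:
    return (2.0 if "x1" in s else 0.0) + (1.0 if "x2" in s else 0.0)


def sd_part(s: FrozenSet[str]) -> float:
    interaction = 1.0 if {"x1", "x2"} <= s else 0.0
    return (0.5 if "x1" in s else 0.0) + (1.5 if "x2" in s else 0.0) + interaction


beta = 2.0  # assumed exploration weight


def confidence_bound(s: FrozenSet[str]) -> float:
    return mean_part(s) + beta * sd_part(s)


params = ["x1", "x2"]
phi_mean = shapley_values(params, mean_part)
phi_sd = shapley_values(params, sd_part)
phi_cb = shapley_values(params, confidence_bound)
```

Because the Shapley value is linear in the value function, `phi_cb[p]` agrees with `phi_mean[p] + beta * phi_sd[p]` for each parameter, which is what allows a per-parameter split into exploitation and exploration contributions.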

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Julia Herbinger

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[259]

S. Wiegrebe, P. Kopper, R. Sonabend, B. Bischl and A. Bender.
Deep learning for survival analysis: a review.
Artificial Intelligence Review 57.65 (Feb. 2024). DOI

Abstract

The influx of deep learning (DL) techniques into the field of survival analysis in recent years has led to substantial methodological progress; for instance, learning from unstructured or high-dimensional data such as images, text or omics data. In this work, we conduct a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes. In summary, the reviewed methods often address only a small subset of tasks relevant to time-to-event data—e.g., single-risk right-censored data—and neglect to incorporate more complex settings.

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[258]

C. A. Scholbeck, G. Casalicchio, C. Molnar, B. Bischl and C. Heumann.
Marginal Effects for Non-Linear Prediction Functions.
Data Mining and Knowledge Discovery 38 (Feb. 2024). DOI

Abstract

Beta coefficients for linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations for feature effects, either in the shape of derivatives of the prediction function or forward differences in prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to deal with the non-linearities found in black box models. We introduce a new class of marginal effects termed forward marginal effects. We argue to abandon derivatives in favor of better-interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the non-linearity of prediction functions, we introduce a non-linearity measure for marginal effects. We argue against summarizing feature effects of a non-linear prediction function in a single metric such as the average marginal effect. Instead, we propose to partition the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates.
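As a minimal sketch of the forward-difference idea described in the abstract, the snippet below computes a forward marginal effect as the prediction at a shifted point minus the prediction at the observed point, for both a univariate and a multivariate change. The toy function `predict`, the feature names, the step sizes, and the helper `forward_marginal_effect` are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def forward_marginal_effect(
    predict: Callable[[Features], float],
    x: Features,
    steps: Dict[str, float],
) -> float:
    """Forward difference: prediction at the shifted point minus prediction at x."""
    shifted = dict(x)
    for name, h in steps.items():
        shifted[name] = shifted[name] + h
    return predict(shifted) - predict(x)


def predict(x: Features) -> float:
    # Hypothetical non-linear prediction function with an interaction term.
    return x["a"] ** 2 + 3.0 * x["a"] * x["b"]


obs = {"a": 1.0, "b": 2.0}

# Effect of increasing feature "a" by one unit.
fme_univariate = forward_marginal_effect(predict, obs, {"a": 1.0})

# Effect of a simultaneous change in both features.
fme_multivariate = forward_marginal_effect(predict, obs, {"a": 1.0, "b": -1.0})
```

Since the toy function is non-linear, the effect depends on the observation at which it is evaluated, which is one reason a single average over all observations can be misleading.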

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[257]

B. X. W. Liew, D. Rügamer and A. V. Birn-Jeffery.
Neuromechanical stabilisation of the centre of mass during running.
Gait and Posture 108 (Feb. 2024). DOI

Abstract

Background: Stabilisation of the centre of mass (COM) trajectory is thought to be important during running. There is emerging evidence of the importance of leg length and angle regulation during running, which could contribute to stability in the COM trajectory. The present study aimed to understand if leg length and angle stabilise the vertical and anterior-posterior (AP) COM displacements, and if the stability alters with running speeds.
Methods: Data for this study came from an open-source treadmill running dataset (n = 28). Leg length (m) was calculated by taking the resultant distance of the two-dimensional sagittal plane leg vector (from pelvis segment to centre of pressure). Leg angle was defined by the angle subtended between the leg vector and the horizontal surface. Leg length and angle were scaled to a standard deviation of one. Uncontrolled manifold analysis (UCM) was used to provide an index of motor abundance (IMA) in the stabilisation of the vertical and AP COM displacement.
Results: $\mathrm{IMA}_{\mathrm{AP}}$ and $\mathrm{IMA}_{\mathrm{vertical}}$ were largely destabilising and always stabilising, respectively. As speed increased, the peak destabilising effect on $\mathrm{IMA}_{\mathrm{AP}}$ increased from −0.66 (0.18) at 2.5 m/s to −1.12 (0.18) at 4.5 m/s, and the peak stabilising effect on $\mathrm{IMA}_{\mathrm{vertical}}$ increased from 0.69 (0.19) at 2.5 m/s to 1.18 (0.18) at 4.5 m/s.
Conclusion: Two simple parameters from a simple spring-mass model, leg length and angle, can explain the control behind running. The variability in leg length and angle helped stabilise the vertical COM, whilst maintaining constant running speed may rely more on inter-limb variation to adjust the horizontal COM accelerations.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[256]

H. Weerts, F. Pfisterer, M. Feurer, K. Eggensperger, E. Bergman, N. Awad, J. Vanschoren, M. Pechenizkiy, B. Bischl and F. Hutter.
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML.
Journal of Artificial Intelligence Research 79 (Feb. 2024). DOI

Abstract

The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to propose AutoML systems that jointly optimize fairness and predictive performance to mitigate fairness-related harm. However, fairness is a complex and inherently interdisciplinary subject, and solely posing it as an optimization problem can have adverse side effects. With this work, we aim to raise awareness among developers of AutoML systems about such limitations of fairness-aware AutoML, while also calling attention to the potential of AutoML as a tool for fairness research. We present a comprehensive overview of different ways in which fairness-related harm can arise and the ensuing implications for the design of fairness-aware AutoML. We conclude that while fairness cannot be automated, fairness-aware AutoML can play an important role in the toolbox of ML practitioners. We highlight several open technical challenges for future work in this direction. Additionally, we advocate for the creation of more user-centered assistive systems designed to tackle challenges encountered in fairness work.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[255]

P. Gijsbers, M. L. P. Bueno, S. Coors, E. LeDell, S. Poirier, J. Thomas, B. Bischl and J. Vanschoren.
AMLB: an AutoML Benchmark.
Journal of Machine Learning Research 25.101 (Feb. 2024). URL

Abstract

Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.

MCML Authors

Stefan Coors

* Former Member

Janek Thomas

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[254]

D. Schalk, B. Bischl and D. Rügamer.
Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models.
Statistics and Computing 34.31 (Feb. 2024). DOI

Abstract

Various privacy-preserving frameworks that respect the individual’s privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB). Making use of CWB allows us to reframe the GAMM estimation as a distributed fitting of base learners using the $L_2$-loss. In order to account for the heterogeneity of different data location sites, we propose a distributed version of a row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaption of CWB preserves all the important properties of the original algorithm, such as an unbiased feature selection and the feasibility to fit models in high-dimensional feature spaces, and yields equivalent model estimates as CWB on pooled data. Next to a derivation of the equivalence of both algorithms, we also showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.

MCML Authors

Daniel Schalk

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[253]

T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI Reconstruction.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI

Abstract

Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space, reducing acquisition times at the cost of decreased image quality. A popular approach is to employ undersampling patterns following various strategies, e.g., variable density sampling or radial trajectories. In this work, we propose a method that directly learns the undersampling masks from data points, thereby also providing task- and domain-specific patterns. To solve the resulting discrete optimization problem, we propose a general optimization routine called ProM: a fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks, demonstrating the benefits of using custom masks tailored for a downstream task. For example, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[252]

B. Bischl, R. Sonabend, L. Kotthoff and M. Lang.
Applied Machine Learning Using mlr3 in R.
American Statistician 79.2 (Jan. 2024). DOI

Abstract

mlr3 is an award-winning ecosystem of R packages that have been developed to enable state-of-the-art machine learning capabilities in R. Applied Machine Learning Using mlr3 in R gives an overview of flexible and robust machine learning methods, with an emphasis on how to implement them using mlr3 in R. It covers various key topics, including basic machine learning tasks, such as building and evaluating a predictive model; hyperparameter tuning of machine learning approaches to obtain peak performance; building machine learning pipelines that perform complex operations such as pre-processing followed by modelling followed by aggregation of predictions; and extending the mlr3 ecosystem with custom learners, measures, or pipeline components. The book is primarily aimed at researchers, practitioners, and graduate students who use machine learning or who are interested in using it. It can be used as a textbook for an introductory or advanced machine learning class that uses R, as a reference for people who work with machine learning methods, and in industry for exploratory experiments in machine learning.

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Michel Lang

Dr.

* Former Member

[251]

G. Casalicchio and L. Burk.
Evaluation and Benchmarking.
Applied Machine Learning Using mlr3 in R I.3 (Jan. 2024). DOI

Abstract

Machine learning models can only be deployed in practice if they are robustly evaluated to estimate a model’s generalization performance, i.e. how well it will perform on new data. Resampling strategies, including cross-validation and bootstrapping, can be used to estimate the generalization performance. Models can be compared to one another using a benchmark experiment, which makes use of the same resampling strategies and measures to fairly compare models and to help practitioners decide which model to use in practice.
This chapter introduces resampling strategies in mlr3, including cross-validation, repeated cross-validation, leave-one-out, bootstrapping, and custom strategies. These are then demonstrated with the resample() function, which is used to resample a single learner with a given strategy. Benchmarking is then introduced and the benchmark() function is demonstrated for comparing multiple learners. The chapter concludes with a deep dive into binary classification evaluation, including ROC analysis and the Area Under the Curve metric.
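The cross-validation idea summarized above can be sketched independently of any framework. The snippet below is a plain-Python illustration, not mlr3 code: it splits the data into k contiguous folds (without shuffling, for simplicity), fits a featureless mean predictor on each training part, and averages the held-out mean squared error. The helper names `k_fold_indices` and `cv_mse_of_mean_predictor` and the toy data are assumptions made for this example.

```python
from statistics import mean
from typing import List, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def cv_mse_of_mean_predictor(y: List[float], k: int) -> float:
    """Estimate the generalization error (MSE) of a featureless mean predictor."""
    fold_errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = mean(y[i] for i in train)  # "training" on the train part
        fold_errors.append(mean((y[i] - prediction) ** 2 for i in test))
    return mean(fold_errors)  # aggregate over folds


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = k_fold_indices(len(y), 3)
cv_error = cv_mse_of_mean_predictor(y, 3)
```

Each observation is used for testing exactly once, so the aggregated score is based only on predictions for data the model did not see during fitting.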

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Lukas Burk

Statistical Learning and Data Science

[250]

M. Becker, L. Schneider and S. Fischer.
Hyperparameter Optimization.
Applied Machine Learning Using mlr3 in R II.4 (Jan. 2024). DOI

Abstract

Machine learning models include parameters and hyperparameters. The former refers to model coefficients that are estimated during training. The latter are parameters that are set by the user and affect how the model is fit or how it makes predictions. Setting hyperparameters manually is arduous and error-prone; hyperparameter optimization (HPO) instead automates this ‘tuning’ procedure to reduce bias. When performing HPO, there are many considerations, including what tuning algorithm to use, how long to tune it for, and what measures to optimize. Moreover, users have to decide which hyperparameters to tune and for what configurations. Finally, one has to be careful to make use of nested resampling to prevent leakage of information from training to testing datasets that can occur when resampling and tuning simultaneously. This chapter begins by introducing mlr3tuning and its functionality for tuning learners. This includes Tuners for configuring and running optimization algorithms, TuningInstances for storing results, and Terminators for controlling when to stop the HPO process. The chapter provides a practical example of tuning hyperparameters of a support vector machine, including introducing logarithmic transformations. The AutoTuner class is also introduced, which is used for automating nested resampling to reduce bias in tuning.

MCML Authors

Marc Becker

Statistical Learning and Data Science

Lennart Schneider

Statistical Learning and Data Science

Sebastian Fischer

Statistical Learning and Data Science

[249]

L. Schneider and M. Becker.
Advanced Tuning Methods and Black Box Optimization.
Applied Machine Learning Using mlr3 in R II.5 (Jan. 2024). DOI

Abstract

Automated tuning can be error-prone, and it is very likely that models will crash in the tuning process. It is therefore essential to have reliable methods of encapsulating errors to prevent large experiments from failing and losing intermediate results. This chapter therefore begins by introducing fallback learners and encapsulation methods, which are returned to in ‘Advanced Technical Aspects of mlr3’.
Models can be tuned with respect to one or multiple measures. In general, when tuning with respect to multiple measures there will be a trade-off between them, and therefore there will not be one optimal hyperparameter configuration. Instead, the aim is to estimate configurations that are not Pareto-dominated by any other. This chapter introduces multi-objective tuning and concepts including Pareto optimality.
Some tuning methods are more advanced than others, including Hyperband and Bayesian optimization. Hyperband is a multi-fidelity tuner that makes use of fidelity parameters, which provide a trade-off between model runtime and performance accuracy. Bayesian optimization is a sample-efficient black-box optimization algorithm that is highly flexible and gives the user fine-grained control over tuning large search spaces. This chapter introduces mlr3hyperband and the concept of fidelity parameters, and then mlr3mbo and bbotk to discuss black-box optimization and Bayesian optimization.
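To make the notion of configurations that are not Pareto-dominated concrete, the following plain-Python sketch filters a list of evaluated configurations down to its Pareto front when all measures are minimized. It does not use mlr3; the helper names `dominates` and `pareto_front` and the (classification error, runtime) values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every measure and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (all measures minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical tuning results: (classification error, runtime in seconds).
results = [(0.10, 30.0), (0.12, 10.0), (0.15, 12.0), (0.10, 45.0), (0.20, 5.0)]
front = pareto_front(results)
```

Here `(0.15, 12.0)` is dropped because `(0.12, 10.0)` is better on both measures, while the three remaining configurations each represent a different trade-off between error and runtime.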

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Marc Becker

Statistical Learning and Data Science

[248]

M. Binder and F. Pfisterer.
Sequential Pipelines.
Applied Machine Learning Using mlr3 in R II.7 (Jan. 2024). DOI

Abstract

Computational pipelines provide a layer of abstraction for swapping in and out different elements of the pipeline. In machine learning this can be useful for swapping algorithms, as well as common operations for data preprocessing and model postprocessing. Many real-world machine learning applications involve more than just fitting a single model at a time: it is often beneficial or even necessary to preprocess data for feature engineering and compatibility with learners. In many cases it is also useful to combine predictions of multiple models in ensembles. By defining these workflows as computational objects, it is then possible to treat them like models to be trained/tested and even tuned. This chapter introduces mlr3pipelines, a dataflow programming language that can be used to define machine learning processes from simple building blocks. The chapter focuses on sequential pipelines, in which data passes from one operation to another in a linear sequence and each operation has one input and one output. The chapter introduces PipeOp and Graph, which are the building blocks of a pipeline, and provides some concrete examples with PCA.

MCML Authors

Martin Binder

Statistical Learning and Data Science

Florian Pfisterer

Dr.

* Former Member

[247]

M. Binder, F. Pfisterer, M. Becker and M. N. Wright.
Non-sequential Pipelines and Tuning.
Applied Machine Learning Using mlr3 in R II.8 (Jan. 2024). DOI

Abstract

Real-world applications often require complicated pipelines that do not progress sequentially. For example, many experiments have demonstrated that bagging is a powerful method to improve model performance. Bagging can be thought of as a non-sequential pipeline where a learner is replicated, each separate learner is trained and makes predictions, and their results are combined. This is non-sequential as data is not flowing sequentially through the pipeline but is instead passed to all learners (which may then subsample the data) and then recombined, thus creating a pipeline where operations have multiple inputs and outputs. Pipeline operations also have hyperparameters that can be set and tuned to improve model performance. Moreover, the choice of operations to include in a pipeline can also be tuned, which is known as combined algorithm selection and hyperparameter optimization (CASH).
This chapter looks at more advanced uses of mlr3pipelines. This is put into practice by demonstrating how to build a bagging and stacking pipeline from scratch, as well as how to access common pipelines that are readily available in mlr3pipelines. The chapter then looks at tuning pipelines and CASH.

MCML Authors

Martin Binder

Statistical Learning and Data Science

Florian Pfisterer

Dr.

* Former Member

Marc Becker

Statistical Learning and Data Science

[246]

M. Lang, S. Fischer and R. Sonabend.
Advanced Technical Aspects of mlr3.
Applied Machine Learning Using mlr3 in R IV.10 (Jan. 2024). DOI

Abstract

Parallelization is often required to efficiently run machine learning models, which means models are run simultaneously on multiple CPU cores, CPUs, or computational nodes. This chapter begins by demonstrating how mlr3 uses the future package for parallelization and how different ‘plans’ can be applied to mlr3 experiments. In large machine learning experiments, it is common for a model to error during training or predicting. This is because the algorithms have to process arbitrary data, and not all eventualities can always be handled. It is therefore imperative to have robust methods for encapsulating and dealing with errors. This chapter builds on what has been briefly seen in Chapter 5 to discuss error handling and logging, including how to make use of fallback learners in experiments. Large experiments may also require data to be handled in different formats and to prevent all the data being loaded into memory. This chapter discusses different ‘backends’ that can be used for mlr3 Tasks, including interfacing with DuckDB and SQL. Finally, this chapter demonstrates how to extend classes in mlr3 by using the Measure class as an example. This may be of particular interest to readers who want to create new Measures or Learners.

MCML Authors

Michel Lang

Dr.

* Former Member

Sebastian Fischer

Statistical Learning and Data Science

[245]

S. Fischer, M. Lang and M. Becker.
Large-Scale Benchmarking.
Applied Machine Learning Using mlr3 in R IV.11 (Jan. 2024). DOI

Abstract

In the field of machine learning, benchmark experiments are used to evaluate and compare the performance of algorithms. To draw robust conclusions, benchmark experiments often have to be ‘large-scale’, which means including many datasets, learners, and possibly measures. Finding datasets can be difficult and the choice of dataset impacts conclusions that can be drawn. Conducting large-scale benchmark experiments is also complex as they are usually computationally intensive. It is therefore common to make use of high-performance computing clusters to efficiently run the experiment. Finally, once these experiments are run, analysis of experiments usually requires more than a single score from a given performance measure, and therefore statistical tests are often employed.
This chapter introduces mlr3oml for interfacing with the OpenML database for accessing data and tasks. It then continues by discussing how to run experiments on high-performance computing clusters using batchtools and mlr3batchmark. Finally, mlr3benchmark is introduced for statistical analysis including Friedman tests and critical difference diagrams.

MCML Authors

Sebastian Fischer

Statistical Learning and Data Science

Michel Lang

Dr.

* Former Member

Marc Becker

Statistical Learning and Data Science

[244]

S. Dandl, P. Biecek, G. Casalicchio and M. N. Wright.
Model Interpretation.
Applied Machine Learning Using mlr3 in R IV.12 (Jan. 2024). DOI

Abstract

The increasing availability of data and software frameworks to create predictive models has allowed the widespread adoption of machine learning in many applications. However, high predictive performance of such models often comes at the cost of interpretability. Machine learning interpretation methods can be useful for several purposes: 1) gaining global insights into a model (e.g., feature importance); 2) model improvement if flaws were identified (e.g., unexpected reliance on a certain feature); 3) understanding individual predictions. Several model-agnostic methods have been developed including feature permutation, Shapley values, and LIME.
This chapter presents the packages iml, counterfactuals, and DALEX, which implement model-agnostic interpretation methods. Throughout the chapter an xgboost model is trained on the German credit dataset to understand how predictions are made and why. The chapter starts with discussing the iml package and the theory behind the discussed methods, as well as how to practically use the interface. It then moves to counterfactuals and the benefits of counterfactual analysis, including methods What-If and MOC. Finally, DALEX is introduced, which includes similar methods to iml but with a different design, hence users can make use of either package depending on their design preference.

MCML Authors

Susanne Dandl

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[243]

C. Nießl, S. Hoffmann, T. Ullmann and A.-L. Boulesteix.
Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment.
Biometrical Journal 66.1 (Jan. 2024). DOI

Abstract

The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call “cross-design validation of methods”. In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then reevaluate each method based on the study design (i.e., datasets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multiomic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different datasets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the nonneutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application. Authors of new methods should thus focus not only on a transparent and extensive evaluation but also on comprehensive method documentation that enables the correct use of their methods in subsequent studies.

MCML Authors

Christina Sauer (née Nießl)

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Theresa Ullmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

* Former Member

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[242]

L. Kook, P. F. M. Baumann, O. Dürr, B. Sick and D. Rügamer.
Estimating Conditional Distributions with Neural Networks Using R Package deeptrafo.
Journal of Statistical Software 111.10 (2024). DOI

Abstract

Contemporary empirical applications frequently require flexible regression models for complex response types and large tabular or non-tabular, including image or text, data. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fitting flexible regression models for conditional distributions using a tensorflow backend with numerous additional processors, such as neural networks, penalties, and smoothing splines. Package deeptrafo implements deep conditional transformation models (DCTMs) for binary, ordinal, count, survival, continuous, and time series responses, potentially with uninformative censoring. Unlike other available methods, DCTMs do not assume a parametric family of distributions for the response. Further, the data analyst may trade off interpretability and flexibility by supplying custom neural network architectures and smoothers for each term in an intuitive formula interface. We demonstrate how to set up, fit, and work with DCTMs for several response types. We further showcase how to construct ensembles of these models, evaluate models using inbuilt cross-validation, and use other convenience functions for DCTMs in several applications. Lastly, we discuss DCTMs in light of other approaches to regression with non-tabular data.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistics, Data Science and Machine Learning

[241]

M. M. Mandl, S. Hoffmann, S. Bieringer, A. E. Jacob, M. Kraft, S. Lemster and A.-L. Boulesteix.
Raising awareness of uncertain choices in empirical data analysis: A teaching concept toward replicable research practices.
PLOS Computational Biology 20.3 (2024). DOI

Abstract

Throughout their education and when reading the scientific literature, students may get the impression that there is a unique and correct analysis strategy for every data analysis task and that this analysis strategy will always yield a significant and noteworthy result. This expectation conflicts with a growing realization that there is a multiplicity of possible analysis strategies in empirical research, which will lead to overoptimism and nonreplicable research findings if it is combined with result-dependent selective reporting. Here, we argue that students are often ill-equipped for real-world data analysis tasks and unprepared for the dangers of selectively reporting the most promising results. We present a seminar course intended for advanced undergraduates and beginning graduate students of data analysis fields such as statistics, data science, or bioinformatics that aims to increase the awareness of uncertain choices in the analysis of empirical data and present ways to deal with these choices through theoretical modules and practical hands-on sessions.

MCML Authors

Maximilian Mandl

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[240]

B. S. Siepe, F. Bartoš, T. P. Morris, A.-L. Boulesteix, D. W. Heck and S. Pawel.
Simulation Studies for Methodological Research in Psychology: A Standardized Template for Planning, Preregistration, and Reporting.
Psychological Methods Advance online publication (2024). DOI

Abstract

Simulation studies are widely used for evaluating the performance of statistical methods in psychology. However, the quality of simulation studies can vary widely in terms of their design, execution, and reporting. In order to assess the quality of typical simulation studies in psychology, we reviewed 321 articles published in Psychological Methods, Behavior Research Methods, and Multivariate Behavioral Research in 2021 and 2022, among which 100/321 = 31.2% report a simulation study. We find that many articles do not provide complete and transparent information about key aspects of the study, such as justifications for the number of simulation repetitions, Monte Carlo uncertainty estimates, or code and data to reproduce the simulation studies. To address this problem, we provide a summary of the ADEMP (aims, data-generating mechanism, estimands and other targets, methods, performance measures) design and reporting framework from Morris et al. (2019) adapted to simulation studies in psychology. Based on this framework, we provide ADEMP-PreReg, a step-by-step template for researchers to use when designing, potentially preregistering, and reporting their simulation studies. We give formulae for estimating common performance measures, their Monte Carlo standard errors, and for calculating the number of simulation repetitions to achieve a desired Monte Carlo standard error. Finally, we give a detailed tutorial on how to apply the ADEMP framework in practice using an example simulation study on the evaluation of methods for the analysis of pre–post measurement experiments.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[239]

Z. S. Dunias, B. Van Calster, D. Timmerman, A.-L. Boulesteix and M. van Smeden.
A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study.
Statistics in Medicine (Jan. 2024). DOI

Abstract

Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[238]

R. Hornung, F. Ludwigs, J. Hagenberg and A.-L. Boulesteix.
Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study.
Wiley Interdisciplinary Reviews: Computational Statistics 16.1 (Jan. 2024). DOI

Abstract

As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all obtained from the same patients. Such data lend themselves to being used as covariates in automatic outcome prediction because each omics type may contribute unique information, possibly improving predictions compared to using only one omics data type. Frequently, however, in the training data and the data to which automatic prediction rules should be applied, the test data, the different omics data types are not available for all patients. We refer to this type of data as block-wise missing multi-omics data. First, we provide a literature review on existing prediction methods applicable to such data. Subsequently, using a collection of 13 publicly available multi-omics data sets, we compare the predictive performances of several of these approaches for different block-wise missingness patterns. Finally, we discuss the results of this empirical comparison study and draw some tentative conclusions.

MCML Authors

Roman Hornung

Dr.

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[237]

M. Wünsch, C. Sauer, P. Callahan, L. C. Hinske and A.-L. Boulesteix.
From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis.
Wiley Interdisciplinary Reviews: Computational Statistics 16.1 (Jan. 2024). DOI

Abstract

Gene set analysis (GSA), a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In recent years, a multitude of methods have been developed for this task. However, clear guidance is lacking: choosing the right method is the first hurdle a researcher is confronted with. No less challenging than overcoming this so-called method uncertainty is the procedure of preprocessing, from knowing which steps are required to selecting a corresponding approach from the plethora of valid options to create the accepted input object (data preprocessing uncertainty), with clear guidance again being scarce. Here, we provide a practical guide through all steps required to conduct GSA, beginning with a concise overview of a selection of established methods, including Gene Set Enrichment Analysis and Database for Annotation, Visualization, and Integrated Discovery (DAVID). We thereby lay a special focus on reviewing and explaining the necessary preprocessing steps for each method under consideration (e.g., the necessity of a transformation of the RNA sequencing data), an essential aspect that typically receives only limited attention in both existing reviews and applications. To raise awareness of the spectrum of uncertainties, our review is accompanied by an extensive overview of the literature on valid approaches for each step and illustrative R code demonstrating the complex analysis pipelines. It ends with a discussion and recommendations to both users and developers to ensure that the results of GSA are, despite the above-mentioned uncertainties, replicable and transparent.

MCML Authors

Milena Wünsch

Biometry in Molecular Medicine

Christina Sauer (née Nießl)

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biometry in Molecular Medicine

[236]

J. Goschenhofer.
Reducing the effort for data annotation: contributions to weakly supervised deep learning.
Dissertation 2023. DOI

Abstract

This thesis addresses methods for training machine learning models with limited labeled data, focusing on semi-supervised, positive unlabeled, constrained clustering, and transfer learning. It explores deep semi-supervised learning, particularly in time series and medical imaging contexts, and investigates positive unlabeled learning methods that utilize predictive uncertainty for self-training. The thesis also introduces weakly supervised learning for constrained clustering, combining it with semi-supervised approaches, and applies transfer learning to tasks with varying granularity in medical domains. (Shortened).

MCML Authors

Jann Goschenhofer

Dr.

* Former Member

[235]

H. A. Gündüz, S. Giri, M. Binder, B. Bischl and M. Rezaei.
Uncertainty Quantification for Deep Learning Models Predicting the Regulatory Activity of DNA Sequences.
ICMLA 2023 - 22nd IEEE International Conference on Machine Learning and Applications. Jacksonville, FL, USA, Dec 15-17, 2023. DOI

Abstract

The field of computational biology has been enhanced by deep learning models, which hold great promise for revolutionizing domains such as protein folding and drug discovery. Recent studies have underscored the tremendous potential of these models, particularly in the realm of gene regulation and the more profound understanding of the non-coding regions of the genome. On the other hand, this raises significant concerns about the reliability and efficacy of such models, which have their own biases by design, along with those learned from the data. Uncertainty quantification allows us to measure where the system is confident and know when it can be trusted. In this paper, we study several uncertainty quantification methods with respect to a multi-target regression task, specifically predicting regulatory activity profiles using DNA sequence data. Using the Basenji model, we investigate how such methods can improve in-domain generalization, out-of-distribution detection, and provide coverage guarantees on prediction intervals.

MCML Authors

Hüseyin Anil Gündüz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[234]

N. Sturma, C. Squires, M. Drton and C. Uhler.
Unpaired Multi-Domain Causal Representation Learning.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our results into a practical method to recover the shared latent causal graph.

MCML Authors

Nils Sturma

A1 | Statistical Foundations & Explainability
→ Group Mathias Drton

Mathematical Statistics

Mathias Drton

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Mathematical Statistics

[233]

Y. Zhang, Y. Li, H. Brown, M. Rezaei, B. Bischl, P. Torr, A. Khakzar and K. Kawaguchi.
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments.
XAIA @NeurIPS 2023 - Workshop XAI in Action: Past, Present, and Future Applications at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.

MCML Authors

Yawei Li

Statistical Learning and Data Science

Mina Rezaei

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Ashkan Khakzar

Dr.

C4 | Computational Social Sciences
→ Group Frauke Kreuter

* Former Member

[232]

Z. Zhang, H. Yang, B. Ma, D. Rügamer and E. Nie.
Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models.
CoNLL 2023 - BabyLM Challenge at 27th Conference on Computational Natural Language Learning. Singapore, Dec 06-10, 2023. DOI GitHub

Abstract

Large Language Models (LLMs) demonstrate remarkable performance on a variety of natural language understanding (NLU) tasks, primarily due to their in-context learning ability. This ability could be applied to building babylike models, i.e. models at small scales, improving training efficiency. In this paper, we propose a ‘CoThought’ pipeline, which efficiently trains smaller ‘baby’ language models (BabyLMs) by leveraging the Chain of Thought prompting of LLMs. Our pipeline restructures a dataset of less than 100M in size using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to the school texts for language learners. The BabyLM is then pretrained on this restructured dataset in a RoBERTa fashion. In evaluations across 4 benchmarks, our BabyLM outperforms the vanilla RoBERTa in 10 linguistic, NLU, and question-answering tasks by more than 3 points, showing a superior ability to extract contextual information. These results suggest that compact LMs pretrained on small, LLM-restructured data can better understand tasks and achieve improved performance.

MCML Authors

Bolei Ma

Social Data Science and AI

David Rügamer

Prof. Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Statistics, Data Science and Machine Learning

Ercong Nie

Computational Linguistics

[231]

S. Dandl.
Causality concepts in machine learning: heterogeneous treatment effect estimation with machine learning and model interpretation with counterfactual and semi-factual explanations.
Dissertation 2023. DOI

Abstract

This thesis explores the growing intersection of machine learning and causality through seven articles, offering new insights into how these fields can enhance one another. It addresses key topics, including adapting machine learning algorithms for heterogeneous treatment effect estimation, where combining causal and model-based forest elements improves performance across diverse datasets. Additionally, the thesis introduces advanced interpretability tools, proposing methods to generate multiple counterfactual and semi-factual explanations that aid in fairness assessments and address interpretability challenges. A modular R package developed in this work provides accessible tools for researchers to apply and compare counterfactual explanation methods, further bridging machine learning and causal inference for practical applications. (Shortened).

MCML Authors

Susanne Dandl

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[230]

E. Garces Arias, V. Pai, M. Schöffel, C. Heumann and M. Aßenmacher.
Automatic transcription of handwritten Old Occitan language.
EMNLP 2023 - Conference on Empirical Methods in Natural Language Processing. Singapore, Dec 06-10, 2023. DOI

Abstract

While existing neural network-based approaches have shown promising results in Handwritten Text Recognition (HTR) for high-resource languages and standardized/machine-written text, their application to low-resource languages often presents challenges, resulting in reduced effectiveness. In this paper, we propose an innovative HTR approach that leverages the Transformer architecture for recognizing handwritten Old Occitan language. Given the limited availability of data, which comprises only word pairs of graphical variants and lemmas, we develop and rely on elaborate data augmentation techniques for both text and image data. Our model combines a custom-trained Swin image encoder with a BERT text decoder, which we pre-train using a large-scale augmented synthetic data set and fine-tune on the small human-labeled data set. Experimental results reveal that our approach surpasses the performance of current state-of-the-art models for Old Occitan HTR, including open-source Transformer-based models such as a fine-tuned TrOCR and commercial applications like Google Cloud Vision. To nurture further research and development, we make our models, data sets, and code publicly available.

MCML Authors

Esteban Garces Arias

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[229]

J. Herbinger.
On grouping and partitioning approaches in interpretable machine learning.
Dissertation 2023. DOI

Abstract

This thesis addresses the challenges of interpreting machine learning models, particularly focusing on the limitations of global explanation methods. It identifies two key issues: the human-incomprehensibility of high-dimensional outputs and the misleading interpretations caused by aggregation bias. The thesis proposes solutions to these problems, such as grouping features for simpler interpretations and using recursive partitioning algorithms to provide regional explanations, ensuring more accurate and understandable insights into model behavior. (Shortened).

MCML Authors

Julia Herbinger

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[228]

F. Karl, T. Pielok, J. Moosbauer, F. Pfisterer, S. Coors, M. Binder, L. Schneider, J. Thomas, J. Richter, M. Lang, E. C. Garrido-Merchán, J. Branke and B. Bischl.
Multi-Objective Hyperparameter Optimization in Machine Learning—An Overview.
ACM Transactions on Evolutionary Learning and Optimization 3.4 (Dec. 2023). DOI

Abstract

Hyperparameter optimization constitutes a large part of typical modern machine learning (ML) workflows. This arises from the fact that ML methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies from the domains of evolutionary algorithms and Bayesian optimization. We illustrate the utility of multi-objective optimization in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability, and robustness.

MCML Authors

Florian Karl

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Tobias Pielok

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Stefan Coors

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[227]

C. Koller, G. Kauermann and X. Zhu.
Going Beyond One-Hot Encoding in Classification: Can Human Uncertainty Improve Model Performance in Earth Observation?
IEEE Transactions on Geoscience and Remote Sensing 62 (Dec. 2023). DOI GitHub

Abstract

Technological and computational advances continuously drive forward the field of deep learning in remote sensing. In recent years, the derivation of quantities describing the uncertainty in the prediction—which naturally accompanies the modeling process—has sparked interest in the remote sensing community. Often neglected in the machine learning setting is the human uncertainty that influences numerous labeling processes. As the core of this work, the task of local climate zone (LCZ) classification is studied by means of a dataset that contains multiple label votes by domain experts for each image. The inherent label uncertainty describes the ambiguity among the domain experts and is explicitly embedded into the training process via distributional labels. We show that incorporating the label uncertainty helps the model to generalize better to the test data and increases model performance. Similar to existing calibration methods, the distributional labels lead to better-calibrated probabilities, which in turn yield more certain and trustworthy predictions.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[226]

A. T. Stüber, S. Coors, B. Schachtner, T. Weber, D. Rügamer, A. Bender, A. Mittermeier, O. Öcal, M. Seidensticker, J. Ricke, B. Bischl and M. Ingrisch.
A comprehensive machine learning benchmark study for radiomics-based survival analysis of CT imaging data in patients with hepatic metastases of CRC.
Investigative Radiology 58.12 (Dec. 2023). DOI

Abstract

Optimizing a machine learning (ML) pipeline for radiomics analysis involves numerous choices in data set composition, preprocessing, and model selection. Objective identification of the optimal setup is complicated by correlated features, interdependency structures, and a multitude of available ML algorithms. Therefore, we present a radiomics-based benchmarking framework to optimize a comprehensive ML pipeline for the prediction of overall survival. This study is conducted on an image set of patients with hepatic metastases of colorectal cancer, for which radiomics features of the whole liver and of metastases from computed tomography images were calculated. A mixed model approach was used to find the optimal pipeline configuration and to identify the added prognostic value of radiomics features.

MCML Authors

Theresa Stüber

Clinical Data Science in Radiology

Stefan Coors

* Former Member

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Tobias Weber

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

[225]

D. Strieder and M. Drton.
Confidence in causal inference under structure uncertainty in linear causal models with equal variances.
Journal of Causal Inference 11.1 (Dec. 2023). DOI

Abstract

Inferring the effect of interventions within complex systems is a fundamental problem of statistics. A widely studied approach uses structural causal models that postulate noisy functional relations among a set of interacting variables. The underlying causal structure is then naturally represented by a directed graph whose edges indicate direct causal dependencies. In a recent line of work, additional assumptions on the causal models have been shown to render this causal graph identifiable from observational data alone. One example is the assumption of linear causal relations with equal error variances that we will take up in this work. When the graph structure is known, classical methods may be used for calculating estimates and confidence intervals for causal effects. However, in many applications, expert knowledge that provides an a priori valid causal structure is not available. Lacking alternatives, a commonly used two-step approach first learns a graph and then treats the graph as known in inference. This, however, yields confidence intervals that are overly optimistic and fail to account for the data-driven model choice. We argue that to draw reliable conclusions, it is necessary to incorporate the remaining uncertainty about the underlying causal structure in confidence statements about causal effects. To address this issue, we present a framework based on test inversion that allows us to give confidence regions for total causal effects that capture both sources of uncertainty: causal structure and numerical size of non-zero effects.

MCML Authors

David Strieder

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[224]

Y. Sale, P. Hofman, L. Wimmer, E. Hüllermeier and T. Nagler.
Second-Order Uncertainty Quantification: Variance-Based Measures.
Preprint (Dec. 2023). arXiv

Abstract

Uncertainty quantification is a critical aspect of machine learning models, providing important insights into the reliability of predictions and aiding the decision-making process in real-world applications. This paper proposes a novel way to use variance-based measures to quantify uncertainty on the basis of second-order distributions in classification problems. A distinctive feature of the measures is the ability to reason about uncertainties on a class-based level, which is useful in situations where nuanced decision-making is required. Recalling some properties from the literature, we highlight that the variance-based measures satisfy important (axiomatic) properties. In addition to this axiomatic approach, we present empirical results showing the measures to be effective and competitive with commonly used entropy-based measures.
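
One way to picture a class-wise, variance-based decomposition is through the law of total variance applied to samples from a second-order distribution. The sketch below is an illustration under that assumption; whether it matches the paper's exact definitions is not confirmed by the abstract. The function name and the Dirichlet example are hypothetical.

```python
import numpy as np

def variance_decomposition(theta_samples):
    """Class-wise split of total variance into two parts.

    theta_samples: array of shape (n_samples, n_classes); each row is a
    probability vector drawn from a second-order distribution.
    """
    theta = np.asarray(theta_samples, dtype=float)
    mean = theta.mean(axis=0)
    total = mean * (1.0 - mean)                        # variance of the class indicator
    aleatoric = (theta * (1.0 - theta)).mean(axis=0)   # expected conditional variance
    epistemic = theta.var(axis=0)                      # variance of the class probability
    return total, aleatoric, epistemic

# Hypothetical second-order distribution: a Dirichlet over three classes.
rng = np.random.default_rng(0)
samples = rng.dirichlet(alpha=[2.0, 1.0, 1.0], size=1000)
total, aleatoric, epistemic = variance_decomposition(samples)
```

Because each quantity is computed per class, the three returned vectors can be inspected class by class, and the two parts add up to the total for every class.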

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[223]

C. A. Scholbeck, J. Moosbauer, G. Casalicchio, H. Gupta, B. Bischl and C. Heumann.
Position Paper: Bridging the Gap Between Machine Learning and Sensitivity Analysis.
Preprint (Dec. 2023). arXiv

Abstract

We argue that interpretations of machine learning (ML) models or the model-building process can be seen as a form of sensitivity analysis (SA), a general methodology used to explain complex systems in many fields such as environmental modeling, engineering, or economics. We address both researchers and practitioners, calling attention to the benefits of a unified SA-based view of explanations in ML and the necessity to fully credit related work. We bridge the gap between both fields by formally describing (a) how the ML process is a system suitable for SA, (b) how existing ML interpretation methods relate to this perspective, and (c) how other SA techniques could be applied to ML.

MCML Authors

Julia Moosbauer

Dr.

* Former Member

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[222]

D. Rügamer, F. Pfisterer, B. Bischl and B. Grün.
Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods.
Advances in Statistical Analysis (Nov. 2023). DOI

Abstract

In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that allows for the definition of mixtures of many different families, estimation in high-dimensional and large sample size settings and robust optimization based on TensorFlow. Numerical experiments with simulated and real-world data applications show that optimization is as reliable as estimation via classical approaches in many different settings and that results may be obtained for complicated scenarios where classical approaches consistently fail.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Florian Pfisterer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[221]

L. Bothmann, L. Wimmer, O. Charrakh, T. Weber, H. Edelhoff, W. Peters, H. Nguyen, C. Benjamin and A. Menzel.
Automated wildlife image classification: An active learning tool for ecological applications.
Ecological Informatics 77 (Nov. 2023). DOI

Abstract

Wildlife camera trap images are being used extensively to investigate animal abundance, habitat associations, and behavior, which is complicated by the fact that experts must first classify the images to retrieve relevant information. Artificial intelligence systems can take over this task but usually need a large number of already-labeled training images to achieve sufficient performance. This requirement necessitates human expert labor and poses a particular challenge for projects with few cameras or short durations. We propose a label-efficient learning strategy that enables researchers with small or medium-sized image databases to leverage the potential of modern machine learning, thus freeing crucial resources for subsequent analyses. Our methodological proposal is twofold: On the one hand, we improve current strategies of combining object detection and image classification by tuning the hyperparameters of both models. On the other hand, we provide an active learning system that allows training deep learning models very efficiently in terms of required manually labeled training images. We supply a software package that enables researchers to use these methods without specific programming skills and thereby ensure the broad applicability of the proposed framework in ecological practice. We show that our tuning strategy improves predictive performance, emphasizing that tuning can and must be done separately for a new data set. We demonstrate how the active learning pipeline reduces the amount of pre-labeled data needed to achieve specific predictive performance and that it is especially valuable for improving out-of-sample predictive performance. We conclude that the combination of tuning and active learning increases the predictive performance of automated image classifiers substantially. Furthermore, we argue that our work can broadly impact the community through the ready-to-use software package provided. Finally, the publication of our models tailored to European wildlife data enriches existing model bases mostly trained on data from Africa and North America.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Lisa Wimmer

Statistical Learning and Data Science

Tobias Weber

* Former Member

[220]

T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings.
Preprint (Nov. 2023). arXiv

Abstract

Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
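
As an illustration of what an orthogonalization of embeddings can look like, the sketch below projects each embedding dimension onto the orthogonal complement of the protected features by taking least-squares residuals. This is a generic linear projection offered as an assumption; the abstract does not specify the exact procedure. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def orthogonalize(embeddings, protected):
    """Remove the linear effect of protected features from embeddings.

    embeddings: array (n_samples, n_dims)
    protected:  array (n_samples, n_protected), numerically encoded
    """
    E = np.asarray(embeddings, dtype=float)
    Z = np.asarray(protected, dtype=float)
    Z = np.column_stack([np.ones(len(Z)), Z])       # add an intercept column
    coef, *_ = np.linalg.lstsq(Z, E, rcond=None)    # regress each dimension on Z
    return E - Z @ coef                             # keep only the residuals

# Synthetic example: embeddings that partly depend on two protected features.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
E = rng.normal(size=(200, 5)) + Z @ rng.normal(size=(2, 5))
E_orth = orthogonalize(E, Z)
```

After this step, the residual embeddings are linearly uncorrelated with the protected features, so a linear model can no longer recover them from `E_orth`. Nonlinear dependence is not addressed by this sketch.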

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[219]

G. König.
If interpretability is the answer, what is the question?: a causal perspective.
Dissertation 2023. DOI

Abstract

This thesis addresses fundamental challenges in the field of interpretable machine learning (IML), particularly the lack of a clear definition of ‘interpretability’, the potential misinterpretation of existing methods, and the computational difficulties of conditional-sampling-based techniques. By disentangling the different goals of interpretability, we provide clearer guidelines for deriving target estimands, with specific examples such as recourse and scientific inference. Additionally, we propose formal interpretation rules for feature importance, highlight common pitfalls in IML, and introduce efficient methods for estimating conditional-sampling techniques by leveraging the data’s dependence structure, with a strong emphasis on causal inference to improve clarity and computational efficiency. (Shortened.)

MCML Authors

Gunnar König

Dr.

* Former Member

[218]

M. Rezaei, F. Soleymani, B. Bischl and S. Azizi.
Deep Bregman divergence for self-supervised representations learning.
Computer Vision and Image Understanding 235.103801 (Oct. 2023). DOI

Abstract

Neural Bregman divergence measures the divergence of data points using convex neural networks, which is beyond Euclidean distance and capable of capturing divergence over distributions. The non-Euclidean geometry is not well explored in deep representation learning and remains a challenging endeavor for self-supervised representation learning. In this paper, we propose deep Bregman divergences for self-supervised pretext task learning, where we aim to enhance self-supervised embedding representation by training additional networks based on functional Bregman divergences. Our framework can capture the divergence of embedding distributions and improve the quality of learned representation using an arbitrary Bregman divergence over data embedding. Specifically, we develop a novel self-supervised architecture and a new divergence loss that measures the asymmetric distance of arbitrary Bregman divergences of neural networks. We show that the combination of self-supervised contrastive learning and our proposed method outperforms the baseline as well as most established methods for self-supervised and semi-supervised learning on multiple classification and object detection tasks and datasets. Moreover, the learned representations generalize well when transferred to other datasets and tasks.
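
For readers unfamiliar with the term, a Bregman divergence generated by a convex function phi is D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. The sketch below evaluates this textbook definition for two fixed generators. It does not reproduce the paper's neural parameterization of phi, and all function names are hypothetical.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y> for convex phi."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(phi(x) - phi(y) - grad_phi(y) @ (x - y))

def sq_norm(v):
    return float(v @ v)

def sq_norm_grad(v):
    return 2.0 * v

def neg_entropy(p):
    return float(np.sum(p * np.log(p)))

def neg_entropy_grad(p):
    return np.log(p) + 1.0

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# phi = squared norm recovers the squared Euclidean distance (symmetric).
d_euclid = bregman_divergence(sq_norm, sq_norm_grad, p, q)

# phi = negative entropy recovers the KL divergence on probability vectors (asymmetric).
d_kl_pq = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
d_kl_qp = bregman_divergence(neg_entropy, neg_entropy_grad, q, p)
```

The second generator shows the asymmetry mentioned in the abstract: swapping the arguments changes the value, which is not the case for Euclidean distance.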

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[217]

J. Gauss, F. Scheipl and M. Herrmann.
DCSI–An improved measure of cluster separability based on separation and connectedness.
Preprint (Oct. 2023). arXiv

Abstract

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

[216]

L. Bothmann, S. Dandl and M. Schomaker.
Causal Fair Machine Learning via Rank-Preserving Interventional Distributions.
AEQUITAS @ECAI 2023 - 1st Workshop on Fairness and Bias in AI co-located with the 26th European Conference on Artificial Intelligence (ECAI 2023). Kraków, Poland, Sep 30-Oct 04, 2023. PDF

Abstract

A decision can be defined as fair if equal individuals are treated equally and unequals unequally. Adopting this definition, the task of designing machine learning models that mitigate unfairness in automated decision-making systems must include causal thinking when introducing protected attributes. Following a recent proposal, we define individuals as being normatively equal if they are equal in a fictitious, normatively desired (FiND) world, where the protected attribute has no (direct or indirect) causal effect on the target. We propose rank-preserving interventional distributions to define an estimand of this FiND world and a warping method for estimation. Evaluation criteria for both the method and resulting model are presented and validated through simulations and empirical data. With this, we show that our warping approach effectively identifies the most discriminated individuals and mitigates unfairness.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Susanne Dandl

Dr.

* Former Member

Michael Schomaker

Prof. Dr.

Biostatistics

[215]

J. Herbinger, S. Dandl, F. K. Ewald, S. Loibl and G. Casalicchio.
Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation.
XI-ML @ECAI 2023 - 3rd International Workshop on Explainable and Interpretable Machine Learning co-located with the 26th European Conference on Artificial Intelligence (ECAI 2023). Kraków, Poland, Sep 30-Oct 04, 2023. DOI

Abstract

Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striving for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms’ capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.

MCML Authors

Julia Herbinger

Dr.

* Former Member

Susanne Dandl

Dr.

* Former Member

Fiona Katharina Ewald

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[214]

S. Urchs, V. Thurner, M. Aßenmacher, C. Heumann and S. Thiemichen.
How Prevalent is Gender Bias in ChatGPT? - Exploring German and English ChatGPT Responses.
BDCA @ECML-PKDD 2023 - 1st Workshop on Biased Data in Conversational Agents at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Abstract

With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs and thus awareness of their inherent limitations, and will therefore take the systems’ output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues with a special focus on gender biases, which users need to be aware of when processing the system’s output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system’s responses for biases as well as for syntactic and grammatical mistakes.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[213]

I. T. Öztürk, R. Nedelchev, C. Heumann, E. Garces Arias, M. Roger, B. Bischl and M. Aßenmacher.
How Different Is Stereotypical Bias Across Languages?
BIAS @ECML-PKDD 2023 - 3rd Workshop on Bias and Fairness in AI at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[212]

S. Dandl, G. Casalicchio, B. Bischl and L. Bothmann.
Interpretable Regional Descriptors: Hyperbox-Based Local Explanations.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

This work introduces interpretable regional descriptors, or IRDs, for local, model-agnostic interpretations. IRDs are hyperboxes that describe how an observation’s feature values can be changed without affecting its prediction. They justify a prediction by providing a set of “even if” arguments (semi-factual explanations), and they indicate which features affect a prediction and whether pointwise biases or implausibilities exist. A concrete use case shows that this is valuable for both machine learning modelers and persons subject to a decision. We formalize the search for IRDs as an optimization problem and introduce a unifying framework for computing IRDs that covers desiderata, initialization techniques, and a post-processing method. We show how existing hyperbox methods can be adapted to fit into this unified framework. A benchmark study compares the methods based on several quality measures and identifies two strategies to improve IRDs.

MCML Authors

Susanne Dandl

Dr.

* Former Member

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

[211]

L. Rauch, M. Aßenmacher, D. Huseljic, M. Wirth, B. Bischl and B. Sick.
ActiveGLAE: A Benchmark for Deep Active Learning with Transformers.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Deep active learning (DAL) seeks to reduce annotation costs by enabling the model to actively query instance annotations from which it expects to learn the most. Despite extensive research, there is currently no standardized evaluation protocol for transformer-based language models in the field of DAL. Diverse experimental settings lead to difficulties in comparing research and deriving recommendations for practitioners. To tackle this challenge, we propose the ActiveGLAE benchmark, a comprehensive collection of data sets and evaluation guidelines for assessing DAL. Our benchmark aims to facilitate and streamline the evaluation process of novel DAL strategies. Additionally, we provide an extensive overview of current practice in DAL with transformer-based language models. We identify three key challenges - data set selection, model training, and DAL settings - that pose difficulties in comparing query strategies. We establish baseline results through an extensive set of experiments as a reference point for evaluating future work. Based on our findings, we provide guidelines for researchers and practitioners.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[210]

J. G. Wiese, L. Wimmer, T. Papamarkou, B. Bischl, S. Günnemann and D. Rügamer.
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. Best Paper Award. DOI

Abstract

Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. Such coarse approximations can be detrimental in practical applications, notably safety-critical ones. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. These symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
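
The symmetries mentioned in the abstract can be checked numerically on a toy network. The sketch below is a hypothetical one-hidden-layer tanh MLP with made-up sizes, not the paper's setup. It shows that permuting hidden neurons, or flipping the signs of the weights around an odd activation, changes the parameter vector but not the function.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP with tanh activation."""
    hidden = np.tanh(x @ W1 + b1)
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 3))

# Permutation symmetry: reorder hidden neurons consistently in both layers.
perm = np.array([2, 0, 3, 1])
W1_p, b1_p, W2_p = W1[:, perm], b1[perm], W2[perm, :]

# Sign-flip symmetry: tanh is odd, so negating incoming and outgoing weights cancels.
W1_f, b1_f, W2_f = -W1, -b1, -W2

out_original = mlp_forward(x, W1, b1, W2, b2)
out_permuted = mlp_forward(x, W1_p, b1_p, W2_p, b2)
out_flipped = mlp_forward(x, W1_f, b1_f, W2_f, b2)
```

All three parameter settings produce the same outputs. Every such equivalent setting is a separate posterior mode that carries no additional functional information, which is the redundancy the abstract proposes to exploit.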

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[209]

M. Aßenmacher, L. Rauch, J. Goschenhofer, A. Stephan, B. Bischl, B. Roth and B. Sick.
Towards Enhancing Deep Active Learning with Weak Supervision and Constrained Clustering.
IAL @ECML-PKDD 2023 - 7th International Workshop on Interactive Adaptive Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. PDF

Abstract

Three fields revolving around the question of how to cope with limited amounts of labeled data are Deep Active Learning (DAL), deep Constrained Clustering (CC), and Weakly Supervised Learning (WSL). DAL tackles the problem by adaptively posing the question of which data samples to annotate next in order to achieve the best incremental learning improvement, although it suffers from several limitations that hinder its deployment in practical settings. We point out how CC algorithms and WSL could be employed to overcome these limitations and increase the practical applicability of DAL research. Specifically, we discuss the opportunities to use the class discovery capabilities of CC and the possibility of further reducing human annotation efforts by utilizing WSL. We argue that the practical applicability of DAL algorithms will benefit from employing CC and WSL methods for the learning and labeling process. We inspect the overlaps between the three research areas and identify relevant and exciting research questions at the intersection of these areas.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Jann Goschenhofer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[208]

S. F. Fischer, L. Harutyunyan, M. Feurer and B. Bischl.
OpenML-CTR23 - A curated tabular regression benchmarking suite.
AutoML 2023 - International Conference on Automated Machine Learning - Workshop Track. Berlin, Germany, Sep 12-15, 2023. URL

Abstract

Benchmark experiments are one of the cornerstones of modern machine learning research. An essential part in the design of such experiments is the selection of datasets. We present the OpenML Curated Tabular Regression benchmarking suite 2023 (OpenML-CTR23). It is available on OpenML and comprises 35 regression problems that have been selected according to a set of strict criteria. We compare its design with existing regression benchmark suites and also challenge some of the dataset choices of previous efforts. As a first experiment, we compare five machine learning methods of varying complexity on the OpenML-CTR23.

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[207]

L. O. Purucker, L. Schneider, M. Anastacio, J. Beel, B. Bischl and H. Hoos.
Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML.
AutoML 2023 - International Conference on Automated Machine Learning. Berlin, Germany, Sep 12-15, 2023. URL

Abstract

Automated machine learning (AutoML) systems commonly ensemble models post hoc to improve predictive performance, typically via greedy ensemble selection (GES). However, we believe that GES may not always be optimal, as it performs a simple deterministic greedy search. In this work, we introduce two novel population-based ensemble selection methods, QO-ES and QDO-ES, and compare them to GES. While QO-ES optimises solely for predictive performance, QDO-ES also considers the diversity of ensembles within the population, maintaining a diverse set of well-performing ensembles during optimisation based on ideas of quality diversity optimisation. The methods are evaluated using 71 classification datasets from the AutoML benchmark, demonstrating that QO-ES and QDO-ES often outrank GES, although the difference is statistically significant only on validation data. Our results further suggest that diversity can be beneficial for post hoc ensembling but also increases the risk of overfitting.
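
For context, the GES baseline referred to above can be sketched as follows. This is a minimal illustration that assumes the common variant with selection with replacement and validation error as the objective. It is not the paper's code and does not implement QO-ES or QDO-ES. The function name and the toy predictions are hypothetical.

```python
import numpy as np

def greedy_ensemble_selection(val_preds, y_val, n_iterations=10):
    """Greedy ensemble selection with replacement.

    val_preds: array (n_models, n_samples, n_classes) of predicted probabilities
    y_val:     array (n_samples,) of true class indices
    Returns one weight per model (selection frequency).
    """
    val_preds = np.asarray(val_preds, dtype=float)
    selected = []
    running_sum = np.zeros_like(val_preds[0])
    for _ in range(n_iterations):
        best_model, best_error = None, np.inf
        for m, preds in enumerate(val_preds):
            avg = (running_sum + preds) / (len(selected) + 1)
            error = np.mean(avg.argmax(axis=1) != y_val)
            if error < best_error:  # strict "<": ties go to the lowest index
                best_model, best_error = m, error
        selected.append(best_model)
        running_sum += val_preds[best_model]
    return np.bincount(selected, minlength=len(val_preds)) / len(selected)

# Toy validation set with two classes and three hypothetical base models.
y_val = np.array([0, 1, 0, 1])
perfect = np.eye(2)[y_val]               # always right
inverted = np.eye(2)[1 - y_val]          # always wrong
uninformative = np.full((4, 2), 0.5)     # no signal
weights = greedy_ensemble_selection([perfect, inverted, uninformative], y_val)
```

Each iteration adds the single model that most reduces the validation error of the averaged prediction, so the search is deterministic and follows one path. The abstract contrasts this with population-based methods.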

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[206]

S. Segel, H. Graf, A. Tornede, B. Bischl and M. Lindauer.
Symbolic Explanations for Hyperparameter Optimization.
AutoML 2023 - International Conference on Automated Machine Learning. Berlin, Germany, Sep 12-15, 2023. URL

Abstract

Hyperparameter optimization (HPO) methods can determine well-performing hyperparameter configurations efficiently but often lack insights and transparency. We propose to apply symbolic regression to meta-data collected with Bayesian optimization (BO) during HPO. In contrast to prior approaches explaining the effects of hyperparameters on model performance, symbolic regression allows for obtaining explicit formulas quantifying the relation between hyperparameter values and model performance. Overall, our approach aims to make the HPO process more explainable and human-centered, addressing the needs of multiple user groups: First, providing insights into the HPO process can support data scientists and machine learning practitioners in their decisions when using and interacting with HPO tools. Second, obtaining explicit formulas and inspecting their properties could help researchers understand the HPO loss landscape better. In an experimental evaluation, we find that naively applying symbolic regression directly to meta-data collected during HPO is affected by the sampling bias introduced by BO. However, the true underlying loss landscape can be approximated by fitting the symbolic regression on the surrogate model trained during BO. By penalizing longer formulas, symbolic regression furthermore allows the user to decide how to balance the accuracy and explainability of the resulting formulas.

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[205]

P. Koch, G. V. Nuñez, E. Garces Arias, C. Heumann, M. Schöffel, A. Häberlin and M. Aßenmacher.
A tailored Handwritten-Text-Recognition System for Medieval Latin.
ALP @RANLP 2023 - 1st Workshop on Ancient Language Processing co-located with the Conference on Recent Advances in Natural Language Processing (RANLP 2023). Varna, Bulgaria, Sep 08, 2023. URL

Abstract

The Bavarian Academy of Sciences and Humanities aims to digitize the Medieval Latin Dictionary. This dictionary comprises record cards referring to lemmas in medieval Latin, a low-resource language. A crucial step of the digitization process is the handwritten text recognition (HTR) of the handwritten lemmas on the record cards. In our work, we introduce an end-to-end pipeline, tailored for the medieval Latin dictionary, for locating, extracting, and transcribing the lemmas. We employ two state-of-the-art image segmentation models to prepare the initial data set for the HTR task. Further, we experiment with different transformer-based models and conduct a set of experiments to explore the capabilities of different combinations of vision encoders with a GPT-2 decoder. We additionally apply extensive data augmentation, resulting in a highly competitive model. The best-performing setup achieved a character error rate of 0.015, which is even superior to the commercial Google Cloud Vision model, and shows more stable performance.

MCML Authors

Esteban Garces Arias

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[204]

H. A. Gündüz, M. Binder, X.-Y. To, R. Mreches, B. Bischl, A. C. McHardy, P. C. Münch and M. Rezaei.
A self-supervised deep learning method for data-efficient training in genomics.
Communications Biology 6.928 (Sep. 2023). DOI

Abstract

Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.

MCML Authors

Hüseyin Anil Gündüz

* Former Member

Martin Binder

Statistical Learning and Data Science

Xiao-Yin To

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[203]

B. X. W. Liew, F. M. Kovacs, D. Rügamer and A. Royuela.
Automatic variable selection algorithms in prognostic factor research in neck pain.
Journal of Clinical Medicine (Sep. 2023). DOI

Abstract

This study aims to compare the variable selection strategies of different machine learning (ML) and statistical algorithms in the prognosis of neck pain (NP) recovery. A total of 3001 participants with NP were included. Three dichotomous outcomes of an improvement in NP, arm pain (AP), and disability at 3 months follow-up were used. Twenty-five variables (twenty-eight parameters) were included as predictors. There were more parameters than variables, as some categorical variables had >2 levels. Eight modelling techniques were compared: stepwise regression based on unadjusted p values (stepP), on adjusted p values (stepPAdj), on Akaike information criterion (stepAIC), best subset regression (BestSubset), least absolute shrinkage and selection operator (LASSO), minimax concave penalty (MCP), model-based boosting (mboost), and multivariate adaptive regression splines (MuARS). The algorithm that selected the fewest predictors was stepPAdj (number of predictors, p = 4 to 8). MuARS was the algorithm with the second fewest predictors selected (p = 9 to 14). The predictor selected by all algorithms with the largest coefficient magnitude was “having undergone a neuroreflexotherapy intervention” for NP (β = 1.987 to 2.296) and AP (β = 2.639 to 3.554), and “Imaging findings: spinal stenosis” (β = −1.331 to −1.763) for disability. Stepwise regression based on adjusted p-values resulted in the sparsest models, which enhanced clinical interpretability. MuARS appears to provide the best balance between model sparsity and high predictive performance across outcomes. Different algorithms produced similar performances but resulted in a different number of variables selected. Rather than relying on any single algorithm, confidence in the variable selection may be increased by using multiple algorithms.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[202]

S. Hoffmann, F. Scheipl and A.-L. Boulesteix.
Reproduzierbare und replizierbare Forschung.
Moderne Verfahren der Angewandten Statistik (Sep. 2023). DOI

Abstract

In recent years, reports on the lack of replicability and reproducibility of research findings have received much attention and have called into question the way scientific studies are planned, analysed, and reported. The statistical planning and analysis of scientific studies requires a multitude of decisions for which there are no clearly right or wrong choices. It is explained here how this multiplicity of possible analysis strategies, which can be described in terms of model, data preprocessing, and method uncertainty, can, in combination with selective reporting, lead to results that cannot be replicated on independent data. In addition, strategies for improving the replicability of results are presented, along with practices and tools for making completed analyses reproducible.

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[201]

R. P. Prager, K. Dietrich, L. Schneider, L. Schäpermeier, B. Bischl, P. Kerschke, H. Trautmann and O. Mersmann.
Neural Networks as Black-Box Benchmark Functions Optimized for Exploratory Landscape Features.
FOGA 2023 - 17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms. Potsdam, Germany, Aug 30-Sep 01, 2023. DOI

Abstract

Artificial benchmark functions are commonly used in optimization research because of their ability to rapidly evaluate potential solutions, making them a preferred substitute for real-world problems. However, these benchmark functions have faced criticism for their limited resemblance to real-world problems. In response, recent research has focused on automatically generating new benchmark functions for areas where established test suites are inadequate. These approaches have limitations, such as the difficulty of generating new benchmark functions that exhibit exploratory landscape analysis (ELA) features beyond those of existing benchmarks. The objective of this work is to develop a method for generating benchmark functions for single-objective continuous optimization with user-specified structural properties. Specifically, we aim to demonstrate a proof of concept for a method that uses an ELA feature vector to specify these properties in advance. To achieve this, we begin by generating a random sample of decision space variables and objective values. We then adjust the objective values using CMA-ES until the corresponding features of our new problem match the predefined ELA features within a specified threshold. By iteratively transforming the landscape in this way, we ensure that the resulting function exhibits the desired properties. To create the final function, we use the resulting point cloud as training data for a simple neural network that produces a function exhibiting the target ELA features. We demonstrate the effectiveness of this approach by replicating the existing functions of the well-known BBOB suite and creating new functions with ELA feature values that are not present in BBOB.

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[200]

A. Scheppach, H. A. Gündüz, E. Dorigatti, P. C. Münch, A. C. McHardy, B. Bischl, M. Rezaei and M. Binder.
Neural Architecture Search for Genomic Sequence Data.
CIBCB 2023 - 20th IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology. Eindhoven, The Netherlands, Aug 29-31, 2023. DOI

Abstract

Deep learning has enabled outstanding progress on bioinformatics datasets and a variety of tasks, such as protein structure prediction, identification of regulatory regions, genome annotation, and interpretation of the noncoding genome. The layout and configuration of neural networks used for these tasks have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Therefore, there is growing interest in automated neural architecture search (NAS) methods in bioinformatics. In this paper, we present a novel search space for NAS algorithms that operate on genome data, thus creating extensions for existing NAS algorithms for sequence data that we name Genome-DARTS, Genome-P-DARTS, Genome-BONAS, Genome-SH, and Genome-RS. Moreover, we introduce two novel NAS algorithms, CWP-DARTS and EDPDARTS, that build on and extend the idea of P-DARTS. We evaluate the presented methods and compare them to manually designed neural architectures on a widely used genome sequence machine learning task to show that NAS methods can be adapted well for bioinformatics sequence datasets. Our experiments show that architectures optimized by our NAS methods outperform manually developed architectures while having significantly fewer parameters.

MCML Authors

Hüseyin Anil Gündüz

* Former Member

Emilio Dorigatti

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

Martin Binder

Statistical Learning and Data Science

[199]

F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Auxiliary Cross-Modal Representation Learning With Triplet Loss Functions for Online Handwriting Recognition.
IEEE Access 11 (Aug. 2023). DOI

Abstract

Cross-modal representation learning learns a shared embedding between two or more modalities to improve performance in a given task compared to using only one of the modalities. Cross-modal representation learning from different data types, such as images and time-series data (e.g., audio or text data), requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the contrastive or triplet loss, which uses positive and negative identities to create sample pairs with different labels, for cross-modal representation learning between image and time-series modalities (CMR-IS). By adapting the triplet loss for cross-modal representation learning, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information of the auxiliary (image classification) task. We present a triplet loss with a dynamic margin for single label and sequence-to-sequence classification tasks. We perform extensive evaluations on synthetic image and time-series data, and on data for offline handwriting recognition (HWR) and on online HWR from sensor-enhanced pens for classifying written words. Our experiments show an improved classification accuracy, faster convergence, and better generalizability due to an improved cross-modal representation. Furthermore, the more suitable generalizability leads to a better adaptability between writers for online HWR.

MCML Authors

Felix Ott

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[198]

A. Volkmann, A. Stöcker, F. Scheipl and S. Greven.
Multivariate Functional Additive Mixed Models.
Statistical Modelling 23.4 (Aug. 2023). DOI

Abstract

Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

[197]

L. Fahrmeir, G. Kauermann, G. Tutz and M. Windmann.
Spatial smoothing revisited: An application to rental data in Munich.
Statistical Modelling 23.5-6 (Aug. 2023). DOI

Abstract

Spatial smoothing makes use of spatial information to obtain better estimates in regression models. In particular, flexible smoothing with B-splines and penalties, which has been propagated by Eilers and Marx (1996), provides strong tools that can be used to include available spatial information. We consider alternative smoothing methods in spatial additive regression and employ them for analysing rental data in Munich. The first method applies tensor product P-splines to the geolocation of apartments, measured on a continuous scale through the centroid of the quarter in which an apartment is located. The alternative approach exploits the neighbourhood structure of districts on a discrete scale, where districts consist of a set of neighbouring quarters. The discrete modelling approach yields smooth estimates when using ridge-type penalties but can also enforce spatial clustering of districts with a homogeneous structure when using Lasso-type penalties.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[196]

F. Pfisterer, S. Wei, S. Vollmer, M. Lang and B. Bischl.
Fairness Audits and Debiasing Using mlr3fairness.
The R Journal 15.1 (Aug. 2023). DOI

Abstract

Given an increase in data-driven automated decision-making based on machine learning (ML) models, it is imperative that, along with tools to develop and improve such models, there are sufficient capabilities to analyze and assess models with respect to potential biases. We present the package mlr3fairness, a collection of metrics and methods that allow for the assessment of bias in machine learning models. Our package implements a variety of widely used fairness metrics that can be used to audit models for potential biases, along with a set of visualizations that can help to provide additional insights into such biases. mlr3fairness furthermore integrates bias mitigation methods for machine learning models through data pre-processing or post-processing of predictions. These allow practitioners to trade off performance and fairness metrics that are appropriate for their use case.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Michel Lang

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[195]

J. Rodemann, J. Goschenhofer, E. Dorigatti, T. Nagler and T. Augustin.
Approximately Bayes-optimal pseudo-label selection.
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Abstract

Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS). This selection often depends on the initial model fit on labeled data. Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions, often referred to as confirmation bias. This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue. At its core lies a criterion for selecting instances to label: an analytical approximation of the posterior predictive of pseudo-samples. We derive this selection criterion by proving Bayes-optimality of the posterior predictive of pseudo-samples. We further overcome computational hurdles by approximating the criterion analytically. Its relation to the marginal likelihood allows us to come up with an approximation based on Laplace’s method and the Gaussian integral. We empirically assess BPLS on simulated and real-world data. When faced with high-dimensional data prone to overfitting, BPLS outperforms traditional PLS methods.

MCML Authors

Jann Goschenhofer

Dr.

* Former Member

Emilio Dorigatti

Dr.

* Former Member

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[194]

L. Wimmer, Y. Sale, P. Hofman, B. Bischl and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Abstract

The quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, has recently become quite common in machine learning. While the properties of these measures, which are rooted in information theory, seem appealing at first glance, we identify various incoherencies that call their appropriateness into question. In addition to the measures themselves, we critically discuss the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents. Experiments across different computer vision tasks support our theoretical findings and raise concerns about current practice in uncertainty quantification.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[193]

A. Stüber, S. Coors and M. Ingrisch.
Revitalize the Potential of Radiomics: Interpretation and Feature Stability in Medical Imaging Analyses through Groupwise Feature Importance.
LB-D-DC @xAI 2023 - Late-breaking Work, Demos and Doctoral Consortium at the 1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. PDF

Abstract

Radiomics, involving analysis of calculated, quantitative features from medical images with machine learning tools, shares the instability challenge with other high-dimensional data analyses due to variations in the training set. This instability affects model interpretation and feature importance assessment. To enhance stability and interpretability, we introduce grouped feature importance, shedding light on tool limitations and advocating for more reliable radiomics-based analysis methods.

MCML Authors

Stefan Coors

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

[192]

C. Molnar, T. Freiesleben, G. König, J. Herbinger, T. Reisinger, G. Casalicchio, M. N. Wright and B. Bischl.
Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. DOI

Abstract

Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity. However, the parameters of machine learning models usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (PD) plots and permutation feature importance (PFI) are often used as interpretation methods. However, PD and PFI lack a theory that relates them to the data generating process. We formalize PD and PFI as statistical estimators of ground truth estimands rooted in the data generating process. We show that PD and PFI estimates deviate from this ground truth due to statistical biases, model variance and Monte Carlo approximation errors. To account for model variance in PD and PFI estimation, we propose the learner-PD and the learner-PFI based on model refits, and propose corrected variance and confidence interval estimators.

MCML Authors

Gunnar König

Dr.

* Former Member

Julia Herbinger

Dr.

* Former Member

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[191]

T. Nagler.
Statistical Foundations of Prior-Data Fitted Networks.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning. Instead of training the network on an observed training set, a fixed model is pre-trained offline on small, simulated training sets from a variety of tasks. The pre-trained model is then used to infer class probabilities in-context on fresh training sets with arbitrary size and distribution. Empirically, PFNs achieve state-of-the-art performance on tasks with similar size to the ones used in pre-training. Surprisingly, their accuracy further improves when passed larger data sets during inference. This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior. While PFNs are motivated by Bayesian ideas, a purely frequentist interpretation of PFNs as pre-tuned, but untrained predictors explains their behavior. A predictor’s variance vanishes if its sensitivity to individual training samples does and the bias vanishes only if it is appropriately localized around the test feature. The transformer architecture used in current PFN implementations ensures only the former. These findings shall prove useful for designing architectures with favorable empirical behavior.

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[190]

D. Rügamer.
A New PHO-rmula for Improved Performance of Semi-Structured Networks.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

Recent advances in combining structured regression models and deep neural networks for better interpretability, more expressiveness, and statistically valid uncertainty quantification demonstrate the versatility of semi-structured neural networks (SSNs). We show that techniques to properly identify the contributions of the different model components in SSNs, however, lead to suboptimal network estimation, slower convergence, and degenerated or erroneous predictions. In order to solve these problems while preserving favorable model properties, we propose a non-invasive post-hoc orthogonalization (PHO) that guarantees identifiability of model components and provides better estimation and prediction quality. Our theoretical findings are supported by numerical experiments, a benchmark comparison as well as a real-world application to COVID-19 infections.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[189]

J. Goschenhofer, B. Bischl and Z. Kira.
ConstraintMatch for Semi-constrained Clustering.
IJCNN 2023 - International Joint Conference on Neural Networks. Gold Coast Convention and Exhibition Centre, Queensland, Australia, Jul 18-23, 2023. DOI

Abstract

Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance. While they perform well even in the absence of the true underlying class labels, constrained clustering models still require large amounts of binary constraint annotations for training. In this paper, we propose a semi-supervised context whereby a large amount of unconstrained data is available alongside a smaller set of constraints, and propose ConstraintMatch to leverage such unconstrained data. While a great deal of progress has been made in semi-supervised learning using full labels, there are a number of challenges that prevent a naive application of the resulting methods in the constraint-based label setting. Therefore, we reason about and analyze these challenges, specifically 1) proposing a pseudo-constraining mechanism to overcome the confirmation bias, a major weakness of pseudo-labeling, 2) developing new methods for pseudo-labeling towards the selection of informative unconstrained samples, 3) showing that this also allows the use of pairwise loss functions for the initial and auxiliary losses which facilitates semi-constrained model training. In extensive experiments, we demonstrate the effectiveness of ConstraintMatch over relevant baselines in both the regular clustering and overclustering scenarios on five challenging benchmarks and provide analyses of its several components.

MCML Authors

Jann Goschenhofer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[188]

C. Kolb, B. Bischl, C. L. Müller and D. Rügamer.
Sparse Modality Regression.
IWSM 2023 - 37th International Workshop on Statistical Modelling. Dortmund, Germany, Jul 17-21, 2023. Best Paper Award. PDF

Abstract

Deep neural networks (DNNs) enable learning from various data modalities, such as images or text. This concept has also found its way into statistical modelling through the use of semi-structured regression, a model additively combining structured predictors with unstructured effects from arbitrary data modalities learned through a DNN. This paper introduces a new framework called sparse modality regression (SMR). SMR is a regression model that combines different data modalities and uses a group lasso-type regularization approach to perform modality selection by zeroing out potentially uninformative modalities.

MCML Authors

Chris Kolb

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[187]

L. Schneider, B. Bischl and J. Thomas.
Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

Abstract

We present a model-agnostic framework for jointly optimizing the predictive performance and interpretability of supervised machine learning models for tabular data. Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects. By treating hyperparameter optimization of a machine learning algorithm as a multi-objective optimization problem, our framework allows for generating diverse models that trade off high performance and ease of interpretability in a single optimization run. Efficient optimization is achieved via augmentation of the search space of the learning algorithm by incorporating feature selection, interaction and monotonicity constraints into the hyperparameter search space. We demonstrate that the optimization problem effectively translates to finding the Pareto optimal set of groups of selected features that are allowed to interact in a model, along with finding their optimal monotonicity constraints and optimal hyperparameters of the learning algorithm itself. We then introduce a novel evolutionary algorithm that can operate efficiently on this augmented search space. In benchmark experiments, we show that our framework is capable of finding diverse models that are highly competitive or outperform state-of-the-art XGBoost or Explainable Boosting Machine models, both with respect to performance and interpretability.

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Janek Thomas

Dr.

* Former Member

[186]

D. Saggau, M. Rezaei, B. Bischl and I. Chalkidis.
Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning.
ACL 2023 - Findings of the 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (thousands of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longformer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further on, we complement the baseline method (a siamese neural network) with additional convex neural networks based on functional Bregman divergence, aiming to enhance the quality of the output document representations. We show that overall the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long document topic classification tasks from the legal and biomedical domains.

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[185]

B. X. W. Liew, D. Rügamer, Q. Mei, Z. Altai, X. Zhu, X. Zhai and N. Cortes.
Smooth and accurate predictions of joint contact force timeseries in gait using overparameterised deep neural networks.
Frontiers in Bioengineering and Biotechnology 11 (Jul. 2023). DOI

Abstract

Alterations in joint contact forces (JCFs) are thought to be important mechanisms for the onset and progression of many musculoskeletal and orthopaedic pain disorders. Computational approaches to JCF assessment represent the only non-invasive means of estimating in-vivo forces, but they cannot be undertaken in free-living environments. Here, we used deep neural networks to train models to predict JCFs, using only joint angles as predictors. Our neural network models were generally able to predict JCFs with errors within published minimal detectable change values. The errors ranged from the lowest value of 0.03 bodyweight (BW) (ankle medial-lateral JCF in walking) to a maximum of 0.65 BW (knee VT JCF in running). Interestingly, we also found that overparametrised neural networks trained for more epochs (>100) produced better and smoother waveform predictions. Our methods for predicting JCFs using only joint kinematics hold a lot of promise in allowing clinicians and coaches to continuously monitor tissue loading in free-living environments.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[184]

C. Fritz, G. De Nicola, S. Kevork, D. Harhoff and G. Kauermann.
Modelling the large and dynamically growing bipartite network of German patents and inventors.
Journal of the Royal Statistical Society. Series A (Statistics in Society) 186.3 (Jul. 2023). DOI

Abstract

To explore the driving forces behind innovation, we analyse the dynamic bipartite network of all inventors and patents registered within the field of electrical engineering in Germany in the past two decades. To deal with the sheer size of the data, we decompose the network by exploiting the fact that most inventors tend to only stay active for a relatively short period. We thus propose a Temporal Exponential Random Graph Model with time-varying actor set and sufficient statistics mirroring substantial expectations for our analysis. Our results corroborate that inventor characteristics and team formation are essential to the dynamics of invention.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[183]

I. van Mechelen, A.-L. Boulesteix, R. Dangl, N. Dean, C. Hennig, F. Leisch, D. Steinley and M. J. Warrens.
A white paper on good research practices in benchmarking: The case of cluster analysis.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13.6 (Jul. 2023). DOI

Abstract

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance, requiring that proposals of new methods are extensively and carefully compared with their best predecessors, and existing methods subjected to neutral comparison studies. Answers to benchmarking questions should be evidence-based, with the relevant evidence being collected through well-thought-out procedures, in reproducible and replicable ways. In the present paper, we review good research practices in benchmarking from the perspective of the area of cluster analysis. Discussion is given to the theoretical, conceptual underpinnings of benchmarking based on simulated and empirical data in this context. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made based on existing literature.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[182]

M. Aßenmacher, N. Sauter and C. Heumann.
Classifying multilingual party manifestos: Domain transfer across country, time, and genre.
Preprint (Jul. 2023). arXiv

Abstract

Annotation costs for large corpora are still one of the main bottlenecks in empirical social science research. On the one hand, making use of the capabilities of domain transfer allows re-using annotated data sets and trained models. On the other hand, it is not clear how well domain transfer works and how reliable the results are for transfer across different dimensions. We explore the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos. First, we show the strong within-domain classification performance of fine-tuned transformer models. Second, we vary the genre of the test set across the aforementioned dimensions to test for the fine-tuned models’ robustness and transferability. For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians while for the other three dimensions, custom splits of the Manifesto database are used. While BERT achieves the best scores in the initial experiments across modalities, DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country. The results of the additional analysis show that (Distil)BERT can be applied to future data with similar performance. Moreover, we observe (partly) notable differences between the political manifestos of different countries of origin, even if these countries share a language or a cultural background.

MCML Authors

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[181]

C. Kolb, C. L. Müller, B. Bischl and D. Rügamer.
Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization.
Preprint (Jul. 2023). arXiv

Abstract

We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning. The proposed optimization transfer comprises an overparameterization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. We prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions. Additionally, our theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. We comprehensively review sparsity-inducing parametrizations across different fields that are covered by our general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of our approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.
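As a hedged illustration of the idea, consider the simplest Hadamard product parametrization $\beta = u \odot v$ with a smooth $\ell_2$ penalty on $u$ and $v$. The symbols $\mathcal{L}$, $\lambda$, $u$, and $v$ are notation assumed here for the sketch and are not taken from the paper, whose framework is more general. By the AM-GM inequality, the smooth penalty induces an $\ell_1$ penalty on $\beta$:

$$
\min_{u, v \,:\, u \odot v = \beta} \frac{\lambda}{2}\left(\lVert u \rVert_2^2 + \lVert v \rVert_2^2\right)
= \lambda \sum_{j} \min_{u_j v_j = \beta_j} \frac{u_j^2 + v_j^2}{2}
= \lambda \sum_{j} \lvert \beta_j \rvert
= \lambda \lVert \beta \rVert_1 ,
$$

since $\tfrac{1}{2}(u_j^2 + v_j^2) \ge \lvert u_j v_j \rvert$ with equality if and only if $\lvert u_j \rvert = \lvert v_j \rvert$. Consequently, the differentiable objective $\mathcal{L}(u \odot v) + \tfrac{\lambda}{2}\left(\lVert u \rVert_2^2 + \lVert v \rVert_2^2\right)$ attains the same minimal value as the non-smooth lasso objective $\mathcal{L}(\beta) + \lambda \lVert \beta \rVert_1$.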

MCML Authors

Chris Kolb

Statistical Learning and Data Science

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[180]

S. Kaminwar, J. Goschenhofer, J. Thomas, I. Thon and B. Bischl.
Structured Verification of Machine Learning Models in Industrial Settings.
Big Data 11.3 (Jun. 2023). DOI

Abstract

The use of machine learning (ML) allows us to automate and scale decision-making processes. The key to this automation is the development of ML models that generalize from training data to unseen data. Such models can become extremely versatile and powerful, which makes democratization of artificial intelligence (AI) possible, that is, providing ML to non-ML experts such as software engineers or domain experts. Typically, automated ML (AutoML) is referred to as a key step toward this goal. However, we believe that democratization of the verification process of ML systems is a larger and even more crucial challenge to achieve the democratization of AI. Currently, the process of ensuring that an ML model works as intended is unstructured. It is largely based on experience and domain knowledge that cannot be automated. Current approaches such as cross-validation or explainable AI are not enough to overcome the real challenges and are discussed extensively in this article. Arguing toward structured verification approaches, we discuss a set of guidelines to verify models, code, and data in each step of the ML lifecycle. These guidelines can help to reliably measure and select an optimal solution, besides minimizing the risk of bugs and undesired behavior in edge cases.

MCML Authors

Jann Goschenhofer

Dr.

* Former Member

Janek Thomas

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[179]

M. Rezaei, A. Vahidi, B. Bischl, T. Elze and M. Eslami.
Self-supervised Learning and Self-labeling Framework for Glaucoma Detection.
Investigative Ophthalmology and Visual Science 64.8 (Jun. 2023). URL

Abstract

Purpose: Self-supervised learning methods have made a significant impact in recent years on different domains, such as natural language processing and computer vision. Here, we develop a new self-supervised framework for simultaneous retina image clustering and self-supervised representation learning to enhance the diagnosis of glaucoma.
Methods: The framework is optimized using both a contrastive self-supervised network and a clustering network, where clustering helps to improve the embedding representation. Our method comprises two parallel deep networks: 1) a representation network, which is a self-supervised contrastive representation network that takes two augmented views of the retina image, and 2) an image clustering or self-labeling network that takes original retina images. The representation network first projects the augmented views onto an embedding space. Then it processes these representations in a multi-layer perceptron head, which generates the baseline for the pair-wise contrastive objective. The clustering network, in turn, performs KL divergence on the top embedding layer of the representation network.
Results: We train our framework for simultaneous representation learning and self-labeling using a clustering network. We follow standard self-supervised learning protocols for empirical analysis and evaluate the learned representation of our model by classification (Table 1) as well as image clustering tasks (Table 2) on two different Glaucoma datasets. According to the results shown in Table 1, our method improves Glaucoma classification by up to 14% in terms of F1 score compared to the SOTA self-supervised algorithm, and by 2% for the task of clustering. Glaucoma-1 is composed of the labeled subset of the human retinal images used in [1]. This dataset contains 2,397 images in total, with 956 glaucoma diagnoses. The training set for Glaucoma-2 [2] was released by the REFUGE-2 challenge.
Conclusions: We showed that combining self-supervised representation learning with self-labeling improves the learned representation by up to 14% compared to existing self-supervised learning models on retina-based glaucoma detection. Moreover, our method outperformed other self-supervised methods for image clustering tasks.

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[178]

J. Moosbauer.
Towards explainable automated machine learning.
Dissertation 2023. DOI

Abstract

This thesis explores the intersection of Automated Machine Learning (AutoML) and explainable AI, addressing the need for transparency at multiple levels: the model, the learning algorithm, and the AutoML system itself. The work develops methods for enhancing model explainability through multi-objective hyperparameter optimization (HPO) and introduces new techniques to understand the effects of hyperparameters and optimizers within AutoML systems. These contributions advance the field by providing more interpretable and reliable tools for AutoML, ultimately increasing the accessibility and trustworthiness of machine learning models and their deployment. (Shortened.)

MCML Authors

Julia Moosbauer

Dr.

* Former Member

[177]

T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis.
PAKDD 2023 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka, Japan, May 25-28, 2023. DOI

Abstract

While recent advances in large-scale foundational models show promising results, their application to the medical domain has not yet been explored in detail. In this paper, we progress into the realms of large-scale modeling in medical synthesis by proposing Cheff - a foundational cascaded latent diffusion model, which generates highly realistic chest radiographs providing state-of-the-art quality on a 1-megapixel scale. We further propose MaCheX, which is a unified interface for public chest datasets and forms the largest open collection of chest X-rays to date. With Cheff conditioned on radiological reports, we further guide the synthesis process over text prompts and unveil the research area of report-to-chest-X-ray generation.

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[176]

K. Rath, D. Rügamer, B. Bischl, U. von Toussaint and C. G. Albert.
Dependent state space Student-t processes for imputation and data augmentation in plasma diagnostics.
Contributions to Plasma Physics 63.5-6 (May 2023). DOI

Abstract

Multivariate time series measurements in plasma diagnostics present several challenges when training machine learning models: the availability of only a few labeled data increases the risk of overfitting, and missing data points or outliers due to sensor failures pose additional difficulties. To overcome these issues, we introduce a fast and robust regression model that enables imputation of missing points and data augmentation by massive sampling while exploiting the inherent correlation between input signals. The underlying Student-t process allows for a noise distribution with heavy tails and thus produces robust results in the case of outliers. We consider the state space form of the Student-t process, which reduces the computational complexity and makes the model suitable for high-resolution time series. We evaluate the performance of the proposed method using two test cases, one of which was inspired by measurements of flux loop signals.

MCML Authors

Katharina Röck (née Rath)

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[175]

T. Pielok, B. Bischl and D. Rügamer.
Approximate Bayesian Inference with Stein Functional Variational Gradient Descent.
ICLR 2023 - 11th International Conference on Learning Representations. Kigali, Rwanda, May 01-05, 2023. URL

Abstract

We propose a general-purpose variational algorithm that forms a natural analogue of Stein variational gradient descent (SVGD) in function space. While SVGD successively updates a set of particles to match a target density, the method introduced here of Stein functional variational gradient descent (SFVGD) updates a set of particle functions to match a target stochastic process (SP). The update step is found by minimizing the functional derivative of the Kullback-Leibler divergence between SPs. SFVGD can either be used to train Bayesian neural networks (BNNs) or for ensemble gradient boosting. We show the efficacy of training BNNs with SFVGD on various real-world datasets.
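For context, the standard finite-dimensional SVGD update that SFVGD generalizes can be written as follows. This is the well-known particle update, not the functional version proposed in the paper; the kernel $k$, step size $\epsilon$, particles $x_1, \dots, x_n$, and target density $p$ are notation assumed for this sketch.

$$
x_i \leftarrow x_i + \epsilon\, \hat{\phi}(x_i), \qquad
\hat{\phi}(x) = \frac{1}{n} \sum_{j=1}^{n} \left[ k(x_j, x)\, \nabla_{x_j} \log p(x_j) + \nabla_{x_j} k(x_j, x) \right].
$$

The first term drives particles toward regions of high target density, while the second acts as a repulsive force that keeps the particles from collapsing onto a single mode.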

MCML Authors

Tobias Pielok

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[174]

E. Dorigatti, B. Schubert, B. Bischl and D. Rügamer.
Frequentist Uncertainty Quantification in Semi-Structured Neural Networks.
AISTATS 2023 - 26th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, Apr 25-27, 2023. URL

Abstract

Semi-structured regression (SSR) models jointly learn the effect of structured (tabular) and unstructured (non-tabular) data through additive predictors and deep neural networks (DNNs), respectively. Inference in SSR models aims at deriving confidence intervals for the structured predictor, although current approaches ignore the variance of the DNN estimation of the unstructured effects. This results in an underestimation of the variance of the structured coefficients and, thus, an increase of Type-I error rates. To address this shortcoming, we present here a theoretical framework for structured inference in SSR models that incorporates the variance of the DNN estimate into confidence intervals for the structured predictor. By treating this estimate as a random offset with known variance, our formulation is agnostic to the specific deep uncertainty quantification method employed. Through numerical experiments and a practical application on a medical dataset, we show that our approach results in increased coverage of the true structured coefficients and thus a reduction in Type-I error rate compared to ignoring the variance of the neural network, naive ensembling of SSR models, and a variational inference baseline.

MCML Authors

Emilio Dorigatti

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[173]

G. Keropyan, D. Strieder and M. Drton.
Rank-Based Causal Discovery for Post-Nonlinear Models.
AISTATS 2023 - 26th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, Apr 25-27, 2023. URL

Abstract

Learning causal relationships from empirical observations is a central task in scientific research. A common method is to employ structural causal models that postulate noisy functional relations among a set of interacting variables. To ensure unique identifiability of causal directions, researchers consider restricted subclasses of structural causal models. Post-nonlinear (PNL) causal models constitute one of the most flexible options for such restricted subclasses, containing in particular the popular additive noise models as a further subclass. However, learning PNL models is not well studied beyond the bivariate case. The existing methods learn non-linear functional relations by minimizing residual dependencies and subsequently test independence from residuals to determine causal orientations. However, these methods can be prone to overfitting and, thus, difficult to tune appropriately in practice. As an alternative, we propose a new approach for PNL causal discovery that uses rank-based methods to estimate the functional parameters. This new approach exploits natural invariances of PNL models and disentangles the estimation of the non-linear functions from the independence tests used to find causal orientations. We prove consistency of our method and validate our results in numerical experiments.

MCML Authors

David Strieder

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[172]

C. Luther, G. König and M. Grosse-Wentrup.
Efficient SAGE Estimation via Causal Structure Learning.
AISTATS 2023 - 26th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, Apr 25-27, 2023. URL

Abstract

The Shapley Additive Global Importance (SAGE) value is a theoretically appealing interpretability method that fairly attributes global importance to a model’s features. However, its exact calculation requires the computation of the feature’s surplus performance contributions over an exponential number of feature sets. This is computationally expensive, particularly because estimating the surplus contributions requires sampling from conditional distributions. Thus, SAGE approximation algorithms only take a fraction of the feature sets into account. We propose $d$-SAGE, a method that accelerates SAGE approximation. $d$-SAGE is motivated by the observation that conditional independencies (CIs) between a feature and the model target imply zero surplus contributions, such that their computation can be skipped. To identify CIs, we leverage causal structure learning (CSL) to infer a graph that encodes (conditional) independencies in the data as $d$-separations. This is computationally more efficient because the expense of the one-time graph inference and the $d$-separation queries is negligible compared to the expense of surplus contribution evaluations. Empirically we demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
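To make the cost argument concrete, the following minimal sketch computes exact Shapley attributions by enumerating every feature subset. It is not $d$-SAGE and not the authors' code: the value function `toy_value` and the per-feature gains are hypothetical stand-ins for a model's performance on a feature set, chosen so that one feature has zero surplus contribution for every subset.

```python
from itertools import combinations
from math import factorial, isclose


def shapley_values(features, value):
    """Exact Shapley attribution by enumerating all subsets (exponential in d)."""
    d = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(d):
            # Shapley weight of a coalition of this size
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # surplus contribution of feature j on top of subset s
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


# Hypothetical performance gains; "x3" contributes nothing to any subset.
gain = {"x1": 0.5, "x2": 0.25, "x3": 0.0}


def toy_value(subset):
    """Hypothetical value function: performance achieved with the given features."""
    return sum(gain[f] for f in subset)


phi = shapley_values(list(gain), toy_value)
print(phi)
```

With three features the inner loops evaluate four subsets per feature; with $d$ features this grows as $2^{d-1}$. Every surplus term for `x3` is zero here, which is the kind of evaluation the abstract describes skipping once a conditional independence has been identified.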

MCML Authors

Gunnar König

Dr.

* Former Member

Moritz Grosse-Wentrup

Prof. Dr.

* Former Principal Investigator

[171]

M. Feurer, K. Eggensperger, E. Bergman, F. Pfisterer, B. Bischl and F. Hutter.
Mind the Gap: Measuring Generalization Performance Across Multiple Objectives.
IDA 2023 - 21st International Symposium on Intelligent Data Analysis. Louvain-la-Neuve, Belgium, Apr 12-14, 2023. DOI

Abstract

Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from the validation to the test set. However, some of the models might no longer be Pareto-optimal which makes it unclear how to quantify the performance of the MHPO method when evaluated on the test set. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods and studying its capabilities for comparing two optimization experiments.
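The problem described above can be illustrated with a small sketch. It only demonstrates how a model that is Pareto-optimal on validation data can become dominated on test data; it does not implement the evaluation protocol proposed in the paper. All objective values are hypothetical (error, inference time) pairs, both to be minimized.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_indices(points):
    """Indices of the non-dominated points (all objectives are minimized)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (error, inference time) values for four candidate models.
validation = [(0.10, 5.0), (0.12, 3.0), (0.20, 1.0), (0.15, 4.0)]
test = [(0.14, 5.0), (0.13, 3.0), (0.21, 1.0), (0.16, 4.0)]

# Models returned by the optimizer: the Pareto front on validation data.
val_front = pareto_indices(validation)

# Re-evaluate only the returned models on the test set.
selected_test = [test[i] for i in val_front]
kept = [val_front[k] for k in pareto_indices(selected_test)]

print("returned models:", val_front)
print("still non-dominated on test:", kept)
```

Model 0 is returned because it has the lowest validation error, but on the test set it is dominated by model 1, so the returned set no longer forms a Pareto front.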

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Florian Pfisterer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[170]

D. Schalk, B. Bischl and D. Rügamer.
Accelerated Componentwise Gradient Boosting Using Efficient Data Representation and Momentum-Based Optimization.
Journal of Computational and Graphical Statistics 32.2 (Apr. 2023). DOI

Abstract

Componentwise boosting (CWB), also known as model-based boosting, is a variant of gradient boosting that builds on additive models as base learners to ensure interpretability. CWB is thus often used in research areas where models are employed as tools to explain relationships in data. One downside of CWB is its computational complexity in terms of memory and runtime. In this article, we propose two techniques to overcome these issues without losing the properties of CWB: feature discretization of numerical features and incorporating Nesterov momentum into functional gradient descent. As the latter can be prone to early overfitting, we also propose a hybrid approach that prevents a possibly diverging gradient descent routine while ensuring faster convergence. Our adaptations improve vanilla CWB by reducing memory consumption and speeding up the computation time per iteration (through feature discretization) while also enabling CWB to learn faster and hence to require fewer iterations in total using momentum. We perform extensive benchmarks on multiple simulated and real-world datasets to demonstrate the improvements in runtime and memory consumption while maintaining state-of-the-art estimation and prediction performance.
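The effect of momentum on the number of iterations can be sketched on a toy problem. The example below applies plain Nesterov momentum in parameter space to a hypothetical ill-conditioned quadratic; it is not functional gradient boosting and not the hybrid approach from the article. The curvatures, step size, and momentum value are assumptions chosen for illustration.

```python
from math import sqrt


def grad(x, curvatures):
    """Gradient of f(x) = 0.5 * sum(c_i * x_i^2)."""
    return [c * xi for c, xi in zip(curvatures, x)]


def gradient_descent(x0, curvatures, step, n_iter):
    x = list(x0)
    for _ in range(n_iter):
        g = grad(x, curvatures)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def nesterov(x0, curvatures, step, momentum, n_iter):
    x_prev, x = list(x0), list(x0)
    for _ in range(n_iter):
        # evaluate the gradient at the look-ahead point, not at x itself
        lookahead = [xi + momentum * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(lookahead, curvatures)
        x_prev, x = x, [li - step * gi for li, gi in zip(lookahead, g)]
    return x


def norm(x):
    return sqrt(sum(xi * xi for xi in x))


curvatures = [1.0, 100.0]  # condition number kappa = 100
kappa = max(curvatures) / min(curvatures)
step = 1.0 / max(curvatures)
momentum = (sqrt(kappa) - 1.0) / (sqrt(kappa) + 1.0)

x_gd = gradient_descent([1.0, 1.0], curvatures, step, 200)
x_nag = nesterov([1.0, 1.0], curvatures, step, momentum, 200)

print("distance to optimum, gradient descent:", norm(x_gd))
print("distance to optimum, Nesterov:", norm(x_nag))
```

After the same 200 iterations, the momentum variant is far closer to the minimizer at the origin, which mirrors the claim that momentum lets the learner reach a given fit in fewer iterations.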

MCML Authors

Daniel Schalk

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[169]

M. Herrmann, F. Pfisterer and F. Scheipl.
A geometric framework for outlier detection in high-dimensional data.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery e1491 (Apr. 2023). DOI

Abstract

Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework which exploits the metric structure of a data set. Our approach rests on the manifold assumption, that is, that the observed, nominally high-dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high dimensional data. We also suggest a novel, mathematically precise and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high-dimensional data, but the framework we propose is completely general and we include image and graph data applications. Our results show that the outlier structure of high-dimensional and non-tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Florian Pfisterer

Dr.

* Former Member

Fabian Scheipl

PD Dr.

Functional Data Analysis

[168]

S. Dandl, A. Hofheinz, M. Binder, B. Bischl and G. Casalicchio.
counterfactuals: An R Package for Counterfactual Explanation Methods.
Preprint (Apr. 2023). arXiv

Abstract

Counterfactual explanation methods provide information on how feature values of individual observations must be changed to obtain a desired prediction. Despite the increasing number of proposed methods in research, only a few implementations exist, and their interfaces and requirements vary widely. In this work, we introduce the counterfactuals R package, which provides a modular and unified R6-based interface for counterfactual explanation methods. We implemented three existing counterfactual explanation methods and propose some optional methodological extensions to generalize these methods to different scenarios and to make them more comparable. We explain the structure and workflow of the package using real use cases and show how to integrate additional counterfactual explanation methods into the package. In addition, we compared the implemented methods for a variety of models and datasets with regard to the quality of their counterfactual explanations and their runtime behavior.

MCML Authors

Susanne Dandl

Dr.

* Former Member

Martin Binder

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[167]

D. Schalk.
Modern approaches for component-wise boosting: Automation, efficiency, and distributed computing with application to the medical domain.
Dissertation 2023. DOI

Abstract

This thesis focuses on enhancing component-wise boosting (CWB) by improving its efficiency and usability, particularly in high-dimensional feature spaces and distributed data settings. Key contributions include the optimization of the CWB algorithm through Nesterov’s momentum for faster fitting and reduced memory usage, as well as the development of the Autocompboost framework to integrate CWB with AutoML, emphasizing model interpretability. Additionally, the thesis introduces methods for evaluating binary classification models on distributed data using ROC analysis, and presents several R packages (compboost, dsCWB, Autocompboost, dsBinVal) that implement these advances. (Shortened.)

MCML Authors

Daniel Schalk

Dr.

* Former Member

[166]

J. Moosbauer, G. Casalicchio, M. Lindauer and B. Bischl.
Improving Accuracy of Interpretability Measures in Hyperparameter Optimization via Bayesian Algorithm Execution.
COSEAL 2023 - Workshop on Configuration and Selection of Algorithms. Paris, France, Mar 06-08, 2023. arXiv

Abstract

Despite all the benefits of automated hyperparameter optimization (HPO), most modern HPO algorithms are black-boxes themselves. This makes it difficult to understand the decision process which leads to the selected configuration, reduces trust in HPO, and thus hinders its broad adoption. Here, we study the combination of HPO with interpretable machine learning (IML) methods such as partial dependence plots. These techniques are increasingly used to explain the marginal effect of hyperparameters on the black-box cost function or to quantify the importance of hyperparameters. However, if such methods are naively applied to the experimental data of the HPO process in a post-hoc manner, the underlying sampling bias of the optimizer can distort interpretations. We propose a modified HPO method which efficiently balances the search for the global optimum w.r.t. predictive performance and the reliable estimation of IML explanations of an underlying black-box function by coupling Bayesian optimization and Bayesian Algorithm Execution. On benchmark cases of both synthetic objectives and HPO of a neural network, we demonstrate that our method returns more reliable explanations of the underlying black-box without a loss of optimization performance.

MCML Authors

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[165]

T. Ullmann, A. Beer, M. Hünemörder, T. Seidl and A.-L. Boulesteix.
Over-optimistic evaluation and reporting of novel cluster algorithms: An illustrative study.
Advances in Data Analysis and Classification 17 (Mar. 2023). DOI

Abstract

When researchers publish new cluster algorithms, they usually demonstrate the strengths of their novel approaches by comparing the algorithms’ performance with existing competitors. However, such studies are likely to be optimistically biased towards the new algorithms, as the authors have a vested interest in presenting their method as favorably as possible in order to increase their chances of getting published. Therefore, the superior performance of newly introduced cluster algorithms is over-optimistic and might not be confirmed in independent benchmark studies performed by neutral and unbiased authors. This problem is known among many researchers, but so far, the different mechanisms leading to over-optimism in cluster algorithm evaluation have never been systematically studied and discussed. Researchers are thus often not aware of the full extent of the problem. We present an illustrative study to illuminate the mechanisms by which authors—consciously or unconsciously—paint their cluster algorithm’s performance in an over-optimistic light. Using the recently published cluster algorithm Rock as an example, we demonstrate how optimization of the used datasets or data characteristics, of the algorithm’s parameters and of the choice of the competing cluster algorithms leads to Rock’s performance appearing better than it actually is. Our study is thus a cautionary tale that illustrates how easy it can be for researchers to claim apparent ‘superiority’ of a new cluster algorithm. This illuminates the vital importance of strategies for avoiding the problems of over-optimism (such as, e.g., neutral benchmark studies), which we also discuss in the article.

MCML Authors

Theresa Ullmann

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[164]

T. Nagler and T. Vatter.
Solving Estimating Equations With Copulas.
Journal of the American Statistical Association 119.546 (Mar. 2023). DOI

MCML Authors

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[163]

B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A.-L. Boulesteix, D. Deng and M. Lindauer.
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13.2 (Mar. 2023). DOI

Abstract

Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.

MCML Authors

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Martin Binder

Statistical Learning and Data Science

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Tobias Pielok

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Stefan Coors

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Theresa Ullmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

* Former Member

Marc Becker

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biometry in Molecular Medicine

[162]

G. König, T. Freiesleben and M. Grosse-Wentrup.
Improvement-focused causal recourse (ICR).
AAAI 2023 - 37th Conference on Artificial Intelligence. Washington, DC, USA, Feb 07-14, 2023. DOI

Abstract

Algorithmic recourse recommendations, such as Karimi et al.’s (2021) causal recourse (CR), inform stakeholders of how to act to revert unfavorable decisions. However, there are actions that lead to acceptance (i.e., revert the model’s decision) but do not lead to improvement (i.e., may not revert the underlying real-world state). To recommend such actions is to recommend fooling the predictor. We introduce a novel method, Improvement-Focused Causal Recourse (ICR), which involves a conceptual shift: Firstly, we require ICR recommendations to guide toward improvement. Secondly, we do not tailor the recommendations to be accepted by a specific predictor. Instead, we leverage causal knowledge to design decision systems that predict accurately pre- and post-recourse. As a result, improvement guarantees translate into acceptance guarantees. We demonstrate that given correct causal knowledge ICR, in contrast to existing approaches, guides toward both acceptance and improvement.

MCML Authors

Gunnar König

Dr.

* Former Member

Moritz Grosse-Wentrup

Prof. Dr.

* Former Principal Investigator

[161]

D. Rügamer, C. Kolb and N. Klein.
Semi-Structured Distributional Regression.
American Statistician (Feb. 2023). DOI

Abstract

Combining additive models and neural networks broadens the scope of statistical regression and, at the same time, extends deep learning-based approaches by interpretable structured additive predictors. Existing approaches uniting the two modeling approaches are, however, limited to very specific combinations and, more importantly, involve an identifiability issue. As a consequence, interpretability and stable estimation are typically lost. We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture. To overcome the inherent identifiability issues between different model parts, we construct an orthogonalization cell that projects the deep neural network into the orthogonal complement of the statistical model predictor. This enables proper estimation of structured model parts and thereby interpretability. We demonstrate the framework’s efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Chris Kolb

Statistical Learning and Data Science

[160]

D. Rügamer, P. Baumann, T. Kneib and T. Hothorn.
Probabilistic Time Series Forecasts with Autoregressive Transformation Models.
Statistics and Computing 33.2 (Feb. 2023). DOI

Abstract

Probabilistic forecasting of time series is an important matter in many applications and research fields. In order to draw conclusions from a probabilistic forecast, we must ensure that the model class used to approximate the true forecasting distribution is expressive enough. Yet, characteristics of the model itself, such as its uncertainty or its feature-outcome relationship are not of lesser importance. This paper proposes Autoregressive Transformation Models (ATMs), a model class inspired by various research directions to unite expressive distributional forecasts using a semi-parametric distribution assumption with an interpretable model specification. We demonstrate the properties of ATMs both theoretically and through empirical evaluation on several simulated and real-world forecasting datasets.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[159]

D. Schalk, V. S. Hoffmann, B. Bischl and U. Mansmann.
dsBinVal: Conducting distributed ROC analysis using DataSHIELD.
The Journal of Open Source Software 8.82 (Feb. 2023). DOI

Abstract

Our R (R Core Team, 2021) package dsBinVal implements the methodology explained by Schalk et al. (2022). It extends the ROC-GLM (Pepe, 2000) to distributed data by using techniques of differential privacy (Dwork et al., 2006) and the idea of sharing highly aggregated values only. The package also exports functionality to calculate distributed calibration curves and assess the calibration. Using the package allows us to evaluate a prognostic model based on a binary outcome using the DataSHIELD (Gaye et al., 2014) framework. Therefore, the main functionality makes it able to 1) compute the receiver operating characteristic (ROC) curve using the ROC-GLM from which 2) the area under the curve (AUC) and confidence intervals (CI) are derived to conduct hypothesis testing according to DeLong et al. (1988). Furthermore, 3) the calibration can be assessed distributively via calibration curves and the Brier score. Visualizing the approximated ROC curve, the AUC with confidence intervals, and the calibration curves using ggplot2 is also supported. Examples can be found in the README file of the repository.

MCML Authors

Daniel Schalk

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[158]

D. Rügamer.
mixdistreg: An R Package for Fitting Mixture of Experts Distributional Regression with Adaptive First-order Methods.
Preprint (Feb. 2023). arXiv

Abstract

This paper presents a high-level description of the R software package mixdistreg to fit mixture of experts distributional regression models. The proposed framework is implemented in R using the deepregression software template, which is based on TensorFlow and follows the neural structured additive learning principle. The software comprises various approaches as special cases, including mixture density networks and mixture regression approaches. Various code examples are given to demonstrate the package’s functionality.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[157]

F. Ott.
Representation learning for domain adaptation and cross-modal retrieval: in the context of online handwriting recognition and visual self-localization.
Dissertation 2023. DOI

Abstract

This thesis focuses on domain adaptation and cross-modal retrieval to address the challenges posed by domain shifts in machine learning applications. Specifically, it explores techniques for online handwriting recognition and visual self-localization. For handwriting recognition, the study uses deep metric learning and optimal transport to reduce domain shifts between different writing styles and writing modalities, while for visual self-localization, it enhances pose prediction through auxiliary tasks and representation learning fusion techniques to improve accuracy across sensor modalities. (Shortened.)

MCML Authors

Felix Ott

Dr.

* Former Member

[156]

T. Ullmann, S. Peschel, P. Finger, C. L. Müller and A.-L. Boulesteix.
Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering.
PLOS Computational Biology 19.1 (Jan. 2023). DOI

Abstract

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. 
In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.

MCML Authors

Theresa Ullmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

* Former Member

Stefanie Peschel

C2 | Biology
→ Group Christian Müller

Biomedical Statistics and Data Science

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Anne-Laure Boulesteix

Prof. Dr.

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Biometry in Molecular Medicine

[155]

I. Ziegler, B. Ma, B. Bischl, E. Dorigatti and B. Schubert.
Proteasomal cleavage prediction: state-of-the-art and future directions.
Preprint (2023). DOI GitHub

Abstract

Epitope vaccines are a promising approach for precision treatment of pathogens, cancer, autoimmune diseases, and allergies. Effectively designing such vaccines requires accurate proteasomal cleavage prediction to ensure that the epitopes included in the vaccine trigger an immune response. The performance of proteasomal cleavage predictors has been steadily improving over the past decades owing to increasing data availability and methodological advances. In this review, we summarize the current proteasomal cleavage prediction landscape and, in light of recent progress in the field of deep learning, develop and compare a wide range of recent architectures and techniques, including long short-term memory (LSTM), transformers, and convolutional neural networks (CNN), as well as four different denoising techniques. All open-source cleavage predictors re-trained on our dataset performed within two AUC percentage points. Our comprehensive deep learning architecture benchmark improved performance by 1.7 AUC percentage points, while closed-source predictors performed considerably worse. We found that a wide range of architectures and training regimes all result in very similar performance, suggesting that the specific modeling approach employed has a limited impact on predictive performance compared to the specifics of the dataset employed. We speculate that the noise and implicit nature of data acquisition techniques used for training proteasomal cleavage prediction models and the complexity of biological processes of the antigen processing pathway are the major limiting factors. While biological complexity can be tackled by more data and, to a lesser extent, better models, noise and randomness inherently limit the maximum achievable predictive performance.

MCML Authors

Bolei Ma

Social Data Science and AI

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Emilio Dorigatti

Dr.

* Former Member

[154]

J. Goschenhofer, P. Ragupathy, C. Heumann, B. Bischl and M. Aßenmacher.
CC-Top: Constrained Clustering for Dynamic Topic Discovery.
EvoNLP 2022 - 1st Workshop on Ever Evolving NLP. Abu Dhabi, United Arab Emirates, Dec 07, 2022. URL

Abstract

Research on multi-class text classification of short texts mainly focuses on supervised (transfer) learning approaches, requiring a finite set of pre-defined classes which is constant over time. This work explores deep constrained clustering (CC) as an alternative to supervised learning approaches in a setting with a dynamically changing number of classes, a task we introduce as dynamic topic discovery (DTD). We do so by using pairwise similarity constraints instead of instance-level class labels which allow for a flexible number of classes while exhibiting a competitive performance compared to supervised approaches. First, we substantiate this through a series of experiments and show that CC algorithms exhibit a predictive performance similar to state-of-the-art supervised learning algorithms while requiring less annotation effort. Second, we demonstrate the overclustering capabilities of deep CC for detecting topics in short text data sets in the absence of the ground truth class cardinality during model training. Third, we showcase that these capabilities can be leveraged for the DTD setting as a step towards dynamic learning over time and finally, we release our codebase to nurture further research in this area.

MCML Authors

Jann Goschenhofer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

[153]

T. Ullmann.
Evaluation of clustering results and novel cluster algorithms: a metascientific perspective.
Dissertation 2022. DOI

Abstract

This dissertation addresses the reliability of clustering results and the evaluation of new clustering algorithms, particularly in light of the replication crisis in scientific research. The first contribution presents a framework for validating clustering results using validation data, ensuring the replicability and generalizability of findings. The second contribution quantifies over-optimistic bias in microbiome research by analyzing the effects of multiple analysis strategies on unsupervised tasks, while the third contribution highlights the over-optimism in evaluating new clustering algorithms, using the example of the ‘Rock’ algorithm, and advocates for more rigorous and neutral benchmarking methods. (Shortened.)

MCML Authors

Theresa Ullmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

* Former Member

[152]

R. Foygel Barber, M. Drton, N. Sturma and L. Weihs.
Half-trek criterion for identifiability of latent variable models.
Annals of Statistics 50.6 (Dec. 2022). DOI

Abstract

We consider linear structural equation models with latent variables and develop a criterion to certify whether the direct causal effects between the observable variables are identifiable based on the observed covariance matrix. Linear structural equation models assume that both observed and latent variables solve a linear equation system featuring stochastic noise terms. Each model corresponds to a directed graph whose edges represent the direct effects that appear as coefficients in the equation system. Prior research has developed a variety of methods to decide identifiability of direct effects in a latent projection framework, in which the confounding effects of the latent variables are represented by correlation among noise terms. This approach is effective when the confounding is sparse and affects only small subsets of the observed variables. In contrast, the new latent-factor half-trek criterion (LF-HTC) we develop in this paper operates on the original unprojected latent variable model and is able to certify identifiability in settings where some latent variables may also have dense effects on many or even all of the observables. Our LF-HTC is an effective sufficient criterion for rational identifiability, under which the direct effects can be uniquely recovered as rational functions of the joint covariance matrix of the observed random variables. When restricting the search steps in LF-HTC to consider subsets of latent variables of bounded size, the criterion can be verified in time that is polynomial in the size of the graph.

MCML Authors

Mathias Drton

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Mathias Drton

Mathematical Statistics

Nils Sturma

Mathematical Statistics

[151]

K. Lotto, T. Nagler and M. Radic.
Modeling Stochastic Data Using Copulas for Applications in the Validation of Autonomous Driving.
Electronics 11.24 (Dec. 2022). DOI

Abstract

The verification and validation processes of fully automated vehicles are linked to an almost intractable challenge of reflecting the real world with all its interactions in a virtual environment. Influential stochastic parameters need to be extracted from real-world measurements and real-time data, capturing all interdependencies, for an accurate simulation of reality. A copula is a probability model that represents a multivariate distribution, examining the dependence between the underlying variables. This model is used on drone measurement data from a roundabout containing dependent stochastic parameters. With the help of the copula model, samples are generated that reflect the real-time data. The resulting applications and possible extensions are discussed and explored.

MCML Authors

Thomas Nagler

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Göran Kauermann

Computational Statistics & Data Science

[150]

C. Fritz, G. De Nicola, F. Günther, D. Rügamer, M. Rave, M. Schneble, A. Bender, M. Weigert, R. Brinks, A. Hoyer, U. Berger, H. Küchenhoff and G. Kauermann.
Challenges in Interpreting Epidemiological Surveillance Data – Experiences from Germany.
Journal of Computational and Graphical Statistics 32.3 (Dec. 2022). DOI

Abstract

As early as March 2020, the authors of this letter started to work on surveillance data to obtain a clearer picture of the pandemic’s dynamic. This letter outlines the lessons learned during this peculiar time, emphasizing the benefits that better data collection, management, and communication processes would bring to the table. We further want to promote nuanced data analyses as a vital element of general political discussion as opposed to drawing conclusions from raw data, which are often flawed in epidemiological surveillance data, and therefore underline the overall need for statistics to play a more central role in public discourse.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Maximilian Weigert

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

* Former Member

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Göran Kauermann

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Applied Statistics in Social Sciences, Economics and Business

[149]

M. Rezaei, E. Dorigatti, D. Rügamer and B. Bischl.
Joint Debiased Representation Learning and Imbalanced Data Clustering.
ICDMW 2022 - IEEE International Conference on Data Mining Workshops. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI

Abstract

One of the most promising approaches for unsupervised learning is combining deep representation learning and deep clustering. Some recent works propose to simultaneously learn representation using deep neural networks and perform clustering by defining a clustering loss on top of embedded features. However, these approaches are sensitive to imbalanced data and out-of-distribution samples. As a consequence, these methods optimize clustering by pushing data close to randomly initialized cluster centers. This is problematic when the number of instances varies largely in different classes or a cluster with few samples has less chance to be assigned a good centroid. To overcome these limitations, we introduce a new unsupervised framework for joint debiased representation learning and image clustering. We simultaneously train two deep learning models, a deep representation network that captures the data distribution, and a deep clustering network that learns embedded features and performs clustering. Specifically, the clustering network and learning representation network both take advantage of our proposed statistics pooling block that represents mean, variance, and cardinality to handle the out-of-distribution samples and class imbalance. Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to the out-of-distribution dataset.

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

Emilio Dorigatti

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[148]

N. Hurmer, X.-Y. To, M. Binder, H. A. Gündüz, P. C. Münch, R. Mreches, A. C. McHardy, B. Bischl and M. Rezaei.
Transformer Model for Genome Sequence Analysis.
LMRL @NeurIPS 2022 - Workshop on Learning Meaningful Representations of Life at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

One major challenge of applying machine learning in genomics is the scarcity of labeled data, which often requires expensive and time-consuming physical experimentation under laboratory conditions to obtain. However, the advent of high throughput sequencing has made large quantities of unlabeled genome data available. This can be used to apply semi-supervised learning methods through representation learning. In this paper, we investigate the impact of a popular and well-established language model, namely BERT [Devlin et al., 2018], for genome sequence analysis. Specifically, we adapt DNABERT [Ji et al., 2021] to GenomeNet-BERT in order to produce useful representations for downstream tasks such as classification and semi-supervised learning. We explore different pretraining setups and compare their performance on a virus genome classification task to strictly supervised training and baselines on different training set size setups. The conducted experiments show that this architecture provides an increase in performance compared to existing methods at the cost of more resource-intensive training.

MCML Authors

Xiao-Yin To

Statistical Learning and Data Science

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Hüseyin Anil Gündüz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[147]

I. Ziegler, B. Ma, E. Nie, B. Bischl, D. Rügamer, B. Schubert and E. Dorigatti.
What cleaves? Is proteasomal cleavage prediction reaching a ceiling?
LMRL @NeurIPS 2022 - Workshop on Learning Meaningful Representations of Life at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Epitope vaccines are a promising direction to enable precision treatment for cancer, autoimmune diseases, and allergies. Effectively designing such vaccines requires accurate prediction of proteasomal cleavage in order to ensure that the epitopes in the vaccine are presented to T cells by the major histocompatibility complex (MHC). While direct identification of proteasomal cleavage in vitro is cumbersome and low throughput, it is possible to implicitly infer cleavage events from the termini of MHC-presented epitopes, which can be detected in large amounts thanks to recent advances in high-throughput MHC ligandomics. Inferring cleavage events in such a way provides an inherently noisy signal which can be tackled with new developments in the field of deep learning that supposedly make it possible to learn predictors from noisy labels. Inspired by such innovations, we sought to modernize proteasomal cleavage predictors by benchmarking a wide range of recent methods, including LSTMs, transformers, CNNs, and denoising methods, on a recently introduced cleavage dataset. We found that increasing model scale and complexity appeared to deliver limited performance gains, as several methods reached about 88.5% AUC on C-terminal and 79.5% AUC on N-terminal cleavage prediction. This suggests that the noise and/or complexity of proteasomal cleavage and the subsequent biological processes of the antigen processing pathway are the major limiting factors for predictive performance rather than the specific modeling approach used. While biological complexity can be tackled by more data and better models, noise and randomness inherently limit the maximum achievable predictive performance.

MCML Authors

Bolei Ma

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Social Data Science and AI

Ercong Nie

B2 | Natural Language Processing
→ Group Hinrich Schütze

Computational Linguistics

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Emilio Dorigatti

Dr.

* Former Member

[146]

E. Pretzsch, V. Heinemann, S. Stintzing, A. Bender, S. Chen, J. W. Holch, F. O. Hofmann, H. Ren, F. Böschand, H. Küchenhoff, J. Werner and M. K. Angele.
EMT-Related Genes Have No Prognostic Relevance in Metastatic Colorectal Cancer as Opposed to Stage II/III: Analysis of the Randomised, Phase III Trial FIRE-3 (AIO KRK 0306; FIRE-3).
Cancers 14.22 (Nov. 2022). DOI

Abstract

Despite huge advances in local and systemic therapies, the 5-year relative survival rate for patients with metastatic CRC is still low. To avoid over- or undertreatment, proper risk stratification with regard to treatment strategy is highly needed. As EMT (epithelial-mesenchymal transition) is a major step in metastatic spread, this study analysed the prognostic effect of EMT-related genes in stage IV colorectal cancer patients using the study cohort of the FIRE-3 trial, an open-label multi-centre randomised controlled phase III trial of stage IV colorectal cancer patients. Overall, the prognostic relevance of EMT-related genes seems stage-dependent. EMT-related genes have no prognostic relevance in stage IV CRC as opposed to stage II/III.

MCML Authors

Andreas Bender

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Machine Learning Consulting Unit (MLCU)

Shuo Chen

Database Systems and Data Mining AI Lab

Helmut Küchenhoff

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistical Consulting Unit (StaBLab)

[145]

M. Herrmann.
Towards more reliable machine learning: conceptual insights and practical approaches for unsupervised manifold learning and supervised benchmark studies.
Dissertation 2022. DOI

Abstract

This thesis focuses on improving the reliability and trustworthiness of machine learning, particularly in unsupervised learning methods like manifold learning. It investigates the challenges of evaluating manifold learning techniques and proposes improvements for embedding evaluation, outlier detection, and cluster analysis, using methods like UMAP and DBSCAN. Additionally, the thesis contributes to supervised learning by presenting a benchmark study on survival prediction in multi-omics cancer data and exploring the effects of design and analysis choices on benchmark results. (Shortened.)

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

[144]

F. Pfisterer.
Democratizing machine learning: contributions in AutoML and fairness.
Dissertation 2022. DOI

Abstract

This thesis focuses on democratizing access to machine learning (ML) by improving automated machine learning (AutoML) systems and making ML tools more accessible to non-experts. Key contributions include methods to accelerate hyperparameter optimization by learning from previous experiments, the integration of fairness considerations in AutoML, and the development of software packages such as mlr3pipelines for creating machine learning pipelines and mlr3fairness for auditing and debiasing models. The thesis also includes tools for estimating and mitigating model fairness, such as the mcboost package for multi-calibration, addressing both the technical and ethical challenges of widespread ML deployment. (Shortened.)

MCML Authors

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[143]

F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift.
MM 2022 - 30th ACM International Conference on Multimedia. Lisbon, Portugal, Oct 10-14, 2022. DOI

Abstract

The performance of a machine learning model degrades when it is applied to data from a similar but different domain than the data it has initially been trained on. To mitigate this domain shift problem, domain adaptation (DA) techniques search for an optimal transformation that converts the (current) input data from a source domain to a target domain to learn a domain-invariant representation that reduces domain discrepancy. This paper proposes a novel supervised DA based on two steps. First, we search for an optimal class-dependent transformation from the source to the target domain from a few samples. We consider optimal transport methods such as the earth mover’s distance, Sinkhorn transport and correlation alignment. Second, we use embedding similarity techniques to select the corresponding transformation at inference. We use correlation metrics and higher-order moment matching techniques. We conduct an extensive evaluation on time-series datasets with domain shift including simulated and various online handwriting datasets to demonstrate the performance.

MCML Authors

Felix Ott

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[142]

N. Palm, F. Stroebl and H. Palm.
Parameter Individual Optimal Experimental Design and Calibration of Parametric Models.
IEEE Access 10 (Oct. 2022). DOI GitHub

Abstract

Parametric models reflect system behavior in general and characterize individual system instances by specific parameter values. For a variety of scientific disciplines, model calibration by parameter quantification is therefore of central importance. As the time and cost of calibration experiments increase, the question of how to determine parameter values of required quality with a minimum number of experiments comes to the fore. In this paper, a methodology is introduced that quantifies and optimizes the achievable parameter extraction quality based on an experimental plan, including a process and methods for adapting the experimental plan to improve the estimation of individually selectable parameters. The resulting parameter-individual optimal design of experiments (pi-OED) enables experimenters to extract a maximum of parameter-specific information from a given number of experiments. We demonstrate how to minimize variance or covariances of individually selectable parameter estimators by model-based calculation of the experimental designs. Using the Fisher Information Matrix in combination with the Cramér-Rao inequality, the pi-OED plan is reduced to a global optimization problem. The pi-OED workflow is demonstrated using computer experiments to calibrate a model describing calendrical aging of lithium-ion battery cells. Applying bootstrapping methods also allows quantifying parameter estimation distributions for further benchmarking. Comparing pi-OED based computer experimental results with those based on state-of-the-art designs of experiments reveals its efficiency improvement. All computer experimental results are obtained in Python and may be reproduced using a provided Jupyter Notebook along with the source code. Both are available at https://github.com/nicolaipalm/oed.
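
For readers unfamiliar with the inequality named in this abstract, the following is a brief reminder of its standard form, not the paper's own derivation. The symbols $\theta$, $\hat\theta$, $\xi$ and $I(\theta;\xi)$ are notation assumed here for illustration: $\hat\theta$ is an unbiased estimator of the parameter vector $\theta$, $\xi$ is an experimental design, and $I(\theta;\xi)$ is the (invertible) Fisher Information Matrix under that design.

```latex
% Cramér-Rao inequality for an unbiased estimator (matrix form)
\operatorname{Cov}(\hat\theta) \;\succeq\; I(\theta;\xi)^{-1}
\quad\Longrightarrow\quad
\operatorname{Var}(\hat\theta_i) \;\ge\; \bigl[\, I(\theta;\xi)^{-1} \bigr]_{ii}

% One possible parameter-individual design criterion (illustrative assumption)
\xi^{\ast} \;=\; \arg\min_{\xi} \; \bigl[\, I(\theta;\xi)^{-1} \bigr]_{ii}
```

Under these assumptions, choosing the design that minimizes the $i$-th diagonal entry of the inverse information matrix lowers the variance bound for that one parameter, which is one way a design problem of this kind can be posed as a global optimization problem.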

MCML Authors

Nicolai Palm

Computational Statistics & Data Science

[141]

J. Moosbauer, M. Binder, L. Schneider, F. Pfisterer, M. Becker, M. Lang, L. Kotthoff and B. Bischl.
Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers.
IEEE Transactions on Evolutionary Computation 26.6 (Oct. 2022). DOI

Abstract

Automated hyperparameter optimization (HPO) has gained great popularity and is an important component of most automated machine learning frameworks. However, the process of designing HPO algorithms is still an unsystematic and manual process: new algorithms are often built on top of prior work, where limitations are identified and improvements are proposed. Even though this approach is guided by expert knowledge, it is still somewhat arbitrary. The process rarely allows for gaining a holistic understanding of which algorithmic components drive performance and carries the risk of overlooking good algorithmic design choices. We present a principled approach to automated benchmark-driven algorithm design applied to multifidelity HPO (MF-HPO). First, we formalize a rich space of MF-HPO candidates that includes, but is not limited to, common existing HPO algorithms and then present a configurable framework covering this space. To find the best candidate automatically and systematically, we follow a programming-by-optimization approach and search over the space of algorithm candidates via Bayesian optimization. We challenge whether the found design choices are necessary or could be replaced by more naive and simpler ones by performing an ablation analysis. We observe that using a relatively simple configuration (in some ways, simpler than established methods) performs very well as long as some critical configuration parameters are set to the right value.

MCML Authors

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Marc Becker

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[140]

K. Rath, D. Rügamer, B. Bischl, U. von Toussaint, C. Rea, A. Maris, R. Granetz and C. G. Albert.
Data augmentation for disruption prediction via robust surrogate models.
Journal of Plasma Physics 88.5 (Oct. 2022). DOI

Abstract

The goal of this work is to generate large statistically representative data sets to train machine learning models for disruption prediction provided by data from few existing discharges. Such a comprehensive training database is important to achieve satisfying and reliable prediction results in artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student-t process regression. We apply Student-t process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used if the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via colouring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate if the distribution of the generated data is similar to the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics and classic machine learning clustering algorithms.

MCML Authors

Katharina Röck (née Rath)

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[139]

L. Bothmann, S. Strickroth, G. Casalicchio, D. Rügamer, M. Lindauer, F. Scheipl and B. Bischl.
Developing Open Source Educational Resources for Machine Learning and Data Science.
ECML-PKDD 2022 - 3rd Teaching Machine Learning and Artificial Intelligence Workshop at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. URL

Abstract

Education should not be a privilege but a common good. It should be openly accessible to everyone, with as few barriers as possible; even more so for key technologies such as Machine Learning (ML) and Data Science (DS). Open Educational Resources (OER) are a crucial factor for greater educational equity. In this paper, we describe the specific requirements for OER in ML and DS and argue that it is especially important for these fields to make source files publicly available, leading to Open Source Educational Resources (OSER). We present our view on the collaborative development of OSER, the challenges this poses, and first steps towards their solutions. We outline how OSER can be used for blended learning scenarios and share our experiences in university education. Finally, we discuss additional challenges such as credit assignment or granting certificates.

MCML Authors

Ludwig Bothmann

Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Fabian Scheipl

PD Dr.

Functional Data Analysis

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[138]

D. Rügamer, A. Bender, S. Wiegrebe, D. Racek, B. Bischl, C. L. Müller and C. Stachl.
Factorized Structured Regression for Large-Scale Varying Coefficient Models.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structured regression models, including time-aware varying coefficients models, are, however, limited in their applicability to categorical effects and inclusion of a large number of interactions. Here, we propose Factorized Structured Regression (FaStR) for scalable varying coefficient models. FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation. This fusion provides a scalable framework for the estimation of statistical models in previously infeasible data settings. Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques, while scaling notably better and also being competitive with other time-aware RS in terms of prediction performance. We illustrate FaStR’s performance and interpretability on a large-scale behavioral study with smartphone user data.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Christian Müller

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biomedical Statistics and Data Science

[137]

D. Deng, F. Karl, F. Hutter, B. Bischl and M. Lindauer.
Efficient Automated Deep Learning for Time Series Forecasting.
ECML-PKDD 2022 - Workshops at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

Recent years have witnessed tremendously improved efficiency of Automated Machine Learning (AutoML), especially Automated Deep Learning (AutoDL) systems, but recent work focuses on tabular, image, or NLP tasks. So far, little attention has been paid to general AutoDL frameworks for time series forecasting, despite the enormous success in applying different novel architectures to such tasks. In this paper, we propose an efficient approach for the joint optimization of neural architecture and hyperparameters of the entire data processing pipeline for time series forecasting. In contrast to common NAS search spaces, we designed a novel neural architecture search space covering various state-of-the-art architectures, allowing for an efficient macro-search over different DL approaches. To efficiently search in such a large configuration space, we use Bayesian optimization with multi-fidelity optimization. We empirically study several different budget types enabling efficient multi-fidelity optimization on different forecasting datasets. Furthermore, we compare our resulting system against several established baselines and show that it significantly outperforms all of them across several datasets.

MCML Authors

Florian Karl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[136]

T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Implicit Embeddings via GAN Inversion for High Resolution Chest Radiographs.
MAD @MICCAI 2022 - 1st Workshop on Medical Applications with Disentanglements at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI

Abstract

Generative models allow for the creation of highly realistic artificial samples, opening up promising applications in medical imaging. In this work, we propose a multi-stage encoder-based approach to invert the generator of a generative adversarial network (GAN) for high resolution chest radiographs. This gives direct access to its implicitly formed latent space, makes generative models more accessible to researchers, and enables the application of generative techniques to actual patients’ images. We investigate various applications for this embedding, including image compression, disentanglement in the encoded dataset, guided image manipulation, and creation of stylized samples. We find that this type of GAN inversion is a promising research direction in the domain of chest radiograph modeling and opens up new ways to combine realistic X-ray sample synthesis with radiological image analysis.

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[135]

R. Sonabend, A. Bender and S. Vollmer.
Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures.
Bioinformatics 38.17 (Sep. 2022). DOI GitHub

Abstract

Motivation: In this article, we consider how to evaluate survival distribution predictions with measures of discrimination. This is non-trivial as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages.
Results: Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons or ‘C-hacking’. We demonstrate by example how simple it can be to manipulate results and use this to argue for better reporting guidelines and transparency in the literature. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[134]

C. Fritz, M. Mehrl, P. W. Thurner and G. Kauermann.
All that Glitters is not Gold: Relational Events Models with Spurious Events.
Network Science 11.2 (Sep. 2022). DOI

Abstract

As relational event models are an increasingly popular model for studying relational structures, the reliability of large-scale event data collection becomes more and more important. Automated or human-coded events often suffer from non-negligible false-discovery rates in event identification. And most sensor data are primarily based on actors’ spatial proximity for predefined time windows; hence, the observed events could relate either to a social relationship or random co-location. Both examples imply spurious events that may bias estimates and inference. We propose the Relational Event Model for Spurious Events (REMSE), an extension to existing approaches for interaction data. The model provides a flexible solution for modeling data while controlling for spurious events. Estimation of our model is carried out in an empirical Bayesian approach via data augmentation. Based on a simulation study, we investigate the properties of the estimation procedure. To demonstrate its usefulness in two distinct applications, we employ this model to combat events from the Syrian civil war and student co-location data. Results from the simulation and the applications identify the REMSE as a suitable approach to modeling relational event data in the presence of spurious events.

MCML Authors

Cornelius Fritz

Dr.

A1 | Statistical Foundations & Explainability
→ Group Göran Kauermann

* Former Member

Göran Kauermann

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Göran Kauermann

Applied Statistics in Social Sciences, Economics and Business

[133]

W. Ghada, E. Casellas, J. Herbinger, A. Garcia-Benadí, L. Bothmann, N. Estrella, J. Bech and A. Menzel.
Stratiform and Convective Rain Classification Using Machine Learning Models and Micro Rain Radar.
Remote Sensing 14.18 (Sep. 2022). DOI

Abstract

Rain type classification into convective and stratiform is an essential step required to improve quantitative precipitation estimations by remote sensing instruments. Previous studies with Micro Rain Radar (MRR) measurements and subjective rules have been performed to classify rain events. However, automating this process by using machine learning (ML) models provides the advantages of fast and reliable classification with the possibility to classify rain minute by minute. A total of 20,979 min of rain data measured by an MRR at Das in northeast Spain were used to build seven types of ML models for stratiform and convective rain type classification. The proposed classification models use a set of 22 parameters that summarize the reflectivity, the Doppler velocity, and the spectral width (SW) above and below the so-called separation level (SL). This level is defined as the level with the highest increase in Doppler velocity and corresponds with the bright band in stratiform rain. A pre-classification of the rain type for each minute based on the rain microstructure provided by the collocated disdrometer was performed. Our results indicate that complex ML models, particularly tree-based ensembles such as xgboost and random forest which capture the interactions of different features, perform better than simpler models. Applying methods from the field of interpretable ML, we identified reflectivity at the lowest layer and the average spectral width in the layers below SL as the most important features. High reflectivity and low SW values indicate a higher probability of convective rain.

MCML Authors

Julia Herbinger

Dr.

* Former Member

Ludwig Bothmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[132]

E. Dorigatti, B. Bischl and B. Schubert.
Improved proteasomal cleavage prediction with positive-unlabeled learning.
Preprint (Sep. 2022). arXiv

Abstract

Accurate in silico modeling of the antigen processing pathway is crucial to enable personalized epitope vaccine design for cancer. An important step of this pathway is the degradation of the vaccine into smaller peptides by the proteasome, some of which are going to be presented to T cells by the MHC complex. While predicting MHC-peptide presentation has received a lot of attention recently, proteasomal cleavage prediction remains a relatively unexplored area in light of recent advances in high-throughput mass spectrometry-based MHC ligandomics. Moreover, as such experimental techniques do not allow the identification of regions that cannot be cleaved, the latest predictors generate decoy negative samples and treat them as true negatives when training, even though some of them could actually be positives. In this work, we thus present a new predictor trained with an expanded dataset and the solid theoretical underpinning of positive-unlabeled learning, achieving a new state-of-the-art in proteasomal cleavage prediction. The improved predictive capabilities will in turn enable more precise vaccine development, improving the efficacy of epitope-based vaccines. Pretrained models are available on GitHub.

MCML Authors

Emilio Dorigatti

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[131]

E. Dorigatti, J. Schweisthal, B. Bischl and M. Rezaei.
Robust and Efficient Imbalanced Positive-Unlabeled Learning with Self-supervision.
Preprint (Sep. 2022). arXiv GitHub

Abstract

Learning from positive and unlabeled (PU) data is a setting where the learner only has access to positive and unlabeled samples while having no information on negative examples. Such a PU setting is of great importance in various tasks such as medical diagnosis, social network analysis, financial markets analysis, and knowledge base completion, which also tend to be intrinsically imbalanced, i.e., where most examples are actually negatives. Most existing approaches for PU learning, however, only consider artificially balanced datasets and it is unclear how well they perform in the realistic scenario of imbalanced and long-tail data distribution. This paper proposes to tackle this challenge via robust and efficient self-supervised pretraining. However, conventional self-supervised learning methods need to be reformulated when applied to highly imbalanced PU distributions. In this paper, we present ImPULSeS, a unified representation learning framework for Imbalanced Positive Unlabeled Learning leveraging Self-Supervised debiased pre-training. ImPULSeS uses a generic combination of large-scale unsupervised learning with debiased contrastive loss and additional reweighted PU loss. We performed different experiments across multiple datasets to show that ImPULSeS is able to halve the error rate of the previous state-of-the-art, even compared with previous methods that are given the true prior. Moreover, our method showed increased robustness to prior misspecification and superior performance even when pretraining was performed on an unrelated dataset. We anticipate such robustness and efficiency will make it much easier for practitioners to obtain excellent results on other PU datasets of interest.

MCML Authors

Emilio Dorigatti

Dr.

* Former Member

Jonas Schweisthal

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Artificial Intelligence in Management

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[130]

C. A. Scholbeck, H. Funk and G. Casalicchio.
Algorithm-Agnostic Interpretations for Clustering.
Preprint (Sep. 2022). arXiv

Abstract

A clustering outcome for high-dimensional data is typically interpreted via post-processing, involving dimension reduction and subsequent visualization. This destroys the meaning of the data and obfuscates interpretations. We propose algorithm-agnostic interpretation methods to explain clustering outcomes in reduced dimensions while preserving the integrity of the data. The permutation feature importance for clustering represents a general framework based on shuffling feature values and measuring changes in cluster assignments through custom score functions. The individual conditional expectation for clustering indicates observation-wise changes in the cluster assignment due to changes in the data. The partial dependence for clustering evaluates average changes in cluster assignments for the entire feature space. All methods can be used with any clustering algorithm able to reassign instances through soft or hard labels. In contrast to common post-processing methods such as principal component analysis, the introduced methods maintain the original structure of the features.
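
The permutation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid `assign` function, the toy data, and the score (share of observations whose cluster label changes) are all assumptions chosen for illustration, standing in for "any clustering algorithm able to reassign instances" and for a custom score function.

```python
import random

def assign(points, centroids):
    """Hard-assign each point to its nearest centroid (squared Euclidean distance)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def permutation_importance(points, centroids, feature, n_repeats=20, seed=0):
    """Mean share of points whose cluster label changes after shuffling one feature."""
    rng = random.Random(seed)
    baseline = assign(points, centroids)
    changed = 0.0
    for _ in range(n_repeats):
        # Shuffle the values of a single feature across observations
        column = [p[feature] for p in points]
        rng.shuffle(column)
        shuffled = [p[:feature] + (v,) + p[feature + 1:] for p, v in zip(points, column)]
        labels = assign(shuffled, centroids)
        changed += sum(a != b for a, b in zip(baseline, labels)) / len(points)
    return changed / n_repeats

# Hypothetical toy data: feature 0 separates two clusters, feature 1 is pure noise
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 1)) for _ in range(50)]
points += [(data_rng.gauss(5, 0.3), data_rng.gauss(0, 1)) for _ in range(50)]
centroids = [(0.0, 0.0), (5.0, 0.0)]

importance = [permutation_importance(points, centroids, j) for j in range(2)]
```

In this toy setup, shuffling the separating feature reassigns many observations, while shuffling the noise feature leaves every assignment unchanged, so the first importance value is clearly larger than the second. The features themselves are never projected or transformed, which is the property the abstract contrasts with dimension-reduction approaches.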

MCML Authors

Henri Funk

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

Statistical Consulting Unit (StaBLab)

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[129]

S.-F. Zheng, J. Nam, E. Dorigatti, B. Bischl, S. Azizi and M. Rezaei.
Joint Debiased Representation and Image Clustering Learning with Self-Supervision.
Preprint (Sep. 2022). arXiv GitHub

Abstract

Contrastive learning is among the most successful methods for visual representation learning, and its performance can be further improved by jointly performing clustering on the learned representations. However, existing methods for joint clustering and contrastive learning do not perform well on long-tailed data distributions, as majority classes overwhelm and distort the loss of minority classes, thus preventing meaningful representations from being learned. Motivated by this, we develop a novel joint clustering and contrastive learning framework by adapting the debiased contrastive loss to avoid under-clustering minority classes of imbalanced datasets. We show that our proposed modified debiased contrastive loss and divergence clustering loss improve the performance across multiple datasets and learning tasks.

MCML Authors

Emilio Dorigatti

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[128]

F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Representation Learning for Tablet and Paper Domain Adaptation in favor of Online Handwriting Recognition.
MPRSS @ICPR 2022 - 7th International Workshop on Multimodal pattern recognition of social signals in human computer interaction at the 26th International Conference on Pattern Recognition (ICPR 2022). Montreal, Canada, Aug 21-25, 2022. arXiv

Abstract

The performance of a machine learning model degrades when it is applied to data from a similar but different domain than the data it has initially been trained on. The goal of domain adaptation (DA) is to mitigate this domain shift problem by searching for an optimal feature transformation to learn a domain-invariant representation. Such a domain shift can appear in handwriting recognition (HWR) applications where the motion pattern of the hand, and with it the motion pattern of the pen, differs between writing on paper and on tablet. This becomes visible in the sensor data for online handwriting (OnHW) from pens with integrated inertial measurement units. This paper proposes a supervised DA approach to enhance learning for OnHW recognition between tablet and paper data. Our method exploits loss functions such as maximum mean discrepancy and correlation alignment to learn a domain-invariant feature representation (i.e., similar covariances between tablet and paper features). We use a triplet loss that takes negative samples of the auxiliary domain (i.e., paper samples) to increase the number of samples of the tablet dataset. We conduct an evaluation on novel sequence-based OnHW datasets (i.e., words) and show an improvement on the paper domain with an early fusion strategy by using pairwise learning.
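
The "similar covariances" idea behind correlation alignment can be sketched in a few lines. This is an illustrative stand-alone version, not the paper's training code: it assumes the commonly used formulation in which the loss is the squared Frobenius norm of the difference between the two feature covariance matrices, scaled by 1/(4d²), and the `tablet` and `paper` feature rows are made-up toy values.

```python
def covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]

def coral_loss(source, target):
    """Squared Frobenius distance between covariance matrices, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob_sq / (4 * d * d)

# Hypothetical 2-D features: positively correlated vs. negatively correlated
tablet = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
paper = [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)]

loss_different = coral_loss(tablet, paper)   # covariances differ off the diagonal
loss_identical = coral_loss(tablet, tablet)  # identical covariances
```

The loss is zero when both domains share the same second-order statistics and grows as their covariances diverge, so minimizing it alongside a task loss pushes a feature extractor toward representations whose covariance structure matches across domains.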

MCML Authors

Felix Ott

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[127]

M. van Smeden, G. Heinze, B. Van Calster, F. W. Asselbergs, P. E. Vardas, N. Bruining, P. de Jaegere, J. H. Moore, S. Denaxas, A.-L. Boulesteix and K. G. M. Moons.
Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease.
European Heart Journal 43.31 (Aug. 2022). DOI

Abstract

The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, the cardiovascular researcher and healthcare professional are challenged to understand the opportunities as well as the limitations of the AI-based predictions. In this article, we present 12 critical questions for cardiovascular health professionals to ask when confronted with an AI-based prediction model. We aim to support medical professionals to distinguish the AI-based prediction models that can add value to patient care from the AI that does not.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[126]

M. Schneble and G. Kauermann.
Estimation of Latent Network Flows in Bike-Sharing Systems.
Statistical Modelling 22.2 (Aug. 2022). DOI

Abstract

Estimation of latent network flows is a common problem in statistical network analysis. The typical setting is that we know the margins of the network, that is, in- and outdegrees, but the flows are unobserved. In this article, we develop a mixed regression model to estimate network flows in a bike-sharing network if only the hourly differences of in- and outdegrees at bike stations are known. We also include exogenous covariates such as weather conditions. Two different parameterizations of the model are considered to estimate (a) the whole network flow and (b) the network margins only. The estimation of the model parameters is proposed via an iterative penalized maximum likelihood approach. This is exemplified by modelling network flows in the Vienna bike-sharing system. In order to evaluate our modelling approach, we conduct our analyses under different distributional assumptions while also accounting for the provider’s interventions to keep the estimation error low. Furthermore, a simulation study is conducted to show the performance of the model. For practical purposes, it is crucial to predict when and at which station there is a lack or an excess of bikes. For this application, our model proves to be well suited, providing quite accurate predictions.

MCML Authors

Göran Kauermann

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Göran Kauermann

Applied Statistics in Social Sciences, Economics and Business

[125]

C. Fritz, G. De Nicola, M. Rave, M. Weigert, Y. Khazaei, U. Berger, H. Küchenhoff and G. Kauermann.
Statistical modelling of COVID-19 data: Putting generalized additive models to work.
Statistical Modelling 24.4 (Aug. 2022). DOI

Abstract

Over the course of the COVID-19 pandemic, Generalized Additive Models (GAMs) have been successfully employed on numerous occasions to obtain vital data-driven insights. In this article we further substantiate the success story of GAMs, demonstrating their flexibility by focusing on three relevant pandemic-related issues. First, we examine the interdependency among infections in different age groups, concentrating on school children. In this context, we derive the setting under which parameter estimates are independent of the (unknown) case-detection ratio, which plays an important role in COVID-19 surveillance data. Second, we model the incidence of hospitalizations, for which data is only available with a temporal delay. We illustrate how correcting for this reporting delay through a nowcasting procedure can be naturally incorporated into the GAM framework as an offset term. Third, we propose a multinomial model for the weekly occupancy of intensive care units (ICU), where we distinguish between the number of COVID-19 patients, other patients and vacant beds. With these three examples, we aim to showcase the practical and ‘off-the-shelf’ applicability of GAMs to gain new insights from real-world data.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Maximilian Weigert

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

* Former Member

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Göran Kauermann

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Applied Statistics in Social Sciences, Economics and Business

[124]

F. Ott, N. L. Raichur, D. Rügamer, T. Feigl, H. Neumann, B. Bischl and C. Mutschler.
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression.
Preprint (Aug. 2022). arXiv

Abstract

Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles. The goal is to estimate an accurate pose of an object when either the environment or the dynamics are known. Absolute pose regression (APR) techniques directly regress the absolute pose from an image input in a known scene using convolutional and spatio-temporal networks. Odometry methods perform relative pose regression (RPR) that predicts the relative pose from a known object dynamic (visual or inertial inputs). The localization task can be improved by retrieving information from both data sources for a cross-modal setup, which is a challenging problem due to contradictory tasks. In this work, we conduct a benchmark to evaluate deep multimodal fusion based on pose graph optimization and attention networks. Auxiliary and Bayesian learning are utilized for the APR task. We show accuracy improvements for the APR-RPR task and for the RPR-RPR task for aerial vehicles and hand-held devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets and record and evaluate a novel industry dataset.

MCML Authors

Felix Ott

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[123]

L. Schneider, L. Schäpermeier, R. Prager, B. Bischl, H. Trautmann and P. Kerschke.
HPO X ELA: Investigating Hyperparameter Optimization Landscapes by Means of Exploratory Landscape Analysis.
Preprint (Aug. 2022). arXiv

Abstract

Hyperparameter optimization (HPO) is a key component of machine learning models for achieving peak predictive performance. While numerous methods and algorithms for HPO have been proposed over the last years, little progress has been made in illuminating and examining the actual structure of these black-box optimization problems. Exploratory landscape analysis (ELA) subsumes a set of techniques that can be used to gain knowledge about properties of unknown optimization problems. In this paper, we evaluate the performance of five different black-box optimizers on 30 HPO problems, which consist of two-, three- and five-dimensional continuous search spaces of the XGBoost learner trained on 10 different data sets. This is contrasted with the performance of the same optimizers evaluated on 360 problem instances from the black-box optimization benchmark (BBOB). We then compute ELA features on the HPO and BBOB problems and examine similarities and differences. A cluster analysis of the HPO and BBOB problems in ELA feature space allows us to identify how the HPO problems compare to the BBOB problems on a structural meta-level. We identify a subset of BBOB problems that are close to the HPO problems in ELA feature space and show that optimizer performance is comparably similar on these two sets of benchmark problems. We highlight open challenges of ELA for HPO and discuss potential directions of future research and applications.

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Göran Kauermann

Statistical Learning and Data Science

[122]

C. Fritz.
Statistical approaches to dynamic networks in society.
Dissertation 2022. DOI

Abstract

This dissertation focuses on dynamic networks in the Social Sciences, examining methods and applications in network modeling. Part two provides an overview of modeling frameworks for dynamic networks, including applications in studying COVID-19 infections using social connectivity as covariates. In part three, the dissertation introduces a Signed Exponential Random Graph Model (SERGM) for signed networks and a bipartite variant of the Temporal Exponential Random Graph Model (TERGM) to study co-inventorship in patents. Part four concludes with models for event networks, including a Relational Event Model for Spurious Events (REMSE) to manage false-discovery rates in event data. (Shortened).

MCML Authors

Cornelius Fritz

Dr.

* Former Member

[121]

F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder and B. Bischl.
YAHPO Gym - Design Criteria and a new Multifidelity Benchmark for Hyperparameter Optimization.
AutoML @ICML 2022 - 1st International Conference on Automated Machine Learning co-located with the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 25-27, 2022. URL GitHub

Abstract

When developing and analyzing new hyperparameter optimization (HPO) methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we list desirable properties and requirements for such benchmarks and propose a new set of challenging and relevant multifidelity HPO benchmark problems motivated by these requirements. For this, we revisit the concept of surrogate-based benchmarks and empirically compare them to more widely-used tabular benchmarks, showing that the latter ones may induce bias in performance estimation and ranking of HPO methods. We present a new surrogate-based benchmark suite for multifidelity HPO methods consisting of 9 benchmark collections that constitute over 700 multifidelity HPO problems in total. All our benchmarks also allow for querying of multiple optimization targets, enabling the benchmarking of multi-objective HPO. We examine and compare our benchmark suite with respect to the defined requirements and show that our benchmarks provide viable additions to existing suites.

MCML Authors

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[120]

L. Schneider, F. Pfisterer, P. Kent, J. Branke, B. Bischl and J. Thomas.
Tackling neural architecture search with quality diversity optimization.
AutoML @ICML 2022 - 1st International Conference on Automated Machine Learning co-located with the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 25-27, 2022. URL

Abstract

Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage alongside the validation error. Although considerable progress has been made in the field of multi-objective NAS, we argue that there is some discrepancy between the actual optimization problem of practical interest and the optimization problem that multi-objective NAS tries to solve. We resolve this discrepancy by formulating the multi-objective NAS problem as a quality diversity optimization (QDO) problem and introduce three quality diversity NAS optimizers (two of them belonging to the group of multifidelity optimizers), which search for high-performing yet diverse architectures that are optimal for application-specific niches, e.g., hardware constraints. By comparing these optimizers to their multi-objective counterparts, we demonstrate that quality diversity NAS in general outperforms multi-objective NAS with respect to quality of solutions and efficiency. We further show how applications and future NAS research can thrive on QDO.

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Janek Thomas

Dr.

* Former Member

[119]

A. Klaß, S. M. Lorenz, M. W. Lauer-Schmaltz, D. Rügamer, B. Bischl, C. Mutschler and F. Ott.
Uncertainty-aware Evaluation of Time-Series Classification for Online Handwriting Recognition with Domain Shift.
STRL @IJCAI-ECAI 2022 - Workshop on Spatio-Temporal Reasoning and Learning at the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (IJCAI-ECAI 2022). Vienna, Austria, Jul 23-29, 2022. URL

Abstract

For many applications, analyzing the uncertainty of a machine learning model is indispensable. While research on uncertainty quantification (UQ) techniques is very advanced for computer vision applications, UQ methods for spatio-temporal data are less studied. In this paper, we focus on models for online handwriting recognition, one particular type of spatio-temporal data. The data is observed from a sensor-enhanced pen with the goal of classifying written characters. We conduct a broad evaluation of aleatoric (data) and epistemic (model) UQ based on two prominent techniques for Bayesian inference, Stochastic Weight Averaging-Gaussian (SWAG) and Deep Ensembles. Besides providing a better understanding of the model, UQ techniques can detect out-of-distribution data and domain shifts when combining right-handed and left-handed writers (an underrepresented group).
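One common way to separate aleatoric from epistemic uncertainty in an ensemble of classifiers is an entropy decomposition: the entropy of the averaged prediction is the total uncertainty, the average entropy of the members is the aleatoric part, and the difference is the epistemic part. The sketch below assumes this decomposition and hypothetical function names; the abstract does not state which measures the paper uses.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for one input into (total, aleatoric, epistemic).

    member_probs: one class-probability vector per ensemble member.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)                                        # predictive entropy
    aleatoric = sum(entropy(m) for m in member_probs) / n_members      # expected entropy
    epistemic = total - aleatoric                                      # member disagreement
    return total, aleatoric, epistemic


# Members that confidently disagree: all uncertainty is epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
# Members that agree on a 50/50 prediction: all uncertainty is aleatoric.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
print(disagree, agree)
```

A large epistemic term on inputs from a new writer group would be one signal of the domain shift the abstract mentions.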

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Felix Ott

Dr.

* Former Member

[118]

A. Khakzar, Y. Li, Y. Zhang, M. Sanisoglu, S. T. Kim, M. Rezaei, B. Bischl and N. Navab.
Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models.
IMLH @ICML 2022 - 2nd Workshop on Interpretable Machine Learning in Healthcare at the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 17-23, 2022. arXiv

Abstract

One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and the imbalanced problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and show up only by looking at the models through a lens.

MCML Authors

Ashkan Khakzar

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Yawei Li

Statistical Learning and Data Science

Mina Rezaei

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Nassir Navab

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Computer Aided Medical Procedures & Augmented Reality

[117]

S. Dandl, F. Pfisterer and B. Bischl.
Multi-Objective Counterfactual Fairness.
GECCO 2022 - Genetic and Evolutionary Computation Conference. Boston, MA, USA, Jul 09-13, 2022. DOI

Abstract

When machine learning is used to automate judgments, e.g. in areas like lending or crime prediction, incorrect decisions can lead to adverse effects for affected individuals. This occurs, e.g., if the data used to train these models is based on prior decisions that are unfairly skewed against specific subpopulations. If models should automate decision-making, they must account for these biases to prevent perpetuating or creating discriminatory practices. Counterfactual fairness audits models with respect to a notion of fairness that asks for equal outcomes between a decision made in the real world and a counterfactual world where the individual subject to a decision comes from a different protected demographic group. In this work, we propose a method to conduct such audits without access to the underlying causal structure of the data generating process by framing it as a multi-objective optimization task that can be efficiently solved using a genetic algorithm.

MCML Authors

Susanne Dandl

Dr.

* Former Member

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[116]

L. Schneider, F. Pfisterer, J. Thomas and B. Bischl.
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models.
GECCO 2022 - Genetic and Evolutionary Computation Conference. Boston, MA, USA, Jul 09-13, 2022. DOI

Abstract

The goal of Quality Diversity Optimization is to generate a collection of diverse yet high-performing solutions to a given problem at hand. Typical benchmark problems are, for example, finding a repertoire of robot arm configurations or a collection of game playing strategies. In this paper, we propose a set of Quality Diversity Optimization problems that tackle hyperparameter optimization of machine learning models - a so far underexplored application of Quality Diversity Optimization. Our benchmark problems involve novel feature functions, such as interpretability or resource usage of models. To allow for fast and efficient benchmarking, we build upon YAHPO Gym, a recently proposed open source benchmarking suite for hyperparameter optimization that makes use of high performing surrogate models and returns these surrogate model predictions instead of evaluating the true expensive black box function. We present results of an initial experimental study comparing different Quality Diversity optimizers on our benchmark problems. Furthermore, we discuss future directions and challenges of Quality Diversity Optimization in the context of hyperparameter optimization.

MCML Authors

Lennart Schneider

Statistical Learning and Data Science

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

Statistical Learning and Data Science

[115]

M. Mittermeier, M. Weigert, D. Rügamer, H. Küchenhoff and R. Ludwig.
A deep learning based classification of atmospheric circulation types over Europe: projection of future changes in a CMIP6 large ensemble.
Environmental Research Letters 17.8 (Jul. 2022). DOI

Abstract

High- and low-pressure systems of the large-scale atmospheric circulation in the mid-latitudes drive European weather and climate. Potential future changes in the occurrence of circulation types are highly relevant for society. Classifying the highly dynamic atmospheric circulation into discrete classes of circulation types helps to categorize the linkages between atmospheric forcing and surface conditions (e.g. extreme events). Previous studies have revealed a high internal variability of projected changes of circulation types. Dealing with this high internal variability requires the employment of a single-model initial-condition large ensemble (SMILE) and an automated classification method, which can be applied to large climate data sets. Among the most established classifications in Europe are the 29 subjective circulation types called Grosswetterlagen by Hess & Brezowsky (HB circulation types). We developed, in the first analysis of its kind, an automated version of this subjective classification using deep learning. Our classifier reaches an overall accuracy of 41.1% on the test sets of nested cross-validation. It outperforms the state-of-the-art automatization of the HB circulation types in 20 of the 29 classes. We apply the deep learning classifier to the SMHI-LENS, a SMILE of the Coupled Model Intercomparison Project phase 6, composed of 50 members of the EC-Earth3 model under the SSP3-7.0 scenario. For the analysis of future frequency changes of the 29 circulation types, we use the signal-to-noise ratio to discriminate the climate change signal from the noise of internal variability. Using a 5%-significance level, we find significant frequency changes in 69% of the circulation types when comparing the future (2071–2100) to a reference period (1991–2020).

MCML Authors

Maximilian Weigert

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

[114]

M. Schneble and G. Kauermann.
Intensity Estimation on Geometric Networks with Penalized Splines.
Annals of Applied Statistics 16.2 (Jun. 2022). DOI

Abstract

In the past decades the growing amount of network data has led to many novel statistical models. In this paper we consider so-called geometric networks. Typical examples are road networks or other infrastructure networks. Nevertheless, the neurons or the blood vessels in a human body can also be interpreted as a geometric network embedded in a three-dimensional space. A network-specific metric, rather than the Euclidean metric, is usually used in all these applications, making the analyses of network data challenging. We consider network-based point processes, and our task is to estimate the intensity (or density) of the process which allows us to detect high- and low-intensity regions of the underlying stochastic processes. Available routines that tackle this problem are commonly based on kernel smoothing methods. This paper uses penalized spline smoothing and extends this toward smooth intensity estimation on geometric networks. Furthermore, our approach easily allows incorporating covariates, enabling us to respect the network geometry in a regression model framework. Several data examples and a simulation study show that penalized spline-based intensity estimation on geometric networks is a numerically stable and efficient tool. Furthermore, it also allows estimating linear and smooth covariate effects, distinguishing our approach from already existing methodologies.

MCML Authors

Göran Kauermann

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Applied Statistics in Social Sciences, Economics and Business

[113]

Q. Au, J. Herbinger, C. Stachl, B. Bischl and G. Casalicchio.
Grouped Feature Importance and Combined Features Effect Plot.
Data Mining and Knowledge Discovery 36 (Jun. 2022). DOI

Abstract

Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effect of feature groups. To address this research gap, we provide a comprehensive overview of how existing model-agnostic techniques can be defined for feature groups to assess the grouped feature importance, focusing on permutation-based, refitting, and Shapley-based methods. We also introduce an importance-based sequential procedure that identifies a stable and well-performing combination of features in the grouped feature space. Furthermore, we introduce the combined features effect plot, which is a technique to visualize the effect of a group of features based on a sparse, interpretable linear combination of features. We used simulation studies and real data examples to analyze, compare, and discuss these methods.
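The permutation-based variant of grouped feature importance mentioned above can be sketched as follows: all columns in a group are permuted jointly (with the same row order), and the importance is the resulting average increase in loss. This is a minimal sketch under assumptions, not the paper's code: the model is any callable on a feature row, the loss is mean squared error, and the function names are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model on a list of feature rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def grouped_permutation_importance(model, rows, targets, group, n_repeats=10, seed=0):
    """Average loss increase when the columns in `group` are permuted jointly."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        permuted = []
        for i, row in enumerate(rows):
            new_row = list(row)
            # Take every grouped feature from the same donor row so the
            # dependence structure inside the group is preserved.
            for j in group:
                new_row[j] = rows[order[i]][j]
            permuted.append(new_row)
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats


# Toy data: the model only uses feature 0; features 1 and 2 form an unused group.
rows = [[float(i), float(i % 3), float(-i)] for i in range(8)]
model = lambda r: 2.0 * r[0]
targets = [model(r) for r in rows]
used = grouped_permutation_importance(model, rows, targets, group=[0])
unused = grouped_permutation_importance(model, rows, targets, group=[1, 2])
print(used, unused)
```

Permuting the group jointly, rather than one feature at a time, measures what the model loses when the whole group becomes uninformative.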

MCML Authors

Julia Herbinger

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[112]

S. Kevork and G. Kauermann.
Bipartite Exponential Random Graph Models with Nodal Random Effects.
Social Networks 70 (Jun. 2022). DOI

Abstract

We examine the inclusion of specific nodal random effects for first- and second-mode nodes in an ERGM for bipartite networks. The inclusion of such node-specific random effects in the ERGM accounts for unobserved heterogeneity in the bipartite network and ensures stable estimation results, especially for large-scale bipartite networks. Moreover, the predicted nodal random effects offer a reasonable interpretation that helps to understand the network behavior. The estimation is carried out by an iterative estimation technique, iterating between pseudolikelihood estimation for the nodal random effects and maximum likelihood estimation for the network parameters.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[111]

P. Kopper, S. Wiegrebe, B. Bischl, A. Bender and D. Rügamer.
DeepPAMM: Deep Piecewise Exponential Additive Mixed Models for Complex Hazard Structures in Survival Analysis.
PAKDD 2022 - 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Chengdu, China, May 16-19, 2022. DOI

Abstract

Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes and is prevalent in many domains, particularly biomedical applications. Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions, concealed by truncation and censoring processes. The piecewise exponential additive mixed model (PAMM) is a model class addressing many of these challenges, yet PAMMs are not applicable in high-dimensional feature settings or in the case of unstructured or multimodal data. We unify existing approaches by proposing DeepPAMM, a versatile deep learning framework that is well-founded from a statistical point of view, yet with enough flexibility for modeling complex hazard structures. We illustrate that DeepPAMM is competitive with other machine learning approaches with respect to predictive performance while maintaining interpretability through benchmark experiments and an extended case study.
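The piecewise exponential idea behind PAMMs is that the hazard is constant within each time interval, so the survival probability follows from summing hazard times exposure over the intervals. The sketch below shows only this building block with hypothetical names; in a PAMM or DeepPAMM the interval hazards would themselves be modeled from features, which is not shown here.

```python
import math


def cumulative_hazard(t, boundaries, hazards):
    """Cumulative hazard at time t for a piecewise-constant hazard.

    boundaries: increasing interval limits, one more entry than hazards.
    hazards: constant hazard rate within each interval.
    """
    if len(boundaries) != len(hazards) + 1:
        raise ValueError("need exactly one more boundary than hazard rates")
    if not boundaries[0] <= t <= boundaries[-1]:
        raise ValueError("t lies outside the modeled time range")
    total = 0.0
    for start, end, rate in zip(boundaries[:-1], boundaries[1:], hazards):
        exposure = max(0.0, min(t, end) - start)  # time spent in this interval
        total += rate * exposure
    return total


def survival(t, boundaries, hazards):
    """Survival probability S(t) = exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t, boundaries, hazards))


# Hazard 0.5 on [0, 1) and 1.0 on [1, 2]: H(1.5) = 0.5 * 1 + 1.0 * 0.5 = 1.
s = survival(1.5, [0.0, 1.0, 2.0], [0.5, 1.0])
print(s)
```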

MCML Authors

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistics, Data Science and Machine Learning

[110]

T. Ullmann, C. Hennig and A.-L. Boulesteix.
Validation of cluster analysis results on validation data: A systematic framework.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12.3 (May 2022). DOI

Abstract

Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To assess the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic, structured review of the existing literature about this topic. For this purpose, we outline a formal framework that covers most existing approaches for validating clustering results on validation data. In particular, we review classical validation techniques such as internal and external validation, stability analysis, and visual validation, and show how they can be interpreted in terms of our framework. We define and formalize different types of validation of clustering results on a validation dataset, and give examples of how clustering studies from the applied literature that used a validation dataset can be seen as instances of our framework.

MCML Authors

Theresa Ullmann

Dr.

* Former Member

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[109]

D. Rügamer.
Additive Higher-Order Factorization Machines.
Preprint (May 2022). arXiv

Abstract

In the age of big data and interpretable machine learning, approaches need to work at scale and at the same time allow for a clear mathematical understanding of the method’s inner workings. While there exist inherently interpretable semi-parametric regression techniques for large-scale applications to account for non-linearity in the data, their model complexity is still often restricted. One of the main limitations is the absence of interactions in these models, which are omitted for the sake of better interpretability, but also due to untenable computational costs. To address this shortcoming, we derive a scalable high-order tensor product spline model using a factorization approach. Our method allows including all (higher-order) interactions of non-linear feature effects while having computational costs proportional to a model without interactions. We prove both theoretically and empirically that our method scales notably better than existing approaches, derive meaningful penalization schemes and also discuss further theoretical aspects. We finally investigate predictive and estimation performance both with synthetic and real data.
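The reason a factorization approach can include all interactions at roughly the cost of a model without them is easiest to see in the classic second-order factorization machine, where the sum over all feature pairs collapses into sums over single features. The sketch below shows only that standard identity, with hypothetical names; the paper's higher-order tensor product spline construction is not reproduced here.

```python
def pairwise_bruteforce(x, factors):
    """Sum over all pairs i < j of <v_i, v_j> * x_i * x_j (quadratic in d)."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(factors[i], factors[j]))
            total += dot * x[i] * x[j]
    return total


def pairwise_factorized(x, factors):
    """Same quantity in time linear in d, via the factorization identity.

    For each latent dimension f:
        sum_{i<j} v_if v_jf x_i x_j
            = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
    """
    n_latent = len(factors[0])
    total = 0.0
    for f in range(n_latent):
        terms = [factors[i][f] * x[i] for i in range(len(x))]
        linear_sum = sum(terms)
        squared_sum = sum(term * term for term in terms)
        total += 0.5 * (linear_sum * linear_sum - squared_sum)
    return total


# Three features with two-dimensional latent factor vectors (toy values).
x = [1.0, 2.0, 3.0]
factors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(pairwise_bruteforce(x, factors), pairwise_factorized(x, factors))
```

Both functions return the same value; only the second avoids the explicit loop over feature pairs.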

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

[108]

J. Herbinger, B. Bischl and G. Casalicchio.
REPID: Regional Effect Plots with implicit Interaction Detection.
AISTATS 2022 - 25th International Conference on Artificial Intelligence and Statistics. Virtual, Mar 28-30, 2022. URL

Abstract

Machine learning models can automatically learn complex relationships, such as non-linear and interaction effects. Interpretable machine learning methods such as partial dependence plots visualize marginal feature effects but may lead to misleading interpretations when feature interactions are present. Hence, employing additional methods that can detect and measure the strength of interactions is paramount to better understand the inner workings of machine learning models. We demonstrate several drawbacks of existing global interaction detection approaches, characterize them theoretically, and evaluate them empirically. Furthermore, we introduce regional effect plots with implicit interaction detection, a novel framework to detect interactions between a feature of interest and other features. The framework also quantifies the strength of interactions and provides interpretable and distinct regions in which feature effects can be interpreted more reliably, as they are less confounded by interactions. We prove the theoretical eligibility of our method and show its applicability on various simulation and real-world examples.
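The claim that partial dependence plots can mislead under interactions can be illustrated with a toy case. The sketch assumes a hypothetical model f(x1, x2) = x1 * x2 and data in which x2 is symmetric around zero: the individual (ICE) curves have opposite slopes, so their average, the partial dependence curve, is flat even though x1 clearly matters. This illustrates the problem only, not the REPID method.

```python
def ice_curves(model, rows, feature, grid):
    """One curve per observation: predictions as `feature` is set to each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, rows, feature, grid):
    """Pointwise average of the ICE curves."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


# Pure interaction model; the second feature takes the values -1 and +1.
model = lambda r: r[0] * r[1]
rows = [[0.0, -1.0], [0.0, 1.0]]
grid = [0.0, 1.0, 2.0]

ice = ice_curves(model, rows, 0, grid)
pd_curve = partial_dependence(model, rows, 0, grid)
print(ice)       # slopes -1 and +1
print(pd_curve)  # flat at zero
```

Splitting the data on the second feature would yield two regions in which the averaged effect matches the individual curves, which is the intuition behind regional effect plots.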

MCML Authors

Julia Herbinger

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[107]

F. Pargent, F. Pfisterer, J. Thomas and B. Bischl.
Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features.
Computational Statistics 37 (Mar. 2022). DOI

Abstract

Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect in data analysis. A common problem is high-cardinality features, i.e. unordered categorical predictor variables with a high number of levels. We study techniques that yield numeric representations of categorical variables which can then be used in subsequent ML applications. We focus on the impact of these techniques on a subsequent algorithm’s predictive performance, and, if possible, derive best practices on when to use which technique. We conducted a large-scale benchmark experiment, where we compared different encoding strategies together with five ML algorithms (lasso, random forest, gradient boosting, k-nearest neighbors, support vector machine) using datasets from regression, binary and multiclass classification settings. In our study, regularized versions of target encoding (i.e. using target predictions based on the feature levels in the training set as a new numerical feature) consistently provided the best results. Traditionally widely used encodings that make unreasonable assumptions to map levels to integers (e.g. integer encoding) or to reduce the number of levels (possibly based on target information, e.g. leaf encoding) before creating binary indicator variables (one-hot or dummy encoding) were not as effective in comparison.
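As a rough illustration of the idea behind regularized target encoding, the following Python sketch replaces each level of a categorical feature by a mean of the training target that is shrunk toward the global mean. The function names, the smoothing weight `m`, and the toy data are assumptions made for this example; simple shrinkage is only one possible form of regularization and is not claimed to be the variant benchmarked in the paper.

```python
from collections import defaultdict


def smoothed_target_encode(levels, targets, m=5.0):
    """Map each categorical level to a shrunken mean of the target.

    Hypothetical helper: `m` is a smoothing weight that pulls rare
    levels toward the global mean (an assumption of this sketch).
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, targets):
        sums[level] += y
        counts[level] += 1
    # Shrinkage estimate: (n * level_mean + m * global_mean) / (n + m)
    encoding = {
        level: (sums[level] + m * global_mean) / (counts[level] + m)
        for level in counts
    }
    return encoding, global_mean


def apply_encoding(levels, encoding, global_mean):
    """Encode levels; levels unseen in training get the global mean."""
    return [encoding.get(level, global_mean) for level in levels]


train_levels = ["a", "a", "a", "b", "c", "c"]
train_targets = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
encoding, global_mean = smoothed_target_encode(train_levels, train_targets, m=2.0)
encoded_test = apply_encoding(["a", "b", "z"], encoding, global_mean)
```

The encoding is fitted on training data only and then applied to new data, so that a level observed once (such as "b") does not simply reproduce its single target value.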

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Janek Thomas

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[106]

D. Strieder and M. Drton.
On the choice of the splitting ratio for the split likelihood ratio test.
Electronic Journal of Statistics 16.2 (Mar. 2022). DOI

Abstract

The recently introduced framework of universal inference provides a new approach to constructing hypothesis tests and confidence regions that are valid in finite samples and do not rely on any specific regularity assumptions on the underlying statistical model. At the core of the methodology is a split likelihood ratio statistic, which is formed under data splitting and compared to a cleverly selected universal critical value. As this critical value can be very conservative, it is interesting to mitigate the potential loss of power by careful choice of the ratio according to which data are split. Motivated by this problem, we study the split likelihood ratio test under local alternatives and introduce the resulting class of noncentral split chi-square distributions. We investigate the properties of this new class of distributions and use it to numerically examine and propose an optimal choice of the data splitting ratio for tests of composite hypotheses of different dimensions.
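To make the construction concrete, here is a minimal Python sketch of a split likelihood ratio test for the simple null hypothesis that a Gaussian mean equals `mu0`, with known unit variance. The threshold 1/alpha is the universal critical value used in universal inference; the function name, the Gaussian model, and the parameter `split_ratio` are assumptions chosen for illustration and do not cover the composite hypotheses studied in the paper.

```python
import math
import random


def split_lrt_gaussian_mean(data, mu0=0.0, alpha=0.05, split_ratio=0.5):
    """Split likelihood ratio test of H0: mean == mu0 for N(mean, 1) data.

    Hypothetical helper: `split_ratio` is the fraction of observations
    used to evaluate the likelihood ratio (D0); the remainder (D1) is
    used to estimate the mean under the alternative.
    """
    n0 = int(round(split_ratio * len(data)))
    if n0 == 0 or n0 == len(data):
        raise ValueError("both parts of the split must be non-empty")
    d0, d1 = data[:n0], data[n0:]
    mu_hat1 = sum(d1) / len(d1)  # estimate from D1 only

    # log L0(mu_hat1) - log L0(mu0), evaluated on D0 (unit variance)
    log_ratio = sum(
        0.5 * (x - mu0) ** 2 - 0.5 * (x - mu_hat1) ** 2 for x in d0
    )
    threshold = math.log(1.0 / alpha)  # universal critical value 1/alpha
    return log_ratio, log_ratio > threshold


rng = random.Random(0)
null_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted_sample = [rng.gauss(1.0, 1.0) for _ in range(200)]

stat_null, reject_null = split_lrt_gaussian_mean(null_sample)
stat_alt, reject_alt = split_lrt_gaussian_mean(shifted_sample)
```

Varying `split_ratio` changes how much data is spent on estimation versus evaluation, which is the trade-off the abstract refers to when it speaks of choosing the splitting ratio.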

MCML Authors

David Strieder

Mathematical Statistics

Mathias Drton

Prof. Dr.

Mathematical Statistics

[105]

C. Fritz, E. Dorigatti and D. Rügamer.
Combining Graph Neural Networks and Spatio-temporal Disease Models to Predict COVID-19 Cases in Germany.
Scientific Reports 12.3930 (Mar. 2022). DOI

Abstract

During 2020, the infection rate of COVID-19 has been investigated by many scholars from different research fields. In this context, reliable and interpretable forecasts of disease incidents are a vital tool for policymakers to manage healthcare resources. In addition, several experts have stressed the need to account for human mobility to explain the spread of COVID-19. Existing approaches often apply standard models of the respective research field, frequently restricting modeling possibilities. For instance, most statistical or epidemiological models cannot directly incorporate unstructured data sources, including relational data that may encode human mobility. In contrast, machine learning approaches may yield better predictions by exploiting these data structures yet lack intuitive interpretability as they are often categorized as black-box models. We propose a combination of both research directions and present a multimodal learning framework that amalgamates statistical regression and machine learning models for predicting local COVID-19 cases in Germany. Results and implications: the novel approach introduced enables the use of a richer collection of data types, including mobility flows and colocation probabilities, and yields the lowest mean squared error scores throughout the observational period in the reported benchmark study. The results corroborate that during most of the observational period more dispersed meeting patterns and a lower percentage of people staying put are associated with higher infection rates. Moreover, the analysis underpins the necessity of including mobility data and showcases the flexibility and interpretability of the proposed approach.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Emilio Dorigatti

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[104]

C. Nießl, M. Herrmann, C. Wiedemann, G. Casalicchio and A.-L. Boulesteix.
Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12.2 (Mar. 2022). DOI

Abstract

In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness of this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices leads to biased interpretations of benchmark results and to over-optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.

MCML Authors

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[103]

F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Cross-Modal Common Representation Learning with Triplet Loss Functions.
Preprint (Mar. 2022). DOI

Abstract

Common representation learning (CRL) learns a shared embedding between two or more modalities to improve performance on a given task compared with using only one of the modalities. CRL from different data types such as images and time-series data (e.g., audio or text data) requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the triplet loss, which uses positive and negative identities to create sample pairs with different labels, for CRL between image and time-series modalities. By adapting the triplet loss for CRL, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information of the auxiliary (image classification) task. Our experiments on synthetic data and handwriting recognition data from sensor-enhanced pens show improved classification accuracy, faster convergence, and better generalizability.
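For readers unfamiliar with the triplet loss, a minimal Python sketch of its standard margin-based form is given here. The function names, the Euclidean distance, the margin value, and the toy embeddings are assumptions of this example; it does not reproduce the cross-modal adaptation proposed in the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Toy embeddings (hypothetical): the anchor could come from one modality,
# the positive and negative samples from the other.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # same label as the anchor
negative = [3.0, 4.0]  # different label

# The negative is already farther away than the positive by more than
# the margin, so this triplet contributes no loss.
loss_easy = triplet_loss(anchor, positive, negative)

# With the roles swapped, the "positive" is far and the "negative" is near.
loss_hard = triplet_loss(anchor, negative, positive)
```

The loss is zero once the negative sample is at least `margin` farther from the anchor than the positive sample, so only triplets that violate this ordering drive the embedding.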

MCML Authors

Felix Ott

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[102]

M. Rezaei, J. J. Näppi, B. Bischl and H. Yoshida.
Bayesian uncertainty estimation for detection of long-tail and unseen conditions in abdominal images.
SPIE 2022 - SPIE Medical Imaging: Computer-Aided Diagnosis. San Diego, CA, USA, Feb 20-Mar 28, 2022. DOI

Abstract

Deep supervised learning provides an effective approach for developing robust models for various computer-aided diagnosis tasks. However, the underlying assumption is that the frequency of the samples between the different classes of the training dataset is similar or balanced. In real-world medical data, the positive classes often occur too infrequently to satisfy this assumption. Thus, there is an unmet need for deep learning systems that could automatically identify and adapt to the real-world conditions of imbalanced data. In this paper, we propose a novel Bayesian deep ensemble learning framework to address the problem of the representation learning of long-tailed and out-of-distribution samples in medical images. By estimating the relative uncertainties of the input data, our framework is able to adapt to the imbalanced data for learning generalizable classifiers. To evaluate the framework, we trained and tested it on two public medical imaging datasets that consist of different imbalance ratios and imaging modalities. Our results on the semantic segmentation of high-resolution CT and MRI images achieved 0.93% recall, which represents a 3% relative improvement over previous state-of-the-art ensemble GANs in the handling of the associated long-tailed data and detection of out-of-distribution samples.

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[101]

M. Rezaei, J. J. Näppi, B. Bischl and H. Yoshida.
Deep mutual GANs: representation learning from multiple experts.
SPIE 2022 - SPIE Medical Imaging: Imaging Informatics for Healthcare, Research, and Applications. San Diego, CA, USA, Feb 20-Mar 28, 2022. DOI

Abstract

Representation learning is one of the canonical objectives of most deep learning models. However, the learning of real-world clinical data is often compromised by their inherently imbalanced or long-tailed distribution wherein a few classes have significantly larger numbers of training instances than do the other classes. In this study, we investigated the representation learning of such long-tailed data distributions by the use of a deep mutual ensemble generative adversarial network. Our proposed framework consists of multiple powerful pre-trained discriminator networks that transfer knowledge to multiple individual untrained generator networks. During the training process, each generator learns to collaborate with the other generators. Additionally, each generator receives feedback from the individual discriminators in an adversarial manner. In particular, we explored the use of mutual information shared between the independent generators that makes our framework robust against misclassification of long-tailed data distributions in medical image analysis. We evaluated our proposed framework on four public datasets that represented different medical imaging modalities and imbalance ratios. Our experimental results show that our proposed framework benefits from ensemble learning and shared mutual learning, and achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over traditional deep learning in real-world applications.

MCML Authors

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[100]

G. De Nicola, B. Sischka and G. Kauermann.
Mixture Models and Networks: The Stochastic Block Model.
Statistical Modelling 22.1-2 (Feb. 2022). DOI

Abstract

Mixture models are probabilistic models aimed at uncovering and representing latent subgroups within a population. In the realm of network data analysis, the latent subgroups of nodes are typically identified by their connectivity behaviour, with nodes behaving similarly belonging to the same community. In this context, mixture modelling is pursued through stochastic blockmodelling. We consider stochastic blockmodels and some of their variants and extensions from a mixture modelling perspective. We also explore some of the main classes of estimation methods available and propose an alternative approach based on the reformulation of the blockmodel as a graphon. In addition to the discussion of inferential properties and estimating procedures, we focus on the application of the models to several real-world network datasets, showcasing the advantages and pitfalls of different approaches.
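The basic model discussed in this abstract can be sketched in a few lines of Python: each node carries a latent community label, and the probability of an edge between two nodes depends only on their labels. The function name `sample_sbm`, the block sizes, and the probability matrix are assumptions of this example; estimation, which is the focus of the paper, is not shown.

```python
import random


def sample_sbm(block_sizes, block_probs, seed=0):
    """Sample an undirected stochastic block model without self-loops.

    Returns the community label of each node and a symmetric 0/1
    adjacency matrix (as nested lists).
    """
    rng = random.Random(seed)
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on the two community labels.
            p = block_probs[labels[i]][labels[j]]
            if rng.random() < p:
                adjacency[i][j] = adjacency[j][i] = 1
    return labels, adjacency


# Two communities with dense within-block and sparse between-block edges
# (hypothetical values).
labels, adjacency = sample_sbm([3, 2], [[0.9, 0.1], [0.1, 0.8]])
```

Viewed as a mixture model, the labels play the role of the latent component memberships and the block probability matrix plays the role of the component parameters.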

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[99]

F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Joint Classification and Trajectory Regression of Online Handwriting Using a Multi-Task Learning Approach.
WACV 2022 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2022. DOI

Abstract

Multivariate Time Series (MTS) classification is important in various applications such as signature verification, person identification, and motion recognition. In deep learning these classification tasks are usually learned using the cross-entropy loss. A related yet different task is predicting trajectories observed as MTS. Important use cases include handwriting reconstruction, shape analysis, and human pose estimation. The goal is to align an arbitrary dimensional time series with its ground truth as accurately as possible while reducing the error in the prediction with a distance loss and the variance with a similarity loss. Although learning both losses with Multi-Task Learning (MTL) helps to improve trajectory alignment, learning often remains difficult as both tasks are contradictory. We propose a novel neural network architecture for MTL that notably improves the MTS classification and trajectory regression performance in online handwriting (OnHW) recognition. We achieve this by jointly learning the cross-entropy loss in combination with distance and similarity losses. On an OnHW task of handwritten characters with multivariate inertial and visual data inputs we are able to achieve crucial improvements (lower error with less variance) of trajectory prediction while still improving the character classification accuracy in comparison to models trained on the individual tasks.

MCML Authors

Felix Ott

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[98]

J. Goldsmith and F. Scheipl.
tf: S3 classes and methods for tidy functional data. R package.
2022. GitHub

Abstract

The goal of tidyfun is to provide accessible and well-documented software that makes functional data analysis in R easy, specifically data wrangling and exploratory analysis.

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

[97]

J. Goldsmith and F. Scheipl.
tidyfun: Clean, wholesome, tidy fun with functional data in R. R package.
2022. GitHub

Abstract

The goal of tidyfun is to provide accessible and well-documented software that makes functional data analysis in R easy, specifically data wrangling and exploratory analysis.

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

[96]

W. Hartl, P. Kopper, A. Bender, F. Scheipl, A. G. Day, G. Elke and H. Küchenhoff.
Protein intake and outcome of critically ill patients: analysis of a large international database using piece-wise exponential additive mixed models.
Critical Care 26.7 (Jan. 2022). DOI

Abstract

Background: Proteins are an essential part of medical nutrition therapy in critically ill patients. Guidelines almost universally recommend a high protein intake without robust evidence supporting its use.
Methods: Using a large international database, we modelled associations between the hazard rate of in-hospital death and live hospital discharge (competing risks) and three categories of protein intake (low: < 0.8 g/kg per day, standard: 0.8–1.2 g/kg per day, high: > 1.2 g/kg per day) during the first 11 days after ICU admission (acute phase). Time-varying cause-specific hazard ratios (HR) were calculated from piece-wise exponential additive mixed models. We used the estimated model to compare five different hypothetical protein diets (an exclusively low protein diet, a standard protein diet administered early (day 1 to 4) or late (day 5 to 11) after ICU admission, and an early or late high protein diet).
Results: Of 21,100 critically ill patients in the database, 16,489 fulfilled inclusion criteria for the analysis. By day 60, 11,360 (68.9%) patients had been discharged from hospital, 4,192 patients (25.4%) had died in hospital, and 937 patients (5.7%) were still hospitalized. Median daily low protein intake was 0.49 g/kg [IQR 0.27–0.66], standard intake 0.99 g/kg [IQR 0.89–1.09], and high intake 1.41 g/kg [IQR 1.29–1.60]. In comparison with an exclusively low protein diet, a late standard protein diet was associated with a lower hazard of in-hospital death: minimum HR 0.75 (95% CI 0.64, 0.87), and a higher hazard of live hospital discharge: maximum HR 1.98 (95% CI 1.72, 2.28). Results on hospital discharge, however, were qualitatively changed by a sensitivity analysis. There was no evidence that an early standard or a high protein intake during the acute phase was associated with a further improvement of outcome.
Conclusions: Provision of a standard protein intake during the late acute phase may improve outcome compared to an exclusively low protein diet. In unselected critically ill patients, clinical outcome may not be improved by a high protein intake during the acute phase.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Fabian Scheipl

PD Dr.

Functional Data Analysis

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

[95]

F. Ott, D. Rügamer, L. Heublein, T. Hamann, J. Barth, B. Bischl and C. Mutschler.
Benchmarking online sequence-to-sequence and character-based handwriting recognition from IMU-enhanced pens.
International Journal on Document Analysis and Recognition 25.4 (2022). DOI

Abstract

Handwriting is one of the most frequently occurring patterns in everyday life and with it come challenging applications such as handwriting recognition (HWR), writer identification and signature verification. In contrast to offline HWR that only uses spatial information (i.e., images), online HWR (OnHWR) uses richer spatio-temporal information (i.e., trajectory data or inertial data). While there exist many offline HWR datasets, only little data is available for the development of OnHWR methods on paper as it requires hardware-integrated pens. This paper presents data and benchmark models for real-time sequence-to-sequence learning and single character-based recognition. Our data are recorded by a sensor-enhanced ballpoint pen, yielding sensor data streams from triaxial accelerometers, a gyroscope, a magnetometer and a force sensor at 100 Hz. We propose a variety of datasets including equations and words for both the writer-dependent and writer-independent tasks. Our datasets allow a comparison between classical OnHWR on tablets and on paper with sensor-enhanced pens. We provide an evaluation benchmark for seq2seq and single character-based HWR using recurrent and temporal convolutional networks and transformers combined with a connectionist temporal classification (CTC) loss and cross-entropy (CE) losses. Our convolutional network combined with BiLSTMs outperforms transformer-based architectures, is on par with InceptionTime for sequence-based classification tasks and yields better results compared to 28 state-of-the-art techniques. Time-series augmentation methods improve the sequence-based task, and we show that CE variants can improve the single classification task. Our implementations together with the large benchmark of state-of-the-art techniques of novel OnHWR datasets serve as a baseline for future research in the area of OnHWR on paper.

MCML Authors

Felix Ott

Dr.

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[94]

C. Fritz and G. Kauermann.
On the Interplay of Regional Mobility, Social Connectedness, and the Spread of COVID-19 in Germany.
Journal of the Royal Statistical Society. Series A (Statistics in Society) 185.1 (Jan. 2022). DOI

Abstract

Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While research has shown that non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the relative regional mobility behaviour to assess the effect of human movement on the spread of COVID-19. In particular, we explore the impact of human mobility and social connectivity derived from Facebook activities on the weekly rate of new infections in Germany between 3 March and 22 June 2020. Our results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns. The extent of social distancing, quantified by the percentage of people staying put within a federal administrative district, has an overall negative effect on the incidence of infections. Additionally, our results show spatial infection patterns based on geographical as well as social distances.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[93]

A. Python, A. Bender, M. Blangiardo, J. B. Illian, Y. Lin, B. Liu, T. C. D. Lucas, S. Tan, Y. Wen, D. Svanidze and J. Yin.
A downscaling approach to compare COVID-19 count data from databases aggregated at different spatial scales.
Journal of the Royal Statistical Society. Series A (Statistics in Society) 185.1 (Jan. 2022). DOI

Abstract

As the COVID-19 pandemic continues to threaten various regions around the world, obtaining accurate and reliable COVID-19 data is crucial for governments and local communities aiming at rigorously assessing the extent and magnitude of the virus spread and deploying efficient interventions. Using data reported between January and February 2020 in China, we compared counts of COVID-19 from near-real-time spatially disaggregated data (city level) with fine-spatial scale predictions from a Bayesian downscaling regression model applied to a reference province-level data set. The results highlight discrepancies in the counts of coronavirus-infected cases at the district level and identify districts that may require further investigation.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[92]

E. Dorigatti, J. Goschenhofer, B. Schubert, M. Rezaei and B. Bischl.
Positive-Unlabeled Learning with Uncertainty-aware Pseudo-label Selection.
Preprint (Jan. 2022). arXiv

Abstract

Positive-unlabeled learning (PUL) aims at learning a binary classifier from only positive and unlabeled training data. Even though real-world applications often involve imbalanced datasets where the majority of examples belong to one class, most contemporary approaches to PUL do not investigate performance in this setting, thus severely limiting their applicability in practice. In this work, we thus propose to tackle the issues of imbalanced datasets and model calibration in a PUL setting through an uncertainty-aware pseudo-labeling procedure (PUUPL): by boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set, while explicit uncertainty quantification prevents the emergence of harmful confirmation bias, leading to increased predictive performance. Within a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings while also showing strong performance in balanced PU scenarios across recent baselines. We furthermore provide ablations and sensitivity analyses to shed light on PUUPL’s several ingredients. Finally, a real-world application with an imbalanced dataset confirms the advantage of our approach.

MCML Authors

Emilio Dorigatti

Dr.

* Former Member

Jann Goschenhofer

Dr.

* Former Member

Mina Rezaei

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[91]

J. Moosbauer, J. Herbinger, G. Casalicchio, M. Lindauer and B. Bischl.
Explaining Hyperparameter Optimization via Partial Dependence Plots.
NeurIPS 2021 - 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. URL GitHub

Abstract

Automated hyperparameter optimization (HPO) can support practitioners to obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of explainability makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO with Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, such as the partial dependence plot (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. We propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.

MCML Authors

Julia Moosbauer

Dr.

* Former Member

Julia Herbinger

Dr.

* Former Member

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[90]

B. Bischl, G. Casalicchio, M. Feurer, P. Gijsbers, F. Hutter, M. Lang, R. G. Mantovani, J. N. van Rijn and J. Vanschoren.
OpenML Benchmarking Suites.
NeurIPS 2021 - Track on Datasets and Benchmarks at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. URL

Abstract

Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites (a) are easy to use through standardized data formats, APIs, and client libraries; (b) come with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We then present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18). Finally, we discuss use cases and applications which demonstrate the usefulness of OpenML benchmarking suites and the OpenML-CC18 in particular.

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

Michel Lang

Dr.

* Former Member

[89]

Y. Zhang, A. Khakzar, Y. Li, A. Farshad, S. T. Kim and N. Navab.
Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information.
NeurIPS 2021 - 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. URL

Abstract

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network’s prediction. The predictive information of features was recently proposed as a proxy for the measure of their importance. So far, the predictive information has only been identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features’ information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.

MCML Authors

Ashkan Khakzar

Dr.

* Former Member

Yawei Li

Statistical Learning and Data Science

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

[88]

T. Weber, M. Ingrisch, M. Fabritius, B. Bischl and D. Rügamer.
Survival-oriented embeddings for improving accessibility to complex data structures.
NeurIPS 2021 - Workshop on Bridging the Gap: from Machine Learning Research to Clinical Practice at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. arXiv

Abstract

Deep learning excels in the analysis of unstructured data and recent advancements allow to extend these techniques to survival analysis. In the context of clinical radiology, this enables, e.g., to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and support clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients do usually accept black box models as reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[87]

T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation.
NeurIPS 2021 - Workshop on Deep Generative Models and Downstream Applications at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. PDF

Abstract

The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This allows to advance methods in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method HazardWalk that allows to model hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.

MCML Authors

Tobias Weber

* Former Member

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[86]

M. Mittermeier, M. Weigert and D. Rügamer.
Identifying the atmospheric drivers of drought and heat using a smoothed deep learning approach.
NeurIPS 2021 - Workshop on Tackling Climate Change with Machine Learning at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. PDF

Abstract

Europe was hit by several, disastrous heat and drought events in recent summers. Besides thermodynamic influences, such hot and dry extremes are driven by certain atmospheric situations including anticyclonic conditions. Effects of climate change on atmospheric circulations are complex and many open research questions remain in this context, e.g., on future trends of anticyclonic conditions. Based on the combination of a catalog of labeled circulation patterns and spatial atmospheric variables, we propose a smoothed convolutional neural network classifier for six types of anticyclonic circulations that are associated with drought and heat. Our work can help to identify important drivers of hot and dry extremes in climate simulations, which allows to unveil the impact of climate change on these drivers. We address various challenges inherent to circulation pattern classification that are also present in other climate patterns, e.g., subjective labels and unambiguous transition periods.

MCML Authors

Maximilian Weigert

* Former Member

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[85]

S. Kevork and G. Kauermann.
Iterative Estimation of Mixed Exponential Random Graph Models with Nodal Random Effects.
Network Science 9.4 (Dec. 2021). DOI

Abstract

The presence of unobserved node-specific heterogeneity in exponential random graph models (ERGM) is a general concern, both with respect to model validity as well as estimation instability. We, therefore, include node-specific random effects in the ERGM that account for unobserved heterogeneity in the network. This leads to a mixed model with parametric as well as random coefficients, labelled as mixed ERGM. Estimation is carried out by iterating between approximate pseudolikelihood estimation for the random effects and maximum likelihood estimation for the remaining parameters in the model. This approach provides a stable algorithm, which allows to fit nodal heterogeneity effects even for large scale networks. We also propose model selection based on the Akaike Information Criterion to check for node-specific heterogeneity.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[84]

C. Fritz, M. Mehrl, P. W. Thurner and G. Kauermann.
The Role of Governmental Weapons Procurements in Forecasting Monthly Fatalities in Intrastate Conflicts: A Semiparametric Hierarchical Hurdle Model.
International Interactions 48.4 (Nov. 2021). DOI

Abstract

Accurate and interpretable forecasting models predicting spatially and temporally fine-grained changes in the numbers of intrastate conflict casualties are of crucial importance for policymakers and international non-governmental organizations (NGOs). Using a count data approach, we propose a hierarchical hurdle regression model to address the corresponding prediction challenge at the monthly PRIO-grid level. More precisely, we model the intensity of local armed conflict at a specific point in time as a three-stage process. Stages one and two of our approach estimate whether we will observe any casualties at the country- and grid-cell-level, respectively, while stage three applies a regression model for truncated data to predict the number of such fatalities conditional upon the previous two stages. Within this modeling framework, we focus on the role of governmental arms imports as a processual factor allowing governments to intensify or deter from fighting. We further argue that a grid cell’s geographic remoteness is bound to moderate the effects of these military buildups. Out-of-sample predictions corroborate the effectiveness of our parsimonious and theory-driven model, which enables full transparency combined with accuracy in the forecasting process.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[83]

S. Hilbert, S. Coors, E. Kraus, B. Bischl, A. Lindl, M. Frei, J. Wild, S. Krauss, D. Goretzko and C. Stachl.
Machine learning for the educational sciences.
Review of Education 9.3 (Nov. 2021). DOI

Abstract

Machine learning (ML) provides a powerful framework for the analysis of high-dimensional datasets by modelling complex relationships, often encountered in modern data with many variables, cases and potentially non-linear effects. The impact of ML methods on research and practical applications in the educational sciences is still limited, but continuously grows, as larger and more complex datasets become available through massive open online courses (MOOCs) and large-scale investigations. The educational sciences are at a crucial pivot point, because of the anticipated impact ML methods hold for the field. To provide educational researchers with an elaborate introduction to the topic, we provide an instructional summary of the opportunities and challenges of ML for the educational sciences, show how a look at related disciplines can help learning from their experiences, and argue for a philosophical shift in model evaluation. We demonstrate how the overall quality of data analysis in educational research can benefit from these methods and show how ML can play a decisive role in the validation of empirical models. Specifically, we (1) provide an overview of the types of data suitable for ML and (2) give practical advice for the application of ML methods. In each section, we provide analytical examples and reproducible R code. Also, we provide an extensive Appendix on ML-based applications for education. This instructional summary will help educational scientists and practitioners to prepare for the promises and threats that come with the shift towards digitisation and large-scale assessment in education.

MCML Authors

Stefan Coors

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[82]

M. Herrmann and F. Scheipl.
A Geometric Perspective on Functional Outlier Detection.
Stats 4.4 (Nov. 2021). DOI

Abstract

We consider functional outlier detection from a geometric perspective, specifically: for functional datasets drawn from a functional manifold, which is defined by the data’s modes of variation in shape, translation, and phase. Based on this manifold, we developed a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed taxonomies. Our theoretical and experimental analyses demonstrated several important advantages of this perspective: it considerably improves theoretical understanding and allows describing and analyzing complex functional outlier scenarios consistently and in full generality, by differentiating between structurally anomalous outlier data that are off-manifold and distributionally outlying data that are on-manifold, but at its margins. This improves the practical feasibility of functional outlier detection: we show that simple manifold-learning methods can be used to reliably infer and visualize the geometric structure of functional datasets. We also show that standard outlier-detection methods requiring tabular data inputs can be applied to functional data very successfully by simply using their vector-valued representations learned from manifold learning methods as the input features. Our experiments on synthetic and real datasets demonstrated that this approach leads to outlier detection performances at least on par with existing functional-data-specific methods in a large variety of settings, without the highly specialized, complex methodology and narrow domain of application these methods often entail.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Fabian Scheipl

PD Dr.

Functional Data Analysis

[81]

A. Khakzar, Y. Zhang, W. Mansour, Y. Cai, Y. Li, Y. Zhang, S. T. Kim and N. Navab.
Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features.
MICCAI 2021 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France, Sep 27-Oct 01, 2021. DOI GitHub

Abstract

Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. In order to establish trust in the clinical routine, the networks’ prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution. Feature attribution methods identify the importance of input features for the output prediction. Building on Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network’s output. Original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-agnostic feature importance metrics on NIH Chest X-ray8 and BrixIA datasets.

MCML Authors

Ashkan Khakzar

Dr.

* Former Member

Yawei Li

Statistical Learning and Data Science

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

[80]

S. Coors, D. Schalk, B. Bischl and D. Rügamer.
Automatic Componentwise Boosting: An Interpretable AutoML System.
ADS @ECML-PKDD 2021 - Automating Data Science Workshop at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2021). Virtual, Sep 13-17, 2021. arXiv

Abstract

In practice, machine learning (ML) workflows require various different steps, from data preprocessing, missing value imputation, model selection, to model tuning as well as model evaluation. Many of these steps rely on human ML experts. AutoML - the field of automating these ML pipelines - tries to help practitioners to apply ML off-the-shelf without any expert knowledge. Most modern AutoML systems like auto-sklearn, H2O-AutoML or TPOT aim for high predictive performance, thereby generating ensembles that consist almost exclusively of black-box models. This, in turn, makes the interpretation for the layperson more intricate and adds another layer of opacity for users. We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions, allows for a straightforward calculation of feature importance, and gives insights into the required model complexity to fit the given task. We introduce the general framework and outline its implementation autocompboost. To demonstrate the framework's efficacy, we compare autocompboost to other existing systems based on the OpenML AutoML-Benchmark. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets while being more user-friendly and transparent.

MCML Authors

Stefan Coors

* Former Member

Daniel Schalk

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[79]

R. Sonabend, F. J. Király, A. Bender, B. Bischl and M. Lang.
mlr3proba: An R Package for Machine Learning in Survival Analysis.
Bioinformatics 37.17 (Sep. 2021). DOI

Abstract

In tasks like node classification, image segmentation, and named-entity recognition we have a classifier that simultaneously outputs multiple predictions (a vector of labels) based on a single input, i.e. a single graph, image, or document respectively. Existing adversarial robustness certificates consider each prediction independently and are thus overly pessimistic for such tasks. They implicitly assume that an adversary can use different perturbed inputs to attack different predictions, ignoring the fact that we have a single shared input. We propose the first collective robustness certificate which computes the number of predictions that are simultaneously guaranteed to remain stable under perturbation, i.e. cannot be attacked. We focus on Graph Neural Networks and leverage their locality property - perturbations only affect the predictions in a close neighborhood - to fuse multiple single-node certificates into a drastically stronger collective certificate. For example, on the Citeseer dataset our collective certificate for node classification increases the average number of certifiable feature perturbations from 7 to 351.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Michel Lang

Dr.

* Former Member

[78]

C. Fritz, P. W. Thurner and G. Kauermann.
Separable and Semiparametric Network-based Counting Processes applied to the International Combat Aircraft Trades.
Network Science 9.3 (Sep. 2021). DOI

Abstract

We propose a novel tie-oriented model for longitudinal event network data. The generating mechanism is assumed to be a multivariate Poisson process that governs the onset and repetition of yearly observed events with two separate intensity functions. We apply the model to a network obtained from the yearly dyadic number of international deliveries of combat aircraft trades between 1950 and 2017. Based on the trade gravity approach, we identify economic and political factors impeding or promoting the number of transfers. Extensive dynamics as well as country heterogeneities require the specification of semiparametric time-varying effects as well as random effects. Our findings reveal strong heterogeneous as well as time-varying effects of endogenous and exogenous covariates on the onset and repetition of aircraft trade events.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

[77]

F. Soleymani, M. Eslami, T. Elze, B. Bischl and M. Rezaei.
Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images.
Preprint (Sep. 2021). arXiv GitHub

Abstract

We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder; and optimizes cluster labels assignment. Here, the learned multivariate Gaussian posterior captures the latent distribution of a large set of unlabeled images. Then, we perform unsupervised clustering on top of the variational latent space using a clustering loss. In this approach, the probabilistic decoder helps to prevent the distortion of data points in the latent space and to preserve the local structure of data generating distribution. The training process can be considered as a self-training process to refine the latent space and simultaneously optimizing cluster assignments iteratively. We evaluated our proposed framework on three public datasets that represented different medical imaging modalities. Our experimental results show that our proposed framework generalizes better across different datasets. It achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over conventional deep unsupervised learning in real-world applications.

MCML Authors

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Mina Rezaei

Dr.

Statistical Learning and Data Science

[76]

M. P. Fabritius, M. Seidensticker, J. Rueckel, C. Heinze, M. Pech, K. J. Paprottka, P. M. Paprottka, J. Topalis, A. Bender, J. Ricke, A. Mittermeier and M. Ingrisch.
Bi-Centric Independent Validation of Outcome Prediction after Radioembolization of Primary and Secondary Liver Cancer.
Journal of Clinical Medicine 10.16 (Aug. 2021). DOI

Abstract

Background: Yttrium-90 radioembolization (RE) plays an important role in the treatment of liver malignancies. Optimal patient selection is crucial for an effective and safe treatment. In this study, we aim to validate the prognostic performance of a previously established random survival forest (RSF) with an external validation cohort from a different national center. Furthermore, we compare outcome prediction models with different established metrics. Methods: A previously established RSF model, trained on a consecutive cohort of 366 patients who had received RE due to primary or secondary liver tumor at a national center (center 1), was used to predict the outcome of an independent consecutive cohort of 202 patients from a different national center (center 2) and vice versa. Prognostic performance was evaluated using the concordance index (C-index) and the integrated Brier score (IBS). The prognostic importance of designated baseline parameters was measured with the minimal depth concept, and the influence on the predicted outcome was analyzed with accumulated local effects plots. RSF values were compared to conventional Cox proportional hazards models in terms of C-index and IBS. Results: The established RSF model achieved a C-index of 0.67 for center 2, comparable to the results obtained for center 1, which it was trained on (0.66). The RSF model trained on center 2 achieved a C-index of 0.68 on center 2 data and 0.66 on center 1 data. CPH models showed comparable results on both cohorts, with C-index ranging from 0.68 to 0.72. IBS validation showed more differentiated results depending on which cohort was trained on and which cohort was predicted (range: 0.08 to 0.20). Baseline cholinesterase was the most important variable for survival prediction. Conclusion: The previously developed predictive RSF model was successfully validated with an independent external cohort. C-index and IBS are suitable metrics to compare outcome prediction models, with IBS showing more differentiated results. The findings corroborate that survival after RE is critically determined by functional hepatic reserve and thus baseline liver function should play a key role in patient selection.

MCML Authors

Johanna Topalis

Clinical Data Science in Radiology

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

[75]

H. Seibold, A. Charlton, A.-L. Boulesteix and S. Hoffmann.
Statisticians, Roll Up Your Sleeves! There's A Crisis to be Solved.
Significance 18.4 (Aug. 2021). DOI

Abstract

Statisticians play a key role in almost all scientific research. As such, they may be key to solving the reproducibility crisis. Heidi Seibold, Alethea Charlton, Anne-Laure Boulesteix and Sabine Hoffmann urge statisticians to take an active role in promoting more credible science.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[74]

F. Pfisterer, C. Kern, S. Dandl, M. Sun, M. P. Kim and B. Bischl.
mcboost: Multi-Calibration Boosting for R.
The Journal of Open Source Software 6.64 (Aug. 2021). DOI

Abstract

Given the increasing usage of automated prediction systems in the context of high-stakes decisions, a growing body of research focuses on methods for detecting and mitigating biases in algorithmic decision-making. One important framework to audit for and mitigate biases in predictions is that of Multi-Calibration, introduced by Hebert-Johnson et al. (2018). The underlying fairness notion, Multi-Calibration, promotes the idea of multi-group fairness and requires calibrated predictions not only for marginal populations, but also for subpopulations that may be defined by complex intersections of many attributes. A simpler variant of Multi-Calibration, referred to as Multi-Accuracy, requires unbiased predictions for large collections of subpopulations. Hebert-Johnson et al. (2018) proposed a boosting-style algorithm for learning multi-calibrated predictors. Kim et al. (2019) demonstrated how to turn this algorithm into a post-processing strategy to achieve multi-accuracy, demonstrating empirical effectiveness across various domains. This package provides a stable implementation of the multi-calibration algorithm, called MCBoost. In contrast to other Fair ML approaches, MCBoost does not harm the overall utility of a prediction model, but rather aims at improving calibration and accuracy for large sets of subpopulations post-training. MCBoost comes with strong theoretical guarantees, which have been explored formally in Hebert-Johnson et al. (2018), Kim et al. (2019), Dwork et al. (2019), Dwork et al. (2020) and Kim et al. (2021).

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Susanne Dandl

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[73]

A. Bauer, F. Scheipl and H. Küchenhoff.
Registration for Incomplete Non-Gaussian Functional Data.
Preprint (Aug. 2021). arXiv

Abstract

Accounting for phase variability is a critical challenge in functional data analysis. To separate it from amplitude variation, functional data are registered, i.e., their observed domains are deformed elastically so that the resulting functions are aligned with template functions. At present, most available registration approaches are limited to datasets of complete and densely measured curves with Gaussian noise. However, many real-world functional data sets are not Gaussian and contain incomplete curves, in which the underlying process is not recorded over its entire domain. In this work, we extend and refine a framework for joint likelihood-based registration and latent Gaussian process-based generalized functional principal component analysis that is able to handle incomplete curves. Our approach is accompanied by sophisticated open-source software, allowing for its application in diverse non-Gaussian data settings and a public code repository to reproduce all results. We register data from a seismological application comprising spatially indexed, incomplete ground velocity time series with a highly volatile Gamma structure. We describe, implement and evaluate the approach for such incomplete non-Gaussian functional data and compare it to existing routines.

MCML Authors

Alexander Bauer

Dr.

* Former Member

Fabian Scheipl

PD Dr.

Functional Data Analysis

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

[72]

J. Moosbauer, J. Herbinger, G. Casalicchio, M. Lindauer and B. Bischl.
Towards Explaining Hyperparameter Optimization via Partial Dependence Plots.
AutoML @ICML 2021 - 8th Workshop on Automated Machine Learning at the 38th International Conference on Machine Learning (ICML 2021). Virtual, Jul 18-24, 2021. URL

Abstract

Automated hyperparameter optimization (HPO) can support practitioners to obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of comprehensibility and transparency makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO and especially discuss the popular case of Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, like Partial Dependence Plots (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. In addition, we propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.

MCML Authors

Julia Moosbauer

Dr.

* Former Member

Julia Herbinger

Dr.

* Former Member

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[71]

G. König, T. Freiesleben and M. Grosse-Wentrup.
A causal perspective on meaningful and robust algorithmic recourse.
ICML 2021 - Workshop on Algorithmic Recourse at the 38th International Conference on Machine Learning. Virtual, Jul 18-24, 2021. URL

Abstract

Algorithmic recourse explanations inform stakeholders on how to act to revert unfavorable predictions. However, in general ML models do not predict well in interventional distributions. Thus, an action that changes the prediction in the desired way may not lead to an improvement of the underlying target. Such recourse is neither meaningful nor robust to model refits. Extending the work of Karimi et al. (2021), we propose meaningful algorithmic recourse (MAR) that only recommends actions that improve both prediction and target. We justify this selection constraint by highlighting the differences between model audit and meaningful, actionable recourse explanations. Additionally, we introduce a relaxation of MAR called effective algorithmic recourse (EAR), which, under certain assumptions, yields meaningful recourse by only allowing interventions on causes of the target.

MCML Authors

Gunnar König

Dr.

* Former Member

Moritz Grosse-Wentrup

Prof. Dr.

* Former Principal Investigator

[70]

P. Gijsbers, F. Pfisterer, J. N. van Rijn, B. Bischl and J. Vanschoren.
Meta-Learning for Symbolic Hyperparameter Defaults.
GECCO 2021 - Genetic and Evolutionary Computation Conference. Lille, France, Jul 10-14, 2021. DOI

Abstract

Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but still data-dependent, configuration of the ML algorithm, compared to standard hyperparameter optimization approaches. In the past, symbolic and static default values have usually been obtained as hand-crafted heuristics. We propose an approach of learning such symbolic configurations as formulas of dataset properties from a large set of prior evaluations on multiple datasets by optimizing over a grammar of expressions using an evolutionary algorithm. We evaluate our method on surrogate empirical performance models as well as on real data across 6 ML algorithms on more than 100 datasets and demonstrate that our method indeed finds viable symbolic defaults.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[69]

F. Pfisterer, J. N. van Rijn, P. Probst, A. C. Müller and B. Bischl.
Learning Multiple Defaults for Machine Learning Algorithms.
GECCO 2021 - Genetic and Evolutionary Computation Conference. Lille, France, Jul 10-14, 2021. DOI

Abstract

Modern machine learning methods highly depend on their hyperparameter configurations for optimal performance. A widely used approach to selecting a configuration is using default settings, often proposed along with the publication of a new algorithm. Those default values are usually chosen in an ad-hoc manner to work on a wide variety of datasets. Different automatic hyperparameter configuration algorithms which select an optimal configuration per dataset have been proposed. This principled approach usually improves performance but adds additional algorithmic complexity and computational costs to the training procedure. Despite its importance, tuning is therefore often skipped in applications because of additional run time, complexity, and experimental design questions, and the learner is instead applied with its defaults. As an alternative, we propose and study using a set of complementary default values, learned from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient, and embarrassingly parallel search over this set. To demonstrate the effectiveness and efficiency of the approach, we compare learned sets of configurations to random search and Bayesian optimization. We show that sets of defaults can improve performance while being easy to deploy in comparison to more complex methods.
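To make the idea of a complementary set of defaults concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical matrix `perf` of prior results (rows are datasets, columns are candidate configurations, higher is better) and picks configurations greedily so that the best-of-set performance, averaged over datasets, grows as much as possible at each step. The function name, the toy numbers, and the greedy criterion are all illustrative assumptions.

```python
import numpy as np

def greedy_defaults(perf, k):
    """Greedily pick k columns (configurations) of perf so that the
    per-dataset best score within the chosen set has maximal mean."""
    n_datasets, _ = perf.shape
    chosen = []
    best_so_far = np.full(n_datasets, -np.inf)
    for _ in range(k):
        # Mean over datasets of the best score if each config were added.
        gains = np.maximum(perf, best_so_far[:, None]).mean(axis=0)
        if chosen:
            gains[chosen] = -np.inf  # never pick a configuration twice
        j = int(np.argmax(gains))
        chosen.append(j)
        best_so_far = np.maximum(best_so_far, perf[:, j])
    return chosen, float(best_so_far.mean())

# Hypothetical accuracies: 4 datasets x 3 candidate configurations.
perf = np.array([
    [0.90, 0.60, 0.70],
    [0.85, 0.65, 0.70],
    [0.50, 0.95, 0.70],
    [0.55, 0.90, 0.70],
])
chosen, score = greedy_defaults(perf, k=2)
print(chosen, score)
```

On a new dataset, each configuration in the chosen set would then be evaluated independently (for example in parallel) and the best one kept, which is the simple search over the set that the abstract refers to.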

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[68]

A. Python, A. Bender, A. K. Nandi, P. A. Hancock, R. Arambepola, J. Brandsch and T. C. D. Lucas.
Predicting non-state terrorism worldwide.
Science Advances 7.31 (Jul. 2021). DOI

Abstract

Several thousand people die every year worldwide because of terrorist attacks perpetrated by non-state actors. In this context, reliable and accurate short-term predictions of non-state terrorism at the local level are key for policy makers to target preventative measures. Using only publicly available data, we show that predictive models that include structural and procedural predictors can accurately predict the occurrence of non-state terrorism locally and a week ahead in regions affected by a relatively high prevalence of terrorism. In these regions, theoretically informed models systematically outperform models using predictors built on past terrorist events only. We further identify and interpret the local effects of major global and regional terrorism drivers. Our study demonstrates the potential of theoretically informed models to predict and explain complex forms of political violence at policy-relevant scales.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

[67]

M. Binder, F. Pfisterer, M. Lang, L. Schneider, L. Kotthoff and B. Bischl.
mlr3pipelines - Flexible Machine Learning Pipelines in R.
Journal of Machine Learning Research 22.184 (Jun. 2021). URL

Abstract

Recent years have seen a proliferation of ML frameworks. Such systems make ML accessible to non-experts, especially when combined with powerful parameter tuning and AutoML techniques. Modern, applied ML extends beyond direct learning on clean data, however, and needs an expressive language for the construction of complex ML workflows beyond simple pre- and post-processing. We present mlr3pipelines, an R framework which can be used to define linear and complex non-linear ML workflows as directed acyclic graphs. The framework is part of the mlr3 ecosystem, leveraging convenient resampling, benchmarking, and tuning components.

MCML Authors

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[66]

G. König, T. Freiesleben, B. Bischl, G. Casalicchio and M. Grosse-Wentrup.
Decomposition of Global Feature Importance into Direct and Associative Components (DEDACT).
Preprint (Jun. 2021). arXiv

Abstract

Global model-agnostic feature importance measures either quantify whether features are directly used for a model’s predictions (direct importance) or whether they contain prediction-relevant information (associative importance). Direct importance provides causal insight into the model’s mechanism, yet it fails to expose the leakage of information from associated but not directly used variables. In contrast, associative importance exposes information leakage but does not provide causal insight into the model’s mechanism. We introduce DEDACT - a framework to decompose well-established direct and associative importance measures into their respective associative and direct components. DEDACT provides insight into both the sources of prediction-relevant information in the data and the direct and indirect feature pathways by which the information enters the model. We demonstrate the method’s usefulness on simulated examples.

MCML Authors

Gunnar König

Dr.

* Former Member

Timo Freiesleben

Dr.

A2 | Mathematical Foundations
→ Group Tom Sterkenburg

Munich Center for Mathematical Philosophy

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Moritz Grosse-Wentrup

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Principal Investigator

[65]

I. Gerostathopoulos, F. Plášil, C. Prehofer, J. Thomas and B. Bischl.
Automated Online Experiment-Driven Adaptation--Mechanics and Cost Aspects.
IEEE Access 9 (Apr. 2021). DOI

Abstract

As modern software-intensive systems become larger, more complex, and more customizable, it is desirable to optimize their functionality by runtime adaptations. However, in most cases it is infeasible to fully model and predict their behavior in advance, which is a classical requirement of runtime self-adaptation. To address this problem, we propose their self-adaptation based on a sequence of online experiments carried out in a production environment. The key idea is to evaluate each experiment by data analysis and determine the next potential experiment via an optimization strategy. The feasibility of the approach is illustrated on a use case devoted to online self-adaptation of traffic navigation where Bayesian optimization, grid search, and local search are employed as the optimization strategies. Furthermore, the cost of the experiments is discussed and three key cost components are examined: time cost, adaptation cost, and endurability cost.

MCML Authors

Janek Thomas

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[64]

P. Kopper, S. Pölsterl, C. Wachinger, B. Bischl, A. Bender and D. Rügamer.
Semi-Structured Deep Piecewise Exponential Models.
AAAI-SPACA 2021 - AAAI Spring Symposium Series on Survival Prediction: Algorithms, Challenges and Applications. Palo Alto, California, USA, Mar 21-24, 2021. PDF

Abstract

We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects into the model, we embed the model class in a neural network and thereby enable the simultaneous estimation of both inherently interpretable structured regression inputs as well as deep neural network components which can potentially process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer’s disease progression based on tabular and 3D point cloud data and applying it to synthetic data.

MCML Authors

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[63]

S. Klau, S. Hoffmann, C. J. Patel, J. P. A. Ioannidis and A.-L. Boulesteix.
Examining the robustness of observational associations to model, measurement and sampling uncertainty with the vibration of effects framework.
International Journal of Epidemiology 50.1 (Feb. 2021). DOI

Abstract

Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biometry in Molecular Medicine

[62]

J. Goschenhofer, R. Hvingelby, D. Rügamer, J. Thomas, M. Wagner and B. Bischl.
Deep Semi-Supervised Learning for Time Series Classification.
Preprint (Feb. 2021). arXiv

Abstract

While semi-supervised learning has gained much attention in computer vision on image data, limited research exists on its applicability in the time series domain. In this work, we investigate the transferability of state-of-the-art deep semi-supervised models from image to time series classification. We discuss the necessary model adaptations, in particular an appropriate model backbone architecture and the use of tailored data augmentation strategies. Based on these adaptations, we explore the potential of deep semi-supervised learning in the context of time series classification by evaluating our methods on large public time series classification problems with varying amounts of labelled samples. We perform extensive comparisons under a decidedly realistic and appropriate evaluation scheme with a unified reimplementation of all algorithms considered, which has so far been lacking in the field. We find that these transferred semi-supervised models show significant performance gains over strong supervised, semi-supervised and self-supervised alternatives, especially for scenarios with very few labelled samples.

MCML Authors

Jann Goschenhofer

Dr.

* Former Member

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Janek Thomas

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[61]

G. König, C. Molnar, B. Bischl and M. Grosse-Wentrup.
Relative Feature Importance.
ICPR 2020 - 25th International Conference on Pattern Recognition. Virtual - Milano, Italy, Jan 10-15, 2021. DOI

Abstract

Interpretable Machine Learning (IML) methods are used to gain insight into the relevance of a feature of interest for the performance of a model. Commonly used IML methods differ in whether they consider features of interest in isolation, e.g., Permutation Feature Importance (PFI), or in relation to all remaining feature variables, e.g., Conditional Feature Importance (CFI). As such, the perturbation mechanisms inherent to PFI and CFI represent extreme reference points. We introduce Relative Feature Importance (RFI), a generalization of PFI and CFI that allows for a more nuanced feature importance computation beyond the PFI versus CFI dichotomy. With RFI, the importance of a feature relative to any other subset of features can be assessed, including variables that were not available at training time. We derive general interpretation rules for RFI based on a detailed theoretical analysis of the implications of relative feature relevance, and demonstrate the method’s usefulness on simulated examples.
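As a point of reference for the PFI end of this spectrum, here is a minimal sketch of marginal permutation importance, written under stated assumptions rather than taken from the paper. The `predict` function, the synthetic data, and the squared-error loss are hypothetical choices. CFI and RFI would replace the marginal shuffle with a sampler that conditions on other features, which is not shown here.

```python
import numpy as np

def permutation_importance(predict, X, y, j, rng):
    """Increase in mean squared error after shuffling column j of X."""
    base = np.mean((predict(X) - y) ** 2)
    X_perm = X.copy()
    # Marginal perturbation: break the link between feature j and everything else.
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return float(np.mean((predict(X_perm) - y) ** 2) - base)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]                      # target depends on feature 0 only

def predict(X):
    """Hypothetical fitted model that uses feature 0 only."""
    return 3.0 * X[:, 0]

pfi_used = permutation_importance(predict, X, y, 0, rng)
pfi_unused = permutation_importance(predict, X, y, 1, rng)
print(pfi_used, pfi_unused)
```

A feature the model never reads gets an importance of exactly zero here, because shuffling it leaves every prediction unchanged.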

MCML Authors

Gunnar König

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Moritz Grosse-Wentrup

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Principal Investigator

[60]

M. Becker, S. Gruber, J. Richter, J. Moosbauer and B. Bischl.
mlr3hyperband: Hyperband for 'mlr3'.
2021. URL GitHub

Abstract

mlr3hyperband adds the optimization algorithms Successive Halving (Jamieson and Talwalkar 2016) and Hyperband (Li et al. 2018) to the mlr3 ecosystem. The implementation in mlr3hyperband features improved scheduling and parallelizes the evaluation of configurations. The package includes tuners for hyperparameter optimization in mlr3tuning and optimizers for black-box optimization in bbotk.
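The core loop of Successive Halving can be sketched in a few lines. This is a language-neutral illustration in Python, not the mlr3hyperband API. The `evaluate` function, the candidate values, and the parameter names `min_budget` and `eta` are assumptions made for the example. Hyperband additionally runs several such brackets with different starting budgets, which is omitted.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly keep the best 1/eta of the configurations while
    multiplying the budget given to each survivor by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower score is better (a loss); sorted() is stable for ties.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(survivors) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]

def evaluate(config, budget):
    """Hypothetical loss: distance to an optimum at 0.3 plus a term
    that shrinks as more budget is spent."""
    return (config - 0.3) ** 2 + 1.0 / budget

candidates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
best = successive_halving(candidates, evaluate)
print(best)
```

With eight candidates and `eta=2`, the loop evaluates 8, then 4, then 2 configurations, so most of the total budget goes to the configurations that looked promising early on.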

MCML Authors

Marc Becker

Statistical Learning and Data Science

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[59]

M. Becker, M. Lang, J. Richter, B. Bischl and D. Schalk.
mlr3tuning: Tuning for 'mlr3'.
2021. URL GitHub

Abstract

mlr3tuning is the hyperparameter optimization package of the mlr3 ecosystem. It features highly configurable search spaces via the paradox package and finds optimal hyperparameter configurations for any mlr3 learner. mlr3tuning works with several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in mlr3mbo) and Hyperband (in mlr3hyperband). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling. The package is built on the optimization framework bbotk.

MCML Authors

Marc Becker

Statistical Learning and Data Science

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Daniel Schalk

Dr.

* Former Member

[58]

M. Becker, J. Richter, M. Lang, B. Bischl and M. Binder.
bbotk: Black-Box Optimization Toolkit.
2021. URL GitHub

Abstract

bbotk is a black-box optimization framework for R. It features highly configurable search spaces via the paradox package and optimizes every user-defined objective function. The package includes several optimization algorithms e.g. Random Search, Grid Search, Iterated Racing, Bayesian Optimization (in mlr3mbo) and Hyperband (in mlr3hyperband). bbotk is the base package of mlr3tuning, mlr3fselect and miesmuschel.

MCML Authors

Marc Becker

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Martin Binder

Statistical Learning and Data Science

[57]

M. Binder.
mlrintermbo: Model-Based Optimization for 'mlr3' through 'mlrMBO'.
2021. URL GitHub

Abstract

The ‘mlrMBO’ package can ordinarily not be used for optimization within ‘mlr3’, because of incompatibilities of their respective class systems. ‘mlrintermbo’ offers a compatibility interface that provides ‘mlrMBO’ as an ‘mlr3tuning’ ‘Tuner’ object, for tuning of machine learning algorithms within ‘mlr3’, as well as a ‘bbotk’ ‘Optimizer’ object for optimization of general objective functions using the ‘bbotk’ black box optimization framework. The control parameters of ‘mlrMBO’ are faithfully reproduced as a ‘paradox’ ‘ParamSet’.

MCML Authors

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[56]

M. Lang.
mlr3measures: Performance Measures for 'mlr3'.
2021. URL

Abstract

Implements multiple performance measures for supervised learning. Includes over 40 measures for regression and classification. Additionally, meta information about the performance measures can be queried, e.g. what the best and worst possible performance scores are.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[55]

M. Lang, B. Bischl, J. Richter, X. Sun and M. Binder.
paradox: Define and Work with Parameter Spaces for Complex Algorithms.
2021. URL GitHub

Abstract

The paradox package offers a language for the description of parameter spaces, as well as tools for useful operations on these parameter spaces.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Martin Binder

Statistical Learning and Data Science

[54]

D. Rügamer, F. Pfisterer and P. Baumann.
deepregression: Fitting Semi-Structured Deep Distributional Regression in R.
2021. URL

Abstract

Allows for the specification of semi-structured deep distributional regression models which are fitted in a neural network as proposed by Ruegamer et al. (2023). Predictors can be modeled using structured (penalized) linear effects, structured non-linear effects or using an unstructured deep network model.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Florian Pfisterer

Dr.

* Former Member

[53]

P. Schratz and M. Becker.
mlr3spatiotempcv: Spatiotemporal Resampling Methods for 'mlr3'.
2021. URL

Abstract

Extends the mlr3 ML framework with spatio-temporal resampling methods to account for the presence of spatiotemporal autocorrelation (STAC) in predictor variables. STAC may cause highly biased performance estimates in cross-validation if ignored.

MCML Authors

Patrick Schratz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Marc Becker

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[52]

H. Seibold, S. Czerny, S. Decke, R. Dieterle, T. Eder, S. Fohr, N. Hahn, R. Hartmann, C. Heindl, P. Kopper, D. Lepke, V. Loidl, M. M. Mandl, S. Musiol, J. Peter, A. Piehler, E. Rojas, S. Schmid, H. Schmidt, M. Schmoll, L. Schneider, X.-Y. To, V. Tran, A. Völker, M. Wagner, J. Wagner, M. Waize, H. Wecker, R. Yang, S. Zellner and M. Nalenz.
A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses.
PLOS ONE 16.6 (2021). DOI

Abstract

Computational reproducibility is a cornerstone for sound and credible research. Especially in complex statistical analyses, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce analyses of longitudinal data of 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, for another two only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied and leads to a high burden for those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.

MCML Authors

Maximilian Mandl

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Xiao-Yin To

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Viet Tran

C2 | Biology
→ Group Christian Müller

Biomedical Statistics and Data Science

[51]

M. Herrmann and F. Scheipl.
Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction.
Preprint (Dec. 2020). arXiv

Abstract

In recent years, manifold methods have moved into focus as tools for dimension reduction. Assuming that the high-dimensional data actually lie on or close to a low-dimensional nonlinear manifold, these methods have shown convincing results in several settings. This manifold assumption is often reasonable for functional data, i.e., data representing continuously observed functions, as well. However, the performance of manifold methods recently proposed for tabular or image data has not been systematically assessed in the case of functional data yet. Moreover, it is unclear how to evaluate the quality of learned embeddings that do not yield invertible mappings, since the reconstruction error cannot be used as a performance measure for such representations. In this work, we describe and investigate the specific challenges for nonlinear dimension reduction posed by the functional data setting. The contributions of the paper are three-fold: First of all, we define a theoretical framework which allows to systematically assess specific challenges that arise in the functional data context, transfer several nonlinear dimension reduction methods for tabular and image data to functional data, and show that manifold methods can be used successfully in this setting. Secondly, we subject performance assessment and tuning strategies to a thorough and systematic evaluation based on several different functional data settings and point out some previously undescribed weaknesses and pitfalls which can jeopardize reliable judgment of embedding quality. Thirdly, we propose a nuanced approach to make trustworthy decisions for or against competing nonconforming embeddings more objectively.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Fabian Scheipl

PD Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Functional Data Analysis

[50]

A. Agrawal, F. Pfisterer, B. Bischl, F. Buet-Golfouse, S. Sood, J. Chen, S. Shah and S. Vollmer.
Debiasing classifiers: is reality at variance with expectation?
Preprint (Nov. 2020). arXiv

Abstract

We present an empirical study of debiasing methods for classifiers, showing that debiasers often fail in practice to generalize out-of-sample, and can in fact make fairness worse rather than better. A rigorous evaluation of the debiasing treatment effect requires extensive cross-validation beyond what is usually done. We demonstrate that this phenomenon can be explained as a consequence of bias-variance trade-off, with an increase in variance necessitated by imposing a fairness constraint. Follow-up experiments validate the theoretical prediction that the estimation variance depends strongly on the base rates of the protected class. Considering fairness–performance trade-offs justifies the counterintuitive notion that partial debiasing can actually yield better results in practice on out-of-sample data.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[49]

A.-L. Boulesteix, S. Hoffmann, A. Charlton and H. Seibold.
A replication crisis in methodological research?
Significance 17.5 (Oct. 2020). DOI

Abstract

Statisticians have been keen to critique statistical aspects of the “replication crisis” in other scientific disciplines. But new statistical tools are often published and promoted without any thought to replicability. This needs to change, argue Anne-Laure Boulesteix, Sabine Hoffmann, Alethea Charlton and Heidi Seibold.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[48]

P. F. M. Baumann, T. Hothorn and D. Rügamer.
Deep Conditional Transformation Models.
Preprint (Oct. 2020). arXiv

Abstract

Learning the cumulative distribution function (CDF) of an outcome variable conditional on a set of features remains challenging, especially in high-dimensional settings. Conditional transformation models provide a semi-parametric approach that allows to model a large class of conditional CDFs without an explicit parametric distribution assumption and with only a few parameters. Existing estimation approaches within this class are, however, either limited in their complexity and applicability to unstructured data sources such as images or text, lack interpretability, or are restricted to certain types of outcomes. We close this gap by introducing the class of deep conditional transformation models which unifies existing approaches and allows to learn both interpretable (non-)linear model terms and more complex neural network predictors in one holistic framework. To this end we propose a novel network architecture, provide details on different model definitions and derive suitable constraints as well as network regularization terms. We demonstrate the efficacy of our approach through numerical experiments and applications.

MCML Authors

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[47]

D. Rügamer, F. Pfisterer and B. Bischl.
Neural Mixture Distributional Regression.
Preprint (Oct. 2020). arXiv

Abstract

We present neural mixture distributional regression (NMDR), a holistic framework to estimate complex finite mixtures of distributional regressions defined by flexible additive predictors. Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings, allows for efficient and scalable optimization and can be applied to recent concepts that combine structured regression models with deep neural networks. While many existing approaches for mixture models address challenges in optimization of such and provide results for convergence under specific model assumptions, our approach is assumption-free and instead makes use of optimizers well-established in deep learning. Through extensive numerical experiments and a high-dimensional deep learning application we provide evidence that the proposed approach is competitive to existing approaches and works well in more complex scenarios.

MCML Authors

David Rügamer

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistics, Data Science and Machine Learning

Florian Pfisterer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[46]

A. Bender, D. Rügamer, F. Scheipl and B. Bischl.
A General Machine Learning Framework for Survival Analysis.
ECML-PKDD 2020 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Virtual, Sep 14-18, 2020. DOI

Abstract

The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation, time-varying features and effects, and that extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and proportional hazards assumption. The methods that do provide extensions usually address at most a subset of these challenges and often require specialized software that cannot be integrated into standard machine learning workflows directly. In this work, we present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks. This reformulation is based on well-developed statistical theory. With the proposed approach, any algorithm that can optimize a Poisson (log-)likelihood, such as gradient boosted trees, deep neural networks, model-based boosting and many more can be used in the context of time-to-event analysis. The proposed technique does not require any assumptions with respect to the distribution of event times or the functional shapes of feature and interaction effects. Based on the proposed framework we develop new methods that are competitive with specialized state-of-the-art approaches in terms of accuracy and versatility, but with comparatively small investments of programming effort or requirements for specialized methodological know-how.
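The data augmentation step can be illustrated with the classical piecewise exponential transformation. The sketch below is one plausible reading of that step, not the authors' code. The function name, the interval cut points, and the tuple layout are hypothetical, and the follow-up time is assumed not to exceed the last cut point.

```python
def to_piecewise_exponential(time, event, cuts):
    """Split one subject into per-interval rows.

    Returns a list of (interval_index, exposure, status) tuples, where
    exposure is the time at risk inside the interval and status is 1
    only in the interval in which the event occurs.
    """
    rows = []
    start = 0.0
    for k, end in enumerate(cuts):
        if time <= start:
            break  # subject is no longer at risk in this interval
        exposure = min(time, end) - start
        status = int(bool(event) and time <= end)
        rows.append((k, exposure, status))
        start = end
    return rows

# Event at t = 2.5 and censoring at t = 1.5, intervals (0,1], (1,2], (2,3].
event_rows = to_piecewise_exponential(2.5, 1, [1.0, 2.0, 3.0])
censored_rows = to_piecewise_exponential(1.5, 0, [1.0, 2.0, 3.0])
print(event_rows)
print(censored_rows)
```

A Poisson learner can then be fit on the stacked rows with `status` as the response, the interval index and any features as inputs, and `log(exposure)` as an offset, so that standard software for count regression estimates an interval-specific hazard.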

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Fabian Scheipl

PD Dr.

Functional Data Analysis

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[45]

C. Molnar, G. Casalicchio and B. Bischl.
Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges.
ECML-PKDD 2020 - Workshops at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Virtual, Sep 14-18, 2020. DOI

Abstract

We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods and discuss challenges. Research in IML has boomed in recent years. As young as the field is, its roots reach back over 200 years to regression modeling, and to rule-based machine learning starting in the 1960s. Recently, many new IML methods have been proposed, many of them model-agnostic, but also interpretation techniques specific to deep learning and tree-based ensembles. IML methods either directly analyze model components, study sensitivity to input perturbations, or analyze local or global surrogate approximations of the ML model. The field approaches a state of readiness and stability, with many methods not only proposed in research, but also implemented in open-source software. But many important challenges remain for IML, such as dealing with dependent features, causal interpretation, and uncertainty estimation, which need to be resolved for its successful application to scientific problems. A further challenge is the lack of a rigorous definition of interpretability that is accepted by the community. To address the challenges and advance the field, we urge the community to recall our roots of interpretable, data-driven modeling in statistics and (rule-based) ML, but also to consider other areas such as sensitivity analysis, causal inference, and the social sciences.

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[44]

S. Dandl, C. Molnar, M. Binder and B. Bischl.
Multi-Objective Counterfactual Explanations.
PPSN 2020 - 16th International Conference on Parallel Problem Solving from Nature. Leiden, Netherlands, Sep 05-09, 2020. DOI

Abstract

Counterfactual explanations are one of the most popular methods to make predictions of black box machine learning models interpretable by providing explanations in the form of ‘what-if scenarios’. Most current approaches optimize a collapsed, weighted sum of multiple objectives, which are naturally difficult to balance a-priori. We propose the Multi-Objective Counterfactuals (MOC) method, which translates the counterfactual search into a multi-objective optimization problem. Our approach not only returns a diverse set of counterfactuals with different trade-offs between the proposed objectives, but also maintains diversity in feature space. This enables a more detailed post-hoc analysis to facilitate better understanding and also more options for actionable user responses to change the predicted outcome. Our approach is also model-agnostic and works for numerical and categorical input features. We show the usefulness of MOC in concrete cases and compare our approach with state-of-the-art methods for counterfactual explanations.
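
The contrast between a single weighted-sum optimum and a set of trade-offs can be illustrated with a non-dominated (Pareto) filter. This is a minimal sketch under stated assumptions, not the MOC implementation: the three objectives and all candidate values are hypothetical, and every objective is assumed to be minimized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical counterfactual candidates, each scored on three objectives:
# (gap to desired prediction, distance to original input, number of changed features)
candidates = [
    (0.0, 0.9, 3),
    (0.1, 0.4, 2),
    (0.3, 0.2, 1),
    (0.3, 0.5, 2),  # worse than (0.1, 0.4, 2) in two objectives, equal in the third
]

front = pareto_front(candidates)
for point in front:
    print(point)
```

The filter keeps the first three candidates, none of which is better than another in all objectives at once. A user can then choose among them after the search, instead of fixing objective weights beforehand.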

MCML Authors

Susanne Dandl

Dr.

* Former Member

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistical Learning and Data Science

[43]

M. Herrmann, P. Probst, R. Hornung, V. Jurinovic and A.-L. Boulesteix.
Large-scale benchmark study of survival prediction methods using multi-omics data.
Briefings in Bioinformatics (Aug. 2020). DOI

Abstract

Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database ‘The Cancer Genome Atlas’ (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan–Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups—especially clinical variables—from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited.
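
The resampling scheme named in the abstract, several repetitions of 5-fold cross-validation, can be sketched as an index generator. This is an illustrative sketch only: the function name, the seed, and the number of repetitions are assumptions, and the plain shuffling used here does not stratify by censoring status, which a survival benchmark may well do.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repetition
        folds = [idx[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for f, fold in enumerate(folds) if f != k for i in fold]
            yield rep, k, train_idx, test_idx


# 35 observations corresponds to the smallest dataset size mentioned in the abstract.
splits = list(repeated_kfold(n_samples=35, n_folds=5, n_repeats=2))
print(len(splits))
```

Each observation appears in exactly one test fold per repetition, so averaging a metric over all splits uses every observation for testing once per repetition.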

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Roman Hornung

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Göran Kauermann

Biometry in Molecular Medicine

[42]

C. Fritz, M. Lebacher and G. Kauermann.
Tempus volat, hora fugit: A survey of tie-oriented dynamic network models in discrete and continuous time.
Statistica Neerlandica 74.3 (Aug. 2020). DOI

Abstract

Given the growing number of available tools for modeling dynamic networks, the choice of a suitable model becomes central. The goal of this survey is to provide an overview of tie-oriented dynamic network models. The survey is focused on introducing binary network models with their corresponding assumptions, advantages, and shortfalls. The models are divided according to generating processes, operating in discrete and continuous time. First, we introduce the temporal exponential random graph model (TERGM) and the separable TERGM (STERGM), both being time-discrete models. These models are then contrasted with continuous process models, focusing on the relational event model (REM). We additionally show how the REM can handle time-clustered observations, that is, continuous-time data observed at discrete time points. Besides the discussion of theoretical properties and fitting procedures, we specifically focus on the application of the models on two networks that represent international arms transfers and email exchange, respectively. The data allow to demonstrate the applicability and interpretation of the network models.

MCML Authors

Cornelius Fritz

Dr.

* Former Member

Göran Kauermann

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Applied Statistics in Social Sciences, Economics and Business

[41]

M. Binder, F. Pfisterer and B. Bischl.
Collecting empirical data about hyperparameters for data driven AutoML.
AutoML @ICML 2020 - 7th Workshop on Automated Machine Learning co-located with ICML 2020. Virtual, Jul 18, 2020. PDF

Abstract

All optimization needs some kind of prior over the functions it is optimizing over. We used a large computing cluster to collect empirical data about the behavior of ML performance, by randomly sampling hyperparameter values and performing cross-validation. We also collected information about cross-validation error by performing some evaluations multiple times, and information about progression of performance with respect to training data size by performing some evaluations on data subsets. We present how we collected data, make some preliminary analyses on the surrogate models that can be built with them, and give an outlook over interesting analyses this should enable.

MCML Authors

Martin Binder

Statistical Learning and Data Science

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[40]

C. Molnar, G. König, J. Herbinger, T. Freiesleben, S. Dandl, C. A. Scholbeck, G. Casalicchio, M. Grosse-Wentrup and B. Bischl.
General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models.
XXAI @ICML 2020 - Workshop on Extending Explainable AI Beyond Deep Models and Classifiers at the 37th International Conference on Machine Learning (ICML 2020). Virtual, Jul 12-18, 2020. DOI

Abstract

An increasing number of model-agnostic interpretation techniques for machine learning (ML) models such as partial dependence plots (PDP), permutation feature importance (PFI) and Shapley values provide insightful model interpretations, but can lead to wrong conclusions if applied incorrectly. We highlight many general pitfalls of ML model interpretation, such as using interpretation techniques in the wrong context, interpreting models that do not generalize well, ignoring feature dependencies, interactions, uncertainty estimates and issues in high-dimensional settings, or making unjustified causal interpretations, and illustrate them with examples. We focus on pitfalls for global methods that describe the average model behavior, but many pitfalls also apply to local methods that explain individual predictions. Our paper addresses ML practitioners by raising awareness of pitfalls and identifying solutions for correct model interpretation, but also addresses ML researchers by discussing open issues for further research.
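
For readers unfamiliar with permutation feature importance (PFI), the basic computation is sketched below. This is a minimal illustration under stated assumptions, not code from the paper: the toy model, the data, and the function names are hypothetical, and the sketch deliberately ignores the pitfalls the abstract lists, such as dependent features and evaluation on data the model was fitted to.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one feature column at a time."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that uses feature 0 and ignores feature 1."""
    return 2.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [toy_model(row) for row in X]
pfi = permutation_importance(toy_model, X, y)
print(pfi)
```

Shuffling the ignored feature leaves the loss unchanged, so its importance is zero, while shuffling the used feature increases the loss. With correlated features, the same procedure would evaluate the model on unrealistic feature combinations.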

MCML Authors

Gunnar König

Dr.

* Former Member

Julia Herbinger

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Susanne Dandl

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Moritz Grosse-Wentrup

Prof. Dr.

* Former Principal Investigator

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[39]

M. Binder, J. Moosbauer, J. Thomas and B. Bischl.
Multi-Objective Hyperparameter Tuning and Feature Selection Using Filter Ensembles.
GECCO 2020 - Genetic and Evolutionary Computation Conference. Cancun, Mexico, Jul 08-12, 2020. DOI

Abstract

Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield better model interpretability and lower cost of data acquisition, data handling and model inference. While sparsity may have a beneficial or detrimental effect on predictive performance, a small drop in performance may be acceptable in return for a substantial gain in sparseness. We therefore treat feature selection as a multi-objective optimization task. We perform hyperparameter tuning and feature selection simultaneously because the choice of features of a model may influence what hyperparameters perform well. We present, benchmark, and compare two different approaches for multi-objective joint hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization. The second is an evolutionary NSGA-II-based wrapper approach to feature selection which incorporates specialized sampling, mutation and recombination operators. Both methods make use of parameterized filter ensembles. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs computational overhead compared to the NSGA-II, so the preferred choice depends on the cost of evaluating a model on given data.

MCML Authors

Martin Binder

Statistical Learning and Data Science

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistical Learning and Data Science

[38]

N. Ellenbach, A.-L. Boulesteix, B. Bischl, K. Unger and R. Hornung.
Improved outcome prediction across data sources through robust parameter tuning.
Journal of Classification 38 (Jul. 2020). DOI

Abstract

In many application areas, prediction rules trained based on high-dimensional data are subsequently applied to make predictions for observations from other sources, but they do not always perform well in this setting. This is because data sets from different sources can feature (slightly) differing distributions, even if they come from similar populations. In the context of high-dimensional data and beyond, most prediction methods involve one or several tuning parameters. Their values are commonly chosen by maximizing the cross-validated prediction performance on the training data. This procedure, however, implicitly presumes that the data to which the prediction rule will be ultimately applied, follow the same distribution as the training data. If this is not the case, less complex prediction rules that slightly underfit the training data may be preferable. Indeed, a tuning parameter does not only control the degree of adjustment of a prediction rule to the training data, but also, more generally, the degree of adjustment to the distribution of the training data. On the basis of this idea, in this paper we compare various approaches including new procedures for choosing tuning parameter values that lead to better generalizing prediction rules than those obtained based on cross-validation. Most of these approaches use an external validation data set. In our extensive comparison study based on a large collection of 15 transcriptomic data sets, tuning on external data and robust tuning with a tuned robustness parameter are the two approaches leading to better generalizing prediction rules.

MCML Authors

Nicole Ellenbach

Biometry in Molecular Medicine

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Statistical Learning and Data Science

Roman Hornung

Dr.

Biometry in Molecular Medicine

[37]

C. Stachl, Q. Au, R. Schoedel, S. D. Gosling, G. M. Harari, D. Buschek, S. T. Völkel, T. Schuwerk, M. Oldemeier, T. Ullmann, H. Hussmann, B. Bischl and M. Bühner.
Predicting personality from patterns of behavior collected with smartphones.
Proceedings of the National Academy of Sciences 117.30 (Jul. 2020). DOI

Abstract

Smartphones enjoy high adoption rates around the globe. Rarely more than an arm’s length away, these sensor-rich devices can easily be repurposed to collect rich and extensive records of their users’ behaviors (e.g., location, communication, media consumption), posing serious threats to individual privacy. Here we examine the extent to which individuals’ Big Five personality dimensions can be predicted on the basis of six different classes of behavioral information collected via sensor and log data harvested from smartphones. Taking a machine-learning approach, we predict personality at broad domain (r_median = 0.37) and narrow facet levels (r_median = 0.40) based on behavioral data collected from 624 volunteers over 30 consecutive days (25,347,089 logging events). Our cross-validated results reveal that specific patterns in behaviors in the domains of 1) communication and social behavior, 2) music consumption, 3) app usage, 4) mobility, 5) overall phone activity, and 6) day- and night-time activity are distinctively predictive of the Big Five personality traits. The accuracy of these predictions is similar to that found for predictions based on digital footprints from social media platforms and demonstrates the possibility of obtaining information about individuals’ private traits from behavioral patterns passively collected from their smartphones. Overall, our results point to both the benefits (e.g., in research settings) and dangers (e.g., privacy implications, psychological targeting) presented by the widespread collection and modeling of behavioral data obtained from smartphones.

MCML Authors

Theresa Ullmann

Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[36]

A. Beyer, G. Kauermann and H. Schütze.
Embedding Space Correlation as a Measure of Domain Similarity.
LREC 2020 - 12th International Conference on Language Resources and Evaluation. Marseille, France, May 13-15, 2020. URL

Abstract

Prior work has determined domain similarity using text-based features of a corpus. However, when using pre-trained word embeddings, the underlying text corpus might not be accessible anymore. Therefore, we propose the CCA measure, a new measure of domain similarity based directly on the dimension-wise correlations between corresponding embedding spaces. Our results suggest that an inherent notion of domain can be captured this way, as we are able to reproduce our findings for different domain comparisons for English, German, Spanish and Czech as well as in cross-lingual comparisons. We further find a threshold at which the CCA measure indicates that two corpora come from the same domain in a monolingual setting by applying permutation tests. By evaluating the usability of the CCA measure in a domain adaptation application, we also show that it can be used to determine which corpora are more similar to each other in a cross-domain sentiment detection task.

MCML Authors

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[35]

S. Klau, M.-L. Martin-Magniette, A.-L. Boulesteix and S. Hoffmann.
Sampling uncertainty versus method uncertainty: a general framework with applications to omics biomarker selection.
Biometrical Journal 62.3 (May. 2020). DOI

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biometry in Molecular Medicine

[34]

M. Becker, P. Schratz, M. Lang and B. Bischl.
mlr3fselect: Feature Selection for 'mlr3'.
2020. URL

Abstract

Feature selection package of the ‘mlr3’ ecosystem. It selects the optimal feature set for any ‘mlr3’ learner. The package works with several optimization algorithms e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling.

MCML Authors

Marc Becker

Statistical Learning and Data Science

Patrick Schratz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[33]

M. Binder, F. Pfisterer, L. Schneider, B. Bischl, M. Lang and S. Dandl.
mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'.
2020. URL GitHub

Abstract

mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.

MCML Authors

Martin Binder

Statistical Learning and Data Science

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Lennart Schneider

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Michel Lang

Dr.

* Former Member

Susanne Dandl

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[32]

M. Herrmann.
fda-ndr: Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction. R package.
2020. GitHub

Abstract

Repository contains material to reproduce the results of ‘Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction’ (https://arxiv.org/abs/2012.11987).

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[31]

M. Herrmann.
manifun: Collection of functions to work with embeddings and functional data. R package.
2020. GitHub

Abstract

manifun: Collection of functions to work with embeddings and functional data.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Biometry in Molecular Medicine

[30]

M. Lang.
mlr3db: Data Base Backend for 'mlr3'.
2020. URL GitHub

Abstract

Extends the mlr3 package with a DataBackend to transparently work with databases.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[29]

M. Lang.
mlr3oml: Connector Between 'mlr3' and 'OpenML'.
2020. URL GitHub

Abstract

OpenML is an open-source platform that facilitates the sharing and dissemination of machine learning research data. All entities on the platform have unique identifiers and standardized (meta)data that can be accessed via an open-access REST API or the web interface. mlr3oml makes it possible to work with the REST API from R and integrates OpenML with the mlr3 ecosystem.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[28]

M. Lang, Q. Au, S. Coors and P. Schratz.
mlr3learners: Recommended Learners for 'mlr3'.
2020. URL GitHub

Abstract

This package provides essential learners for mlr3, maintained by the mlr-org team. Additional learners can be found in the mlr3extralearners package on GitHub, which is also the place to request further learners.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Stefan Coors

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Patrick Schratz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[27]

M. Lang, P. Schratz and R. Sonabend.
mlr3viz: Visualizations for 'mlr3'.
2020. URL GitHub

Abstract

mlr3viz is the visualization package of the mlr3 ecosystem. It features plots for mlr3 objects such as tasks, learners, predictions, benchmark results, tuning instances and filters via the autoplot() generic of ggplot2. The package draws plots with the viridis color palette and applies the minimal theme. Visualizations include barplots, boxplots, histograms, ROC curves, and Precision-Recall curves.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Patrick Schratz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[26]

D. Pulatov and M. Lang.
mlr3cluster: Cluster Extension for 'mlr3'.
2020. URL GitHub

Abstract

mlr3cluster is an extension package for cluster analysis within the mlr3 ecosystem. It is a successor of clustering capabilities of mlr2.

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[25]

F. Scheipl, J. Goldsmith and J. Wrobel.
tidyfun: Tools for Tidy Functional Data. R package.
2020. URL GitHub

Abstract

The goal of tidyfun is to provide accessible and well-documented software that makes functional data analysis in R easy – specifically data wrangling and exploratory analysis.

MCML Authors

Fabian Scheipl

PD Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Functional Data Analysis

[24]

P. Schratz, M. Lang, B. Bischl and M. Binder.
mlr3filters: Filter Based Feature Selection for 'mlr3'.
2020. URL GitHub

Abstract

mlr3filters adds feature selection filters to mlr3. The implemented filters can be used stand-alone, or as part of a machine learning pipeline in combination with mlr3pipelines and the filter operator.

MCML Authors

Patrick Schratz

* Former Member

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Martin Binder

Statistical Learning and Data Science

[23]

R. Sonabend, F. J. Kiraly, A. Bender, B. Bischl and M. Lang.
mlr3proba: Probabilistic Supervised Learning for 'mlr3'. R package version 0.2.6.
2020. DOI URL

Abstract

As machine learning has become increasingly popular over the last few decades, so too has the number of machine-learning interfaces for implementing these models. Whilst many R libraries exist for machine learning, very few offer extended support for survival analysis. This is problematic considering its importance in fields like medicine, bioinformatics, economics, engineering and more. mlr3proba provides a comprehensive machine-learning interface for survival analysis and connects with mlr3’s general model tuning and benchmarking facilities to provide a systematic infrastructure for survival modelling and evaluation.

MCML Authors

Andreas Bender

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Machine Learning Consulting Unit (MLCU)

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Michel Lang

Dr.

* Former Member

[22]

J. Wrobel, A. Bauer, J. McDonnel and F. Scheipl.
registr: Curve Registration for Exponential Family Functional Data. R package.
2020. GitHub

Abstract

Registration for incomplete exponential family functional data.

MCML Authors

Alexander Bauer

Dr.

C4 | Computational Social Sciences
→ Group Helmut Küchenhoff

* Former Member

Fabian Scheipl

PD Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Functional Data Analysis

[21]

M. Urban, K. Heckel, C. Berger, P. Schratz, I. P. Smit, T. Strydom, J. Baade and C. Schmullius.
Woody cover mapping in the savanna ecosystem of the Kruger National Park using Sentinel-1 C-Band time series data.
Koedoe 62.1 (Jan. 2020). DOI

Abstract

The savanna ecosystems in South Africa, which are predominantly characterised by woody vegetation (e.g. shrubs and trees) and grasslands with annual phenological cycles, are shaped by ecosystem processes such as droughts, fires and herbivory interacting with management actions. Therefore, monitoring of the intra- and inter-annual vegetation structure dynamics is one of the essential components for the management of complex savanna ecosystems such as the Kruger National Park (KNP). To map the woody cover in the KNP, data from European Space Agency’s (ESA) Copernicus Sentinel-1 radar satellite (C-Band vertical-vertical [VV]/vertical-horizontal [VH]) for the years 2016 and 2017, at 10 m spatial resolution and repeated acquisitions every 12 days, were utilised. A high-resolution light detection and ranging (LiDAR) data set was reclassified to produce woody cover percentages and consequently used for calibration and validation. Woody cover estimation for different spatial resolutions was carried out by fitting a random forest (RF) model. Model accuracy was assessed via spatial cross-validation and revealed an overall root mean squared error (RMSE) of 22.8% for the product with a spatial resolution of 10 m and improved with spatial averaging to 15.8% for 30 m, 14.8% for 50 m and 13.4% for 100 m. In addition, the product was validated against a second LiDAR data set, confirming the results of the spatial cross-validation of the model. The methodology of this study is designed for savanna vegetation structure mapping based on height estimates by using open-source software and open-access data, to allow for a continuation of woody cover classification and change monitoring in these types of ecosystems.

MCML Authors

Patrick Schratz

* Former Member

[20]

M. Lang, M. Binder, J. Richter, P. Schratz, F. Pfisterer, S. Coors, Q. Au, G. Casalicchio, L. Kotthoff and B. Bischl.
mlr3: A modern object-oriented machine learning framework in R.
The Journal of Open Source Software 4.44 (Dec. 2019). DOI

Abstract

The R (R Core Team, 2019) package mlr3 and its associated ecosystem of extension packages implements a powerful, object-oriented and extensible framework for machine learning (ML) in R. It provides a unified interface to many learning algorithms available on CRAN, augmenting them with model-agnostic general-purpose functionality that is needed in every ML project, for example train-test-evaluation, resampling, preprocessing, hyperparameter tuning, nested resampling, and visualization of results from ML experiments. The package is a complete reimplementation of the mlr (Bischl et al., 2016) package that leverages many years of experience and learned best practices to provide a state-of-the-art system that is powerful, flexible, extensible, and maintainable. We target both practitioners who want to quickly apply ML algorithms to their problems and researchers who want to implement, benchmark, and compare their new methods in a structured environment. mlr3 is suitable for short scripts that test an idea, for complex multi-stage experiments with advanced functionality that use a broad range of ML functionality, as a foundation to implement new ML (meta-)algorithms (for example AutoML systems), and everything in between. Functional correctness is ensured through extensive unit and integration tests.
Several other general-purpose ML toolboxes exist for different programming languages. The most widely used ones are scikit-learn (Pedregosa et al., 2011) for Python, Weka (Hall et al., 2009) for Java, and mlj (Blaom, Kiraly, Lienart, & Vollmer, 2019) for Julia. The most important toolboxes for R are mlr, caret (Kuhn, 2008) and tidymodels (Kuhn & Wickham, 2019).

MCML Authors

Michel Lang

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Martin Binder

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Patrick Schratz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Stefan Coors

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[19]

M. Binder, J. Moosbauer, J. Thomas and B. Bischl.
Multi-Objective Hyperparameter Tuning and Feature Selection using Filter Ensembles.
Preprint (Dec. 2019). arXiv

MCML Authors

Martin Binder

Statistical Learning and Data Science

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[18]

F. Pfisterer, L. Beggel, X. Sun, F. Scheipl and B. Bischl.
Benchmarking time series classification -- Functional data vs machine learning approaches.
Preprint (Nov. 2019). arXiv

Abstract

Time series classification problems have drawn increasing attention in the machine learning and statistical community. Closely related is the field of functional data analysis (FDA): it refers to the range of problems that deal with the analysis of data that is continuously indexed over some domain. While often employing different methods, both fields strive to answer similar questions, a common example being classification or regression problems with functional covariates. We study methods from functional data analysis, such as functional generalized additive models, as well as functionality to concatenate (functional-) feature extraction or basis representations with traditional machine learning algorithms like support vector machines or classification trees. In order to assess the methods and implementations, we run a benchmark on a wide variety of representative (time series) data sets, with in-depth analysis of empirical results, and strive to provide a reference ranking for which method(s) to use for non-expert practitioners. Additionally, we provide a software framework in R for functional data analysis for supervised learning, including machine learning and more linear approaches from statistics. This allows convenient access, and in connection with the machine-learning toolbox mlr, those methods can now also be tuned and benchmarked.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Fabian Scheipl

PD Dr.

Functional Data Analysis

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[17]

F. Pfisterer, J. Thomas and B. Bischl.
Towards Human Centered AutoML.
Preprint (Nov. 2019). arXiv

Abstract

Building models from data is an integral part of the majority of data science workflows. While data scientists are often forced to spend the majority of the time available for a given project on data cleaning and exploratory analysis, the time available to practitioners to build actual models from data is often rather short due to time constraints for a given project. AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. In this position paper, we aim to discuss the impact of the rising popularity of such systems and what a user-centered interface for such systems could look like. More importantly, we also want to point out features that are currently missing in those systems and start to explore better usability of such systems from a data scientist's perspective.

MCML Authors

Florian Pfisterer

Dr.

* Former Member

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[16]

G. König and M. Grosse-Wentrup.
A Causal Perspective on Challenges for AI in Precision Medicine.
PMBC 2019 - 2nd International Congress on Precision Medicine. Munich, Germany, Oct 14-15, 2019.

MCML Authors

Gunnar König

Dr.

* Former Member

Moritz Grosse-Wentrup

Prof. Dr.

* Former Principal Investigator

[15]

L. Beggel, M. Pfeiffer and B. Bischl.
Robust Anomaly Detection in Images Using Adversarial Autoencoders.
ECML-PKDD 2019 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Wuerzburg, Germany, Sep 16-20, 2019. DOI

Abstract

Reliably detecting anomalies in a given set of images is a task of high practical relevance for visual quality inspection, surveillance, or medical image analysis. Autoencoder neural networks learn to reconstruct normal images, and hence can classify those images as anomalies, where the reconstruction error exceeds some threshold. Here we analyze a fundamental problem of this approach when the training set is contaminated with a small fraction of outliers. We find that continued training of autoencoders inevitably reduces the reconstruction error of outliers, and hence degrades the anomaly detection performance. In order to counteract this effect, an adversarial autoencoder architecture is adapted, which imposes a prior distribution on the latent representation, typically placing anomalies into low likelihood-regions. Utilizing the likelihood model, potential anomalies can be identified and rejected already during training, which results in an anomaly detector that is significantly more robust to the presence of outliers during training.

MCML Authors

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[14]

J. Goschenhofer, F. M. J. Pfister, K. A. Yuksel, B. Bischl, U. Fietzek and J. Thomas.
Wearable-based Parkinson's Disease Severity Monitoring using Deep Learning.
ECML-PKDD 2019 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Wuerzburg, Germany, Sep 16-20, 2019. DOI

Abstract

One major challenge in the medication of Parkinson’s disease is that the severity of the disease, reflected in the patients’ motor state, cannot be measured using accessible biomarkers. Therefore, we develop and examine a variety of statistical models to detect the motor state of such patients based on sensor data from a wearable device. We find that deep learning models consistently outperform a classical machine learning model applied on hand-crafted features in this time series classification task. Furthermore, our results suggest that treating this problem as a regression instead of an ordinal regression or a classification task is most appropriate. For consistent model evaluation and training, we adopt the leave-one-subject-out validation scheme to the training of deep learning models. We also employ a class-weighting scheme to successfully mitigate the problem of high multi-class imbalances in this domain. In addition, we propose a customized performance measure that reflects the requirements of the involved medical staff on the model. To solve the problem of limited availability of high quality training data, we propose a transfer learning technique which helps to improve model performance substantially. Our results suggest that deep learning techniques offer a high potential to autonomously detect motor states of patients with Parkinson’s disease.
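The leave-one-subject-out scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `leave_one_subject_out` and the toy subject labels are hypothetical. Each fold holds out every recording of one subject, so no subject contributes to both the training and the test data.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each fold."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy example: six sensor windows recorded from three hypothetical subjects.
subject_ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by window avoids leaking subject-specific signal characteristics from the training data into the evaluation.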

MCML Authors

Jann Goschenhofer

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Janek Thomas

Dr.

* Former Member

[13]

C. Molnar, G. Casalicchio and B. Bischl.
Quantifying Model Complexity via Functional Decomposition for Better Post-hoc Interpretability.
ECML-PKDD 2019 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Wuerzburg, Germany, Sep 16-20, 2019. DOI

Abstract

Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially w.r.t. feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity.
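The idea of minimizing loss and complexity at the same time can be sketched as a Pareto filter. The following is only an illustration under stated assumptions, not the procedure from the paper: the helper `pareto_front`, the model names, and all numeric values are made up, and a single scalar stands in for the three complexity measures.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated in (loss, complexity).

    Lower is better for both objectives. A candidate is dominated if another
    one is at least as good in both objectives and strictly better in one.
    """
    front = []
    for name, loss, complexity in candidates:
        dominated = any(
            o_loss <= loss
            and o_cx <= complexity
            and (o_loss < loss or o_cx < complexity)
            for _, o_loss, o_cx in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical models as (name, loss, complexity score); values are invented.
candidates = [
    ("linear", 0.30, 1.0),
    ("gam", 0.22, 3.0),
    ("forest", 0.20, 9.0),
    ("deep_tree", 0.25, 7.0),
]
front = pareto_front(candidates)
print(front)
```

The surviving candidates form the trade-off curve from which a user would pick a model that is accurate enough while staying simple enough to interpret.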

MCML Authors

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[12]

C. A. Scholbeck, C. Molnar, C. Heumann, B. Bischl and G. Casalicchio.
Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model Agnostic Interpretations.
ECML-PKDD 2019 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Wuerzburg, Germany, Sep 16-20, 2019. DOI

Abstract

Model-agnostic interpretation techniques allow us to explain the behavior of any predictive model. Due to different notations and terminology, it is difficult to see how they are related. A unified view on these methods has been missing. We present the generalized SIPA (sampling, intervention, prediction, aggregation) framework of work stages for model-agnostic interpretations and demonstrate how several prominent methods for feature effects can be embedded into the proposed framework. Furthermore, we extend the framework to feature importance computations by pointing out how variance-based and performance-based importance measures are based on the same work stages. The SIPA framework reduces the diverse set of model-agnostic techniques to a single methodology and establishes a common terminology to discuss them in future work.

MCML Authors

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[11]

F. Pfisterer, S. Coors, J. Thomas and B. Bischl.
Multi-Objective Automatic Machine Learning with AutoxgboostMC.
ECML-PKDD 2019 - Workshops at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Wuerzburg, Germany, Sep 16-20, 2019. arXiv

Abstract

AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimize a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or tasks that can be solved better by algorithms, while leaving decisions that require human intelligence to the end-user. In recent years, the importance of other criteria, such as fairness, interpretability, and many others, has become more and more apparent. Current AutoML frameworks either do not allow optimizing such secondary criteria or only do so by limiting the system’s choice of models and preprocessing steps. We propose to optimize additional criteria defined by the user directly to guide the search towards an optimal machine learning pipeline. In order to demonstrate the need and usefulness of our approach, we provide a simple multi-criteria AutoML system and showcase an exemplary application.

MCML Authors

Florian Pfisterer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Stefan Coors

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Janek Thomas

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[10]

L. M. Weber, W. Saelens, R. Cannoodt, C. Soneson, A. Hapfelmeier, P. P. Gardner, A.-L. Boulesteix, Y. Saeys and M. D. Robinson.
Essential guidelines for computational method benchmarking.
Genome Biology 20.125 (Jun. 2019). DOI

Abstract

In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Biometry in Molecular Medicine

[9]

J. Thomas.
Gradient boosting in automatic machine learning: feature selection and hyperparameter optimization.
Dissertation 2019. DOI

Abstract

This thesis focuses on automating model selection in AutoML, specifically through gradient boosting techniques like gradient tree and component-wise boosting. It addresses challenges in hyperparameter optimization using Bayesian methods, introduces a new feature selection technique, and proposes an AutoML approach that simplifies the process while maintaining accuracy. Four R packages were developed: mlrMBO for Bayesian optimization, autoxgboost for AutoML, compboost for component-wise boosting, and gamboostLSS for generalized additive models. (Shortened.)

MCML Authors

Janek Thomas

Dr.

* Former Member

[8]

Q. Au, D. Schalk, G. Casalicchio, R. Schoedel, C. Stachl and B. Bischl.
Component-Wise Boosting of Targets for Multi-Output Prediction.
Preprint (Apr. 2019). arXiv

Abstract

Multi-output prediction deals with the prediction of several targets of possibly diverse types. One way to address this problem is the so-called problem transformation method. This method is often used in multi-label learning, but can also be used for multi-output prediction due to its generality and simplicity. In this paper, we introduce an algorithm that uses the problem transformation method for multi-output prediction, while simultaneously learning the dependencies between target variables in a sparse and interpretable manner. In a first step, predictions are obtained for each target individually. Target dependencies are then learned via a component-wise boosting approach. We compare our new method with similar approaches in a benchmark using multi-label, multivariate regression and mixed-type datasets.

MCML Authors

Daniel Schalk

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

[7]

G. Casalicchio.
On benchmark experiments and visualization methods for the evaluation and interpretation of machine learning models.
Dissertation 2019. DOI

Abstract

This cumulative dissertation consists of five articles divided into three parts. The first part extends the mlr package in R to implement and benchmark multilabel classification methods. The second part focuses on simplifying benchmark experiments with OpenML.org, introducing the OpenML R package and the OpenML100 benchmarking suite for standardized dataset and result management. The third part addresses model evaluation and interpretability, proposing the residual-based predictiveness (RBP) curve to improve upon the predictiveness curve and introducing new visualization tools, including the Shapley feature importance (SFIMP) measure for model interpretation. (Shortened.)

MCML Authors

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[6]

P. Probst, A.-L. Boulesteix and B. Bischl.
Tunability: Importance of Hyperparameters of Machine Learning Algorithms.
Journal of Machine Learning Research 20 (Mar. 2019). PDF

Abstract

Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user, or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view, define data-based defaults and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[5]

C. Happ, F. Scheipl, A.-A. Gabriel and S. Greven.
A general framework for multivariate functional principal component analysis of amplitude and phase variation.
Stat 8.2 (Feb. 2019). DOI

Abstract

Functional data typically contain amplitude and phase variation. In many data situations, phase variation is treated as a nuisance effect and is removed during preprocessing, although it may contain valuable information. In this note, we focus on joint principal component analysis (PCA) of amplitude and phase variation. As the space of warping functions has a complex geometric structure, one key element of the analysis is transforming the warping functions to $L^2(\mathcal{T})$. We present different transformation approaches and show how they fit into a general class of transformations. This allows us to compare their strengths and limitations. In the context of PCA, our results offer arguments in favour of the centred log-ratio transformation. We further embed two existing approaches from the literature for joint PCA of amplitude and phase variation into the framework of multivariate functional PCA, where we study the properties of the estimators based on an appropriate metric. The approach is illustrated through an application from seismology.

MCML Authors

Fabian Scheipl

PD Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Functional Data Analysis

[4]

M. Binder, S. Dandl and J. Moosbauer.
mosmafs: Multi-Objective Simultaneous Model and Feature Selection. R package.
2019. GitHub

Abstract

mosmafs offers a variety of tools that make it possible to use the ecr package for multi-objective optimization of mixed parameter spaces. Mixed here means spaces that include both categorical and numeric hyperparameters.

MCML Authors

Martin Binder

Statistical Learning and Data Science

Susanne Dandl

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Julia Moosbauer

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[3]

J. Goldsmith, F. Scheipl, L. Huang, J. Wrobel, C. Di, J. Gellar, J. Harezlak, M. W. McLean, B. Swihart, L. Xiao, C. Crainiceanu and P. T. Reiss.
refund: Regression with Functional Data.
2019. URL

Abstract

Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

[2]

P. Probst, M. N. Wright and A.-L. Boulesteix.
Hyperparameters and Tuning Strategies for Random Forest.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9.3 (Jan. 2019). DOI

Abstract

The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters’ influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after presenting a brief overview of tuning strategies, we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.

MCML Authors

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[1]

J. Minkwitz, F. Scheipl, E. Binder, C. Sander, U. Hegerl and H. Himmerich.
Generalised functional additive models for brain arousal state dynamics.
IPEG 2018 - 20th International Pharmaco-EEG Society for Preclinical and Clinical Electrophysiological Brain Research Meeting. Zurich, Switzerland, Nov 21-25, 2018. DOI

MCML Authors

Fabian Scheipl

PD Dr.

Functional Data Analysis

A2 | Mathematical Foundations

Some of the tremendous successes of ML have been achieved through the use of mathematical insights. The contribution of our mathematicians in MCML can be divided into two main research areas: Mathematics for ML, i.e. mathematical principles are used to develop new reliable ML algorithms, and ML for mathematics, i.e. ML is used to advance mathematical research, e.g. in imaging, inverse problems, optimal control, or numerical analysis of partial differential equations.

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

Suvrit Sra

Prof. Dr.

Resource Aware Machine Learning

Felix Dietrich

Prof. Dr.

Associate

Physics-enhanced Machine Learning

Christian Kühn

Prof. Dr.

Associate

Multiscale and Stochastic Dynamics

Johannes Maly

Prof. Dr.

Associate

Mathematical Data Science and Artificial Intelligence

©all images: LMU | TUM

Publications in Research Area A2

[142]

S. Bamberger, R. Heckel and F. Krahmer.
Approximating Positive Homogeneous Functions with Scale Invariant Neural Networks.
Journal of Approximation Theory 311.106177 (Nov. 2025). DOI

Abstract

We investigate the approximation of positive homogeneous functions, i.e., functions satisfying $f(\lambda x) = \lambda f(x)$ for all $\lambda \geq 0$, with neural networks. Extending previous work, we establish new results explaining under which conditions such functions can be approximated with neural networks. As a key application for this, we analyze to what extent it is possible to solve linear inverse problems with networks. Due to the scaling invariance arising from the linearity, an optimal reconstruction function for such a problem is positive homogeneous. In a network, this condition translates to considering networks without bias terms. For the recovery of sparse vectors from few linear measurements, our results imply that networks with two hidden layers allow approximate recovery with arbitrary precision and arbitrary sparsity level in a stable way. In contrast, we also show that with only one hidden layer such networks cannot even recover 1-sparse vectors, not even approximately, and regardless of the width of the network. These findings even apply to a wider class of recovery problems including low-rank matrix recovery and phase retrieval. Our results also shed some light on the seeming contradiction between previous works showing that neural networks for inverse problems typically have very large Lipschitz constants, but still perform very well also for adversarial noise. Namely, the error bounds in our expressivity results include a combination of a small constant term and a term that is linear in the noise level, indicating that robustness issues may occur only for very small noise levels.
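The link between positive homogeneity and networks without bias terms can be checked numerically. The sketch below is an illustration only and makes assumptions not stated in the abstract as shown here: ReLU activations, the hypothetical helper `bias_free_relu_net`, and arbitrary layer sizes. Because ReLU satisfies max(λz, 0) = λ max(z, 0) for λ ≥ 0 and each layer is linear, scaling the input by λ scales the output by λ.

```python
import math
import random


def relu(v):
    return [max(x, 0.0) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def bias_free_relu_net(x, weights):
    """Feed-forward ReLU network without bias terms; the last layer is linear."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)


random.seed(0)
shapes = [(8, 4), (8, 8), (3, 8)]  # hypothetical layer sizes as (rows, cols)
weights = [
    [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
    for rows, cols in shapes
]
x = [random.gauss(0, 1) for _ in range(4)]
lam = 2.5  # any non-negative scaling factor

out_of_scaled_input = bias_free_relu_net([lam * xi for xi in x], weights)
scaled_output = [lam * y for y in bias_free_relu_net(x, weights)]
is_homogeneous = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(out_of_scaled_input, scaled_output)
)
print(is_homogeneous)
```

Adding a nonzero bias to any layer would in general break this identity, which is why the bias-free restriction matters.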

MCML Authors

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

Felix Krahmer

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Optimization & Data Analysis

[141]

A. Scagliotti and S. Farinelli.
Normalizing flows as approximations of optimal transport maps via linear-control neural ODEs.
Nonlinear Analysis 257.113811 (Aug. 2025). DOI

Abstract

In this paper, we consider the problem of recovering the W2-optimal transport map T between absolutely continuous measures as the flow of a linear-control neural ODE, where the control depends only on the time variable and takes values in a finite-dimensional space. We first show that, under suitable assumptions on the measures and on the controlled vector fields governing the neural ODE, the optimal transport map is contained in the closure, with respect to a suitable topology, of the flows generated by the system. Then, we tackle the problem under the assumption that only discrete approximations of the original measures are available: we formulate approximated optimal control problems, and we show that their solutions give flows that approximate the original optimal transport map T. In the framework of generative models, the approximating flow constructed here can be seen as a ‘Normalizing Flow’, which usually refers to the task of providing invertible transport maps between probability measures by means of deep neural networks. We propose an iterative numerical scheme based on the Pontryagin Maximum Principle for the resolution of the optimal control problem, resulting in a method for the practical computation of the approximated optimal transport map, and we test it on a two-dimensional example.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[140]

S. Dirksen, W. Li and J. Maly.
Subspace estimation under coarse quantization.
SampTA 2025 - 15th International Conference on Sampling Theory and Applications. Vienna, Austria, Jul 28-Aug 01, 2025. To be published. Preprint available. URL

Abstract

We study subspace estimation from coarsely quantized data. In particular, we analyze two stochastic quantization schemes which use dithering: a one-bit quantizer combined with rectangular dither and a multi-bit quantizer with triangular dither. For each quantizer, we derive rigorous high probability bounds for the distances between the true and estimated signal subspaces. Using our analysis, we identify scenarios in which subspace estimation via triangular dithering qualitatively outperforms rectangular dithering. We verify in numerical simulations that our estimates are optimal in their dependence on the smallest non-zero eigenvalue of the target matrix.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[139]

F. Krahmer, F. Pagginelli Patricio and P. Catala.
On a Recovery Method with Approximation Guarantees for Noisy Unlimited Sampling.
SampTA 2025 - 15th International Conference on Sampling Theory and Applications. Vienna, Austria, Jul 28-Aug 01, 2025. To be published. Preprint available. URL

Abstract

The unlimited sampling problem of recovering a bandlimited signal from measurements that are affected by a modulo operation has recently been addressed in a number of works employing different approaches. Many of these methods, however, are not robust to Gaussian noise, as local outliers can affect the global solution quality. In this talk we propose and analyze a method to address this challenge by locally optimizing the choice of the function representation among the many equivalent modulo representatives – separately for each sub-interval in a given subdivision of the domain. Our analysis reveals that a successful recovery requires a careful balance between two types of potential limitations. On the one hand, the feasibility of our least-squares retrieval strategy requires the number of sub-intervals to be large enough, so that the input varies little inside each of them. On the other hand, we show that the conditioning of the resulting linear system matrix deteriorates for too many intervals. The study of this trade-off provides a first step towards the theoretical understanding of our proposed algorithm and practical guidance for its implementation.

MCML Authors

Felix Krahmer

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Optimization & Data Analysis

[138]

A. Scagliotti and S. Farinelli.
Normalizing flows as approximations of optimal transport maps via linear-control neural ODEs.
VC 2025 - 16th Viennese Conference on Optimal Control and Dynamic Games. Vienna, Austria, Jul 15-18, 2025. To be published. Preprint available. DOI

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[137]

J. von Berg, A. Fono, M. Datres, S. Maskey and G. Kutyniok.
The Price of Robustness: Stable Classifiers Need Overparameterization.
HiLD @ICML 2025 - Workshop on High-dimensional Learning Dynamics at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL

Abstract

In this work, we show that class stability, the expected distance of an input to the decision boundary, captures what classical capacity measures, such as weight norms, fail to explain. We prove a generalization bound that improves inversely with the class stability, interpreted as a quantifiable notion of robustness. As a corollary, we derive a law of robustness for classification: any interpolating model whose number of parameters is only comparable to the number of training samples must be unstable, so high stability requires significant overparameterization. Crucially, our results extend beyond smoothness assumptions and apply to discontinuous classifiers. Preliminary experiments support our theory: empirical stability increases with model size, while norm-based measures remain uninformative.

MCML Authors

Jonas von Berg

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Adalbert Fono

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Sohir Maskey

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Suvrit Sra

Mathematical Foundations of Artificial Intelligence

[136]

P. Fatemi, E. Sharifian and M. H. Yassaee.
A New Approach to Backtracking Counterfactual Explanations: A Unified Causal Framework for Efficient Model Interpretability.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv

Abstract

Counterfactual explanations enhance interpretability by identifying alternative inputs that produce different outputs, offering localized insights into model decisions. However, traditional methods often neglect causal relationships, leading to unrealistic examples. While newer approaches integrate causality, they are computationally expensive. To address these challenges, we propose an efficient method called BRACE based on backtracking counterfactuals that incorporates causal reasoning to generate actionable explanations. We first examine the limitations of existing methods and then introduce our novel approach and its features. We also explore the relationship between our method and previous techniques, demonstrating that it generalizes them in specific scenarios. Finally, experiments show that our method provides deeper insights into model outputs.

MCML Authors

Pouria Fatemi

Resource Aware Machine Learning

[135]

S. Karnik, A. Veselovska, M. Iwen and F. Krahmer.
Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv

Abstract

We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.

MCML Authors

Anna Veselovska

Dr.

Applied Numerical Analysis

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[134]

D. A. Nguyen, E. Araya, A. Fono and G. Kutyniok.
Time to Spike? Understanding the Representational Power of Spiking Neural Networks in Discrete Time.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL

Abstract

Recent years have seen significant progress in developing spiking neural networks (SNNs) as a potential solution to the energy challenges posed by conventional artificial neural networks (ANNs). However, our theoretical understanding of SNNs remains relatively limited compared to the ever-growing body of literature on ANNs. In this paper, we study a discrete-time model of SNNs based on leaky integrate-and-fire (LIF) neurons, referred to as discrete-time LIF-SNNs, a widely used framework that still lacks solid theoretical foundations. We demonstrate that discrete-time LIF-SNNs with static inputs and outputs realize piecewise constant functions defined on polyhedral regions, and more importantly, we quantify the network size required to approximate continuous functions. Moreover, we investigate the impact of latency (number of time steps) and depth (number of layers) on the complexity of the input space partitioning induced by discrete-time LIF-SNNs. Our analysis highlights the importance of latency and contrasts these networks with ANNs employing piecewise linear activation functions. Finally, we present numerical experiments to support our theoretical findings.
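
The piecewise constant behaviour mentioned in the abstract can be illustrated with a single neuron. The following is a minimal sketch and not the model from the paper: the update rule, the reset by subtraction, and all names and parameter values (`lif_spike_count`, `leak`, `threshold`, `steps`) are assumptions chosen for illustration. For a static input, the spike count over a fixed number of time steps can only take the values 0 to `steps`, so as a function of the input it is piecewise constant.

```python
def lif_spike_count(x, weight=1.0, bias=0.0, leak=0.9, threshold=1.0, steps=8):
    """Spike count of one discrete-time LIF neuron driven by a static input x.

    Hypothetical toy model: leaky integration, threshold firing, and
    reset by subtraction. None of these choices are taken from the paper.
    """
    potential = 0.0
    spikes = 0
    for _ in range(steps):
        # Leaky integration of the constant input current.
        potential = leak * potential + weight * x + bias
        if potential >= threshold:
            spikes += 1
            potential -= threshold  # reset by subtraction (assumption)
    return spikes


# Sweep the static input over [-1, 3]; the output only takes a few integer values.
inputs = [i / 100 for i in range(-100, 301)]
counts = [lif_spike_count(x) for x in inputs]
print(sorted(set(counts)))
```

In this toy setting, increasing `steps` (the latency) allows more distinct output levels, which loosely mirrors the role of latency emphasized in the abstract.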

MCML Authors

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[133]

M. Herold, J. S. Jehle, F. Krahmer and A. Veselovska.
Non-intrusive surrogate modelling using sparse random features with applications in crashworthiness analysis.
International Journal for Uncertainty Quantification 15.4 (Jul. 2025).

Abstract

Efficient surrogate modelling is a key requirement for uncertainty quantification in data-driven scenarios. In this work, a novel approach using Sparse Random Features for surrogate modelling in combination with self-supervised dimensionality reduction is described. The method is compared to other methods on synthetic and real data obtained from crashworthiness analyses. The results show that the approach described here outperforms state-of-the-art surrogate modelling techniques, namely Polynomial Chaos Expansions and Neural Networks.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Anna Veselovska

Dr.

Applied Numerical Analysis

[132]

C. Datar, A. Datar, F. Dietrich and W. Schilders.
Systematic Construction of Continuous-Time Neural Networks for Linear Dynamical Systems.
SIAM Journal on Scientific Computing 47.4 (Jul. 2025). DOI

Abstract

Discovering a suitable neural network architecture for modeling complex dynamical systems poses a formidable challenge, often involving extensive trial and error and navigation through a high-dimensional hyperparameter space. In this paper, we discuss a systematic approach to constructing neural architectures for modeling a subclass of dynamical systems, namely, linear time-invariant (LTI) systems. We use a variant of continuous-time neural networks in which the output of each neuron evolves continuously as a solution of a first-order or second-order ordinary differential equation. Instead of deriving the network architecture and parameters from data, we propose a gradient-free algorithm to compute sparse architecture and network parameters directly from the given LTI system, leveraging its properties. We bring forth a novel neural architecture paradigm featuring horizontal hidden layers and provide insights into why employing conventional neural architectures with vertical hidden layers may not be favorable. We also provide an upper bound on the numerical errors of our neural networks. Finally, we demonstrate the high accuracy of our constructed networks on three numerical examples.

MCML Authors

Felix Dietrich

Prof. Dr.

Physics-enhanced Machine Learning

[131]

E. M. Achour, K. Kohn and H. Rauhut.
The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks.
Preprint (Jul. 2025). arXiv

Abstract

We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for D-dimensional convolutions with D≥2, and for D=1 it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.

MCML Authors

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[130]

D. Chemnitz, M. Engel, C. Kühn and S.-V. Kuntz.
A Dynamical Systems Perspective on the Analysis of Neural Networks.
Preprint (Jul. 2025). arXiv

Abstract

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.

MCML Authors

Christian Kühn

Prof. Dr.

Multiscale and Stochastic Dynamics

Sara-Viola Kuntz

Multiscale and Stochastic Dynamics

[129]

J. Li and G. Kutyniok.
Expressivity of deep neural networks.
Preprint (Jul. 2025). PDF

Abstract

This chapter focuses on the approximation theory of deep ReLU neural networks, analyzing their ability to approximate various target functions with different network architectures. We begin by introducing the universal approximation theory of deep neural networks, stating that given enough neurons, neural networks can approximate general functions. We then delve into the fundamental properties of ReLU neural networks and explore the role of width and depth of neural networks, highlighting that increasing layers could be more effective than increasing width in improving approximation accuracy. Next, we discuss the approximation rates for Sobolev functions using fully connected and convolutional neural networks. To alleviate the curse of dimensionality, we further consider Korobov functions. Finally, we focus on the approximation properties of self-attention and transformers, which have become increasingly important in modern deep learning. These results shed light on the expressivity and reliability of deep learning models, providing valuable insights into networks’ behavior and performance.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[128]

F. Weindel, M. Girsch and R. Heckel.
Trace Reconstruction with Language Models.
Preprint (Jul. 2025). arXiv

Abstract

The general trace reconstruction problem seeks to recover an original sequence from its noisy copies independently corrupted by deletions, insertions, and substitutions. This problem arises in applications such as DNA data storage, a promising storage medium due to its high information density and longevity. However, errors introduced during DNA synthesis, storage, and sequencing require correction through algorithms and codes, with trace reconstruction often used as part of the data retrieval process. In this work, we propose TReconLM, which leverages language models trained on next-token prediction for trace reconstruction. We pretrain language models on synthetic data and fine-tune on real-world data to adapt to technology-specific error patterns. TReconLM outperforms state-of-the-art trace reconstruction algorithms, including prior deep learning approaches, recovering a substantially higher fraction of sequences without error.

MCML Authors

Franziska Weindel

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[127]

F. P. Patricio, F. Krahmer and P. Catala.
Stable Retrieval for Unlimited Sampling via Adaptive Local Representations.
SSP 2025 - IEEE Statistical Signal Processing Workshop. Edinburgh, Scotland, Jun 08-11, 2025. DOI

Abstract

We examine theoretical guarantees for our recently proposed local adaptive iterative algorithm with a least-squares approach for noisy modulo sampling recovery. We demonstrate that, under a modest assumption on the variation of the input signal, the least-squares formulation remains valid in noisy settings when a sufficient number of sub-intervals is used during preprocessing. We quantify the first-iteration retrieval error through a noise amplification analysis based on the least singular value of the system matrix. These findings reveal a fundamental trade-off: while an increasing number of intervals suffices for preprocessing effectiveness, it reduces the stability of the retrieval. This characterization provides both theoretical understanding and practical implementation guidance for unlimited sampling with our proposed algorithm. Finally, we show that the retrieval error can be controlled in terms of the noise level with high probability.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[126]

H. Boche, A. Fono and G. Kutyniok.
Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency Requirement.
Applied and Computational Harmonic Analysis 77.101763 (Jun. 2025). DOI

Abstract

Deep learning still has drawbacks in terms of trustworthiness, which describes a comprehensible, fair, safe, and reliable method. To mitigate the potential risk of AI, clear obligations associated with trustworthiness have been proposed via regulatory guidelines, e.g., in the European AI Act. Therefore, a central question is to what extent trustworthy deep learning can be realized. Establishing the described properties constituting trustworthiness requires that the factors influencing an algorithmic computation can be retraced, i.e., the algorithmic implementation is transparent. Motivated by the observation that the current evolution of deep learning models necessitates a change in computing technology, we derive a mathematical framework which enables us to analyze whether a transparent implementation in a computing model is feasible. As an example, we apply our trustworthiness framework to analyze deep learning approaches for inverse problems in digital and analog computing models represented by Turing and Blum-Shub-Smale Machines, respectively. Based on previous results, we find that Blum-Shub-Smale Machines have the potential to establish trustworthy solvers for inverse problems under fairly general conditions, whereas Turing machines cannot guarantee trustworthiness to the same degree.

MCML Authors

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[125]

E. Pozzoli and A. Scagliotti.
Approximation of diffeomorphisms for quantum state transfers.
IEEE Control Systems Letters Early Access (Jun. 2025). DOI

Abstract

In this paper, we seek to combine two emerging standpoints in control theory. On the one hand, recent advances in infinite-dimensional geometric control have unlocked a method for controlling (with arbitrary precision and in arbitrarily small times) state transfers for bilinear Schrödinger PDEs posed on a Riemannian manifold M. In particular, these arguments rely on controllability results in the group of the diffeomorphisms of M. On the other hand, using tools of Γ-convergence, it has been proved that we can phrase the retrieval of a diffeomorphism of M as an ensemble optimal control problem. More precisely, this is done by employing a control-affine system for simultaneously steering a finite swarm of points towards the respective targets. Here we blend these two theoretical approaches and numerically find control laws driving state transitions (such as eigenstate transfers) in a bilinear Schrödinger PDE posed on the torus. Such systems have experimental relevance and are currently used to model rotational dynamics of molecules, and cold atoms trapped in periodic optical lattices.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[124]

M. Rauscher, A. Scagliotti and F. Pagginelli Patricio.
Shortest-path recovery from signature with an optimal control approach.
Mathematics of Control, Signals, and Systems 37 (Jun. 2025). DOI

Abstract

In this paper, we consider the signature-to-path reconstruction problem from the control-theoretic perspective. Namely, we design an optimal control problem whose solution leads to the minimal-length path that generates a given signature. In order to do that, we minimize a cost functional consisting of two competing terms, i.e., a weighted final-time cost combined with the L²-norm squared of the controls. Moreover, we can show that, by taking the limit to infinity of the parameter that tunes the final-time cost, the problem Γ-converges to the problem of finding a sub-Riemannian geodesic connecting two signatures. Finally, we provide an alternative reformulation of the latter problem, which is particularly suitable for the numerical implementation.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[123]

S. Maskey, G. Kutyniok and R. Levie.
Generalization Bounds for Message Passing Networks on Mixture of Graphons.
SIAM Journal on Mathematics of Data Science 7.2 (Jun. 2025). DOI

Abstract

We study the generalization capabilities of Message Passing Neural Networks (MPNNs), a prevalent class of Graph Neural Networks (GNNs). We derive generalization bounds specifically for MPNNs with normalized sum aggregation and mean aggregation. Our analysis is based on a data generation model incorporating a finite set of template graphons. Each graph within this framework is generated by sampling from one of the graphons with a certain degree of perturbation. In particular, we extend previous MPNN generalization results to a more realistic setting, which includes the following modifications: 1) we analyze simple random graphs with Bernoulli-distributed edges instead of weighted graphs; 2) we sample both graphs and graph signals from perturbed graphons instead of clean graphons; and 3) we analyze sparse graphs instead of dense graphs. In this more realistic and challenging scenario, we provide a generalization bound that decreases as the average number of nodes in the graphs increases. Our results imply that MPNNs with higher complexity than the size of the training set can still generalize effectively, as long as the graphs are sufficiently large.

MCML Authors

Sohir Maskey

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[122]

S. Almi, M. Fornasier, J. Klemenc and A. Scagliotti.
Balanced quasistatic evolutions of critical points in metric spaces.
Preprint (Jun. 2025). arXiv

Abstract

Quasistatic evolutions of critical points of time-dependent energies exhibit piecewise smooth behavior, making them useful for modeling continuum mechanics phenomena like elastic-plasticity and fracture. Traditionally, such evolutions have been derived as vanishing viscosity and inertia limits, leading to balanced viscosity solutions. However, for nonconvex energies, these constructions have been realized in Euclidean spaces and assume non-degenerate critical points. In this paper, we take a different approach by decoupling the time scales of the energy evolution and of the transition to equilibria. Namely, starting from an equilibrium configuration, we let the energy evolve, while keeping frozen the system state; then, we update the state by freezing the energy, while letting the system transit via gradient flow or an approximation of it (e.g., minimizing movement or backward differentiation schemes). This approach has several advantages. It aligns with the physical principle that systems transit through energy-minimizing steady states. It is also fully constructive and computationally implementable, with physical and computational costs governed by appropriate action functionals. Additionally, our analysis is simpler and more general than previous formulations in the literature, as it does not require non-degenerate critical points. Finally, this approach extends to evolutions in locally compact metric path spaces, and our axiomatic presentation allows for various realizations.
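
The two-step scheme described in the abstract (let the energy evolve with the state frozen, then freeze the energy and let the state relax) can be sketched on a one-dimensional toy problem. Everything below is an assumption made for illustration and is not taken from the paper: the tilted double-well energy E(t, x) = x⁴/4 − x²/2 − t·x, the explicit gradient-descent relaxation used as a stand-in for the gradient flow, and the names `energy_gradient`, `relax`, and `quasistatic_evolution`.

```python
def energy_gradient(t, x):
    """d/dx of the toy energy E(t, x) = x**4 / 4 - x**2 / 2 - t * x (assumed example)."""
    return x ** 3 - x - t


def relax(t, x, step=0.05, tol=1e-9, max_iter=20000):
    """Freeze the energy at time t and run explicit gradient descent to a critical point."""
    for _ in range(max_iter):
        g = energy_gradient(t, x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def quasistatic_evolution(x0=-1.0, t_end=1.0, n_steps=20):
    """Alternate between advancing the energy in time and relaxing the state."""
    times, states = [0.0], [x0]
    x = x0  # x0 = -1 is an equilibrium of E(0, .)
    for k in range(1, n_steps + 1):
        t = t_end * k / n_steps  # energy evolves, state kept frozen
        x = relax(t, x)          # energy frozen, state transits to an equilibrium
        times.append(t)
        states.append(x)
    return times, states


times, states = quasistatic_evolution()
for t, x in zip(times, states):
    print(f"t = {t:.2f}  x = {x:+.4f}")
```

In this toy example the state follows the left well smoothly until that well disappears (between t = 0.35 and t = 0.40) and then jumps to the right well, which is the kind of piecewise smooth behaviour the abstract refers to.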

MCML Authors

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Jona Klemenc

Applied Numerical Analysis

Alessandro Scagliotti

Applied Numerical Analysis

[121]

A. Bergmeister, M. K. Lal, S. Jegelka and S. Sra.
A projection-based framework for gradient-free and parallel learning.
Preprint (Jun. 2025). arXiv

Abstract

We present a feasibility-seeking approach to neural network training. This mathematical optimization framework is distinct from conventional gradient-based loss minimization and uses projection operators and iterative projection algorithms. We reformulate training as a large-scale feasibility problem: finding network parameters and states that satisfy local constraints derived from its elementary operations. Training then involves projecting onto these constraints, a local operation that can be parallelized across the network. We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems (akin to autodiff for derivatives). It inherently supports GPU/TPU acceleration, provides a familiar NumPy-like API, and is extensible. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its functionality and generality. Our results show that this approach is a compelling alternative to gradient-based training, with clear advantages in parallelism and the ability to handle non-differentiable operations.

MCML Authors

Andreas Bergmeister

Foundations of Deep Neural Networks

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

Suvrit Sra

Prof. Dr.

Resource Aware Machine Learning

[120]

E. Guha, R. Marten, S. Keh, N. Raoof, G. Smyrnis, H. Bansal, M. Nezhurina, J. Mercat, T. Vu, Z. Sprague, A. Suvarna, B. Feuer, L. Chen, Z. Khan, E. Frankel, S. Grover, C. Choi, N. Muennighoff, S. Su, W. Zhao, J. Yang, S. Pimpalgaonkar, K. Sharma, C. C.-J. Ji, Y. Deng, S. Pratt, V. Ramanujan, J. Saad-Falcon, J. Li, A. Dave, A. Albalak, K. Arora, B. Wulfe, C. Hegde, G. Durrett, S. Oh, M. Bansal, S. Gabriel, A. Grover, K.-W. Chang, V. Shankar, A. Gokaslan, M. A. Merrill, T. Hashimoto, Y. Choi, J. Jitsev, R. Heckel, M. Sathiamoorthy, A. G. Dimakis and L. Schmidt.
OpenThoughts: Data Recipes for Reasoning Models.
Preprint (Jun. 2025). arXiv URL

Abstract

Reasoning models have made rapid progress on many benchmarks involving math, code, and science. Yet, there are still many open questions about the best training recipes for reasoning since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training reasoning models. After initial explorations, our OpenThoughts2-1M dataset led to OpenThinker2-32B, the first model trained on public reasoning data to match DeepSeek-R1-Distill-32B on standard reasoning benchmarks such as AIME and LiveCodeBench. We then improve our dataset further by systematically investigating each step of our data generation pipeline with 1,000+ controlled experiments, which led to OpenThoughts3. Scaling the pipeline to 1.2M examples and using QwQ-32B as teacher yields our OpenThoughts3-7B model, which achieves state-of-the-art results: 53% on AIME 2025, 51% on LiveCodeBench 06/24-01/25, and 54% on GPQA Diamond - improvements of 15.3, 17.2, and 20.5 percentage points compared to the DeepSeek-R1-Distill-Qwen-7B.

MCML Authors

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[119]

P.-F. Massiani, C. Fiedler, L. Haverbeck, F. Solowjow and S. Trimpe.
A kernel conditional two-sample test.
Preprint (Jun. 2025). arXiv

Abstract

We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct conditional two-sample statistical tests. These tests identify the inputs – called covariates in this context – where two conditional expectations differ with high probability. Our key idea is to transform confidence bounds of a learning method into a conditional two-sample test, and we instantiate this principle for kernel ridge regression (KRR) and conditional kernel mean embeddings. We generalize existing pointwise-in-time or time-uniform confidence bounds for KRR to previously-inaccessible yet essential cases such as infinite-dimensional outputs with non-trace-class kernels. These bounds enable circumventing the need for independent data in our statistical tests, since they allow online sampling. We also introduce bootstrapping schemes leveraging the parametric form of testing thresholds identified in theory to avoid tuning inaccessible parameters, making our method readily applicable in practice. Such conditional two-sample tests are especially relevant in applications where data arrive sequentially or non-independently, or when output distributions vary with operational parameters. We demonstrate their utility through examples in process monitoring and comparison of dynamical systems. Overall, our results establish a comprehensive foundation for conditional two-sample testing, from theoretical guarantees to practical implementation, and advance the state-of-the-art on the concentration of vector-valued least squares estimation.

MCML Authors

Christian Fiedler

Dr.

Applied Numerical Analysis

[118]

A. Rahma, C. Datar, A. Cukarska and F. Dietrich.
Rapid training of Hamiltonian graph networks without gradient descent.
Preprint (Jun. 2025). arXiv

Abstract

Learning dynamical systems that respect physical symmetries and constraints remains a fundamental challenge in data-driven modeling. Integrating physical laws with graph neural networks facilitates principled modeling of complex N-body dynamics and yields accurate and permutation-invariant models. However, training graph neural networks with iterative, gradient-based optimization algorithms (e.g., Adam, RMSProp, LBFGS) often leads to slow training, especially for large, complex systems. In comparison to 15 different optimizers, we demonstrate that Hamiltonian Graph Networks (HGN) can be trained up to 600x faster, but with comparable accuracy, by replacing iterative optimization with random feature-based parameter construction. We show robust performance in diverse simulations, including N-body mass-spring systems in up to 3 dimensions with different geometries, while retaining essential physical invariances with respect to permutation, rotation, and translation. We reveal that even when trained on minimal 8-node systems, the model can generalize in a zero-shot manner to systems as large as 4096 nodes without retraining. Our work challenges the dominance of iterative gradient-descent-based optimization algorithms for training neural network models for physical systems.

MCML Authors

Atamert Rahma

Physics-enhanced Machine Learning

Chinmay Datar

Physics-enhanced Machine Learning

Ana Cukarska

Physics-enhanced Machine Learning

Felix Dietrich

Prof. Dr.

Physics-enhanced Machine Learning

[117]

K. Wang, T. Klug, S. Ruschke, J. Kirschke and R. Heckel.
Reliable Evaluation of MRI Motion Correction: Dataset and Insights.
Preprint (Jun. 2025). arXiv

Abstract

Correcting motion artifacts in MRI is important, as they can hinder accurate diagnosis. However, evaluating deep learning-based and classical motion correction methods remains fundamentally difficult due to the lack of accessible ground-truth target data. To address this challenge, we study three evaluation approaches: real-world evaluation based on reference scans, simulated motion, and reference-free evaluation, each with its merits and shortcomings. To enable evaluation with real-world motion artifacts, we release PMoC3D, a dataset consisting of unprocessed Paired Motion-Corrupted 3D brain MRI data. To advance evaluation quality, we introduce MoMRISim, a feature-space metric trained for evaluating motion reconstructions. We assess each evaluation approach and find real-world evaluation together with MoMRISim, while not perfect, to be most reliable. Evaluation based on simulated motion systematically exaggerates algorithm performance, and reference-free evaluation overrates oversmoothed deep learning outputs.

MCML Authors

Tobit Klug

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[116]

C. Kühn and S.-V. Kuntz.
Analysis of the Geometric Structure of Neural Networks and Neural ODEs via Morse Functions.
DS 2025 - SIAM Conference on Applications of Dynamical Systems. Denver, CO, USA, May 11-15, 2025. To be published. Preprint available. arXiv

Abstract

Besides classical feed-forward neural networks, neural ordinary differential equations (neural ODEs) have also gained particular interest in recent years. Neural ODEs can be interpreted as an infinite depth limit of feed-forward or residual neural networks. We study the input-output dynamics of finite and infinite depth neural networks with scalar output. In the finite depth case, the input is a state associated with a finite number of nodes, which maps under multiple non-linear transformations to the state of one output node. In analogy, a neural ODE maps an affine linear transformation of the input to an affine linear transformation of its time-T map. We show that depending on the specific structure of the network, the input-output map has different properties regarding the existence and regularity of critical points, which can be characterized via Morse functions. We prove that critical points cannot exist if the dimension of the hidden layer is monotonically decreasing or the dimension of the phase space is smaller than or equal to the input dimension. In the case that critical points exist, we classify their regularity depending on the specific architecture of the network. We show that except for a Lebesgue measure zero set in the weight space, each critical point is non-degenerate, if for finite depth neural networks the underlying graph has no bottleneck, and if for neural ODEs, the affine linear transformations used have full rank. For each type of architecture, the proven properties are comparable in the finite and the infinite depth case. The established theorems allow us to formulate results on universal embedding, i.e., on the exact representation of maps by neural networks and neural ODEs. Our dynamical systems viewpoint on the geometric structure of the input-output map provides a fundamental understanding of why certain architectures perform better than others.

MCML Authors

Christian Kühn

Prof. Dr.

Multiscale and Stochastic Dynamics

Sara-Viola Kuntz

Multiscale and Stochastic Dynamics

[115]

H.-H. Chou, J. Maly, C. M. Verdun, B. Freitas Paulo da Costa and H. Mirandola.
Get rid of your constraints and reparametrize: A study in NNLS and implicit bias.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. To be published. URL

Abstract

Over the past years, there has been significant interest in understanding the implicit bias of gradient descent optimization and its connection to the generalization properties of overparametrized neural networks. Several works observed that when training linear diagonal networks on the square loss for regression tasks (which corresponds to overparametrized linear regression) gradient descent converges to special solutions, e.g., non-negative ones. We connect this observation to Riemannian optimization and view overparametrized GD with identical initialization as a Riemannian GD. We use this fact for solving non-negative least squares (NNLS), an important problem behind many techniques, e.g., non-negative matrix factorization. We show that gradient flow on the reparametrized objective converges globally to NNLS solutions, providing convergence rates also for its discretized counterpart. Unlike previous methods, we do not rely on the calculation of exponential maps or geodesics. We further show accelerated convergence using a second-order ODE, lending itself to accelerated descent methods. Finally, we establish the stability against negative perturbations and discuss generalization to other constrained optimization problems.
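
The reparametrization idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses the elementwise parametrization x = u ⊙ u with identical initialization and plain gradient descent with a fixed step size; the function name `nnls_by_reparametrization`, the step size, the iteration count, and the small test problem are all chosen here for illustration.

```python
def nnls_by_reparametrization(A, b, step=0.05, iters=5000, init=0.5):
    """Approximately minimize 0.5 * ||A x - b||^2 over x >= 0.

    Sketch: substitute x = u * u (elementwise) and run unconstrained
    gradient descent on u, so non-negativity of x holds by construction.
    """
    rows, cols = len(A), len(A[0])
    u = [init] * cols  # identical initialization for all coordinates
    for _ in range(iters):
        x = [ui * ui for ui in u]
        residual = [sum(A[i][j] * x[j] for j in range(cols)) - b[i]
                    for i in range(rows)]
        # Gradient of the least-squares loss with respect to x: A^T (A x - b).
        grad_x = [sum(A[i][j] * residual[i] for i in range(rows))
                  for j in range(cols)]
        # Chain rule for x_j = u_j^2: d/du_j = 2 * u_j * d/dx_j.
        u = [u[j] - step * 2.0 * u[j] * grad_x[j] for j in range(cols)]
    return [ui * ui for ui in u]


# Small hypothetical problem whose unconstrained minimizer has a negative entry.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -1.0, 0.5]
x_hat = nnls_by_reparametrization(A, b)
print(x_hat)
```

For this example the unconstrained least-squares solution is (7/6, −5/6), while the non-negative solution is (0.75, 0); the iterates approach the latter without any projection step or explicit constraint handling.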

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[114]

H. Boche, V. Fojtik, A. Fono and G. Kutyniok.
Computability of Classification and Deep Learning: From Theoretical Limits to Practical Feasibility through Quantization.
Journal of Fourier Analysis and Applications 31.35 (May. 2025). DOI

Abstract

The unwavering success of deep learning in the past decade led to the increasing prevalence of deep learning methods in various application fields. However, the downsides of deep learning, most prominently its lack of trustworthiness, may not be compatible with safety-critical or high-responsibility applications requiring stricter performance guarantees. Recently, several instances of deep learning applications have been shown to be subject to theoretical limitations of computability, undermining the feasibility of performance guarantees when employed on real-world computers. We extend the findings by studying computability in the deep learning framework from two perspectives: From an application viewpoint in the context of classification problems and a general limitation viewpoint in the context of training neural networks. In particular, we show restrictions on the algorithmic solvability of classification problems that also render the algorithmic detection of failure in computations in a general setting infeasible. Subsequently, we prove algorithmic limitations in training deep neural networks even in cases where the underlying problem is well-behaved. Finally, we end with a positive observation, showing that in quantized versions of classification and deep network training, computability restrictions do not arise or can be overcome to a certain degree.

MCML Authors

Vit Fojtik

Mathematical Foundations of Artificial Intelligence

Adalbert Fono

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[113]

V. Fojtik, M. Matveev, H.-H. Chou, G. Kutyniok and J. Maly.
Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization.
Preprint (May. 2025). arXiv

Abstract

A widely believed explanation for the remarkable generalization capacities of overparameterized neural networks is that the optimization algorithms used for training induce an implicit bias towards benign solutions. To grasp this theoretically, recent works examine gradient descent and its variants in simplified training settings, often assuming vanishing learning rates. These studies reveal various forms of implicit regularization, such as ℓ1-norm minimizing parameters in regression and max-margin solutions in classification. Concurrently, empirical findings show that moderate to large learning rates exceeding standard stability thresholds lead to faster, albeit oscillatory, convergence in the so-called Edge-of-Stability regime, and induce an implicit bias towards minima of low sharpness (norm of training loss Hessian). In this work, we argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We empirically demonstrate that the learning rate balances between low parameter norm and low sharpness of the trained model. We furthermore prove for diagonal linear networks trained on a simple regression task that neither implicit bias alone minimizes the generalization error. These findings demonstrate that focusing on a single implicit bias is insufficient to explain good generalization, and they motivate a broader view of implicit regularization that captures the dynamic trade-off between norm and sharpness induced by non-negligible learning rates.

MCML Authors

Vit Fojtik

Mathematical Foundations of Artificial Intelligence

Maria Matveev

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

Johannes Maly

Prof. Dr.

A2 | Mathematical Foundations
→ Group Holger Rauhut

Mathematical Data Science and Artificial Intelligence

[112]

T. Karvonen, G. Santin and T. Wenzel.
General superconvergence for kernel-based approximation.
Preprint (May. 2025). arXiv

Abstract

Kernel interpolation is a fundamental technique for approximating functions from scattered data, with a well-understood convergence theory when interpolating elements of a reproducing kernel Hilbert space. Beyond this classical setting, research has focused on two regimes: misspecified interpolation, where the kernel smoothness exceeds that of the target function, and superconvergence, where the target is smoother than the Hilbert space. This work addresses the latter, where smoother target functions yield improved convergence rates, and extends existing results by characterizing superconvergence for projections in general Hilbert spaces. We show that functions lying in ranges of certain operators, including adjoint of embeddings, exhibit accelerated convergence, which we extend across interpolation scales between these ranges and the full Hilbert space. In particular, we analyze Mercer operators and embeddings into Lp spaces, linking the images of adjoint operators to Mercer power spaces. Applications to Sobolev spaces are discussed in detail, highlighting how superconvergence depends critically on boundary conditions. Our findings generalize and refine previous results, offering a broader framework for understanding and exploiting superconvergence. The results are supported by numerical experiments.

MCML Authors

Tizian Wenzel

Dr.

Mathematical Data Science and Artificial Intelligence

[111]

C. Kühn and S.-V. Kuntz.
The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property.
Preprint (May. 2025). arXiv

Abstract

Neural Ordinary Differential Equations (Neural ODEs), which are the continuous-time analog of Residual Neural Networks (ResNets), have gained significant attention in recent years. Similarly, Neural Delay Differential Equations (Neural DDEs) can be interpreted as an infinite depth limit of Densely Connected Residual Neural Networks (DenseResNets). In contrast to traditional ResNet architectures, DenseResNets are feed-forward networks that allow for shortcut connections across all layers. These additional connections introduce memory in the network architecture, as typical in many modern architectures. In this work, we explore how the memory capacity in neural DDEs influences the universal approximation property. The key parameter for studying the memory capacity is the product Kτ of the Lipschitz constant and the delay of the DDE. In the case of non-augmented architectures, where the network width is not larger than the input and output dimensions, neural ODEs and classical feed-forward neural networks cannot have the universal approximation property. We show that if the memory capacity Kτ is sufficiently small, the dynamics of the neural DDE can be approximated by a neural ODE. Consequently, non-augmented neural DDEs with a small memory capacity also lack the universal approximation property. In contrast, if the memory capacity Kτ is sufficiently large, we can establish the universal approximation property of neural DDEs for continuous functions. If the neural DDE architecture is augmented, we can expand the parameter regions in which universal approximation is possible. Overall, our results show that by increasing the memory capacity Kτ, the infinite-dimensional phase space of DDEs with positive delay τ>0 is not sufficient to guarantee a direct jump transition to universal approximation, but only after a certain memory threshold, universal approximation holds.

MCML Authors

Christian Kühn

Prof. Dr.

A2 | Mathematical Foundations
→ Group Christian Kühn

Multiscale and Stochastic Dynamics

Sara-Viola Kuntz

Multiscale and Stochastic Dynamics

[110]

S. Maskey, R. Paolino, F. Jogl, G. Kutyniok and J. Lutzeyer.
Graph Representational Learning: When Does More Expressivity Hurt Generalization?
Preprint (May. 2025). arXiv

Abstract

Graph Neural Networks (GNNs) are powerful tools for learning on structured data, yet the relationship between their expressivity and predictive performance remains unclear. We introduce a family of premetrics that capture different degrees of structural similarity between graphs and relate these similarities to generalization, and consequently, the performance of expressive GNNs. By considering a setting where graph labels are correlated with structural features, we derive generalization bounds that depend on the distance between training and test graphs, model complexity, and training set size. These bounds reveal that more expressive GNNs may generalize worse unless their increased complexity is balanced by a sufficiently large training set or reduced distance between training and test graphs. Our findings relate expressivity and generalization, offering theoretical insights supported by empirical results.

MCML Authors

Sohir Maskey

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Raffaele Paolino

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[109]

P. Scholl, A. Dietrich, S. Wolf, J. Lee, A.-A. Schäffer, G. Kutyniok and M. Iskandar.
Interpretable Robotic Friction Learning via Symbolic Regression.
Preprint (May. 2025). arXiv

Abstract

Accurately modeling the friction torque in robotic joints has long been challenging due to the need for a robust mathematical description. Traditional model-based approaches are often labor-intensive, requiring extensive experiments and expert knowledge, and they are difficult to adapt to new scenarios and dependencies. On the other hand, data-driven methods based on neural networks are easier to implement but often lack robustness, interpretability, and trustworthiness, which are key considerations for robotic hardware and safety-critical applications such as human-robot interaction. To address the limitations of both approaches, we propose the use of symbolic regression (SR) to estimate the friction torque. SR generates interpretable symbolic formulas similar to those produced by model-based methods while being flexible to accommodate various dynamic effects and dependencies. In this work, we apply SR algorithms to approximate the friction torque using collected data from a KUKA LWR-IV+ robot. Our results show that SR not only yields formulas with comparable complexity to model-based approaches but also achieves higher accuracy. Moreover, SR-derived formulas can be seamlessly extended to include load dependencies and other dynamic factors.

MCML Authors

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[108]

C. Bülte, S. Maskey, P. Scholl, J. von Berg and G. Kutyniok.
Graph Neural Networks for Enhancing Ensemble Forecasts of Extreme Rainfall.
Climate Change AI @ICLR 2025 - Workshop on Tackling Climate Change with Machine Learning at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

Climate change is increasing the occurrence of extreme precipitation events, threatening infrastructure, agriculture, and public safety. Ensemble prediction systems provide probabilistic forecasts but exhibit biases and difficulties in capturing extreme weather. While post-processing techniques aim to enhance forecast accuracy, they rarely focus on precipitation, which exhibits complex spatial dependencies and tail behavior. Our novel framework leverages graph neural networks to post-process ensemble forecasts, specifically modeling the extremes of the underlying distribution. This makes it possible to capture spatial dependencies and improves forecast accuracy for extreme events, thus leading to more reliable forecasts and mitigating risks of extreme precipitation and flooding.

MCML Authors

Christopher Bülte

Mathematical Foundations of Artificial Intelligence

Sohir Maskey

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Philipp Scholl

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Jonas von Berg

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[107]

L. Lux, A. H. Berger, A. Weers, N. Stucki, D. Rückert, U. Bauer and J. C. Paetzold.
Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Topological correctness plays a critical role in many image segmentation tasks, yet most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy. Existing topology-aware methods often lack robust topological guarantees, are limited to specific use cases, or impose high computational costs. In this work, we propose a novel, graph-based framework for topologically accurate image segmentation that is both computationally efficient and generally applicable. Our method constructs a component graph that fully encodes the topological information of both the prediction and ground truth, allowing us to efficiently identify topologically critical regions and aggregate a loss based on local neighborhood information. Furthermore, we introduce a strict topological metric capturing the homotopy equivalence between the union and intersection of prediction-label pairs. We formally prove the topological guarantees of our approach and empirically validate its effectiveness on binary and multi-class datasets. Our loss demonstrates state-of-the-art performance with up to fivefold faster loss computation compared to persistent homology methods.

MCML Authors

Laurin Lux

C1 | Medicine
→ Group Martin Menten

Artificial Intelligence in Healthcare and Medicine

Alexander Weers

Artificial Intelligence in Healthcare and Medicine

Nico Stucki

Applied Topology and Geometry

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Ulrich Bauer

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Applied Topology and Geometry

[106]

P. Scholl, K. Bieker, H. Hauger and G. Kutyniok.
ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL GitHub

Abstract

The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually complicated and involve various hyperparameters. In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We theoretically analyze the expressivity of ParFam and demonstrate its performance with extensive numerical experiments based on the common SR benchmark suite SRBench, showing that we achieve state-of-the-art results. Moreover, we present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam, accelerating the optimization process by up to two orders of magnitude.

MCML Authors

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[105]

H. Hauger, P. Scholl and G. Kutyniok.
Robust identifiability for symbolic recovery of differential equations.
ICASSP 2025 - IEEE International Conference on Acoustics, Speech and Signal Processing. Hyderabad, India, Apr 06-11, 2025. DOI

Abstract

Recent advancements in machine learning have transformed the discovery of physical laws, moving from manual derivation to data-driven methods that simultaneously learn both the structure and parameters of governing equations. This shift introduces new challenges regarding the validity of the discovered equations, particularly concerning their uniqueness and, hence, identifiability. While the issue of non-uniqueness has been well-studied in the context of parameter estimation, it remains underexplored for algorithms that recover both structure and parameters simultaneously. Early studies have primarily focused on idealized scenarios with perfect, noise-free data. In contrast, this paper investigates how noise influences the uniqueness and identifiability of physical laws governed by partial differential equations (PDEs). We develop a comprehensive mathematical framework to analyze the uniqueness of PDEs in the presence of noise and introduce new algorithms that account for noise, providing thresholds to assess uniqueness and identifying situations where excessive noise hinders reliable conclusions. Numerical experiments demonstrate the effectiveness of these algorithms in detecting uniqueness despite the presence of noise.

MCML Authors

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[104]

J. Kostin, F. Krahmer and D. Stöger.
How robust is randomized blind deconvolution via nuclear norm minimization against adversarial noise?
Applied and Computational Harmonic Analysis 76.101746 (Apr. 2025). DOI

Abstract

In this paper, we study the problem of recovering two unknown signals from their convolution, which is commonly referred to as blind deconvolution. Reformulation of blind deconvolution as a low-rank recovery problem has led to multiple theoretical recovery guarantees in the past decade due to the success of the nuclear norm minimization heuristic. In particular, in the absence of noise, exact recovery has been established for sufficiently incoherent signals contained in lower-dimensional subspaces. However, if the convolution is corrupted by additive bounded noise, the stability of the recovery problem remains much less understood. In particular, existing reconstruction bounds involve large dimension factors and therefore fail to explain the empirical evidence for dimension-independent robustness of nuclear norm minimization. Recently, theoretical evidence has emerged for ill-posed behavior of low-rank matrix recovery for sufficiently small noise levels. In this work, we develop improved recovery guarantees for blind deconvolution with adversarial noise which exhibit square-root scaling in the noise level. Hence, our results are consistent with existing counterexamples which speak against linear scaling in the noise level as demonstrated for related low-rank matrix recovery problems.

MCML Authors

Felix Krahmer

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Optimization & Data Analysis

[103]

C. Cipriani, M. Fornasier and A. Scagliotti.
From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks.
European Journal of Applied Mathematics 36.Special Issue 2: From integro-differential models to data-oriented approaches for emergent phenomena (Apr. 2025). DOI

Abstract

The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks, which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modelling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularisation, resulting in potentially non-convex cost landscapes. While the global results obtained for high Tikhonov regularisation may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.

MCML Authors

Cristina Cipriani

Dr.

* Former Member

Massimo Fornasier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

Alessandro Scagliotti

Applied Numerical Analysis

[102]

M. Fornasier, P. Richtárik, K. Riedl and L. Sun.
Consensus-Based Optimization with Truncated Noise.
European Journal of Applied Mathematics 36.Special Issue 2: From integro-differential models to data-oriented approaches for emergent phenomena (Apr. 2025).

Abstract

Consensus-based optimisation (CBO) is a versatile multi-particle metaheuristic optimisation method suitable for performing non-convex and non-smooth global optimisations in high dimensions. It has proven effective in various applications while at the same time being amenable to a theoretical convergence analysis. In this paper, we explore a variant of CBO, which incorporates truncated noise in order to enhance the well-behavedness of the statistics of the law of the dynamics. By introducing this additional truncation in the noise term of the CBO dynamics, we achieve that, in contrast to the original version, higher moments of the law of the particle system can be effectively bounded. As a result, our proposed variant exhibits enhanced convergence performance, allowing in particular for wider flexibility in choosing the noise parameter of the method as we confirm experimentally. By analysing the time evolution of the Wasserstein-2 distance between the empirical measure of the interacting particle system and the global minimiser of the objective function, we rigorously prove convergence in expectation of the proposed CBO variant requiring only minimal assumptions on the objective function and on the initialisation. Numerical evidence demonstrates the benefit of truncating the noise in CBO.
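As a rough illustration of the kind of dynamics described here, the sketch below simulates a one-dimensional particle system with an Euler-Maruyama step. It is an assumption-laden toy, not the paper's scheme: the abstract does not state the exact form of the truncation, so the code assumes the noise amplitude is capped as min(|x - v|, M), where v is the weighted consensus point. The function name `cbo_truncated` and all parameter values (`lam`, `sigma`, `alpha`, `M`, `dt`) are hypothetical choices.

```python
import math
import random


def cbo_truncated(f, n_particles=100, steps=1000, dt=0.01,
                  lam=1.0, sigma=0.3, alpha=30.0, M=1.0, seed=0):
    """Toy 1D consensus-based optimisation with capped noise amplitude."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n_particles)]

    def consensus(points):
        vals = [f(x) for x in points]
        fmin = min(vals)
        # Gibbs-type weights; subtracting fmin avoids numerical underflow
        ws = [math.exp(-alpha * (v - fmin)) for v in vals]
        return sum(w * x for w, x in zip(ws, points)) / sum(ws)

    for _ in range(steps):
        v = consensus(xs)
        xs = [
            x
            - lam * dt * (x - v)                       # drift towards consensus
            + sigma * math.sqrt(dt) * min(abs(x - v), M) * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    return consensus(xs)


estimate = cbo_truncated(lambda x: (x - 1.0) ** 2)  # minimiser of this objective is x = 1
```

The cap M bounds the size of each random kick regardless of how far a particle is from the consensus point, which is the intuition behind bounding higher moments in the abstract.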

MCML Authors

Massimo Fornasier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

Konstantin Riedl

Dr.

* Former Member

Lukang Sun

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

[101]

G. Kutyniok.
How Can Reliability of Artificial Intelligence Be Ensured?
Harvard Data Science Review 7.2 (Apr. 2025). DOI

Abstract

Column Editor’s Note: Artificial intelligence (AI) is having a profound impact across many areas of science and society. However, there remain important gaps in our understanding of the deep neural networks that underpin these developments, and in many cases AI models lack robustness and reliability. In this Diving into Data column, Professor Kutyniok explores these issues from a mathematical perspective, highlighting open theoretical questions that will need to be resolved in order to develop AIs that are truly reliable, generalizable, and trustworthy.

MCML Authors

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[100]

C. Bülte, Y. Sale, T. Löhr, P. Hofman, G. Kutyniok and E. Hüllermeier.
An Axiomatic Assessment of Entropy- and Variance-based Uncertainty Quantification in Regression.
Preprint (Apr. 2025). arXiv

Abstract

Uncertainty quantification (UQ) is crucial in machine learning, yet most (axiomatic) studies of uncertainty measures focus on classification, leaving a gap in regression settings with limited formal justification and evaluations. In this work, we introduce a set of axioms to rigorously assess measures of aleatoric, epistemic, and total uncertainty in supervised regression. By utilizing a predictive exponential family, we can generalize commonly used approaches for uncertainty representation and corresponding uncertainty measures. More specifically, we analyze the widely used entropy- and variance-based measures regarding limitations and challenges. Our findings provide a principled foundation for UQ in regression, offering theoretical insights and practical guidelines for reliable uncertainty assessment.

MCML Authors

Christopher Bülte

Mathematical Foundations of Artificial Intelligence

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

Eyke Hüllermeier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Reinhard Heckel

Artificial Intelligence and Machine Learning

[99]

F. Weindel and R. Heckel.
LLM-Guided Search for Deletion-Correcting Codes.
Preprint (Apr. 2025). arXiv

Abstract

Finding deletion-correcting codes of maximum size has been an open problem for over 70 years, even for a single deletion. In this paper, we propose a novel approach for constructing deletion-correcting codes. A code is a set of sequences satisfying certain constraints, and we construct it by greedily adding the highest-priority sequence according to a priority function. To find good priority functions, we leverage FunSearch, a large language model (LLM)-guided evolutionary search proposed by Romera et al., 2024. FunSearch iteratively generates, evaluates, and refines priority functions to construct large deletion-correcting codes. For a single deletion, our evolutionary search finds functions that construct codes which match known maximum sizes, reach the size of the largest (conjectured optimal) Varshamov-Tenengolts codes where the maximum is unknown, and independently rediscover them in equivalent form. For two deletions, we find functions that construct codes with new best-known sizes for code lengths n = 12, 13, and 16, establishing improved lower bounds. These results demonstrate the potential of LLM-guided search for information theory and code design and represent the first application of such methods for constructing error-correcting codes.
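The greedy construction mentioned in the abstract can be sketched as follows for binary single-deletion codes. This is an illustrative assumption, not the paper's implementation: the helper names are hypothetical, and the priority function is a hand-written stand-in based on the Varshamov-Tenengolts syndrome rather than anything found by FunSearch. The sketch relies on the standard fact that a code corrects one deletion exactly when the single-deletion balls of distinct codewords are disjoint.

```python
from itertools import product


def deletion_ball(seq):
    """All sequences obtained from seq by deleting exactly one symbol."""
    return {seq[:i] + seq[i + 1:] for i in range(len(seq))}


def vt_priority(seq):
    """Hypothetical priority: sequences with VT syndrome 0 rank highest."""
    n = len(seq)
    syndrome = sum((i + 1) * bit for i, bit in enumerate(seq)) % (n + 1)
    return -syndrome


def greedy_code(n, priority):
    """Greedily add the highest-priority sequence whose ball is still free."""
    candidates = sorted(product((0, 1), repeat=n), key=priority, reverse=True)
    code, covered = [], set()
    for seq in candidates:
        ball = deletion_ball(seq)
        if covered.isdisjoint(ball):  # no clash with any chosen codeword
            code.append(seq)
            covered |= ball
    return code


code = greedy_code(4, vt_priority)  # the four words 0000, 0110, 1001, 1111
```

With this priority the greedy pass for n = 4 picks the four syndrome-0 words first; their deletion balls already cover all eight length-3 sequences, so nothing else can be added.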

MCML Authors

Franziska Weindel

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[98]

F. Krahmer and A. Veselovska.
The mathematics of dots and pixels: On the theoretical foundations of image halftoning.
GAMM Mitteilungen 48.1 (Mar. 2025). DOI

Abstract

The evolution of image halftoning, from its analog roots to contemporary digital methodologies, encapsulates a fascinating journey marked by technological advancements and creative innovations. Yet the theoretical understanding of halftoning is much more recent. In this article, we explore various approaches towards shedding light on the design of halftoning approaches and why they work. We discuss both halftoning in a continuous domain and on a pixel grid. We start by reviewing the mathematical foundation of the so-called electrostatic halftoning method, which departed from the heuristic of considering the black dots of the halftoned image as charged particles attracted by the grey values of the image in combination with mutual repulsion. Such an attraction-repulsion model can be mathematically represented via an energy functional in a reproducing kernel Hilbert space allowing for a rigorous analysis of the resulting optimization problem as well as a convergence analysis in a suitable topology. A second class of methods that we discuss in detail is the class of error diffusion schemes, arguably among the most popular halftoning techniques due to their ability to work directly on a pixel grid and their ease of application. The main idea of these schemes is to choose the locations of the black pixels via a recurrence relation designed to agree with the image in terms of the local averages. We discuss some recent mathematical understanding of these methods that is based on a connection to Σ∆ quantizers, a popular class of algorithms for analog-to-digital conversion.
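The recurrence behind error diffusion can be illustrated in one dimension, where it coincides with a first-order Σ∆ quantizer. The sketch below is a simplified assumption for illustration (a single scan line, full error carried to the next pixel, output 1 meaning "dot placed"); practical schemes distribute the error over several neighbouring pixels in two dimensions. The function name `error_diffusion_1d` is hypothetical.

```python
def error_diffusion_1d(pixels):
    """Quantize grey values in [0, 1] to {0, 1}, carrying the error forward.

    State update: u_i = u_{i-1} + x_i - q_i with q_i = 1 if u_{i-1} + x_i >= 0.5.
    Starting from u_0 = 0, the state stays in [-0.5, 0.5], so every running sum
    of the output differs from the running sum of the input by at most 0.5.
    """
    u = 0.0
    out = []
    for x in pixels:
        v = u + x
        q = 1 if v >= 0.5 else 0
        out.append(q)
        u = v - q  # accumulated quantization error
    return out


halftone = error_diffusion_1d([0.25] * 8)  # -> [0, 1, 0, 0, 0, 1, 0, 0]
```

For a constant grey level of 0.25 the output places one dot in every four pixels, so local averages of the binary output match the grey value.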

MCML Authors

Felix Krahmer

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Optimization & Data Analysis

Anna Veselovska

Dr.

Applied Numerical Analysis

[97]

C. Bülte, P. Scholl and G. Kutyniok.
Probabilistic neural operators for functional uncertainty quantification.
Transactions on Machine Learning Research (Mar. 2025). URL

Abstract

Neural operators aim to approximate the solution operator of a system of differential equations purely from data. They have shown immense success in modeling complex dynamical systems across various domains. However, the occurrence of uncertainties inherent in both model and data has so far rarely been taken into account, a critical limitation in complex, chaotic systems such as weather forecasting. In this paper, we introduce the probabilistic neural operator (PNO), a framework for learning probability distributions over the output function space of neural operators. PNO extends neural operators with generative modeling based on strictly proper scoring rules, integrating uncertainty information directly into the training process. We provide a theoretical justification for the approach and demonstrate improved performance in quantifying uncertainty across different domains and with respect to different baselines. Furthermore, PNO requires minimal adjustment to existing architectures, shows improved performance for most probabilistic prediction tasks, and leads to well-calibrated predictive distributions and adequate uncertainty representations even for long dynamical trajectories. Implementing our approach into large-scale models for physical applications can lead to improvements in corresponding uncertainty quantification and extreme event identification, ultimately leading to a deeper understanding of the prediction of such surrogate models.

MCML Authors

Christopher Bülte

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Philipp Scholl

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[96]

A. Fono, M. Singh, E. Araya, P. C. Petersen, H. Boche and G. Kutyniok.
Sustainable AI: Mathematical Foundations of Spiking Neural Networks.
Preprint (Mar. 2025). arXiv

Abstract

Deep learning’s success comes with growing energy demands, raising concerns about the long-term sustainability of the field. Spiking neural networks, inspired by biological neurons, offer a promising alternative with potential computational and energy-efficiency gains. This article examines the computational properties of spiking networks through the lens of learning theory, focusing on expressivity, training, and generalization, as well as energy-efficient implementations while comparing them to artificial neural networks. By categorizing spiking models based on time representation and information encoding, we highlight their strengths, challenges, and potential as an alternative computational paradigm.

MCML Authors

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Manjot Singh

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[95]

A. Scagliotti, F. Scagliotti, L. Locati and F. Sottotetti.
Ensemble optimal control for managing drug resistance in cancer therapies.
Preprint (Mar. 2025). arXiv

Abstract

In this paper, we explore the application of ensemble optimal control to derive enhanced strategies for pharmacological cancer treatment. In particular, we focus on moving beyond the classical clinical approach of giving the patient the maximal tolerated drug dose (MTD), which does not properly exploit the fight among sensitive and resistant cells for the available resources. Here, we employ a Lotka-Volterra model to describe the two competing subpopulations, and we enclose this system within the ensemble control framework. In the first part, we establish general results suitable for application to various solid cancers. Then, we carry out numerical simulations in the setting of prostate cancer treated with androgen deprivation therapy, yielding a computed policy that is reminiscent of the medical ‘active surveillance’ paradigm. Finally, inspired by the numerical evidence, we propose a variant of the celebrated adaptive therapy (AT), which we call ‘Off-On’ AT.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[94]

M. Fornasier and L. Sun.
A PDE Framework of Consensus-Based Optimization for Objectives with Multiple Global Minimizers.
Communications in Partial Differential Equations 50.4 (Feb. 2025). DOI

Abstract

Introduced in 2017, Consensus-Based Optimization (CBO) has rapidly emerged as a significant breakthrough in global optimization. This straightforward yet powerful multi-particle, zero-order optimization method draws inspiration from Simulated Annealing and Particle Swarm Optimization. Using a quantitative mean-field approximation, CBO dynamics can be described by a nonlinear Fokker-Planck equation with degenerate diffusion, which does not follow a gradient flow structure. In this paper, we demonstrate that solutions to the CBO equation remain positive and maintain full support. Building on this foundation, we establish the unconditional global convergence of CBO methods to global minimizers. Our results are derived through an analysis of solution regularity and the proof of existence for smooth, classical solutions to a broader class of drift-diffusion equations, despite the challenges posed by degenerate diffusion.
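As an illustration of the multi-particle, zero-order method the abstract refers to, the following is a minimal sketch of a standard isotropic CBO iteration (Euler-Maruyama discretization). It is not taken from the paper: the function name `cbo_minimize`, all parameter values, and the single-minimizer test objective are assumptions chosen for illustration.

```python
import math
import random

def cbo_minimize(f, dim, n_particles=100, steps=300, dt=0.05,
                 lam=1.0, sigma=0.7, alpha=30.0, seed=0):
    """Isotropic consensus-based optimization (illustrative sketch).

    Particles drift toward a Gibbs-weighted consensus point and are
    perturbed by noise scaled with their distance to that point.
    Only evaluations of f are used (zero-order method).
    """
    rng = random.Random(seed)
    xs = [[rng.uniform(-3.0, 3.0) for _ in range(dim)]
          for _ in range(n_particles)]
    consensus = [0.0] * dim
    for _ in range(steps):
        vals = [f(x) for x in xs]
        fmin = min(vals)  # shift keeps the exponentials well-scaled
        ws = [math.exp(-alpha * (v - fmin)) for v in vals]
        wsum = sum(ws)
        consensus = [sum(w * x[k] for w, x in zip(ws, xs)) / wsum
                     for k in range(dim)]
        for x in xs:
            diff = [x[k] - consensus[k] for k in range(dim)]
            dist = math.sqrt(sum(d * d for d in diff))
            for k in range(dim):
                noise = rng.gauss(0.0, 1.0)
                x[k] += (-lam * diff[k] * dt
                         + sigma * dist * math.sqrt(dt) * noise)
    return consensus

# Hypothetical objective with a single minimizer at (1, -2).
best = cbo_minimize(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, dim=2)
print(best)
```

The sketch uses an objective with one global minimizer for simplicity; the setting with multiple global minimizers named in the title is not reproduced here.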

MCML Authors

Massimo Fornasier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

Lukang Sun

Applied Numerical Analysis

[93]

S. Dirksen, W. Li and J. Maly.
Subspace and DOA estimation under coarse quantization.
Preprint (Feb. 2025). arXiv

Abstract

We study direction-of-arrival (DOA) estimation from coarsely quantized data. We focus on a two-step approach which first estimates the signal subspace via covariance estimation and then extracts DOA angles by the ESPRIT algorithm. In particular, we analyze two stochastic quantization schemes which use dithering: a one-bit quantizer combined with rectangular dither and a multi-bit quantizer with triangular dither. For each quantizer, we derive rigorous high probability bounds for the distances between the true and estimated signal subspaces and DOA angles. Using our analysis, we identify scenarios in which subspace and DOA estimation via triangular dithering qualitatively outperforms rectangular dithering. We verify in numerical simulations that our estimates are optimal in their dependence on the smallest non-zero eigenvalue of the target matrix. The resulting subspace estimation guarantees are equally applicable in the analysis of other spectral estimation algorithms and related problems.
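The scalar fact behind dithered one-bit quantization can be checked numerically. For |x| ≤ λ and dither τ uniform on [−λ, λ], the quantized value λ·sign(x + τ) has expectation x, so averaging one-bit samples recovers x. The sketch below assumes that "rectangular dither" means uniform dither; the function names and the value of λ are hypothetical, and the covariance estimator and ESPRIT step of the paper are not reproduced.

```python
import random

def one_bit_dithered(x, lam, rng):
    """Add uniform dither on [-lam, lam], then keep only the sign (scaled by lam)."""
    tau = rng.uniform(-lam, lam)
    return lam if x + tau >= 0 else -lam

def estimate_from_bits(x, lam, n_samples, seed=0):
    """Average many one-bit dithered samples of the same value x."""
    rng = random.Random(seed)
    total = sum(one_bit_dithered(x, lam, rng) for _ in range(n_samples))
    return total / n_samples

# Hypothetical value x = 0.3 with dither range lam = 1.0.
print(estimate_from_bits(0.3, 1.0, 200_000))
```

The estimate is unbiased only while |x| ≤ λ; values outside the dither range saturate at ±λ.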

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[92]

M. Fornasier and L. Sun.
Regularity and positivity of solutions of the Consensus-Based Optimization equation: unconditional global convergence.
Preprint (Feb. 2025). arXiv

Abstract

MCML Authors

Massimo Fornasier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

Lukang Sun

Applied Numerical Analysis

[91]

H. Laus, S. Parkinson, V. Charisopoulos, F. Krahmer and R. Willett.
Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay.
Preprint (Feb. 2025). arXiv

Abstract

Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few measurements generated via a known acquisition procedure. In particular, neural networks perform well empirically but have limited theoretical guarantees. In this work, we study an underdetermined linear inverse problem that admits several possible solution mappings. A standard remedy (e.g., in compressed sensing) establishing uniqueness of the solution mapping is to assume knowledge of latent low-dimensional structure in the source signal. We ask the following question: do deep neural networks adapt to this low-dimensional structure when trained by gradient descent with weight decay regularization? We prove that mildly overparameterized deep linear networks trained in this manner converge to an approximate solution that accurately solves the inverse problem while implicitly encoding latent subspace structure. To our knowledge, this is the first result to rigorously show that deep linear networks trained with weight decay automatically adapt to latent subspace structure in the data under practical stepsize and weight initialization schemes. Our work highlights that regularization and overparameterization improve generalization, while overparameterization also accelerates convergence during training.

MCML Authors

Hannah Laus

A2 | Mathematical Foundations
→ Group Felix Krahmer

Optimization & Data Analysis

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[90]

M. Fornasier, J. Klemenc and A. Scagliotti.
Trade-off Invariance Principle for minimizers of regularized functionals.
Math4AiMl 2025 - 3rd Workshop of UMI Group Mathematics for Artificial Intelligence and Machine Learning. Bari, Italy, Jan 29-31, 2025. arXiv PDF

Abstract

In this paper, we consider functionals of the form $H_\alpha(u) = F(u) + \alpha G(u)$ with $\alpha \in [0,+\infty)$, where $u$ varies in a set $U \neq \emptyset$ (without further structure). We first show that, excluding at most countably many values of $\alpha$, we have that $\inf_{H^\star_\alpha} G = \sup_{H^\star_\alpha} G$, where $H^\star_\alpha := \arg\min_U H_\alpha$, which is assumed to be non-empty. We further prove a stronger result that concerns the invariance of the limiting value of the functional $G$ along minimizing sequences for $H_\alpha$. This fact in turn implies an unexpected consequence for functionals regularized with uniformly convex norms: excluding again at most countably many values of $\alpha$, it turns out that for a minimizing sequence, convergence to a minimizer in the weak or strong sense is equivalent.

MCML Authors

Massimo Fornasier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

Jona Klemenc

Applied Numerical Analysis

Alessandro Scagliotti

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

[89]

A. Scagliotti.
Minimax Problems for Ensembles of Control-Affine Systems.
SIAM Journal on Control and Optimization 63.1 (Jan. 2025). DOI

Abstract

In this paper, we consider ensembles of control-affine systems in $\mathbb{R}^d$, and we study simultaneous optimal control problems related to the worst-case minimization. After proving that such problems admit solutions, denoting with $(\Theta_N)_N$ a sequence of compact sets that parametrize the ensembles of systems, we first show that the corresponding minimax optimal control problems are $\Gamma$-convergent whenever $(\Theta_N)_N$ has a limit with respect to the Hausdorff distance. Besides its independent interest, the previous result plays a crucial role for establishing the Pontryagin Maximum Principle (PMP) when the ensemble is parametrized by a set $\Theta$ consisting of infinitely many points. Namely, we first approximate $\Theta$ by finite and increasing-in-size sets $(\Theta_N)_N$ for which the PMP is known, and then we derive the PMP for the $\Gamma$-limiting problem. The same strategy can be pursued in applications, where we can reduce infinite ensembles to finite ones to compute the minimizers numerically. We bring as a numerical example the Schrödinger equation for a qubit with uncertain resonance frequency.

MCML Authors

Alessandro Scagliotti

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

[88]

K. Bieker, H. T. Kussaba, P. Scholl, J. Jung, A. Swikir, S. Haddadin and G. Kutyniok.
Compositional Construction of Barrier Functions for Switched Impulsive Systems.
CDC 2024 - 63rd IEEE Conference on Decision and Control. Milan, Italy, Dec 16-19, 2024. DOI

Abstract

Many systems occurring in real-world applications, such as controlling the motions of robots or modeling the spread of diseases, are switched impulsive systems. To ensure that the system state stays in a safe region (e.g., to avoid collisions with obstacles), barrier functions are widely utilized. As the system dimension increases, deriving suitable barrier functions becomes extremely complex. Fortunately, many systems consist of multiple subsystems, such as different areas where the disease occurs. In this work, we present sufficient conditions for interconnected switched impulsive systems to maintain safety by constructing local barrier functions for the individual subsystems instead of a global one, allowing for much easier and more efficient derivation. To validate our results, we numerically demonstrate its effectiveness using an epidemiological model.

MCML Authors

Philipp Scholl

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[87]

C. Bülte, P. Scholl and G. Kutyniok.
Probabilistic predictions with Fourier neural operators.
BDU @NeurIPS 2024 - Workshop Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Neural networks have been successfully applied in modeling partial differential equations, especially in dynamical systems. Commonly used models, such as neural operators, are performing well at deterministic prediction tasks, but lack a quantification of the uncertainty inherent in many complex systems, for example weather forecasting. In this paper, we explore a new approach that combines Fourier neural operators with generative modeling based on strictly proper scoring rules in order to create well-calibrated probabilistic predictions of dynamical systems. We demonstrate improved predictive uncertainty for our approach, especially in settings with very high inherent uncertainty.
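To make "strictly proper scoring rules" concrete, the sketch below computes a sample-based estimate of the continuous ranked probability score (CRPS) for a scalar ensemble forecast, using the identity CRPS = E|X − y| − ½·E|X − X′|. CRPS is one well-known example of such a rule; whether it is the rule used in the paper is not stated in the abstract, and the function name and the example numbers are assumptions.

```python
def crps_ensemble(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    Lower is better. The first term rewards accuracy; the second
    term accounts for the spread of the ensemble.
    """
    m = len(samples)
    accuracy = sum(abs(s - y) for s in samples) / m
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread

# Hypothetical forecasts for an observation y = 1.0.
print(crps_ensemble([0.0, 2.0], 1.0))  # ensemble centred on the observation
print(crps_ensemble([0.0], 1.0))       # point forecast that misses by 1
```

Used as a training loss, a score of this kind is minimized in expectation by the true predictive distribution, which is what makes it suitable for fitting a generative model.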

MCML Authors

Christopher Bülte

Mathematical Foundations of Artificial Intelligence

Philipp Scholl

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[86]

A. Bonfanti, G. Bruno and C. Cipriani.
The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show how the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief. Another significant difference from the linear regime is that, even in the idealistic infinite-width limit, the Hessian does not vanish and hence it cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both linear and nonlinear cases, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.

MCML Authors

Cristina Cipriani

Dr.

* Former Member

[85]

F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.

MCML Authors

Hannah Laus

A2 | Mathematical Foundations
→ Group Felix Krahmer

Optimization & Data Analysis

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Holger Rauhut

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Data Science and Artificial Intelligence

[84]

R. Paolino, S. Maskey, P. Welke and G. Kutyniok.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

We introduce r-loopy Weisfeiler-Leman (r-ℓWL), a novel hierarchy of graph isomorphism tests and a corresponding GNN framework, r-ℓMPNN, that can count cycles up to length r+2. Most notably, we show that r-ℓWL can count homomorphisms of cactus graphs. This strictly extends classical 1-WL, which can only count homomorphisms of trees and, in fact, is incomparable to k-WL for any fixed k. We empirically validate the expressive and counting power of the proposed r-ℓMPNN on several synthetic datasets and present state-of-the-art predictive performance on various real-world datasets.
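For context on the baseline the abstract compares against, the following is a minimal sketch of classical 1-WL colour refinement (not the r-ℓWL test of the paper). It shows the well-known limitation that 1-WL cannot separate a 6-cycle from two disjoint triangles, since both graphs are 2-regular. The helper names `wl_histograms` and `cycle` are assumptions made for this example.

```python
def wl_histograms(graphs, rounds=3):
    """Run classical 1-WL colour refinement jointly on several graphs.

    graphs: list of adjacency dicts {node: [neighbours]}.
    Returns one sorted list of final colours per graph; different
    lists certify that the graphs are not isomorphic.
    """
    colors = [{v: 0 for v in g} for g in graphs]
    for _ in range(rounds):
        palette = {}  # shared across graphs so colours stay comparable
        refined = []
        for g, col in zip(graphs, colors):
            new = {}
            for v in g:
                # Own colour plus the multiset of neighbour colours.
                signature = (col[v], tuple(sorted(col[u] for u in g[v])))
                new[v] = palette.setdefault(signature, len(palette))
            refined.append(new)
        colors = refined
    return [sorted(c.values()) for c in colors]

def cycle(n, offset=0):
    """Adjacency dict of an n-cycle on nodes offset, ..., offset + n - 1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
            for i in range(n)}

hexagon = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}
h_hex, h_tri = wl_histograms([hexagon, two_triangles])
print(h_hex == h_tri)  # identical histograms: 1-WL cannot tell them apart
```

Distinguishing these two graphs requires counting cycles of length 3 or 6, which is the kind of information the abstract says the r-ℓWL hierarchy can capture.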

MCML Authors

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Sohir Maskey

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[83]

Y. N. Böck, H. Boche, F. H. P. Fitzek and G. Kutyniok.
Computing-Model and Computing-Hardware Selection for ICT Under Societal and Judicial Constraints.
IEEE Access 12 (Dec. 2024). DOI

Abstract

This article discusses a formalization of aspects of Cyber-Sovereignty (CyS) for information and communication technology (ICT), linking them to technological trustworthiness and deriving an associated paradigm for hard- and software design. The upcoming 6G ICT standard is considered a keystone within modern society’s increasing interconnectedness and automatization, as it provides the necessary technological infrastructure for applications such as the Metaverse or large-scale digital twinning. Since emerging technological systems increasingly affect sensitive human goods, hard- and software manufacturers must consider a new dimension of societal and judicial constraints in the context of technological trustworthiness. This article aims to establish a formalized theory of specific aspects of CyS, providing a paradigm for hard- and software engineering in ICT. This paradigm is directly applicable in formal technology assessment and ensures that the relevant facets of CyS – specifically, the principle of Algorithmic Transparency (AgT) – are satisfied. The framework follows an axiomatic approach. Particularly, the formal basis of our theory consists of four fundamental assumptions about the general nature of physical problems and algorithmic implementations. This formal basis allows for drawing general conclusions on the relation between CyS and technological trustworthiness and entails a formal meta-thesis on AgT in digital computing.

MCML Authors

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[82]

Y. Mansour and R. Heckel.
Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training.
Preprint (Dec. 2024). arXiv

Abstract

We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining datasets for LLMs derived from CommonCrawl including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, and DCLM-Baseline. Despite those datasets being obtained with similar filtering and deduplication steps, neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that popular pretraining datasets have their own unique biases or fingerprints. Those biases remain even when the text is rewritten with LLMs. Moreover, these biases propagate through training: Random sequences generated by models trained on those datasets can be classified well by a classifier trained on the original datasets.

MCML Authors

Youssef Mansour

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

A2 | Mathematical Foundations
→ Group Reinhard Heckel

Machine Learning and Information Processing

[81]

C. Geldhauser and C. Kuehn.
Travelling waves for discrete stochastic bistable equations.
Partial Differential Equations and Applications 5.35 (Nov. 2024). DOI

Abstract

Many physical, chemical and biological systems have an inherent discrete spatial structure that strongly influences their dynamical behaviour. Similar remarks apply to internal or external noise. In this paper we study the combined effect of spatial discretization and stochastic perturbations on travelling waves in the Nagumo equation, which is a prototypical model for bistable reaction-diffusion partial differential equations (PDEs). We prove that under suitable parameter conditions, various discrete-stochastic variants of the Nagumo equation have solutions, which stay close on long time scales to the classical monotone Nagumo front with high probability if the noise covariance and spatial discretization are sufficiently small.

MCML Authors

Carina Geldhauser

Dr.

* Former Member

[80]

K. Jin, J. Latz, C. Liu and A. Scagliotti.
Losing momentum in continuous-time stochastic optimisation.
Preprint (Nov. 2024). arXiv

Abstract

The training of modern machine learning models often consists in solving high-dimensional non-convex optimisation problems that are subject to large-scale data. In this context, momentum-based stochastic optimisation algorithms have become particularly widespread. The stochasticity arises from data subsampling which reduces computational cost. Both momentum and stochasticity help the algorithm to converge globally. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the optimiser by an underdamped dynamical system and the data subsampling through a stochastic switching. We investigate longtime limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In experiments, we study our scheme in convex and non-convex test problems. Additionally, we train a convolutional neural network in an image classification problem. Our algorithm attains competitive results compared to stochastic gradient descent with momentum.

MCML Authors

Alessandro Scagliotti

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

[79]

A. H. Berger, L. Lux, N. Stucki, V. Bürgin, S. Shit, A. Banaszaka, D. Rückert, U. Bauer and J. C. Paetzold.
Topologically faithful multi-class segmentation in medical images.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI

Abstract

Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology, making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.

MCML Authors

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Nico Stucki

Applied Topology and Geometry

Vincent Bürgin

Foundations of Deep Neural Networks

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Ulrich Bauer

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Applied Topology and Geometry

[78]

P. Scholl, M. Iskandar, S. Wolf, J. Lee, A. Bacho, A. Dietrich, A. Albu-Schäffer and G. Kutyniok.
Learning-based adaption of robotic friction models.
Robotics and Computer-Integrated Manufacturing 89 (Oct. 2024). DOI

Abstract

In the Fourth Industrial Revolution, wherein artificial intelligence and the automation of machines occupy a central role, the deployment of robots is indispensable. However, the manufacturing process using robots, especially in collaboration with humans, is highly intricate. In particular, modeling the friction torque in robotic joints is a longstanding problem due to the lack of a good mathematical description. This motivates the usage of data-driven methods in recent works. However, model-based and data-driven models often exhibit limitations in their ability to generalize beyond the specific dynamics they were trained on, as we demonstrate in this paper. To address this challenge, we introduce a novel approach based on residual learning, which aims to adapt an existing friction model to new dynamics using as little data as possible. We validate our approach by training a base neural network on a symmetric friction data set to learn an accurate relation between the velocity and the friction torque. Subsequently, to adapt to more complex asymmetric settings, we train a second network on a small dataset, focusing on predicting the residual of the initial network’s output. By combining the output of both networks in a suitable manner, our proposed estimator outperforms the conventional model-based approach, an extended LuGre model, and the base neural network significantly. Furthermore, we evaluate our method on trajectories involving external loads and still observe a substantial improvement, approximately 60%–70%, over the conventional approach. Our method does not rely on data with external load during training, eliminating the need for external torque sensors. This demonstrates the generalization capability of our approach, even with a small amount of data – less than a minute – enabling adaptation to diverse scenarios based on prior knowledge about friction in different settings.

MCML Authors

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[77]

O. Åström, C. Geldhauser, M. Grillitsch, O. Hall and A. Sopasakis.
Enhancing Carbon Emission Reduction Strategies using OCO and ICOS data.
Preprint (Oct. 2024). arXiv

Abstract

We propose a methodology to enhance local CO2 monitoring by integrating satellite data from the Orbiting Carbon Observatories (OCO-2 and OCO-3) with ground level observations from the Integrated Carbon Observation System (ICOS) and weather data from the ECMWF Reanalysis v5 (ERA5). Unlike traditional methods that downsample national data, our approach uses multimodal data fusion for high-resolution CO2 estimations. We employ weighted K-nearest neighbor (KNN) interpolation with machine learning models to predict ground level CO2 from satellite measurements, achieving a Root Mean Squared Error of 3.92 ppm. Our results show the effectiveness of integrating diverse data sources in capturing local emission patterns, highlighting the value of high-resolution atmospheric transport models. The developed model improves the granularity of CO2 monitoring, providing precise insights for targeted carbon mitigation strategies, and represents a novel application of neural networks and KNN in environmental monitoring, adaptable to various regions and temporal scales.
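The weighted K-nearest-neighbour interpolation step can be sketched as follows. The abstract does not specify the weighting, so inverse-distance weights are assumed here; the function name `weighted_knn`, the coordinates, and the CO2 values are hypothetical and serve only to show the mechanics.

```python
import math

def weighted_knn(points, values, query, k=3, eps=1e-12):
    """Interpolate a value at `query` from its k nearest sample points.

    Each neighbour is weighted by the inverse of its distance, so
    closer measurements dominate; eps avoids division by zero when
    the query coincides with a sample point.
    """
    nearest = sorted((math.dist(p, query), v)
                     for p, v in zip(points, values))[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical measurement locations (x, y) and CO2 readings in ppm.
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
co2_ppm = [410.0, 412.0, 414.0, 500.0]
print(weighted_knn(locations, co2_ppm, (0.5, 0.0), k=2))
```

With `k=2` the query point halfway between the first two stations receives the average of their readings, while the distant fourth station is ignored.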

MCML Authors

Carina Geldhauser

Dr.

* Former Member

[76]

M. Fornasier, P. Heid and G. Sodini.
Approximation Theory, Computing, and Deep Learning on the Wasserstein Space.
Preprint (Oct. 2024). arXiv

Abstract

The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. We delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional’s Euler-Lagrange equation. We furnish explicit and quantitative bounds on generalization errors for each of these solutions. We leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Our constructive solutions significantly enhance at equal accuracy the evaluation speed, surpassing that of state-of-the-art methods by several orders of magnitude. This allows evaluations over large datasets several times faster, including training, than traditional optimal transport algorithms. Our analytically designed deep learning architecture slightly outperforms the test error of state-of-the-art CNN architectures on datasets of images.

MCML Authors

Massimo Fornasier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Massimo Fornasier

Applied Numerical Analysis

Pascal Heid

Dr.

* Former Member

[75]

P. Scholl, A. Bacho, H. Boche and G. Kutyniok.
Symbolic Recovery of Differential Equations: The Identifiability Problem.
Preprint (Oct. 2024). arXiv

Abstract

Symbolic recovery of differential equations is the ambitious attempt at automating the derivation of governing equations with the use of machine learning techniques. In contrast to classical methods which assume the structure of the equation to be known and focus on the estimation of specific parameters, these algorithms aim to learn the structure and the parameters simultaneously. While the uniqueness and, therefore, the identifiability of parameters of governing equations are a well-addressed problem in the field of parameter estimation, it has not been investigated for symbolic recovery. However, this problem should be even more present in this field since the algorithms aim to cover larger spaces of governing equations. In this paper, we investigate under which conditions a solution of a differential equation does not uniquely determine the equation itself. For various classes of differential equations, we provide both necessary and sufficient conditions for a function to uniquely determine the corresponding differential equation. We then use our results to devise numerical algorithms aiming to determine whether a function solves a differential equation uniquely. Finally, we provide extensive numerical experiments showing that our algorithms can indeed guarantee the uniqueness of the learned governing differential equation, without assuming any knowledge about the analytic form of function, thereby ensuring the reliability of the learned equation.

MCML Authors

Philipp Scholl

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Mathematical Foundations of Artificial Intelligence

[74]

F. Hoppe, C. M. Verdun, H. Laus, S. Endt, M. I. Menzel, F. Krahmer and H. Rauhut.
Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI GitHub

Abstract

Establishing certified uncertainty quantification (UQ) in image processing applications continues to pose a significant challenge. In particular, such a goal is crucial for accurate and reliable medical imaging if one aims for precise diagnostics and appropriate intervention. In the case of magnetic resonance imaging, one of the essential tools of modern medicine, enormous advancements in fast image acquisition were possible after the introduction of compressive sensing and, more recently, deep learning methods. Still, as of now, there is no UQ method that is both fully rigorous and scalable. This work takes a step towards closing this gap by proposing a total variation minimization-based method for pixel-wise sharp confidence intervals for undersampled MRI. We demonstrate that our method empirically achieves the predicted confidence levels. We expect that our approach will also have implications for other imaging modalities as well as deep learning applications in computer vision.

MCML Authors

Claudio Mayrink Verdun

Dr.

* Former Member

Hannah Laus

Optimization & Data Analysis

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[73]

Y. Mansour, X. Zhong, S. Caglar and R. Heckel.
TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI GitHub

Abstract

Neural networks trained end-to-end give state-of-the-art performance for image denoising. However, when applied to an image outside of the training distribution, the performance often degrades significantly. In this work, we propose a test-time training (TTT) method based on masked image modeling (MIM) to improve denoising performance for out-of-distribution images. The method, termed TTT-MIM, consists of a training stage and a test time adaptation stage. At training, we minimize a standard supervised loss and a self-supervised loss aimed at reconstructing masked image patches. At test-time, we minimize a self-supervised loss to fine-tune the network to adapt to a single noisy image. Experiments show that our method can improve performance under natural distribution shifts, in particular it adapts well to real-world camera and microscope noise. A competitor to our method of training and finetuning is to use a zero-shot denoiser that does not rely on training data. However, compared to state-of-the-art zero-shot denoisers, our method shows superior performance, and is much faster, suggesting that training and finetuning on the test instance is a more efficient approach to image denoising than zero-shot methods in setups where little to no data is available.

MCML Authors

Youssef Mansour

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[72]

K. Riedl.
Mathematical Foundations of Interacting Multi-Particle Systems for Optimization.
Dissertation 2024. URL

Abstract

This dissertation lays mathematical foundations for the numerical analysis of interacting multi-particle systems in the setting of optimization. While such systems are of paramount importance in and beyond applied mathematics, their rigorous analysis largely remained elusive. Given the necessity for capable, reliable, and robust algorithms with informative and solid convergence guarantees, we provide an analytical framework that builds upon insights obtained by taking a mean-field perspective.

MCML Authors

Konstantin Riedl

Dr.

* Former Member

[71]

F. Hoppe, C. M. Verdun, F. Krahmer, M. I. Menzel and H. Rauhut.
With or Without Replacement? Improving Confidence in Fourier Imaging.
CoSeRa 2024 - International Workshop on the Theory of Computational Sensing and its Applications to Radar, Multimodal Sensing and Imaging. Santiago de Compostela, Spain, Sep 18-20, 2024. DOI

Abstract

Over the last few years, debiased estimators have been proposed in order to establish rigorous confidence intervals for high-dimensional problems in machine learning and data science. The core argument is that the error of these estimators with respect to the ground truth can be expressed as a Gaussian variable plus a remainder term that vanishes as long as the dimension of the problem is sufficiently high. Thus, uncertainty quantification (UQ) can be performed exploiting the Gaussian model. Empirically, however, the remainder term cannot be neglected in many realistic situations of moderately-sized dimensions, in particular in certain structured measurement scenarios such as Magnetic Resonance Imaging (MRI). This, in turn, can downgrade the advantage of the UQ methods as compared to non-UQ approaches such as the standard LASSO. In this paper, we present a method to improve the debiased estimator by sampling without replacement. Our approach leverages recent results of ours on the structure of the random nature of certain sampling schemes showing how a transition between sampling with and without replacement can lead to a weighted reconstruction scheme with improved performance for the standard LASSO. In this paper, we illustrate how this reweighted sampling idea can also improve the debiased estimator and, consequently, provide a better method for UQ in Fourier imaging.
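The Gaussian-plus-remainder decomposition mentioned in the abstract can be written out in the standard debiased-LASSO setting. The following is a sketch under assumed notation that does not appear in the abstract: measurements $y = A x^0 + \varepsilon$ with $A \in \mathbb{C}^{m \times N}$, a LASSO solution $\hat{x}$, and a correction matrix $M$ (often the identity for isotropic measurements). It does not include the reweighting proposed in the paper.

```latex
\begin{align*}
\hat{x}^{u} &= \hat{x} + \tfrac{1}{m}\, M A^{*}\bigl(y - A\hat{x}\bigr), \\
\sqrt{m}\,\bigl(\hat{x}^{u} - x^{0}\bigr)
  &= \underbrace{\tfrac{1}{\sqrt{m}}\, M A^{*}\varepsilon}_{W \text{ (Gaussian term)}}
   + \underbrace{\sqrt{m}\,\bigl(\tfrac{1}{m} M A^{*}A - I\bigr)\bigl(x^{0} - \hat{x}\bigr)}_{R \text{ (remainder)}} .
\end{align*}
```

The second line follows by substituting $y = A x^0 + \varepsilon$ into the first. Confidence intervals built from $W$ alone are only accurate when $R$ is negligible, which is the issue the abstract raises for moderately sized dimensions.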

MCML Authors

Claudio Mayrink Verdun

Dr.

* Former Member

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[70]

F. P. Patricio, P. Catala and F. Krahmer.
Noisy Recovery in Unlimited Sampling via Adaptive Modulo Representations.
CoSeRa 2024 - International Workshop on the Theory of Computational Sensing and its Applications to Radar, Multimodal Sensing and Imaging. Santiago de Compostela, Spain, Sep 18-20, 2024. DOI

Abstract

Recent works put forth the Unlimited Sensing Framework (USF), a novel approach to analog-to-digital conversion for high dynamic range sensing. It addresses the saturation phenomenon that commonly arises when physical measurements exceed the dynamic range of a sensor, yielding permanent loss of the input data. However, the USF still has some limitations when dealing with random noise. In the present paper, we propose a novel iterative method to tackle unlimited sensing in a noisy setting. In one step, our approach applies local transformations of the range to remove strong artifacts caused by the noise on local subdivisions of the domain. In the following step, the signal is then approximated via a least squares method. These two types of steps are then alternated. We illustrate the performance of our algorithm in the high-noise regime.
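For orientation, the modulo folding behind unlimited sensing and the naive noiseless recovery from first differences can be sketched as follows. This is a textbook-style baseline, not the iterative method proposed in the paper. The function names `fold` and `unfold_by_differences` are hypothetical, and the sketch assumes a densely sampled signal whose consecutive samples differ by less than the threshold `lam` and whose first sample lies inside the sensor range.

```python
import numpy as np

def fold(t, lam):
    """Centered modulo: map values into the range [-lam, lam)."""
    return np.mod(t + lam, 2.0 * lam) - lam

def unfold_by_differences(folded, lam):
    """Naive noiseless recovery (assumption: true increments are smaller than lam).

    Folding the differences of the folded samples returns the true increments,
    which are then summed up starting from the first (unfolded) sample.
    """
    increments = fold(np.diff(folded), lam)
    return folded[0] + np.concatenate(([0.0], np.cumsum(increments)))
```

With noise, the folded differences can jump by multiples of `2 * lam`, and the cumulative sum propagates every such error. This is one way to see why the noisy setting needs a more careful method.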

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[69]

P. Römer and F. Krahmer.
A one-bit quantization approach for low-dose Poisson phase retrieval.
CoSeRa 2024 - International Workshop on the Theory of Computational Sensing and its Applications to Radar, Multimodal Sensing and Imaging. Santiago de Compostela, Spain, Sep 18-20, 2024. DOI

Abstract

Imaging quality for biological tissue is commonly affected by damage to the specimen caused by illumination particles. To mitigate this issue, often very low doses of illumination have to be used in the experiment. Consequently, the resulting inverse problem is subject to highly noisy data. In this note, we address this issue for the case of diffraction imaging by studying the problem of phase retrieval with low-count Poisson data. Our key idea is to exploit the close connection between the Poisson measurement model and the one-bit quantization problem. We propose a reconstruction method based on algorithmic approaches to that problem and compare the performance of this method with state-of-the-art algorithms for noisy phase retrieval, observing superior performance in a number of relevant examples.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[68]

C. Geldhauser and K. Malyshev.
Semi-automatic annotation of Greek majuscule manuscripts: Steps towards integrated transcription and annotation.
FedCSIS 2024 - 19th Conference on Computer Science and Intelligence Systems. Belgrade, Serbia, Sep 08-11, 2024. DOI

Abstract

We present a prototype for the integration of HTR transcription and semi-automated markup of textual features in the eScriptorium GUI.

MCML Authors

Carina Geldhauser

Dr.

* Former Member

[67]

Ç. Yapar, R. Levie, G. Kutyniok and G. Caire.
Dataset of Pathloss and ToA Radio Maps With Localization Application.
Preprint (Sep. 2024). arXiv

Abstract

In this article, we present a collection of radio map datasets in dense urban settings, which we generated and made publicly available. The datasets include simulated pathloss/received signal strength (RSS) and time of arrival (ToA) radio maps over a large collection of realistic dense urban settings in real city maps. The two main applications of the presented dataset are 1) learning methods that predict the pathloss from input city maps (namely, deep learning-based simulations), and 2) wireless localization. The fact that the RSS and ToA maps are computed by the same simulations over the same city maps allows for a fair comparison of the RSS and ToA-based localization methods.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[66]

C. Geldhauser, M. Herrmann and D. Janßen.
Traveling Phase Interfaces in Viscous Forward–Backward Diffusion Equations.
Journal of Dynamics and Differential Equations (Aug. 2024). DOI

Abstract

The viscous regularization of an ill-posed diffusion equation with bistable nonlinearity predicts a hysteretic behavior of dynamical phase transitions but a complete mathematical understanding of the intricate multiscale evolution is still missing. We shed light on the fine structure of propagating phase boundaries by carefully examining traveling wave solutions in a special case. Assuming a trilinear constitutive relation we characterize all waves that possess a monotone profile and connect the two phases by a single interface of positive width. We further study the two sharp-interface regimes related to either vanishing viscosity or the bilinear limit.

MCML Authors

Carina Geldhauser

Dr.

* Former Member

[65]

H. Boche, A. Fono and G. Kutyniok.
A Mathematical Framework for Computability Aspects of Algorithmic Transparency.
ISIT 2024 - IEEE International Symposium on Information Theory. Athens, Greece, Jul 07-12, 2024. DOI

Abstract

The lack of trustworthiness is a major downside of deep learning. To mitigate the associated risks, clear obligations of deep learning models have been proposed via regulatory guidelines. Therefore, a crucial question is to what extent trustworthy deep learning can be realized. Establishing trustworthiness requires that the factors influencing an algorithmic computation can be retraced, i.e., the algorithmic implementation is transparent. Motivated by the observation that the current evolution of deep learning models necessitates a change in computing technology, we derive a mathematical framework that enables us to analyze whether a transparent implementation in a given computing model is feasible. We exemplarily apply our trustworthiness framework to analyze deep learning approaches for inverse problems in digital and analog computing models represented by Turing and Blum-Shub-Smale Machines, respectively. Based on previous results, we find that Blum-Shub-Smale Machines have the potential to establish trustworthy solvers for inverse problems under fairly general conditions, whereas Turing machines cannot guarantee trustworthiness to the same degree. For a longer version of this paper with more details and proofs, we refer to [1].

MCML Authors

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[64]

G. M. Nguegnang, H. Rauhut and U. Terstiege.
Convergence of gradient descent for learning linear neural networks.
Advances in Continuous and Discrete Models 2024.23 (Jul. 2024). DOI

Abstract

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on the stepsizes, gradient descent converges to a critical point of the loss function, i.e., the square loss in this article. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
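The setting of the abstract, gradient descent on the square loss of a product of weight matrices, can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `train_linear_network`, the constant stepsize, the small Gaussian initialization and the normalization of the loss by the number of samples are assumptions, not details taken from the paper.

```python
import numpy as np

def train_linear_network(X, Y, widths, step=0.05, iters=3000, seed=0):
    """Gradient descent on L(W_1..W_N) = 1/(2n) * ||W_N ... W_1 X - Y||_F^2."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    dims = [X.shape[0], *widths, Y.shape[0]]
    # Small random initialization (an assumption of this sketch).
    Ws = [0.3 * rng.standard_normal((dims[i + 1], dims[i]))
          for i in range(len(dims) - 1)]
    losses = []
    for _ in range(iters):
        # Forward pass, keeping the input of every layer.
        acts = [X]
        for W in Ws:
            acts.append(W @ acts[-1])
        residual = (acts[-1] - Y) / n
        losses.append(0.5 * n * np.sum(residual ** 2))
        # Backward pass: gradient w.r.t. each factor, last layer first.
        grad_out = residual
        grads = []
        for W, a in zip(reversed(Ws), reversed(acts[:-1])):
            grads.append(grad_out @ a.T)
            grad_out = W.T @ grad_out
        # Simultaneous update of all factors with a constant stepsize.
        for W, g in zip(Ws, reversed(grads)):
            W -= step * g
    return Ws, losses
```

Passing `widths=[3]` gives the two-layer case, and a longer list gives deeper factorizations. Whether a given stepsize is small enough depends on the data and the initialization, which is the kind of condition the paper makes precise.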

MCML Authors

Gabin Maxime Nguegnang

Mathematical Data Science and Artificial Intelligence

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

Ulrich Terstiege

Dr.

Mathematical Data Science and Artificial Intelligence

[63]

M. Fornasier, T. Klock and K. Riedl.
Consensus-Based Optimization Methods Converge Globally.
SIAM Journal on Optimization 34.3 (Jul. 2024). DOI

Abstract

In this paper we study consensus-based optimization (CBO), which is a multiagent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. Based on an experimentally supported intuition that, on average, CBO performs a gradient descent of the squared Euclidean distance to the global minimizer, we devise a novel technique for proving the convergence to the global minimizer in mean-field law for a rich class of objective functions. The result unveils internal mechanisms of CBO that are responsible for the success of the method. In particular, we prove that CBO performs a convexification of a large class of optimization problems as the number of optimizing agents goes to infinity. Furthermore, we improve prior analyses by requiring mild assumptions about the initialization of the method and by covering objectives that are merely locally Lipschitz continuous. As a core component of this analysis, we establish a quantitative nonasymptotic Laplace principle, which may be of independent interest. From the result of CBO convergence in mean-field law, it becomes apparent that the hardness of any global optimization problem is necessarily encoded in the rate of the mean-field approximation, for which we provide a novel probabilistic quantitative estimate. The combination of these results allows us to obtain probabilistic global convergence guarantees of the numerical CBO method.
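To make the method concrete, here is a minimal sketch of one common discretization of CBO: an Euler-Maruyama step with componentwise (anisotropic) noise. The parameter names and values (`lam`, `sigma`, `alpha`, `dt`), the uniform initialization on `[-3, 3]^dim` and the test objective are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def radial_sqrt_objective(x, minimizer=1.0):
    """Nonconvex, nonsmooth toy objective: sqrt of the distance to the minimizer."""
    return np.sqrt(np.linalg.norm(x - minimizer, axis=1))

def cbo_minimize(f, dim, n_particles=500, steps=1000, dt=0.01,
                 lam=1.0, sigma=0.7, alpha=30.0, seed=0):
    """Minimal consensus-based optimization sketch (derivative-free)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=(n_particles, dim))
    for _ in range(steps):
        fx = f(x)
        # Gibbs weights; subtracting the minimum avoids numerical underflow.
        weights = np.exp(-alpha * (fx - fx.min()))
        consensus = weights @ x / weights.sum()
        diff = x - consensus
        noise = rng.standard_normal(x.shape)
        # Drift towards the consensus point plus noise scaled by the distance to it.
        x = x - lam * dt * diff + sigma * np.sqrt(dt) * diff * noise
    return consensus
```

Calling `cbo_minimize(radial_sqrt_objective, dim=2)` returns the last consensus point. The objective is only evaluated, never differentiated, and for large `alpha` the consensus point approaches the best particle, which is the Laplace-principle mechanism the abstract refers to.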

MCML Authors

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Konstantin Riedl

Dr.

* Former Member

[62]

J. Beddrich, E. Chenchene, M. Fornasier, H. Huang and B. Wohlmuth.
Constrained Consensus-Based Optimization and Numerical Heuristics for the Few Particle Regime.
Preprint (Jul. 2024). arXiv

Abstract

Consensus-based optimization (CBO) is a versatile multi-particle optimization method for performing nonconvex and nonsmooth global optimizations in high dimensions. Proofs of global convergence in probability have been achieved for a broad class of objective functions in unconstrained optimizations. In this work we adapt the algorithm for solving constrained optimizations on compact and unbounded domains with boundary by leveraging emerging reflective boundary conditions. In particular, we close a relevant gap in the literature by providing a global convergence proof for the many-particle regime, including convergence rates. On the one hand, for the sake of minimizing running cost, it is desirable to keep the number of particles small. On the other hand, reducing the number of particles implies a diminished capability of exploration of the algorithm. Hence numerical heuristics are needed to ensure convergence of CBO in the few-particle regime. In this work, we also significantly improve the convergence and complexity of CBO by utilizing an adaptive region control mechanism and by choosing geometry-specific random noise. In particular, by combining a hierarchical noise structure with a multigrid finite element method, we are able to compute global minimizers for a constrained p-Allen-Cahn problem with obstacles, a very challenging variational problem.

MCML Authors

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

[61]

C. M. Verdun, O. Melnyk, F. Krahmer and P. Jung.
Fast, blind, and accurate: Tuning-free sparse regression with global linear convergence.
COLT 2024 - 37th Annual Conference on Learning Theory. Edmonton, Canada, Jun 30-Jul 03, 2024. URL

Abstract

Many algorithms for high-dimensional regression problems require the calibration of regularization hyperparameters. This, in turn, often requires the knowledge of the unknown noise variance in order to produce meaningful solutions. Recent works show, however, that there exist certain estimators that are pivotal, i.e., the regularization parameter does not depend on the noise level; the most remarkable example being the square-root lasso. Such estimators have also been shown to exhibit strong connections to distributionally robust optimization. Despite the progress in the design of pivotal estimators, the resulting minimization problem is challenging as both the loss function and the regularization term are non-smooth. To date, the design of fast, robust, and scalable algorithms with strong convergence rate guarantees is still an open problem. This work addresses this problem by showing that an iteratively reweighted least squares (IRLS) algorithm exhibits global linear convergence under the weakest assumption available in the literature. We expect our findings will also have implications for multi-task learning and distributionally robust optimization.

MCML Authors

Claudio Mayrink Verdun

Dr.

* Former Member

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[60]

C. Cipriani, A. Scagliotti and T. Wöhrer.
A Minimax Optimal Control Approach for Robust Neural ODEs.
ECC 2024 - European Control Conference. Stockholm, Sweden, Jun 25-28, 2024. DOI

Abstract

In this paper, we address the adversarial training of neural ODEs from a robust control perspective. This is an alternative to the classical training via empirical risk minimization, and it is widely used to enforce reliable outcomes for input perturbations. Neural ODEs allow the interpretation of deep neural networks as discretizations of control systems, unlocking powerful tools from control theory for the development and the understanding of machine learning. In this specific case, we formulate the adversarial training with perturbed data as a minimax optimal control problem, for which we derive first order optimality conditions in the form of Pontryagin’s Maximum Principle. We provide a novel interpretation of robust training leading to an alternative weighted technique, which we test on a low-dimensional classification task.

MCML Authors

Cristina Cipriani

Dr.

* Former Member

Alessandro Scagliotti

Applied Numerical Analysis

[59]

A. Scagliotti and P. Colli Franzone.
A subgradient method with constant step-size for l1-composite optimization.
Bollettino dell’Unione Matematica Italiana 17 (Jun. 2024). DOI

Abstract

Subgradient methods are the natural extension to the non-smooth case of the classical gradient descent for regular convex optimization problems. However, in general, they are characterized by slow convergence rates, and they require decreasing step-sizes to converge. In this paper we propose a subgradient method with constant step-size for composite convex objectives with ℓ1-regularization. If the smooth term is strongly convex, we can establish a linear convergence result for the function values. This fact relies on an accurate choice of the element of the subdifferential used for the update, and on proper actions adopted when non-differentiability regions are crossed. Then, we propose an accelerated version of the algorithm, based on conservative inertial dynamics and on an adaptive restart strategy, that is guaranteed to achieve a linear convergence rate in the strongly convex case. Finally, we test the performance of our algorithms on some strongly and non-strongly convex examples.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[58]

Ç. Yapar, F. Jaensch, R. Levie, G. Kutyniok and G. Caire.
Overview of the First Pathloss Radio Map Prediction Challenge.
IEEE Open Journal of Signal Processing 5 (Jun. 2024). DOI

Abstract

Pathloss quantifies the reduction in power density of a signal radiated from a transmitter. The attenuation is due to large-scale effects such as free-space propagation loss and interactions (e.g., penetration, reflection, and diffraction) of the signal with objects such as buildings, vehicles, trees, and pedestrians in the propagation environment. Many current or planned wireless communications applications require the knowledge (or a reliable approximation) of the pathloss on a dense grid (radio map) of the environment of interest. Deterministic simulation methods such as ray tracing are known to provide very good estimates of pathloss values. However, their high computational complexity makes them unsuitable for most of the applications envisaged. To promote research and facilitate a fair comparison among the recently proposed fast and accurate deep learning-based pathloss radio map prediction methods, we have organized the ICASSP 2023 First Pathloss Radio Map Prediction Challenge. In this overview paper, we describe the pathloss radio map prediction problem, provide a literature survey of the current state of the art, describe the challenge datasets, the challenge task, and the challenge evaluation methodology. Finally, we provide a brief overview of the submitted methods and present the results of the challenge.
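As a point of reference for the free-space propagation loss mentioned in the abstract, the classical Friis free-space pathloss between isotropic antennas can be computed directly. This sketch covers only that one large-scale effect. It does not model buildings or other obstacles, and it is not one of the challenge methods. The function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def free_space_pathloss_db(distance_m, frequency_hz):
    """Free-space pathloss in dB: 20 * log10(4 * pi * d * f / c)."""
    if distance_m <= 0 or frequency_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT
    )
```

At 2.4 GHz and 1 km this gives roughly 100 dB, and every doubling of the distance adds about 6 dB. Interactions with the environment come on top of this baseline, which is why ray tracing or learned predictors are needed for realistic radio maps.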

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[57]

M. Brunner, M. Innerberger, A. Miraçi, D. Praetorius, J. Streitberger and P. Heid.
Adaptive FEM with quasi-optimal overall cost for nonsymmetric linear elliptic PDEs.
IMA Journal of Numerical Analysis 44.3 (May 2024). DOI

Abstract

We consider a general nonsymmetric second-order linear elliptic partial differential equation in the framework of the Lax–Milgram lemma. We formulate and analyze an adaptive finite element algorithm with arbitrary polynomial degree that steers the adaptive mesh refinement and the inexact iterative solution of the arising linear systems. More precisely, the iterative solver employs, as an outer loop, the so-called Zarantonello iteration to symmetrize the system and, as an inner loop, a uniformly contractive algebraic solver, for example, an optimally preconditioned conjugate gradient method or an optimal geometric multigrid algorithm. We prove that the proposed inexact adaptive iteratively symmetrized finite element method leads to full linear convergence and, for sufficiently small adaptivity parameters, to optimal convergence rates with respect to the overall computational cost, i.e., the total computational time. Numerical experiments underline the theory.

MCML Authors

Pascal Heid

Dr.

* Former Member

[56]

Y. Lee, H. Boche and G. Kutyniok.
Computability of Optimizers.
IEEE Transactions on Information Theory 70.4 (Apr. 2024). DOI

Abstract

Optimization problems are a staple of today’s scientific and technical landscape. However, at present, solvers of such problems are almost exclusively run on digital hardware. Using Turing machines as a mathematical model for any type of digital hardware, in this paper, we analyze fundamental limitations of this conceptual approach of solving optimization problems. Since in most applications, the optimizer itself is of significantly more interest than the optimal value of the corresponding function, we will focus on computability of the optimizer. In fact, we will show that in various situations the optimizer is unattainable on Turing machines and consequently on digital computers. Even worse, there does not exist a Turing machine that approximates the optimizer itself up to a certain constant error. We prove such results for a variety of well-known problems from very different areas, including artificial intelligence, financial mathematics, and information theory, often deriving the even stronger result that such problems are not Banach-Mazur computable, not even in an approximate sense.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[55]

Y. Mansour and R. Heckel.
GAMA-IR: Global Additive Multidimensional Averaging for Fast Image Restoration.
Preprint (Apr. 2024). arXiv

Abstract

Deep learning-based methods have shown remarkable success for various image restoration tasks such as denoising and deblurring. The current state-of-the-art networks are relatively deep and utilize (variants of) self attention mechanisms. Those networks are significantly slower than shallow convolutional networks, which however perform worse. In this paper, we introduce an image restoration network that is both fast and yields excellent image quality. The network is designed to minimize the latency and memory consumption when executed on a standard GPU, while maintaining state-of-the-art performance. The network is a simple shallow network with an efficient block that implements global additive multidimensional averaging operations. This block can capture global information and enable a large receptive field even when used in shallow networks with minimal computational overhead. Through extensive experiments and evaluations on diverse tasks, we demonstrate that our network achieves comparable or even superior results to existing state-of-the-art image restoration networks with less latency. For instance, we exceed the state-of-the-art result on real-world SIDD denoising by 0.11dB, while being 2 to 10 times faster.

MCML Authors

Youssef Mansour

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[54]

R. Bailo, A. Barbaro, S. N. Gomes, K. Riedl, T. Roith, C. Totzeck and U. Vaes.
CBX: Python and Julia packages for consensus-based interacting particle methods.
Preprint (Mar. 2024). arXiv

Abstract

We introduce CBXPy and ConsensusBasedX.jl, Python and Julia implementations of consensus-based interacting particle systems (CBX), which generalise consensus-based optimization methods (CBO) for global, derivative-free optimisation. The raison d’être of our libraries is twofold: on the one hand, to offer high-performance implementations of CBX methods that the community can use directly, while on the other, providing a general interface that can accommodate and be extended to further variations of the CBX family. Python and Julia were selected as the leading high-level languages in terms of usage and performance, as well as for their popularity among the scientific computing community. Both libraries have been developed with a common ethos, ensuring a similar API and core functionality, while leveraging the strengths of each language and writing idiomatic code.

MCML Authors

Konstantin Riedl

Dr.

* Former Member

[53]

B. Lorenz, A. Bacho and G. Kutyniok.
Error Estimation for Physics-informed Neural Networks Approximating Semilinear Wave Equations.
Preprint (Mar. 2024). arXiv

Abstract

This paper provides rigorous error bounds for physics-informed neural networks approximating the semilinear wave equation. We provide bounds for the generalization and training error in terms of the width of the network’s layers and the number of training points for a tanh neural network with two hidden layers. Our main result is a bound of the total error in the H^1([0,T];L^2(Ω))-norm in terms of the training error and the number of training points, which can be made arbitrarily small under some assumptions. We illustrate our theoretical bounds with numerical experiments.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[52]

M. Singh, A. Fono and G. Kutyniok.
Expressivity of Spiking Neural Networks.
Preprint (Mar. 2024). arXiv

Abstract

The synergy between spiking neural networks and neuromorphic hardware holds promise for the development of energy-efficient AI applications. Inspired by this potential, we revisit the foundational aspects to study the capabilities of spiking neural networks where information is encoded in the firing time of neurons. Under the Spike Response Model as a mathematical model of a spiking neuron with a linear response function, we compare the expressive power of artificial and spiking neural networks, where we initially show that they realize piecewise linear mappings. In contrast to ReLU networks, we prove that spiking neural networks can realize both continuous and discontinuous functions. Moreover, we provide complexity bounds on the size of spiking neural networks to emulate multi-layer (ReLU) neural networks. Restricting to the continuous setting, we also establish complexity bounds in the reverse direction for one-layer spiking neural networks.

MCML Authors

Manjot Singh

Mathematical Foundations of Artificial Intelligence

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[51]

C. Geldhauser and H. Diebel-Fischer.
Is diverse and inclusive AI trapped in the gap between reality and algorithmizability?
NLDL 2024 - Northern Lights Deep Learning Conference. Tromsø, Norway, Jan 09-11, 2024. URL

Abstract

We investigate the preconditions of an operationalization of ethics using the example of the algorithmization, i.e. the mathematical implementation, of the concepts of fairness and diversity in AI. From a non-technical point of view in ethics, this implementation entails two major drawbacks, (1) as it narrows down big concepts to a single model that is deemed manageable, and (2) as it hides unsolved problems of humanity in a system that could be mistaken as the ‘solution’ to these problems. We encourage extra caution when dealing with such issues and vote for human oversight.

MCML Authors

Carina Geldhauser

Dr.

* Former Member

[50]

T. Yang, J. Maly, S. Dirksen and G. Caire.
Plug-In Channel Estimation With Dithered Quantized Signals in Spatially Non-Stationary Massive MIMO Systems.
IEEE Transactions on Communications 72.1 (Jan. 2024). DOI

Abstract

As the array dimension of massive MIMO systems increases to unprecedented levels, two problems occur. First, the spatial stationarity assumption along the antenna elements is no longer valid. Second, the large array size results in an unacceptably high power consumption if high-resolution analog-to-digital converters are used. To address these two challenges, we consider a Bussgang linear minimum mean square error (BLMMSE)-based channel estimator for large scale massive MIMO systems with one-bit quantizers and a spatially non-stationary channel. Whereas other works usually assume that the channel covariance is known at the base station, we consider a plug-in BLMMSE estimator that uses an estimate of the channel covariance and rigorously analyze the distortion produced by using an estimated, rather than the true, covariance. To cope with the spatial non-stationarity, we introduce dithering into the quantized signals and provide a theoretical error analysis. In addition, we propose an angular domain fitting procedure which is based on solving an instance of non-negative least squares. For the multi-user data transmission phase, we further propose a BLMMSE-based receiver to handle one-bit quantized data signals. Our numerical results show that the performance of the proposed BLMMSE channel estimator is very close to the oracle-aided scheme with ideal knowledge of the channel covariance matrix. The BLMMSE receiver outperforms the conventional maximum-ratio-combining and zero-forcing receivers in terms of the resulting ergodic sum rate.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[49]

S. Dirksen and J. Maly.
Tuning-free one-bit covariance estimation using data-driven dithering.
Preprint (Jan. 2024). arXiv

Abstract

We consider covariance estimation of any subgaussian distribution from finitely many i.i.d. samples that are quantized to one bit of information per entry. Recent work has shown that a reliable estimator can be constructed if uniformly distributed dithers on [−λ,λ] are used in the one-bit quantizer. This estimator enjoys near-minimax optimal, non-asymptotic error estimates in the operator and Frobenius norms if λ is chosen proportional to the largest variance of the distribution. However, this quantity is not known a priori, and in practice λ needs to be carefully tuned to achieve good performance. In this work we resolve this problem by introducing a tuning-free variant of this estimator, which replaces λ by a data-driven quantity. We prove that this estimator satisfies the same non-asymptotic error estimates, up to small (logarithmic) losses and a slightly worse probability estimate. We also show that by using refined data-driven dithers that vary per entry of each sample, one can construct an estimator satisfying the same estimation error bound as the sample covariance of the samples before quantization, again up to logarithmic losses. Our proofs rely on a new version of the Burkholder-Rosenthal inequalities for matrix martingales, which is expected to be of independent interest.
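The fixed-λ dithering idea mentioned in the abstract can be illustrated in scalar form. The sketch below is an illustration only, not the authors' tuning-free estimator: it assumes two bounded random variables with |X|, |Y| ≤ λ and uses the fact that, for a dither τ uniform on [−λ,λ], the expectation of λ·sign(x+τ) equals x. The function names and the toy data are hypothetical.

```python
import random


def one_bit(value, dither):
    """One-bit quantizer: sign of the dithered value (ties mapped to +1)."""
    return 1.0 if value + dither >= 0 else -1.0


def dithered_second_moment(pairs, lam, rng):
    """Estimate E[X*Y] from one-bit samples.

    Each coordinate gets its own independent dither, uniform on [-lam, lam].
    Assumes |x|, |y| <= lam for every pair; otherwise the estimate is biased.
    """
    total = 0.0
    for x, y in pairs:
        bit_x = one_bit(x, rng.uniform(-lam, lam))
        bit_y = one_bit(y, rng.uniform(-lam, lam))
        total += bit_x * bit_y
    return lam * lam * total / len(pairs)


# Toy data (hypothetical): X uniform on [-1, 1], Y = (X + Z) / 2 with Z uniform
# on [-1, 1], so that E[X*Y] = E[X^2] / 2 = 1/6 and both variables lie in [-1, 1].
rng = random.Random(0)
pairs = []
for _ in range(50_000):
    x = rng.uniform(-1.0, 1.0)
    z = rng.uniform(-1.0, 1.0)
    pairs.append((x, 0.5 * (x + z)))

estimate = dithered_second_moment(pairs, lam=1.0, rng=rng)
print(estimate)  # should be close to 1/6
```

Choosing λ too small breaks the unbiasedness and choosing it too large inflates the variance, which is the tuning problem the abstract refers to.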

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[48]

C. Kümmerle and J. Maly.
Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogeneous low-dimensional structures from linear observations. Focussing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[47]

S. Maskey, R. Paolino, A. Bacho and G. Kutyniok.
A Fractional Graph Laplacian Approach to Oversmoothing.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL GitHub

Abstract

Graph neural networks (GNNs) have shown state-of-the-art performances in various applications. However, GNNs often struggle to capture long-range dependencies in graphs due to oversmoothing. In this paper, we generalize the concept of oversmoothing from undirected to directed graphs. To this aim, we extend the notion of Dirichlet energy by considering a directed symmetrically normalized Laplacian. As vanilla graph convolutional networks are prone to oversmooth, we adopt a neural graph ODE framework. Specifically, we propose fractional graph Laplacian neural ODEs, which describe non-local dynamics. We prove that our approach allows propagating information between distant nodes while maintaining a low probability of long-distance jumps. Moreover, we show that our method is more flexible with respect to the convergence of the graph’s Dirichlet energy, thereby mitigating oversmoothing. We conduct extensive experiments on synthetic and real-world graphs, both directed and undirected, demonstrating our method’s versatility across diverse graph homophily levels.

MCML Authors

Sohir Maskey

Mathematical Foundations of Artificial Intelligence

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[46]

M. Seleznova, D. Weitzner, R. Giryes, G. Kutyniok and H.-H. Chou.
Neural (Tangent Kernel) Collapse.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

This work bridges two important concepts: the Neural Tangent Kernel (NTK), which captures the evolution of deep neural networks (DNNs) during training, and the Neural Collapse (NC) phenomenon, which refers to the emergence of symmetry and structure in the last-layer features of well-trained classification DNNs. We adopt the natural assumption that the empirical NTK develops a block structure aligned with the class labels, i.e., samples within the same class have stronger correlations than samples from different classes. Under this assumption, we derive the dynamics of DNNs trained with mean squared (MSE) loss and break them into interpretable phases. Moreover, we identify an invariant that captures the essence of the dynamics, and use it to prove the emergence of NC in DNNs with block-structured NTK. We provide large-scale numerical experiments on three common DNN architectures and three benchmark datasets to support our theory.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[45]

M. Singh, A. Fono and G. Kutyniok.
Are Spiking Neural Networks more expressive than Artificial Neural Networks?
UniReps @NeurIPS 2023 - 1st Workshop on Unifying Representations in Neural Models at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

This article studies the expressive power of spiking neural networks with firing-time-based information encoding, highlighting their potential for future energy-efficient AI applications when deployed on neuromorphic hardware. The computational power of a network of spiking neurons has already been studied via their capability of approximating any continuous function. By using the Spike Response Model as a mathematical model of a spiking neuron and assuming a linear response function, we delve deeper into this analysis and prove that spiking neural networks generate continuous piecewise linear mappings. We also show that they can emulate any multi-layer (ReLU) neural network with similar complexity. Furthermore, we prove that the maximum number of linear regions generated by a spiking neuron scales exponentially with respect to the input dimension, a characteristic that distinguishes it significantly from an artificial (ReLU) neuron. Our results further extend the understanding of the approximation properties of spiking neural networks and open up new avenues where spiking neural networks can be deployed instead of artificial neural networks without any performance loss.

MCML Authors

Manjot Singh

Mathematical Foundations of Artificial Intelligence

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[44]

H. Boche, A. Fono and G. Kutyniok.
Limitations of Deep Learning for Inverse Problems on Digital Hardware.
IEEE Transactions on Information Theory 69.12 (Dec. 2023). DOI

Abstract

Deep neural networks have seen tremendous success over the last years. Since the training is performed on digital hardware, in this paper, we analyze what actually can be computed on current hardware platforms modeled as Turing machines, which would lead to inherent restrictions of deep learning. For this, we focus on the class of inverse problems, which, in particular, encompasses any task to reconstruct data from measurements. We prove that finite-dimensional inverse problems are not Banach-Mazur computable for small relaxation parameters. Even more, our results introduce a lower bound on the accuracy that can be obtained algorithmically.

MCML Authors

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[43]

Ç. Yapar, R. Levie, G. Kutyniok and G. Caire.
Real-Time Outdoor Localization Using Radio Maps: A Deep Learning Approach.
IEEE Transactions on Wireless Communications 22.12 (Dec. 2023). DOI

Abstract

Global Navigation Satellite Systems typically perform poorly in urban environments, where the likelihood of line-of-sight conditions between devices and satellites is low. Therefore, alternative location methods are required to achieve good accuracy. We present LocUNet: A convolutional, end-to-end trained neural network (NN) for the localization task, which is able to estimate the position of a user from the received signal strength (RSS) of a small number of Base Stations (BS). Using estimations of pathloss radio maps of the BSs and the RSS measurements of the users to be localized, LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the estimations of radio maps. The proposed method does not require generating RSS fingerprints of each specific area where the localization task is performed and is suitable for real-time applications. Moreover, two novel datasets that allow for numerical evaluations of RSS and ToA methods in realistic urban environments are presented and made publicly available for the research community. By using these datasets, we also provide a fair comparison of state-of-the-art RSS and ToA-based methods in the dense urban scenario and show numerically that LocUNet outperforms all the compared methods.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[42]

S. Kolek, A. Chattopadhyay, K. H. R. Chan, H. Andrade-Loarca, G. Kutyniok and R. Vidal.
Learning Interpretable Queries for Explainable Image Classification with Information Pursuit.
Preprint (Dec. 2023). arXiv

Abstract

Information Pursuit (IP) is an explainable prediction algorithm that greedily selects a sequence of interpretable queries about the data in order of information gain, updating its posterior at each step based on observed query-answer pairs. The standard paradigm uses hand-crafted dictionaries of potential data queries curated by a domain expert or a large language model after a human prompt. However, in practice, hand-crafted dictionaries are limited by the expertise of the curator and the heuristics of prompt engineering. This paper introduces a novel approach: learning a dictionary of interpretable queries directly from the dataset. Our query dictionary learning problem is formulated as an optimization problem by augmenting IP’s variational formulation with learnable dictionary parameters. To formulate learnable and interpretable queries, we leverage the latent space of large vision and language models like CLIP. To solve the optimization problem, we propose a new query dictionary learning algorithm inspired by classical sparse dictionary learning. Our experiments demonstrate that learned dictionaries significantly outperform hand-crafted dictionaries generated with large language models.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[41]

J. Maly.
Robust sensing of low-rank matrices with non-orthogonal sparse decomposition.
Applied and Computational Harmonic Analysis 67 (Nov. 2023). 2024 ACHA Charles Chui Young Researcher Best Paper Award. DOI

Abstract

We consider the problem of recovering an unknown low-rank matrix X with (possibly) non-orthogonal, effectively sparse rank-1 decomposition from measurements y gathered in a linear measurement process A. We propose a variational formulation that lends itself to alternating minimization and whose global minimizers provably approximate X up to noise level. Working with a variant of robust injectivity, we derive reconstruction guarantees for various choices of A including sub-gaussian, Gaussian rank-1, and heavy-tailed measurements. Numerical experiments support the validity of our theoretical considerations.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[40]

H. Andrade-Loarca, J. Hege, D. Cremers and G. Kutyniok.
Neural Poisson Surface Reconstruction: Resolution-Agnostic Shape Reconstruction from Point Clouds.
Preprint (Nov. 2023). arXiv

Abstract

We introduce Neural Poisson Surface Reconstruction (nPSR), an architecture for shape reconstruction that addresses the challenge of recovering 3D shapes from points. Traditional deep neural networks face challenges with common 3D shape discretization techniques due to their computational complexity at higher resolutions. To overcome this, we leverage Fourier Neural Operators to solve the Poisson equation and reconstruct a mesh from oriented point cloud measurements. nPSR exhibits two main advantages: First, it enables efficient training on low-resolution data while achieving comparable performance at high-resolution evaluation, thanks to the resolution-agnostic nature of FNOs. This feature allows for one-shot super-resolution. Second, our method surpasses existing approaches in reconstruction quality while being differentiable and robust with respect to point sampling rates. Overall, the neural Poisson surface reconstruction not only improves upon the limitations of classical deep neural networks in shape reconstruction but also achieves superior results in terms of reconstruction quality, running time, and resolution agnosticism.

MCML Authors

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[39]

F. Filbir, M. Tasche and A. Veselovska.
Regularized Shannon sampling formulas related to the special affine Fourier transform.
Preprint (Nov. 2023). arXiv

Abstract

In this paper, we present new regularized Shannon sampling formulas related to the special affine Fourier transform (SAFT). These sampling formulas use localized sampling with special compactly supported window functions, namely B-spline, sinh-type, and continuous Kaiser-Bessel window functions. In contrast to the Shannon sampling series for SAFT, the regularized Shannon sampling formulas for SAFT possess an exponential decay of the approximation error and are numerically robust in the presence of noise, if a certain oversampling condition is fulfilled. Several numerical experiments illustrate the theoretical results.

MCML Authors

Anna Veselovska

Dr.

Applied Numerical Analysis

[38]

K. Riedl.
Leveraging Memory Effects and Gradient Information in Consensus-Based Optimisation: On Global Convergence in Mean-Field Law.
European Journal of Applied Mathematics (Oct. 2023). DOI

Abstract

In this paper, we study consensus-based optimisation (CBO), a versatile, flexible and customisable optimisation method suitable for performing nonconvex and nonsmooth global optimisations in high dimensions. CBO is a multi-particle metaheuristic, which is effective in various applications and at the same time amenable to theoretical analysis thanks to its minimalistic design. The underlying dynamics, however, is flexible enough to incorporate different mechanisms widely used in evolutionary computation and machine learning, as we show by analysing a variant of CBO which makes use of memory effects and gradient information. We rigorously prove that this dynamics converges to a global minimiser of the objective function in mean-field law for a vast class of functions under minimal assumptions on the initialisation of the method. The proof in particular reveals how to leverage further, in some applications advantageous, forces in the dynamics without losing provable global convergence. To demonstrate the benefit of the herein investigated memory effects and gradient information in certain applications, we present numerical evidence for the superiority of this CBO variant in applications such as machine learning and compressed sensing, which en passant widen the scope of applications of CBO.
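For orientation, the basic CBO mechanism (without the memory effects and gradient information analysed in the paper) can be sketched in one dimension. This is a minimal illustration under stated assumptions: the update rule is the commonly used plain CBO step, and the function names, parameter names and the toy objective are hypothetical rather than taken from the paper.

```python
import math
import random


def consensus_point(particles, objective, alpha):
    """Gibbs-weighted average of the particles, with weights exp(-alpha * f(x))."""
    values = [objective(x) for x in particles]
    best = min(values)  # shift by the best value to avoid underflow of all weights
    weights = [math.exp(-alpha * (v - best)) for v in values]
    return sum(w * x for w, x in zip(weights, particles)) / sum(weights)


def cbo_step(particles, objective, alpha, drift, noise, dt, rng):
    """One Euler-Maruyama step of plain 1D CBO.

    Every particle drifts towards the consensus point and is perturbed by
    Gaussian noise scaled by its distance to that point.
    """
    v = consensus_point(particles, objective, alpha)
    return [
        x - drift * (x - v) * dt
        + noise * abs(x - v) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        for x in particles
    ]


def shifted_square(x):
    """Toy objective (hypothetical) with its global minimiser at x = 1."""
    return (x - 1.0) ** 2


particles = [0.0, 1.0, 3.0]
print(consensus_point(particles, shifted_square, alpha=100.0))

# With the noise switched off, each particle moves a fraction drift * dt
# of the way towards the consensus point.
moved = cbo_step(particles, shifted_square, alpha=100.0,
                 drift=1.0, noise=0.0, dt=0.1, rng=random.Random(0))
print(moved)
```

For large alpha the consensus point concentrates on the particle with the smallest objective value, which is what drives the swarm towards a global minimiser.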

MCML Authors

Konstantin Riedl

Dr.

* Former Member

[37]

F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Uncertainty Quantification For Learned ISTA.
MLSP 2023 - IEEE Workshop on Machine Learning for Signal Processing. Rome, Italy, Sep 17-20, 2023. DOI

Abstract

Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap by proposing a rigorous way to obtain confidence intervals for the LISTA estimator.

MCML Authors

Claudio Mayrink Verdun

Dr.

* Former Member

Hannah Laus

Optimization & Data Analysis

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[36]

Ç. Yapar, F. Jaensch, R. Ron, G. Kutyniok and G. Caire.
Overview of the Urban Wireless Localization Competition.
MLSP 2023 - IEEE Workshop on Machine Learning for Signal Processing. Rome, Italy, Sep 17-20, 2023. DOI

Abstract

In dense urban environments, Global Navigation Satellite Systems do not provide good accuracy due to the low probability of line-of-sight (LOS) between the user equipment (UE) to be located and the satellites due to the presence of obstacles such as buildings. As a result, it is necessary to resort to other technologies that can operate reliably under non-line-of-sight (NLOS) conditions. To promote research in the reviving field of radio map-based wireless localization, we have launched the MLSP 2023 Urban Wireless Localization Competition. In this short overview paper, we describe the urban wireless localization problem, the provided datasets and baseline methods, the challenge task, and the challenge evaluation methodology. Finally, we present the results of the challenge.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[35]

A. Bacho, H. Boche and G. Kutyniok.
Complexity Blowup for Solutions of the Laplace and the Diffusion Equation.
Preprint (Sep. 2023). arXiv

Abstract

In this paper, we investigate the computational complexity of solutions to the Laplace and the diffusion equation. We show that for a certain class of initial-boundary value problems of the Laplace and the diffusion equation, the solution operator is #P₁/#P-complete in the sense that it maps polynomial-time computable functions to the set of #P₁/#P-complete functions. Consequently, there exists polynomial-time (Turing) computable input data such that the solution is not polynomial-time computable, unless FP=#P or FP₁=#P₁. In this case, we can, in general, not simulate the solution of the Laplace or the diffusion equation on a digital computer without having a complexity blowup, i.e., the computation time for obtaining an approximation of the solution with up to a finite number of significant digits grows non-polynomially in the number of digits. This indicates that the computational complexity of the solution operator that models a physical phenomenon is intrinsically high, independent of the numerical algorithm that is used to approximate a solution.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[34]

F. Hoppe, F. Krahmer, C. M. Verdun, M. I. Menzel and H. Rauhut.
Uncertainty quantification for sparse Fourier recovery.
Preprint (Sep. 2023). arXiv

Abstract

One of the most prominent methods for uncertainty quantification in high-dimensional statistics is the desparsified LASSO that relies on unconstrained ℓ1-minimization. The majority of initial works focused on real (sub-)Gaussian designs. However, in many applications, such as magnetic resonance imaging (MRI), the measurement process possesses a certain structure due to the nature of the problem. The measurement operator in MRI can be described by a subsampled Fourier matrix. The purpose of this work is to extend the uncertainty quantification process using the desparsified LASSO to design matrices originating from a bounded orthonormal system, which naturally generalizes the subsampled Fourier case and also allows for the treatment of the case where the sparsity basis is not the standard basis. In particular, we construct honest confidence intervals for every pixel of an MR image that is sparse in the standard basis provided the number of measurements satisfies n ≳ max{s log² s log p, s log² p} or that is sparse with respect to the Haar Wavelet basis provided a slightly larger number of measurements.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Claudio Mayrink Verdun

Dr.

* Former Member

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[33]

S. Endt, M. Engel, E. Naldi, R. Assereto, M. Molendowska, L. Mueller, C. M. Verdun, C. M. Pirkl, M. Palombo, D. K. Jones and M. I. Menzel.
In vivo myelin water quantification using diffusion–relaxation correlation MRI: A comparison of 1D and 2D methods.
Applied Magnetic Resonance 54 (Aug. 2023). DOI

Abstract

Multidimensional Magnetic Resonance Imaging (MRI) is a versatile tool for microstructure mapping. We use a diffusion weighted inversion recovery spin echo (DW-IR-SE) sequence with spiral readouts at ultra-strong gradients to acquire a rich diffusion–relaxation data set with sensitivity to myelin water. We reconstruct 1D and 2D spectra with a two-step convex optimization approach and investigate a variety of multidimensional MRI methods, including 1D multi-component relaxometry, 1D multi-component diffusometry, 2D relaxation correlation imaging, and 2D diffusion-relaxation correlation spectroscopic imaging (DR-CSI), in terms of their potential to quantify tissue microstructure, including the myelin water fraction (MWF). We observe a distinct spectral peak that we attribute to myelin water in multi-component T1 relaxometry, T1-T2 correlation, T1-D correlation, and T2-D correlation imaging. Due to lower achievable echo times compared to diffusometry, MWF maps from relaxometry have higher quality. Whilst 1D multi-component T1 data allows much faster myelin mapping, 2D approaches could offer unique insights into tissue microstructure and especially myelin diffusion.

MCML Authors

Claudio Mayrink Verdun

Dr.

* Former Member

[32]

P. Heid.
A damped Kačanov scheme for the numerical solution of a relaxed p(x)-Poisson equation.
Partial Differential Equations and Applications 4.40 (Aug. 2023). DOI

Abstract

The focus of the present work is the (theoretical) approximation of a solution of the p(x)-Poisson equation. To devise an iterative solver with guaranteed convergence, we will consider a relaxation of the original problem in terms of a truncation of the nonlinearity from below and from above by using a pair of positive cut-off parameters. We will then verify that, for any such pair, a damped Kačanov scheme generates a sequence converging to a solution of the relaxed equation. Subsequently, it will be shown that the solutions of the relaxed problems converge to the solution of the original problem in the discrete setting. Finally, the discrete solutions of the unrelaxed problem converge to the continuous solution. Our work will finally be rounded up with some numerical experiments that underline the analytical findings.

MCML Authors

Pascal Heid

Dr.

* Former Member

[31]

H.-H. Chou, J. Maly and D. Stöger.
How to induce regularization in linear models: A guide to reparametrizing gradient flow.
Preprint (Aug. 2023). arXiv

Abstract

In this work, we analyze the relation between reparametrizations of gradient flow and the induced implicit bias in linear models, which encompass various basic regression tasks. In particular, we aim at understanding the influence of the model parameters - reparametrization, loss, and link function - on the convergence behavior of gradient flow. Our results provide conditions under which the implicit bias can be well-described and convergence of the flow is guaranteed. We furthermore show how to use these insights for designing reparametrization functions that lead to specific implicit biases which are closely connected to ℓp- or trigonometric regularizers.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[30]

N. Stucki, J. C. Paetzold, S. Shit, B. Menze and U. Bauer.
Topologically faithful image segmentation via induced matching of persistence barcodes.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL GitHub

Abstract

Segmentation models predominantly optimize pixel-overlap-based loss, an objective that is actually inadequate for many segmentation tasks. In recent years, their limitations fueled a growing interest in topology-aware methods, which aim to recover the topology of the segmented structures. However, so far, existing methods only consider global topological properties, ignoring the need to preserve topological features spatially, which is crucial for accurate segmentation. We introduce the concept of induced matchings from persistent homology to achieve a spatially correct matching between persistence barcodes in a segmentation setting. Based on this concept, we define the Betti matching error as an interpretable, topologically and feature-wise accurate metric for image segmentations, which resolves the limitations of the Betti number error. Our Betti matching error is differentiable and efficient to use as a loss function. We demonstrate that it improves the topological performance of segmentation networks significantly across six diverse datasets while preserving the performance with respect to traditional scores.

MCML Authors

Nico Stucki

Applied Topology and Geometry

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry

[29]

S. Alberti, N. Dern, L. Thesing and G. Kutyniok.
Sumformer: Universal Approximation for Efficient Transformers.
TAG-ML @ICML 2023 - 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning at the 40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length poses significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[28]

T. Fuchs, F. Krahmer and R. Kueng.
Greedy-type sparse recovery from heavy-tailed measurements.
SampTA 2023 - 14th International Conference on Sampling Theory and Applications. Yale, CT, USA, Jul 10-14, 2023. DOI

Abstract

Recovering an s-sparse signal vector x ∈ ℂ^n from a comparably small number of measurements y := (Ax) ∈ ℂ^m is the underlying challenge of compressed sensing. By now, a variety of efficient greedy algorithms has been established and strong recovery guarantees have been proven for random measurement matrices A ∈ ℂ^{m×n}. However, they require a strong concentration of A^*Ax around its mean x (in particular, the Restricted Isometry Property), which is generally not fulfilled for heavy-tailed matrices. In order to overcome this issue and even cover applications where only limited knowledge about the distribution of the measurement matrix is available, we suggest substituting A^*Ax by a median-of-means estimator. In the following, we present an adapted greedy algorithm, based on median-of-means, and prove that it can recover any s-sparse unit vector x ∈ ℂ^n up to an ℓ2-error ∥x − x̂∥₂ < ε with high probability, while only requiring a bound on the fourth moment of the entries of A. The sample complexity is of the order O(s log(n log(1/ε)) log(1/ε)).
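The median-of-means principle that replaces the empirical average in the abstract can be shown on real scalars. The sketch below is a generic illustration, not the adapted greedy algorithm of the paper; the function name and the toy data are hypothetical, and leftover samples that do not fill a whole block are simply dropped.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Split the samples into equal-size blocks, average each block,
    and return the median of the block means."""
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        statistics.fmean(samples[b * block_size:(b + 1) * block_size])
        for b in range(num_blocks)
    ]
    return statistics.median(block_means)


# Toy data (hypothetical): 99 well-behaved samples and one heavy-tailed outlier.
data = [1.0] * 99 + [1000.0]
plain_mean = statistics.fmean(data)        # pulled far away from 1 by the outlier
robust_mean = median_of_means(data, 10)    # only one of the ten blocks is affected
print(plain_mean, robust_mean)
```

A single extreme sample can corrupt at most one block, so the median over blocks stays near the bulk of the data. This is why only low-order moment bounds, rather than strong concentration, are needed.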

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[27]

F. Hoppe, F. Krahmer, C. M. Verdun, M. I. Menzel and H. Rauhut.
Sampling Strategies for Compressive Imaging Under Statistical Noise.
SampTA 2023 - 14th International Conference on Sampling Theory and Applications. Yale, CT, USA, Jul 10-14, 2023. DOI

Abstract

Most of the compressive sensing literature in signal processing assumes that the noise present in the measurement has an adversarial nature, i.e., it is bounded in a certain norm. At the same time, the randomization introduced in the sampling scheme usually assumes an i.i.d. model where rows are sampled with replacement. In this case, if a sample is measured a second time, it does not add additional information. For many applications, where the statistical noise model is a more accurate one, this is not true anymore since a second noisy sample comes with an independent realization of the noise, so there is a fundamental difference between sampling with and without replacement. Therefore, a more careful analysis must be performed. In this short note, we illustrate how one can mathematically transition between these two noise models. This transition gives rise to a weighted LASSO reconstruction method for sampling without replacement, which numerically improves the solution of high-dimensional compressive imaging problems.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Claudio Mayrink Verdun

Dr.

* Former Member

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[26]

R. Joy, F. Krahmer, A. Lupoli and R. Ramakrishan.
Quantization of Bandlimited Functions Using Random Samples.
SampTA 2023 - 14th International Conference on Sampling Theory and Applications. Yale, CT, USA, Jul 10-14, 2023. DOI

Abstract

We investigate the compatibility of distributed noise-shaping quantization with random samples of bandlimited functions. Let f be a real-valued π-bandlimited function. Suppose R > 1 is a real number, and assume that {x_i}_{i=1}^m is a sequence of i.i.d. random variables uniformly distributed on [−R̃, R̃], where R̃ > R is appropriately chosen. We show that on using a distributed noise-shaping quantizer to quantize the values of f at {x_i}_{i=1}^m, a function f^♯ can be reconstructed from these quantized values such that ∥f − f^♯∥_{L²[−R,R]} decays with high probability as m and R̃ increase.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[25]

F. Krahmer, H. Lyu, R. Saab, A. Veselovska and R. Wang.
Quantization of Bandlimited Graph Signals.
SampTA 2023 - 14th International Conference on Sampling Theory and Applications. Yale, CT, USA, Jul 10-14, 2023. DOI

Abstract

Graph models and graph-based signals are becoming increasingly important in machine learning, natural sciences, and modern signal processing. In this paper, we address the problem of quantizing bandlimited graph signals. We introduce two classes of noise-shaping algorithms for graph signals that differ in their sampling methodologies. We demonstrate that these algorithms can be efficiently used to construct quantized representatives of bandlimited graph-based signals with bounded amplitude. Moreover, for one of the algorithms, we provide theoretical guarantees on the relative error between the quantized representative and the true signal.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Anna Veselovska

Dr.

Applied Numerical Analysis

[24]

F. Krahmer and A. Veselovska.
Digital Halftoning via Mixed-Order Weighted Σ∆ Modulation.
SampTA 2023 - 14th International Conference on Sampling Theory and Applications. Yale, CT, USA, Jul 10-14, 2023. DOI

Abstract

In this paper, we propose 1-bit weighted Σ∆ quantization schemes of mixed order as a technique for digital halftoning. These schemes combine weighted Σ∆ schemes of different orders for two-dimensional signals so one can profit both from the better stability properties of low order schemes and the better accuracy properties of higher order schemes. We demonstrate that the resulting mixed-order Σ∆ schemes in combination with a padding strategy yield improved representation quality in digital halftoning as measured in the Feature Similarity Index. These empirical results are complemented by mathematical error bounds for the model of two-dimensional bandlimited signals as motivated by a mathematical model of human visual perception.

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Anna Veselovska

Dr.

Applied Numerical Analysis

[23]

G. Kutyniok.
An introduction to the mathematics of deep learning.
European Congress of Mathematics (Jul. 2023). DOI

Abstract

Despite the outstanding success of deep neural networks in real-world applications, ranging from science to public life, most of the related research is empirically driven and a comprehensive mathematical foundation is still missing. At the same time, these methods have already shown their impressive potential in mathematical research areas such as imaging sciences, inverse problems, or numerical analysis of partial differential equations, sometimes by far outperforming classical mathematical approaches for particular problem classes. The goal of this paper, which is based on a plenary lecture at the 8th European Congress of Mathematics in 2021, is to first provide an introduction into this new vibrant research area. We will then showcase some recent advances in two directions, namely the development of a mathematical foundation of deep learning and the introduction of novel deep learning-based approaches to solve inverse problems and partial differential equations.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[22]

F. Krahmer and A. Veselovska.
Enhanced Digital Halftoning via Weighted Sigma-Delta Modulation.
SIAM Journal on Imaging Sciences 16.3 (Jul. 2023). DOI

Abstract

In this paper, we study error diffusion techniques for digital halftoning from the perspective of 1-bit quantization. We introduce a method to generate schemes for two-dimensional signals as a weighted combination of their one-dimensional counterparts and show that various error diffusion schemes proposed in the literature can be represented in this framework via schemes of first order. Under the model of two-dimensional bandlimited signals, which is motivated by a mathematical model of human visual perception, we derive quantitative error bounds for such weighted schemes. We see these bounds as a step towards a mathematical understanding of the good empirical performance of error diffusion, even though they are formulated in the supremum norm, which is known to not fully capture the visual similarity of images. Motivated by the correspondence between existing error diffusion algorithms and first-order schemes, we study the performance of the analogous weighted combinations of second-order schemes and show that they exhibit a superior performance in terms of guaranteed error decay for two-dimensional bandlimited signals. In extensive numerical simulations for real-world images, we demonstrate that with some modifications to enhance stability this superior performance also translates to the problem of digital halftoning. More concretely, we find that certain second-order weighted schemes exhibit competitive performance for digital halftoning of real-world images in terms of the Feature Similarity Index (FSIM), a state-of-the-art measure for image quality assessment.
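To make the 1-bit quantization viewpoint concrete, the following is a minimal sketch of a standard one-dimensional first-order Σ∆ scheme. It is not the weighted two-dimensional scheme studied in the paper; the function name `sigma_delta_first_order` and the test signal are illustrative assumptions. The state recursion $u_i = u_{i-1} + y_i - q_i$ keeps $|u_i| \le 1$ whenever $|y_i| \le 1$, which is the stability property that error bounds of this kind build on.

```python
import numpy as np


def sigma_delta_first_order(y):
    """1-bit first-order Sigma-Delta quantization of samples y with |y_i| <= 1.

    Returns the bit sequence q (entries +1 or -1) and the state sequence u.
    """
    y = np.asarray(y, dtype=float)
    q = np.empty_like(y)
    u = np.empty_like(y)
    state = 0.0  # u_0 = 0
    for i, sample in enumerate(y):
        # Pick the bit closest to the accumulated value u_{i-1} + y_i.
        q[i] = 1.0 if state + sample >= 0 else -1.0
        # State update: u_i = u_{i-1} + y_i - q_i.
        state = state + sample - q[i]
        u[i] = state
    return q, u


# Hypothetical test signal: a slowly varying sinusoid with amplitude below 1.
t = np.linspace(0.0, 1.0, 512)
y = 0.8 * np.sin(2 * np.pi * 3 * t)
q, u = sigma_delta_first_order(y)
```

Because the quantization error is pushed into the bounded state `u`, local averages of `q` track local averages of `y`, which is the mechanism halftoning relies on.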

MCML Authors

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Anna Veselovska

Dr.

Applied Numerical Analysis

[21]

A. Bacho, H. Boche and G. Kutyniok.
Reliable AI: Does the Next Generation Require Quantum Computing?
Preprint (Jul. 2023). arXiv

Abstract

In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[20]

Y. Mansour and R. Heckel.
Zero-Shot Noise2Noise: Efficient Image Denoising without any Data.
CVPR 2023 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada, Jun 18-23, 2023. DOI

Abstract

Recently, self-supervised neural networks have shown excellent image denoising performance. However, current dataset-free methods are either computationally expensive, require a noise model, or have inadequate image quality. In this work we show that a simple 2-layer network, without any training data or knowledge of the noise distribution, can enable high-quality image denoising at low computational cost. Our approach is motivated by Noise2Noise and Neighbor2Neighbor and works well for denoising pixel-wise independent noise. Our experiments on artificial, real-world camera, and microscope noise show that our method termed ZS-N2N (Zero Shot Noise2Noise) often outperforms existing dataset-free methods at a reduced cost, making it suitable for use cases with scarce data availability and limited compute.

MCML Authors

Youssef Mansour

Machine Learning and Information Processing

Reinhard Heckel

Prof. Dr.

Machine Learning and Information Processing

[19]

P. Scholl, A. Bacho, H. Boche and G. Kutyniok.
The Uniqueness Problem of Physical Law Learning.
ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island, Greece, Jun 04-10, 2023. DOI

Abstract

Physical law learning is the ambiguous attempt at automating the derivation of governing equations with the use of machine learning techniques. This paper shall serve as a first step to build a comprehensive theoretical framework for learning physical laws, aiming to provide reliability to according algorithms. One key problem consists in the fact that the governing equations might not be uniquely determined by the given data. We will study this problem in the common situation that a physical law is described by an ordinary or partial differential equation. For various different classes of differential equations, we provide both necessary and sufficient conditions for a function from a given function class to uniquely determine the differential equation which is governing the phenomenon. We then use our results to determine in extensive numerical experiments whether a function solves a differential equation uniquely.

MCML Authors

Philipp Scholl

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[18]

Ç. Yapar, F. Jaensch, R. Levie, G. Kutyniok and G. Caire.
The First Pathloss Radio Map Prediction Challenge.
ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island, Greece, Jun 04-10, 2023. DOI

Abstract

To foster research and facilitate fair comparisons among recently proposed pathloss radio map prediction methods, we have launched the ICASSP 2023 First Pathloss Radio Map Prediction Challenge. In this short overview paper, we briefly describe the pathloss prediction problem, the provided datasets, the challenge task and the challenge evaluation methodology. Finally, we present the results of the challenge.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[17]

K. Riedl, T. Klock, C. Geldhauser and M. Fornasier.
Gradient is All You Need?
Preprint (Jun. 2023). arXiv

Abstract

In this paper we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that through communication of the particles, CBO exhibits a stochastic gradient descent (SGD)-like behavior despite solely relying on evaluations of the objective function. The fundamental value of such link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for ample classes of nonsmooth and nonconvex objective functions, hence, on the one side, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other side, contrary to the conventional wisdom for which zero-order methods ought to be inefficient or not to possess generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that allows to alleviate complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) and the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.
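As a rough illustration of the role the Laplace principle plays here, the sketch below computes the CBO consensus point, a weighted particle average with Gibbs weights $\exp(-\alpha f(x_i))$, using a log-sum-exp style shift for numerical stability. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `consensus_point`, the toy objective `f`, and the four particles are all hypothetical.

```python
import numpy as np


def consensus_point(particles, objective, alpha):
    """Weighted average of particles with weights proportional to exp(-alpha * f(x_i)).

    particles: array of shape (N, d); objective maps a length-d vector to a float.
    """
    values = np.array([objective(x) for x in particles])
    # Shift by the minimum before exponentiating (log-sum-exp trick):
    # the normalised weights are unchanged, but underflow is avoided.
    weights = np.exp(-alpha * (values - values.min()))
    weights /= weights.sum()
    return weights @ particles


def f(x):
    """Hypothetical toy objective with its minimiser at (1, 1)."""
    return float(np.sum((x - 1.0) ** 2))


# Hypothetical particle positions; the second one sits at the minimiser.
particles = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [-1.0, 3.0]])

uniform_average = consensus_point(particles, f, alpha=0.0)
sharp_consensus = consensus_point(particles, f, alpha=50.0)
```

With `alpha=0.0` all particles are weighted equally, while for large `alpha` the consensus point concentrates on the particle with the smallest objective value, which is the behavior the Laplace principle quantifies.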

MCML Authors

Konstantin Riedl

Dr.

* Former Member

Carina Geldhauser

Dr.

* Former Member

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

[16]

H. Huang, J. Qiu and K. Riedl.
On the global convergence of particle swarm optimization methods.
Applied Mathematics and Optimization 88.2 (May. 2023). DOI

Abstract

In this paper we provide a rigorous convergence analysis for the renowned particle swarm optimization method by using tools from stochastic calculus and the analysis of partial differential equations. Based on a continuous-time formulation of the particle dynamics as a system of stochastic differential equations, we establish convergence to a global minimizer of a possibly nonconvex and nonsmooth objective function in two steps. First, we prove consensus formation of an associated mean-field dynamics by analyzing the time-evolution of the variance of the particle distribution, which acts as Lyapunov function of the dynamics. We then show that this consensus is close to a global minimizer by employing the asymptotic Laplace principle and a tractability condition on the energy landscape of the objective function. These results allow for the usage of memory mechanisms, and hold for a rich class of objectives provided certain conditions of well-preparation of the hyperparameters and the initial datum. In a second step, at least for the case without memory effects, we provide a quantitative result about the mean-field approximation of particle swarm optimization, which specifies the convergence of the interacting particle system to the associated mean-field limit. Combining these two results allows for global convergence guarantees of the numerical particle swarm optimization method with provable polynomial complexity. To demonstrate the applicability of the method we propose an efficient and parallelizable implementation, which is tested in particular on a competitive and well-understood high-dimensional benchmark problem in machine learning.

MCML Authors

Konstantin Riedl

Dr.

* Former Member

[15]

R. Paolino, A. Bojchevski, S. Günnemann, G. Kutyniok and R. Levie.
Unveiling the Sampling Density in Non-Uniform Geometric Graphs.
ICLR 2023 - 11th International Conference on Learning Representations. Kigali, Rwanda, May 01-05, 2023. URL

Abstract

A powerful framework for studying graphs is to consider them as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius. Currently, the literature mostly focuses on uniform sampling and constant neighborhood radius. However, real-world graphs are likely to be better represented by a model in which the sampling density and the neighborhood radius can both vary over the latent space. For instance, in a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius. In this work, we first perform a rigorous mathematical analysis of this (more general) class of models, including derivations of the resulting graph shift operators. The key insight is that graph shift operators should be corrected in order to avoid potential distortions introduced by the non-uniform sampling. Then, we develop methods to estimate the unknown sampling density in a self-supervised fashion. Finally, we present exemplary applications in which the learnt density is used to 1) correct the graph shift operator and improve performance on a variety of tasks, 2) improve pooling, and 3) extract knowledge from networks. Our experimental findings support our theory and provide strong evidence for our model.

MCML Authors

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[14]

H.-H. Chou, H. Rauhut and R. Ward.
Robust implicit regularization via weight normalization.
Preprint (May. 2023). arXiv

Abstract

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing gradient flow (continuous-time version of gradient descent) with weight normalization, where the weight vector is reparameterized in terms of polar coordinates, and gradient flow is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Lojasiewicz Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that in contrast to plain gradient flow, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that the gains in both convergence speed and robustness of the implicit bias are improved dramatically by using weight normalization in overparameterized diagonal linear network models.

MCML Authors

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[13]

J. Maly and R. Saab.
A simple approach for quantizing neural networks.
Preprint (Apr. 2023). arXiv

Abstract

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach does not require any hyper-parameter tuning and, in contrast to previous methods, allows a plain analysis. We provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

[12]

L. Sun.
Well-posedness and $L^1$-$L^p$ Smoothing Effect of the Porous Media Equation under Poincaré Inequality.
Preprint (Apr. 2023). arXiv

Abstract

We investigate the well-posedness and uniqueness of the Cauchy problem for a class of porous media equations defined on $\mathbb{R}^d$, and demonstrate the $L^1$-$L^p$ smoothing effect. In particular, we establish that the logarithm of the ratio of the $L^p$ norm to the $L^1$ norm decreases super-exponentially fast during the initial phase, subsequently decaying to zero exponentially fast in the latter phase. This implies that if the initial data is solely in $L^1$, then for $t>0$, the solution will belong to $L^p$ for any $p \in [1,\infty)$. The results are obtained under the assumption of a Poincaré inequality.

MCML Authors

Lukang Sun

Applied Numerical Analysis

[11]

P. Heid.
A short note on an adaptive damped Newton method for strongly monotone and Lipschitz continuous operator equations.
Archiv der Mathematik (Mar. 2023). URL

Abstract

We consider the damped Newton method for strongly monotone and Lipschitz continuous operator equations in a variational setting. We provide a very accessible justification why the undamped Newton method performs better than its damped counterparts in a vicinity of a solution. Moreover, in the given setting, an adaptive step-size strategy is presented, which guarantees the global convergence and favours an undamped update if admissible.

MCML Authors

Pascal Heid

Dr.

* Former Member

[10]

A. Scagliotti.
Optimal control of ensembles of dynamical systems.
ESAIM - Control, Optimisation and Calculus of Variations 29.22 (Mar. 2023). DOI

Abstract

In this paper we consider the problem of the optimal control of an ensemble of affine-control systems. After proving the well-posedness of the minimization problem under examination, we establish a $\Gamma$-convergence result that allows us to substitute the original (and usually infinite) ensemble with a sequence of finite increasing-in-size sub-ensembles. The solutions of the optimal control problems involving these sub-ensembles provide approximations in the $L^2$-strong topology of the minimizers of the original problem. Using again a $\Gamma$-convergence argument, we manage to derive a Maximum Principle for ensemble optimal control problems with end-point cost. Moreover, in the case of finite sub-ensembles, we can address the minimization of the related cost through numerical schemes. In particular, we propose an algorithm that consists of a subspace projection of the gradient field induced on the space of admissible controls by the approximating cost functional. In addition, we consider an iterative method based on the Pontryagin Maximum Principle. Finally, we test the algorithms on an ensemble of linear systems in $\mathbb{R}^2$.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[9]

C. M. Verdun.
Scalability in Ill-posed Machine Learning Problems: Bridging Least Squares Methods with (Non-)convex Algorithms.
Dissertation 2022. DOI

Abstract

We introduce novel algorithms to address some challenges in machine learning, including ill-conditioned low-rank matrix retrieval, constrained least squares, and high-dimensional regression with unknown noise. By bridging least squares with modern (non-)convex optimization, our methods achieve scalability, data efficiency, and robustness. We provide theoretical guarantees with minimal assumptions and numerically validate their performance.

MCML Authors

Claudio Mayrink Verdun

Dr.

* Former Member

[8]

H. Boche, A. Fono and G. Kutyniok.
Non-Computability of the Pseudoinverse on Digital Computers.
Preprint (Dec. 2022). arXiv

Abstract

The pseudoinverse of a matrix, a generalized notion of the inverse, is of fundamental importance in linear algebra. However, there does not exist a closed form representation of the pseudoinverse, which can be straightforwardly computed. Therefore, an algorithmic computation is necessary. An algorithmic computation can only be evaluated by also considering the underlying hardware, typically digital hardware, which is responsible for performing the actual computations step by step. In this paper, we analyze if and to what degree the pseudoinverse actually can be computed on digital hardware platforms modeled as Turing machines. For this, we utilize the notion of an effective algorithm which describes a provably correct computation: upon an input of any error parameter, the algorithm provides an approximation within the given error bound with respect to the unknown solution. We prove that an effective algorithm for computing the pseudoinverse of any matrix cannot exist on a Turing machine, although provably correct algorithms do exist for specific classes of matrices. Even more, our results introduce a lower bound on the accuracy that can be obtained algorithmically when computing the pseudoinverse on Turing machines.

MCML Authors

Adalbert Fono

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[7]

H. Huang, J. Qiu and K. Riedl.
Consensus-Based Optimization for Saddle Point Problems.
Preprint (Dec. 2022). arXiv

Abstract

In this paper, we propose consensus-based optimization for saddle point problems (CBO-SP), a novel multi-particle metaheuristic derivative-free optimization method capable of provably finding global Nash equilibria. Following the idea of swarm intelligence, the method employs a group of interacting particles, which perform a minimization over one variable and a maximization over the other. This paradigm permits a passage to the mean-field limit, which makes the method amenable to theoretical analysis and allows to obtain rigorous convergence guarantees under reasonable assumptions about the initialization and the objective function, which most notably include nonconvex-nonconcave objectives.

MCML Authors

Konstantin Riedl

Dr.

* Former Member

[6]

C. Koke and G. Kutyniok.
Graph Scattering beyond Wavelet Shackles.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

This work develops a flexible and mathematically sound framework for the design and analysis of graph scattering networks with variable branching ratios and generic functional calculus filters. Spectrally-agnostic stability guarantees for node- and graph-level perturbations are derived; the vertex-set non-preserving case is treated by utilizing recently developed mathematical-physics based tools. Energy propagation through the network layers is investigated and related to truncation stability. New methods of graph-level feature aggregation are introduced and stability of the resulting composite scattering architectures is established. Finally, scattering transforms are extended to edge- and higher order tensorial input. Theoretical results are complemented by numerical investigations: Suitably chosen scattering networks conforming to the developed theory perform better than traditional graph-wavelet based scattering approaches in social network graph classification tasks and significantly outperform other graph-based learning approaches to regression of quantum-chemical energies on QM7.

MCML Authors

Christian Koke

Computer Vision & Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[5]

S. Maskey, R. Levie, Y. Lee and G. Kutyniok.
Generalization Analysis of Message Passing Neural Networks on Large Random Graphs.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Message passing neural networks (MPNNs) have seen a steep rise in popularity since their introduction as generalizations of convolutional neural networks to graph-structured data, and are now considered state-of-the-art tools for solving a large variety of graph-focused problems. We study the generalization error of MPNNs in graph classification and regression. We assume that graphs of different classes are sampled from different random graph models. We show that, when training an MPNN on a dataset sampled from such a distribution, the generalization gap increases in the complexity of the MPNN, and decreases, not only with respect to the number of training samples, but also with the average number of nodes in the graphs. This shows how an MPNN with high complexity can generalize from a small dataset of graphs, as long as the graphs are large. The generalization bound is derived from a uniform convergence result, that shows that any MPNN, applied on a graph, approximates the MPNN applied on the geometric model that the graph discretizes.

MCML Authors

Sohir Maskey

Mathematical Foundations of Artificial Intelligence

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[4]

Y. Zhou, G. Kutyniok and B. Ribeiro.
OOD Link Prediction Generalization Capabilities of Message-Passing GNNs in Larger Test Graphs.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

This work provides the first theoretical study on the ability of graph Message Passing Neural Networks (gMPNNs) —such as Graph Neural Networks (GNNs)— to perform inductive out-of-distribution (OOD) link prediction tasks, where deployment (test) graph sizes are larger than training graphs. We first prove non-asymptotic bounds showing that link predictors based on permutation-equivariant (structural) node embeddings obtained by gMPNNs can converge to a random guess as test graphs get larger. We then propose a theoretically-sound gMPNN that outputs structural pairwise (2-node) embeddings and prove non-asymptotic bounds showing that, as test graphs grow, these embeddings converge to embeddings of a continuous function that retains its ability to predict links OOD. Empirical results on random graphs show agreement with our theoretical results.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[3]

A. Scagliotti and P. Colli Franzone.
Accelerated subgradient methods.
Preprint (Feb. 2022). arXiv

Abstract

Subgradient methods are the natural extension to the non-smooth case of the classical gradient descent for regular convex optimization problems. However, in general, they are characterized by slow convergence rates, and they require decreasing step-sizes to converge. In this paper we propose a subgradient method with constant step-size for composite convex objectives with ℓ1-regularization. If the smooth term is strongly convex, we can establish a linear convergence result for the function values. This fact relies on an accurate choice of the element of the subdifferential used for the update, and on proper actions adopted when non-differentiability regions are crossed. Then, we propose an accelerated version of the algorithm, based on conservative inertial dynamics and on an adaptive restart strategy, that is guaranteed to achieve a linear convergence rate in the strongly convex case. Finally, we test the performances of our algorithms on some strongly and non-strongly convex examples.

MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis

[2]

C. M. Verdun, T. Fuchs, P. Harar, D. Elbrächter, D. S. Fischer, J. Berner, P. Grohs, F. J. Theis and F. Krahmer.
Group Testing for SARS-CoV-2 Allows for Up to 10-Fold Efficiency Increase Across Realistic Scenarios and Testing Strategies.
Frontiers in Public Health 9 (Aug. 2021). DOI

Abstract

Background: Due to the ongoing COVID-19 pandemic, demand for diagnostic testing has increased drastically, resulting in shortages of necessary materials to conduct the tests and overwhelming the capacity of testing laboratories. The supply scarcity and capacity limits affect test administration: priority must be given to hospitalized patients and symptomatic individuals, which can prevent the identification of asymptomatic and presymptomatic individuals and hence effective tracking and tracing policies. We describe optimized group testing strategies applicable to SARS-CoV-2 tests in scenarios tailored to the current COVID-19 pandemic and assess significant gains compared to individual testing.
Methods: We account for biochemically realistic scenarios in the context of dilution effects on SARS-CoV-2 samples and consider evidence on specificity and sensitivity of PCR-based tests for the novel coronavirus. Because of the current uncertainty and the temporal and spatial changes in the prevalence regime, we provide analysis for several realistic scenarios and propose fast and reliable strategies for massive testing procedures.
Key Findings: We find significant efficiency gaps between different group testing strategies in realistic scenarios for SARS-CoV-2 testing, highlighting the need for an informed decision of the pooling protocol depending on estimated prevalence, target specificity, and high- vs. low-risk population. For example, using one of the presented methods, all 1.47 million inhabitants of Munich, Germany, could be tested using only around 141 thousand tests if an infection rate below 0.4% is assumed. Using 1 million tests, the 6.69 million inhabitants of the city of Rio de Janeiro, Brazil, could be tested as long as the infection rate does not exceed 1%. Moreover, we provide an interactive web application, available at www.group-testing.com, for visualizing the different strategies and designing pooling schemes according to specific prevalence scenarios and test configurations.
Interpretation: Altogether, this work may help provide a basis for an efficient upscaling of current testing procedures, which takes the population heterogeneity into account and is fine-grained towards the desired study populations, e.g., mild/asymptomatic individuals vs. symptomatic ones but also mixtures thereof.
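
To give a feel for the arithmetic behind such savings, here is a minimal sketch of classical two-stage (Dorfman) pooling. It is a stand-in for illustration, not one of the strategies evaluated in the paper, and it assumes a perfect test (no dilution effects, no false positives or negatives). The function names and the pool-size search range are hypothetical.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person for two-stage pooling with a perfect test.

    Stage 1 tests each pool once (1 / pool_size tests per person).
    Stage 2 retests every member of a positive pool individually.
    """
    if pool_size == 1:
        return 1.0  # individual testing, no pooling
    p_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence: float, max_pool_size: int = 64) -> int:
    """Pool size minimising the expected number of tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: dorfman_tests_per_person(prevalence, k),
    )


# Figures taken from the abstract: Munich population and a 0.4% infection rate.
population = 1_470_000
prevalence = 0.004

k = best_pool_size(prevalence)
expected_tests = population * dorfman_tests_per_person(prevalence, k)
print(f"pool size {k}: about {expected_tests:,.0f} expected tests")
```

Even this simple scheme needs far fewer tests than individual testing at low prevalence. The smaller figure quoted in the abstract comes from the more refined strategies the paper studies.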

MCML Authors

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

[1]

G. König, T. Freiesleben, B. Bischl, G. Casalicchio and M. Grosse-Wentrup.
Decomposition of Global Feature Importance into Direct and Associative Components (DEDACT).
Preprint (Jun. 2021). arXiv

Abstract

MCML Authors

Gunnar König

Dr.

* Former Member

Timo Freiesleben

Dr.

Munich Center for Mathematical Philosophy

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Moritz Grosse-Wentrup

Prof. Dr.

* Former Principal Investigator

A3 | Computational Models

Mathematical models and statistical concepts, which are core elements of ML methods, must be reflected by efficient algorithmic implementations. Furthermore, the execution of corresponding algorithms requires a suitable computational infrastructure. Currently, the steady growth of ML applications brings new algorithmic problems and computational challenges that MCML is addressing in this research area.

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Thomas Gabor

Prof. Dr.

Associate

Technology and Research on Artificial Intelligence Laboratory

Johannes Kinder

Prof. Dr.

Associate

Programming Languages and Artificial Intelligence

Marcus Paradies

Prof. Dr.

Associate

Database Systems and Data Mining AI Lab

Steffen Schneider

Dr.

Associate

Dynamical Inference

Publications in Research Area A3

[439]

S. Chen, J. Liu, Z. Han, Y. Xia, D. Cremers, P. Torr, V. Tresp and J. Gu.
True Multimodal In-Context Learning Needs Attention to the Visual Context.
COLM 2025 - Conference on Language Modeling. Montreal, Canada, Oct 07-09, 2025. To be published. Preprint available. arXiv GitHub

Abstract

Multimodal Large Language Models (MLLMs), built on powerful language backbones, have enabled Multimodal In-Context Learning (MICL), that is, adapting to new tasks from a few multimodal demonstrations consisting of images, questions, and answers. Despite showing noticeable improvement on standard vision-language datasets, current MLLMs struggle to leverage visual information in the demonstrations. Specifically, they tend to neglect visual cues and over-rely on textual patterns, leading to mere text imitation rather than genuine multimodal adaptation. This behavior makes MICL still unimodal and largely restricts its practical utility. More importantly, this limitation is often concealed by the improved performance on tasks that do not require understanding the visual context. As a result, how to effectively enhance MICL ability and reliably evaluate the MICL performance remains underexplored. To address these issues, we first introduce Dynamic Attention Reallocation (DARA), an efficient fine-tuning strategy that encourages models to attend to the visual context by rebalancing attention across visual and textual tokens. In addition, we present TrueMICL, an MICL-dedicated dataset with both support and test sets that explicitly requires the integration of multimodal information, particularly visual content, for correct task completion. Extensive experiments demonstrate the effectiveness of our holistic solution, showcasing substantial improvements in the true multimodal in-context learning capabilities.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Yan Xia

Dr.

* Former Member

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[438]

P. Jahn, W. Durani, C. Leiber, A. Beer and T. Seidl.
Going Offline: An Evaluation of the Offline Phase in Stream Clustering.
ECML-PKDD 2025 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Porto, Portugal, Sep 15-19, 2025. To be published. GitHub

Abstract

Data streams are a challenging and ever more relevant setting for clustering methods as more data arrives faster and faster. Stream clustering strategies either determine the clusters in an online manner directly as the instances appear, or they employ an offline phase where the online summarization structures are processed to obtain a clustering result. A recent analysis found that offline clustering may often be unnecessary or even counterproductive. The methods used in the offline phase are usually fixed for each stream clustering approach and typically stem from only a handful of clustering techniques. In this paper, we perform a broad experimental analysis specifically targeting the offline phase of stream clustering. We analyze several ways of extracting information from the summarization structures, including a novel strategy based on data generation. Ultimately, we showcase that an offline phase is an impactful design choice for stream clustering. We also find that the chosen offline method significantly impacts the clustering performance, with the clustering quality improving drastically for some settings.

MCML Authors

Philipp Jahn

Database Systems and Data Mining AI Lab

Walid Durani

Database Systems and Data Mining AI Lab

Collin Leiber

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[437]

U. Schlegel, G. M. Tavares and T. Seidl.
Towards Explainable Deep Clustering for Time Series Data.
TempXAI @ECML-PKDD 2025 - Workshop Explainable AI for Time Series and Data Streams at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2025). Porto, Portugal, Sep 15-19, 2025. To be published.

Abstract

Deep clustering uncovers hidden patterns and groups in complex time series data, yet its opaque decision-making limits use in safety-critical settings. This survey offers a structured overview of explainable deep clustering for time series, collecting current methods and their real-world applications. We thoroughly discuss and compare peer-reviewed and preprint papers through application domains across healthcare, finance, IoT, and climate science. Our analysis reveals that most work relies on autoencoder and attention architectures, with limited support for streaming, irregularly sampled, or privacy-preserved series, and interpretability is still primarily treated as an add-on. To push the field forward, we outline six research opportunities: (1) combining complex networks with built-in interpretability; (2) setting up clear, faithfulness-focused evaluation metrics for unsupervised explanations; (3) building explainers that adapt to live data streams; (4) crafting explanations tailored to specific domains; (5) adding human-in-the-loop methods that refine clusters and explanations together; and (6) improving our understanding of how time series clustering models work internally. By making interpretability a primary design goal rather than an afterthought, we propose the groundwork for the next generation of trustworthy deep clustering time series analytics.

MCML Authors

Udo Schlegel

Database Systems and Data Mining AI Lab

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[436]

Y. Sale, A. Javanmardi and E. Hüllermeier.
Aleatoric and Epistemic Uncertainty in Conformal Prediction.
COPA 2025 - 14th Symposium on Conformal and Probabilistic Prediction with Applications. Egham, UK, Sep 10-12, 2025. To be published.

Abstract

Recently, there has been a particular interest in distinguishing different types of uncertainty in supervised machine learning (ML) settings (Hüllermeier and Waegeman, 2021). Aleatoric uncertainty captures the inherent randomness in the data-generating process. As it represents variability that cannot be reduced even with more data, it is often referred to as irreducible uncertainty. In contrast, epistemic uncertainty arises from a lack of knowledge about the underlying data-generating process, which, in principle, can be reduced by acquiring additional data or improving the model itself (viz. reducible uncertainty). In parallel, interest in conformal prediction (CP), both its theory and applications, has become equally vigorous. Conformal Prediction (Vovk et al., 2005) is a model-agnostic framework for uncertainty quantification that provides prediction sets or intervals with rigorous statistical coverage guarantees. Notably, CP is distribution-free and makes only the mild assumption of exchangeability. Under this assumption, it yields prediction intervals that contain the true label with a user-specified probability. Thus, conformal prediction is seen as a promising tool to quantify uncertainty. But how is it related to aleatoric and epistemic uncertainty? In particular, we first analyze how (estimates of) aleatoric and epistemic uncertainty enter into the construction of vanilla CP, that is, how noise and model error jointly shape the global threshold. We then review ‘uncertainty-aware’ extensions that integrate these uncertainty estimates into the CP pipeline.
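
The "global threshold" of vanilla CP can be made concrete with a minimal split-conformal sketch for regression. It is an illustration under assumptions, not code from the paper: the calibration pairs are made-up toy data, the absolute residual is one common choice of nonconformity score, and the function names are hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    With exchangeable data, intervals built from this threshold cover the
    true label with probability at least 1 - alpha.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_scores)[rank - 1]


def prediction_interval(point_prediction, threshold):
    """Symmetric interval around a point prediction."""
    return (point_prediction - threshold, point_prediction + threshold)


# Toy calibration set of (model prediction, observed label) pairs.
calibration = [
    (1.0, 1.2), (2.0, 1.7), (3.0, 3.1), (4.0, 4.6), (5.0, 4.8),
    (6.0, 6.3), (7.0, 6.9), (8.0, 8.4), (9.0, 9.5), (10.0, 10.1),
]

# Nonconformity score: absolute residual. Noise in the labels and model
# error both enlarge these scores and hence the single threshold.
scores = [abs(y - y_hat) for y_hat, y in calibration]
q = conformal_threshold(scores, alpha=0.2)
low, high = prediction_interval(5.5, q)
print(f"threshold {q:.2f}, interval for a new prediction of 5.5: [{low:.2f}, {high:.2f}]")
```

Because a single threshold is shared by all inputs, the interval width here does not adapt to where the model is more or less certain, which is the gap the uncertainty-aware extensions mentioned in the abstract aim to close.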

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[435]

S. Rauch, C. M. M. Frey, A. Maldonado and T. Seidl.
BEST: Bilaterally Expanding Subtrace Tree for Event Sequence Prediction.
BPM 2025 - 23rd International Conference on Business Process Management. Seville, Spain, Aug 31-Sep 05, 2025. To be published.

Abstract

In Predictive Process Monitoring, handling uncertainty regarding future case execution is the core building block for reliable predictive or prescriptive methods. In the last decade, deep learning methods have increasingly become the preferred approach when it comes to Next Activity Prediction and/or Remaining Trace Prediction. However, it remains an open question whether deep learning models finally surpass traditional data mining techniques for these tasks. In our paper, we contribute to answering this question by proposing a sequence prediction framework based on bilaterally expanding hierarchical subtraces that serves as an alternative approach for currently established deep learning techniques. We mine sequential patterns from activity traces and arrange them into a hierarchical subtrace tree by their structural relationship and inter-pattern distances. The tree structure can directly be leveraged for forecasting the most probable future activities given the trace history. We achieve competitive forecasting results for Remaining Trace Prediction, even surpassing state-of-the-art deep learning approaches on the majority of the analyzed real-world benchmark process event logs while only relying on the available control-flow information.

MCML Authors

Simon Rauch

Database Systems and Data Mining AI Lab

Christian Frey

Dr.

* Former Member

Andrea Maldonado

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[434]

J. Blake and M. Schubert.
Aerial Coverage Path Planning in Nuclear Emergencies: A Training and Evaluation Environment.
Demonstration Track @IJCAI 2025 - Demonstration Track at the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025). Montreal, Canada, Aug 16-22, 2025. To be published.

Abstract

We formulate a Coverage Path Planning (CPP) problem for a helicopter or a UAV tasked with mapping ground-level radiation while avoiding radiation that is too strong. We introduce a simulation environment that incorporates digital elevation models, altitude-dependent measurement footprints and realistic flight constraints, as well as state-of-the-art radiation scenario simulations, such as nuclear explosions, provided by the German Federal Office for Radiation Protection. We highlight the complexity of radiological survey missions and demonstrate the necessity for new CPP approaches that address these unique challenges. The code to our simulation environment will be provided upon acceptance.

MCML Authors

Johann Blake

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[433]

T. Benoit, Y. Wang, M. Dannehl and J. Kinder.
BLens: Contrastive Captioning of Binary Functions using Ensemble Embedding.
USENIX 2025 - 34th USENIX Security Symposium. Seattle, WA, USA, Aug 13-15, 2025. To be published. Preprint available. PDF

Abstract

Function names can greatly aid human reverse engineers, which has spurred the development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying a metaphor of machine translation from code to function names. Still, function naming models face challenges in generalizing to projects unrelated to the training set. In this paper, we take a completely new approach by transferring advances in automated image captioning to the domain of binary reverse engineering, such that different parts of a binary function can be associated with parts of its name. We propose BLens, which combines multiple binary function embeddings into a new ensemble representation, aligns it with the name representation latent space via a contrastive learning approach, and generates function names with a transformer architecture tailored for function names. Our experiments demonstrate that BLens significantly outperforms the state of the art. In the usual setting of splitting per binary, we achieve an F1 score of 0.79 compared to 0.70. In the cross-project setting, which emphasizes generalizability, we achieve an F1 score of 0.46 compared to 0.29. Finally, in an experimental setting reducing shared components across projects, we achieve an F1 score of 0.32 compared to 0.19.

MCML Authors

Yunru Wang

Programming Languages and Artificial Intelligence

Moritz Dannehl

Programming Languages and Artificial Intelligence

Johannes Kinder

Prof. Dr.

Programming Languages and Artificial Intelligence

[432]

Z. Ding, Y. Li, Y. He, A. Norelli, J. Wu, V. Tresp, Y. Ma and M. Bronstein.
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models.
TGL @KDD 2025 - Temporal Graph Learning Workshop at the 31st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2025). Toronto, ON, Canada, Aug 03-07, 2025. To be published. Preprint available. arXiv

Abstract

Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, where its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[431]

J. Bi, Y. Wang, H. Chen, X. Xiao, A. Hecker, V. Tresp and Y. Ma.
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

Multimodal Large Language Models (MLLMs) have significantly advanced visual tasks by integrating visual representations into large language models (LLMs). The textual modality, inherited from LLMs, equips MLLMs with abilities like instruction following and in-context learning. In contrast, the visual modality enhances performance in downstream tasks by leveraging rich semantic content, spatial information, and grounding capabilities. These intrinsic modalities work synergistically across various visual tasks. Our research initially reveals a persistent imbalance between these modalities, with text often dominating output generation during visual instruction tuning. This imbalance occurs when using both full fine-tuning and parameter-efficient fine-tuning (PEFT) methods. We then found that re-balancing these modalities can significantly reduce the number of trainable parameters required, inspiring a direction for further optimizing visual instruction tuning. We introduce Modality Linear Representation-Steering (MoReS) to achieve the goal. MoReS effectively re-balances the intrinsic modalities throughout the model, where the key idea is to steer visual representations through linear transformations in the visual subspace across each model layer. To validate our solution, we composed LLaVA Steering, a suite of models integrated with the proposed MoReS method. Evaluation results show that the composed LLaVA Steering models require, on average, 500 times fewer trainable parameters than LoRA needs while still achieving comparable performance across three visual benchmarks and eight visual question-answering tasks. Last, we present the LLaVA Steering Factory, an in-house developed platform that enables researchers to quickly customize various MLLMs with component-based architecture for seamlessly integrating state-of-the-art models, and evaluate their intrinsic modality imbalance.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[430]

T. Liu, Z. Lai, J. Wang, G. Zhang, S. Chen, P. Torr, V. Demberg, V. Tresp and J. Gu.
Multimodal Pragmatic Jailbreak on Text-to-image Models.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL GitHub

Abstract

Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate the image with visual text, where the image and the text, although considered to be safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate the current diffusion-based text-to-image (T2I) models under such jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from this type of jailbreak, with rates of unsafe generation ranging from around 10% to 70%, where DALLE 3 demonstrates almost the highest unsafety. In real-world scenarios, various filters, such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and find that, while these filters may be effective for single modality detection, they fail to work against our jailbreak. We also investigate the underlying reason for such jailbreaks, from the perspective of text rendering capability and training data. Our work provides a foundation for further development towards more secure and reliable T2I models.

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[429]

T. Liu, X. Yu, W. Zhou, J. Gu and V. Tresp.
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach in aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model, and focus on training it to correct misranked preference pairs. However, recent work (Chen et al., 2024) empirically finds that DPO training rarely improves these misranked preference pairs, despite its gradient emphasizing these cases. We introduce FocalPO, a DPO variant that instead down-weighs misranked preference pairs and prioritizes enhancing the model’s understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor to dynamically scale DPO loss. Our experiments demonstrate that FocalPO surpasses DPO and its variants on popular benchmarks like Alpaca Eval 2.0 using Mistral-Base-7B and Llama-3-Instruct-8B. Additionally, we empirically reveal how FocalPO affects training on correct and incorrect sample groups, further underscoring its effectiveness.
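
The idea of a modulating factor on the DPO loss can be sketched as follows. This is an illustration under assumptions, not the paper's exact formulation: the abstract does not state the factor, so the sketch assumes it is the implied probability of a correct ranking raised to a power `gamma`. The function names, `gamma`, and the example margins are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(margin: float) -> float:
    """Standard DPO loss for one preference pair.

    margin = beta * (policy-vs-reference log-ratio of the chosen response
    minus that of the rejected response); a negative margin means the pair
    is currently misranked.
    """
    return -log_sigmoid(margin)


def focal_style_loss(margin: float, gamma: float = 1.0) -> float:
    """DPO loss scaled by an ASSUMED modulating factor p ** gamma,
    where p = sigmoid(margin) is the implied probability of a correct ranking."""
    p = math.exp(log_sigmoid(margin))
    return (p ** gamma) * dpo_loss(margin)


# Fraction of the original DPO loss that survives the scaling.
kept_misranked = focal_style_loss(-2.0) / dpo_loss(-2.0)  # misranked pair
kept_correct = focal_style_loss(2.0) / dpo_loss(2.0)      # correctly ranked pair
print(f"misranked pair keeps {kept_misranked:.2f} of its loss, "
      f"correctly ranked pair keeps {kept_correct:.2f}")
```

Under this assumed factor, a misranked pair keeps a much smaller share of its loss than a correctly ranked one, which is the down-weighing behaviour the abstract describes. Note that it reverses the emphasis of the original Focal Loss, which down-weights easy examples.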

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[428]

E. Nie, B. Shao, Z. Ding, M. Wang, H. Schmid and H. Schütze.
BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL GitHub

Abstract

Large language models (LLMs) possess extensive parametric knowledge, but this knowledge is difficult to update with new information because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods, inspired by in-context learning (ICL), have shown great promise and allow LLMs to be treated as black boxes. In the past, KE was primarily employed in English contexts, whereas the potential for cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark for evaluating cross-lingual KE on 53 diverse languages across three KE task types. We also propose a gradient-free KE method called Multilingual In-context Knowledge Editing (MIKE) and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research in cross-lingual KE.

MCML Authors

Ercong Nie

Computational Linguistics

Zifeng Ding

Database Systems and Data Mining AI Lab

Mingyang Wang

Computational Linguistics

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[427]

J. Hanselle, A. Javanmardi, T. Oberkofler, Y. Sale and E. Hüllermeier.
Conformal Prediction without Nonconformity Scores.
UAI 2025 - 41st Conference on Uncertainty in Artificial Intelligence. Rio de Janeiro, Brazil, Jul 21-25, 2025. To be published.

Abstract

Conformal prediction (CP) is an uncertainty quantification framework that allows for constructing statistically valid prediction sets. Key to the construction of these sets is the notion of nonconformity function, which assigns a real-valued score to individual data points: only those (hypothetical) data points contribute to a prediction set that sufficiently conform to the data. The point of departure of this work is the observation that CP predictions are invariant against (strictly) monotone transformations of a nonconformity function. In other words, it is only the ordering of the scores that matters, not their quantitative values. Consequently, instead of scoring individual data points, a conformal predictor only needs to be able to compare pairs of data points, deciding which of them is the more conforming one. This suggests an interesting connection between CP and preference learning, in particular learning-to-rank methods, and makes CP amenable to training data in the form of (qualitative) preferences. Elaborating on this connection, we propose methods for learning (latent) nonconformity functions from data of that kind and show their usefulness in real-world classification tasks.
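
The invariance claim can be checked on a small example. The sketch below is an illustration with made-up scores, not the paper's method: it builds a split-conformal prediction set twice, once from raw nonconformity scores and once after a strictly increasing transformation. The function names, labels, and the particular transformation are hypothetical.

```python
import math


def prediction_set(calibration_scores, candidate_scores, alpha):
    """Labels whose nonconformity score does not exceed the conformal threshold."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    threshold = math.inf if rank > n else sorted(calibration_scores)[rank - 1]
    return {label for label, s in candidate_scores.items() if s <= threshold}


def transform(score: float) -> float:
    """An arbitrary strictly increasing map; it changes values but not order."""
    return math.exp(3.0 * score) - 7.0


# Made-up nonconformity scores for calibration points and candidate labels.
calibration_scores = [0.05, 0.40, 0.15, 0.70, 0.30, 0.55, 0.20, 0.90, 0.10, 0.35]
candidates = {"cat": 0.25, "dog": 0.60, "bird": 0.95}

original = prediction_set(calibration_scores, candidates, alpha=0.2)
transformed = prediction_set(
    [transform(s) for s in calibration_scores],
    {label: transform(s) for label, s in candidates.items()},
    alpha=0.2,
)
print(sorted(original), sorted(transformed))
```

Both calls return the same set, since membership depends only on how a candidate's score ranks among the calibration scores. That is why pairwise comparisons can stand in for numeric scores.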

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Tobias Oberkofler

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[426]

F. Kiwitt, B. Tahmasebi and S. Jegelka.
Symmetries in Weight Space Learning: To Retain or Remove?
HiLD @ICML 2025 - Workshop on High-dimensional Learning Dynamics at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL

Abstract

Weight space learning, an emerging paradigm that seeks to understand neural networks through their space of parameters (weights), has shown promise in a variety of applications, including but not limited to predicting model behavior and addressing privacy concerns. However, weight spaces often exhibit inherent symmetries that impact both theory and practice, such as the scale and rotational invariances found in the Low-Rank Adaptation (LoRA) method, which is the state-of-the-art fine-tuning algorithm for Large Language Models (LLMs). In this work, we investigate a general weight space learning problem under symmetries, focusing on a fundamental question: What is the appropriate formulation for this problem in the presence of symmetries (such as those in LoRA), and should redundant representations that encode the same end-to-end function be removed? We address this question by fully characterizing a new space of symmetric weights, demonstrating that the relevance of redundancy depends on the function being predicted. Specifically, we show that end-to-end symmetries (such as those in LoRA) should not always be removed, as doing so may compromise the universality of the weight space learning problem. To our knowledge, this is the first time this phenomenon has been formally identified and presented, yielding insights into a broad class of weight space learning problems.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[425]

W. Durani, T. Nitzl, C. Plant and C. Böhm.
Weakly Supervised Anomaly Detection via Dual-Tailed Kernel.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL

Abstract

Detecting anomalies with limited supervision is challenging due to the scarcity of labeled anomalies, which often fail to capture the diversity of abnormal behaviors. We propose Weakly Supervised Anomaly Detection via Dual-Tailed Kernel (WSAD-DT), a novel framework that learns robust latent representations to distinctly separate anomalies from normal samples under weak supervision. WSAD-DT introduces two centroids, one for normal samples and one for anomalies, and leverages a dual-tailed kernel scheme: a light-tailed kernel to compactly model in-class points and a heavy-tailed kernel to maintain a wider margin against out-of-class instances. To preserve intra-class diversity, WSAD-DT incorporates kernel-based regularization, encouraging richer representations within each class. Furthermore, we devise an ensemble strategy that partitions unlabeled data into diverse subsets, while sharing the limited labeled anomalies among these partitions to maximize their impact. Empirically, WSAD-DT achieves state-of-the-art performance on several challenging anomaly detection benchmarks, outperforming leading ensemble-based methods such as XGBOD.
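
The difference between light and heavy tails can be illustrated with two standard kernels. The abstract does not specify which kernels WSAD-DT uses, so the Gaussian and Student-t style kernels below are assumed stand-ins, and the function names and bandwidth parameter are hypothetical.

```python
import math


def light_tailed_kernel(distance: float, bandwidth: float = 1.0) -> float:
    """Gaussian kernel: similarity decays exponentially in squared distance."""
    return math.exp(-((distance / bandwidth) ** 2))


def heavy_tailed_kernel(distance: float, bandwidth: float = 1.0) -> float:
    """Student-t style kernel: similarity decays only polynomially."""
    return 1.0 / (1.0 + (distance / bandwidth) ** 2)


# Similarity of a point to a centroid at increasing latent-space distances.
for d in (0.0, 0.5, 1.0, 3.0, 5.0):
    print(f"d={d:>3}: light={light_tailed_kernel(d):.4f}  heavy={heavy_tailed_kernel(d):.4f}")
```

The light-tailed similarity vanishes quickly, so only points close to their own centroid score highly, which favours compact in-class clusters. The heavy-tailed similarity stays noticeable at large distances, so a loss that penalises similarity to the opposite centroid still sends a signal for far-away points, consistent with the wider margin described in the abstract.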

MCML Authors

Walid Durani

Database Systems and Data Mining AI Lab

[424]

X. Feng, Z. Jiang, T. Kaufmann, E. Hüllermeier, P. Weng and Y. Zhu.
Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL

Abstract

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. However, traditional methods based on pairwise trajectory comparisons face notable challenges, including the difficulty in comparing trajectories with subtle differences and the limitation of conveying only ordinal information, limiting direct inference of preference strength. In this paper, we introduce a novel distinguishability query, allowing humans to express preference strength by comparing two pairs of trajectories. Labelers first indicate which pair is easier to compare, then provide preference feedback only on the easier pair. Our proposed query type directly captures preference strength and is expected to reduce the cognitive load on the labeler. We further connect this query to cardinal utility and difference relations and develop an efficient query selection scheme to achieve a better trade-off between query informativeness and easiness. Experimental results demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness in RLHF benchmarks, particularly in classical control settings where preference strength is critical for expected utility maximization.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[423]

R. Sharma, S. Mukherjee, A. Šipka, E. Hüllermeier, S. Vollmer, S. Redyuk and D. A. Selby.
X-Hacking: The Threat of Misguided AutoML.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL

Abstract

Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how easily an automated machine learning pipeline can be adapted to exploit model multiplicity at scale: searching a set of ‘defensible’ models with similar predictive performance to find a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for features susceptible to it, versus random sampling. We show that the vulnerability of a dataset to X-hacking can be determined by information redundancy among features. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[422]

J. Schweisthal, D. Frauen, M. Schröder, K. Heß, N. Kilbertus and S. Feuerriegel.
Learning Representations of Instruments for Partial Identification of Treatment Effects.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv

Abstract

Reliable estimation of treatment effects from observational data is important in many disciplines such as medicine. However, estimation is challenging when unconfoundedness as a standard assumption in the causal inference literature is violated. In this work, we leverage arbitrary (potentially high-dimensional) instruments to estimate bounds on the conditional average treatment effect (CATE). Our contributions are three-fold: (1) We propose a novel approach for partial identification through a mapping of instruments to a discrete representation space so that we yield valid bounds on the CATE. This is crucial for reliable decision-making in real-world applications. (2) We derive a two-step procedure that learns tight bounds using a tailored neural partitioning of the latent instrument space. As a result, we avoid instability issues due to numerical approximations or adversarial training. Furthermore, our procedure aims to reduce the estimation variance in finite-sample settings to yield more reliable estimates. (3) We show theoretically that our procedure obtains valid bounds while reducing estimation variance. We further perform extensive experiments to demonstrate the effectiveness across various settings. Overall, our procedure offers a novel path for practitioners to make use of potentially high-dimensional instruments (e.g., as in Mendelian randomization).

MCML Authors

Jonas Schweisthal

Artificial Intelligence in Management

Dennis Frauen

Artificial Intelligence in Management

Maresa Schröder

Artificial Intelligence in Management

Konstantin Heß

Artificial Intelligence in Management

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

[421]

A. Soleymani, B. Tahmasebi, S. Jegelka and P. Jaillet.
Learning with Exact Invariances in Polynomial Time.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv

Abstract

We study the statistical-computational trade-offs for learning with exact invariances (or symmetries) using kernel regression. Traditional methods, such as data augmentation, group averaging, canonicalization, and frame-averaging, either fail to provide a polynomial-time solution or are not applicable in the kernel setting. However, with oracle access to the geometric properties of the input space, we propose a polynomial-time algorithm that learns a classifier with exact invariances. Moreover, our approach achieves the same excess population risk (or generalization error) as the original kernel regression problem. To the best of our knowledge, this is the first polynomial-time algorithm to achieve exact (not approximate) invariances in this context. Our proof leverages tools from differential geometry, spectral theory, and optimization. A key result in our development is a new reformulation of the problem of learning under invariances as optimizing an infinite number of linearly constrained convex quadratic programs, which may be of independent interest.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[420]

J. Zausinger, L. Pennig, A. Kozina, S. Sdahl, J. Sikora, A. Dendorfer, T. Kuznetsov, M. Hagog, N. Wiedemann, K. Chlodny, V. Limbach, A. Ketteler, T. Prein, V. M. Singh, M. M. Danziger and J. Born.
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL GitHub

Abstract

While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially arithmetic. One fundamental limitation is the nature of the Cross Entropy loss, which assumes a nominal scale and thus cannot convey proximity between generated number tokens. In response, we here present a regression-like loss that operates purely on token level. Our proposed Number Token Loss (NTL) comes in two flavors and minimizes either the norm or the Wasserstein distance between the numerical values of the real and predicted number tokens. NTL can easily be added to any language model and extend the Cross Entropy objective during training without runtime overhead. We evaluate the proposed scheme on various mathematical datasets and find that it consistently improves performance in math-related tasks. In a direct comparison on a regression task, we find that NTL can match the performance of a regression head, despite operating on token level. Finally, we scale NTL up to 3B parameter models and observe improved performance, demonstrating its potential for seamless integration into LLMs. We hope that this work can inspire LLM developers to improve their pretraining objectives.
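The contrast drawn in the abstract between a nominal loss and a regression-like loss can be illustrated with a small sketch. It assumes a single digit position with ten digit tokens (0 to 9) and uses a squared error between the probability-weighted expected digit and the true digit. The function names, the toy logits, and the choice of squared error are assumptions made for illustration; this is not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_digit):
    """Standard cross entropy: only the probability of the true token matters."""
    return -math.log(softmax(logits)[target_digit])

def number_token_mse(logits, target_digit):
    """Squared error between the expected digit value and the true digit.

    Hypothetical illustration: index i of `logits` is assumed to be
    the token for digit i.
    """
    probs = softmax(logits)
    expected_value = sum(p * digit for digit, p in enumerate(probs))
    return (expected_value - target_digit) ** 2

def peaked_logits(digit, height=5.0):
    """Toy logits that put most probability mass on one digit."""
    return [height if d == digit else 0.0 for d in range(10)]

target = 3
near_miss = peaked_logits(4)  # model is confident in 4 (close to 3)
far_miss = peaked_logits(9)   # model is confident in 9 (far from 3)

# Cross entropy cannot tell the two mistakes apart ...
print(cross_entropy(near_miss, target), cross_entropy(far_miss, target))
# ... while the regression-like term penalises the far miss more.
print(number_token_mse(near_miss, target), number_token_mse(far_miss, target))
```

In this toy setting both wrong predictions receive the same cross entropy, whereas the regression-like term grows with the numerical distance from the true digit.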

MCML Authors

Lars Pennig

Ethics in Systems Design and Machine Learning

[419]

L. Xu, M. Sarkar, A. I. Lonappan, Í. Zubeldia, P. Villanueva-Domingo, S. Casas, C. Fidler, C. Amancharla, U. Tiwari, A. Bayer, C. A. Ekioui, M. Cranmer, A. Dimitrov, J. Fergusson, K. Gandhi, S. Krippendorf, A. Laverick, J. Lesgourgues, A. Lewis, T. Meier, B. Sherwin, K. Surrao, F. Villaescusa-Navarro, C. Wang, X. Xu and B. Bolliet.
Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery.
ML4Astro @ICML 2025 - Machine Learning for Astrophysics at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv

Abstract

We present a multi-agent system for automation of scientific research tasks, cmbagent. The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD-level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.

MCML Authors

Thomas Meier

Dr.

[418]

Z. Li, X. Han, Y. Li, N. Strauß and M. Schubert.
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions.
WM @ICML 2025 - Workshop on Building Physically Plausible World Models at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published.

Abstract

Diffusion-based world models have demonstrated strong capabilities in synthesizing realistic long-horizon trajectories for offline reinforcement learning (RL). However, many existing methods do not directly generate actions alongside states and rewards, limiting their compatibility with standard value-based offline RL algorithms that rely on one-step temporal difference (TD) learning. While prior work has explored joint modeling of states, rewards, and actions to address this issue, such formulations often lead to increased training complexity and reduced performance in practice. Therefore, in this paper, we propose a diffusion-based world model that generates state-reward trajectories conditioned on the current state, action, and return-to-go value, and efficiently infers missing actions via an inverse dynamics model (IDM). This modular design produces complete synthetic transitions suitable for one-step TD-based offline RL, enabling effective and computationally efficient training. Empirically, we show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories, consistently outperforming prior diffusion-based baselines across multiple tasks in the D4RL benchmark.

MCML Authors

Zongyue Li

Spatial Artificial Intelligence

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[417]

P. Kolpaczki, T. Nielen and E. Hüllermeier.
Antithetic Sampling for Top-k Shapley Identification.
xAI 2025 - 3rd World Conference on Explainable Artificial Intelligence. Istanbul, Turkey, Jul 09-11, 2025. Preprint. arXiv

Abstract

Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value by viewing features as cooperating players. The Shapley value’s popularity in and outside of explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits practicability. Most works investigate the uniform approximation of all features’ Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the k most important features can already be sufficiently insightful and yields the potential to leverage algorithmic opportunities connected to the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-k identification problem utilizing a new sampling scheme taking advantage of correlated observations. We conduct experiments to showcase the efficacy of our method compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-k identification and vice versa.
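To make the top-k identification problem concrete, the sketch below computes exact Shapley values for a three-player toy game by enumerating all player orderings and then selects the k largest. This brute-force enumeration is exactly the cost that sampling methods try to avoid; it is not CMCS. The toy value function, the player names, and the helper names are assumptions made for illustration.

```python
from itertools import permutations

def exact_shapley(players, value):
    """Average each player's marginal contribution over all orderings.

    Cost grows factorially with the number of players, so this is
    only feasible for very small games.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: t / len(orderings) for p, t in totals.items()}

def top_k(shapley_values, k):
    """Return the k players with the largest Shapley values."""
    return sorted(shapley_values, key=shapley_values.get, reverse=True)[:k]

# Hypothetical toy game: additive weights plus a synergy bonus for {a, b}.
WEIGHTS = {"a": 3.0, "b": 1.0, "c": 0.5}

def toy_value(coalition):
    base = sum(WEIGHTS[p] for p in coalition)
    bonus = 2.0 if {"a", "b"} <= coalition else 0.0
    return base + bonus

values = exact_shapley(list(WEIGHTS), toy_value)
print(values)            # the bonus is split equally between a and b
print(top_k(values, 2))  # ['a', 'b']
```

For top-k identification only the ranking near the k-th position matters, which is why spending samples on clearly unimportant players (here, `c`) is wasteful.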

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[416]

P. Knab, S. Marton, U. Schlegel and C. Bartelt.
Which LIME should I trust? Concepts, Challenges, and Solutions.
xAI 2025 - 3rd World Conference on Explainable Artificial Intelligence. Istanbul, Turkey, Jul 09-11, 2025. To be published. Preprint available. arXiv GitHub

Abstract

As neural networks become dominant in essential systems, Explainable Artificial Intelligence (XAI) plays a crucial role in fostering trust and detecting potential misbehavior of opaque models. LIME (Local Interpretable Model-agnostic Explanations) is among the most prominent model-agnostic approaches, generating explanations by approximating the behavior of black-box models around specific instances. Despite its popularity, LIME faces challenges related to fidelity, stability, and applicability to domain-specific problems. Numerous adaptations and enhancements have been proposed to address these issues, but the growing number of developments can be overwhelming, complicating efforts to navigate LIME-related research. To the best of our knowledge, this is the first survey to comprehensively explore and collect LIME’s foundational concepts and known limitations. We categorize and compare its various enhancements, offering a structured taxonomy based on intermediate steps and key issues. Our analysis provides a holistic overview of advancements in LIME, guiding future research and helping practitioners identify suitable approaches. Additionally, we provide a continuously updated interactive website, offering a concise and accessible overview of the survey.

MCML Authors

Udo Schlegel

Database Systems and Data Mining AI Lab

[415]

Y. Sale and A. Ramdas.
Online Selective Conformal Prediction: Errors and Solutions.
Transactions on Machine Learning Research (Jul. 2025). Preprint. URL

Abstract

In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. In this paper, we evaluate existing calibration selection strategies and pinpoint some fundamental errors in the associated claims that guarantee selection-conditional coverage and control of the false coverage rate (FCR). To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining tradeoffs between valid methods.
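For orientation, the following sketch shows the standard split conformal interval that the exchangeability argument in the abstract refers to. It does not implement any of the selection-aware calibration strategies discussed in the paper; it only shows the baseline construction whose validity depends on the calibration data and the test point being exchangeable. The function names and the toy residuals are assumptions made for illustration.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the interval must be unbounded to keep
    the coverage guarantee, so infinity is returned.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]

def prediction_interval(point_prediction, calibration_residuals, alpha):
    """Symmetric interval using absolute residuals as nonconformity scores."""
    scores = [abs(r) for r in calibration_residuals]
    q = conformal_quantile(scores, alpha)
    return point_prediction - q, point_prediction + q

# Hypothetical calibration residuals (true value minus model prediction).
residuals = [-0.3, 0.1, 0.5, -0.2, 0.4, -0.6, 0.9, 0.7, -0.8]

print(prediction_interval(2.0, residuals, alpha=0.5))  # (1.5, 2.5)
```

If an online selection rule changes which test points receive intervals, the calibration scores used here may no longer be exchangeable with the selected test point, which is the failure mode the abstract addresses.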

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

[414]

M. F. Dasdelen, H. Lim, M. Buck, K. S. Götze, C. Marr and S. Schneider.
CytoSAE: Interpretable Cell Embeddings for Hematology.
Preprint (Jul. 2025). arXiv GitHub

Abstract

Sparse autoencoders (SAEs) emerged as a promising tool for mechanistic interpretability of transformer-based foundation models. Very recently, SAEs were also adopted for the visual domain, enabling the discovery of visual concepts and their patch-wise attribution to tokens in the transformer model. While a growing number of foundation models emerged for medical imaging, tools for explaining their inferences are still lacking. In this work, we show the applicability of SAEs for hematology. We propose CytoSAE, a sparse autoencoder which is trained on over 40,000 peripheral blood single-cell images. CytoSAE generalizes to diverse and out-of-domain datasets, including bone marrow cytology, where it identifies morphologically relevant concepts which we validated with medical experts. Furthermore, we demonstrate scenarios in which CytoSAE can generate patient-specific and disease-specific concepts, enabling the detection of pathognomonic cells and localized cellular abnormalities at the patch level. We quantified the effect of concepts on a patient-level AML subtype classification task and show that CytoSAE concepts reach performance comparable to the state-of-the-art, while offering explainability on the sub-cellular level.

MCML Authors

Steffen Schneider

Dr.

Dynamical Inference

[413]

S. Haas and E. Hüllermeier.
Aleatoric and Epistemic Uncertainty Measures for Ordinal Classification through Binary Reduction.
Preprint (Jul. 2025). arXiv

Abstract

Ordinal classification problems, where labels exhibit a natural order, are prevalent in high-stakes fields such as medicine and finance. Accurate uncertainty quantification, including the decomposition into aleatoric (inherent variability) and epistemic (lack of knowledge) components, is crucial for reliable decision-making. However, existing research has primarily focused on nominal classification and regression. In this paper, we introduce a novel class of measures of aleatoric and epistemic uncertainty in ordinal classification, which is based on a suitable reduction to (entropy- and variance-based) measures for the binary case. These measures effectively capture the trade-off in ordinal classification between exact hit-rate and minimal error distances. We demonstrate the effectiveness of our approach on various tabular ordinal benchmark datasets using ensembles of gradient-boosted trees and multi-layer perceptrons for approximate Bayesian inference. Our method significantly outperforms standard and label-wise entropy and variance-based measures in error detection, as indicated by misclassification rates and mean absolute error. Additionally, the ordinal measures show competitive performance in out-of-distribution (OOD) detection. Our findings highlight the importance of considering the ordinal nature of classification problems when assessing uncertainty.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[412]

T. Meier and K. Khutsishvili.
Who Owns the Future? Ways to Understand Power, Technology, and the Moral Commons.
Preprint (Jul. 2025). URL

Abstract

The ascent of tech billionaires—and, depending on the market, soon trillionaires—signals more than a shift in global economic structures; it marks a transformation in the moral and cultural conditions under which democratic life is sustained. This contribution offers a communitarian critique of Big Tech’s influence, grounded in the philosophical frameworks of Charles Taylor, Michael Sandel, and virtue ethicist Shannon Vallor, and further supported by public goods theory and economic insights from Paul Samuelson and Joseph Stiglitz, with Elinor Ostrom’s work emphasizing the civic importance of collective stewardship. It contends that the challenge to democracy posed by concentrated digital power is not merely institutional, economic, or ethical, but a disruption of the very conditions for democratic citizenship.

MCML Authors

Thomas Meier

Dr.

[411]

H. Chen, H. Li, Y. Zhang, G. Zhang, J. Bi, P. Torr, J. Gu, D. Krompass and V. Tresp.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.
CVPR 2025 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. To be published. URL

Abstract

One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional FL. Despite these promising prospects, existing methods face challenges due to client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDM) have shown remarkable advancements in synthesizing high-quality images through pretraining on large-scale datasets, thereby presenting a potential solution to overcome these issues. However, directly applying pretrained LDM to heterogeneous OSFL results in significant distribution shifts in synthetic data, leading to performance degradation in classification models trained on such data. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in LDM’s pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both instance-level and concept-level. Hereby, FedBiP synthesizes images following the client’s local data distribution without compromising the privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. Our method is validated through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Hang Li

* Former Member

Yao Zhang

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[410]

T. Hannan, M. M. Islam, J. Gu, T. Seidl and G. Bertasius.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos.
CVPR 2025 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. To be published. URL GitHub

Abstract

Large language models (LLMs) excel at retrieving information from lengthy text, but their vision-language counterparts (VLMs) face difficulties with hour-long videos, especially for temporal grounding. Specifically, these VLMs are constrained by frame limitations, often losing essential temporal details needed for accurate event localization in extended video content. We propose ReVisionLLM, a recursive vision-language model designed to locate events in hour-long videos. Inspired by human search strategies, our model initially targets broad segments of interest, progressively revising its focus to pinpoint exact temporal boundaries. Our model can seamlessly handle videos of vastly different lengths, from minutes to hours. We also introduce a hierarchical training strategy that starts with short clips to capture distinct events and progressively extends to longer videos. To our knowledge, ReVisionLLM is the first VLM capable of temporal grounding in hour-long videos, outperforming previous state-of-the-art methods across multiple datasets by a significant margin (+2.6% R1@0.1 on MAD).

MCML Authors

Tanveer Hannan

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[409]

G. Zhang, M. L. A. Fok, J. Ma, Y. Xia, D. Cremers, P. Torr, V. Tresp and J. Gu.
Localizing Events in Videos with Multimodal Queries.
CVPR 2025 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. To be published. URL

Abstract

Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current research is that semantic queries are typically in natural language that depicts the semantics of the target event. This setting overlooks the potential for multimodal semantic queries composed of images and texts. To address this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, along with a new evaluation dataset ICQ-Highlight. Our new benchmark aims to evaluate how well models can localize an event given a multimodal semantic query that consists of a reference image, which depicts the event, and a refinement text to adjust the images’ semantics. To systematically benchmark model performance, we include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains. We propose 3 adaptation methods that tailor existing models to our new setting and evaluate 10 SOTA models, ranging from specialized to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Yan Xia

Dr.

* Former Member

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[408]

T. Liu, Z. Lai, J. Wang, G. Zhang, S. Chen, P. Torr, V. Demberg, V. Tresp and J. Gu.
Multimodal Pragmatic Jailbreak on Text-to-image Models.
ReGenAI @CVPR 2025 - 2nd Workshop on Responsible Generative AI at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025). Nashville, TN, USA, Jun 11-15, 2025. Best Paper Award. URL GitHub

Abstract

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[407]

M. Aljoud, G. M. Tavares, C. Leiber and T. Seidl.
DCMatch - Identify Matching Architectures in Deep Clustering through Meta-Learning.
PAKDD 2025 - 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Sydney, Australia, Jun 10-13, 2025. To be published.

Abstract

The effectiveness of deep clustering algorithms like Deep Embedded Clustering (DEC) is heavily influenced by the architecture of the neural network employed. However, selecting an optimal architecture is challenging due to the absence of labels in clustering tasks, which makes traditional Neural Architecture Search (NAS) methods unsuitable. To address this, we propose a novel dataset characterization method specifically tailored for image datasets, combining deep-learning-based and statistical feature extraction techniques. By utilizing features extracted from a small subset of images, our method effectively captures both high-level semantic and low-level statistical properties of the data. These dataset characteristics are then employed in a meta-learning framework to recommend autoencoder architectures likely to outperform default configurations. Extensive experiments on 20 image datasets validate the robustness of our approach, achieving improved clustering performance on 16 datasets compared to the baseline configuration.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

Collin Leiber

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[406]

M. Ahmadpanah, M. Gobbi, D. Hedin, J. Kinder and A. Sabelfeld.
CodeX: Contextual Flow Tracking for Browser Extensions.
CODASPY 2025 - 15th ACM Conference on Data and Application Security and Privacy. Pittsburgh, PA, USA, Jun 04-06, 2025. DOI

Abstract

Browser extensions put millions of users at risk when misusing their elevated privileges. Despite the current practices of semi-automated code vetting, privacy-violating extensions still thrive in the official stores. We propose an approach for tracking contextual flows from browser-specific sensitive sources like cookies, browsing history, bookmarks, and search terms to suspicious network sinks through network requests. We demonstrate the effectiveness of the approach by a prototype called CodeX that leverages the power of CodeQL while breaking away from the conservativeness of bug-finding flavors of the traditional CodeQL taint analysis. Applying CodeX to the extensions published on the Chrome Web Store between March 2021 and March 2024 identified 1,588 extensions with risky flows. Manual verification of 339 of those extensions resulted in flagging 212 as privacy-violating, impacting up to 3.6M users.

MCML Authors

Johannes Kinder

Prof. Dr.

Programming Languages and Artificial Intelligence

[405]

V. Margraf, T. Koerner, A. Tornede and M. Wever.
RunAndSchedule2Survive: Algorithm Scheduling Based on Run2Survive.
ACM Transactions on Evolutionary Learning and Optimization Just accepted (Jun. 2025). DOI

Abstract

The algorithm selection problem aims to identify the most suitable algorithm for a given problem instance under specific time constraints, where suitability typically refers to a performance metric such as algorithm runtime. While previous work has employed machine learning techniques to tackle this challenge, methods from survival analysis have proven particularly effective. This paper presents RunAndSchedule2Survive to address the more general and complex problem of algorithm scheduling, where the objective is to allocate computational resources across multiple algorithms to maximize performance within specified time constraints. Our approach combines survival analysis with evolutionary algorithms to optimize algorithm schedules by leveraging runtime distributions modeled as survival functions. Experimental results across various standard benchmarks demonstrate that our approach significantly outperforms previous methods for algorithm scheduling and yields more robust results than its algorithm selection variant. More specifically, RunAndSchedule2Survive achieves superior performance in 20 out of 25 benchmark scenarios, surpassing hitherto state-of-the-art approaches.

MCML Authors

Valentin Margraf

Artificial Intelligence and Machine Learning

[404]

P. Gupta, M. Wever and E. Hüllermeier.
Information Leakage Detection through Approximate Bayes-optimal Prediction.
Information Sciences In Press, Journal Pre-proof.122419 (Jun. 2025). DOI

Abstract

In today’s data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches, which rely on estimating mutual information (MI) between observable and secret information for detecting ILs, face challenges of the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detect ILs are limited to binary system sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor’s log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method outperforms state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.

MCML Authors

Marcel Wever

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[403]

Abstract

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[402]

A. Bergmeister, M. K. Lal, S. Jegelka and S. Sra.
A projection-based framework for gradient-free and parallel learning.
Preprint (Jun. 2025). arXiv

Abstract

MCML Authors

Andreas Bergmeister

Foundations of Deep Neural Networks

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

Suvrit Sra

Prof. Dr.

Resource Aware Machine Learning

[401]

C. Casolo, S. Becker and N. Kilbertus.
Identifiability Challenges in Sparse Linear Ordinary Differential Equations.
Preprint (Jun. 2025). arXiv

Abstract

Dynamical systems modeling is a core pillar of scientific inquiry across natural and life sciences. Increasingly, dynamical system models are learned from data, rendering identifiability a paramount concept. For systems that are not identifiable from data, no guarantees can be given about their behavior under new conditions and inputs, or about possible control mechanisms to steer the system. It is known in the community that ’linear ordinary differential equations (ODEs) are almost surely identifiable from a single trajectory.’ However, this only holds for dense matrices. The sparse regime remains underexplored, despite its practical relevance with sparsity arising naturally in many biological, social, and physical systems. In this work, we address this gap by characterizing the identifiability of sparse linear ODEs. Contrary to the dense case, we show that sparse systems are unidentifiable with a positive probability in practically relevant sparsity regimes and provide lower bounds for this probability. We further study empirically how this theoretical unidentifiability manifests in state-of-the-art methods to estimate linear ODEs from data. Our results corroborate that sparse systems are also practically unidentifiable. Theoretical limitations are not resolved through inductive biases or optimization dynamics. Our findings call for rethinking what can be expected from data-driven dynamical system modeling and allow for quantitative assessments of how much to trust a learned linear ODE.

MCML Authors

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Sören Becker

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[400]

E. Santos Escriche and S. Jegelka.
Learning equivariant models by discovering symmetries with learnable augmentations.
Preprint (Jun. 2025). arXiv

Abstract

Recently, a trend has emerged that favors learning relevant symmetries from data in geometric domains instead of designing constrained architectures. To do so, two popular options are (1) to modify the training protocol, e.g., with a specific loss and data augmentations (soft equivariance), or (2) to ignore equivariance and infer it only implicitly. However, both options have limitations: soft equivariance requires a priori knowledge about relevant symmetries, while inferring symmetries merely via the task and larger data lacks interpretability. To address both limitations, we propose SEMoLA, an end-to-end approach that jointly (1) discovers a priori unknown symmetries in the data via learnable data augmentations, and (2) softly encodes the respective approximate equivariance into an arbitrary unconstrained model. Hence, it does not need prior knowledge about symmetries, it offers interpretability, and it maintains robustness to distribution shifts. Empirically, we demonstrate the ability of SEMoLA to robustly discover relevant symmetries while achieving high prediction accuracy across various datasets, encompassing multiple data modalities and underlying symmetry groups.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[399]

X. Ma, C. Lin, Y. Zhang, V. Tresp and Y. Ma.
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation.
Preprint (Jun. 2025). arXiv

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative ’team’ focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase: Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[398]

Z. S. Taghavi, A. Modarressi, Y. Ma and H. Schütze.
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge.
Preprint (Jun. 2025). arXiv GitHub

Abstract

Retrieval systems are central to many NLP pipelines, but often rely on surface-level cues such as keyword overlap and lexical semantic similarity. To evaluate retrieval beyond these shallow signals, recent benchmarks introduce reasoning-heavy queries; however, they primarily shift the burden to query-side processing techniques – like prompting or multi-hop retrieval – that can help resolve complexity. In contrast, we present ImpliRet, a benchmark that shifts the reasoning challenge to document-side processing: The queries are simple, but relevance depends on facts stated implicitly in documents through temporal (e.g., resolving ’two days ago’), arithmetic, and world knowledge relationships. We evaluate a range of sparse and dense retrievers, all of which struggle in this setting: the best nDCG@10 is only 15.07%. We also test whether long-context models can overcome this limitation. But even with a short context of only ten documents, including the positive document, GPT-4.1 scores only 35.06%, showing that document-side reasoning remains a challenge.

MCML Authors

Zeinab Sadat Taghavi

Computational Linguistics

Ali Modarressi

Computational Linguistics

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[397]

A. Wang, D. Shu, Y. Wang, Y. Ma and M. Du.
Improving LLM Reasoning through Interpretable Role-Playing Steering.
Preprint (Jun. 2025). arXiv

Abstract

Role-playing has emerged as an effective technique for enhancing the reasoning capabilities of large language models (LLMs). However, existing methods primarily rely on prompt engineering, which often lacks stability and interpretability. In this paper, we introduce Sparse Autoencoder Role-Playing Steering (SRPS), a novel framework that identifies and manipulates internal model features associated with role-playing behavior. Our approach extracts latent representations from role-play prompts, selects the most relevant features based on activation patterns, and constructs a steering vector that can be injected into the model’s residual stream with controllable intensity. Our method enables fine-grained control over role-specific behavior and offers insights into how role information influences internal model activations. Extensive experiments across various reasoning benchmarks and model sizes demonstrate consistent performance gains. Notably, in the zero-shot chain-of-thought (CoT) setting, the accuracy of Llama3.1-8B on CSQA improves from 31.86% to 39.80%, while Gemma2-9B on SVAMP increases from 37.50% to 45.10%. These results highlight the potential of SRPS to enhance reasoning ability in LLMs, providing better interpretability and stability compared to traditional prompt-based role-playing.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[396]

M. Wang, S. Chen, K. Kersting, V. Tresp and Y. Ma.
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding.
Preprint (Jun. 2025). arXiv

Abstract

Recent advances in Video Large Language Models (VLLMs) have significantly enhanced their ability to understand video content. Nonetheless, processing long videos remains challenging due to high computational demands and the redundancy present in the visual data. In this work, we propose METok, a training-free, Multi-stage Event-based Token compression framework designed to accelerate VLLMs’ inference while preserving accuracy. METok progressively eliminates redundant visual tokens across three critical stages: (1) event-aware compression during vision encoding, (2) hierarchical token pruning in the prefilling stage based on semantic alignment and event importance, and (3) a decoding-stage KV Cache optimization that further reduces memory consumption. Our experiments on diverse video benchmarks demonstrate that METok achieves an optimal trade-off between efficiency and accuracy by dynamically selecting informative visual tokens. For instance, equipping LongVA-7B with METok realizes an 80.6% FLOPs reduction and 93.5% KV Cache memory savings, all while maintaining comparable or even superior accuracy.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[395]

Y. Wang, J. Bi, Y. Ma and S. Pirk.
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM.
Preprint (Jun. 2025). arXiv

Abstract

Multimodal Large Language Models (MLLMs) often suffer from hallucinations. They over-rely on partial cues and generate incorrect responses. Recently, methods like Visual Contrastive Decoding (VCD) and Instruction Contrastive Decoding (ICD) have been proposed to mitigate hallucinations by contrasting predictions from perturbed or negatively prefixed inputs against original outputs. In this work, we uncover that methods like VCD and ICD fundamentally influence the internal attention dynamics of the model. This observation suggests that their effectiveness may not stem merely from surface-level modifications to logits but from deeper shifts in attention distribution. Inspired by this insight, we propose an attention-steerable contrastive decoding framework that directly intervenes in the attention mechanisms of the model to offer a more principled approach to mitigating hallucinations. Our experiments across multiple MLLM architectures and diverse decoding methods demonstrate that our approach significantly reduces hallucinations and improves the performance on benchmarks such as POPE, CHAIR, and MMHal-Bench, while simultaneously enhancing performance on standard VQA benchmarks.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[394]

G. Zhang, T. Hannan, H. Kleiner, B. Aydemir, X. Xie, J. Lan, T. Seidl, V. Tresp and J. Gu.
AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction.
Preprint (Jun. 2025). arXiv

Abstract

An ideal vision-language agent serves as a bridge between the human users and their surrounding physical world in real-world applications like autonomous driving and embodied agents, and proactively provides accurate and timely responses given user intents. An intriguing challenge arises when agents interact with the world as a dynamic data stream and ad-hoc queries from users: supporting knowledge for queries, namely evidence, usually appears asynchronously with the arrival time of queries, and agents need to ground their responses in historical data, present observations, and even future streams. We frame this challenge as Query-Evidence Asynchrony, where user queries and their supporting evidence typically arrive asynchronously in the streaming setting. This setting requires not only strong reasoning capabilities but also the ability to retain past observations and respond to queries with temporal awareness. In this paper, we introduce a diagnostic benchmark that evaluates Multimodal Large Language Models (MLLMs) on their ability to handle interaction with streaming data. Further, we present AViLA, Asynchronous Video-Language Agent for streaming data interaction that can handle ad-hoc queries and give time-aware responses. For this purpose, AViLA consists of three key modules: comprehensive memory retention, evidence identification, and evidence-grounded trigger, that are designed to maintain a general-purpose memory and respond readily and timely to queries. Our experiments show that existing models often fail to respond at appropriate times, while AViLA significantly improves both accuracy and temporal awareness. Our code and dataset will be publicly available.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Tanveer Hannan

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[393]

Y. Zhang, H. Gao, H. Chen, W. Li, Y. Ma and V. Tresp.
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models.
Preprint (Jun. 2025). arXiv

Abstract

Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy the LLM on clients, reducing client-side storage by 95% and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[392]

Y. Zhang, C. Lin, S. Tang, H. Chen, S. Zhou, Y. Ma and V. Tresp.
SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence.
Preprint (Jun. 2025). arXiv GitHub

Abstract

The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent system generation.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[391]

M. Dannehl, S. Valenzuela and J. Kinder.
Which Instructions Matter the Most: A Saliency Analysis of Binary Function Embedding Models.
DLSP @SPW 2025 - 8th Deep Learning Security and Privacy Workshop co-located with the 46th IEEE Symposium on Security and Privacy (SPW 2025). San Francisco, CA, May 15, 2025. DOI

Abstract

Current deep learning models for binary code struggle with explainability, since it is often unclear which factors are important for a given output. In this paper, we apply occlusion-based saliency analysis as an explainability method to binary code embedding models. We conduct experiments on two state-of-the-art Transformer-based models that take preprocessed assembly code as input and calculate embedding vectors for each function. We show that, during training, the models learn the importance of different instructions. From the results, we observe that call instructions and the names of external call targets are important. This observation confirms the intuition that function calls significantly impact the semantics of a function and therefore should also have a large impact on its learned embedding. This motivates the need for developing model architectures that integrate stronger analysis into preprocessing to further leverage call relationships.
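Occlusion-based saliency, the explainability method named in the abstract, can be sketched generically: mask one instruction at a time and measure how far the function embedding moves. The sketch below is only an illustration under stated assumptions. The embedding function `toy_embed`, its weights, and the `<mask>` token are hypothetical stand-ins, not the Transformer models studied in the paper, and the weights are arbitrary values chosen so that the example has something to rank.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def occlusion_saliency(tokens, embed, mask="<mask>"):
    """Saliency of token i = 1 - cosine(embedding, embedding with token i masked)."""
    base = embed(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append(1.0 - cosine(base, embed(occluded)))
    return scores

# Hypothetical stand-in for an embedding model: one dimension per mnemonic,
# with arbitrary weights. Unknown tokens such as "<mask>" contribute nothing.
VOCAB = ["call", "mov", "ret"]
WEIGHTS = {"call": 3.0, "mov": 1.0, "ret": 0.5}

def toy_embed(tokens):
    return [sum(WEIGHTS[t] for t in tokens if t == name) for name in VOCAB]

tokens = ["mov", "call", "ret"]
scores = occlusion_saliency(tokens, toy_embed)
most_salient = tokens[scores.index(max(scores))]
```

With a real model, `embed` would wrap the model's forward pass, and the ranking would reflect what the model has learned rather than hand-picked weights.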

MCML Authors

Moritz Dannehl

Programming Languages and Artificial Intelligence

Samuel Valenzuela

Programming Languages and Artificial Intelligence

Johannes Kinder

Prof. Dr.

Programming Languages and Artificial Intelligence

[390]

G. Manten, C. Casolo, S. W. Mogensen and N. Kilbertus.
An Asymmetric Independence Model for Causal Discovery on Path Spaces.
CLeaR 2025 - 4th Conference on Causal Learning and Reasoning. Lausanne, Switzerland, May 07-09, 2025. To be published. Preprint available. arXiv

Abstract

We develop the theory linking ‘E-separation’ in directed mixed graphs (DMGs) with conditional independence relations among coordinate processes in stochastic differential equations (SDEs), where causal relationships are determined by ‘which variables enter the governing equation of which other variables’. We prove a global Markov property for cyclic SDEs, which naturally extends to partially observed cyclic SDEs, because our asymmetric independence model is closed under marginalization. We then characterize the class of graphs that encode the same set of independence relations, yielding a result analogous to the seminal ‘same skeleton and v-structures’ result for directed acyclic graphs (DAGs). In the fully observed case, we show that each such equivalence class of graphs has a greatest element as a parsimonious representation and develop algorithms to identify this greatest element from data. We conjecture that a greatest element also exists under partial observations, which we verify computationally for graphs with up to four nodes.

MCML Authors

Georg Manten

Ethics in Systems Design and Machine Learning

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[389]

A. Koebler, T. Decker, I. Thon, V. Tresp and F. Buettner.
Incremental Uncertainty-aware Performance Monitoring with Active Labeling Intervention.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. To be published. URL

Abstract

We study the problem of monitoring machine learning models under gradual distribution shifts, where circumstances change slowly over time, often leading to unnoticed yet significant declines in accuracy. To address this, we propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates performance changes by modeling gradual shifts using optimal transport. In addition, IUPM quantifies the uncertainty in the performance prediction and introduces an active labeling procedure to restore a reliable estimate under a limited labeling budget. Our experiments show that IUPM outperforms existing performance estimation baselines in various gradual shift scenarios and that its uncertainty awareness guides label acquisition more effectively compared to other strategies.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[388]

J. Bi, D. Yan, Y. Wang, W. Huang, H. Chen, G. Wan, M. Ye, X. Xiao, H. Schütze, V. Tresp and Y. Ma.
CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process.
Preprint (May. 2025). arXiv

Abstract

Recent Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models by learning to reason, exhibiting promising performance in solving complex tasks. LRMs solve tasks that require complex reasoning by explicitly generating reasoning trajectories together with answers. Nevertheless, judging the quality of such an output answer is not easy, because considering only the correctness of the answer is not enough: the soundness of the reasoning trajectory matters as well. Logically, if the soundness of the reasoning part is poor, even if the answer is correct, the confidence of the derived answer should be low. Existing methods do assess the overall output answer jointly by taking the reasoning part into account; however, their capability is still not satisfactory, as the causal relationship of the reasoning to the concluded answer cannot be properly reflected. In this paper, inspired by classical mechanics, we present a novel approach towards establishing a CoT-Kinetics energy equation. Specifically, our CoT-Kinetics energy equation formulates the token state transformation process, which is regulated by the LRM’s internal transformer layers, as the dynamics of a particle governed by a mechanical field. Our CoT-Kinetics energy assigns a scalar score that specifically evaluates the soundness of the reasoning phase, indicating how confident one can be in the derived answer given the evaluated reasoning. As such, the LRM’s overall output quality can be measured accurately, rather than through a coarse judgment (e.g., correct or incorrect).

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[387]

H. Chen, Y. Zhang, Y. Bi, Y. Zhang, T. Liu, J. Bi, J. Lan, J. Gu, C. Grosser, D. Krompass, N. Navab and V. Tresp.
Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs.
Preprint (May. 2025). arXiv

Abstract

In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of LLMs. In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. By using various auditing algorithms, we evaluate the effectiveness and robustness of different unlearning strategies. To explore alternatives beyond prompt-based auditing, we propose a novel technique that leverages intermediate activation perturbations, addressing the limitations of auditing methods that rely solely on model inputs and outputs.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Yuan Bi

Computer Aided Medical Procedures & Augmented Reality

Yao Zhang

Database Systems and Data Mining AI Lab

Tong Liu

Database Systems and Data Mining AI Lab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[386]

X. Guo, A. Li, Y. Wang, S. Jegelka and Y. Wang.
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning.
Preprint (May. 2025). arXiv GitHub

Abstract

Although Large Language Models (LLMs) have demonstrated remarkable progress, their proficiency in graph-related tasks remains notably limited, hindering the development of truly general-purpose models. Previous attempts, including pretraining graph foundation models or employing supervised fine-tuning, often face challenges such as the scarcity of large-scale, universally represented graph data. We introduce G1, a simple yet effective approach demonstrating that Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale LLMs’ graph reasoning abilities. To enable RL training, we curate Erdős, the largest graph reasoning dataset to date comprising 50 diverse graph-theoretic tasks of varying difficulty levels, 100k training data and 5k test data, all derived from real-world graphs. With RL on Erdős, G1 obtains substantial improvements in graph reasoning, where our finetuned 3B model even outperforms Qwen2.5-72B-Instruct (24x size). RL-trained models also show strong zero-shot generalization to unseen tasks, domains, and graph encoding schemes, including other graph-theoretic benchmarks as well as real-world node classification and link prediction tasks, without compromising general reasoning abilities. Our findings offer an efficient, scalable path for building strong graph reasoners by finetuning LLMs with RL on graph-theoretic tasks, which combines the strengths of pretrained LLM capabilities with abundant, automatically generated synthetic data, suggesting that LLMs possess graph understanding abilities that RL can elicit successfully.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[385]

P. Hofman, Y. Sale and E. Hüllermeier.
Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks.
Preprint (May. 2025). arXiv

Abstract

We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules, a specific type of loss function, into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one-loss consistently outperforms other uncertainty measures.
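For the log score, an entropy-plus-divergence decomposition yields the familiar entropy-based measures, with mutual information as the epistemic term mentioned in the abstract. The sketch below shows only this common special case for a finite ensemble of predicted class distributions. It is an assumption that this matches the paper's log-loss instantiation, and the function names and toy ensembles are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0)

def decompose(ensemble):
    """Return (total, aleatoric, epistemic) uncertainty for one input.

    ensemble: list of class-probability vectors, one per ensemble member.
    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual predictions
    epistemic = total - aleatoric (mutual information between label and member)
    """
    m = len(ensemble)
    k = len(ensemble[0])
    mean = [sum(p[j] for p in ensemble) / m for j in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in ensemble) / m
    return total, aleatoric, total - aleatoric

# Members agree on a coin flip: all uncertainty is aleatoric.
tu_a, au_a, eu_a = decompose([[0.5, 0.5], [0.5, 0.5]])
# Members are each certain but contradict each other: all uncertainty is epistemic.
tu_d, au_d, eu_d = decompose([[1.0, 0.0], [0.0, 1.0]])
```

Swapping `entropy` for the generalized entropy of another proper scoring rule would give a different instantiation; how the paper defines those variants is not reproduced here.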

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[384]

A. Javanmardi, S. H. Zargarbashi, S. M. A. R. Thies, W. Waegeman, A. Bojchevski and E. Hüllermeier.
Optimal Conformal Prediction under Epistemic Uncertainty.
Preprint (May. 2025). arXiv

Abstract

Conformal prediction (CP) is a popular frequentist framework for representing uncertainty by providing prediction sets that guarantee coverage of the true label with a user-adjustable probability. In most applications, CP operates on confidence scores coming from a standard (first-order) probabilistic predictor (e.g., softmax outputs). Second-order predictors, such as credal set predictors or Bayesian models, are also widely used for uncertainty quantification and are known for their ability to represent both aleatoric and epistemic uncertainty. Despite their popularity, there is still an open question on “how they can be incorporated into CP”. In this paper, we discuss the desiderata for CP when valid second-order predictions are available. We then introduce Bernoulli prediction sets (BPS), which produce the smallest prediction sets that ensure conditional coverage in this setting. When given first-order predictions, BPS reduces to the well-known adaptive prediction sets (APS). Furthermore, when the validity assumption on the second-order predictions is compromised, we apply conformal risk control to obtain a marginal coverage guarantee while still accounting for epistemic uncertainty.
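As background for the first-order case, the following is a minimal sketch of split conformal prediction with adaptive prediction sets (APS) in their non-randomized form. It does not implement the paper's Bernoulli prediction sets. The calibration data, the function names, and the choice of alpha are hypothetical.

```python
import math

def aps_score(probs, label):
    """Probability mass accumulated, in descending order, up to the true label."""
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    total = 0.0
    for j in order:
        total += probs[j]
        if j == label:
            return total
    raise ValueError("label not among the classes")

def calibrate(cal_probs, cal_labels, alpha):
    """Conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    scores = sorted(aps_score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return 1.0 if rank > n else scores[rank - 1]  # too few points: include everything

def prediction_set(probs, threshold):
    """Add classes in descending probability until the threshold mass is reached."""
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    chosen, total = [], 0.0
    for j in order:
        chosen.append(j)
        total += probs[j]
        if total >= threshold:
            break
    return chosen

# Hypothetical calibration set with three classes.
cal_probs = [[0.5, 0.375, 0.125], [0.5, 0.375, 0.125],
             [0.75, 0.125, 0.125], [0.625, 0.25, 0.125]]
cal_labels = [0, 1, 0, 0]
threshold = calibrate(cal_probs, cal_labels, alpha=0.25)

ambiguous_set = prediction_set([0.5, 0.375, 0.125], threshold)   # needs two classes
confident_set = prediction_set([0.9375, 0.0625, 0.0], threshold)  # one class suffices
```

The set grows for ambiguous inputs and shrinks for confident ones, which is the adaptive behaviour that gives APS its name.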

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[383]

J. Lan, Y. Fu, U. Schlegel, G. Zhang, T. Hannan, H. Chen and T. Seidl.
My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals.
Preprint (May. 2025). arXiv

Abstract

Social bias is a critical issue in large vision-language models (VLMs), where fairness- and ethics-related problems harm certain groups of people in society. It is unknown to what extent VLMs yield social bias in generative responses. In this study, we focus on evaluating and mitigating social bias on both the model’s response and probability distribution. To do so, we first evaluate four state-of-the-art VLMs on the PAIRS and SocialCounterfactuals datasets with the multiple-choice selection task. Surprisingly, we find that models suffer from generating gender-biased or race-biased responses. We also observe that models are prone to stating that their responses are fair while in fact having miscalibrated confidence levels towards particular social groups. While investigating why VLMs are unfair in this study, we observe that VLMs’ hidden layers exhibit substantial fluctuations in fairness levels. Meanwhile, residuals in each layer show mixed effects on fairness, with some contributing positively while others lead to increased bias. Based on these findings, we propose a post-hoc method for the inference stage to mitigate social bias, which is training-free and model-agnostic. We achieve this by ablating bias-associated residuals while amplifying fairness-associated residuals on model hidden layers during inference. We demonstrate that our post-hoc method outperforms the competing training strategies, helping VLMs have fairer responses and more reliable confidence levels.

MCML Authors

Udo Schlegel

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Tanveer Hannan

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[382]

T. Löhr, P. Hofman, F. Mohr and E. Hüllermeier.
Credal Prediction based on Relative Likelihood.
Preprint (May 2025). arXiv

Abstract

Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner’s epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.
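The relative-likelihood criterion described in the abstract can be sketched for a finite list of already-fitted models. This is a minimal illustration: the function names, the log-likelihood values, and the predictions are all hypothetical, and the paper's approximation via modified ensemble learning is not reproduced here.

```python
import math


def plausible_models(log_likelihoods, alpha):
    """Indices of models whose relative likelihood L(m) / max L is at least alpha."""
    best = max(log_likelihoods)
    return [m for m, ll in enumerate(log_likelihoods) if math.exp(ll - best) >= alpha]


def probability_bounds(predictions, members, label):
    """Lower and upper probability of `label` over the plausible models' predictions."""
    probs = [predictions[m][label] for m in members]
    return min(probs), max(probs)


# Hypothetical training log-likelihoods of three candidate models.
log_likelihoods = [-10.0, -10.5, -13.0]
# Hypothetical class distributions predicted by each model for one query.
predictions = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]

members = plausible_models(log_likelihoods, alpha=0.5)
print(members, probability_bounds(predictions, members, label=0))
```

Lowering alpha admits more models and widens the bounds, which gives credal sets that are more likely to be correct but less precise.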

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[381]

J. Wang, P. Gupta, I. Habernal and E. Hüllermeier.
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs.
Preprint (May 2025). arXiv

Abstract

Recent studies demonstrate that Large Language Models (LLMs) are vulnerable to different prompt-based attacks, generating harmful content or sensitive information. Both closed-source and open-source LLMs are underinvestigated for these attacks. This paper studies effective prompt injection attacks against the 14 most popular open-source LLMs on five attack benchmarks. Current metrics only consider successful attacks, whereas our proposed Attack Success Probability (ASP) also captures uncertainty in the model’s response, reflecting ambiguity in attack feasibility. By comprehensively analyzing the effectiveness of prompt injection attacks, we propose a simple and effective hypnotism attack; results show that this attack causes aligned language models, including Stablelm2, Mistral, Openchat, and Vicuna, to generate objectionable behaviors, achieving around 90% ASP. They also indicate that our ignore prefix attacks can break all 14 open-source LLMs, achieving over 60% ASP on a multi-categorical dataset. We find that moderately well-known LLMs exhibit higher vulnerability to prompt injection attacks, highlighting the need to raise public awareness and prioritize efficient mitigation strategies.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[380]

M. Wang, L. Lange, H. Adel, Y. Ma, J. Strötgen and H. Schütze.
Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes.
Preprint (May 2025). arXiv

Abstract

Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps. However, language mixing, i.e., reasoning steps containing tokens from languages other than the prompt, has been observed in their outputs and shown to affect performance, though its impact remains debated. We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages, 7 task difficulty levels, and 18 subject areas, and show how all three factors influence language mixing. Moreover, we demonstrate that the choice of reasoning language significantly affects performance: forcing models to reason in Latin or Han scripts via constrained decoding notably improves accuracy. Finally, we show that the script composition of reasoning traces closely aligns with that of the model’s internal representations, indicating that language mixing reflects latent processing preferences in RLMs. Our findings provide actionable insights for optimizing multilingual reasoning and open new directions for controlling reasoning languages to build more interpretable and adaptable RLMs.

MCML Authors

Mingyang Wang

Computational Linguistics

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[379]

M. Spliethöver, T. Knebler, F. Fumagalli, M. Muschalik, B. Hammer, E. Hüllermeier and H. Wachsmuth.
Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection.
NAACL 2025 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Albuquerque, NM, USA, Apr 29-May 04, 2025. DOI

Abstract

Recent advances on instruction fine-tuning have led to the development of various prompting techniques for large language models, such as explicit reasoning steps. However, the success of techniques depends on various parameters, such as the task, language model, and context provided. Finding an effective prompt is, therefore, often a trial-and-error process. Most existing approaches to automatic prompting aim to optimize individual techniques instead of compositions of techniques and their dependence on the input. To fill this gap, we propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input. We apply our approach to social bias detection, a highly context-dependent task that requires semantic understanding. We evaluate it with three large language models on three datasets, comparing compositions to individual techniques and other baselines. The results underline the importance of finding an effective prompt composition. Our approach robustly ensures high detection performance, and is best in several settings. Moreover, first experiments on other tasks support its generalizability.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[378]

L. Fang, Y. Wang, Z. Liu, C. Zhang, S. Jegelka, J. Gao, B. Ding and Y. Wang.
What is Wrong with Perplexity for Long-context Language Modeling?
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL GitHub

Abstract

Handling long-context inputs is crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. We find that PPL overlooks key tokens, which are essential for long-context understanding, by averaging across all tokens and thereby obscuring the true performance of models in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them. Our experiments demonstrate that LongPPL strongly correlates with performance on various long-context benchmarks (e.g., Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. Additionally, we introduce LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens, leading to consistent improvements across diverse benchmarks. In summary, these contributions offer deeper insights into the limitations of PPL and present effective solutions for accurately evaluating and enhancing the long-context capabilities of LLMs.
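The averaging effect described in the abstract can be seen in a small numeric sketch. Perplexity is the exponential of the mean negative log-likelihood per token, so a few poorly predicted tokens are diluted by many easy ones. The key-token mask below is specified by hand as an assumption; the paper identifies key tokens with a long-short context contrastive method, which is not implemented here.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the given tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def masked_perplexity(token_log_probs, key_mask):
    """Perplexity restricted to tokens flagged as key tokens."""
    selected = [lp for lp, is_key in zip(token_log_probs, key_mask) if is_key]
    return perplexity(selected)


# Hypothetical sequence: eight easy tokens and two hard tokens that are
# assumed to be the ones requiring long-context information.
log_probs = [math.log(0.9)] * 8 + [math.log(0.05)] * 2
key_mask = [False] * 8 + [True] * 2

print(perplexity(log_probs))                   # dominated by the easy tokens
print(masked_perplexity(log_probs, key_mask))  # exposes the hard tokens
```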

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[377]

A. Findeis, T. Kaufmann, E. Hüllermeier, S. Albanie and R. D. Mullins.
Inverse Constitutional AI: Compressing Preferences into Principles.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL GitHub

Abstract

Feedback data is widely used for fine-tuning and evaluating state-of-the-art AI models. Pairwise text preferences, where human or AI annotators select the “better” of two options, are particularly common. Such preferences are used to train (reward) models or to rank models with aggregate statistics. For many applications it is desirable to understand annotator preferences in addition to modelling them – not least because extensive prior work has shown various unintended biases in preference datasets. Yet, preference datasets remain challenging to interpret. Neither black-box reward models nor statistics can answer why one text is preferred over another. Manual interpretation of the numerous (long) response pairs is usually equally infeasible. In this paper, we introduce the Inverse Constitutional AI (ICAI) problem, formulating the interpretation of pairwise text preference data as a compression task. In constitutional AI, a set of principles (a constitution) is used to provide feedback and fine-tune AI models. ICAI inverts this process: given a feedback dataset, we aim to extract a constitution that best enables a large language model (LLM) to reconstruct the original annotations. We propose a corresponding ICAI algorithm and validate its generated constitutions quantitatively based on annotation reconstruction accuracy on several datasets: (a) synthetic feedback data with known principles; (b) AlpacaEval cross-annotated human feedback data; (c) crowdsourced Chatbot Arena data; and (d) PRISM data from diverse demographic groups. As an example application, we further demonstrate the detection of biases in human feedback data. As a short and interpretable representation of the original dataset, generated constitutions have many potential use cases: they may help identify undesirable annotator biases, better understand model performance, scale feedback to unseen data, or assist with adapting AI models to individual user or group preferences.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[376]

M. Kollovieh, M. Lienen, D. Lüdke, L. Schwinn and S. Günnemann.
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Recent advancements in generative modeling, particularly diffusion models, have opened new directions for time series modeling, achieving state-of-the-art performance in forecasting and synthesis. However, the reliance of diffusion-based models on a simple, fixed prior complicates the generative process since the data and prior distributions differ significantly. We introduce TSFlow, a conditional flow matching (CFM) model for time series combining Gaussian processes, optimal transport paths, and data-dependent prior distributions. By incorporating (conditional) Gaussian processes, TSFlow aligns the prior distribution more closely with the temporal structure of the data, enhancing both unconditional and conditional generation. Furthermore, we propose conditional prior sampling to enable probabilistic forecasting with an unconditionally trained model. In our experimental evaluation on eight real-world datasets, we demonstrate the generative capabilities of TSFlow, producing high-quality unconditional samples. Finally, we show that both conditionally and unconditionally trained models achieve competitive results across multiple forecasting benchmarks.

MCML Authors

Marcel Kollovieh

Data Analytics & Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[375]

R. G. Laiz, T. Schmidt and S. Schneider.
Self-supervised contrastive learning performs non-linear system identification.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Self-supervised learning (SSL) approaches have brought tremendous success across many tasks and domains. It has been argued that these successes can be attributed to a link between SSL and identifiable representation learning: Temporal structure and auxiliary variables ensure that latent representations are related to the true underlying generative factors of the data. Here, we deepen this connection and show that SSL can perform system identification in latent space. We propose DynCL, a framework to uncover linear, switching linear and non-linear dynamics under a non-linear observation model, give theoretical guarantees and validate them empirically.

MCML Authors

Tobias Schmidt

Dynamical Inference

Steffen Schneider

Dr.

Dynamical Inference

[374]

H. Lim, J. Choi, J. Choo and S. Schneider.
Sparse autoencoders reveal selective remapping of visual concepts during adaptation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.

MCML Authors

Steffen Schneider

Dr.

Dynamical Inference

[373]

G. Manten, C. Casolo, E. Ferrucci, S. Mogensen, C. Salvi and N. Kilbertus.
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Inferring the causal structure underlying stochastic dynamical systems from observational data holds great promise in domains ranging from science and health to finance. Such processes can often be accurately modeled via stochastic differential equations (SDEs), which naturally imply causal relationships via ‘which variables enter the differential of which other variables’. In this paper, we develop conditional independence (CI) constraints on coordinate processes over selected intervals that are Markov with respect to the acyclic dependence graph (allowing self-loops) induced by a general SDE model. We then provide a sound and complete causal discovery algorithm, capable of handling both fully and partially observed data, and uniquely recovering the underlying or induced ancestral graph by exploiting time directionality assuming a CI oracle. Finally, to make our algorithm practically usable, we also propose a flexible, consistent signature kernel-based CI test to infer these constraints from data. We extensively benchmark the CI test in isolation and as part of our causal discovery algorithms, outperforming existing approaches in SDE models and beyond.

MCML Authors

Georg Manten

Ethics in Systems Design and Machine Learning

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[372]

M. Muschalik, F. Fumagalli, P. Frazzetto, J. Strotherm, L. Hermes, A. Sperduti, E. Hüllermeier and B. Hammer.
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Despite the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to an ML model’s output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e. the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ’s approximation of SIs on popular GNN architectures and compare with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using an SI-Graph.
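For orientation, the Shapley Value referred to in the abstract can be computed exactly for a tiny cooperative game by enumerating all coalitions. The sketch below is the textbook brute-force computation, whose cost grows exponentially in the number of players. It is not GraphSHAP-IQ, and the toy game is a hypothetical example.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for players 0..n-1 given a set function `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_game(coalition):
    """Hypothetical game: player 0 adds 1.0; players 1 and 2 add 2.0 only jointly."""
    total = 0.0
    if 0 in coalition:
        total += 1.0
    if 1 in coalition and 2 in coalition:
        total += 2.0
    return total


print(shapley_values(3, toy_game))
```

In this toy game the Shapley Value splits the joint effect of players 1 and 2 evenly between them, so the interaction is not visible in the individual scores. Shapley Interactions are meant to make such joint effects explicit.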

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[371]

L. Rauchwerger, S. Jegelka and R. Levie.
Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

We analyze the universality and generalization of graph neural networks (GNNs) on attributed graphs, i.e., with node attributes. To this end, we propose pseudometrics over the space of all attributed graphs that describe the fine-grained expressivity of GNNs. Namely, GNNs are both Lipschitz continuous with respect to our pseudometrics and can separate attributed graphs that are distant in the metric. Moreover, we prove that the space of all attributed graphs is relatively compact with respect to our metrics. Based on these properties, we prove a universal approximation theorem for GNNs and generalization bounds for GNNs on any data distribution of attributed graphs. The proposed metrics compute the similarity between the structures of attributed graphs via a hierarchical optimal transport between computation trees. Our work extends and unites previous approaches which either derived theory only for graphs with no attributes, derived compact metrics under which GNNs are continuous but without separation power, or derived metrics under which GNNs are continuous and separate points but the space of graphs is not relatively compact, which prevents universal approximation and generalization analysis.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[370]

B. Tahmasebi and S. Jegelka.
Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Canonicalization, a popular method for generating invariant or equivariant function classes from arbitrary function sets, involves initial data projection onto a reduced input space subset, followed by applying any learning method to the projected dataset. Despite recent research on the expressive power and continuity of functions represented by canonicalization, its generalization capabilities remain less explored. This paper addresses this gap by theoretically examining the generalization benefits and sample complexity of canonicalization, comparing them with group averaging, another popular technique for creating invariant or equivariant function classes. Our findings reveal two distinct regimes where canonicalization may outperform or underperform compared to group averaging, with precise quantification of this phase transition in terms of sample size, group action characteristics, and a newly introduced concept of alignment. To the best of our knowledge, this study represents the first theoretical exploration of such behavior, offering insights into the relative effectiveness of canonicalization and group averaging under varying conditions.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[369]

D. Herbst and S. Jegelka.
Higher-Order Graphon Neural Networks: Approximation and Cut Distance.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. Spotlight Presentation. URL

Abstract

Graph limit models, like graphons for limits of dense graphs, have recently been used to study size transferability of graph neural networks (GNNs). While most literature focuses on message passing GNNs (MPNNs), in this work we attend to the more powerful higher-order GNNs. First, we extend the k-WL test for graphons (Böker, 2023) to the graphon-signal space and introduce signal-weighted homomorphism densities as a key tool. As an exemplary focus, we generalize Invariant Graph Networks (IGNs) to graphons, proposing Invariant Graphon Networks (IWNs) defined via a subset of the IGN basis corresponding to bounded linear operators. Even with this restricted basis, we show that IWNs of order k are at least as powerful as the k-WL test, and we establish universal approximation results for graphon-signals in L^p distances. This significantly extends the prior work of Cai & Wang (2022), showing that IWNs (a subset of their IGN-small) retain effectively the same expressivity as the full IGN basis in the limit. In contrast to their approach, our blueprint of IWNs also aligns better with the geometry of graphon space, for example facilitating comparability to MPNNs. We highlight that, while typical higher-order GNNs are discontinuous w.r.t. cut distance (which causes their lack of convergence and is inherently tied to the definition of k-WL), their transferability remains comparable to MPNNs.

MCML Authors

Daniel Herbst

Foundations of Deep Neural Networks

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[368]

Abstract

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

[367]

Q. Zhang, Y. Wang, J. Cui, X. Pan, Q. Lei, S. Jegelka and Y. Wang.
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL

Abstract

Deep learning models often suffer from a lack of interpretability due to polysemanticity, where individual neurons are activated by multiple unrelated semantics, resulting in unclear attributions of model behavior. Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability but are commonly believed to compromise accuracy. In this work, we challenge the prevailing belief of the accuracy-interpretability tradeoff, showing that monosemantic features not only enhance interpretability but also bring concrete gains in model performance. Across multiple robust learning scenarios, including input and label noise, few-shot learning, and out-of-domain generalization, our results show that models leveraging monosemantic features significantly outperform those relying on polysemantic features. Furthermore, we provide empirical and theoretical understandings on the robustness gains of feature monosemanticity. Our preliminary analysis suggests that monosemanticity, by promoting better separation of feature representations, leads to more robust decision boundaries. This diverse evidence highlights the generality of monosemanticity in improving model robustness. As a first step in this new direction, we embark on exploring the learning benefits of monosemanticity beyond interpretability, supporting the long-standing hypothesis of linking interpretability and robustness.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[366]

Z. Li, S. Yan, Y. Ma, Y. Li, X. Lyu and M. Schubert.
Beyond Single-Step: Multi-Frame Action-Conditioned Video Generation for Reinforcement Learning Environments.
World Models @ICLR 2025 - Workshop on World Models: Understanding, Modelling and Scaling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

World models have achieved great success in learning dynamics from both low-dimensional and high-dimensional states. Yet, no existing work addresses the multi-step generation task with high-dimensional data. In this paper, we propose the first action-conditioned multi-frame video generation model, advancing world model development by generating future states from actions. As opposed to recent single-step or autoregressive approaches, our model directly generates multiple future frames conditioned on past observations and action sequences. Our framework extends its capabilities to action-conditioned video generation by introducing an action encoder. This addition enables the spatiotemporal variational autoencoder and diffusion transformer in Open-Sora to effectively incorporate action information, ensuring precise and coherent video generation. We evaluated performance on Atari environments (Breakout, Pong, DemonAttack) using MSE, PSNR, and LPIPS. Results show that conditioning solely on future actions and embedding-based encoding improve generation accuracy and perceptual quality while capturing complex temporal dependencies like inertia. Our work paves the way for action-conditioned multi-step generative world models in dynamic environments.

MCML Authors

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[365]

T. Decker, V. Tresp and F. Buettner.
Why Uncertainty Calibration Matters for Reliable Perturbation-based Explanations.
XAI4Science @ICLR 2025 - Workshop XAI4Science: From Understanding Model Behavior to Discovering New Scientific Knowledge at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

Perturbation-based explanations are widely utilized to enhance the transparency of modern machine-learning models. However, their reliability is often compromised by the unknown model behavior under the specific perturbations used. This paper investigates the relationship between uncertainty calibration - the alignment of model confidence with actual accuracy - and perturbation-based explanations. We show that models frequently produce unreliable probability estimates when subjected to explainability-specific perturbations and theoretically prove that this directly undermines explanation quality. To address this, we introduce ReCalX, a novel approach to recalibrate models for improved perturbation-based explanations while preserving their original predictions. Experiments on popular computer vision models demonstrate that our calibration strategy produces explanations that are more aligned with human perception and actual object locations.
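The notion of calibration used in the abstract, the alignment of model confidence with actual accuracy, is commonly summarized by the expected calibration error (ECE). The sketch below is a standard binned ECE with equal-width bins, given as background only; it is not ReCalX, and the example inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Hypothetical overconfident model: 95% confidence but only half the predictions correct.
print(expected_calibration_error([0.95] * 4, [True, True, False, False]))
```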

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[364]

R. Visser, F. Fumagalli, M. Muschalik, E. Hüllermeier and B. Hammer.
Explaining Outliers using Isolation Forest and Shapley Interactions.
ESANN 2025 - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges, Belgium, Apr 23-25, 2025. PDF

Abstract

In unsupervised machine learning, Isolation Forest (IsoForest) is a widely used algorithm for the efficient detection of outliers. Identifying the features responsible for observed anomalies is crucial for practitioners, yet the ensemble nature of IsoForest complicates interpretation and comparison. As a remedy, SHAP is a prevalent method to interpret outlier scoring models by assigning contributions to individual features based on the Shapley Value (SV). However, complex anomalies typically involve interaction of features, and it is paramount for practitioners to distinguish such complex anomalies from simple cases. In this work, we propose Shapley Interactions (SIs) to enrich explanations of outliers with feature interactions. SIs, as an extension of the SV, decompose the outlier score into contributions of individual features and interactions of features up to a specified explanation order. We modify IsoForest to compute SI using TreeSHAP-IQ, an extension of TreeSHAP for tree-based models, using the shapiq package. Using a qualitative and quantitative analysis on synthetic and real-world datasets, we demonstrate the benefit of SI and feature interactions for outlier explanations over feature contributions alone.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[363]

Z. Ding.
Inductive representation learning and natural language question answering on temporal knowledge graphs.
Dissertation 2025. DOI

Abstract

Real-world applications such as recommender systems, social networks, and protein-protein interactions often involve relational data. In recent years, there has been increasing interest in machine learning on such data, particularly in the context of knowledge graphs (KGs). KGs are structured relational data that store multi-relational information as directed graphs, where each node corresponds to an entity and each labeled edge represents a factual relationship between entities, e.g., (Oxford, located in, the United Kingdom). Traditional KGs assume time-invariant relationships. However, real-world relationships are dynamically evolving over time. For example, the chancellor of Germany in 2020 was Angela Merkel, but in 2022 it became Olaf Scholz. This necessitates the use of temporal knowledge graphs (TKGs), where temporal facts are introduced by coupling stationary facts with additional time identifiers, e.g., (Angela Merkel, is chancellor of, Germany, 2020). TKGs are more expressive than KGs as they model the temporal evolution of knowledge. Consequently, recent research has paid more attention to machine learning on TKGs. In this thesis, we focus on two machine learning problems: inductive knowledge representation learning and natural language question answering (QA) on TKGs. (Shortened)
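The temporal facts described in the abstract can be pictured as quadruples that extend the usual (subject, relation, object) triple with a time identifier. The sketch below stores the abstract's own example facts and answers a simple time-scoped lookup. The class and function names are hypothetical, and no learning method from the thesis is involved.

```python
from typing import NamedTuple


class TemporalFact(NamedTuple):
    """A knowledge-graph triple coupled with a time identifier."""
    subject: str
    relation: str
    obj: str
    year: int


def subjects_at(facts, relation, obj, year):
    """Subjects that stand in `relation` to `obj` at the given year."""
    return [
        f.subject
        for f in facts
        if f.relation == relation and f.obj == obj and f.year == year
    ]


facts = [
    TemporalFact("Angela Merkel", "is chancellor of", "Germany", 2020),
    TemporalFact("Olaf Scholz", "is chancellor of", "Germany", 2022),
]

print(subjects_at(facts, "is chancellor of", "Germany", 2022))
```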

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

[362]

C. Bülte, Y. Sale, T. Löhr, P. Hofman, G. Kutyniok and E. Hüllermeier.
An Axiomatic Assessment of Entropy- and Variance-based Uncertainty Quantification in Regression.
Preprint (Apr. 2025). arXiv

Abstract

MCML Authors

Christopher Bülte

Mathematical Foundations of Artificial Intelligence

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[361]

L. Fichtel, M. Spliethöver, E. Hüllermeier, P. Jimenez, N. Klowait, S. Kopp, A.-C. N. Ngomo, A. Robrecht, I. Scharlau, L. Terfloth, A.-L. Vollmer and H. Wachsmuth.
Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues.
Preprint (Apr. 2025). arXiv

Abstract

The ability to generate explanations that are understood by explainees is the quintessence of explainable artificial intelligence. Since understanding depends on the explainee’s background and needs, recent research has focused on co-constructive explanation dialogues, where the explainer continuously monitors the explainee’s understanding and adapts explanations dynamically. We investigate the ability of large language models (LLMs) to engage as explainers in co-constructive explanation dialogues. In particular, we present a user study in which explainees interact with LLMs, of which some have been instructed to explain a predefined topic co-constructively. We evaluate the explainees’ understanding before and after the dialogue, as well as their perception of the LLMs’ co-constructive behavior. Our results indicate that current LLMs show some co-constructive behaviors, such as asking verification questions, that foster the explainees’ engagement and can improve understanding of a topic. However, their ability to effectively monitor the current understanding and scaffold the explanations accordingly remains limited.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[360]

N. Röhrich, A. Hoffmann, R. Nordsieck, E. Zarbali and A. Javanmardi.
Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics.
Preprint (Apr. 2025). arXiv

Abstract

Whereas in general computer vision, transformer-based architectures have quickly become the gold standard, microelectronics defect detection still heavily relies on convolutional neural networks (CNNs). We hypothesize that this is due to the fact that a) transformers have an increased need for data and b) labelled image generation procedures for microelectronics are costly, and labelled data is therefore sparse. Whereas in other domains, pre-training on large natural image datasets can mitigate this problem, in microelectronics transfer learning is hindered due to the dissimilarity of domain data and natural images. Therefore, we evaluate self pre-training, where models are pre-trained on the target dataset, rather than another dataset. We propose a vision transformer (ViT) pre-training framework for defect detection in microelectronics based on masked autoencoders (MAE). In MAE, a large share of image patches is masked and reconstructed by the model during pre-training. We perform pre-training and defect detection using a dataset of fewer than 10,000 scanning acoustic microscopy (SAM) images labelled using transient thermal analysis (TTA). Our experimental results show that our approach leads to substantial performance gains compared to a) supervised ViT, b) ViT pre-trained on natural image datasets, and c) state-of-the-art CNN-based defect detection models used in the literature. Additionally, interpretability analysis reveals that our self pre-trained models, in comparison to ViT baselines, correctly focus on defect-relevant features such as cracks in the solder material. This demonstrates that our approach yields fault-specific feature representations, making our self pre-trained models viable for real-world defect detection in microelectronics.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

[359]

M. Scherbela, N. Gao, P. Grohs and S. Günnemann.
Accurate Ab-initio Neural-network Solutions to Large-Scale Electronic Structure Problems.
Preprint (Apr. 2025). arXiv

Abstract

We present finite-range embeddings (FiRE), a novel wave function ansatz for accurate large-scale ab-initio electronic structure calculations. Compared to contemporary neural-network wave functions, FiRE reduces the asymptotic complexity of neural-network variational Monte Carlo (NN-VMC) by ∼n_el, the number of electrons. By restricting electron-electron interactions within the neural network, FiRE accelerates all key operations – sampling, pseudopotentials, and Laplacian computations – resulting in a real-world 10× acceleration in now-feasible 180-electron calculations. We validate our method's accuracy on various challenging systems, including biochemical compounds, conjugated hydrocarbons, and organometallic compounds. On these systems, FiRE's energies are consistently within chemical accuracy of the most reliable data, including experiments, even in cases where high-accuracy methods such as CCSD(T), AFQMC, or contemporary NN-VMC fall short. With these improvements in both runtime and accuracy, FiRE represents a new 'gold-standard' method for fast and accurate large-scale ab-initio calculations, potentially enabling new computational studies in fields like quantum chemistry, solid-state physics, and material design.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[358]

G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
3DV 2025 - 12th International Conference on 3D Vision. Singapore, Mar 25-28, 2025. To be published. Preprint available. arXiv

Abstract

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.

MCML Authors

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Ruotong Liao

Database Systems and Data Mining AI Lab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Federico Tombari

PD Dr.

Computer Aided Medical Procedures & Augmented Reality

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality

[357]

C. Damke and E. Hüllermeier.
Adjusted Count Quantification Learning on Graphs.
Preprint (Mar. 2025). arXiv

Abstract

Quantification learning is the task of predicting the label distribution of a set of instances. We study this problem in the context of graph-structured data, where the instances are vertices. Previously, this problem has only been addressed via node clustering methods. In this paper, we extend the popular Adjusted Classify & Count (ACC) method to graphs. We show that the prior probability shift assumption upon which ACC relies is often not fulfilled and propose two novel graph quantification techniques: Structural importance sampling (SIS) makes ACC applicable in graph domains with covariate shift. Neighborhood-aware ACC improves quantification in the presence of non-homophilic edges. We show the effectiveness of our techniques on multiple graph quantification tasks.
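
For orientation, the classical binary Adjusted Classify & Count correction that the paper builds on can be sketched as follows. This is the textbook estimator under the prior probability shift assumption, not the graph-specific SIS or neighborhood-aware variants proposed by the authors; the function name and the toy numbers are assumptions made for illustration.

```python
def adjusted_classify_and_count(predictions, tpr, fpr):
    """Binary ACC: correct the raw positive rate using classifier error rates.

    predictions: iterable of 0/1 classifier outputs on the target set.
    tpr, fpr: true and false positive rates estimated on held-out labeled data.
    """
    predictions = list(predictions)
    if not predictions:
        raise ValueError("need at least one prediction")
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    # Classify & Count: fraction of instances predicted positive.
    cc = sum(predictions) / len(predictions)
    # Under prior probability shift, cc = tpr * p + fpr * (1 - p); solve for p.
    p = (cc - fpr) / (tpr - fpr)
    # Clip to the valid probability range.
    return min(1.0, max(0.0, p))


# Toy example with made-up numbers: half of the target instances predicted positive.
estimate = adjusted_classify_and_count([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], tpr=0.8, fpr=0.2)
```

The correction is only valid when the error rates measured on the labeled data carry over to the target set, which is exactly the assumption the abstract says often fails on graphs.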

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[356]

R. Amoroso, G. Zhang, R. Koner, L. Baraldi, R. Cucchiara and V. Tresp.
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI

Abstract

Video Question Answering (Video QA) is a challenging video understanding task that requires models to comprehend entire videos, identify the most relevant information based on contextual cues from a given question, and reason accurately to provide answers. Recent advancements in Multimodal Large Language Models (MLLMs) have transformed video QA by leveraging their exceptional commonsense reasoning capabilities. This progress is largely driven by the effective alignment between visual data and the language space of MLLMs. However, for video QA, an additional space-time alignment poses a considerable challenge for extracting question-relevant information across frames. In this work, we investigate diverse temporal modeling techniques to integrate with MLLMs, aiming to achieve question-guided temporal modeling that leverages pre-trained visual and textual alignment in MLLMs. We propose T-Former, a novel temporal modeling method that creates a question-guided temporal bridge between frame-wise visual perception and the reasoning capabilities of LLMs. Our evaluation across multiple video QA benchmarks demonstrates that T-Former competes favorably with existing temporal modeling approaches and aligns with recent advancements in video QA.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Rajat Koner

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[355]

S. Chen, Z. Han, B. He, J. Liu, M. Buckley, Y. Qin, P. Torr, V. Tresp and J. Gu.
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI URL

Abstract

Large Language Models (LLMs) with in-context learning (ICL) ability can quickly adapt to a specific context given a few demonstrations (demos). Recently, Multimodal Large Language Models (MLLMs) built upon LLMs have also shown multimodal ICL ability, i.e., responding to queries given a few multimodal demos, including images, queries, and answers. While ICL has been extensively studied on LLMs, its research on MLLMs remains limited. One essential question is whether these MLLMs can truly conduct multimodal ICL, or if only the textual modality is necessary. We investigate this question by examining two primary factors that influence ICL: 1) Demo content, i.e., understanding the influences of demo content in different modalities. 2) Demo selection strategy, i.e., how to select better multimodal demos for improved performance. Experiments revealed that multimodal ICL is predominantly driven by the textual content whereas the visual information in the demos has little influence. Interestingly, visual content is still necessary and useful for selecting demos to increase performance. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demos. Extensive experiments are conducted to support our findings and verify the improvement brought by our method.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[354]

Y. Zhang, H. Chen, A. Frikha, Y. Yang, D. Krompass, G. Zhang, J. Gu and V. Tresp.
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI

Abstract

Visual Question Answering (VQA) systems have advanced significantly in recent years due to the development of large-scale Vision-Language Pre-trained Models (VLPMs). As the application scenario and user demand change over time, an advanced VQA system is expected to be capable of continuously expanding its knowledge and capabilities, not only to handle new tasks (i.e., new question types or visual scenes) but also to answer questions in new specialized domains without forgetting previously acquired knowledge and skills. Existing works studying continual learning (CL) on VQA tasks primarily consider answer- and question-type incremental learning or scene- and function-incremental learning, whereas how VQA systems perform when they encounter new domains and increasing user demands has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, through which we conduct extensive experiments on 4 VLPMs, 5 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon of the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches can help mitigate forgetting in VLPMs, and how to design CL approaches suitable for VLPMs in this challenging continual learning environment. To facilitate future work on developing an advanced All-in-One VQA system, we will release our datasets and code.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Ahmed Frikha

Dr.

* Former Member

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[353]

H. Chen, D. Krompass, J. Gu and V. Tresp.
FedPop: Federated Population-based Hyperparameter Tuning.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their 'training-after-tuning' framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both the client and server sides. Compared with prior tuning methods, FedPop employs an online 'tuning-while-training' framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets, including full-sized Non-IID ImageNet-1K, demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP-tuning methods in FL.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[352]

X. Feng, Z. Jiang, T. Kaufmann, P. Xu, E. Hüllermeier, P. Weng and Y. Zhu.
DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

Defining a reward function is usually a challenging but critical task for the system designer in reinforcement learning, especially when specifying complex behaviors. Reinforcement learning from human feedback (RLHF) emerges as a promising approach to circumvent this. In RLHF, the agent typically learns a reward function by querying a human teacher using pairwise comparisons of trajectory segments. A key question in this domain is how to reduce the number of queries necessary to learn an informative reward function since asking a human teacher too many queries is impractical and costly. To tackle this question, we propose DUO, a novel method for diverse, uncertain, on-policy query generation and selection in RLHF. Our method produces queries that are (1) more relevant for policy training (via an on-policy criterion), (2) more informative (via a principled measure of epistemic uncertainty), and (3) diverse (via a clustering-based filter). Experimental results on a variety of locomotion and robotic manipulation tasks demonstrate that our method can outperform state-of-the-art RLHF methods given the same total budget of queries, while being robust to possibly irrational teachers.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[351]

Z. Li, S. S. Cranganore, N. Youngblut and N. Kilbertus.
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high-quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance, but also show how sequence-level information of entire genomes allows us to identify gene associations underlying complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow-up.

MCML Authors

Zhufeng Li

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[350]

Y. Zhang, Z. Ma, Y. Ma, Z. Han, Y. Wu and V. Tresp.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[349]

J. Hanselle, S. Heid, J. Fürnkranz and E. Hüllermeier.
Probabilistic scoring lists for interpretable machine learning.
Machine Learning 114.55 (Feb. 2025). DOI

Abstract

A scoring system is a simple decision model that checks a set of features, adds a certain number of points to a total score for each feature that is satisfied, and finally makes a decision by comparing the total score to a threshold. Scoring systems have a long history of active use in safety-critical domains such as healthcare and justice, where they provide guidance for making objective and accurate decisions. Given their genuine interpretability, the idea of learning scoring systems from data is obviously appealing from the perspective of explainable AI. In this paper, we propose a practically motivated extension of scoring systems called probabilistic scoring lists (PSL), as well as a method for learning PSLs from data. Instead of making a deterministic decision, a PSL represents uncertainty in the form of probability distributions, or, more generally, probability intervals. Moreover, in the spirit of decision lists, a PSL evaluates features one by one and stops as soon as a decision can be made with enough confidence. To evaluate our approach, we conduct case studies in the medical domain and on standard benchmark data.
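
The deterministic scoring system described in the first sentence of the abstract can be written in a few lines. This sketch covers only that baseline model, not the probabilistic scoring lists proposed in the paper; the feature names, point values, and threshold are hypothetical and chosen purely for illustration.

```python
def scoring_system_decision(features, points, threshold):
    """Return (total_score, decision) for a deterministic scoring system.

    features: dict mapping feature name -> bool (is the feature satisfied?)
    points: dict mapping feature name -> integer points awarded if satisfied
    threshold: decide positive when the total score reaches this value
    """
    # Add the points of every satisfied feature; missing features count as unsatisfied.
    total = sum(pts for name, pts in points.items() if features.get(name, False))
    return total, total >= threshold


# Hypothetical point table and patient record.
points = {"age_over_60": 2, "high_blood_pressure": 1, "smoker": 3}
patient = {"age_over_60": True, "high_blood_pressure": False, "smoker": True}
score, decision = scoring_system_decision(patient, points, threshold=4)
```

According to the abstract, a PSL differs from this baseline in two ways: it outputs a probability distribution or interval instead of a hard decision, and it evaluates features one by one so that it can stop early.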

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[348]

Abstract

MCML Authors

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Stefan Bauer

Prof. Dr.

Algorithmic Machine Learning & Explainable AI

[347]

E. Ailer, C. L. Müller and N. Kilbertus.
Instrumental variable estimation for compositional treatments.
Scientific Reports 15.5158 (Feb. 2025). DOI

Abstract

Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions and warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.

MCML Authors

Elisabeth Ailer

* Former Member

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[346]

J. Bi, Y. Wang, D. Yan, X. Xiao, A. Hecker, V. Tresp and Y. Ma.
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection.
Preprint (Feb. 2025). arXiv

Abstract

Visual instruction tuning refines pre-trained Multimodal Large Language Models (MLLMs) to enhance their real-world task performance. However, the rapid expansion of visual instruction datasets introduces significant data redundancy, leading to excessive computational costs. Existing data selection methods predominantly rely on proxy models or loss-based metrics, both of which impose substantial computational overheads due to the necessity of model inference and backpropagation. To address this challenge, we propose PRISM, a novel training-free approach for efficient multimodal data selection. Unlike existing methods, PRISM eliminates the reliance on proxy models, warm-up pretraining, and gradient-based optimization. Instead, it leverages Pearson correlation analysis to quantify the intrinsic visual encoding properties of MLLMs, computing a task-specific correlation score to identify high-value instances. This not only enables data-efficient selection, but also maintains the original performance. Empirical evaluations across multiple MLLMs demonstrate that PRISM reduces the overall time required for visual instruction tuning and data selection to just 30% of conventional methods, while surpassing fully fine-tuned models across eight multimodal and three language understanding benchmarks, achieving a 101.7% relative improvement in final performance.

MCML Authors

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[345]

M. Jürgens, T. Mortier, E. Hüllermeier, V. Bengs and W. Waegeman.
A calibration test for evaluating set-based epistemic uncertainty representations.
Preprint (Feb. 2025). arXiv

Abstract

The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there is a convex combination of the set's predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space. Moreover, we learn this combination via proper scoring rules, which inherently optimize for calibration. Building on differentiable, kernel-based estimators of calibration errors, we introduce a nonparametric testing procedure and demonstrate the benefits of capturing instance-level variability in synthetic and real-world experiments.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

[344]

K. Padh, Z. Li, C. Casolo and N. Kilbertus.
Your Assumed DAG is Wrong and Here's How To Deal With It.
Preprint (Feb. 2025). arXiv

Abstract

Assuming a directed acyclic graph (DAG) that represents prior knowledge of causal relationships between variables is a common starting point for cause-effect estimation. Existing literature typically invokes hypothetical domain expert knowledge or causal discovery algorithms to justify this assumption. In practice, neither may propose a single DAG with high confidence. Domain experts are hesitant to rule out dependencies with certainty or have ongoing disputes about relationships; causal discovery often relies on untestable assumptions itself or only provides an equivalence class of DAGs and is commonly sensitive to hyperparameter and threshold choices. We propose an efficient, gradient-based optimization method that provides bounds for causal queries over a collection of causal graphs – compatible with imperfect prior knowledge – that may still be too large for exhaustive enumeration. Our bounds achieve good coverage and sharpness for causal queries such as average treatment effects in linear and non-linear synthetic settings as well as on real-world data. Our approach aims at providing an easy-to-use and widely applicable rebuttal to the valid critique of 'What if your assumed DAG is wrong?'.

MCML Authors

Kirtan Padh

Ethics in Systems Design and Machine Learning

Zhufeng Li

Ethics in Systems Design and Machine Learning

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[343]

G. D. Pelegrina, P. Kolpaczki and E. Hüllermeier.
Shapley Value Approximation Based on k-Additive Games.
Preprint (Feb. 2025). arXiv

Abstract

The Shapley value is the prevalent solution for fair division problems in which a payout is to be divided among multiple agents. By adopting a game-theoretic view, the idea of fair division and the Shapley value can also be used in machine learning to quantify the individual contribution of features or data points to the performance of a predictive model. Despite its popularity and axiomatic justification, the Shapley value suffers from a computational complexity that scales exponentially with the number of entities involved, and hence requires approximation methods for its reliable estimation. We propose SVAkADD, a novel approximation method that fits a k-additive surrogate game. By taking advantage of k-additivity, we are able to elicit the exact Shapley values of the surrogate game and then use these values as estimates for the original fair division problem. The efficacy of our method is evaluated empirically and compared to competing methods.
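
To make the exponential cost mentioned in the abstract concrete, the following sketch computes exact Shapley values by enumerating all coalitions. It is the textbook definition, not the SVAkADD approximation proposed in the paper; the function names and the toy game are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley_values(players, value):
    """Exact Shapley values via enumeration of all coalitions (exponential in n).

    players: list of hashable player identifiers
    value: function mapping a frozenset of players to the coalition's worth
    """
    n = len(players)
    shapley = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i when joining coalition s.
                total += weight * (value(s | {i}) - value(s))
        shapley[i] = total
    return shapley


def toy_value(coalition):
    """Toy game: individual worths plus a bonus of 2 when 'a' and 'b' cooperate."""
    worth = {"a": 1.0, "b": 2.0, "c": 3.0}
    v = sum(worth[p] for p in coalition)
    if {"a", "b"} <= coalition:
        v += 2.0
    return v


values = exact_shapley_values(["a", "b", "c"], toy_value)
```

The toy game contains only individual and pairwise terms, so it is 2-additive: the pairwise bonus is split equally between "a" and "b", and the values sum to the worth of the full coalition.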

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[342]

G. Zhang, M. Ding, T. Liu, Y. Zhang and V. Tresp.
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs.
Preprint (Feb. 2025). arXiv

Abstract

Multimodal large language models (MLLMs) have demonstrated strong performance in understanding videos holistically, yet their ability to process streaming videos, in which videos are treated as a sequence of visual events, remains underexplored. Intuitively, leveraging past events as memory can enrich contextual and temporal understanding of the current event. In this paper, we show that leveraging memories as contexts helps MLLMs better understand video events. However, because such memories rely on predictions of preceding events, they may contain misinformation, leading to confabulation and degraded performance. To address this, we propose a confabulation-aware memory modification method that mitigates confabulated memory for memory-enhanced event understanding.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Tong Liu

Database Systems and Data Mining AI Lab

Yao Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[341]

T. Mortier, A. Javanmardi, Y. Sale, E. Hüllermeier and W. Waegeman.
Conformal Prediction in Hierarchical Classification.
Preprint (Jan. 2025). arXiv

Abstract

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy, and propose two computationally efficient inference algorithms. The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of representation complexity, yielding a more general and combinatorial inference problem, but smaller set sizes. Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms in achieving nominal coverage.
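As background, the split conformal framework that the paper extends can be sketched for ordinary (flat) classification. This is a minimal illustration under assumptions, not the hierarchical algorithms proposed in the paper: the nonconformity score (one minus the predicted probability of the true label) and all function names are illustrative choices.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data.

    Nonconformity score (an assumption): 1 - predicted probability
    of the true label.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return {label for label, p in enumerate(probs) if 1.0 - p <= threshold}


# Hypothetical calibration data: predicted class probabilities and true labels.
cal_probs = [[0.875, 0.125, 0.0], [0.5, 0.5, 0.0], [0.25, 0.75, 0.0]]
cal_labels = [0, 1, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
confident = prediction_set([0.75, 0.25, 0.0], threshold)
```

In the hierarchical setting described above, the returned set would additionally be constrained by the class hierarchy, for example to a single internal node.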

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[340]

M. H. Shaker and E. Hüllermeier.
Random Forest Calibration.
Preprint (Jan. 2025). arXiv

Abstract

The Random Forest (RF) classifier is often claimed to be relatively well calibrated when compared with other machine learning methods. Moreover, the existing literature suggests that traditional calibration methods, such as isotonic regression, do not substantially enhance the calibration of RF probability estimates unless supplied with extensive calibration data sets, which can represent a significant obstacle in cases of limited data availability. Nevertheless, there seems to be no comprehensive study validating such claims and systematically comparing state-of-the-art calibration methods specifically for RF. To close this gap, we investigate a broad spectrum of calibration methods tailored to or at least applicable to RF, ranging from scaling techniques to more advanced algorithms. Our results based on synthetic as well as real-world data unravel the intricacies of RF probability estimates, scrutinize the impacts of hyper-parameters, and compare calibration methods in a systematic way. We show that a well-optimized RF performs as well as or better than leading calibration approaches.

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[339]

N. Strauß.
Artificial intelligence for resource allocation tasks.
Dissertation 2024. DOI

Abstract

This thesis presents deep reinforcement learning approaches for complex resource allocation tasks, including discrete, continuous, and resource collection problems. It introduces novel neural architectures achieving state-of-the-art results in spatial resource allocation, multi-agent collection, and dynamic ambulance redeployment, including electric ambulances. For continuous tasks like portfolio optimization, it proposes efficient methods to handle allocation constraints, ensuring compliance during training and deployment. (Shortened).

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

[338]

B. Kühbacher, F. Iglesias-Suarez, N. Kilbertus and V. Eyring.
Towards Physically Consistent Deep Learning For Climate Model Parameterizations.
ICMLA 2024 - 23rd IEEE International Conference on Machine Learning and Applications. Miami, FL, USA, Dec 18-20, 2024. DOI

Abstract

Climate models play a critical role in understanding and projecting climate change. Due to their complexity, their horizontal resolution of about 40-100 km remains too coarse to resolve processes such as clouds and convection, which need to be approximated via parameterizations. These parameterizations are a major source of systematic errors and large uncertainties in climate projections. Deep learning (DL)-based parameterizations, trained on data from computationally expensive short, high-resolution simulations, have shown great promise for improving climate models in that regard. However, their lack of interpretability and tendency to learn spurious non-physical correlations result in reduced trust in the climate simulation. We propose an efficient supervised learning framework for DL-based parameterizations that leads to physically consistent models with improved interpretability and negligible computational overhead compared to standard supervised training. First, key features determining the target physical processes are uncovered. Subsequently, the neural network is fine-tuned using only those relevant features. We show empirically that our method robustly identifies a small subset of the inputs as actual physical drivers, thereby removing spurious non-physical relationships. This results in neural networks that are physically consistent and interpretable by design while maintaining the predictive performance of unconstrained black-box DL-based parameterizations.

MCML Authors

Birgit Kühbacher

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[337]

Abstract

Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data. This vulnerability has led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning attacks, including backdoors, targeting the node features of a given graph. Our certificates are white-box and based upon (i) the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and (ii) a novel reformulation of the bilevel optimization describing poisoning as a mixed-integer linear program. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for NNs, which can be of independent interest beyond graph-related tasks.

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Debarghya Ghoshdastidar

Prof. Dr.

Theoretical Foundations of Artificial Intelligence

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[336]

A. Koebler, T. Decker, I. Thon, V. Tresp and F. Buettner.
Incremental Uncertainty-aware Performance Monitoring with Labeling Intervention.
BDU @NeurIPS 2024 - Workshop Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

We study the problem of monitoring machine learning models under temporal distribution shifts, where circumstances change gradually over time, often leading to unnoticed yet significant declines in accuracy. We propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates model performance by modeling time-dependent shifts using optimal transport. IUPM also quantifies uncertainty in performance estimates and introduces an active labeling strategy to reduce this uncertainty. We further showcase the benefits of IUPM on different datasets and simulated temporal shifts over existing baselines.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[335]

A. White, A. Büttner, M. Gelbrecht, N. Kilbertus, F. Hellmann and N. Boers.
Projected Neural Differential Equations for Power Grid Modeling with Constraints.
D3S3 @NeurIPS 2024 - Workshop on Data-driven and Differentiable Simulations, Surrogates, and Solvers at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Neural differential equations offer a powerful approach for data-driven simulation. However, many applications in science and engineering possess known constraints that should be obeyed by the learned model. We introduce projected neural differential equations (PNDEs), a new method for constraining neural differential equations based on projection of the learned vector field to the tangent space of the constraint manifold. In tests on two challenging examples from power grid modeling, PNDEs outperform existing methods while requiring fewer hyperparameters. Our approach demonstrates significant potential for enhancing the modeling of constrained dynamical systems, particularly in complex domains like power grid dynamics where accuracy and reliability are essential.

MCML Authors

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[334]

E. Ailer, N. Dern, J. Hartford and N. Kilbertus.
Targeted Sequential Indirect Experiment Design.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings, experiments cannot be conducted directly on the target variables of interest, but are indirect. Therefore, they perturb the target variable, but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are multi-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground truth mechanism in terms of sequentially narrowing the gap between an upper and lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable analytical kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.

MCML Authors

Elisabeth Ailer

* Former Member

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[333]

Abstract

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Vincent Fortuin

Dr.

Bayesian Deep Learning

[332]

A. Javanmardi, D. Stutz and E. Hüllermeier.
Conformalized Credal Set Predictors.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[331]

M. Kollovieh, B. Charpentier, D. Zügner and S. Günnemann.
Expected Probabilistic Hierarchies.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Hierarchical clustering has usually been addressed by discrete optimization using heuristics or continuous optimization of relaxed scores for hierarchies. In this work, we propose to optimize expected scores under a probabilistic model over hierarchies. (1) We show theoretically that the global optimal values of the expected Dasgupta cost and Tree-Sampling divergence (TSD), two unsupervised metrics for hierarchical clustering, are equal to the optimal values of their discrete counterparts contrary to some relaxed scores. (2) We propose Expected Probabilistic Hierarchies (EPH), a probabilistic model to learn hierarchies in data by optimizing expected scores. EPH uses differentiable hierarchy sampling enabling end-to-end gradient descent based optimization, and an unbiased subgraph sampling approach to scale to large datasets. (3) We evaluate EPH on synthetic and real-world datasets including vector and graph datasets. EPH outperforms all other approaches quantitatively and provides meaningful hierarchies in qualitative evaluations.

MCML Authors

Marcel Kollovieh

Data Analytics & Machine Learning

Daniel Zügner

Dr.

* Former Member

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[330]

G. Ma, Y. Wang, D. Lim, S. Jegelka and Y. Wang.
A Canonicalization Perspective on Invariant and Equivariant Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods have emerged as a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonicalization perspective that provides an essential and complete view of the design of frames. Canonicalization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods – some are even optimal – both theoretically and empirically. The reduction to the canonicalization perspective further uncovers equivalences between previous methods. These observations suggest that canonicalization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods.
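The idea of attaining invariance by mapping inputs to canonical forms can be sketched for the simplest case, permutation invariance, where sorting yields a canonical representative. This is only a toy illustration under assumptions: it is unrelated to the eigenvector frames designed in the paper, and all names are hypothetical.

```python
def canonicalize(x):
    """Canonical form of a sequence under permutations: its sorted tuple."""
    return tuple(sorted(x))


def make_invariant(f):
    """Wrap any function so that it only ever sees canonical forms."""
    return lambda x: f(canonicalize(x))


# Hypothetical order-sensitive function: a position-weighted sum.
def weighted_sum(x):
    return sum((i + 1) * v for i, v in enumerate(x))


invariant_sum = make_invariant(weighted_sum)
```

Here `weighted_sum([3, 1, 2])` and `weighted_sum([1, 3, 2])` differ, whereas `invariant_sum` returns the same value for both orderings because each input is first mapped to `(1, 2, 3)`.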

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[329]

M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer and E. Hüllermeier.
shapiq: Shapley Interactions for Machine Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[328]

Y. Wang, K. Hu, S. Gupta, Z. Ye, Y. Wang and S. Jegelka.
Understanding the Role of Equivariance in Self-supervised Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL) that learns features to be augmentation-aware. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy effect encourages models to extract class-relevant features to improve their equivariant prediction, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding of the role of equivariance would inspire more principled and advanced designs in this field.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[327]

Y. Wang, Y. Wu, Z. Wei, S. Jegelka and Y. Wang.
A Theoretical Understanding of Self-Correction through In-context Alignment.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Going beyond mimicking limited human experiences, recent studies show initial evidence that, like humans, large language models (LLMs) are capable of improving their abilities purely by self-correction, i.e., correcting previous responses through self-examination, in certain circumstances. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Notably, going beyond previous theories on over-simplified linear transformers, our theoretical construction underpins the roles of several key designs of realistic transformers for self-correction: softmax attention, multi-head attention, and the MLP block. We validate these findings extensively on synthetic datasets. Inspired by these findings, we also illustrate novel applications of self-correction, such as defending against LLM jailbreaks, where a simple self-correction step does make a large difference. We believe that these findings will inspire further research on understanding, exploiting, and enhancing self-correction for building better foundation models.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[326]

D. Winkel, N. Strauß, M. Bernhard, Z. Li, T. Seidl and M. Schubert.
Autoregressive Policy Optimization for Constrained Allocation Tasks.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark.

MCML Authors

David Winkel

Database Systems and Data Mining AI Lab

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Maximilian Bernhard

Dr.

* Former Member

Zongyue Li

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[325]

M. Yau, N. Karalias, E. Lu, J. Xu and S. Jegelka.
Are Graph Neural Networks Optimal Approximation Algorithms?
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN’s ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[324]

C. Leiber, N. Strauß, M. Schubert and T. Seidl.
Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters.
DLC @ICDM 2024 - 6th Workshop on Deep Learning and Clustering at the 24th IEEE International Conference on Data Mining (ICDM 2024). Abu Dhabi, United Arab Emirates, Dec 09-12, 2024. DOI GitHub

Abstract

Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separated from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components.

MCML Authors

Collin Leiber

Dr.

* Former Member

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[323]

M. Bernhard.
Deep learning methods for image recognition in remote sensing.
Dissertation 2024. DOI

Abstract

In this dissertation, we present solutions to various image recognition problems in remote sensing. Thereby, we harness the characteristics of remote sensing images and address specific challenges coming with remote sensing images. Overall, the methods presented in this dissertation cover the tasks of image classification, object detection, semantic segmentation, and change detection, as well as learning settings with full, incomplete, and noisy supervision. (Shortened).

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

[322]

A. Beer, P. Weber, L. Miklautz, C. Leiber, W. Durani, C. Böhm and C. Plant.
SHADE: Deep Density-based Clustering.
ICDM 2024 - 24th IEEE International Conference on Data Mining. Abu Dhabi, United Arab Emirates, Dec 09-12, 2024. DOI

Abstract

Detecting arbitrarily shaped clusters in high-dimensional noisy data is challenging for current clustering methods. We introduce SHADE (Structure-preserving High-dimensional Analysis with Density-based Exploration), the first deep clustering algorithm that incorporates density-connectivity into its loss function. Similar to existing deep clustering algorithms, SHADE supports high-dimensional and large data sets with the expressive power of a deep autoencoder. In contrast to most existing deep clustering methods that rely on a centroid-based clustering objective, SHADE incorporates a novel loss function that captures density-connectivity. SHADE thereby learns a representation that enhances the separation of density-connected clusters. SHADE detects a stable clustering and noise points fully automatically without any user input. It outperforms existing methods in clustering quality, especially on data that contain non-Gaussian clusters, such as video data. Moreover, the embedded space of SHADE is suitable for visualization and interpretation of the clustering results as the individual shapes of the clusters are preserved.

MCML Authors

Anna Beer

Dr.

* Former Member

Collin Leiber

Dr.

* Former Member

Walid Durani

Database Systems and Data Mining AI Lab

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[321]

T. Hannan, R. Koner, M. Bernhard, S. Shit, B. Menze, V. Tresp, M. Schubert and T. Seidl.
GRAtt-VIS: Gated Residual Attention for Video Instance Segmentation.
ICPR 2024 - 27th International Conference on Pattern Recognition. Kolkata, India, Dec 01-05, 2024. DOI GitHub

Abstract

Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts the unrepresentative instance queries in the self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as GRAtt block, which can easily be integrated into the existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods.

MCML Authors

Tanveer Hannan

Database Systems and Data Mining AI Lab

Rajat Koner

Database Systems and Data Mining AI Lab

Maximilian Bernhard

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[320]

N. Saberi, M. H. Shaker, C. R. Duguay, K. A. Scott and E. Hüllermeier.
Uncertainty Estimation of Lake Ice Cover Maps From a Random Forest Classifier Using MODIS TOA Reflectance Data.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (Dec. 2024). DOI

Abstract

This article presents a method to improve the usability of lake ice cover (LIC) maps generated from moderate resolution imaging spectroradiometer (MODIS) top-of-atmosphere reflectance data by providing estimates of aleatoric and epistemic uncertainty. We used a random forest (RF) classifier, which has been shown to have superior performance in classifying lake ice, open water, and clouds, to generate daily LIC maps with inherent (aleatoric) and model (epistemic) uncertainties. RF allows for the learning of different hypotheses (trees), producing diverse predictions that can be utilized to quantify aleatoric and epistemic uncertainty. We use a decomposition of Shannon entropy to quantify these uncertainties and apply pixel-based uncertainty estimation. Our results show that using uncertainty values to reject the classification of uncertain pixels significantly improves recall and precision. The method presented herein is under consideration for integration into the processing chain implemented for the production of daily LIC maps as part of the European Space Agency’s Climate Change Initiative (CCI+) Lakes project.
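One common way to decompose Shannon entropy over an ensemble such as a random forest is sketched below: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual trees, and epistemic uncertainty is their difference. Whether the article uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def decompose_uncertainty(member_probs):
    """Split the total uncertainty of an ensemble prediction for one pixel.

    member_probs: one class-probability vector per ensemble member (tree).
    Returns (total, aleatoric, epistemic) in bits.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n for c in range(k)]
    total = entropy(mean)                                     # entropy of the averaged prediction
    aleatoric = sum(entropy(m) for m in member_probs) / n     # average per-member entropy
    epistemic = total - aleatoric                             # disagreement between members
    return total, aleatoric, epistemic


# Hypothetical pixels: trees that disagree confidently vs. trees that agree on a coin flip.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
```

In the first case all uncertainty is epistemic (the trees contradict each other); in the second it is entirely aleatoric (every tree reports the same ambiguity). Rejecting pixels whose uncertainty exceeds a chosen threshold corresponds to the rejection strategy described above.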

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

A3 | Computational Models
→ Group Stephan Günnemann

Artificial Intelligence and Machine Learning

[319]

M. Kollovieh, L. Gosch, M. Lienen, Y. Scholten, L. Schwinn and S. Günnemann.
Assessing Robustness via Score-Based Adversarial Image Generation.
Transactions on Machine Learning Research (Dec. 2024). URL

Abstract

Most adversarial attacks and defenses focus on perturbations within small $\ell_p$-norm constraints. However, $\ell_p$ threat models cannot capture all relevant semantics-preserving perturbations, and hence, the scope of robustness evaluations is limited. In this work, we introduce Score-Based Adversarial Generation (ScoreAG), a novel framework that leverages the advancements in score-based generative models to generate unrestricted adversarial examples that overcome the limitations of $\ell_p$-norm constraints. Unlike traditional methods, ScoreAG maintains the core semantics of images while generating adversarial examples, either by transforming existing images or synthesizing new ones entirely from scratch. We further exploit the generative capability of ScoreAG to purify images, empirically enhancing the robustness of classifiers. Our extensive empirical evaluation demonstrates that ScoreAG improves upon the majority of state-of-the-art attacks and defenses across multiple benchmarks. This work highlights the importance of investigating adversarial examples bounded by semantics rather than $\ell_p$-norm constraints. ScoreAG represents an important step towards more encompassing robustness assessments.

MCML Authors

Marcel Kollovieh

Data Analytics & Machine Learning

Lukas Gosch

A3 | Computational Models
→ Group Stephan Günnemann

Data Analytics & Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[318]

F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer and J. Herbinger.
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory.
Preprint (Dec. 2024). arXiv

Abstract

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Julia Herbinger

Dr.

* Former Member

[317]

Y. Liu, Y. Zhang, Q. Li, T. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy.
EMNLP 2024 - Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Full-parameter fine-tuning has become the go-to choice for adapting language models (LMs) to downstream tasks due to its excellent performance. As LMs grow in size, fine-tuning the full parameters of LMs requires a prohibitively large amount of GPU memory. Existing approaches utilize a zeroth-order optimizer to conserve GPU memory, which can potentially compromise the performance of LMs as non-zero order optimizers tend to converge more readily on most downstream tasks. In this paper, we propose a novel optimizer-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. HiFT can significantly reduce the amount of gradients and optimizer state parameters residing in GPU memory at the same time, thereby reducing GPU memory usage. Our results demonstrate that: (1) HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning. (2) HiFT supports various optimizers including AdamW, AdaGrad, SGD, etc. (3) HiFT can save more than 60% GPU memory compared with standard full-parameter fine-tuning for a 7B model. (4) HiFT enables full-parameter fine-tuning of a 7B model on a single 48G A6000 with a precision of 32 using the AdamW optimizer, without using any memory saving techniques.
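The idea of updating only a subset of parameters at each training step can be sketched as follows. This is a toy illustration under stated assumptions, not the HiFT implementation: parameters are plain lists of floats, the grouping and the round-robin schedule are made up for the example, and plain SGD stands in for whatever optimizer is used. The name `hierarchical_sgd_step` is hypothetical.

```python
def hierarchical_sgd_step(params, grads, groups, step, lr=0.1):
    """Apply an SGD update to one parameter group only.

    params, grads: dicts mapping a layer name to a list of floats (assumed format).
    groups: list of lists of layer names; group `step % len(groups)` is active.
    Layers outside the active group are copied unchanged, so no gradient or
    optimizer state would need to be kept for them in this step.
    """
    active = set(groups[step % len(groups)])
    updated = {}
    for name, weights in params.items():
        if name in active:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)
    return updated


params = {"embed": [1.0], "layer1": [2.0], "head": [3.0]}
grads = {"embed": [1.0], "layer1": [1.0], "head": [1.0]}
groups = [["head"], ["layer1"], ["embed"]]  # hypothetical update order

for step in range(3):
    params = hierarchical_sgd_step(params, grads, groups, step)
    print(step, params)
```

After one full cycle over the groups every parameter has been updated once, while each individual step touches only one group.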

MCML Authors

Yongkang Liu

Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

* Former Member

Tong Liu

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[316]

Z. Ding, J. Wu, J. Wu, Y. Xia and V. Tresp.
Temporal Fact Reasoning over Hyper-Relational Knowledge Graphs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. Meanwhile, as discussed in recent works that focus on temporal KGs (TKGs), world knowledge is ever-evolving, making it important to reason over temporal facts in KGs. Previous mainstream benchmark HKGs do not explicitly specify temporal information for each HKG fact. Therefore, almost all existing HKG reasoning approaches do not devise any module specifically for temporal reasoning. To better study temporal fact reasoning over HKGs, we propose a new type of data structure named hyper-relational TKG (HTKG). Every fact in an HTKG is coupled with a timestamp explicitly indicating its time validity. We develop two new benchmark HTKG datasets, i.e., Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models hyper-relational temporal facts. To support future research on this topic, we open-source our datasets and model.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Yan Xia

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[315]

R. Liao, M. Erler, H. Wang, G. Zhai, G. Zhang, Y. Ma and V. Tresp.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI GitHub

Abstract

In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[314]

H. Zhang, J. Liu, Z. Han, S. Chen, B. He, V. Tresp, Z. Xu and J. Gu.
Visual Question Decomposition on Multimodal Large Language Models.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Question decomposition has emerged as an effective strategy for prompting Large Language Models (LLMs) to answer complex questions. However, while existing methods primarily focus on unimodal language models, the question decomposition capability of Multimodal Large Language Models (MLLMs) has yet to be explored. To this end, this paper explores visual question decomposition on MLLMs. Specifically, we introduce a systematic evaluation framework including a dataset and several evaluation criteria to assess the quality of the decomposed sub-questions, revealing that existing MLLMs struggle to produce high-quality sub-questions. To address this limitation, we propose a specific finetuning dataset, DecoVQA+, for enhancing the model’s question decomposition capability. Aiming at enabling models to perform appropriate selective decomposition, we propose an efficient finetuning pipeline. The finetuning pipeline consists of our proposed dataset and a training objective for selective decomposition. Finetuned MLLMs demonstrate significant improvements in the quality of sub-questions and the policy of selective question decomposition. Additionally, the models also achieve higher accuracy with selective decomposition on VQA benchmark datasets.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[313]

I. M. Grigore, G. M. Tavares and S. Barbon Junior.
Beyond Flattening: Detecting Concurrency Anomalies Using K-NN Graph-Based Modeling in Object-Centric Event Logs.
DATAMOD @SEFM 2024 - 12th International Symposium From Data to Models and Back at the 22nd International Conference of Software Engineering and Formal Methods (SEFM 2024). Aveiro, Portugal, Nov 04-05, 2024. DOI

Abstract

Detecting anomalous executions is essential in today’s dynamic and diverse business environments. It plays a pivotal role in identifying inefficiencies, ensuring compliance, and mitigating risks associated with deviations from standard procedures. Traditional process mining techniques generally assume a linear sequence of events. However, real-world processes often present concurrency, characterized by the parallel execution of multiple activities or cases and complex interactions among events. Conventional linear models do not map these behaviors and therefore fail to capture the dynamic nature of process flows accurately. To tackle this challenge, this study proposes a new approach for detecting concurrency anomalies using a K-NN graph-based model, overcoming the traditional flattening method. In our experiments, we explored object-centric event logs with different types of concurrency anomalies and compared them to the traditional flattening procedure. Our proposal was able to provide comprehensive and precise communities (clusters) of anomalous variants compared to the baseline.

MCML Authors

Gabriel Marques Tavares

Dr.

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

[312]

M. Bernhard, T. Hannan, N. Strauß and M. Schubert.
Context Matters: Leveraging Spatiotemporal Metadata for Semi-Supervised Learning on Remote Sensing Images.
ECAI 2024 - 27th European Conference on Artificial Intelligence. Santiago de Compostela, Spain, Oct 19-24, 2024. DOI GitHub

Abstract

Remote sensing projects typically generate large amounts of imagery that can be used to train powerful deep neural networks. However, the amount of labeled images is often small, as remote sensing applications generally require expert labelers. Thus, semi-supervised learning (SSL), i.e., learning with a small pool of labeled and a larger pool of unlabeled data, is particularly useful in this domain. Current SSL approaches generate pseudo-labels from model predictions for unlabeled samples. As the quality of these pseudo-labels is crucial for performance, utilizing additional information to improve pseudo-label quality yields a promising direction. For remote sensing images, geolocation and recording time are generally available and provide a valuable source of information as semantic concepts, such as land cover, are highly dependent on spatiotemporal context, e.g., due to seasonal effects and vegetation zones. In this paper, we propose to exploit spatiotemporal metainformation in SSL to improve the quality of pseudo-labels and, therefore, the final model performance. We show that directly adding the available metadata to the input of the predictor at test time degenerates the prediction quality for metadata outside the spatiotemporal distribution of the training set. Thus, we propose a teacher-student SSL framework where only the teacher network uses metainformation to improve the quality of pseudo-labels on the training set. Correspondingly, our student network benefits from the improved pseudo-labels but does not receive metadata as input, making it invariant to spatiotemporal shifts at test time. Furthermore, we propose methods for encoding and injecting spatiotemporal information into the model and introduce a novel distillation mechanism to enhance the knowledge transfer between teacher and student. Our framework dubbed Spatiotemporal SSL can be easily combined with several state-of-the-art SSL methods, resulting in significant and consistent improvements on the BigEarthNet and EuroSAT benchmarks.

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

Tanveer Hannan

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[311]

S. M. A. R. Thies, J. C. Alfaro and V. Bengs.
MORE–PLR: Multi-Output Regression Employed for Partial Label Ranking.
DS 2024 - 27th International Conference on Discovery Science. Pisa, Italy, Oct 14-16, 2024. DOI GitHub

Abstract

The partial label ranking (PLR) problem is a supervised learning scenario where the learner predicts a ranking with ties of the labels for a given input instance. It generalizes the well-known label ranking (LR) problem, which only allows for strict rankings. So far, previous learning approaches for PLR have primarily adapted LR methods to accommodate ties in predictions. This paper proposes using multi-output regression (MOR) to address the PLR problem by treating ranking positions as multivariate targets, an approach that has received little attention in both LR and PLR. To effectively employ this approach, we introduce several post-hoc layers that convert MOR results into a ranking, potentially including ties. This framework produces a range of learning approaches, which we demonstrate in experimental evaluations to be competitive with the current state-of-the-art PLR methods.
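A post-hoc step that turns regression outputs into a ranking with ties might look like the sketch below. It is one simple possibility, assumed for illustration and not taken from the paper: labels are sorted by their predicted position, and a label is tied with its predecessor whenever the two predictions differ by less than a tolerance `tol`. Both `scores_to_partial_ranking` and `tol` are hypothetical names.

```python
def scores_to_partial_ranking(scores, tol=0.5):
    """Convert predicted ranking positions into ranks that may contain ties.

    scores: one predicted position per label (lower means ranked earlier).
    Returns a list of integer ranks starting at 1; labels sharing a rank are tied.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    rank = 1
    for pos, label in enumerate(order):
        # Open a new bucket only if the gap to the previous label is large enough.
        if pos > 0 and scores[label] - scores[order[pos - 1]] >= tol:
            rank += 1
        ranks[label] = rank
    return ranks


# Labels 0 and 1 receive almost the same predicted position and end up tied.
print(scores_to_partial_ranking([1.1, 1.3, 2.9, 2.0]))
```

Because ties are chained through neighbouring labels, a long run of closely spaced predictions collapses into one bucket; other rules (for example rounding each prediction) would behave differently.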

MCML Authors

Viktor Bengs

Dr.

* Former Member

[310]

S. Rauch, C. M. M. Frey, L. Zellner and T. Seidl.
Process-Aware Bayesian Networks for Sequential Event Log Queries.
ICPM 2024 - 6th International Conference on Process Mining. Lyngby, Denmark, Oct 14-18, 2024. DOI

Abstract

Business processes from many domains like manufacturing, healthcare, or business administration suffer from different amounts of uncertainty concerning the execution of individual activities and their order of occurrence. As long as a process is not entirely serial, i.e., there are no forks or decisions to be made along the process execution, we are - in the absence of exhaustive domain knowledge - confronted with the question whether and in what order activities should be executed or left out for a given case and a desired outcome. As the occurrence or non-occurrence of events has substantial implications regarding process key performance indicators like throughput times or scrap rate, there is ample need for assessing and modeling that process-inherent uncertainty. We propose a novel way of handling the uncertainty by leveraging the probabilistic mechanisms of Bayesian Networks to model processes from the structural and temporal information given in event log data and offer a comprehensive evaluation of uncertainty by modelling cases in their entirety. In a thorough analysis of well-established benchmark datasets, we show that our Process-aware Bayesian Network is capable of answering process queries concerned with any unknown process sequence regarding activities and/or attributes enhancing the explainability of processes. Our method can infer execution probabilities of activities at different stages and can query probabilities of certain process outcomes. The key benefit of the Process-aware Query System over existing approaches is the ability to deliver probabilistic, case-diagnostic information about the execution of activities via Bayesian inference.

MCML Authors

Simon Rauch

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Christian Frey

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[309]

A. Maldonado, S. A. Aryasomayajula, C. M. M. Frey and T. Seidl.
iGEDI: interactive Generating Event Data with Intentional Features.
ICPM 2024 - Demo Tracks at the 6th International Conference on Process Mining. Lyngby, Denmark, Oct 14-18, 2024. URL

Abstract

Process mining solutions aim to improve performance, save resources, and address bottlenecks in organizations. However, success depends on data quality and availability, and existing analyses often lack diverse data for rigorous testing. To overcome this, we propose an interactive web application tool, extending the GEDI Python framework, which creates event datasets that meet specific (meta-)features. It provides diverse benchmark event data by exploring new regions within the feature space, enhancing the range and quality of process mining analyses. This tool improves evaluation quality and helps uncover correlations between meta-features and metrics, ultimately enhancing solution effectiveness.

MCML Authors

Andrea Maldonado

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Christian Frey

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[308]

A. Maldonado.
Data-Driven Approaches Towards Transparent Benchmarking of Process Mining Tasks.
ICPM 2024 - Doctoral Consortium at the 6th International Conference on Process Mining. Lyngby, Denmark, Oct 14-18, 2024. URL

Abstract

The abundance of new approaches in process mining and the diversity of processes in the real world raise the question of this thesis: How can we create benchmarks that reliably measure the impact of event data (ED) features on process mining evaluation? Developing benchmarks that employ comprehensive intentional ED and also consider connections between characteristic ED features, methods, and metrics will support process miners in evaluating methods more efficiently and reliably.

MCML Authors

Andrea Maldonado

Database Systems and Data Mining AI Lab

[307]

Z. Xian, L. Zellner, G. M. Tavares and T. Seidl.
CC-HIT: Creating Counterfactuals from High-Impact Transitions.
ML4PM @ICPM 2024 - 4th International Workshop on Leveraging Machine Learning in Process Mining at the 6th International Conference on Process Mining (ICPM 2024). Lyngby, Denmark, Oct 14-18, 2024. DOI

Abstract

Reliable process information, especially regarding trace durations, is crucial for smooth execution. Without it, maintaining a process becomes costly. While many predictive systems aim to identify inefficiencies, they often focus on individual process instances, missing the global perspective. It is essential not only to detect where delays occur but also to pinpoint specific activity transitions causing them. To address this, we propose CC-HIT (Creating Counterfactuals from High-Impact Transitions), which identifies temporal dependencies across the entire process. By focusing on activity transitions, we provide deeper insights into relational impacts, enabling faster resolution of inefficiencies. CC-HIT highlights the most influential transitions on process performance, offering actionable insights for optimization. We validate this method using the BPIC 2020 dataset, demonstrating its effectiveness compared to existing approaches.

MCML Authors

Zhicong Xian

Database Systems and Data Mining AI Lab

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[306]

Abstract

MCML Authors

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Nico Stucki

Applied Topology and Geometry

Vincent Bürgin

Foundations of Deep Neural Networks

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry

[305]

S. Haas, K. Hegestweiler, M. Rapp, M. Muschalik and E. Hüllermeier.
Stakeholder-centric explanations for black-box decisions: an XAI process model and its application to automotive goodwill assessments.
Frontiers in Artificial Intelligence 7 (Oct. 2024). DOI

Abstract

Machine learning has made tremendous progress in predictive performance in recent years. Despite these advances, employing machine learning models in high-stake domains remains challenging due to the opaqueness of many high-performance models. If their behavior cannot be analyzed, this likely decreases the trust in such models and hinders the acceptance of human decision-makers. Motivated by these challenges, we propose a process model for developing and evaluating explainable decision support systems that are tailored to the needs of different stakeholders. To demonstrate its usefulness, we apply the process model to a real-world application in an enterprise context. The goal is to increase the acceptance of an existing black-box model developed at a car manufacturer for supporting manual goodwill assessments. Following the proposed process, we conduct two quantitative surveys targeted at the application’s stakeholders. Our study reveals that textual explanations based on local feature importance best fit the needs of the stakeholders in the considered use case. Specifically, our results show that all stakeholders, including business specialists, goodwill assessors, and technical IT experts, agree that such explanations significantly increase their trust in the decision support system. Furthermore, our technical evaluation confirms the faithfulness and stability of the selected explanation method. These practical findings demonstrate the potential of our process model to facilitate the successful deployment of machine learning models in enterprise settings. The results emphasize the importance of developing explanations that are tailored to the specific needs and expectations of diverse stakeholders.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[304]

K. Gatmiry, Z. Li, S. J. Reddi and S. Jegelka.
Simplicity Bias via Global Convergence of Sharpness Minimization.
Preprint (Oct. 2024). arXiv

Abstract

The remarkable generalization ability of neural networks is usually attributed to the implicit bias of SGD, which often yields models with lower complexity using simpler (e.g. linear) and low-rank features. Recent works have provided empirical and theoretical evidence for the bias of particular variants of SGD (such as label noise SGD) toward flatter regions of the loss landscape. Despite the folklore intuition that flat solutions are ‘simple’, the connection with the simplicity of the final trained model (e.g. low-rank) is not well understood. In this work, we take a step toward bridging this gap by studying the simplicity structure that arises from minimizers of the sharpness for a class of two-layer neural networks. We show that, for any high dimensional training data and certain activations, with small enough step size, label noise SGD always converges to a network that replicates a single linear feature across all neurons; thereby, implying a simple rank one feature matrix. To obtain this result, our main technical contribution is to show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks. Along the way, we discover a novel property – a local geodesic convexity – of the trace of Hessian of the loss at approximate stationary points on the manifold of zero loss, which links sharpness to the geometry of the manifold. This tool may be of independent interest.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[303]

K. Gatmiry, N. Saunshi, S. J. Reddi, S. Jegelka and S. Kumar.
On the Role of Depth and Looping for In-Context Learning with Task Diversity.
Preprint (Oct. 2024). arXiv

Abstract

The intriguing in-context learning (ICL) abilities of deep Transformer models have lately garnered significant attention. By studying in-context linear regression on unimodal Gaussian data, recent empirical and theoretical works have argued that ICL emerges from Transformers’ abilities to simulate learning algorithms like gradient descent. However, these works fail to capture the remarkable ability of Transformers to learn multiple tasks in context. To this end, we study in-context learning for linear regression with diverse tasks, characterized by data covariance matrices with condition numbers ranging from $[1, \kappa]$, and highlight the importance of depth in this setting. More specifically, (a) we show theoretical lower bounds of $\log(\kappa)$ (or $\sqrt{\kappa}$) linear attention layers in the unrestricted (or restricted) attention setting and, (b) we show that multilayer Transformers can indeed solve such tasks with a number of layers that matches the lower bounds. However, we show that this expressivity of multilayer Transformer comes at the price of robustness. In particular, multilayer Transformers are not robust to even distributional shifts as small as $O(e^{-L})$ in Wasserstein distance, where $L$ is the depth of the network. We then demonstrate that Looped Transformers – a special class of multilayer Transformers with weight-sharing – not only exhibit similar expressive power but are also provably robust under mild assumptions. Besides out-of-distribution generalization, we also show that Looped Transformers are the only models that exhibit a monotonic behavior of loss with respect to depth.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[302]

T. Putterman, D. Lim, Y. Gelberg, S. Jegelka and H. Maron.
Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models.
Preprint (Oct. 2024). arXiv

Abstract

Low-rank adaptations (LoRAs) have revolutionized the finetuning of large foundation models, enabling efficient adaptation even with limited computational resources. The resulting proliferation of LoRAs presents exciting opportunities for applying machine learning techniques that take these low-rank weights themselves as inputs. In this paper, we investigate the potential of Learning on LoRAs (LoL), a paradigm where LoRA weights serve as input to machine learning models. For instance, an LoL model that takes in LoRA weights as inputs could predict the performance of the finetuned model on downstream tasks, detect potentially harmful finetunes, or even generate novel model edits without traditional training methods. We first identify the inherent parameter symmetries of low rank decompositions of weights, which differ significantly from the parameter symmetries of standard neural networks. To efficiently process LoRA weights, we develop several symmetry-aware invariant or equivariant LoL models, using tools such as canonicalization, invariant featurization, and equivariant layers. We finetune thousands of text-to-image diffusion models and language models to collect datasets of LoRAs. In numerical experiments on these datasets, we show that our LoL architectures are capable of processing low rank weight decompositions to predict CLIP score, finetuning data attributes, finetuning data membership, and accuracy on downstream tasks.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[301]

T. Schwarz, C. Casolo and N. Kilbertus.
Uncertainty-Aware Optimal Treatment Selection for Clinical Time Series.
Preprint (Oct. 2024). arXiv

Abstract

In personalized medicine, the ability to predict and optimize treatment outcomes across various time frames is essential. Additionally, the ability to select cost-effective treatments within specific budget constraints is critical. Despite recent advancements in estimating counterfactual trajectories, a direct link to optimal treatment selection based on these estimates is missing. This paper introduces a novel method integrating counterfactual estimation techniques and uncertainty quantification to recommend personalized treatment plans adhering to predefined cost constraints. Our approach is distinctive in its handling of continuous treatment variables and its incorporation of uncertainty quantification to improve prediction reliability. We validate our method using two simulated datasets, one focused on the cardiovascular system and the other on COVID-19. Our findings indicate that our method has robust performance across different counterfactual estimation baselines, showing that introducing uncertainty quantification in these settings helps the current baselines in finding more reliable and accurate treatment selection. The robustness of our method across various settings highlights its potential for broad applicability in personalized healthcare solutions.

MCML Authors

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[300]

Y. Sun, Z. Wu, Y. Ma and V. Tresp.
Quantum Architecture Search with Unsupervised Representation Learning.
Preprint (Oct. 2024). arXiv

Abstract

Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.

MCML Authors

Yize Sun

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[299]

A. White, A. Büttner, M. Gelbrecht, V. Duruisseaux, N. Kilbertus, F. Hellmann and N. Boers.
Projected Neural Differential Equations for Learning Constrained Dynamics.
Preprint (Oct. 2024). arXiv

Abstract

Neural differential equations offer a powerful approach for learning dynamics from data. However, they do not impose known constraints that should be obeyed by the learned model. It is well-known that enforcing constraints in surrogate models can enhance their generalizability and numerical stability. In this paper, we introduce projected neural differential equations (PNDEs), a new method for constraining neural differential equations based on projection of the learned vector field to the tangent space of the constraint manifold. In tests on several challenging examples, including chaotic dynamical systems and state-of-the-art power grid models, PNDEs outperform existing methods while requiring fewer hyperparameters. The proposed approach demonstrates significant potential for enhancing the modeling of constrained dynamical systems, particularly in complex domains where accuracy and reliability are essential.
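The projection step described in the abstract can be illustrated for the simplest case of a single scalar constraint g(x) = 0. The sketch assumes this one-constraint setting and a hand-written vector field in place of a learned one; it removes the component of the vector field along the constraint gradient, which leaves a vector in the tangent space. The name `project_to_tangent` is hypothetical, and the paper's general construction for several constraints is not reproduced here.

```python
def project_to_tangent(f, grad_g):
    """Project vector f onto the tangent space {v : grad_g . v = 0}.

    f: value of the (learned) vector field at a point, as a list of floats.
    grad_g: gradient of the constraint function g at the same point.
    """
    dot_fg = sum(a * b for a, b in zip(f, grad_g))
    dot_gg = sum(b * b for b in grad_g)
    coeff = dot_fg / dot_gg  # size of the component along the constraint normal
    return [a - coeff * b for a, b in zip(f, grad_g)]


# Constraint: stay on the unit circle, g(x) = x0^2 + x1^2 - 1, so grad g = 2x.
x = [1.0, 0.0]
grad_g = [2.0 * xi for xi in x]
f = [3.0, 4.0]  # stand-in for a neural network output at x

v = project_to_tangent(f, grad_g)
print(v)                                        # the radial component is removed
print(sum(a * b for a, b in zip(v, grad_g)))    # zero: v is tangent to the circle
```

Integrating the projected field keeps the constraint satisfied up to the error of the numerical solver, since the direction that would change g has been removed.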

MCML Authors

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[298]

M. Yau, E. Akyürek, J. Mao, J. B. Tenenbaum, S. Jegelka and J. Andreas.
Learning Linear Attention in Polynomial Time.
Preprint (Oct. 2024). arXiv

Abstract

Previous research has explored the computational expressivity of Transformer models in simulating Boolean circuits or Turing machines. However, the learnability of these simulators from observational data has remained an open question. Our study addresses this gap by providing the first polynomial-time learnability results (specifically strong, agnostic PAC learning) for single-layer Transformers with linear attention. We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS. As a consequence, the problem of learning any linear transformer may be converted into the problem of learning an ordinary linear predictor in an expanded feature space, and any such predictor may be converted back into a multiheaded linear transformer. Moving to generalization, we show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent (up to trivial symmetries) to the linear Transformer that generated the data, thereby guaranteeing the learned model will correctly generalize across all inputs. Finally, we provide examples of computations expressible via linear attention and therefore polynomial-time learnable, including associative memories, finite automata, and a class of Universal Turing Machines (UTMs) with polynomially bounded computation histories. We empirically validate our theoretical findings on three tasks: learning random linear attention networks, key–value associations, and learning to execute finite automata. Our findings bridge a critical gap between theoretical expressivity and learnability of Transformers, and show that flexible and general models of computation are efficiently learnable.

MCML Authors

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks

[297]

T. Hannan, M. M. Islam, T. Seidl and G. Bertasius.
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI GitHub

Abstract

Locating specific moments within long videos (20–120 min) presents a significant challenge, akin to finding a needle in a haystack. Adapting existing short video (5–30 s) grounding methods to this problem yields poor performance. Since most real-life videos, such as those on YouTube and AR/VR, are lengthy, addressing this issue is crucial. Existing methods typically operate in two stages: clip retrieval and grounding. However, this disjoint process limits the retrieval module’s fine-grained event understanding, crucial for specific moment detection. We propose RGNet which deeply integrates clip retrieval and grounding into a single network capable of processing long videos into multiple granular levels, e.g., clips and frames. Its core component is a novel transformer encoder, RG-Encoder, that unifies the two stages through shared features and mutual optimization. The encoder incorporates a sparse attention mechanism and an attention loss to model both granularity jointly. Moreover, we introduce a contrastive clip sampling technique to mimic the long video paradigm closely during training. RGNet surpasses prior methods, showcasing state-of-the-art performance on long video temporal grounding (LVTG) datasets MAD and Ego4D.

MCML Authors

Tanveer Hannan

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[296]

G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI

Abstract

MCML Authors

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Ruotong Liao

Database Systems and Data Mining AI Lab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Federico Tombari

PD Dr.

Computer Aided Medical Procedures & Augmented Reality

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality

[295]

M. C. da Silva, B. Licari, G. M. Tavares and S. Barbon Junior.
Benchmarking AutoML Clustering Frameworks.
AutoML 2024 - ABCD Track - International Conference on Automated Machine Learning. Paris, France, Sep 09-12, 2024. URL

Abstract

The surge of frameworks for automated unsupervised clustering problems exposed the notable gap in performance assessment, unified datasets and methodologies for this field. The lack of standardization and proper clustering goal setting obscures the applicability and suitability of such solutions. Therefore, we propose a benchmark to bridge this gap by offering a comparative analysis of AutoML frameworks for clustering, using several criteria and a comprehensive set of benchmarking problems. Four prominent AutoML unsupervised frameworks (AutoML4Clust, Autocluster, cSmartML, and ML2DAC) were compared following our methodology. By extending the evaluation beyond quantitative metrics, this research contributes to a more nuanced understanding of the applicability and performance of AutoML for a diverse set of clustering problems. Our analysis shows the evident demand for effort in the direction of pipeline synthesis (i.e., search and optimization of complete pipelines), clustering goal definition and suitable analysis dimensions.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

[294]

C. Damke and E. Hüllermeier.
CUQ-GNN: Committee-Based Graph Uncertainty Quantification Using Posterior Networks.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

In this work, we study the influence of domain-specific characteristics when defining a meaningful notion of predictive uncertainty on graph data. Previously, the so-called Graph Posterior Network (GPN) model has been proposed to quantify uncertainty in node classification tasks. Given a graph, it uses Normalizing Flows (NFs) to estimate class densities for each node independently and converts those densities into Dirichlet pseudo-counts, which are then dispersed through the graph using the personalized Page-Rank (PPR) algorithm. The architecture of GPNs is motivated by a set of three axioms on the properties of its uncertainty estimates. We show that those axioms are not always satisfied in practice and therefore propose the family of Committee-based Uncertainty Quantification Graph Neural Networks (CUQ-GNNs), which combine standard Graph Neural Networks (GNNs) with the NF-based uncertainty estimation of Posterior Networks (PostNets). This approach adapts more flexibly to domain-specific demands on the properties of uncertainty estimates. We compare CUQ-GNN against GPN and other uncertainty quantification approaches on common node classification benchmarks and show that it is effective at producing useful uncertainty estimates.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[293]

R. Fischer, M. Wever, S. Buschjäger and T. Liebig.
MetaQuRe: Meta-learning from Model Quality and Resource Consumption.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

Automated machine learning (AutoML) allows for selecting, parametrizing, and composing learning algorithms for a given data set. While resources play a pivotal role in neural architecture search, it is less pronounced by classical AutoML approaches. In fact, they generally focus on only maximizing predictive quality and disregard the importance of finding resource-efficient solutions. To push resource awareness further, our work explicitly explores how measures such as running time or energy consumption can be better considered in AutoML. Firstly, we propose a novel method for algorithm selection that balances multiple performance aspects (including resource demand) as prioritized by the user with the help of compositional meta-learning. Secondly, to foster research on green meta-learning and AutoML, we release the MetaQuRe data set, which contains information on predictive (Qu)ality and (Re)source consumption of models evaluated across hundreds of data sets and four execution environments. We use this data to put our methodology into practice and conduct an in-depth analysis of how our approach and data set can help in making AutoML more resource-aware, which represents our third contribution. Lastly, we publish MetaQuRe alongside an extensive code base, allowing for reproducing all results, expanding our data with results from custom environments, and exploring MetaQuRe interactively. In short, our work demonstrates both the importance as well as benefits of rethinking AutoML and meta-learning in a resource-aware way, thus paving the path for making future ML solutions more sustainable.

MCML Authors

Marcel Wever

Dr.

* Former Member

[292]

S. Gilhuber, A. Beer, Y. Ma and T. Seidl.
FALCUN: A Simple and Efficient Deep Active Learning Strategy.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

We propose FALCUN, a novel deep batch active learning method that is label- and time-efficient. Our proposed acquisition uses a natural, self-adjusting balance of uncertainty and diversity: It slowly transitions from emphasizing uncertain instances at the decision boundary to emphasizing batch diversity. In contrast, established deep active learning methods often have a fixed weighting of uncertainty and diversity, limiting their effectiveness over diverse data sets exhibiting different characteristics. Moreover, to increase diversity, most methods demand intensive search through a deep neural network’s high-dimensional latent embedding space. This leads to high acquisition times when experts are idle while waiting for the next batch for annotation. We overcome this structural problem by exclusively operating on the low-dimensional probability space, yielding much faster acquisition times without sacrificing label efficiency. In extensive experiments, we show FALCUN’s suitability for diverse use cases, including medical images and tabular data. Compared to state-of-the-art methods like BADGE, CLUE, and AlfaMix, FALCUN consistently excels in quality and speed: while FALCUN is among the fastest methods, it has the highest average label efficiency.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[291]

P. Jahn, C. M. M. Frey, A. Beer, C. Leiber and T. Seidl.
Data with Density-Based Clusters: A Generator for Systematic Evaluation of Clustering Algorithms.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI GitHub

Abstract

Mining data containing density-based clusters is well-established and widespread but faces problems when it comes to systematic and reproducible comparison and evaluation. Although the success of clustering methods hinges on data quality and availability, reproducibly generating suitable data for this setting is not easy, leading to mostly low-dimensional toy datasets being used. To resolve this issue, we propose DENSIRED (DENSIty-based Reproducible Experimental Data), a novel data generator for data containing density-based clusters. It is highly flexible w.r.t. a large variety of properties of the data and produces reproducible datasets in a two-step approach. First, skeletons of the clusters are constructed following a random walk. In the second step, these skeletons are enriched with data samples. DENSIRED enables the systematic generation of data for a robust and reliable analysis of methods aimed toward examining data containing density-connected clusters. In extensive experiments, we analyze the impact of user-defined properties on the generated datasets and the intrinsic dimensionalities of synthesized clusters.

MCML Authors

Philipp Jahn

Database Systems and Data Mining AI Lab

Collin Leiber

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[290]

Y. Liu, E. Nie, S. Feng, Z. Hua, Z. Ding, D. Wang, Y. Zhang and H. Schütze.
A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI GitHub

Abstract

Current state-of-the-art dialogue systems heavily rely on extensive training datasets. However, challenges arise in domains where domain-specific training datasets are insufficient or entirely absent. To tackle this challenge, we propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMDG. The AMDG framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training. We posit that domain corpora are a blend of domain-agnostic and domain-specific features, with certain representation patterns shared among diverse domains. Domain-agnostic training aims to enable models to learn these common expressive patterns. To construct domain-agnostic dialogue corpora, we employ a de-domaining data processing technique used to remove domain-specific features. By mitigating the effects of domain-specific features, the model trained on the de-domained corpora can effectively learn common expression patterns in different domains. Subsequently, we adapt the learned domain-agnostic features to the target domain through domain adaptation training. We conduct experiments on Chinese dialogue datasets from five different domains and show that AMDG achieves superior performance compared to both direct training on the target domain corpus and collective training on all five domain corpora. Our work underscores AMDG as a viable alternative solution for low-resource multi-domain dialogue generation.

MCML Authors

Yongkang Liu

Dr.

* Former Member

Ercong Nie

Computational Linguistics

Zifeng Ding

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[289]

Abstract

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Hüseyin Anil Gündüz

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Mina Rezaei

Dr.

Statistical Learning and Data Science

[288]

B. Chocholaty, C. Leiber and S. Marburg.
Effects of similarity measures and assignment methods on mode pairing for the application of timber plates.
ISMA 2024 - 31st International Conference on Noise and Vibration Engineering. KU Leuven, Belgium, Sep 09-11, 2024. URL

Abstract

Correctly pairing experimentally and numerically determined mode shapes is crucial for successful model updating. It ensures that the updated model accurately reflects the physical behavior of the structure. This study investigates the two main steps applied for successful mode pairing. First, the correlation between the model and experiments is analyzed using different measures of similarity. Second, based on the computed correlation, a variety of strategies for a correct assignment of the mode pairs is studied. Here, an approach to iteratively combine the mode pairs showing the maximum similarity value in the similarity matrix, an extension additionally using the auto-similarity matrix, the Hungarian method, and a clustering-based approach are investigated. To study the efficacy of the various approaches, the study incorporates an application involving a timber plate. Thus, the effects of employing different similarity measures and pair assignment methods are demonstrated, providing insights for future studies related to mode pairing and model updating.

MCML Authors

Collin Leiber

Dr.

* Former Member

[287]

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Explaining Change in Models and Data with Global Feature Importance and Effects.
TempXAI @ECML-PKDD 2024 - Tutorial-Workshop Explainable AI for Time Series and Data Streams at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. PDF

Abstract

In dynamic machine learning environments, where data streams continuously evolve, traditional explanation methods struggle to remain faithful to the underlying model or data distribution. Therefore, this work presents a unified framework for efficiently computing incremental model-agnostic global explanations tailored for time-dependent models. By extending static model-agnostic methods such as Permutation Feature Importance, SAGE, and Partial Dependence Plots into the online learning context, the proposed framework enables the continuous updating of explanations as new data becomes available. These incremental variants ensure that global explanations remain relevant while minimizing computational overhead. The framework also addresses key challenges related to data distribution maintenance and perturbation generation in online learning, offering time and memory efficient solutions like geometric reservoir-based sampling for data replacement.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[286]

A. Maldonado, C. M. M. Frey, G. M. Tavares, N. Rehwald and T. Seidl.
GEDI: Generating Event Data with Intentional Features for Benchmarking Process Mining.
BPM 2024 - 22nd International Conference on Business Process Management. Krakow, Poland, Sep 01-06, 2024. DOI

Abstract

Process mining solutions include enhancing performance, conserving resources, and alleviating bottlenecks in organizational contexts. However, as in other data mining fields, success hinges on data quality and availability. Existing analyses for process mining solutions lack diverse and ample data for rigorous testing, hindering insights’ generalization. To address this, we propose Generating Event Data with Intentional features, a framework producing event data sets satisfying specific meta-features. Considering the meta-feature space that defines feasible event logs, we observe that existing real-world datasets describe only local areas within the overall space. Hence, our framework aims at providing the capability to generate an event data benchmark, which covers unexplored regions. Therefore, our approach leverages a discretization of the meta-feature space to steer generated data towards regions, where a combination of meta-features is not met yet by existing benchmark datasets. Providing a comprehensive data pool enriches process mining analyses, enables methods to capture a wider range of real-world scenarios, and improves evaluation quality. Moreover, it empowers analysts to uncover correlations between meta-features and evaluation metrics, enhancing explainability and solution effectiveness. Experiments demonstrate GEDI’s ability to produce a benchmark of intentional event data sets and robust analyses for process mining tasks.

MCML Authors

Andrea Maldonado

Database Systems and Data Mining AI Lab

Christian Frey

Dr.

* Former Member

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[285]

R. S. Oyamada, G. M. Tavares, S. B. Junior and P. Ceravolo.
CoSMo: A Framework to Instantiate Conditioned Process Simulation Models.
BPM 2024 - 22nd International Conference on Business Process Management. Krakow, Poland, Sep 01-06, 2024. DOI

Abstract

Process simulation is gaining attention for its ability to assess potential performance improvements and risks associated with business process changes. The existing literature presents various techniques, generally grounded in process models discovered from event log data or built upon deep learning algorithms. These techniques have specific strengths and limitations. Traditional data-driven approaches offer increased interpretability, while deep learning-based approaches excel at generalizing changes across large event logs. However, the practical application of deep learning faces challenges related to managing stochasticity and integrating information for what-if analysis. This paper introduces a novel recurrent neural architecture tailored to discover COnditioned process Simulation MOdels (CoSMo) based on user-based constraints or any other nature of a-priori knowledge. This architecture facilitates the simulation of event logs that adhere to specific constraints by incorporating declarative-based rules into the learning phase as an attempt to fill the gap of incorporating information into deep learning models to perform what-if analysis. Experimental validation illustrates CoSMo’s efficacy in simulating event logs while adhering to predefined declarative conditions, emphasizing both control-flow and data-flow perspectives.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

[284]

P. Kolpaczki, E. Hüllermeier and V. Bengs.
Piecewise-Stationary Dueling Bandits.
Transactions on Machine Learning Research (Sep. 2024). URL

Abstract

We study the piecewise-stationary dueling bandits problem with K arms, where the time horizon T consists of M stationary segments, each of which is associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, despite preference matrices and segment lengths being unknown. We propose the Beat the Winner Reset algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-the-art algorithms. We also show a regret bound for the non-stationary case, without requiring knowledge of M or T. We further propose and analyze two meta-algorithms, DETECT for weak regret and Monitored Dueling Bandits for strong regret, both based on a detection-window approach that can incorporate any dueling bandit algorithm as a black-box algorithm. Finally, we prove a worst-case lower bound for expected weak regret in the non-stationary case.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

[283]

M. C. da Silva, G. M. Tavares, E. Medvet and S. Barbon Junior.
Problem-oriented AutoML in Clustering.
Preprint (Sep. 2024). arXiv

Abstract

The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiveness across diverse clustering tasks. In contrast, PoAC establishes a dynamic connection between the clustering problem, CVIs, and meta-features, allowing users to customize these components based on the specific context and goals of their task. At its core, PoAC employs a surrogate model trained on a large meta-knowledge base of previous clustering datasets and solutions, enabling it to infer the quality of new clustering pipelines and synthesize optimal solutions for unseen datasets. Unlike many AutoML frameworks that are constrained by fixed evaluation metrics and algorithm sets, PoAC is algorithm-agnostic, adapting seamlessly to different clustering problems without requiring additional data or retraining. Experimental results demonstrate that PoAC not only outperforms state-of-the-art frameworks on a variety of datasets but also excels in specific tasks such as data visualization, and highlight its ability to dynamically adjust pipeline configurations based on dataset complexity.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

[282]

T. Decker, A. Koebler, M. Lebacher, I. Thon, V. Tresp and F. Buettner.
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance.
KDD 2024 - 30th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Barcelona, Spain, Aug 25-29, 2024. DOI

Abstract

Monitoring and maintaining machine learning models are among the most critical challenges in translating recent advances in the field into real-world applications. However, current monitoring methods lack the capability to provide actionable insights answering the question of why the performance of a particular model really degraded. In this work, we propose a novel approach to explain the behavior of a black-box model under feature shifts by attributing an estimated performance change to interpretable input characteristics. We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation (XPE). We analyze the underlying assumptions and demonstrate the superiority of our approach over several baselines on different data sets across various data modalities such as images, audio, and tabular data. We also indicate how the generated results can lead to valuable insights, enabling explanatory model monitoring by revealing potential root causes for model deterioration and guiding toward actionable countermeasures.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[281]

T. Liu, I. Škrjanec and V. Demberg.
Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the 'right reasons'?
ACL 2024 - 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand, Aug 11-16, 2024. DOI

Abstract

A wide body of evidence shows that human language processing difficulty is predicted by the information-theoretic measure surprisal, a word’s negative log probability in context. However, it is still unclear how to best estimate these probabilities needed for predicting human processing difficulty – while a long-standing belief held that models with lower perplexity would provide more accurate estimates of word predictability, and therefore lead to better reading time predictions, recent work has shown that for very large models, psycholinguistic predictive power decreases. One reason could be that language models might be more confident of their predictions than humans, because they have had exposure to several magnitudes more data. In this paper, we test what effect temperature-scaling of large language model (LLM) predictions has on surprisal estimates and their predictive power of reading times of English texts. Firstly, we show that calibration of large language models typically improves with model size, i.e. poorer calibration cannot account for poorer fit to reading times. Secondly, we find that temperature-scaling probabilities lead to a systematically better fit to reading times (up to 89% improvement in delta log likelihood), across several reading time corpora. Finally, we show that this improvement in fit is chiefly driven by words that are composed of multiple subword tokens.
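
The two quantities the abstract relies on, surprisal and temperature scaling, can be sketched in a few lines. This is an illustrative sketch only: the logits are made up, the natural logarithm is an assumed choice of base, and the function names are hypothetical rather than taken from the paper.

```python
import math


def temperature_scaled_probs(logits, temperature):
    """Softmax over logits divided by a temperature (T > 1 flattens the distribution)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def surprisal(logits, target_index, temperature=1.0):
    """Negative log probability (in nats) of the target token after scaling."""
    probs = temperature_scaled_probs(logits, temperature)
    return -math.log(probs[target_index])


toy_logits = [2.0, 1.0, 0.0]  # made-up next-token scores
plain = surprisal(toy_logits, 0, temperature=1.0)
scaled = surprisal(toy_logits, 0, temperature=2.0)
```

With a temperature above 1 the model becomes less confident, so the surprisal of its top-ranked token rises while the surprisal of low-ranked tokens falls.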

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

[280]

J. Brandt, M. Wever, V. Bengs and E. Hüllermeier.
Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO.
IJCAI 2024 - 33rd International Joint Conference on Artificial Intelligence. Jeju, Korea, Aug 03-09, 2024. DOI

Abstract

Hyperparameter optimization (HPO) is indispensable for achieving optimal performance in machine learning tasks. A popular class of methods in this regard is based on Successive Halving (SHA), which casts HPO into a pure-exploration multi-armed bandit problem under finite sampling budget constraints. This is accomplished by considering hyperparameter configurations as arms and rewards as the negative validation losses. While enjoying theoretical guarantees as well as working well in practice, SHA comes, however, with several hyperparameters itself, one of which is the maximum budget that can be allocated to evaluate a single arm (hyperparameter configuration). Although there are already solutions to this meta hyperparameter optimization problem, such as the doubling trick or asynchronous extensions of SHA, these are either practically inefficient or lack theoretical guarantees. In this paper, we propose incremental SHA (iSHA), a synchronous extension of SHA, allowing to increase the maximum budget a posteriori while still enjoying theoretical guarantees. Our empirical analysis of HPO problems corroborates our theoretical findings and shows that iSHA is more resource-efficient than existing SHA-based approaches.
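
For orientation, plain Successive Halving (not the incremental iSHA variant proposed in the paper) can be sketched as follows. The toy loss function, the configuration values, and the parameter names are assumptions made for illustration; note that `max_budget` is exactly the kind of extra hyperparameter the abstract refers to.

```python
def successive_halving(configs, evaluate, min_budget, max_budget, eta=2):
    """Keep the best 1/eta of the configurations each round while growing the budget.

    `evaluate(config, budget)` must return a validation loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1 and budget <= max_budget:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]


def toy_loss(config, budget):
    """Hypothetical loss: best at config = 0.3, improving with more budget."""
    return (config - 0.3) ** 2 + 1.0 / budget


candidates = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 0.2]
best = successive_halving(candidates, toy_loss, min_budget=1, max_budget=8)
```

Each configuration plays the role of an arm and the negative loss the role of a reward, which is the bandit view described in the abstract.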

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[279]

Abstract

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[278]

S. Heid, J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Learning decision catalogues for situated decision making: The case of scoring systems.
International Journal of Approximate Reasoning 171 (Aug. 2024). DOI

Abstract

In this paper, we formalize the problem of learning coherent collections of decision models, which we call decision catalogues, and illustrate it for the case where models are scoring systems. This problem is motivated by the recent rise of algorithmic decision-making and the idea to improve human decision-making through machine learning, in conjunction with the observation that decision models should be situated in terms of their complexity and resource requirements: Instead of constructing a single decision model and using this model in all cases, different models might be appropriate depending on the decision context. Decision catalogues are supposed to support a seamless transition from very simple, resource-efficient to more sophisticated but also more demanding models. We present a general algorithmic framework for inducing such catalogues from training data, which tackles the learning task as a problem of searching the space of candidate catalogues systematically and, to this end, makes use of heuristic search methods. We also present a concrete instantiation of this framework as well as empirical studies for performance evaluation, which, in a nutshell, show that greedy search is an efficient and hard-to-beat strategy for the construction of catalogues of scoring systems.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[277]

A. Szałata, K. Hrovatin, S. Becker, A. Tejada-Lapuerta, H. Cui, B. Wang and F. J. Theis.
Transformers in single-cell omics: a review and new perspectives.
Nature Methods 21 (Aug. 2024). DOI

Abstract

Recent efforts to construct reference maps of cellular phenotypes have expanded the volume and diversity of single-cell omics data, providing an unprecedented resource for studying cell properties. Despite the availability of rich datasets and their continued growth, current single-cell models are unable to fully capitalize on the information they contain. Transformers have become the architecture of choice for foundation models in other domains owing to their ability to generalize to heterogeneous, large-scale datasets. Thus, the question arises of whether transformers could set off a similar shift in the field of single-cell modeling. Here we first describe the transformer architecture and its single-cell adaptations and then present a comprehensive review of the existing applications of transformers in single-cell analysis and critically discuss their future potential for single-cell biology. By studying limitations and technical challenges, we aim to provide a structured outlook for future research directions at the intersection of machine learning and single-cell biology.

MCML Authors

Sören Becker

Ethics in Systems Design and Machine Learning

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

[276]

S. Dutta, T. Kaufmann, G. Glavaš, I. Habernal, K. Kersting, F. Kreuter, M. Mezini, I. Gurevych, E. Hüllermeier and H. Schütze.
Problem Solving Through Human-AI Preference-Based Cooperation.
Preprint (Aug. 2024). arXiv

Abstract

While there is a widespread belief that artificial general intelligence (AGI) – or even superhuman AI – is imminent, complex problems in expert domains are far from being solved. We argue that such problems require human-AI cooperation and that the current state of the art in generative AI is unable to play the role of a reliable partner due to a multitude of shortcomings, including difficulty to keep track of a complex solution artifact (e.g., a software program), limited support for versatile human preference expression and lack of adapting to human preference in an interactive setting. To address these challenges, we propose HAICo2, a novel human-AI co-construction framework. We take first steps towards a formalization of HAICo2 and discuss the difficult open research problems that it faces.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[275]

T. Decker, A. R. Bhattarai, J. Gu, V. Tresp and F. Buettner.
Provably Better Explanations with Optimized Aggregation of Feature Attributions.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by combining multiple explanations across distinct methods or their variations. For this purpose, we propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria such as robustness or faithfulness to the model behavior. Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual methods and existing baselines.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[274]

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and k-Shapley values (k-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[273]

Abstract

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

Marcel Wever

Dr.

* Former Member

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[272]

Y. Sale, V. Bengs, M. Caprio and E. Hüllermeier.
Second-Order Uncertainty Quantification: A Distance-Based Approach.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

In the past couple of years, various approaches to representing and quantifying different types of predictive uncertainty in machine learning, notably in the setting of classification, have been proposed on the basis of second-order probability distributions, i.e., predictions in the form of distributions on probability distributions. A completely conclusive solution has not yet been found, however, as shown by recent criticisms of commonly used uncertainty measures associated with second-order distributions, identifying undesirable theoretical properties of these measures. In light of these criticisms, we propose a set of formal criteria that meaningful uncertainty measures for predictive uncertainty based on second-order distributions should obey. Moreover, we provide a general framework for developing uncertainty measures to account for these criteria, and offer an instantiation based on the Wasserstein distance, for which we prove that all criteria are satisfied.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[271]

P. Foth, L. Gosch, S. Geisler, L. Schwinn and S. Günnemann.
Relaxing Graph Transformers for Adversarial Attacks.
ICML 2024 - Workshop Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators at the 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. PDF

Abstract

Existing studies have shown that Graph Neural Networks (GNNs) are vulnerable to adversarial attacks. Even though Graph Transformers (GTs) surpassed Message-Passing GNNs on several benchmarks, their adversarial robustness properties are unexplored. However, attacking GTs is challenging due to their Positional Encodings (PEs) and special attention mechanisms which can be difficult to differentiate. We overcome these challenges by targeting three representative architectures based on (1) random-walk PEs, (2) pair-wise-shortest-path PEs, and (3) spectral PEs - and propose the first adaptive attacks for GTs. We leverage our attacks to evaluate robustness to (a) structure perturbations on node classification; and (b) node injection attacks for (fake-news) graph classification. Our evaluation reveals that they can be catastrophically fragile and underlines our work’s importance and the necessity for adaptive attacks.

MCML Authors

Lukas Gosch

Data Analytics & Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[270]

Y. Sun, J. Liu, Z. Wu, Z. Ding, Y. Ma, T. Seidl and V. Tresp.
SA-DQAS: Self-attention Enhanced Differentiable Quantum Architecture Search.
ICML 2024 - Workshop Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators at the 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. PDF

Abstract

We introduce SA-DQAS in this paper, a novel framework that enhances the gradient-based Differentiable Quantum Architecture Search (DQAS) with a self-attention mechanism, aimed at optimizing circuit design for Quantum Machine Learning (QML) challenges. Analogous to a sequence of words in a sentence, a quantum circuit can be viewed as a sequence of placeholders containing quantum gates. In DQAS, each placeholder is independent, whereas the self-attention mechanism in SA-DQAS helps to capture relation and dependency information among the operation candidates placed on placeholders in a circuit. To evaluate and verify, we conduct experiments on job-shop scheduling problems (JSSP), Max-cut problems, and quantum fidelity. Incorporating self-attention improves the stability and performance of the resulting quantum circuits and refines their structural design with higher noise resilience and fidelity. Our research demonstrates the first successful integration of self-attention with DQAS.

MCML Authors

Yize Sun

Database Systems and Data Mining AI Lab

Zifeng Ding

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[269]

X. Feng, Z. Jiang, T. Kaufmann, E. Hüllermeier, P. Weng and Y. Zhu.
Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries.
MHFAIA @ICML 2024 - Workshop on Models of Human Feedback for AI Alignment at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. Traditional methods with pairwise trajectory comparisons face challenges: trajectories with subtle differences are hard to compare, and comparisons are ordinal, limiting direct inference of preference strength. In this paper, we introduce the distinguishability query, where humans compare two pairs of trajectories and indicate which pair is easier to compare and then give preference feedback on the easier pair. This type of query directly infers preference strength and is expected to reduce cognitive load on the labeler. We also connect this query to cardinal utility and difference relations, and develop an efficient query selection scheme to achieve a better trade-off between query informativeness and easiness. Experimental results empirically demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness on RLHF benchmarks.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[268]

P. Hofman, Y. Sale and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty: A Credal Approach.
SPIGM @ICML 2024 - Workshop on Structured Probabilistic Inference & Generative Modeling at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Uncertainty representation and quantification are paramount in machine learning, especially in safety-critical applications. In this paper, we propose a novel framework for the quantification of aleatoric and epistemic uncertainty based on the notion of credal sets, i.e., sets of probability distributions. Thus, we assume a learner that produces (second-order) predictions in the form of sets of probability distributions on outcomes. Practically, such an approach can be realized by means of ensemble learning: Given an ensemble of learners, credal sets are generated by including sufficiently plausible predictors, where plausibility is measured in terms of (relative) likelihood. We provide a formal justification for the framework and introduce new measures of epistemic and aleatoric uncertainty as concrete instantiations. We evaluate these measures both theoretically, by analysing desirable axiomatic properties, and empirically, by comparing them in terms of performance and effectiveness to existing measures of uncertainty in an experimental study.

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[267]

L. Arrighi, L. Pennella, G. M. Tavares and S. Barbon Junior.
Decision Predicate Graphs: Enhancing Interpretability in Tree Ensembles.
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Abstract

Understanding the decisions of tree-based ensembles and their relationships is pivotal for machine learning model interpretation. Recent attempts to mitigate the human-in-the-loop interpretation challenge have explored the extraction of the decision structure underlying the model taking advantage of graph simplification and path emphasis. However, while these efforts enhance the visualisation experience, they may either result in a visually complex representation or compromise the interpretability of the original ensemble model. In addressing this challenge, especially in complex scenarios, we introduce the Decision Predicate Graph (DPG) as a model-specific tool to provide a global interpretation of the model. DPG is a graph structure that captures the tree-based ensemble model and learned dataset details, preserving the relations among features, logical decisions, and predictions towards emphasising insightful points. Leveraging well-known graph theory concepts, such as the notions of centrality and community, DPG offers additional quantitative insights into the model, complementing visualisation techniques, expanding the problem space descriptions, and offering diverse possibilities for extensions. Empirical experiments demonstrate the potential of DPG in addressing traditional benchmarks and complex classification scenarios.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

[266]

P. Kolpaczki, G. Haselbeck and E. Hüllermeier.
How Much Can Stratification Improve the Approximation of Shapley Values?
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Abstract

Over the last decade, the Shapley value has become one of the most widely applied tools to provide post-hoc explanations for black box models. However, its theoretically justified solution to the problem of dividing a collective benefit to the members of a group, such as features or data points, comes at a price. Without strong assumptions, the exponential number of member subsets excludes an exact calculation of the Shapley value. In search for a remedy, recent works have demonstrated the efficacy of approximations based on sampling with stratification, in which the sample space is partitioned into smaller subpopulations. The effectiveness of this technique mainly depends on the degree to which the allocation of available samples over the formed strata mirrors their unknown variances. To uncover the hypothetical potential of stratification, we investigate the gap in approximation quality caused by the lack of knowledge of the optimal allocation. Moreover, we combine recent advances to propose two state-of-the-art algorithms Adaptive SVARM and Continuous Adaptive SVARM that adjust the sample allocation on-the-fly. The potential of our approach is assessed in an empirical evaluation.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[265]

C. Damke and E. Hüllermeier.
Linear Opinion Pooling for Uncertainty Quantification on Graphs.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL GitHub

Abstract

We address the problem of uncertainty quantification for graph-structured data, or, more specifically, the problem to quantify the predictive uncertainty in (semi-supervised) node classification. Key questions in this regard concern the distinction between two different types of uncertainty, aleatoric and epistemic, and how to support uncertainty quantification by leveraging the structural information provided by the graph topology. Challenging assumptions and postulates of state-of-the-art methods, we propose a novel approach that represents (epistemic) uncertainty in terms of mixtures of Dirichlet distributions and refers to the established principle of linear opinion pooling for propagating information between neighbored nodes in the graph. The effectiveness of this approach is demonstrated in a series of experiments on a variety of graph-structured datasets.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[264]

Abstract

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[263]

T. Löhr, M. Ingrisch and E. Hüllermeier.
Towards Aleatoric and Epistemic Uncertainty in Medical Image Classification.
AIME 2024 - 22nd International Conference on Artificial Intelligence in Medicine. Salt Lake City, UT, USA, Jul 09-12, 2024. DOI

Abstract

Medical domain applications require a detailed understanding of the decision making process, in particular when data-driven modeling via machine learning is involved, and quantifying uncertainty in the process adds trust and interpretability to predictive models. However, current uncertainty measures in medical imaging are mostly monolithic and do not distinguish between different sources and types of uncertainty. In this paper, we advocate the distinction between so-called aleatoric and epistemic uncertainty in the medical domain and illustrate its potential in clinical decision making for the case of PET/CT image classification.

MCML Authors

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[262]

A. Javanmardi, O. K. Aimiyekagbon, A. Bender, J. K. Kimotho, W. Sextro and E. Hüllermeier.
Remaining Useful Lifetime Estimation of Bearings Operating under Time-Varying Conditions.
PHME 2024 - 8th European Conference of the Prognostics and Health Management Society 2024. Prague, Czech Republic, Jul 03-05, 2024. DOI

Abstract

This paper investigates the remaining useful lifetime (RUL) estimation of bearings under dynamic, i.e., time-varying, operating conditions (OC). Unlike conventional studies that assume constant OC in bearing accelerated life tests, we introduce a dataset with time-varying OC during run-to-failure experiments, simulating real-world scenarios. We explore data-driven approaches to identify the transition point from a healthy to an unhealthy state and estimate the RUL. Additionally, we examine strategies for integrating OC information to enhance RUL estimations. These methodologies are evaluated through numerical experiments using various machine learning algorithms.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[261]

F. Quinzan, C. Casolo, K. Muandet, Y. Luo and N. Kilbertus.
Learning Counterfactually Invariant Predictors.
Transactions on Machine Learning Research (Jul. 2024). URL

Abstract

Notions of counterfactual invariance (CI) have proven essential for predictors that are fair, robust, and generalizable in the real world. We propose graphical criteria that yield a sufficient condition for a predictor to be counterfactually invariant in terms of a conditional independence in the observational distribution. In order to learn such predictors, we propose a model-agnostic framework, called Counterfactually Invariant Prediction (CIP), building on the Hilbert-Schmidt Conditional Independence Criterion (HSCIC), a kernel-based conditional dependence measure. Our experimental results demonstrate the effectiveness of CIP in enforcing counterfactual invariance across various simulated and real-world datasets including scalar and multi-variate settings.

MCML Authors

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[260]

J. Brandt, B. Haddenhorst, V. Bengs and E. Hüllermeier.
Dueling Bandits with Delayed Feedback.
DataNinja sAIOnARA 2024 - DataNinja sAIOnARA Conference: Shaping Trustworthy AI: Opportunities, Innovation and Achievements for Reliable Approaches. Bielefeld, Germany, Jun 25-27, 2024. DOI

Abstract

Dueling Bandits is a well-studied extension of the Multi-Armed Bandits problem, in which the learner must select two arms in each time step and receives a binary feedback as an outcome of the chosen duel. However, all of the existing best arm identification algorithms for the Dueling Bandits setting assume that the feedback can be observed immediately after selecting the two arms. If this is not the case, the algorithms simply do nothing and wait until the feedback of the recent duel can be observed, which is a waste of runtime. We propose an algorithm that can already start a new duel even if the previous one is not finished and thus is much more time efficient. Our arm selection strategy balances the expected information gain of the chosen duel and the expected delay until we observe the feedback. By theoretically grounded confidence bounds we can ensure that the arms we discard are not the best arms with high probability.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[259]

H. Li, C. Shen, P. Torr, V. Tresp and J. Gu.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI GitHub

Abstract

Diffusion-based models have gained significant popularity for text-to-image generation due to their exceptional image-generation capabilities. A risk with these models is the potential generation of inappropriate content, such as biased or harmful images. However, the underlying reasons for generating such undesired content from the perspective of the diffusion model’s internal representation remain unclear. Previous work interprets vectors in an interpretable latent space of diffusion models as semantic concepts. However, existing approaches cannot discover directions for arbitrary concepts, such as those related to inappropriate concepts. In this work, we propose a novel self-supervised approach to find interpretable latent directions for a given concept. With the discovered vectors, we further propose a simple approach to mitigate inappropriate generation. Extensive experiments have been conducted to verify the effectiveness of our mitigation approach, namely, for fair generation, safe generation, and responsible text-enhancing generation.

MCML Authors

Hang Li

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[258]

Y. Xia, L. Shi, Z. Ding, J. F. Henriques and D. Cremers.
Text2Loc: 3D Point Cloud Localization from Natural Language.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI GitHub

Abstract

We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, relational dynamics among each textual hint are captured in a hierarchical transformer with max-pooling (HTM), whereas a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions, which completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves the localization accuracy by up to 2× over the state-of-the-art on the KITTI360Pose dataset.

MCML Authors

Yan Xia

Dr.

* Former Member

Zifeng Ding

Database Systems and Data Mining AI Lab

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

[257]

I. Obadic, A. Levering, L. Pennig, D. Oliveira, D. Marcos and X. Zhu.
Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes.
CVPR 2024 - Workshop at the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI

Abstract

Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the interplay between representation learning using an additional task-specific contrastive loss and post-hoc concept explainability for socioeconomic studies. Our results on two different geographical locations and tasks indicate that the task-specific pretraining imposes a continuous ordering of the latent space embeddings according to the socioeconomic outcomes. This improves the model’s interpretability as it enables the latent space of the model to associate urban concepts with continuous intervals of socioeconomic outcomes. Further, we illustrate how analyzing the model’s conceptual sensitivity for the intervals of socioeconomic outcomes can shed light on new insights for urban studies.

MCML Authors

Ivica Obadic

Data Science in Earth Observation

Lars Pennig

Ethics in Systems Design and Machine Learning

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation

[256]

Z. Ding, H. Cai, J. Wu, Y. Ma, R. Liao, B. Xiong and V. Tresp.
zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. URL

Abstract

Modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a heated topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasting (TKGF) benchmarks, they face a strong challenge in modeling the unseen zero-shot relations that have no prior graph context. In this paper, we try to mitigate this problem as follows. We first input the text descriptions of KG relations into large language models (LLMs) for generating relation representations, and then introduce them into embedding-based TKGF methods. LLM-empowered representations can capture the semantic information in the relation descriptions. This makes the relations, whether seen or unseen, with similar semantic meanings stay close in the embedding space, enabling TKGF models to recognize zero-shot relations even without any observed graph context. Experimental results show that our approach helps TKGF models to achieve much better performance in forecasting the facts with previously unseen relations, while still maintaining their ability in link forecasting regarding seen relations.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[255]

R. Liao, X. Jia, Y. Li, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. URL GitHub

Abstract

The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex temporal graph data structure and sequential natural expressions LLMs can handle, and between the enormous data sizes of tKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval-augmented generation framework named GenTKG combining a temporal logical rule-based retrieval strategy and few-shot parameter-efficient instruction tuning to solve the above challenges, respectively. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting with low computation resources using extremely limited training data as few as 16 samples. GenTKG also exhibits remarkable cross-domain generalizability, with superior performance on unseen datasets without re-training, and in-domain generalizability regardless of the time split within the same dataset. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[254]

E. Hüllermeier.
On the Challenge of Quantifying Epistemic Uncertainty in Machine Learning.
SIPTA - The Society for Imprecise Probabilities: Theories and Applications. Virtual, Jun 14, 2024. Invited Talk. PDF

Abstract

n/a

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[253]

R. S. Oyamada, G. M. Tavares, S. B. Junior and P. Ceravolo.
Enhancing Predictive Process Monitoring with Time-Related Feature Engineering.
CAiSE 2024 - 36th International Conference on Advanced Information Systems Engineering. Limassol, Cyprus, Jun 03-07, 2024. DOI

Abstract

Predictive process monitoring plays a critical role in process mining by predicting the dynamics of ongoing processes. Recent trends employ deep learning techniques that use event sequences to make highly accurate predictions. However, this focus often overshadows the significant advantages of lightweight, transparent algorithms. This study explores the potential of traditional regression algorithms, namely kNN, SVM, and RF, enhanced by event time feature engineering. We integrate existing and novel time-related features to augment these algorithms and compare their performance against the well-known LSTM network. Our results show that these enhanced lightweight models not only compete with LSTM in terms of predictive accuracy but also excel in scenarios requiring online, real-time decision-making and explanation. Furthermore, despite incorporating additional feature extraction processes, these algorithms maintain superior computational efficiency compared to their deep learning counterparts, making them more viable for time-critical and resource-constrained environments.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

[252]

S. Haas and E. Hüllermeier.
Conformalized prescriptive machine learning for uncertainty-aware automated decision making: the case of goodwill requests.
International Journal of Data Science and Analytics (Jun. 2024). DOI

Abstract

Due to the inherent presence of uncertainty in machine learning (ML) systems, the usage of ML is until now out of scope for many critical (financial) business processes. One such process is goodwill assessment at car manufacturers, where a large part of goodwill cases is still assessed manually by human experts. To increase the degree of automation while still providing an overall reliable assessment service, we propose a selective uncertainty-aware automated decision making approach based on uncertainty quantification through conformal prediction. In our approach, goodwill requests are still shifted to human experts in case the risk of a wrong assessment is too high. Nevertheless, ML can be introduced into the process with reduced and controllable risk. We hereby determine the risk of wrong ML assessments through two hierarchical conformal predictors that make use of the prediction set and interval size as the main criteria for quantifying uncertainty. We also utilize conformal prediction’s property to output empty prediction sets if no prediction is significant enough and abstain from an automatic decision in that case. Instead of providing mathematical guarantees for limited risk, we focus on the risk vs. degree of automation trade-off and how a business decision maker can select in an a posteriori fashion a trade-off that best suits the business problem at hand from a set of Pareto-optimal solutions. We also show empirically on a goodwill data set of a BMW National Sales Company that by only selecting certain requests for automated decision making we can significantly increase the accuracy of automatically processed requests, for instance from 92 to 98% for labor and from 90 to 98% for parts contributions, while still maintaining a degree of automation of approximately 70%.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[251]

T. Kaufmann, J. Blüml, A. Wüst, Q. Delfosse, K. Kersting and E. Hüllermeier.
OCALM: Object-Centric Assessment with Language Models.
Preprint (Jun. 2024). arXiv

Abstract

Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which a desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to directly provide rewards are promising alternatives, allowing non-experts to specify goals for the agent. However, black-box reward models make it difficult to debug the reward. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for RL agents from natural language task descriptions. OCALM uses the extensive world-knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, providing RL agents with the ability to derive policies from task descriptions.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[250]

V. Margraf, M. Wever, S. Gilhuber, G. M. Tavares, T. Seidl and E. Hüllermeier.
ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data.
Preprint (Jun. 2024). arXiv GitHub

Abstract

In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled, aiming to enhance learning algorithms’ efficiency and performance. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing the performance of different query strategies. This particularly holds for the combination of query strategies with different learning algorithms into active learning pipelines and examining the impact of the learning algorithm choice. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure evaluations are done reproducibly, saving exact dataset splits and hyperparameter settings of used algorithms. In total, ALPBench consists of 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings.

MCML Authors

Valentin Margraf

Artificial Intelligence and Machine Learning

Marcel Wever

Dr.

* Former Member

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[249]

A. Stephan, L. Miklautz, C. Leiber, P. H. Araujo, D. Répás, C. Plant and B. Roth.
Text-Guided Alternative Image Clustering.
Preprint (Jun. 2024). arXiv

Abstract

Traditional image clustering techniques only find a single grouping within visual data. In particular, they do not provide a possibility to explicitly define multiple types of clustering. This work explores the potential of large vision-language models to facilitate alternative image clustering. We propose Text-Guided Alternative Image Consensus Clustering (TGAICC), a novel approach that leverages user-specified interests via prompts to guide the discovery of diverse clusterings. To achieve this, it generates a clustering for each prompt, groups them using hierarchical clustering, and then aggregates them using consensus clustering. TGAICC outperforms image- and text-based baselines on four alternative image clustering benchmark datasets. Furthermore, using count-based word statistics, we are able to obtain text-based explanations of the alternative clusterings. In conclusion, our research illustrates how contemporary large vision-language models can transform explanatory data analysis, enabling the generation of insightful, customizable, and diverse image clusterings.

MCML Authors

Collin Leiber

Dr.

* Former Member

[248]

T. Wollschläger, N. Kemper, L. Hetzel, J. Sommer and S. Günnemann.
Expressivity and Generalization: Fragment-Biases for Molecular GNNs.
Preprint (Jun. 2024). arXiv

Abstract

Although recent advances in higher-order Graph Neural Networks (GNNs) improve the theoretical expressiveness and molecular property predictive performance, they often fall short of the empirical performance of models that explicitly use fragment information as inductive bias. However, for these approaches, there exists no theoretic expressivity study. In this work, we propose the Fragment-WL test, an extension to the well-known Weisfeiler & Leman (WL) test, which enables the theoretic analysis of these fragment-biased GNNs. Building on the insights gained from the Fragment-WL test, we develop a new GNN architecture and a fragmentation with infinite vocabulary that significantly boosts expressiveness. We show the effectiveness of our model on synthetic and real-world data where we outperform all GNNs on Peptides and have 12% lower error than all GNNs on ZINC and 34% lower error than other fragment-biased models. Furthermore, we show that our model exhibits superior generalization capabilities compared to the latest transformer-based architectures, positioning it as a robust solution for a range of molecular modeling tasks.

MCML Authors

Leon Hetzel

Mathematical Modelling of Biological Systems

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[247]

Y. Han, Z. Ding, Y. Liu, B. He and V. Tresp.
Critical Path Identification in Supply Chain Knowledge Graphs with Large Language Models.
ESWC 2024 - Extended Semantic Web Conference. Hersonissos, Crete, Greece, May 26-30, 2024. DOI

Abstract

In the ever-evolving landscape of global commerce, supply chain management (SCM) has gained increasing significance. An important task in SCM is to find critical supply chain paths for a target company because these paths often represent potential bottlenecks in supply networks and thus could be crucial to risk management. The mainstream solution to this task requires supply chain managers to manually review supply chain data to uncover critical paths, resulting in considerable human labor costs. To better study SCM, recent efforts have been made to construct supply chain knowledge graphs (KGs) that connect supply chain-related data from different sources, facilitating the identification of critical paths through KG reasoning. In this paper, we develop an automated approach for critical path identification (CPI) based on supply chain KGs. We encode supply chain KGs into text and use large language models (LLMs) for CPI. LLMs can not only analyze the topological KG information but also leverage their world knowledge for better path identification. We experiment with two popular LLMs, i.e., GPT-3.5 and GPT-4, and find that they are able to do CPI and meanwhile generate reasonable explanations.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[246]

A. Beer, O. Palotás, A. Maldonado, A. Draganov and I. Assent.
DROPP: Structure-aware PCA for Ordered Data.
ICDE 2024 - 40th IEEE International Conference on Data Engineering. Utrecht, Netherlands, May 13-17, 2024. DOI

Abstract

Ordered data arises in many areas, e.g., in molecular dynamics and other spatial-temporal trajectories. While data points that are close in this order are related, common dimensionality reduction techniques cannot capture this relation or order. Thus, the information is lost in the low-dimensional representations. We introduce DROPP, which incorporates order into dimensionality reduction by adapting a Gaussian kernel function across the ordered covariances between data points. We find underlying principal components that are characteristic of the process that generated the data. In extensive experiments, we show DROPP’s advantages over other dimensionality reduction techniques on synthetic as well as real-world data sets from molecular dynamics and climate research: The principal components of different data sets that were generated by the same underlying mechanism are very similar to each other. They can, thus, be used for dimensionality reduction with low reconstruction errors along a set of data sets, allowing an explainable visual comparison of different data sets as well as good compression even for unseen data.

MCML Authors

Andrea Maldonado

Database Systems and Data Mining AI Lab

[245]

S. d'Ascoli, S. Becker, P. Schwaller, A. Mathis and N. Kilbertus.
ODEFormer: Symbolic Regression of Dynamical Systems with Transformers.
ICLR 2024 - 12th International Conference on Learning Representations. Vienna, Austria, May 07-11, 2024. URL GitHub

Abstract

We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory. We perform extensive evaluations on two datasets: (i) the existing ‘Strogatz’ dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference.

MCML Authors

Sören Becker

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[244]

L. Eyring, D. Klein, T. Uscidda, G. Palla, N. Kilbertus, Z. Akata and F. J. Theis.
Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation.
ICLR 2024 - 12th International Conference on Learning Representations. Vienna, Austria, May 07-11, 2024. URL

Abstract

In optimal transport (OT), a Monge map is known as a mapping that transports a source distribution to a target distribution in the most cost-efficient way. Recently, multiple neural estimators for Monge maps have been developed and applied in diverse unpaired domain translation tasks, e.g. in single-cell biology and computer vision. However, the classic OT framework enforces mass conservation, which makes it prone to outliers and limits its applicability in real-world scenarios. The latter can be particularly harmful in OT domain translation tasks, where the relative position of a sample within a distribution is explicitly taken into account. While unbalanced OT tackles this challenge in the discrete setting, its integration into neural Monge map estimators has received limited attention. We propose a theoretically grounded method to incorporate unbalancedness into any Monge map estimator. We improve existing estimators to model cell trajectories over time and to predict cellular responses to perturbations. Moreover, our approach seamlessly integrates with the OT flow matching (OT-FM) framework. While we show that OT-FM performs competitively in image translation, we further improve performance by incorporating unbalancedness (UOT-FM), which better preserves relevant features. We hence establish UOT-FM as a principled method for unpaired image translation.

MCML Authors

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Fabian Theis

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Mathematical Modelling of Biological Systems

[243]

Abstract

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yawei Li

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Mina Rezaei

Dr.

Statistical Learning and Data Science

[242]

S. Chen, Z. Han, B. He, M. Buckley, P. Torr, V. Tresp and J. Gu.
Understanding and Improving In-Context Learning on Vision-language Models.
ME-FoMo @ICLR 2024 - Workshop on Mathematical and Empirical Understanding of Foundation Models at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL

Abstract

Recently, in-context learning (ICL) on large language models (LLMs) has received great attention, and this technique can also be applied to vision-language models (VLMs) built upon LLMs. These VLMs can respond to queries by conditioning responses on a series of multimodal demonstrations, which comprise images, queries, and answers. Though ICL has been extensively studied on LLMs, its research on VLMs remains limited. The inclusion of additional visual information in the demonstrations motivates the following research questions: which of the two modalities in the demonstration is more significant? How can we select effective multimodal demonstrations to enhance ICL performance? This study investigates the significance of both visual and language information. Our findings indicate that ICL in VLMs is predominantly driven by the textual information in the demonstrations whereas the visual information in the demonstrations barely affects the ICL performance. Subsequently, we provide an understanding of the findings by analyzing the model information flow and comparing model inner states given different ICL settings. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demonstrations and shows better ICL performance. Extensive experiments are conducted to support our findings, understanding, and improvement of the ICL performance of VLMs.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[241]

Z. Li, S. S. Cranganore, N. Youngblut and N. Kilbertus.
Whole Genome Transformers for Gene Interaction Effects in Microbiome Habitat Prediction.
MLGenX @ICLR 2024 - Workshop Machine Learning for Genomics Explorations at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL

Abstract

Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework that leverages existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high-quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance but also pioneer leveraging sequence-level information of entire genomes to reveal the genetic foundations of complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow-up.

MCML Authors

Zhufeng Li

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[240]

L. Zellner, S. Rauch, J. Sontheim and T. Seidl.
On Diverse and Precise Recommendations for Small and Medium-Sized Enterprises.
PAKDD 2024 - 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Taipei, Taiwan, May 07-10, 2024. DOI GitHub

Abstract

Recommender Systems are a popular and common means to extract relevant information for users. Small and medium-sized enterprises make up a large share of the overall amount of business but need to be more frequently considered regarding the demand for recommender systems. Different conditions, such as the small amount of data, lower computational capabilities, and users frequently not possessing an account, require a different and potentially a more small-scale recommender system. The requirements regarding quality are similar: High accuracy and high diversity are certainly an advantage. We provide multiple solutions with different variants solely based on information contained in event-based sequences and temporal information. Our code is available at GitHub. We conduct experiments on four different datasets with an increasing set of items to show a possible range for scalability. The promising results show the applicability of these grammar-based recommender system variants and leave the final decision on which recommender to choose to the user and its ultimate goals.

MCML Authors

Simon Rauch

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[239]

S. Chen, Z. Han, B. He, Z. Ding, W. Yu, P. Torr, V. Tresp and J. Gu.
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
SeT LLM @ICLR 2024 - Workshop on Secure and Trustworthy Large Language Models at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL

Abstract

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates the performance reproduction and fair comparison. Besides, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs, such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source LLMs and MLLMs. (2) Llama2 and Qwen-VL-Chat are more robust compared to other open-source models. (3) The transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Zifeng Ding

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[238]

V. Bengs, B. Haddenhorst and E. Hüllermeier.
Identifying Copeland Winners in Dueling Bandits with Indifferences.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[237]

P. Kolpaczki, M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
SVARM-IQ: Efficient Approximation of Any-order Shapley Interactions through Stratification.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

Addressing the limitations of individual attribution scores via the Shapley value (SV), the field of explainable AI (XAI) has recently explored intricate interactions of features or data points. In particular, extensions of the SV, such as the Shapley Interaction Index (SII), have been proposed as a measure to still benefit from the axiomatic basis of the SV. However, similar to the SV, their exact computation remains computationally prohibitive. Hence, we propose with SVARM-IQ a sampling-based approach to efficiently approximate Shapley-based interaction indices of any order. SVARM-IQ can be applied to a broad class of interaction indices, including the SII, by leveraging a novel stratified representation. We provide non-asymptotic theoretical guarantees on its approximation quality and empirically demonstrate that SVARM-IQ achieves state-of-the-art estimation results in practical XAI scenarios on different model classes and application domains.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[236]

A. Lohrer, D. Kazempour, M. Hünemörder and P. Kröger.
CoMadOut—a robust outlier detection algorithm based on CoMAD.
Machine Learning 113 (May 2024). DOI

Abstract

Unsupervised learning methods are well established in the area of anomaly detection and achieve state-of-the-art performance on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, dependent on its variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution-based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive with well-established methods in terms of average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary, our approach can be seen as a robust alternative for outlier detection tasks.

MCML Authors

Andreas Lohrer

Dr.

* Former Member

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Principal Investigator

[235]

N. Strauß and M. Schubert.
Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem.
SDM 2024 - SIAM International Conference on Data Mining. Houston, TX, USA, Apr 18-20, 2024. DOI

Abstract

The traveling officer problem (TOP) is a challenging stochastic optimization task. In this problem, a parking officer is guided through a city equipped with parking sensors to fine as many parking offenders as possible. A major challenge in TOP is the dynamic nature of parking offenses, which randomly appear and disappear after some time, regardless of whether they have been fined. Thus, solutions need to dynamically adjust to currently fineable parking offenses while also planning ahead to increase the likelihood that the officer arrives during the offense taking place. Though various solutions exist, these methods often struggle to take the implications of actions on the ability to fine future parking violations into account. This paper proposes SATOP, a novel spatial-aware deep reinforcement learning approach for TOP. Our novel state encoder creates a representation of each action, leveraging the spatial relationships between parking spots, the agent, and the action. Furthermore, we propose a novel message-passing module for learning future inter-action correlations in the given environment. Thus, the agent can estimate the potential to fine further parking violations after executing an action. We evaluate our method using an environment based on real-world data from Melbourne. Our results show that SATOP consistently outperforms state-of-the-art TOP agents and is able to fine up to 22% more parking offenses.

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[234]

C. Leiber.
Clustering in transformed feature spaces by analyzing distinct modes.
Dissertation 2024. DOI

Abstract

The growing availability of data demands clustering methods that can extract valuable information without requiring costly annotations, especially for large, high-dimensional datasets. This dissertation develops subspace and deep clustering approaches, leveraging methods like the Dip-test of unimodality and Minimum Description Length principle to identify and encode relevant features and clusters automatically, even in complex datasets. By incorporating these techniques into neural networks and refining them through a novel parameter-free approach, the research offers robust clustering tools that perform well without prior knowledge of the number of clusters, all implemented in the open-source package ClustPy. (Shortened).

MCML Authors

Collin Leiber

Dr.

* Former Member

[233]

I. M. Grigore, G. M. Tavares, M. C. Silva, P. Ceravolo and S. Junior.
Automated Trace Clustering Pipeline Synthesis in Process Mining.
Information 15.4 (Apr. 2024). DOI

Abstract

Business processes have undergone a significant transformation with the advent of the process-oriented view in organizations. The increasing complexity of business processes and the abundance of event data have driven the development and widespread adoption of process mining techniques. However, the size and noise of event logs pose challenges that require careful analysis. The inclusion of different sets of behaviors within the same business process further complicates data representation, highlighting the continued need for innovative solutions in the evolving field of process mining. Trace clustering is emerging as a solution to improve the interpretation of underlying business processes. Trace clustering offers benefits such as mitigating the impact of outliers, providing valuable insights, reducing data dimensionality, and serving as a preprocessing step in robust pipelines. However, designing an appropriate clustering pipeline can be challenging for non-experts due to the complexity of the process and the number of steps involved. For experts, it can be time-consuming and costly, requiring careful consideration of trade-offs. To address the challenge of pipeline creation, the paper proposes a genetic programming solution for trace clustering pipeline synthesis that optimizes a multi-objective function matching clustering and process quality metrics. The solution is applied to real event logs, and the results demonstrate improved performance in downstream tasks through the identification of sub-logs.

MCML Authors

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

[232]

S. Feuerriegel, D. Frauen, V. Melnychuk, J. Schweisthal, K. Heß, A. Curth, S. Bauer, N. Kilbertus, I. S. Kohane and M. van der Schaar.
Causal machine learning for predicting treatment outcomes.
Nature Medicine 30 (Apr. 2024). DOI

Abstract

Causal machine learning (ML) offers flexible, data-driven methods for predicting treatment outcomes including efficacy and toxicity, thereby supporting the assessment and safety of drugs. A key benefit of causal ML is that it allows for estimating individualized treatment effects, so that clinical decision-making can be personalized to individual patient profiles. Causal ML can be used in combination with both clinical trial data and real-world data, such as clinical registries and electronic health records, but caution is needed to avoid biased or incorrect predictions. In this Perspective, we discuss the benefits of causal ML (relative to traditional statistical or ML approaches) and outline the key components and steps. Finally, we provide recommendations for the reliable use of causal ML and effective translation into the clinic.

MCML Authors

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

Dennis Frauen

Artificial Intelligence in Management

Valentyn Melnychuk

Artificial Intelligence in Management

Jonas Schweisthal

Artificial Intelligence in Management

Konstantin Heß

Artificial Intelligence in Management

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[231]

P. Hofman, Y. Sale and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty with Proper Scoring Rules.
Preprint (Apr. 2024). arXiv

Abstract

Uncertainty representation and quantification are paramount in machine learning and constitute an important prerequisite for safety-critical applications. In this paper, we propose novel measures for the quantification of aleatoric and epistemic uncertainty based on proper scoring rules, which are loss functions with the meaningful property that they incentivize the learner to predict ground-truth (conditional) probabilities. We assume two common representations of (epistemic) uncertainty, namely, in terms of a credal set, i.e. a set of probability distributions, or a second-order distribution, i.e., a distribution over probability distributions. Our framework establishes a natural bridge between these representations. We provide a formal justification of our approach and introduce new measures of epistemic and aleatoric uncertainty as concrete instantiations.

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[230]

L. Rottkamp and M. Schubert.
A Time-Inhomogeneous Markov Model for Resource Availability under Sparse Observations.
Preprint (Apr. 2024). arXiv

Abstract

Accurate spatio-temporal information about the current situation is crucial for smart city applications such as modern routing algorithms. Often, this information describes the state of stationary resources, e.g. the availability of parking bays, charging stations or the number of people waiting for a vehicle to pick them up near a given location. To exploit this kind of information, predicting future states of the monitored resources is often mandatory because a resource might change its state within the time until it is needed. To train an accurate predictive model, it is often not possible to obtain a continuous time series on the state of the resource. For example, the information might be collected from traveling agents visiting the resource with an irregular frequency. Thus, it is necessary to develop methods which work on sparse observations for training and prediction. In this paper, we propose time-inhomogeneous discrete Markov models to allow accurate prediction even when observations are very infrequent. Our new model is able to blend recent observations with historic data and also provide useful probabilistic estimates for future states. Since resource availability in a city is typically time-dependent, our Markov model is time-inhomogeneous and cyclic within a predefined time interval. To train our model, we propose a modified Baum-Welch algorithm. Evaluations on real-world datasets of parking bay availability show that our new method indeed yields good results compared to methods being trained on complete data and non-cyclic variants.

MCML Authors

Lukas Rottkamp

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[229]

Abstract

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Julia Herbinger

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[228]

H. Chen, Y. Zhang, D. Krompass, J. Gu and V. Tresp.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Adapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Yao Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[227]

P. Kolpaczki, V. Bengs, M. Muschalik and E. Hüllermeier.
Approximating the Shapley Value without Marginal Contributions.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of which revolve around the notion of an agent’s marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results including synthetic games as well as common explainability use cases comparing ourselves with state-of-the-art methods.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[226]

J. Lienen and E. Hüllermeier.
Mitigating Label Noise through Data Ambiguation.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

Label noise poses an important challenge in machine learning, especially in deep learning, in which large models with high expressive power dominate the field. Models of that kind are prone to memorizing incorrect labels, thereby harming generalization performance. Many methods have been proposed to address this problem, including robust loss functions and more complex label correction approaches. Robust loss functions are appealing due to their simplicity, but typically lack flexibility, while label correction usually adds substantial complexity to the training setup. In this paper, we suggest to address the shortcomings of both methodologies by ‘ambiguating’ the target information, adding additional, complementary candidate labels in case the learner is not sufficiently convinced of the observed training label. More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold, which deliver imprecise yet more reliable beliefs about the ground-truth, effectively helping the learner to suppress the memorization effect. In an extensive empirical evaluation, our method demonstrates favorable learning behavior on synthetic and real-world noise, confirming the effectiveness in detecting and correcting erroneous training labels.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[225]

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[224]

M. Bernhard, R. Amoroso, Y. Kindermann, M. Schubert, L. Baraldi, R. Cucchiara and V. Tresp.
What's Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI GitHub

Abstract

Semantic segmentation represents a fundamental task in computer vision with various application areas such as autonomous driving, medical imaging, or remote sensing. For evaluating and comparing semantic segmentation models, the mean intersection over union (mIoU) is currently the gold standard. However, while mIoU serves as a valuable benchmark, it does not offer insights into the types of errors incurred by a model. Moreover, different types of errors may have different impacts on downstream applications. To address this issue, we propose an intuitive method for the systematic categorization of errors, thereby enabling a fine-grained analysis of semantic segmentation models. Since we assign each erroneous pixel to precisely one error type, our method seamlessly extends the popular IoU-based evaluation by shedding more light on the false positive and false negative predictions. Our approach is model- and dataset-agnostic, as it does not rely on additional information besides the predicted and ground-truth segmentation masks. In our experiments, we demonstrate that our method accurately assesses model strengths and weaknesses on a quantitative basis, thus reducing the dependence on time-consuming qualitative model inspection. We analyze a variety of state-of-the-art semantic segmentation models, revealing systematic differences across various architectural paradigms. Exploiting the gained insights, we showcase that combining two models with complementary strengths in a straightforward way is sufficient to consistently improve mIoU, even for models setting the current state of the art on ADE20K.

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[223]

U. Sahin, H. Li, Q. Khan, D. Cremers and V. Tresp.
Enhancing Multimodal Compositional Reasoning of Visual Language Models With Generative Negative Mining.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI GitHub

Abstract

Contemporary large-scale visual language models (VLMs) exhibit strong representation capacities, making them ubiquitous for enhancing image and text understanding tasks. They are often trained in a contrastive manner on a large and diverse corpus of images and corresponding text captions scraped from the internet. Despite this, VLMs often struggle with compositional reasoning tasks which require a fine-grained understanding of the complex interactions of objects and their attributes. This failure can be attributed to two main factors: 1) Contrastive approaches have traditionally focused on mining negative examples from existing datasets. However, the mined negative examples might not be difficult for the model to discriminate from the positive. An alternative to mining would be negative sample generation. 2) Existing generative approaches, however, primarily focus on generating hard negative texts associated with a given image. Mining in the other direction, i.e., generating negative image samples associated with a given text, has been ignored. To overcome both these limitations, we propose a framework that not only mines in both directions but also generates challenging negative samples in both modalities, i.e., images and texts. Leveraging these generative hard negative samples, we significantly enhance VLMs’ performance in tasks involving multimodal compositional reasoning.

MCML Authors

Hang Li

* Former Member

Qadeer Khan

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[222]

G. Zhang, Y. Zhang, K. Zhang and V. Tresp.
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI GitHub

Abstract

Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings do. One example is that humans can reason where and when an image is taken based on their knowledge. This makes us wonder if, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can achieve and even surpass human capability in reasoning about times and location. To address this question, we propose a two-stage Recognition & Reasoning probing task applied to discriminative and generative VLMs to uncover whether VLMs can recognize times and location-relevant features and further reason about them. To facilitate the studies, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In extensive evaluation experiments, we find that although VLMs can effectively retain times and location-relevant features in visual encoders, they still fail to make perfect reasoning with context-conditioned visual features.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[221]

E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part I.
4OR (Jan. 2024). DOI

Abstract

Multiple criteria decision aiding (MCDA) and preference learning (PL) are established research fields, which have different roots, developed in different communities – the former in the decision sciences and operations research, the latter in AI and machine learning – and have their own agendas in terms of problem setting, assumptions, and criteria of success. In spite of this, they share the major goal of constructing practically useful decision models that either support humans in the task of choosing the best, classifying, or ranking alternatives from a given set, or even automate decision-making by acting autonomously on behalf of the human. Therefore, MCDA and PL can complement and mutually benefit from each other, a potential that has been exhausted only to some extent so far. By elaborating on the connection between MCDA and PL in more depth, our goal is to stimulate further research at the junction of these two fields. To this end, we first review both methodologies, MCDA in this part of the paper and PL in the second part, with the intention of highlighting their most common elements. In the second part, we then compare both methodologies in a systematic way and give an overview of existing work on combining PL and MCDA.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[220]

E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part II.
4OR (Jan. 2024). DOI

Abstract

This article elaborates on the connection between multiple criteria decision aiding (MCDA) and preference learning (PL), two research fields with different roots and developed in different communities. It complements the first part of the paper, in which we started with a review of MCDA. In this part, a similar review will be given for PL, followed by a systematic comparison of both methodologies, as well as an overview of existing work on combining PL and MCDA. Our main goal is to stimulate further research at the junction of these two methodologies.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[219]

S. Chen, J. Gu, Z. Han, Y. Ma, P. Torr and V. Tresp.
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL GitHub

Abstract

Various adaptation methods, such as LoRA, prompts, and adapters, have been proposed to enhance the performance of pre-trained vision-language models in specific domains. As test samples in real-world applications usually differ from adaptation data, the robustness of these adaptation methods against distribution shifts is essential. In this study, we assess the robustness of 11 widely-used adaptation methods across 4 vision-language datasets under multimodal corruptions. Concretely, we introduce 7 benchmark datasets, including 96 visual and 87 textual corruptions, to investigate the robustness of different adaptation methods, the impact of available adaptation examples, and the influence of trainable parameter size during adaptation. Our analysis reveals that: 1) Adaptation methods are more sensitive to text corruptions than visual corruptions. 2) Full fine-tuning does not consistently provide the highest robustness; instead, adapters can achieve better robustness with comparable clean performance. 3) Contrary to expectations, our findings indicate that increasing the amount of adaptation data and the number of parameters does not guarantee enhanced robustness; instead, it results in even lower robustness. We hope this study could benefit future research in the development of robust multimodal adaptation methods.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[218]

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
SHAP-IQ: Unified Approximation of any-order Shapley Interactions.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

Predominantly in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axioms. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[217]

Y. Scholten, J. Schuchardt, A. Bojchevski and S. Günnemann.
Hierarchical randomized smoothing.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs, by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We instantiate hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both certifiably robust to perturbations and accurate.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[216]

J. Schuchardt, Y. Scholten and S. Günnemann.
Provable Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

A machine learning model is traditionally considered robust if its prediction remains (almost) constant under input perturbations with small norm. However, real-world tasks like molecular property prediction or point cloud segmentation have inherent equivariances, such as rotation or permutation equivariance. In such tasks, even perturbations with large norm do not necessarily change an input’s semantic content. Furthermore, there are perturbations for which a model’s prediction explicitly needs to change. For the first time, we propose a sound notion of adversarial robustness that accounts for task equivariance. We then demonstrate that provable robustness can be achieved by (1) choosing a model that matches the task’s equivariances and (2) certifying traditional adversarial robustness. Certification methods are, however, unavailable for many models, such as those with continuous equivariances. We close this gap by developing the framework of equivariance-preserving randomized smoothing, which enables architecture-agnostic certification. We additionally derive the first architecture-specific graph edit distance certificates, i.e. sound robustness guarantees for isomorphism equivariant tasks like node classification. Overall, a sound notion of robustness is an important prerequisite for future work at the intersection of robust and geometric machine learning.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[215]

R. Liao, X. Jia, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
TGL @NeurIPS 2023 - Workshop Temporal Graph Learning at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (TKG) domain, where conventional, carefully designed embedding-based and rule-based models dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges arise from the huge chasms between the complex graph data structure and the sequential natural-language expressions LLMs can handle, and between the enormous data volume of TKGs and the heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval-augmented generation framework named GenTKG, which combines a temporal logical rule-based retrieval strategy with lightweight few-shot parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG is a simple but effective, efficient, and generalizable approach that outperforms conventional methods on temporal relational forecasting with extremely limited computation. Our work opens a new frontier for the temporal knowledge graph domain.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[214]

A. Koebler, T. Decker, M. Lebacher, I. Thon, V. Tresp and F. Buettner.
Towards Explanatory Model Monitoring.
XAIA @NeurIPS 2023 - Workshop XAI in Action: Past, Present, and Future Applications at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

Monitoring machine learning systems and efficiently recovering their reliability after performance degradation are two of the most critical issues in real-world applications. However, current monitoring strategies lack the capability to provide actionable insights answering the question of why the performance of a particular model really degraded. To address this, we propose Explanatory Performance Estimation (XPE) as a novel method that facilitates more informed model monitoring and maintenance by attributing an estimated performance change to interpretable input features. We demonstrate the superiority of our approach compared to natural baselines on different data sets. We also discuss how the generated results lead to valuable insights that can reveal potential root causes for model deterioration and guide toward actionable countermeasures.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[213]

C. Leiber, L. Miklautz, C. Plant and C. Böhm.
Benchmarking Deep Clustering Algorithms With ClustPy.
ICDMW 2023 - IEEE International Conference on Data Mining Workshops. Shanghai, China, Dec 01-04, 2023. DOI GitHub

Abstract

Deep clustering algorithms have gained popularity as they are able to cluster complex large-scale data, like images. Yet these powerful algorithms require many decisions w.r.t. architecture, learning rate and other hyperparameters, making it difficult to compare different methods. A comprehensive empirical evaluation of novel clustering methods, however, plays an important role in both scientific and practical applications, as it reveals their individual strengths and weaknesses. Therefore, we introduce ClustPy, a unified framework for benchmarking deep clustering algorithms, and perform a comparison of several fundamental deep clustering methods and some recently introduced ones. We compare these methods on multiple well-known image data sets using different evaluation metrics, perform a sensitivity analysis w.r.t. important hyperparameters and perform ablation studies, e.g., for different autoencoder architectures and image augmentation. To our knowledge, this is the first in-depth benchmarking of deep clustering algorithms in a unified setting.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[212]

T. Kaufmann, P. Weng, V. Bengs and E. Hüllermeier.
A Survey of Reinforcement Learning from Human Feedback.
Preprint (Dec. 2023). arXiv

Abstract

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model’s capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[211]

Y. Sale, P. Hofman, L. Wimmer, E. Hüllermeier and T. Nagler.
Second-Order Uncertainty Quantification: Variance-Based Measures.
Preprint (Dec. 2023). arXiv

Abstract

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[210]

E. Thelisson, G. Mika, Q. Schneiter, K. Padh and H. Verma.
Toward Responsible AI Use: Considerations for Sustainability Impact Assessment.
Preprint (Dec. 2023). arXiv

Abstract

As AI/ML models, including Large Language Models, continue to scale with massive datasets, so does their consumption of undeniably limited natural resources, and impact on society. In this collaboration between AI, Sustainability, HCI and legal researchers, we aim to enable a transition to sustainable AI development by enabling stakeholders across the AI value chain to assess and quantify the environmental and societal impact of AI. We present the ESG Digital and Green Index (DGI), which offers a dashboard for assessing a company’s performance in achieving sustainability targets. This includes monitoring the efficiency and sustainable use of limited natural resources related to AI technologies (water, electricity, etc). It also addresses the societal and governance challenges related to AI. The DGI creates incentives for companies to align their pathway with the Sustainable Development Goals (SDGs). The value, challenges and limitations of our methodology and findings are discussed in the paper.

MCML Authors

Kirtan Padh

Ethics in Systems Design and Machine Learning

[209]

G. Zhang, J. Bi, J. Gu, Y. Chen and V. Tresp.
SPOT! Revisiting Video-Language Models for Event Understanding.
Preprint (Dec. 2023). arXiv

Abstract

Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only broad-level video captions. This raises a question: with such weak supervision, can video representation in video-language models gain the ability to distinguish even factual discrepancies in textual description and understand fine-grained events? To address this, we introduce SPOT Prober, to benchmark existing video-language models’ capacities of distinguishing event-level discrepancies as an indicator of models’ event understanding ability. Our approach involves extracting events as tuples (<Subject, Predicate, Object, Attribute, Timestamps>) from videos and generating false event tuples by manipulating tuple components systematically. We reevaluate the existing video-language models with these positive and negative captions and find they fail to distinguish most of the manipulated events. Based on our findings, we propose to plug in these manipulated event captions as hard negative samples and find them effective in enhancing models for event understanding.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[208]

Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
FORECASTTKGQUESTIONS: A Benchmark for Temporal Question Answering and Forecasting over Temporal Knowledge Graphs.
ISWC 2023 - 22nd International Semantic Web Conference. Athens, Greece, Nov 06-11, 2023. DOI

Abstract

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. Previous related works aim to develop QA systems that answer temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning this period can be fully used for inference. In real-world scenarios, however, it is common that, given knowledge until the current instance, we wish the TKGQA systems to answer questions asking about the future. As humans constantly plan the future, building forecasting TKGQA systems is important. In this paper, we propose a novel task: forecasting TKGQA, and propose a coupled large-scale TKGQA benchmark dataset, i.e., FORECASTTKGQUESTIONS. It includes three types of forecasting questions, i.e., entity prediction, yes-unknown, and fact reasoning questions. For every question, a timestamp is annotated and QA models only have access to TKG information prior to it for answer inference. We find that previous TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-unknown and fact reasoning questions. To this end, we propose FORECASTTKGQA, a TKGQA model that employs a TKG forecasting module for future inference. Experiments show that it performs well in forecasting TKGQA.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[207]

A. Maldonado, L. Zellner, S. Strickroth and T. Seidl.
Process Mining Techniques for Collusion Detection in Online Exams.
EduPM @ICPM 2023 - 2nd International Workshop ‘Education meets Process Mining’ organized with the 5th International Conference on Process Mining (ICPM 2023). Rome, Italy, Oct 23-27, 2023. DOI

Abstract

Honesty and fairness are essential. As with many skills, practicing those values starts in the classroom. Whether students are examined online or on-site, only by testing their knowledge righteously can educators assess their skills and room for improvement. As online exams increase, we are provided with more suitable data for analysis. Process mining methods such as anomaly detection and trace clustering techniques have been used to identify dishonest behavior in other fields, e.g., fraud detection. In this paper, we investigate collusion detection in online exams as a process mining task. We explore trace ordering for anomaly detection (TOAD) as well as hierarchical agglomerative trace clustering (HATC). Promising preliminary results exemplify how process mining techniques empower teachers in their decision making while, via flexible configuration of parameters, leaving the last word to them.

MCML Authors

Andrea Maldonado

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[206]

A. Maldonado, G. M. Tavares, R. Oyamada, P. Ceravolo and T. Seidl.
FEEED: Feature Extraction from Event Data.
ICPM 2023 - Doctoral Consortium at the 5th International Conference on Process Mining. Rome, Italy, Oct 23-27, 2023. PDF

Abstract

The analysis of event data is largely influenced by the effective characterization of descriptors. These descriptors serve as the building blocks of our understanding, encapsulating the behavior described within the event data. In light of these considerations, we introduce FEEED (Feature Extraction from Event Data), an extendable tool for event data feature extraction. FEEED represents a significant advancement in event data behavior analysis, offering a range of features to empower analysts and data scientists in their pursuit of insightful, actionable, and understandable event data analysis. What sets FEEED apart is its unique capacity to act as a bridge between the worlds of data mining and process mining. In doing so, it promises to enhance the accuracy, comprehensiveness, and utility of characterizing event data for a diverse range of applications.

MCML Authors

Andrea Maldonado

Database Systems and Data Mining AI Lab

Gabriel Marques Tavares

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[205]

C. Leiber, L. Miklautz, C. Plant and C. Böhm.
Application of Deep Clustering Algorithms.
CIKM 2023 - 32nd ACM International Conference on Information and Knowledge Management. Birmingham, UK, Oct 21-25, 2023. DOI

Abstract

Deep clustering algorithms have gained popularity for clustering complex, large-scale data sets, but getting started is difficult because of numerous decisions regarding architecture, optimizer, and other hyperparameters. Theoretical foundations must be known to obtain meaningful results. At the same time, ease of use is necessary to get used by a broader audience. Therefore, we require a unified framework that allows for easy execution in diverse settings. While this applies to established clustering methods like k-Means and DBSCAN, deep clustering algorithms lack a standard structure, resulting in significant programming overhead. This complicates empirical evaluations, which are essential in both scientific and practical applications. We present a solution to this problem by providing a theoretical background on deep clustering as well as practical implementation techniques and a unified structure with predefined neural networks. For the latter, we use the Python package ClustPy. The aim is to share best practices and facilitate community participation in deep clustering research.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[204]

J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Probabilistic Scoring Lists for Interpretable Machine Learning.
DS 2023 - 26th International Conference on Discovery Science. Porto, Portugal, Oct 09-11, 2023. DOI

Abstract

A scoring system is a simple decision model that checks a set of features, adds a certain number of points to a total score for each feature that is satisfied, and finally makes a decision by comparing the total score to a threshold. Scoring systems have a long history of active use in safety-critical domains such as healthcare and justice, where they provide guidance for making objective and accurate decisions. Given their genuine interpretability, the idea of learning scoring systems from data is obviously appealing from the perspective of explainable AI. In this paper, we propose a practically motivated extension of scoring systems called probabilistic scoring lists (PSL), as well as a method for learning PSLs from data. Instead of making a deterministic decision, a PSL represents uncertainty in the form of probability distributions. Moreover, in the spirit of decision lists, a PSL evaluates features one by one and stops as soon as a decision can be made with enough confidence. To evaluate our approach, we conduct a case study in the medical domain.
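The plain (deterministic) scoring system described in the first sentence of this abstract can be illustrated with a minimal Python sketch. This is not the probabilistic scoring list (PSL) method from the paper; the feature names, point values, threshold, and the use of `>=` for the comparison are hypothetical choices made only for illustration.

```python
from typing import Mapping, Tuple


def scoring_system_decision(
    features: Mapping[str, bool],
    points: Mapping[str, int],
    threshold: int,
) -> Tuple[int, bool]:
    """Add the points of every satisfied feature and compare the total to a threshold."""
    # A feature missing from `features` is treated as not satisfied (assumption).
    total = sum(p for name, p in points.items() if features.get(name, False))
    # Assumption: the decision is positive when the total reaches the threshold.
    return total, total >= threshold


# Hypothetical point values and patient record, for illustration only.
example_points = {"age_over_60": 2, "high_blood_pressure": 1, "smoker": 1}
example_patient = {"age_over_60": True, "high_blood_pressure": False, "smoker": True}

example_total, example_positive = scoring_system_decision(
    example_patient, example_points, threshold=3
)
```

A PSL, as the abstract explains, would instead attach a probability distribution to each score and evaluate features one at a time, stopping once the decision is confident enough; that part is not modeled here.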

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[203]

L. Miklautz, A. Shkabrii, C. Leiber, B. Tobias, B. Seidl, E. Weissensteiner, A. Rausch, C. Böhm and C. Plant.
Non-Redundant Image Clustering of Early Medieval Glass Beads.
DSAA 2023 - 10th IEEE International Conference on Data Science and Advanced Analytics. Thessaloniki, Greece, Oct 09-13, 2023. DOI

Abstract

Glass beads were among the most common grave goods in the Early Middle Ages, with an estimated number in the millions. The color, size, shape and decoration of the beads are diverse leading to many different archaeological classification systems that depend on the subjective decisions of individual experts. The lack of an agreed upon expert categorization leads to a pressing problem in archaeology, as the categorization of archaeological artifacts, like glass beads, is important to learn about cultural trends, manufacturing processes or economic relationships (e.g., trade routes) of historical times. An automated, objective and reproducible classification system is therefore highly desirable. We present a high-quality data set of images of Early Medieval beads and propose a clustering pipeline to learn a classification system in a data-driven way. The pipeline consists of a novel extension of deep embedded non-redundant clustering to identify multiple, meaningful clusterings of glass bead images. During the cluster analysis we address several challenges associated with the data and as a result identify high-quality clusterings that overlap with archaeological domain expertise. To the best of our knowledge this is the first application of non-redundant image clustering for archaeological data.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[202]

J. Brandt, E. Schede, S. Sharma, V. Bengs, E. Hüllermeier and K. Tierney.
Contextual Preselection Methods in Pool-based Realtime Algorithm Configuration.
LWDA 2023 - Conference on Lernen. Wissen. Daten. Analysen. Marburg, Germany, Oct 09-11, 2023. PDF

Abstract

Realtime algorithm configuration is concerned with the task of designing a dynamic algorithm configurator that observes sequentially arriving problem instances of an algorithmic problem class for which it selects suitable algorithm configurations (e.g., minimal runtime) of a specific target algorithm. The Contextual Preselection under the Plackett-Luce (CPPL) algorithm maintains a pool of configurations from which a set of algorithm configurations is selected that are run in parallel on the current problem instance. It uses the well-known UCB selection strategy from the bandit literature, while the pool of configurations is updated over time via a racing mechanism. In this paper, we investigate whether the performance of CPPL can be further improved by using different bandit-based selection strategies as well as a ranking-based strategy to update the candidate pool. Our experimental results show that replacing these components can indeed improve performance again significantly.
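For readers unfamiliar with the UCB selection strategy mentioned in this abstract, the following is a minimal Python sketch of the classic UCB1 rule for a plain multi-armed bandit. It is not the CPPL algorithm, which is contextual, uses a Plackett-Luce model, and selects a set of configurations rather than a single arm. The function name and the exploration constant `c` are assumptions for illustration.

```python
import math
from typing import Sequence


def ucb1_select(
    counts: Sequence[int],
    mean_rewards: Sequence[float],
    c: float = 2.0,
) -> int:
    """Return the index of the arm with the highest upper confidence bound."""
    # Play every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total_plays = sum(counts)
    # Empirical mean plus an exploration bonus that shrinks as an arm is played more.
    scores = [
        mean + math.sqrt(c * math.log(total_plays) / n)
        for mean, n in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus term favors rarely tried arms, so an arm with a slightly lower observed mean can still be selected if it has been played far fewer times than its competitors.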

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[201]

J. Hanselle, J. Kornowicz, S. Heid, K. Thommes and E. Hüllermeier.
Comparing Humans and Algorithms in Feature Ranking: A Case-Study in the Medical Domain.
LWDA 2023 - Conference on Lernen. Wissen. Daten. Analysen. Marburg, Germany, Oct 09-11, 2023. PDF

Abstract

The selection of useful, informative, and meaningful features is a key prerequisite for the successful application of machine learning in practice, especially in knowledge-intense domains like decision support. Here, the task of feature selection, or ranking features by importance, can, in principle, be solved automatically in a data-driven way but also supported by expert knowledge. Besides, one may, of course, conceive a combined approach, in which a learning algorithm closely interacts with a human expert. In any case, finding an optimal approach requires a basic understanding of human capabilities in judging the importance of features compared to those of a learning algorithm. To this end, we conducted a case study in the medical domain, comparing feature rankings based on human judgment to rankings automatically derived from data. The quality of a ranking is determined by the performance of a decision list processing features in the order specified by the ranking, more specifically by so-called probabilistic scoring systems.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[200]

M. Bernhard, N. Strauß and M. Schubert.
MapFormer: Boosting Change Detection by Using Pre-change Information.
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI GitHub

Abstract

Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the earth’s surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of Conditional Change Detection, where pre-change semantic information is used as input next to bi-temporal images. To fully exploit the extra information, we propose MapFormer, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and the absence of pre-change imagery.

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[199]

H. Chen, A. Frikha, D. Krompass, J. Gu and V. Tresp.
FRAug: Tackling Federated Learning with Non-IID Features via Representation Augmentation.
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI

Abstract

Federated Learning (FL) is a decentralized machine learning paradigm, in which multiple clients collaboratively train neural networks without centralizing their local data, and hence preserve data privacy. However, real-world FL applications usually encounter challenges arising from distribution shifts across the local datasets of individual clients. These shifts may drift the global model aggregation or result in convergence to a deflected local optimum. While existing efforts have addressed distribution shifts in the label space, an equally important challenge remains relatively unexplored. This challenge involves situations where the local data of different clients indicate identical label distributions but exhibit divergent feature distributions. This issue can significantly impact the global model performance in the FL framework. In this work, we propose Federated Representation Augmentation (FRAug) to resolve this practical and challenging problem. FRAug optimizes a shared embedding generator to capture client consensus. Its output synthetic embeddings are transformed into client-specific embeddings by a locally optimized RTNet to augment the training space of each client. Our empirical evaluation on three public benchmarks and a real-world medical dataset demonstrates the effectiveness of the proposed method, which substantially outperforms the current state-of-the-art FL methods for feature distribution shifts, including PartialFed and FedBN.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Ahmed Frikha

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[198]

H. Li, J. Gu, R. Koner, S. Sharifzadeh and V. Tresp.
Do DALL-E and Flamingo Understand Each Other?
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI GitHub

Abstract

The field of multimodal research focusing on the comprehension and creation of both images and text has witnessed significant strides. This progress is exemplified by the emergence of sophisticated models dedicated to image captioning at scale, such as the notable Flamingo model and text-to-image generative models, with DALL-E serving as a prominent example. An interesting question worth exploring in this domain is whether Flamingo and DALL-E understand each other. To study this question, we propose a reconstruction task where Flamingo generates a description for a given image and DALL-E uses this description as input to synthesize a new image. We argue that these models understand each other if the generated image is similar to the given image. Specifically, we study the relationship between the quality of the image reconstruction and that of the text generation. We find that an optimal description of an image is one that gives rise to a generated image similar to the original one. The finding motivates us to propose a unified framework to finetune the text-to-image and image-to-text models. Concretely, the reconstruction part forms a regularization loss to guide the tuning of the models. Extensive experiments on multiple datasets with different image captioning and image generation models validate our findings and demonstrate the effectiveness of our proposed unified framework. As DALL-E and Flamingo are not publicly available, we use Stable Diffusion and BLIP in the remaining work.

MCML Authors

Hang Li

* Former Member

Rajat Koner

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[197]

G. Zhang, J. Ren, J. Gu and V. Tresp.
Multi-event Video-Text Retrieval.
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI GitHub

Abstract

Video-Text Retrieval (VTR) is a crucial multi-modal task in an era of massive video-text data on the Internet. A plethora of work characterized by using a two-stream Vision-Language model architecture that learns a joint representation of video-text pairs has become a prominent approach for the VTR task. However, these models operate under the assumption of bijective video-text correspondences and neglect a more practical scenario where video content usually encompasses multiple events, while texts like user queries or webpage metadata tend to be specific and correspond to single events. This establishes a gap between the previous training objective and real-world applications, leading to the potential performance degradation of earlier models during inference. In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, addressing scenarios in which each video contains multiple different events, as a niche scenario of the conventional Video-Text Retrieval Task. We present a simple model, Me-Retriever, which incorporates key event video representation and a new MeVTR loss for the MeVTR task. Comprehensive experiments show that this straightforward framework outperforms other models in the Video-to-Text and Text-to-Video tasks, effectively establishing a robust baseline for the MeVTR task. We believe this work serves as a strong foundation for future studies.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[196]

Y. Shen, R. Liao, Z. Han, Y. Ma and V. Tresp.
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models.
Preprint (Oct. 2023). arXiv

Abstract

While multi-modal models have successfully integrated information from image, video, and audio modalities, integrating graph modality into large language models (LLMs) remains unexplored. This discrepancy largely stems from the inherent divergence between structured graph data and unstructured text data. Incorporating graph knowledge provides a reliable source of information, enabling potential solutions to address issues in text generation, e.g., hallucination, and lack of domain knowledge. To evaluate the integration of graph knowledge into language models, a dedicated dataset is needed. However, there is currently no benchmark dataset specifically designed for multimodal graph-language models. To address this gap, we propose GraphextQA, a question answering dataset with paired subgraphs, retrieved from Wikidata, to facilitate the evaluation and future development of graph-language models. Additionally, we introduce a baseline model called CrossGNN, which conditions answer generation on the paired graphs by cross-attending question-aware graph features at decoding. The proposed dataset is designed to evaluate graph-language models’ ability to understand graphs and make use of it for answer generation. We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[195]

D. Winkel, N. Strauß, M. Schubert and T. Seidl.
Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning.
ECAI 2023 - 26th European Conference on Artificial Intelligence. Kraków, Poland, Sep 30-Oct 04, 2023. DOI

Abstract

Portfolio optimization tasks describe sequential decision problems in which the investor’s wealth is distributed across a set of assets. Allocation constraints are used to enforce minimal or maximal investments into particular subsets of assets to control for objectives such as limiting the portfolio’s exposure to a certain sector due to environmental concerns. Although methods for constrained reinforcement learning (CRL) can optimize policies while considering allocation constraints, it can be observed that these general methods yield suboptimal results. In this paper, we propose a novel approach to handle allocation constraints based on a decomposition of the constraint action space into a set of unconstrained allocation problems. In particular, we examine this approach for the case of two constraints. For example, an investor may wish to invest at least a certain percentage of the portfolio into green technologies while limiting the investment in the fossil energy sector. We show that the action space of the task is equivalent to the decomposed action space, and introduce a new reinforcement learning (RL) approach CAOSD, which is built on top of the decomposition. The experimental evaluation on real-world Nasdaq data demonstrates that our approach consistently outperforms state-of-the-art CRL benchmarks for portfolio optimization.

MCML Authors

David Winkel

Database Systems and Data Mining AI Lab

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[194]

P. Becker and V. Bengs.
Shapley-Based Feature Selection for Online Algorithm Selection.
DynXAI @ECML-PKDD 2023 - Workshop on Explainable Artificial Intelligence: From Static to Dynamic at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Online algorithm selection concerns the task of designing a dynamic algorithm selector that observes sequentially arriving problem instances of an algorithmic problem class for which it must select a suitable algorithm from a given pool of candidate algorithms. Typically the suitability of a candidate algorithm is determined by its average runtime for solving a problem instance. Recent work has shown that multi-armed bandit algorithms can be leveraged for specifying a suitable algorithm selector by taking available feature information of problem instances into account. In this paper, we investigate whether the performance of these bandit-based selection strategies can be further improved by incorporating feature selection. To this end, we use the concept of Shapley values from cooperative game theory to specify the contribution of the features with respect to the suitability of the candidate algorithms and adapt the bandit-based selection strategies to consider only features with the highest contribution. We present two different Shapley value-based approaches and show empirically that UCB-based bandit selection strategies can be improved, while Thompson sampling-based strategies actually deteriorate in terms of average runtime.
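The Shapley value from cooperative game theory, which this abstract uses to quantify feature contributions, can be sketched in Python with the textbook exact formula. This is not the paper's method: the two approaches it proposes are not reproduced here, and the value function and feature names below are a hypothetical toy game. Exact enumeration is exponential in the number of players and is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating all coalitions of the other players."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            # Probability weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy game: the "worth" of each feature subset, for illustration only.
toy_worth = {
    frozenset(): 0.0,
    frozenset({"f1"}): 1.0,
    frozenset({"f2"}): 2.0,
    frozenset({"f1", "f2"}): 4.0,
}
toy_phi = shapley_values(["f1", "f2"], lambda s: toy_worth[s])
```

In this toy game the two contributions sum to the worth of the full feature set, which is the efficiency property of the Shapley value.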

MCML Authors

Viktor Bengs

Dr.

* Former Member

[193]

Z. Ding, J. Wu, Z. Li, Y. Ma and V. Tresp.
Improving Few-Shot Inductive Learning on Temporal Knowledge Graphs Using Confidence-Augmented Reinforcement Learning.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI GitHub

Abstract

Temporal knowledge graph completion (TKGC) aims to predict the missing links among the entities in a temporal knowledge graph (TKG). Most previous TKGC methods only consider predicting the missing links among the entities seen in the training set, while they are unable to achieve great performance in link prediction concerning newly-emerged unseen entities. Recently, a new task, i.e., TKG few-shot out-of-graph (OOG) link prediction, is proposed, where TKGC models are required to achieve great link prediction performance concerning newly-emerged entities that only have few-shot observed examples. In this work, we propose a TKGC method FITCARL that combines few-shot learning with reinforcement learning to solve this task. In FITCARL, an agent traverses through the whole TKG to search for the prediction answer. A policy network is designed to guide the search process based on the traversed path. To better address the data scarcity problem in the few-shot setting, we introduce a module that computes the confidence of each candidate action and integrate it into the policy for action selection. We also exploit the entity concept information with a novel concept regularizer to boost model performance. Experimental results show that FITCARL achieves state-of-the-art performance on TKG few-shot OOG link prediction.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[192]

S. Gilhuber, J. Busch, D. Rotthues, C. M. M. Frey and T. Seidl.
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Node classification is one of the core tasks on attributed graphs, but successful graph learning solutions require sufficiently labeled data. To keep annotation costs low, active graph learning focuses on selecting the most qualitative subset of nodes that maximizes label efficiency. However, deciding which heuristic is best suited for an unlabeled graph to increase label efficiency is a persistent challenge. Existing solutions either neglect aligning the learned model and the sampling method or focus only on limited selection aspects. They are thus sometimes worse or only equally good as random sampling. In this work, we introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Toward better transferability between different graph structures, we combine three independent scoring functions to identify the most informative node samples for labeling in a parameter-free way: i) Model Uncertainty, ii) Diversity Component, and iii) Node Importance computed via graph diffusion heuristics. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria and similarly fast as simpler heuristics. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Christian Frey

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[191]

S. Gilhuber, R. Hvingelby, M. L. A. Fok and T. Seidl.
How to Overcome Confirmation Bias in Semi-Supervised Image Classification by Active Learning.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Do we need active learning? The rise of strong deep semi-supervised methods raises doubt about the usability of active learning in limited labeled data settings. This is caused by results showing that combining semi-supervised learning (SSL) methods with a random selection for labeling can outperform existing active learning (AL) techniques. However, these results are obtained from experiments on well-established benchmark datasets that can overestimate the external validity. Moreover, the literature lacks sufficient research on the performance of active semi-supervised learning methods in realistic data scenarios, leaving a notable gap in our understanding. Therefore, we present three data challenges common in real-world applications: between-class imbalance, within-class imbalance, and between-class similarity. These challenges can hurt SSL performance due to confirmation bias. We conduct experiments with SSL and AL on simulated data challenges and find that random sampling does not mitigate confirmation bias and, in some cases, leads to worse performance than supervised learning. In contrast, we demonstrate that AL can overcome confirmation bias in SSL in these realistic settings. Our results provide insights into the potential of combining active and semi-supervised learning in the presence of common real-world challenges, which is a promising direction for robust methods when learning with limited labeled data in real-world applications.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[190]

S. Haas and E. Hüllermeier.
Rectifying Bias in Ordinal Observational Data Using Unimodal Label Smoothing.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

This paper proposes a novel approach for modeling observational data in the form of expert ratings, which are commonly given on an ordered (numerical or ordinal) scale. In practice, such ratings are often biased, due to the expert’s preferences, psychological effects, etc. Our approach aims to rectify these biases, thereby preventing machine learning methods from transferring them to models trained on the data. To this end, we make use of so-called label smoothing, which allows for redistributing probability mass from the originally observed rating to other ratings, which are considered as possible corrections. This enables the incorporation of domain knowledge into the standard cross-entropy loss and leads to flexibly configurable models. Concretely, our method is realized for ordinal ratings and allows for arbitrary unimodal smoothings using a binary smoothing relation. Additionally, the paper suggests two practically motivated smoothing heuristics to address common biases in observational data, a time-based smoothing to handle concept drift and a class-wise smoothing based on class priors to mitigate data imbalance. The effectiveness of the proposed methods is demonstrated on four real-world goodwill assessment data sets of a car manufacturer with the aim of automating goodwill decisions. Overall, this paper presents a promising approach for modeling ordinal observational data that can improve decision-making processes and reduce reliance on human expertise.
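The idea of redistributing probability mass from the observed rating to neighbouring ratings can be illustrated with a small sketch. This is not the method from the paper: the function names `unimodal_smooth` and `cross_entropy`, the geometric decay, and the parameters `alpha` and `decay` are assumptions chosen only to show what a unimodal smoothed target and the resulting cross-entropy loss might look like.

```python
import math


def unimodal_smooth(observed, num_classes, alpha=0.2, decay=0.5):
    """Return a smoothed target distribution over ordinal ratings.

    Hypothetical scheme: a share `alpha` of the mass is moved away from the
    observed rating and spread over the other ratings with weights that
    decay geometrically with distance. With alpha <= 0.5 and decay <= 1 the
    result has a single peak at the observed rating.
    """
    if not 0 <= observed < num_classes:
        raise ValueError("observed rating out of range")
    if not 0.0 <= alpha <= 0.5 or not 0.0 < decay <= 1.0:
        raise ValueError("need 0 <= alpha <= 0.5 and 0 < decay <= 1")
    weights = [0.0 if k == observed else decay ** abs(k - observed)
               for k in range(num_classes)]
    total = sum(weights)
    if total == 0.0:  # only one class, nothing to smooth towards
        return [1.0]
    target = [alpha * w / total for w in weights]
    target[observed] = 1.0 - alpha
    return target


def cross_entropy(target, predicted):
    """Cross-entropy between a (smoothed) target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


if __name__ == "__main__":
    target = unimodal_smooth(observed=2, num_classes=5)
    print(target)
    print(cross_entropy(target, [0.1, 0.2, 0.4, 0.2, 0.1]))
```

Domain knowledge would enter through the choice of which ratings receive mass and how much; the symmetric decay used here is only a placeholder for such a choice.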

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[189]

M. Klein, C. Leiber and C. Böhm.
k-SubMix: Common Subspace Clustering on Mixed-Type Data.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Clustering heterogeneous data is an ongoing challenge in the data mining community. The most prevalent clustering methods are designed to process datasets with numerical features only, but often datasets consist of mixed numerical and categorical features. This requires new approaches capable of handling both kinds of data types. Further, the most relevant cluster structures are often hidden in only a few features. Thus, another key challenge is to detect those specific features automatically and abandon features not relevant for clustering. This paper proposes the subspace mixed-type clustering algorithm k-SubMix, which tackles both challenges. Its cost function can handle both numerical and categorical features while simultaneously identifying those with the biggest impact for a high-quality clustering result. Unlike other subspace mixed-type clustering methods, k-SubMix preserves inter-cluster comparability, as it is the first mixed-type approach that defines a common subspace for all clusters. Extensive experiments show that k-SubMix outperforms competitive methods and reduces the data’s complexity by a simultaneous dimensionality reduction.

MCML Authors

Mauritius Klein

* Former Member

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[188]

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[187]

Abstract

Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. Such coarse approximations can be detrimental in practical applications, notably safety-critical ones. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. These symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

[186]

E. Terzieva, M. Muschalik, P. Hofman and E. Hüllermeier.
Identifying Trends in Feature Attributions During Training of Neural Networks.
ECML-PKDD 2023 - Workshop Uncertainty meets Explainability in Machine Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

This study investigates the evolving dynamics of commonly used feature attribution (FA) values during training of neural networks. As models transition from a state of high uncertainty to low uncertainty, we show that the features’ significance also changes, which is in line with the general learning theory of deep neural networks. During model training, we compute FA scores through Layer-wise Relevance Propagation (LRP) and Gradient-weighted Class Activation Mapping (Grad-CAM), which are selected for their efficiency and speed of computation. We summarize the attribution scores in terms of the sum of the absolute values of FA scores and their entropy. We further analyze these summary scores in relation to the models’ generalization capabilities. The analysis identifies trends where FA values increase in magnitude while entropy decreases during the training process, regardless of model generalization, suggesting independence of overfitting. This research offers a unique view on the application of FA methods in explainable artificial intelligence (XAI) and raises intriguing questions about their behavior across varying model architectures and datasets, which may have implications for future work combining XAI and uncertainty estimation in machine learning.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[185]

T. Kaufmann, S. Ball, J. Beck, E. Hüllermeier and F. Kreuter.
On the challenges and practices of reinforcement learning from real human feedback.
HLDM @ECML-PKDD 2023 - 1st Workshop on Hybrid Human-Machine Learning and Decision Making at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only a few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards, which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulties. We describe and discuss some of these challenges together with current practices and opportunities for further research in this paper.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Sarah Ball

Social Data Science and AI

Jacob Beck

Social Data Science and AI

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

[184]

A. Javanmardi, Y. Sale, P. Hofman and E. Hüllermeier.
Conformal Prediction with Partially Labeled Data.
COPA 2023 - 12th Symposium on Conformal and Probabilistic Prediction with Applications. Limassol, Cyprus, Sep 13-15, 2023. URL

Abstract

While the predictions produced by conformal prediction are set-valued, the data used for training and calibration is supposed to be precise. In the setting of superset learning or learning from partial labels, a variant of weakly supervised learning, it is exactly the other way around: training data is possibly imprecise (set-valued), but the model induced from this data yields precise predictions. In this paper, we combine the two settings by making conformal prediction amenable to set-valued training data. We propose a generalization of the conformal prediction procedure that can be applied to set-valued training and calibration data. We prove the validity of the proposed method and present experimental studies in which it compares favorably to natural baselines.
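For orientation, the standard setting that the abstract starts from (precise calibration labels, set-valued predictions) can be sketched as split conformal prediction for classification. The sketch below is the textbook procedure, not the generalization to set-valued data proposed in the paper; the function names and the nonconformity score `1 - p(true label)` are assumptions made for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data with precise labels.

    cal_probs:  list of predicted class-probability lists, one per example
    cal_labels: list of true class indices
    alpha:      target miscoverage level (coverage is at least 1 - alpha)
    """
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample correction
    if rank > n:
        return float("inf")  # too few calibration points: predict all classes
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every class whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


if __name__ == "__main__":
    cal_probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    cal_labels = [0, 0, 1, 1]
    threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
    print(prediction_set([0.75, 0.25], threshold))
```

With set-valued calibration labels the score `1 - probs[label]` is no longer defined for a single label, which is the gap the paper addresses.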

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[183]

L. Rottkamp, N. Strauß and M. Schubert.
DEAR: Dynamic Electric Ambulance Redeployment.
SSTD 2023 - 18th International Symposium on Spatial and Temporal Databases. Calgary, Canada, Aug 23-25, 2023. DOI

Abstract

Dynamic Ambulance Redeployment (DAR) is the task of dynamically assigning ambulances after incidents to base stations to minimize future response times. Though DAR has attracted considerable attention from the research community, existing solutions do not consider using electric ambulances despite the global shift towards electric mobility. In this paper, we are the first to examine the impact of electric ambulances and their required downtime for recharging on DAR and demonstrate that using policies for conventional vehicles can lead to a significant increase in either the number of required ambulances or in the response time to emergencies. Therefore, we propose a new redeployment policy that considers the remaining energy levels, the recharging stations’ locations, and the required recharging time. Our new method is based on minimizing energy deficits (MED) and can provide well-performing redeployment decisions in the novel Dynamic Electric Ambulance Redeployment problem (DEAR). We evaluate MED on a simulation using real-world emergency data from the city of San Francisco and show that MED can provide the required service level without additional ambulances in most cases. For DEAR, MED outperforms various established state-of-the-art solutions for conventional DAR and straightforward solutions to this setting.

MCML Authors

Lukas Rottkamp

Spatial Artificial Intelligence

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[182]

Z. Liu, Y. Ma, H. Li, M. Hildebrandt, Y. Ouyang and Z. Xiong.
Debiased Contrastive Loss for Collaborative Filtering.
KSEM 2023 - 16th International Conference on Knowledge Science, Engineering and Management. Guangzhou, China, Aug 16-18, 2023. DOI

Abstract

Collaborative filtering (CF) is the most fundamental technique in recommender systems, which reveals user preference by implicit feedback. Binary cross-entropy or Bayesian personalized ranking is usually employed as the loss function to optimize model parameters. Recently, the sampled softmax loss has been proposed to enhance the sampling efficiency, which adopts an in-batch sample strategy. However, it suffers from the sample bias issue, which unavoidably introduces false negative instances, resulting in inaccurate representations of users’ genuine interests. To address this problem, we propose a debiased contrastive loss, incorporating a bias correction probability to alleviate the sample bias. We integrate the proposed method into several matrix factorization (MF) and graph neural network-based (GNN) recommendation models. Besides, we theoretically analyze the effectiveness of our methods in automatically mining the hard negative instances. Experimental results on three public benchmarks demonstrate that the proposed debiased contrastive loss can augment several existing MF and GNN-based CF models and outperform popular learning objectives in recommendation. Additionally, we demonstrate that our method substantially enhances training efficiency.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[181]

A. Beer, A. Draganov, E. Hohma, P. Jahn, C. M. M. Frey and I. Assent.
Connecting the Dots — Density-Connectivity Distance unifies DBSCAN, k-Center and Spectral Clustering.
KDD 2023 - 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Long Beach, CA, USA, Aug 06-10, 2023. DOI GitHub

Abstract

Despite the popularity of density-based clustering, its procedural definition makes it difficult to analyze compared to clustering methods that minimize a loss function. In this paper, we reformulate DBSCAN through a clean objective function by introducing the density-connectivity distance (dc-dist), which captures the essence of density-based clusters by endowing the minimax distance with the concept of density. This novel ultrametric allows us to show that DBSCAN, k-center, and spectral clustering are equivalent in the space given by the dc-dist, despite these algorithms being perceived as fundamentally different in their respective literatures. We also verify that finding the pairwise dc-dists gives DBSCAN clusterings across all epsilon-values, simplifying the problem of parameterizing density-based clustering. We conclude by thoroughly analyzing density-connectivity and its properties – a task that has been elusive thus far in the literature due to the lack of formal tools.

MCML Authors

Anna Beer

Dr.

* Former Member

Philipp Jahn

Database Systems and Data Mining AI Lab

Christian Frey

Dr.

* Former Member

[180]

M. Caprio, Y. Sale, E. Hüllermeier and I. Lee.
A Novel Bayes' Theorem for Upper Probabilities.
Epi UAI 2023 - International Workshop on Epistemic Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Aug 04, 2023. DOI

Abstract

In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes’ posterior probability of a measurable set A, when the prior lies in a class of probability measures and the likelihood is precise. They also give a sufficient condition for such an upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential of being applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[179]

S. Henzgen and E. Hüllermeier.
Weighting by Tying: A New Approach to Weighted Rank Correlation.
Preprint (Aug. 2023). arXiv

Abstract

Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall’s tau and Spearman’s rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those on the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighting scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal’s gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighting rank positions in a flexible way.
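As a point of reference, the unweighted measures named in the abstract can be computed by counting concordant and discordant pairs. The sketch below covers only the classical Kendall tau (tau-a variant) and the Goodman and Kruskal gamma; it does not implement the scaled gamma proposed in the paper, and the function names are assumptions for illustration.

```python
from itertools import combinations


def count_pairs(a, b):
    """Count concordant and discordant item pairs between two rankings.

    a, b: rank positions of the same items, in the same item order.
    Pairs tied in either ranking count as neither concordant nor discordant.
    """
    if len(a) != len(b):
        raise ValueError("rankings must cover the same items")
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


def kendall_tau_a(a, b):
    """Kendall's tau-a: (C - D) divided by the total number of pairs."""
    c, d = count_pairs(a, b)
    n = len(a)
    return (c - d) / (n * (n - 1) / 2)


def goodman_kruskal_gamma(a, b):
    """Goodman and Kruskal's gamma: (C - D) / (C + D), ignoring tied pairs."""
    c, d = count_pairs(a, b)
    return (c - d) / (c + d)


if __name__ == "__main__":
    print(kendall_tau_a([1, 2, 3, 4], [1, 2, 4, 3]))
    print(goodman_kruskal_gamma([1, 2, 3], [1, 1, 2]))
```

Both measures treat every pair alike, so a swap at the top of the ranking costs exactly as much as a swap at the bottom; this is the equal-emphasis property that weighted variants aim to relax.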

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[178]

Y. Sale, M. Caprio and E. Hüllermeier.
Is the Volume of a Credal Set a Good Measure for Epistemic Uncertainty?
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Abstract

Adequate uncertainty representation and quantification have become imperative in various scientific disciplines, especially in machine learning and artificial intelligence. As an alternative to representing uncertainty via one single probability measure, we consider credal sets (convex sets of probability measures). The geometric representation of credal sets as d-dimensional polytopes implies a geometric intuition about (epistemic) uncertainty. In this paper, we show that the volume of the geometric representation of a credal set is a meaningful measure of epistemic uncertainty in the case of binary classification, but less so for multi-class classification. Our theoretical findings highlight the crucial role of specifying and employing uncertainty measures in machine learning in an appropriate way, and of being aware of possible pitfalls.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[177]

Abstract

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[176]

M. K. Belaid, R. Bornemann, M. Rabus, R. Krestel and E. Hüllermeier.
Compare-xAI: Toward Unifying Functional Testing Methods for Post-hoc XAI Algorithms into a Multi-dimensional Benchmark.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. DOI GitHub

Abstract

In recent years, Explainable AI (xAI) has attracted a lot of attention as various countries turned explanations into a legal right. xAI algorithms enable humans to understand the underlying models and explain their behavior, leading to insights through which the models can be analyzed and improved beyond the accuracy metric by, e.g., debugging the learned pattern and reducing unwanted biases. However, the widespread use of xAI and the rapidly growing body of published research in xAI have brought new challenges. A large number of xAI algorithms can be overwhelming and make it difficult for practitioners to choose the correct xAI algorithm for their specific use case. This problem is further exacerbated by the different approaches used to assess novel xAI algorithms, making it difficult to compare them to existing methods. To address this problem, we introduce Compare-xAI, a benchmark that allows for a direct comparison of popular xAI algorithms with a variety of different use cases. We propose a scoring protocol employing a range of functional tests from the literature, each targeting a specific end-user requirement in explaining a model. To make the benchmark results easily accessible, we group the tests into four categories (fidelity, fragility, stability, and stress tests). We present results for 13 xAI algorithms based on 11 functional tests. After analyzing the findings, we derive potential solutions for data science practitioners as workarounds to the found practical limitations. Finally, Compare-xAI is an attempt to unify systematic evaluation and comparison methods for xAI algorithms with a focus on the end-user’s requirements.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[175]

M. Muschalik, F. Fumagalli, R. Jagtani, B. Hammer and E. Hüllermeier.
iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. Best Paper Award. DOI

Abstract

Post-hoc explanation techniques such as the well-established partial dependence plot (PDP), which investigates feature dependencies, are used in explainable artificial intelligence (XAI) to understand black-box machine learning models. While many real-world applications require dynamic models that constantly adapt over time and react to changes in the underlying distribution, XAI, so far, has primarily considered static learning environments, where models are trained in a batch mode and remain unchanged. We thus propose a novel model-agnostic XAI framework called incremental PDP (iPDP) that extends the PDP to extract time-dependent feature effects in non-stationary learning environments. We formally analyze iPDP and show that it approximates a time-dependent variant of the PDP that properly reacts to real and virtual concept drift. The time-sensitivity of iPDP is controlled by a single smoothing parameter, which directly corresponds to the variance and the approximation error of iPDP in a static learning environment. We illustrate the efficacy of iPDP by showcasing an example application for drift detection and conducting multiple experiments on real-world and synthetic data sets and streams.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[174]

V. Bengs, E. Hüllermeier and W. Waegeman.
On Second-Order Scoring Rules for Epistemic Uncertainty Quantification.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal to let the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As a main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[173]

M. Biloš, K. Rasul, A. Schneider, Y. Nevmyvaka and S. Günnemann.
Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

Temporal data such as time series can be viewed as discretized measurements of the underlying function. To build a generative model for such data we have to model the stochastic process that governs it. We propose a solution by defining the denoising diffusion model in the function space which also allows us to naturally handle irregularly-sampled observations. The forward process gradually adds noise to functions, preserving their continuity, while the learned reverse process removes the noise and returns functions as new samples. To this end, we define suitable noise sources and introduce novel denoising and score-matching models. We show how our method can be used for multivariate probabilistic forecasting and imputation, and how our model can be interpreted as a neural process.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[172]

T. Wollschläger, N. Gao, B. Charpentier, M. A. Ketata and S. Günnemann.
Uncertainty Estimation for Molecules: Desiderata and Methods.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

Graph Neural Networks (GNNs) are promising surrogates for quantum mechanical calculations as they establish unprecedented low errors on collections of molecular dynamics (MD) trajectories. Thanks to their fast inference times they promise to accelerate computational chemistry applications. Unfortunately, despite low in-distribution (ID) errors, such GNNs might be horribly wrong for out-of-distribution (OOD) samples. Uncertainty estimation (UE) may aid in such situations by communicating the model’s certainty about its prediction. Here, we take a closer look at the problem and identify six key desiderata for UE in molecular force fields, three ’physics-informed’ and three ’application-focused’ ones. To overview the field, we survey existing methods from the field of UE and analyze how they fit the set desiderata. By our analysis, we conclude that none of the previous works satisfies all criteria. To fill this gap, we propose Localized Neural Kernel (LNK), a Gaussian Process (GP)-based extension to existing GNNs satisfying the desiderata. In our extensive experimental evaluation, we test four different UE methods with three different backbones across two datasets. In out-of-equilibrium detection, we find LNK yielding up to 2.5 and 2.1 times lower errors in terms of AUC-ROC score than dropout or evidential regression-based methods while maintaining high predictive performance.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[171]

A. Giovagnoli, Y. Ma, M. Schubert and V. Tresp.
QNEAT: Natural Evolution of Variational Quantum Circuit Architecture.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

Abstract

Quantum Machine Learning (QML) is a recent and rapidly evolving field where the theoretical framework and logic of quantum mechanics are employed to solve machine learning tasks. A variety of techniques with different levels of quantum-classical hybridization have been presented. Here we focus on variational quantum circuits (VQC), which emerged as the most promising candidates for the quantum counterpart of neural networks in the noisy intermediate-scale quantum (NISQ) era. Although showing promising results, VQCs can be hard to train because of different issues, e.g., barren plateaus, periodicity of the weights, or choice of the architecture. In this paper, we focus on this last problem, and in order to address it we propose a gradient-free algorithm inspired by natural evolution to optimise both the weights and the architecture of the VQC. In particular, we present a version of the well-known neuroevolution of augmenting topologies (NEAT) algorithm adapted to the case of quantum variational circuits. We test the algorithm with different benchmark problems of classical fields of machine learning, i.e., reinforcement learning and optimization.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[170]

M. Wever, M. Özdogan and E. Hüllermeier.
Cooperative Co-Evolution for Ensembles of Nested Dichotomies for Multi-Class Classification.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

Abstract

In multi-class classification, it can be beneficial to decompose a learning problem into several simpler problems. One such reduction technique is the use of so-called nested dichotomies, which recursively bisect the set of possible classes such that the resulting subsets can be arranged in the form of a binary tree, where each split defines a binary classification problem. Recently, a genetic algorithm for optimizing the structure of such nested dichotomies has achieved state-of-the-art results. Motivated by its success, we propose to extend this approach using a co-evolutionary scheme to optimize both the structure of nested dichotomies and their composition into ensembles through which they are evaluated. Furthermore, we present an experimental study showing this approach to yield ensembles of nested dichotomies at substantially lower cost and, in some cases, even with an improved generalization performance.
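
The recursive bisection described in the abstract can be sketched as follows. This is only an illustration: the function name `nested_dichotomy` and the choice to split each class list in the middle are assumptions made here, whereas the paper optimizes the tree structure with a co-evolutionary scheme that is not reproduced.

```python
from typing import List, Tuple

# A split is one binary problem: (classes on the left, classes on the right).
Split = Tuple[Tuple[str, ...], Tuple[str, ...]]


def nested_dichotomy(classes: List[str]) -> List[Split]:
    """Recursively bisect `classes` and return the induced binary problems.

    Assumption: each subset is split in the middle; any other bisection
    would also yield a valid nested dichotomy.
    """
    if len(classes) < 2:
        return []  # a single class needs no further split
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    return (
        [(tuple(left), tuple(right))]
        + nested_dichotomy(left)
        + nested_dichotomy(right)
    )


splits = nested_dichotomy(["a", "b", "c", "d"])
```

For n classes the resulting binary tree has n - 1 inner nodes, so n - 1 binary classifiers have to be trained per dichotomy.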

MCML Authors

Marcel Wever

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[169]

M. Fromm, M. Berrendorf, E. Faerman and T. Seidl.
Cross-Domain Argument Quality Estimation.
ACL 2023 - Findings of the 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI GitHub

Abstract

Argumentation is one of society’s foundational pillars, and, sparked by advances in NLP and the vast availability of text data, automated mining of arguments receives increasing attention. A decisive property of arguments is their strength or quality. While there are works on the automated estimation of argument strength, their scope is narrow: They focus on isolated datasets and neglect the interactions with related argument-mining tasks, such as argument identification and evidence detection. In this work, we close this gap by approaching argument quality estimation from multiple different angles: Grounded on rich results from thorough empirical evaluations, we assess the generalization capabilities of argument quality estimation across diverse domains and the interplay with related argument mining tasks. We find that generalization depends on a sufficient representation of different domains in the training part. In zero-shot transfer and multi-task experiments, we reveal that argument quality is among the more challenging tasks but can improve others.

MCML Authors

Michael Fromm

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[168]

Z. Han, R. Liao, J. Gu, Y. Zhang, Z. Ding, Y. Gu, H. Köppl, H. Schütze and V. Tresp.
ECOLA: Enhancing Temporal Knowledge Embeddings with Contextualized Language Representations.
ACL 2023 - Findings of the 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yao Zhang

Database Systems and Data Mining AI Lab

Zifeng Ding

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[167]

J. Gu, Z. Han, S. Chen, A. Beirami, B. He, G. Zhang, R. Liao, Y. Qin, V. Tresp and P. Torr.
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
Preprint (Jul. 2023). arXiv

Abstract

Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering makes it possible to perform predictions based solely on prompts without updating model parameters, and eases the application of large pre-trained models in real-world tasks. In past years, prompt engineering has been well-studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, there is currently a lack of a systematic overview of prompt engineering on pre-trained vision-language models. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation models (e.g. Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are also discussed. The challenges, future directions, and research opportunities are summarized to foster future research on this topic.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[166]

C. M. M. Frey.
Learning from complex networks.
Dissertation 2023. DOI

Abstract

This thesis addresses key challenges in modern graph-based applications by proposing advanced techniques in spectral clustering, graph neural networks, and probabilistic graph structures. It introduces a robust, accelerated spectral clustering model for homogeneous graphs and a transformer-inspired Graph Shell Attention model to counter over-smoothing in graph neural networks. Furthermore, it tackles optimization in uncertain networks, presents a new approach to a vehicle routing problem with flexible delivery locations, and provides a novel method for classifying social media trends, illustrating the vital role of AI in understanding complex graph structures. (Shortened).

MCML Authors

Christian Frey

Dr.

* Former Member

[165]

T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever and E. Hüllermeier.
Towards Green Automated Machine Learning: Status Quo and Future Directions.
Journal of Artificial Intelligence Research 77 (Jun. 2023). DOI

Abstract

Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution — a machine learning pipeline — tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticized for its high resource consumption as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as the expensive large-scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. Therefore, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool w.r.t. their “greenness”, i.e., sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community in a more sustainable AutoML research direction. As part of this, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Marcel Wever

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[164]

M. Lotfollahi, A. K. Susmelj, C. De Donno, L. Hetzel, Y. Ji, I. L. Ibarra, S. R. Srivatsan, M. Naghipourfar, R. M. Daza, B. Martin, J. Shendure, J. L. McFaline‐Figueroa, P. Boyeau, F. A. Wolf, N. Yakubova, S. Günnemann, C. Trapnell, D. Lopez‐Paz and F. J. Theis.
Predicting cellular responses to complex perturbations in high‐throughput screens.
Molecular Systems Biology 19.e11517 (Jun. 2023). DOI

Abstract

Recent advances in multiplexed single‐cell transcriptomics experiments facilitate the high‐throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep‐learning approaches for single‐cell response modeling. CPA learns to in silico predict transcriptional perturbation response at the single‐cell level for unseen dosages, cell types, time points, and species. Using newly generated single‐cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture’s modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single‐cell Perturb‐seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single‐cell level and thus accelerate therapeutic applications using single‐cell technologies.

MCML Authors

Leon Hetzel

Mathematical Modelling of Biological Systems

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

[163]

J. Sommer, L. Hetzel, D. Lüdke, F. J. Theis and S. Günnemann.
The power of motifs as inductive bias for learning molecular distributions.
Preprint (Jun. 2023). arXiv

Abstract

Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study aims to investigate the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover’s improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.

MCML Authors

Leon Hetzel

Mathematical Modelling of Biological Systems

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[162]

D. Winkel, N. Strauß, M. Schubert, Y. Ma and T. Seidl.
Constrained Portfolio Management using Action Space Decomposition for Reinforcement Learning.
PAKDD 2023 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka, Japan, May 25-28, 2023. DOI

Abstract

Financial portfolio managers typically face multi-period optimization tasks such as short-selling or investing at least a particular portion of the portfolio in a specific industry sector. A common approach to tackle these problems is to use constrained Markov decision process (CMDP) methods, which may suffer from sample inefficiency, hyperparameter tuning, and lack of guarantees for constraint violations. In this paper, we propose Action Space Decomposition Based Optimization (ADBO) for optimizing a more straightforward surrogate task that allows actions to be mapped back to the original task. We examine our method on two real-world data portfolio construction tasks. The results show that our new approach consistently outperforms state-of-the-art benchmark approaches for general CMDPs.

MCML Authors

David Winkel

Database Systems and Data Mining AI Lab

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[161]

A.-K. Wickert, C. Damke, L. Baumgärtner, E. Hüllermeier and M. Mezini.
UnGoML: Automated Classification of unsafe Usages in Go.
MSR 2023 - IEEE/ACM 20th International Conference on Mining Software Repositories. Melbourne, Australia, May 15-16, 2023. FOSS (Free, Open Source Software) Impact Paper Award. DOI GitHub

Abstract

The Go programming language offers strong protection from memory corruption. As an escape hatch of these protections, it provides the unsafe package. Previous studies identified that this unsafe package is frequently used in real-world code for several purposes, e.g., serialization or casting types. Due to the variety of these reasons, it may be possible to refactor specific usages to avoid potential vulnerabilities. However, the classification of unsafe usages is challenging and requires the context of the call and the program’s structure. In this paper, we present the first automated classifier for unsafe usages in Go, UnGoML, to identify what is done with the unsafe package and why it is used. For UnGoML, we built four custom deep learning classifiers trained on a manually labeled data set. We represent Go code as enriched control-flow graphs (CFGs) and solve the label prediction task with one single-vertex and three context-aware classifiers. All three context-aware classifiers achieve a top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore, in a set-valued conformal prediction setting, we achieve accuracies of more than 93% with mean label set sizes of 2 for both dimensions. Thus, UnGoML can be used to efficiently filter unsafe usages for use cases such as refactoring or a security audit.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[160]

Abstract

MCML Authors

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

[159]

Z. Liu, Y. Ma, M. Schubert, Y. Ouyang, W. Rong and Z. Xiong.
Multimodal Contrastive Transformer for Explainable Recommendation.
IEEE Transactions on Computational Social Systems (May. 2023). DOI

Abstract

Explanations play an essential role in helping users evaluate results from recommender systems. Various natural language generation methods have been proposed to generate explanations for the recommendation. However, they usually suffer from two problems. First, since user-provided review text contains noisy data, the generated explanations may be irrelevant to the recommended items. Second, as lacking some supervision signals, most of the generated sentences are similar, which cannot meet the diversity and personalized needs of users. To tackle these problems, we propose a multimodal contrastive transformer (MMCT) model for an explainable recommendation, which incorporates multimodal information into the learning process, including sentiment features, item features, item images, and refined user reviews. Meanwhile, we propose a dynamic fusion mechanism during the decoding stage, which generates supervision signals to guide the explanation generation. Additionally, we develop a contrastive objective to generate diverse explainable texts. Comprehensive experiments on two real-world datasets show that the proposed model outperforms comparable explainable recommendation baselines in terms of explanation performance and recommendation performance. Efficiency analysis and robustness analysis verify the advantages of the proposed model. While ablation analysis establishes the relative contributions of the respective components and various modalities, the case study shows the working of our model from an intuitive sense.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[158]

L. G. M. Bauer, C. Leiber, C. Böhm and C. Plant.
Extension of the Dip-test Repertoire - Efficient and Differentiable p-value Calculation for Clustering.
SDM 2023 - SIAM International Conference on Data Mining. Minneapolis, MN, USA, Apr 27-29, 2023. DOI

Abstract

Over the last decade, the Dip-test of unimodality has gained increasing interest in the data mining community as it is a parameter-free statistical test that reliably rates the modality in one-dimensional samples. It returns a so-called Dip-value and a corresponding probability for the sample’s unimodality (Dip-p-value). These two values share a sigmoidal relationship. However, the specific transformation is dependent on the sample size. Many Dip-based clustering algorithms use bootstrapped look-up tables translating Dip- to Dip-p-values for a limited number of sample sizes. We propose a specifically designed sigmoid function as a substitute for these state-of-the-art look-up tables. This accelerates computation and provides an approximation of the Dip- to Dip-p-value transformation for every single sample size. Further, it is differentiable and can therefore easily be integrated in learning schemes using gradient descent. We showcase this by exploiting our function in a novel subspace clustering algorithm called Dip’n’Sub. We highlight in extensive experiments the various benefits of our proposal.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[157]

D. Schubert, P. Gupta and M. Wever.
Meta-learning for Automated Selection of Anomaly Detectors for Semi-supervised Datasets.
IDA 2023 - 21st International Symposium on Intelligent Data Analysis. Louvain-la-Neuve, Belgium, Apr 12-14, 2023. DOI

Abstract

In anomaly detection, a prominent task is to induce a model to identify anomalies learned solely based on normal data. Generally, one is interested in finding an anomaly detector that correctly identifies anomalies, i.e., data points that do not belong to the normal class, without raising too many false alarms. Which anomaly detector is best suited depends on the dataset at hand and thus needs to be tailored. The quality of an anomaly detector may be assessed via confusion-based metrics such as the Matthews correlation coefficient (MCC). However, since during training only normal data is available in a semi-supervised setting, such metrics are not accessible. To facilitate automated machine learning for anomaly detectors, we propose to employ meta-learning to predict MCC scores using the metrics that can be computed with normal data only and order anomaly detectors using the predicted scores for selection. First promising results can be obtained considering the hypervolume and the false positive rate as meta-features.

MCML Authors

Marcel Wever

Dr.

* Former Member

[156]

T. Tornede, A. Tornede, L. Fehring, L. Gehring, H. Graf, J. Hanselle, F. Mohr and M. Wever.
PyExperimenter: Easily distribute experiments and track results.
The Journal of Open Source Software 8.86 (Apr. 2023). DOI

Abstract

PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms and in particular is designed to reduce the involved manual effort significantly. It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
The empirical analysis of algorithms is often accompanied by the execution of algorithms for different inputs and variants of the algorithms, specified via parameters, and the measurement of non-functional properties. Since the individual evaluations are usually independent, the evaluation can be performed in a distributed manner on an HPC system. However, setting up, documenting, and evaluating the results of such a study is often file-based. Usually, this requires extensive manual work to create configuration files for the inputs or to read and aggregate measured results from a report file. In addition, monitoring and restarting individual executions is tedious and time-consuming.
PyExperimenter addresses these challenges by means of a single well-defined configuration file and a central database for managing massively parallel evaluations, as well as collecting and aggregating their results. Thereby, PyExperimenter alleviates the aforementioned overhead and allows experiment executions to be defined and monitored with ease.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Marcel Wever

Dr.

* Former Member

[155]

M. K. Belaid, D. E. Mekki, M. Rabus and E. Hüllermeier.
Optimizing Data Shapley Interaction Calculation from $O(2^n)$ to $O(t n^2)$ for KNN models.
Preprint (Apr. 2023). arXiv

Abstract

With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial process in the field of artificial intelligence. The Shapley values have been recognized as an effective method for data valuation, enabling efficient training set summarization, acquisition, and outlier removal. In this paper, we introduce ‘STI-KNN’, an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in $O(t n^2)$ time, which is a significant improvement over the $O(2^n)$ time complexity of baseline methods. By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[154]

E. Hüllermeier.
Representation and Quantification of Uncertainty in Machine Learning.
TRR 165/181 2023 - Scale interactions, data-driven modeling, and uncertainty in weather and climate. Ingolstadt, Germany, Mar 27-30, 2023. Invited Talk. PDF

Abstract

n/a

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[153]

Abstract

MCML Authors

Theresa Ullmann

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

[152]

J. Brandt, E. Schede, B. Haddenhorst, V. Bengs, E. Hüllermeier and K. Tierney.
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration.
AAAI 2023 - 37th Conference on Artificial Intelligence. Washington, DC, USA, Feb 07-14, 2023. DOI

Abstract

We study the algorithm configuration (AC) problem, in which one seeks to find an optimal parameter configuration of a given target algorithm in an automated way. Recently, there has been significant progress in designing AC approaches that satisfy strong theoretical guarantees. However, a significant gap still remains between the practical performance of these approaches and state-of-the-art heuristic methods. To this end, we introduce AC-Band, a general approach for the AC problem based on multi-armed bandits that provides theoretical guarantees while exhibiting strong practical performance. We show that AC-Band requires significantly less computation time than other AC approaches providing theoretical guarantees while still yielding high-quality configurations.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[151]

R. Koner, T. Hannan, S. Shit, S. Sharifzadeh, M. Schubert, T. Seidl and V. Tresp.
InstanceFormer: An Online Video Instance Segmentation Framework.
AAAI 2023 - 37th Conference on Artificial Intelligence. Washington, DC, USA, Feb 07-14, 2023. DOI GitHub

Abstract

Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transformer-based efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches for challenging and long datasets such as YouTube-VIS-2021 and OVIS.

MCML Authors

Rajat Koner

Database Systems and Data Mining AI Lab

Tanveer Hannan

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[150]

J. Brandt, M. Wever, D. Iliadis, V. Bengs and E. Hüllermeier.
Iterative Deepening Hyperband.
Preprint (Feb. 2023). arXiv

Abstract

Hyperparameter optimization (HPO) is concerned with the automated search for the most appropriate hyperparameter configuration (HPC) of a parameterized machine learning algorithm. A state-of-the-art HPO method is Hyperband, which, however, has its own parameters that influence its performance. One of these parameters, the maximal budget, is especially problematic: If chosen too small, the budget needs to be increased in hindsight and, as Hyperband is not incremental by design, the entire algorithm must be re-run. This is not only costly but also comes with a loss of valuable knowledge already accumulated. In this paper, we propose incremental variants of Hyperband that eliminate these drawbacks, and show that these variants satisfy theoretical guarantees qualitatively similar to those for the original Hyperband with the ‘right’ budget. Moreover, we demonstrate their practical utility in experiments with benchmark data sets.

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[149]

V. Bengs and E. Hüllermeier.
Multi-armed bandits with censored consumption of resources.
Machine Learning 112.1 (Jan. 2023). DOI

Abstract

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates both the actual amount of consumed resources of each learning round and the optimality of realizable rewards as well as the risk of exceeding the allocated resource limit. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its regret upper bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.
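
The feedback protocol described in the abstract can be illustrated with a minimal sketch. It covers only a single round of interaction, not the UCB-inspired algorithm or the regret measure of the paper. The names `play_round` and `fixed_arm` are hypothetical, and treating consumption exactly equal to the limit as uncensored is an assumption made here.

```python
import random
from typing import Callable, Optional, Tuple

# An arm is modeled as a function returning (reward, consumed_resources).
Arm = Callable[[random.Random], Tuple[float, float]]


def play_round(arm: Arm, limit: float, rng: random.Random) -> Optional[float]:
    """Pull `arm` under a resource limit.

    Returns the reward if the consumed resources stay within `limit`
    (assumption: equality counts as within), otherwise None, which
    stands for a censored observation with no reward.
    """
    reward, consumed = arm(rng)
    return reward if consumed <= limit else None


def fixed_arm(reward: float, consumed: float) -> Arm:
    """Hypothetical deterministic arm, used only to illustrate the protocol."""
    return lambda rng: (reward, consumed)


rng = random.Random(0)
observed = play_round(fixed_arm(1.0, 2.0), 3.0, rng)  # within the limit
censored = play_round(fixed_arm(1.0, 5.0), 3.0, rng)  # limit exceeded
```

The trade-off named in the abstract is visible here: a larger `limit` makes censoring less likely, but the regret measure also accounts for the resources consumed, so the limit should be kept as low as possible.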

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[148]

P. Gupta, J. P. Drees and E. Hüllermeier.
Automated Side-Channel Attacks using Black-Box Neural Architecture Search.
Preprint at Cryptology ePrint Archive (Jan. 2023). URL

Abstract

The usage of convolutional neural networks (CNNs) to break cryptographic systems through hardware side-channels has enabled fast and adaptable attacks on devices like smart cards and TPMs. Current literature proposes fixed CNN architectures designed by domain experts to break such systems, which is time-consuming and unsuitable for attacking a new system. Recently, an approach using neural architecture search (NAS), which is able to acquire a suitable architecture automatically, has been explored. These works use the secret key information in the attack dataset for optimization and only explore two different search strategies using one-dimensional CNNs. We propose a NAS approach that relies only on using the profiling dataset for optimization, making it fully black-box. Using a large-scale experimental parameter study, we explore which choices for NAS, such as 1-D or 2-D CNNs and search strategy, produce the best results on 10 state-of-the-art datasets for Hamming weight and identity leakage models. We show that applying the random search strategy on 1-D inputs results in a high success rate and retrieves the correct secret key using a single attack trace on two of the datasets. This combination matches the attack efficiency of fixed CNN architectures, outperforming them in 4 out of 10 datasets. Our experiments also point toward the need for repeated attack evaluations of machine learning-based solutions in order to avoid biased performance estimates.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[147]

O. Shchur.
Modeling Continuous-time Event Data with Neural Temporal Point Processes.
Dissertation 2022. URL

Abstract

Temporal point processes (TPPs) provide a natural framework for modeling continuous-time event data such as earthquake catalogs in seismology or spike trains in neuroscience. Unlike conventional TPP models, neural TPPs are able to capture complex patterns present in real-world event data. The two main themes of this thesis are design of flexible, tractable and efficient neural TPP models, and their applications to real-world problems.

MCML Authors

Oleksandr Shchur

Dr.

* Former Member

[146]

S. Legler, T. Janjic, M. H. Shaker and E. Hüllermeier.
Machine learning for estimating parameters of a convective-scale model: A comparison of neural networks and random forests.
GMA - 32nd Workshop of Computational Intelligence of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik. Berlin, Germany, Dec 01-02, 2022. PDF

Abstract

Errors and inaccuracies in the representation of clouds in convection-permitting numerical weather prediction models can be caused by various sources, including the forcing and boundary conditions, the representation of orography, and the accuracy of the numerical schemes determining the evolution of humidity and temperature. Moreover, the parametrization of microphysics and the parametrization of processes in the surface and boundary layers do have a significant influence. These schemes typically contain several tunable parameters that are either non-physical or only crudely known, leading to model errors and imprecision. Furthermore, not accounting for uncertainties in these parameters might lead to overconfidence in the model during forecasting and data assimilation (DA).
Traditionally, the numerical values of model parameters are chosen by manual model tuning. More objectively, they can be estimated from observations by the so-called augmented state approach during the data assimilation [7]. Alternatively, the problem of estimating model parameters has recently been tackled by means of a hybrid approach combining DA with machine learning, more specifically a Bayesian neural network (BNN) [6]. As a proof of concept, this approach has been applied to a one-dimensional modified shallow-water (MSW) model [8].
Even though the BNN is able to accurately estimate the model parameters and their uncertainties, its high computational cost poses an obstacle to its use in operational settings where the grid sizes of the atmospheric fields are much larger than in the simple MSW model. Because random forests (RF) [2] are typically computationally cheaper while still being able to adequately represent uncertainties, we are interested in comparing RFs and BNNs. To this end, we follow [6] and again consider the problem of estimating the three model parameters of the MSW model as a function of the atmospheric state.

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[145]

M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, M. Galkin, S. Sharifzadeh, A. Fischer, V. Tresp and J. Lehmann.
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework.
IEEE Transactions on Pattern Analysis and Machine Intelligence 44.12 (Dec. 2022). DOI GitHub

Abstract

The heterogeneity in recently published knowledge graph embedding models’ implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, as well as provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model’s performance and is not only determined by its architecture. We provide evidence that several architectures can obtain results competitive to the state of the art when configured carefully.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[144]

W. Durani, D. Mautz, C. Plant and C. Böhm.
DBHD: Density-based clustering for highly varying density.
ICDM 2022 - 22nd IEEE International Conference on Data Mining. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI

Abstract

A major challenge in cluster analysis is the discovery of clusters with widely varying sizes, densities, and shapes. Most clustering algorithms lack the ability to detect heterogeneous clusters that differ greatly in all three properties simultaneously. In this work, we propose the Density Clustering for Highly varying Density algorithm (DBHD). DBHD uses a novel approach that considers local density information and introduces two new conditions to distinguish between different types of data points. Based on this and the adaptively computed density information, DBHD can detect the clusters described above and is robust to noise. Moreover, DBHD has intuitive and robust parameters. In extensive experiments, we show that our technique is considerably more effective in detecting clusters of different shapes, sizes, and densities than well-known (DBSCAN or OPTICS) and recently proposed algorithms such as DPC, SNN-DPC, or LSDBC.

MCML Authors

Walid Durani

Database Systems and Data Mining AI Lab

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[143]

S. Gilhuber, P. Jahn, Y. Ma and T. Seidl.
VERIPS: Verified Pseudo-label Selection for Deep Active Learning.
ICDM 2022 - 22nd IEEE International Conference on Data Mining. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI GitHub

Abstract

Active learning has the power to significantly reduce the amount of labeled data needed to build strong classifiers. Existing active pseudo-labeling methods show high potential in integrating pseudo-labels within the active learning loop but heavily depend on the prediction accuracy of the model. In this work, we propose VERIPS, an algorithm that significantly outperforms existing pseudo-labeling techniques for active learning. At its core, VERIPS uses a pseudo-label verification mechanism that consists of a second network only trained on data approved by the oracle and helps to discard questionable pseudo-labels. In particular, the verifier model eliminates all pseudo-labels for which it disagrees with the actual task model. VERIPS overcomes the problems of poorly performing initial models, e.g., due to imbalanced or too small initial pools, where previous methods select too many incorrect pseudo-labels and recovering takes long or is not possible. Moreover, VERIPS is particularly insensitive to parameter choices that existing approaches suffer from.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Philipp Jahn

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[142]

N. Strauß, M. Berrendorf, T. Haider and M. Schubert.
A Comparison of Ambulance Redeployment Systems on Real-World Data.
ICDMW 2022 - IEEE International Conference on Data Mining Workshops. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI GitHub

Abstract

Modern Emergency Medical Services (EMS) benefit from real-time sensor information in various ways as they provide up-to-date location information and help assess current local emergency risks. A critical part of EMS is dynamic ambulance redeployment, i.e., the task of assigning idle ambulances to base stations throughout a community. Although there has been a considerable effort on methods to optimize emergency response systems, a comparison of proposed methods is generally difficult as reported results are mostly based on artificial and proprietary test beds. In this paper, we present a benchmark simulation environment for dynamic ambulance redeployment based on real emergency data from the city of San Francisco. Our proposed simulation environment is highly scalable and is compatible with modern reinforcement learning frameworks. We provide a comparative study of several state-of-the-art methods for various metrics. Results indicate that even simple baseline algorithms can perform considerably well in close-to-realistic settings.

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Max Berrendorf

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[141]

H. Aliee, T. Richter, M. Solonin, I. Ibarra, F. J. Theis and N. Kilbertus.
Sparsity in Continuous-Depth Neural Networks.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Neural Ordinary Differential Equations (NODEs) have proven successful in learning dynamical systems in terms of accurately recovering the observed trajectories. While different types of sparsity have been proposed to improve robustness, the generalization properties of NODEs for dynamical systems beyond the observed data are underexplored. We systematically study the influence of weight and feature sparsity on forecasting as well as on identifying the underlying dynamical laws. Besides assessing existing methods, we propose a regularization technique to sparsify “input-output connections” and extract relevant features during training. Moreover, we curate real-world datasets including human motion capture and human hematopoiesis single-cell RNA-seq data to realistically analyze different levels of out-of-distribution (OOD) generalization in forecasting and dynamics identification, respectively. Our extensive empirical evaluation on these challenging benchmarks suggests that weight sparsity improves generalization in the presence of noise or irregular sampling. However, it does not prevent learning spurious feature dependencies in the inferred dynamics, rendering them impractical for predictions under interventions, or for inferring the true underlying dynamics. Instead, feature sparsity can indeed help with recovering sparse ground-truth dynamics compared to unregularized NODEs.

MCML Authors

Till Richter

Mathematical Modelling of Biological Systems

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

[140]

V. Bengs, E. Hüllermeier and W. Waegeman.
Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner’s (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[139]

J. Brandt, V. Bengs, B. Haddenhorst and E. Hüllermeier.
Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. Unlike existing works, we study this problem in a non-stochastic setting with subset-dependent feedback, i.e., the semi-bandit feedback received could be generated by an oblivious adversary and also might depend on the chosen set of arms. In addition, we consider a general feedback scenario covering both the numerical-based as well as preference-based case and introduce a sound theoretical framework for this setting guaranteeing sensible notions of optimal arms, which a learner seeks to find. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies from aggressive to conservative. Theoretical questions about the sufficient and necessary budget of the algorithm to find the best arm are answered and complemented by deriving lower bounds for any learning algorithm for this problem scenario.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[138]

L. Hetzel, S. Boehm, N. Kilbertus, S. Günnemann, M. Lotfollahi and F. J. Theis.
Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA HTS is required to enrich single-cell data meaningfully. We introduce chemCPA, a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with an architecture surgery for transfer learning and demonstrate how training on existing bulk RNA HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating drug discovery.

MCML Authors

Leon Hetzel

Mathematical Modelling of Biological Systems

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

[137]

Y. Scholten, J. Schuchardt, S. Geisler, A. Bojchevski and S. Günnemann.
Randomized Message-Interception Smoothing: Gray-box Certificates for Graph Neural Networks.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Randomized smoothing is one of the most promising frameworks for certifying the adversarial robustness of machine learning models, including Graph Neural Networks (GNNs). Yet, existing randomized smoothing certificates for GNNs are overly pessimistic since they treat the model as a black box, ignoring the underlying architecture. To remedy this, we propose novel gray-box certificates that exploit the message-passing principle of GNNs: We randomly intercept messages and carefully analyze the probability that messages from adversarially controlled nodes reach their target nodes. Compared to existing certificates, we certify robustness to much stronger adversaries that control entire nodes in the graph and can arbitrarily manipulate node features. Our certificates provide stronger guarantees for attacks at larger distances, as messages from farther-away nodes are more likely to get intercepted. We demonstrate the effectiveness of our method on various models and datasets. Since our gray-box certificates consider the underlying graph structure, we can significantly improve certifiable robustness by applying graph sparsification.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[136]

A. Campagner, J. Lienen, E. Hüllermeier and D. Ciucci.
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning.
IJCRS 2022 - International Joint Conference on Rough Sets. Suzhou, China, Nov 11-14, 2022. DOI

Abstract

In this article we introduce and describe SCIKIT-WEAK, a Python library inspired by SCIKIT-LEARN and developed to provide an easy-to-use framework for dealing with weakly supervised and imprecise data learning problems, which, despite their importance in real-world settings, cannot be easily managed by existing libraries. We provide a rationale for the development of such a library, then we discuss its design and the currently implemented methods and classes, which encompass several state-of-the-art algorithms.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[135]

A. Frikha.
Deep knowledge transfer for generalization across tasks and domains under data scarcity: on intersections of anomaly detection, few-shot learning, continual learning, domain generalization and data-free learning.
Dissertation 2022. DOI

Abstract

This thesis addresses challenges in deep learning when key assumptions, such as abundant data or i.i.d. conditions, are violated. It introduces methods for anomaly detection with scarce data, enabling models to learn sequential tasks with minimal forgetting. For domain generalization, it proposes a feature-discovery algorithm that enhances generalization to unseen domains and a data-free approach to create robust models by synthesizing cross-domain knowledge from pre-trained models. These contributions advance deep learning for complex real-world scenarios. (Shortened).

MCML Authors

Ahmed Frikha

Dr.

* Former Member

[134]

M. Bernhard and M. Schubert.
Robust Object Detection in Remote Sensing Imagery with Noisy and Sparse Geo-Annotations.
ACM SIGSPATIAL 2022 - 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Seattle, WA, USA, Nov 01-04, 2022. DOI GitHub

Abstract

Recently, the availability of remote sensing imagery from aerial vehicles and satellites has constantly improved. For an automated interpretation of such data, deep-learning-based object detectors achieve state-of-the-art performance. However, established object detectors require complete, precise, and correct bounding box annotations for training. In order to create the necessary training annotations for object detectors, imagery can be georeferenced and combined with data from other sources, such as points of interest localized by GPS sensors. Unfortunately, this combination often leads to poor object localization and missing annotations. Therefore, training object detectors with such data often results in insufficient detection performance. In this paper, we present a novel approach for training object detectors with extremely noisy and incomplete annotations. Our method is based on a teacher-student learning framework and a correction module accounting for imprecise and missing annotations. Thus, our method is easy to use and can be combined with arbitrary object detectors. We demonstrate that our approach improves standard detectors by 37.1% $AP_{50}$ on a noisy real-world remote-sensing dataset. Furthermore, our method achieves great performance gains on two datasets with synthetic noise.

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[133]

Abstract

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Shuo Chen

Database Systems and Data Mining AI Lab

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

[132]

A. Lohrer, J. J. Binder and P. Kröger.
Group Anomaly Detection for Spatio-Temporal Collective Behaviour Scenarios in Smart Cities.
IWCTS @ACM SIGSPATIAL 2022 - 15th International Workshop on Computational Transportation Science at the 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2022). Seattle, WA, USA, Nov 01-04, 2022. DOI

Abstract

Group anomaly detection, i.e., detecting and predicting abnormal behaviour of entities as a group rather than as individuals, addresses a variety of challenges in spatio-temporal environments such as traffic and transportation systems, smart cities, and geoinformation systems. These environments provide information about a typically large number of individual entities. Examples of such entities are airplanes and drones, vehicles, and ships, but also people, remote sensors, and any other information source interacting with the environment. However, while point anomaly detection is commonly used to reveal the abnormal behaviour of individual entities, the collective behaviour of the individuals as a group remains unexamined. For example, potential for traffic flow optimizations or increased local traffic guideline violations cannot be detected from a single drive but only by considering the behaviour of a group of vehicle drives in the area. In this work in progress, we elaborate on the potential of group anomaly detection algorithms for spatio-temporal collective behaviour scenarios in smart cities. We describe the group anomaly detection problem in the context of urban planning and demonstrate its effectiveness on a public real-world data set of urban rental bike rides and stations in and around Munich, revealing abnormal groups of rides, which makes it possible to optimize the accessibility of rental bikes to the population and thereby contribute to a sustainable environment.

MCML Authors

Andreas Lohrer

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[131]

S. Shit, R. Koner, B. Wittmann, J. C. Paetzold, I. Ezhov, H. Li, J. Pan, S. Sharifzadeh, G. Kaissis, V. Tresp and B. Menze.
Relationformer: A Unified Framework for Image-to-Graph Generation.
ECCV 2022 - 17th European Conference on Computer Vision. Tel Aviv, Israel, Oct 23-27, 2022. DOI GitHub

Abstract

A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely [rln]-token. Together with [obj]-tokens, [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-token, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets that demonstrate our approach’s effectiveness and generalizability.

MCML Authors

Rajat Koner

Database Systems and Data Mining AI Lab

Georgios Kaissis

Dr.

* Former Principal Investigator

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[130]

C. Zelenka, A. Lohrer, M. Bayer and P. Kröger.
AI4EO Hyperview: A SpectralNet3D and RNNPlus Approach for Sustainable Soil Parameter Estimation on Hyperspectral Image Data.
ICIP 2022 - IEEE International Conference on Image Processing. Bordeaux, France, Oct 16-19, 2022. DOI

Abstract

The goal of the Hyperview challenge is to use Hyperspectral Imaging (HSI) to predict the soil parameters potassium (K), phosphorus pentoxide ($P_2O_5$), magnesium (Mg) and the pH value. These are relevant parameters to determine the need of fertilization in agriculture. With this knowledge, fertilizers can be applied in a targeted way rather than in a prophylactic way which is the current procedure of choice. In this context we introduce two different approaches to solve this regression task based on 3D CNNs with Huber loss regression (SpectralNet3D) and on 1D RNNs. Both methods show distinct advantages with a peak challenge metric score of 0.808 on provided validation data.

MCML Authors

Andreas Lohrer

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[129]

E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
Journal of Artificial Intelligence Research 75 (Oct. 2022). DOI

Abstract

Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature through the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[128]

C. M. M. Frey, Y. Ma and M. Schubert.
SEA: Graph Shell Attention in Graph Neural Networks.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

A common problem in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes’ representations of the input graph align and become indiscernible. The latest models employing attention mechanisms with Graph Transformer Layers (GTLs) are still restricted to the layer-wise computational workflow of a GNN that are not beyond preventing such effects. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes’ representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node’s representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results while drastically reducing the number of parameters compared to state-of-the-art models.

MCML Authors

Christian Frey

Dr.

* Former Member

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[127]

N. Strauß, D. Winkel, M. Berrendorf and M. Schubert.
Reinforcement Learning for Multi-Agent Stochastic Resource Collection.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC yields challenges considering temporal abstractions and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

David Winkel

Database Systems and Data Mining AI Lab

Max Berrendorf

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[126]

D. Winkel, N. Strauß, M. Schubert and T. Seidl.
Risk-Aware Reinforcement Learning for Multi-Period Portfolio Selection.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

The task of portfolio management is the selection of portfolio allocations for every single time step during an investment period while adjusting the risk-return profile of the portfolio to the investor’s individual level of risk preference. In practice, it can be hard for an investor to quantify his individual risk preference. As an alternative, approximating the risk-return Pareto front allows for the comparison of different optimized portfolio allocations and hence for the selection of the most suitable risk level. Furthermore, an approximation of the Pareto front allows the analysis of the overall risk sensitivity of various investment policies. In this paper, we propose a deep reinforcement learning (RL) based approach, in which a single meta agent generates optimized portfolio allocation policies for any level of risk preference in a given interval. Our method is more efficient than previous approaches, as it only requires training of a single agent for the full approximate risk-return Pareto front. Additionally, it is more stable in training and only requires per time step market risk estimations independent of the policy. Such risk control per time step is a common regulatory requirement for, e.g., insurance companies. We benchmark our meta agent against other state-of-the-art risk-aware RL methods using a realistic environment based on real-world Nasdaq-100 data. Our evaluation shows that the proposed meta agent outperforms various benchmark approaches by generating strategies with better risk-return profiles.

MCML Authors

David Winkel

Database Systems and Data Mining AI Lab

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[125]

S. Gilhuber, M. Berrendorf, Y. Ma and T. Seidl.
Accelerating Diversity Sampling for Deep Active Learning By Low-Dimensional Representations.
IAL @ECML-PKDD 2022 - 6th International Workshop on Interactive Adaptive Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-23, 2022. PDF GitHub

Abstract

Selecting diverse instances for annotation is one of the key factors of successful active learning strategies. To this end, existing methods often operate on high-dimensional latent representations. In this work, we propose to use the low-dimensional vector of predicted probabilities instead, which can be seamlessly integrated into existing methods. We empirically demonstrate that this considerably decreases the query time, i.e., time to select an instance for annotation, while at the same time improving results. Low query times are relevant for active learning researchers, who use a (fast) oracle for simulated annotation and thus are often constrained by query time. It is also practically relevant when dealing with complex annotation tasks for which only a small pool of skilled domain experts is available for annotation with a limited time budget.
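
The idea of selecting diverse instances in the space of predicted probabilities can be illustrated with a minimal sketch. The selection rule used here, greedy farthest-point (k-center style) selection, is an assumption chosen for illustration and not necessarily the strategy evaluated in the paper; the function name and the example probabilities are hypothetical.

```python
import math

def farthest_point_selection(vectors, budget):
    """Greedily pick `budget` mutually distant instances (illustrative k-center style rule)."""
    budget = min(budget, len(vectors))
    if budget == 0:
        return []
    selected = [0]  # start arbitrarily from the first instance
    # Distance from every instance to its nearest already-selected instance.
    min_dist = [math.dist(v, vectors[0]) for v in vectors]
    while len(selected) < budget:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, math.dist(v, vectors[nxt]))
                    for d, v in zip(min_dist, vectors)]
    return selected

# Hypothetical class-probability vectors (3 classes) for four unlabeled instances.
probs = [
    [0.90, 0.10, 0.00],
    [0.85, 0.15, 0.00],
    [0.10, 0.80, 0.10],
    [0.00, 0.10, 0.90],
]
picked = farthest_point_selection(probs, budget=3)
```

Because each vector has only as many entries as there are classes, the distance computations are cheap compared with the same rule applied to high-dimensional latent representations.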

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Max Berrendorf

Dr.

* Former Member

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[124]

E. Hohma, C. M. M. Frey, A. Beer and T. Seidl.
SCAR - Spectral Clustering Accelerated and Robustified.
VLDB 2022 - 48th International Conference on Very Large Databases. Sydney, Australia (and hybrid), Sep 05-09, 2022. DOI GitHub

Abstract

Spectral clustering is one of the most advantageous clustering approaches. However, standard Spectral Clustering is sensitive to noisy input data and has a high runtime complexity. Tackling one of these problems often exacerbates the other. As real-world datasets are often large and compromised by noise, we need to improve both robustness and runtime at once. Thus, we propose Spectral Clustering - Accelerated and Robust (SCAR), an accelerated, robustified spectral clustering method. In an iterative approach, we achieve robustness by separating the data into two latent components: cleansed and noisy data. We accelerate the eigendecomposition - the most time-consuming step - based on the Nyström method. We compare SCAR to related recent state-of-the-art algorithms in extensive experiments. SCAR surpasses its competitors in terms of speed and clustering quality on highly noisy data.

MCML Authors

Christian Frey

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[123]

C. Leiber, L. G. M. Bauer, M. Neumayr, C. Plant and C. Böhm.
The DipEncoder: Enforcing Multimodality in Autoencoders.
KDD 2022 - 28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA, Aug 14-18, 2022. DOI

Abstract

Hartigan’s Dip-test of unimodality gained increasing interest in unsupervised learning over the past few years. It is free from complex parameterization and does not require a distribution assumed a priori. A useful property is that the resulting Dip-values can be derived to find a projection axis that identifies multimodal structures in the data set. In this paper, we show how to apply the gradient not only with respect to the projection axis but also with respect to the data to improve the cluster structure. By tightly coupling the Dip-test with an autoencoder, we obtain an embedding that clearly separates all clusters in the data set. This method, called DipEncoder, is the basis of a novel deep clustering algorithm. Extensive experiments show that the DipEncoder is highly competitive to state-of-the-art methods.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[122]

Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
Forecasting Question Answering over Temporal Knowledge Graphs.
Preprint (Aug. 2022). arXiv

Abstract

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases. The only existing TKGQA dataset, i.e., CronQuestions, consists of temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning the same period can be fully used for answer inference, allowing the TKGQA models to use even the future knowledge to answer the questions based on the past facts. In real-world scenarios, however, it is also common that given the knowledge until now, we wish the TKGQA systems to answer the questions asking about the future. As humans constantly seek plans for the future, building TKGQA systems for answering such forecasting questions is important. Nevertheless, this has still been unexplored in previous research. In this paper, we propose a novel task: forecasting question answering over temporal knowledge graphs. We also propose a large-scale TKGQA benchmark dataset, i.e., ForecastTKGQuestions, for this task. It includes three types of questions, i.e., entity prediction, yes-no, and fact reasoning questions. For every forecasting question in our dataset, QA models can only have access to the TKG information before the timestamp annotated in the given question for answer inference. We find that the state-of-the-art TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-no questions and fact reasoning questions. To this end, we propose ForecastTKGQA, a TKGQA model that employs a TKG forecasting module for future inference, to answer all three types of questions. Experimental results show that ForecastTKGQA outperforms recent TKGQA methods on the entity prediction questions, and it also shows great effectiveness in answering the other two types of questions.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[121]

M. Fromm.
Machine learning driven argument mining.
Dissertation 2022. DOI

Abstract

This thesis addresses the challenges of argumentation in the digital age by applying machine learning methods to automatically identify, retrieve, and evaluate arguments from diverse and often contradictory online sources. The first focus is on argument identification, specifically in heterogeneous text sources and peer reviews, where the relationship between the topic and arguments is crucial, and knowledge transfer across domains is limited. The second focus is on argument retrieval, where machine learning is used to select relevant documents, ensuring comprehensive and non-redundant argument coverage. Finally, the thesis explores the strength or quality of arguments, integrating this concept with other argument mining tasks and evaluating its impact across different text domains and contexts. (Shortened.)

MCML Authors

Michael Fromm

Dr.

* Former Member

[120]

E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
IJCAI-ECAI 2022 - 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence. Vienna, Austria, Jul 23-29, 2022. Extended Abstract. DOI

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[119]

M. Ali, M. Berrendorf, M. Galkin, V. Thost, T. Ma, V. Tresp and J. Lehmann.
Improving Inductive Link Prediction Using Hyper-Relational Facts (Extended Abstract).
IJCAI-ECAI 2022 - Best paper track at the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence. Vienna, Austria, Jul 23-29, 2022. DOI

Abstract

For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relational KGs (e.g., Wikidata), have not yet been properly studied. In this work, we classify different inductive settings and study the benefits of employing hyper-relational KGs on a wide range of semi- and fully inductive link prediction tasks powered by recent advancements in graph neural networks. Our experiments on a novel set of benchmarks show that qualifiers over typed edges can lead to performance improvements of 6% of absolute gains (for the Hits@10 metric) compared to triple-only baselines.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[118]

L. Hetzel, S. Boehm, N. Kilbertus, S. Günnemann, M. Lotfollahi and F. J. Theis.
Predicting single-cell perturbation responses for unseen drugs.
MLDD @ICML 2022 - Workshop on Machine Learning for Drug Discovery at the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 17-23, 2022. URL

Abstract

Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.

MCML Authors

Leon Hetzel

Mathematical Modelling of Biological Systems

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

[117]

H. Li, Q. Khan, V. Tresp and D. Cremers.
Biologically Inspired Neural Path Finding.
BI 2022 - 15th International Conference on Brain Informatics. Padova, Italy, Jul 15-17, 2022. DOI GitHub

Abstract

The human brain can be considered to be a graphical structure comprising tens of billions of biological neurons connected by synapses. It has the remarkable ability to automatically re-route information flow through alternate paths, in case some neurons are damaged. Moreover, the brain is capable of retaining information and applying it to similar but completely unseen scenarios. In this paper, we take inspiration from these attributes of the brain to develop a computational framework to find the optimal low-cost path between a source node and a destination node in a generalized graph. We show that our framework is capable of handling unseen graphs at test time. Moreover, it can find alternate optimal paths, when nodes are arbitrarily added or removed during inference, while maintaining a fixed prediction time.

MCML Authors

Hang Li

* Former Member

Qadeer Khan

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

[116]

Z. Liu, Y. Ma, M. Hildebrandt, Y. Ouyang and Z. Xiong.
CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations.
Knowledge and Information Systems 64 (Jul. 2022). DOI

Abstract

Sequential recommendations play a crucial role in many real-world applications. Due to the sequential nature, reinforcement learning has been employed to iteratively produce recommendations based on an observed stream of user behavior. In this setting, a recommendation agent interacts with the environments (users) by sequentially recommending items (actions) to maximize users’ overall long-term cumulative rewards. However, most reinforcement learning-based recommendation models only focus on extrinsic rewards based on user feedback, leading to sub-optimal policies if user-item interactions are sparse and fail to obtain the dynamic rewards based on the users’ preferences. As a remedy, we propose a dynamic intrinsic reward signal integrated with a contrastive discriminator-augmented reinforcement learning framework. Concretely, our framework contains two modules: (1) a contrastive learning module is employed to learn the representation of item sequences; (2) an intrinsic reward learning function to imitate the user’s internal dynamics. Furthermore, we combine static extrinsic reward and dynamic intrinsic reward to train a sequential recommender system based on double Q-learning. We integrate our framework with five representative sequential recommendation models. Specifically, our framework augments these recommendation models with two output layers: the supervised layer that applies cross-entropy loss to perform ranking and the other for reinforcement learning. Experimental results on two real-world datasets demonstrate that the proposed framework outperforms several sequential recommendation baselines and exploration with intrinsic reward baselines.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[115]

Z. Liu, Y. Ma, M. Schubert, Y. Ouyang and Z. Xiong.
Multi-Modal Contrastive Pre-training for Recommendation.
ICMR 2022 - ACM International Conference on Multimedia Retrieval. Newark, NJ, USA, Jun 27-30, 2022. DOI

Abstract

Personalized recommendation plays a central role in various online applications. To provide quality recommendation service, it is of crucial importance to consider multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on the relationship of co-interaction. For users, we propose intra-modal aggregation and inter-modal aggregation to fuse review texts and the structural information of the user graph. For items, we consider three modalities: description text, images, and item graph. Moreover, the description text and image complement each other for the same item. One of them can be used as promising supervision for the other. Therefore, to capture this signal and better exploit the potential correlation of intra-modalities, we propose a self-supervised contrastive inter-modal alignment task to make the textual and visual modalities as similar as possible. Then, we apply inter-modal aggregation to obtain the multi-modal representation of items. Next, we employ a binary cross-entropy loss function to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations using an existing recommendation model. We have performed extensive experiments on three real-world datasets. Experimental results verify the rationality and effectiveness of the proposed method.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[114]

G. Fu, Z. Meng, Z. Han, Z. Ding, Y. Ma, M. Schubert, V. Tresp and R. Wattenhofer.
TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion.
SPNLP @ACL 2022 - 6th ACL Workshop on Structured Prediction for NLP at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). Dublin, Ireland, May 22-27, 2022. DOI

Abstract

Temporal knowledge graphs store the dynamics of entities and relations during a time period. However, typical temporal knowledge graphs often suffer from incomplete dynamics with missing facts in real-world scenarios. Hence, modeling temporal knowledge graphs to complete the missing facts is important. In this paper, we tackle the temporal knowledge graph completion task by proposing TempCaps, which is a Capsule network-based embedding model for Temporal knowledge graph completion. TempCaps models temporal knowledge graphs by introducing a novel dynamic routing aggregator inspired by Capsule Networks. Specifically, TempCaps builds entity embeddings by dynamically routing retrieved temporal relation and neighbor information. Experimental results demonstrate that TempCaps reaches state-of-the-art performance for temporal knowledge graph completion. Additional analysis also shows that TempCaps is efficient.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[113]

D. Zügner.
Adversarial Robustness of Graph Neural Networks.
Dissertation 2022. URL

Abstract

In this thesis we look at graph neural networks (GNNs) from a perspective of adversarial robustness. We generalize the notion of adversarial attacks – small perturbations to the input data deliberately crafted to mislead a machine learning model – from traditional vector data such as images to graphs. We further propose robustness certification procedures for perturbations of the node attributes as well as the graph structure.

MCML Authors

Daniel Zügner

Dr.

* Former Member

[112]

C. Leiber, D. Mautz, C. Plant and C. Böhm.
Automatic Parameter Selection for Non-Redundant Clustering.
SDM 2022 - SIAM International Conference on Data Mining. Virtual, Apr 28-30, 2022. DOI

Abstract

High-dimensional datasets often contain multiple meaningful clusterings in different subspaces. For example, objects can be clustered either by color, weight, or size, revealing different interpretations of the given dataset. A variety of approaches are able to identify such non-redundant clusterings. However, most of these methods require the user to specify the expected number of subspaces and clusters for each subspace. Stating these values is a non-trivial problem and usually requires detailed knowledge of the input dataset. In this paper, we propose a framework that utilizes the Minimum Description Length Principle (MDL) to detect the number of subspaces and clusters per subspace automatically. We describe an efficient procedure that greedily searches the parameter space by splitting and merging subspaces and clusters within subspaces. Additionally, an encoding strategy is introduced that allows us to detect outliers in each subspace. Extensive experiments show that our approach is highly competitive to state-of-the-art methods.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[111]

D. Alivanistos, M. Berrendorf, M. Cochez and M. Galkin.
Query Embedding on Hyper-Relational Knowledge Graphs.
ICLR 2022 - 10th International Conference on Learning Representations. Virtual, Apr 25-29, 2022. URL GitHub

Abstract

Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs (KGs). It subsumes both one-hop link prediction as well as other more complex types of logical queries. Existing algorithms operate only on classical, triple-based graphs, whereas modern KGs often employ a hyper-relational modeling paradigm. In this paradigm, typed edges may have several key-value pairs known as qualifiers that provide fine-grained context for facts. In queries, this context modifies the meaning of relations, and usually reduces the answer set. Hyper-relational queries are often observed in real-world KG applications, and existing approaches for approximate query answering cannot make use of qualifier pairs. In this work, we bridge this gap and extend the multi-hop reasoning problem to hyper-relational KGs allowing to tackle this new type of complex queries. Building upon recent advancements in Graph Neural Networks and query embedding techniques, we study how to embed and answer hyper-relational conjunctive queries. Besides that, we propose a method to answer such queries and demonstrate in our experiments that qualifiers improve query answering on a diverse set of query patterns.

MCML Authors

Max Berrendorf

Dr.

* Former Member

[110]

M. Galkin, M. Berrendorf and C. T. Hoyt.
An Open Challenge for Inductive Link Prediction on Knowledge Graphs.
GLB @WWW 2022 - Workshop on Graph Learning Benchmarks at the International World Wide Web Conference (WWW 2022). Virtual, Apr 22-29, 2022. arXiv GitHub

Abstract

An emerging trend in representation learning over knowledge graphs (KGs) moves beyond transductive link prediction tasks over a fixed set of known entities in favor of inductive tasks that imply training on one graph and performing inference over a new graph with unseen entities. In inductive setups, node features are often not available and training shallow entity embedding matrices is meaningless as they cannot be used at inference time with unseen entities. Despite the growing interest, there are not enough benchmarks for evaluating inductive representation learning methods. In this work, we introduce ILPC 2022, a novel open challenge on KG inductive link prediction. To this end, we constructed two new datasets based on Wikidata with various sizes of training and inference graphs that are much larger than existing inductive benchmarks. We also provide two strong baselines leveraging recently proposed inductive methods. We hope this challenge helps to streamline community efforts in the inductive graph representation learning area. ILPC 2022 follows best practices on evaluation fairness and reproducibility.

MCML Authors

Max Berrendorf

Dr.

* Former Member

[109]

C. T. Hoyt, M. Berrendorf, M. Galkin, V. Tresp and B. M. Gyori.
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs.
GLB @WWW 2022 - Workshop on Graph Learning Benchmarks at the International World Wide Web Conference (WWW 2022). Virtual, Apr 22-29, 2022. arXiv

Abstract

The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the usage of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics to address lack of interpretability and comparability of existing metrics to datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics upon which we investigate two avenues for improvements to existing metrics via alternative aggregation functions and concepts from probability theory. We finally propose several new rank-based metrics that are more easily interpreted and compared accompanied by a demonstration of their usage in a benchmarking of knowledge graph embedding models.
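
For orientation, the sketch below computes three widely used rank-based metrics (mean rank, mean reciprocal rank, and Hits@k) from a list of ranks. These are assumed here to be representative of the existing metrics the abstract reviews; the new metrics proposed in the paper are not reproduced, and the example ranks are hypothetical.

```python
def mean_rank(ranks):
    """Arithmetic mean of the ranks (lower is better, unbounded above)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Mean of 1/rank (higher is better, lies in (0, 1])."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of true triples ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks (1 = best) of the true entity for four test triples.
ranks = [1, 2, 4, 20]
mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
h10 = hits_at_k(ranks, k=10)
```

Mean rank grows with the number of candidate entities, which is one reason such values are hard to compare across datasets of different sizes.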

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[108]

D. Kazempour.
Advances in correlation clustering.
Dissertation 2022. DOI

Abstract

This thesis addresses key challenges in correlation clustering, particularly in high-dimensional datasets, by developing novel methods to evaluate and improve clustering algorithms. The first contribution focuses on defining and deriving internal evaluation criteria for correlation clustering, proposing a new cost function to assess cluster quality based on commonalities among existing algorithms. The second part introduces two innovative strategies for detecting regions of interest (ROIs) in Hough space, improving the robustness of the Hough transform algorithm, and extending it to handle quadratic and periodic correlated clusters. Finally, the thesis explores unifying local and global correlation clustering views and enhancing the resilience of these methods to outliers. (Shortened.)

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

[107]

Y. Liu, Y. Ma, M. Hildebrandt, M. Joblin and V. Tresp.
TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs.
AAAI 2022 - 36th Conference on Artificial Intelligence. Virtual, Feb 22-Mar 01, 2022. DOI

Abstract

Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting – event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[106]

S. Sharifzadeh, S. M. Baharlou, M. Schmitt, H. Schütze and V. Tresp.
Improving Scene Graph Classification by Exploiting Knowledge from Texts.
AAAI 2022 - 36th Conference on Artificial Intelligence. Virtual, Feb 22-Mar 01, 2022. DOI

Abstract

Training scene graph classification models requires a large amount of annotated image data. Meanwhile, scene graphs represent relational knowledge that can be modeled with symbolic data from texts or knowledge graphs. While image annotation demands extensive labor, collecting textual descriptions of natural scenes requires less effort. In this work, we investigate whether textual scene descriptions can substitute for annotated image data. To this end, we employ a scene graph classification framework that is trained not only from annotated images but also from symbolic data. In our architecture, the symbolic entities are first mapped to their correspondent image-grounded representations and then fed into the relational reasoning pipeline. Even though a structured form of knowledge, such as the form in knowledge graphs, is not always available, we can generate it from unstructured texts using a transformer-based language model. We show that by fine-tuning the classification pipeline with the extracted knowledge from texts, we can achieve ~8x more accurate results in scene graph classification, ~3x in object classification, and ~1.5x in predicate classification, compared to the supervised baselines with only 1% of the annotated images.

MCML Authors

Hinrich Schütze

Prof. Dr.

Computational Linguistics

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[105]

M. Berrendorf.
Machine learning for managing structured and semi-structured data.
Dissertation 2022. DOI

Abstract

As data availability grows across sectors, machine learning, especially graph neural networks, plays a crucial role in extracting insights by automating complex analysis, including relational learning. Knowledge graphs help store entity facts, though they often require automated methods like Link Prediction and Entity Alignment to fill in missing information due to the sheer volume. This thesis advances knowledge graph completion by improving Entity Alignment through active learning, refining Link Prediction with metadata, and introducing a new evaluation metric, as well as a software library to aid researchers. (Shortened.)

MCML Authors

Max Berrendorf

Dr.

* Former Member

[104]

V.-L. Nguyen, M. H. Shaker and E. Hüllermeier.
How to measure uncertainty in uncertainty sampling for active learning.
Machine Learning 111.1 (2022). DOI

Abstract

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, along with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.
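
The traditional, probabilistic variant of uncertainty sampling mentioned in the abstract can be sketched as follows: each unlabeled instance is scored by the Shannon entropy of its predicted class distribution, and the highest-scoring instance is queried. This covers only the entropy baseline, not the epistemic/aleatoric measures compared in the paper; the function names and the pool values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def most_uncertain(pool_probs):
    """Index of the pool instance whose prediction has the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))

# Hypothetical predicted probabilities for three unlabeled instances.
pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
query_index = most_uncertain(pool)  # the instance whose label is requested next
```

Entropy alone cannot tell whether a prediction is uncertain because the classes genuinely overlap or because the learner has seen too little data, which is the distinction the alternative measures aim to capture.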

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[103]

L. Qian, C. Plant and C. Böhm.
Density-based Clustering for Adaptive Density Variation.
ICDM 2021 - 21st IEEE International Conference on Data Mining. Auckland, New Zealand, Dec 07-10, 2021. DOI

Abstract

Cluster analysis plays a crucial role in data mining and knowledge discovery. Although many researchers have investigated clustering algorithms over the past few decades, most of the well-known algorithms have shortcomings when dealing with clusters of arbitrary shapes and varying sizes and in the presence of noise and outliers. Density-based methods partially solve these issues but fail to discover clusters with varying densities. In this paper, we propose a novel Density-Based clustering algorithm for Adaptive Density Variation (DBADV), which is based on the classic clustering algorithm DBSCAN. To address the problem of density variation, we define the local density information, which not only reflects the individual property of each object but also describes the density distribution of clusters, and finds the adaptive search range of each object by collecting information from its neighbors. Moreover, we design a new metric to obtain the mutual nearest neighbors of each object to better detect the objects around the boundaries between clusters. We show the effectiveness of our method in extensive experiments on synthetic and real-world data sets, which demonstrate that the performance of the proposed algorithm DBADV is superior to other competitive clustering algorithms.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[102]

A. Beer, L. Stephan and T. Seidl.
LUCKe - Connecting Clustering and Correlation Clustering.
ICDMW 2021 - IEEE International Conference on Data Mining Workshops. Auckland, New Zealand, Dec 07-10, 2021. DOI

Abstract

LUCKe allows any purely distance-based ‘classic’ clustering algorithm to reliably find linear correlation clusters. An elaborated distance matrix based on the points’ local PCA extracts all necessary information from high dimensional data to declare points of the same arbitrary dimensional linear correlation cluster as ‘similar’. For that, the points’ eigensystems as well as only the relevant information about their position in space, are put together. LUCKe allows transferring known benefits from the large field of basic clustering to correlation clustering. Its applicability is shown in extensive experiments with simple representatives of diverse basic clustering approaches.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[101]

J. Busch, M. Hünemörder, J. Held, P. Kröger and T. Seidl.
Implicit Hough Transform Neural Networks for Subspace Clustering.
ICDMW 2021 - IEEE International Conference on Data Mining Workshops. Auckland, New Zealand, Dec 07-10, 2021. DOI

Abstract

Subspace clustering constitutes a fundamental task in data mining and unsupervised machine learning with myriad applications. We present a novel approach to subspace clustering that detects affine hyperplanes in a given arbitrary-dimensional dataset by explicitly parametrizing them and optimizing their parameters using gradient updates w.r.t. a differentiable loss function. The explicit parametrization allows our model to avoid the exponential search space incurred by models relying on an explicit Hough transform to detect subspaces by searching for high-density points in parameter space. Compared to other existing approaches, our method is highly scalable, can be trained very efficiently on a GPU, is applicable to out-of-sample data, and is amenable to anytime scenarios since training can be stopped at any time and convergence is usually fast. The model can further be viewed as a linear neural network layer and trained end-to-end with an autoencoder to detect arbitrary non-linear correlations. We provide empirical results on a wide array of synthetic datasets with different characteristics following a rigorous evaluation protocol. Our results demonstrate the advantageous properties of our model and additionally reveal that it is particularly robust to jitter and noise present in the data.

MCML Authors

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[100]

A. Lohrer, J. Deller, M. Hünemörder and P. Kröger.
OAB - An Open Anomaly Benchmark Framework for Unsupervised and Semisupervised Anomaly Detection on Image and Tabular Data Sets.
ICDMW 2021 - IEEE International Conference on Data Mining Workshops. Auckland, New Zealand, Dec 07-10, 2021. DOI

Abstract

We introduce OAB, an Open Anomaly Benchmark Framework for unsupervised and semisupervised anomaly detection on image and tabular data sets, ensuring simple reproducibility for existing benchmark results as well as a reliable comparability and low-effort extensibility when new anomaly detection algorithms or new data sets are added. While making established methods of the most popular benchmarks easily accessible, OAB generalizes the task of un- and semisupervised anomaly benchmarking and offers besides commonly used benchmark data sets also semantically meaningful real-world anomaly data sets as well as a broad range of traditional and state-of-the-art anomaly detection algorithms. The benefit of OAB for the research community has been demonstrated by reproducing and extending existing benchmarks to new algorithms with very low effort allowing researchers to focus on the actual algorithm research.

MCML Authors

Andreas Lohrer

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[99]

L. Hetzel, D. S. Fischer, S. Günnemann and F. J. Theis.
Graph representation learning for single-cell biology.
Current Opinion in Systems Biology 28.100347 (Dec. 2021). DOI

Abstract

Single-cell RNA sequencing measures gene expression at an unprecedented resolution and scale and allows the analysis of cellular phenotypes which was not possible before. In this context, graphs occur as a natural representation of the system, both gene-centric and cell-centric. However, many advances in machine learning on graphs are not yet harnessed in models on single-cell data. Taking the inference of cell types or gene interactions as examples, graph representation learning has a wide applicability to both cell and gene graphs. Recent advances in spatial molecular profiling additionally put graph learning in the focus of attention because of the innate resemblance of spatial information to spatial graphs. We argue that graph embedding techniques have great potential for various applications across single-cell biology. Here, we discuss how graph representation learning maps to current models and concepts used in single-cell biology and formalise overlaps to developments in graph-based deep learning.

MCML Authors

Leon Hetzel

Mathematical Modelling of Biological Systems

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Fabian Theis

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Mathematical Modelling of Biological Systems

[98]

M. Bernhard and M. Schubert.
Correcting Imprecise Object Locations for Training Object Detectors in Remote Sensing Applications.
Remote Sensing 13 (Dec. 2021). URL

Abstract

Object detection on aerial and satellite imagery is an important tool for image analysis in remote sensing and has many areas of application. As modern object detectors require accurate annotations for training, manual and labor-intensive labeling is necessary. In situations where GPS coordinates for the objects of interest are already available, there is potential to avoid the cumbersome annotation process. Unfortunately, GPS coordinates are often not well-aligned with georectified imagery. These spatial errors can be seen as noise regarding the object locations, which may critically harm the training of object detectors and, ultimately, limit their practical applicability. To overcome this issue, we propose a co-correction technique that allows us to robustly train a neural network with noisy object locations and to transform them toward the true locations. When applied as a preprocessing step on noisy annotations, our method greatly improves the performance of existing object detectors. Our method is applicable in scenarios where the images are only annotated with points roughly indicating object locations, instead of entire bounding boxes providing precise information on the object locations and extents. We test our method on three datasets and achieve a substantial improvement (e.g., 29.6% mAP on the COWC dataset) over existing methods for noise-robust object detection.

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[97]

A. Beer.
On the edges of clustering: creating synergies with related problems.
Dissertation 2021. DOI

Abstract

This thesis explores the connections between clustering and related tasks like subspace clustering, correlation clustering, outlier detection, and data ordering. It introduces novel methods such as the KISS score for subspace clustering, LUCK for correlation clustering, and the ABC algorithm for outlier detection. Additionally, it develops the Circle Index for optimizing data ordering to improve clustering performance. (Shortened.)

MCML Authors

Anna Beer

Dr.

* Former Member

[96]

N. Kees, M. Fromm, E. Faerman and T. Seidl.
Active Learning for Argument Strength Estimation.
Insights @EMNLP 2021 - 2nd Workshop on Insights from Negative Results at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). Punta Cana, Dominican Republic, Nov 07-11, 2021. DOI

Abstract

High-quality arguments are an essential part of decision-making. Automatically predicting the quality of an argument is a complex task that recently got much attention in argument mining. However, the annotation effort for this task is exceptionally high. Therefore, we test uncertainty-based active learning (AL) methods on two popular argument-strength data sets to estimate whether sample-efficient learning can be enabled. Our extensive empirical evaluation shows that uncertainty-based acquisition functions cannot surpass the accuracy reached with random acquisition on these data sets.

MCML Authors

Michael Fromm

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[95]

M. Ali, M. Berrendorf, M. Galkin, V. Thost, T. Ma, V. Tresp and J. Lehmann.
Improving Inductive Link Prediction Using Hyper-Relational Facts.
ISWC 2021 - 20th International Semantic Web Conference. Virtual, Oct 24-28, 2021. Best Paper Award. DOI GitHub

Abstract

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[94]

D. Kazempour, A. Beer, M. Oelker, P. Kröger and T. Seidl.
Compound Segmentation via Clustering on Mol2Vec-based Embeddings.
eScience 2021 - 17th IEEE eScience Conference. Virtual, Sep 20-23, 2021. DOI

Abstract

During different steps in the process of discovering drug candidates for diseases, it can be supportive to identify groups of molecules that share similar properties, i.e. common overall structural similarity. The existing methods for computing (dis)similarities between chemical structures rely on a priori domain knowledge. Here we investigate the clustering of compounds applied to embeddings generated by the recently published Mol2Vec technique, which enables an entirely unsupervised vector representation of compounds. A research question we address in this work is: do existing well-known clustering algorithms such as k-means or hierarchical clustering methods yield meaningful clusters on the Mol2Vec embeddings? Further, we investigate how far subspace clustering can be utilized to compress the data by reducing the dimensionality of the compounds' vector representation. Our first conducted experiments on a set of COVID-19 drug candidates reveal that well-established methods yield meaningful clusters. Preliminary results from subspace clusterings indicate that a compression of the vector representations seems viable.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[93]

S. Obermeier, A. Beer, F. Wahl and T. Seidl.
Cluster Flow — an Advanced Concept for Ensemble-Enabling, Interactive Clustering.
BTW 2021 - 19th Symposium of Database Systems for Business, Technology and Web. Dresden, Germany, Sep 13-17, 2021. DOI

Abstract

Even though most clustering algorithms serve knowledge discovery in fields other than computer science, most of them still require users to be familiar with programming or data mining to some extent. As that often prevents efficient research, we developed an easy-to-use, highly explainable clustering method accompanied by an interactive tool for clustering. It is based on intuitively understandable kNN graphs and the subsequent application of adaptable filters, which can be combined ensemble-like and iteratively and prune unnecessary or misleading edges. For a first overview of the data, fully automatic predefined filter cascades deliver robust results. A selection of simple filters and combination methods that can be chosen interactively yields very good results on benchmark datasets compared to various algorithms.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[92]

D. Kazempour, J. Winter, P. Kröger and T. Seidl.
On Methods and Measures for the Inspection of Arbitrarily Oriented Subspace Clusters.
Datenbank-Spektrum 21 (Sep. 2021). DOI

Abstract

When using arbitrarily oriented subspace clustering algorithms one obtains a partitioning of a given data set and for each partition its individual subspace. Since clustering is an unsupervised machine learning task, we may not have “ground truth” labels at our disposal or do not wish to rely on them. What is needed in such cases are internal measures which permit a label-less analysis of the obtained subspace clustering. In this work, we propose methods for revising clusters obtained from arbitrarily oriented correlation clustering algorithms. Initial experiments conducted reveal improvements in the clustering results compared to the original clustering outcome. Our proposed approach is simple and can be applied as a post-processing step on arbitrarily oriented correlation clusterings.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[91]

T. Seidl, M. Fromm and S. Obermeier.
Proceedings of the LWDA 2021 Workshops: FGWM, KDML, FGWI-BIA, and FGIR.
LWDA 2021 - Lernen, Wissen, Daten, Analysen 2021 (Sep. 2021). URL

Abstract

LWDA 2021 is a joint conference of six special interest groups of the German Computer Science Society (GI), addressing research in the areas of knowledge discovery and machine learning, information retrieval, database systems, and knowledge management. The German acronym LWDA stands for ‘Lernen, Wissen, Daten, Analysen’ (Learning, Knowledge, Data, Analytics). Following the tradition of the last years, LWDA 2021 provides a joint forum for experienced and young researchers, to bring insights into recent trends, technologies, and applications and to promote interaction among the special interest groups.

MCML Authors

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Michael Fromm

Dr.

* Former Member

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

[90]

A. Lohrer, A. Beer, M. Hünemörder, J. Lauterbach, T. Seidl and P. Kröger.
AnyCORE - An Anytime Algorithm for Cluster Outlier REmoval.
LWDA 2021 - Conference on Lernen. Wissen. Daten. Analysen. München, Germany, Sep 01-03, 2021. PDF

Abstract

We introduce AnyCORE (Anytime Cluster Outlier REmoval), an algorithm that enables users to detect and remove outliers at any time. The algorithm is based on the idea of MORe++, an approach for outlier detection and removal that iteratively scores and removes 1d-cluster-outliers in n-dimensional data sets. In contrast to MORe++, AnyCORE provides continuous responses for its users and converges independently of cluster centers. This allows AnyCORE to perform outlier detection in combination with an arbitrary clustering method that is most suitable for a given data set. We conducted our AnyCORE experiments on synthetic and real-world data sets by benchmarking its variant with k-Means as the underlying clustering method versus the traditional batch algorithm version of MORe++. In extensive experiments we show that AnyCORE is able to compete with the related batch algorithm version.

MCML Authors

Andreas Lohrer

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[89]

L. Miklautz, L. G. M. Bauer, D. Mautz, S. Tschiatschek, C. Böhm and C. Plant.
Details (Don't) Matter: Isolating Cluster Information in Deep Embedded Spaces.
IJCAI 2021 - 30th International Joint Conference on Artificial Intelligence. Montreal, Canada, Aug 19-26, 2021. DOI

Abstract

Deep clustering techniques combine representation learning with clustering objectives to improve their performance. Among existing deep clustering techniques, autoencoder-based methods are the most prevalent ones. While they achieve promising clustering results, they suffer from an inherent conflict between preserving details, as expressed by the reconstruction loss, and finding similar groups by ignoring details, as expressed by the clustering loss. This conflict leads to brittle training procedures, dependence on trade-off hyperparameters and less interpretable results. We propose our framework, ACe/DeC, that is compatible with Autoencoder Centroid based Deep Clustering methods and automatically learns a latent representation consisting of two separate spaces. The clustering space captures all cluster-specific information and the shared space explains general variation in the data. This separation resolves the above mentioned conflict and allows our method to learn both detailed reconstructions and cluster specific abstractions. We evaluate our framework with extensive experiments to show several benefits: (1) cluster performance – on various data sets we outperform relevant baselines; (2) no hyperparameter tuning – this improved performance is achieved without introducing new clustering specific hyperparameters; (3) interpretability – isolating the cluster specific information in a separate space is advantageous for data exploration and interpreting the clustering results; and (4) dimensionality of the embedded space – we automatically learn a low dimensional space for clustering. Our ACe/DeC framework isolates cluster information, increases stability and interpretability, while improving cluster performance.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[88]

C. Leiber, L. G. M. Bauer, B. Schelling, C. Böhm and C. Plant.
Dip-based Deep Embedded Clustering with k-Estimation.
KDD 2021 - 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Singapore, Aug 14-18, 2021. DOI

Abstract

The combination of clustering with Deep Learning has gained much attention in recent years. Unsupervised neural networks like autoencoders can autonomously learn the essential structures in a data set. This idea can be combined with clustering objectives to learn relevant features automatically. Unfortunately, they are often based on a k-means framework, from which they inherit various assumptions, like spherical-shaped clusters. Another assumption, also found in approaches outside the k-means-family, is knowing the number of clusters a priori. In this paper, we present the novel clustering algorithm DipDECK, which can estimate the number of clusters while simultaneously improving a Deep Learning-based clustering objective. Additionally, we can cluster complex data sets without assuming only spherically shaped clusters. Our algorithm works by heavily overestimating the number of clusters in the embedded space of an autoencoder and, based on Hartigan’s Dip-test - a statistical test for unimodality - analyses the resulting micro-clusters to determine which to merge. We show in extensive experiments the various benefits of our method: (1) we achieve competitive results while learning the clustering-friendly representation and number of clusters simultaneously; (2) our method is robust regarding parameters, stable in performance, and allows for more flexibility in the cluster shape; (3) we outperform relevant competitors in the estimation of the number of clusters.

MCML Authors

Collin Leiber

Dr.

* Former Member

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[87]

M. Biloš and S. Günnemann.
Scalable Normalizing Flows for Permutation Invariant Densities.
ICML 2021 - 38th International Conference on Machine Learning. Virtual, Jul 18-24, 2021. URL

Abstract

Modeling sets is an important problem in machine learning since this type of data can be found in many domains. A promising approach defines a family of permutation invariant densities with continuous normalizing flows. This allows us to maximize the likelihood directly and sample new realizations with ease. In this work, we demonstrate how calculating the trace, a crucial step in this method, raises issues that occur both during training and inference, limiting its practicality. We propose an alternative way of defining permutation equivariant transformations that give a closed-form trace. This leads not only to improvements while training, but also to better final performance. We demonstrate the benefits of our approach on point processes and general set modeling.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[86]

N. Strauß, L. Rottkamp, S. Schmoll and M. Schubert.
Efficient Parking Search using Shared Fleet Data.
MDM 2021 - 22nd IEEE International Conference on Mobile Data Management. Virtual, Jun 15-18, 2021. DOI

Abstract

Finding an available on-street parking spot is a relevant problem of day-to-day life. In recent years, several cities began providing real-time parking occupancy data. Finding a free parking spot in such a smart environment can be modeled and solved as a Markov decision process (MDP). The solver has to consider uncertainty as available parking spots might not remain available until arrival due to other vehicles claiming spots in the meantime. Knowing the parking intention of every vehicle in the environment would eliminate this uncertainty but is currently not realistic. In contrast, acquiring data from a subset of vehicles appears feasible and could at least reduce uncertainty. In this paper, we examine how sharing data within a vehicle fleet might lower parking search times. We use this data to better estimate the availability of parking spots at arrival. Since optimal solutions for large scenarios are computationally infeasible, we base our methods on approximations shown to perform well in single-agent settings. Our evaluation features a simulation of a part of Melbourne and indicates that fleet data can significantly reduce the time spent searching for a free parking bay.

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Lukas Rottkamp

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[85]

J. Schuchardt, A. Bojchevski, J. Gasteiger and S. Günnemann.
Collective Robustness Certificates - Exploiting Interdependence in Graph Neural Networks.
ICLR 2021 - 9th International Conference on Learning Representations. Virtual, May 03-07, 2021. URL

Abstract

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[84]

E. Faerman.
Representation learning on relational data.
Dissertation 2021. DOI

Abstract

This thesis introduces methods that leverage relational information to address various problems in machine learning, such as node classification, graph matching, and argument mining. It explores unsupervised and semi-supervised approaches for node classification, graph alignment for geographical maps and knowledge graphs, and proposes a novel method for identifying and searching arguments in peer reviews. Additionally, it presents a subspace clustering method that uses relationships to improve clustering performance on large datasets. (Shortened.)

MCML Authors

Evgeny Faerman

Dr.

* Former Member

[83]

Y. Ma and V. Tresp.
Causal Inference under Networked Interference and Intervention Policy Enhancement.
AISTATS 2021 - 24th International Conference on Artificial Intelligence and Statistics. Virtual, Apr 13-15, 2021. URL

Abstract

Estimating individual treatment effects from data of randomized experiments is a critical task in causal inference. The Stable Unit Treatment Value Assumption (SUTVA) is usually made in causal inference. However, interference can introduce bias when the assigned treatment on one unit affects the potential outcomes of the neighboring units. This interference phenomenon is known as spillover effect in economics or peer effect in social science. Usually, in randomized experiments or observational studies with interconnected units, one can only observe treatment responses under interference. Hence, the issue of how to estimate the superimposed causal effect and recover the individual treatment effect in the presence of interference becomes a challenging task in causal inference. In this work, we study causal effect estimation under general network interference using Graph Neural Networks, which are powerful tools for capturing node and link dependencies in graphs. After deriving causal effect estimators, we further study intervention policy improvement on the graph under capacity constraint. We give policy regret bounds under network interference and treatment capacity constraint. Furthermore, a heuristic graph structure-dependent error bound for Graph Neural Network-based causal estimators is provided.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[82]

M. Berrendorf, E. Faerman and V. Tresp.
Active Learning for Entity Alignment.
ECIR 2021 - 43rd European Conference on Information Retrieval. Virtual, Mar 28-Apr 01, 2021. DOI GitHub

Abstract

In this work, we propose a novel framework for labeling entity alignments in knowledge graph datasets. Different strategies to select informative instances for the human labeler build the core of our framework. We illustrate how the labeling of entity alignments is different from assigning class labels to single instances and how these differences affect the labeling efficiency. Based on these considerations, we propose and evaluate different active and passive learning strategies. One of our main findings is that passive learning approaches, which can be efficiently precomputed, and deployed more easily, achieve performance comparable to the active learning strategies.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[81]

M. Berrendorf, L. Wacker and E. Faerman.
A Critical Assessment of State-of-the-Art in Entity Alignment.
ECIR 2021 - 43rd European Conference on Information Retrieval. Virtual, Mar 28-Apr 01, 2021. DOI GitHub

Abstract

In this work, we perform an extensive investigation of two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs. Therefore, we first carefully examine the benchmarking process and identify several shortcomings, making the results reported in the original works not always comparable. Furthermore, we suspect that it is a common practice in the community to perform hyperparameter optimization directly on a test set, reducing the informative value of reported performance. Thus, we select a representative sample of benchmarking datasets and describe their properties. We also examine different initializations for entity representations since they are a decisive factor for model performance. Furthermore, we use a shared train/validation/test split for an appropriate evaluation setting to evaluate all methods on all datasets. In our evaluation, we make several interesting findings. While we observe that most of the time SotA approaches perform better than baselines, they have difficulties when the dataset contains noise, which is the case in most real-life applications. Moreover, in our ablation study, we find that often different features of the SotA methods are crucial for good performance than previously assumed.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

[80]

M. Fromm, M. Berrendorf, S. Obermeier, T. Seidl and E. Faerman.
Diversity Aware Relevance Learning for Argument Search.
ECIR 2021 - 43rd European Conference on Information Retrieval. Virtual, Mar 28-Apr 01, 2021. DOI GitHub

Abstract

In this work, we focus on the problem of retrieving relevant arguments for a query claim covering diverse aspects. State-of-the-art methods rely on explicit mappings between claims and premises, and thus are unable to utilize large available collections of premises without laborious and costly manual annotation. Their diversity approach relies on removing duplicates via clustering which does not directly ensure that the selected premises cover all aspects. This work introduces a new multi-step approach for the argument retrieval problem. Rather than relying on ground-truth assignments, our approach employs a machine learning model to capture semantic relationships between arguments. Beyond that, it aims to cover diverse facets of the query, instead of trying to identify duplicates explicitly. Our empirical evaluation demonstrates that our approach leads to a significant improvement in the argument retrieval task even though it requires less data.

MCML Authors

Michael Fromm

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Evgeny Faerman

Dr.

* Former Member

[79]

A. Beer, E. Allerborn, V. Hartmann and T. Seidl.
KISS - A fast kNN-based Importance Score for Subspaces.
EDBT 2021 - 24th International Conference on Extending Database Technology. Nicosia, Cyprus, Mar 23-26, 2021. PDF

Abstract

In high-dimensional datasets some dimensions or attributes can be more important than others. Whereas most algorithms neglect one or more dimensions for all points of a dataset or at least for all points of a certain cluster together, our method KISS (kNN-based Importance Score of Subspaces) detects the most important dimensions for each point individually. It is fully unsupervised and does not depend on distorted multidimensional distance measures. Instead, the $k$ nearest neighbors ($k$NN) in one-dimensional projections of the data points are used to calculate the score for every dimension’s importance. Experiments across a variety of settings show that those scores reflect well the structure of the data. KISS can be used for subspace clustering. What sets it apart from other methods for this task is its runtime, which is linear in the number of dimensions and $O(n \log n)$ in the number of points, as opposed to quadratic or even exponential runtimes for previous algorithms.
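
The runtime claim rests on the fact that nearest neighbors in a one-dimensional projection can be found by sorting. The sketch below illustrates only that building block: it computes, per dimension, each point's distance to its k-th nearest neighbor along that axis. It does not reproduce the KISS score, whose exact definition is not given in the abstract; the function names and the toy data are assumptions.

```python
def knn_distances_1d(values, k):
    """Distance from each value to its k-th nearest neighbor on a line.

    Sorting costs O(n log n); the neighbor scan adds O(n * k).
    """
    n = len(values)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(values)")
    order = sorted(range(n), key=values.__getitem__)
    xs = [values[i] for i in order]
    result = [0.0] * n
    for pos, x in enumerate(xs):
        left, right = pos - 1, pos + 1
        d = 0.0
        for _ in range(k):
            # In sorted order the next-nearest neighbor is always adjacent
            # to the window of neighbors collected so far.
            d_left = x - xs[left] if left >= 0 else float("inf")
            d_right = xs[right] - x if right < n else float("inf")
            if d_left <= d_right:
                d, left = d_left, left - 1
            else:
                d, right = d_right, right + 1
        result[order[pos]] = d
    return result


def per_dimension_knn(points, k):
    """k-th nearest neighbor distances for every axis-parallel projection."""
    return [knn_distances_1d(list(column), k) for column in zip(*points)]


# Toy data: the first coordinate is tightly grouped, the second is spread out.
toy_points = [(0.0, 0.0), (1.0, 40.0), (1.5, 10.0), (10.0, 90.0)]
toy_knn = per_dimension_knn(toy_points, k=1)
```

A small k-th neighbor distance in one projection means the point sits in a locally dense region along that dimension; any per-point importance score would then be derived from such per-dimension values.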

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[78]

M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, S. Sharifzadeh, V. Tresp and J. Lehmann.
PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings.
Journal of Machine Learning Research 22.82 (Mar. 2021). PDF

Abstract

Recently, knowledge graph embeddings (KGEs) have received significant attention, and several software libraries have been developed for training and evaluation. While each of them addresses specific needs, we report on a community effort toward a re-design and re-implementation of PyKEEN, one of the early KGE libraries. PyKEEN 1.0 enables users to compose knowledge graph embedding models based on a wide range of interaction models, training approaches, loss functions, and permits the explicit modeling of inverse relations. It allows users to measure each component’s influence individually on the model’s performance. Besides, an automatic memory optimization has been realized in order to optimally exploit the provided hardware. Through the integration of Optuna, extensive hyper-parameter optimization (HPO) functionalities are provided.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[77]

M. Fromm, E. Faerman, M. Berrendorf, S. Bhargava, R. Qi, Y. Zhang, L. Dennert, S. Selle, Y. Mao and T. Seidl.
Argument Mining Driven Analysis of Peer-Reviews.
AAAI 2021 - 35th Conference on Artificial Intelligence. Virtual, Feb 02-09, 2021. DOI GitHub

Abstract

Peer reviewing is a central process in modern research and essential for ensuring high quality and reliability of published work. At the same time, it is a time-consuming process and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in this area. How to cope with this problem is an open question and it is vividly discussed across all major conferences. In this work, we propose an Argument Mining based approach for the assistance of editors, meta-reviewers, and reviewers. We demonstrate that the decision process in the field of scientific publications is driven by arguments and automatic argument identification is helpful in various use-cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains making the transfer of pre-trained models difficult. Therefore, we provide the community with a new peer-review dataset from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the most relevant parts from reviews, which are paramount for the publication decision. The process remains interpretable since the extracted arguments can be highlighted in a review without detaching them from their context.

MCML Authors

Michael Fromm

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Yao Zhang

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[76]

S. Sharifzadeh, S. M. Baharlou and V. Tresp.
Classification by Attention: Scene Graph Classification with Prior Knowledge.
AAAI 2021 - 35th Conference on Artificial Intelligence. Virtual, Feb 02-09, 2021. DOI

Abstract

A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach by introducing schema representations and implementing the classification as an attention layer between image-based representations and the schemata. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations, as a top-down mechanism, leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.

MCML Authors

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[75]

S. Schmoll and M. Schubert.
Semi-Markov Reinforcement Learning for Stochastic Resource Collection.
IJCAI 2020 - 29th International Joint Conference on Artificial Intelligence. Yokohama, Japan (postponed due to the Corona pandemic), Jan 07-15, 2021. DOI

Abstract

We show that the task of collecting stochastic, spatially distributed resources (Stochastic Resource Collection, SRC) may be considered as a Semi-Markov-Decision-Process. Our Deep-Q-Network (DQN) based approach uses a novel scalable and transferable artificial neural network architecture. The concrete use-case of the SRC is an officer (single agent) trying to maximize the amount of fined parking violations in his area. We evaluate our approach on an environment based on the real-world parking data of the city of Melbourne. In small, hence simple, settings with short distances between resources and few simultaneous violations, our approach is comparable to previous work. When the size of the network grows (and hence the amount of resources) our solution significantly outperforms preceding methods. Moreover, applying a trained agent to a non-overlapping new area outperforms existing approaches.

MCML Authors

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[74]

M. Berrendorf, E. Faerman, L. Vermue and V. Tresp.
Interpretable and Fair Comparison of Link Prediction or Entity Alignment Methods with Adjusted Mean Rank.
WI-IAT 2020 - IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. Virtual, Dec 14-17, 2020. DOI

Abstract

In this work, we take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment. In the current experimental setting, multiple different scores are employed to assess different aspects of model performance. We analyze the informativeness of these evaluation measures and identify several shortcomings. In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets. Moreover, we demonstrate that varying the size of the test set automatically has an impact on the performance of the same model based on commonly used metrics for the Entity Alignment task. We show that this leads to various problems in the interpretation of results, which may support misleading conclusions. Therefore, we propose adjustments to the evaluation and demonstrate empirically how this supports a fair, comparable, and interpretable assessment of model performance.
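The dependence on candidate-set size described above can be illustrated with a small sketch. It assumes one plausible form of such an adjustment: the observed mean rank divided by the mean rank expected under random scoring, where a task with n candidates has expected rank (n + 1) / 2. The function name `adjusted_mean_rank` and its arguments are hypothetical, and the paper's exact definition may differ.

```python
def adjusted_mean_rank(ranks, num_candidates):
    """Ratio of the observed mean rank to the mean rank expected by chance.

    ranks[i] is the rank of the true answer for evaluation task i and
    num_candidates[i] is the number of candidates ranked in that task.
    Assumption: under random scoring the expected rank is (n + 1) / 2.
    """
    if not ranks or len(ranks) != len(num_candidates):
        raise ValueError("ranks and num_candidates must be non-empty and aligned")
    mean_rank = sum(ranks) / len(ranks)
    expected = sum((n + 1) / 2 for n in num_candidates) / len(num_candidates)
    return mean_rank / expected


# The same raw rank of 50 means very different things for 100 vs. 1000 candidates.
small_set = adjusted_mean_rank([50], [100])
large_set = adjusted_mean_rank([50], [1000])
```

Under this reading, a value near 1 corresponds to chance-level ranking and values near 0 to strong performance, which makes results with different candidate-set sizes easier to compare than the raw mean rank.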

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[73]

E. Faerman, F. Borutta, J. Busch and M. Schubert.
Ada-LLD: Adaptive Node Similarity Using Multi-Scale Local Label Distributions.
WI-IAT 2020 - IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. Virtual, Dec 14-17, 2020. DOI GitHub

Abstract

In many applications, data is represented as a network connecting nodes of various types. While types might be known for some nodes in the network, the type of a newly added node is typically unknown. In this paper, we focus on predicting the types of these new nodes based on their connectivity to the already labeled nodes. To tackle this problem, we propose Adaptive Node Similarity Using Multi-Scale Local Label Distributions (Ada-LLD) which learns the dependency of a node’s class label from the distribution of class labels in this node’s local neighborhood. In contrast to previous approaches, our approach is able to learn how class labels correlate with labels in variously sized neighborhoods. We propose a neural network architecture that combines information from differently sized neighborhoods allowing for the detection of correlations on multiple scales. Our evaluations demonstrate that our method significantly improves prediction quality on real world data sets. In the spirit of reproducible research we make our code available.

MCML Authors

Evgeny Faerman

Dr.

* Former Member

Felix Borutta

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[72]

S. Obermeier, M. Berrendorf and P. Kröger.
Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation.
WI-IAT 2020 - IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. Virtual, Dec 14-17, 2020. DOI

Abstract

The reverse k-nearest neighbor (RkNN) query is an established query type with various applications ranging from identifying highly influential objects and incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit that the k-distances in real-world datasets often follow a power-law distribution, and bound them with straight lines in log-log space. In this work, we investigate this assumption and uncover that it is violated in regions of changing density, which we show are typical for real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. Thereby, we enable harnessing the power of the abundance of available Machine Learning models and profiting from their advancement. We propose a flexible approach which allows steering the performance-memory consumption trade-off, and in particular finding good solutions with a fixed memory budget, which is crucial in the context of edge computing. Moreover, we show how to obtain and improve guaranteed bounds essential to exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce the index memory consumption, and strongly reduce the candidate set size. We publish our code at https://github.com/sobermeier/nonlinear-kdist, and a detailed technical report at https://arxiv.org/abs/2011.01773.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Max Berrendorf

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[71]

C. Böhm and C. Plant.
Massively Parallel Graph Drawing and Representation Learning.
IEEE BigData 2020 - IEEE International Conference on Big Data. Virtual, Dec 10-13, 2020. DOI

Abstract

To fully exploit the performance potential of modern multi-core processors, machine learning and data mining algorithms for big data must be parallelized in multiple ways. Today’s CPUs consist of multiple cores, each following an independent thread of control, and each equipped with multiple arithmetic units which can perform the same operation on a vector of multiple data objects. Graph embedding, i.e. converting the vertices of a graph into numerical vectors, is a data mining task of high importance and is useful for graph drawing (low-dimensional vectors) and graph representation learning (high-dimensional vectors). In this paper, we propose MulticoreGEMPE (Graph Embedding by Minimizing the Predictive Entropy), an information-theoretic method which can generate low and high-dimensional vectors. MulticoreGEMPE applies MIMD (Multiple Instructions, Multiple Data, using OpenMP) and SIMD (Single Instruction, Multiple Data, using AVX-512) parallelism. We propose general ideas applicable in other graph-based algorithms like vectorized hashing and vectorized reduction. Our experimental evaluation demonstrates the superiority of our approach.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[70]

C. Böhm and C. Plant.
Massively Parallel Random Number Generation.
IEEE BigData 2020 - IEEE International Conference on Big Data. Virtual, Dec 10-13, 2020. DOI

Abstract

Random numbers are of high importance for many applications, e.g. simulation, optimization, and data mining. Unlike in information security, in these applications the demands on the quality of the random numbers are only moderate while the most important issue is the runtime efficiency. We propose in this paper new SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instructions, Multiple Data) parallel methods for Linear Congruential Generators (LCG), the most widespread class of fast pseudo-random number generators. In particular, we propose algorithms for the well-known 48-bit LCG used in the Java-class Random and in the method drand48() of C++ for processors using AVX (Advanced Vector eXtensions) and OpenMP. Our focus is on consistency with the original methods which facilitates debugging and enables the user to exactly reproduce previous non-parallel experiments in a SIMD and MIMD environment. Our experimental evaluation demonstrates the superiority of our algorithms.
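The consistency requirement mentioned above (parallel lanes reproducing the sequential stream exactly) can be illustrated with a small skip-ahead sketch. It uses the publicly documented constants of the 48-bit LCG behind `java.util.Random` and `drand48()`; the helper names `lcg_jump` and `lane_starts` are hypothetical, and this is a generic jump-ahead technique, not necessarily the algorithm proposed in the paper.

```python
# Constants of the 48-bit LCG used by java.util.Random and drand48().
A = 0x5DEECE66D
C = 0xB
MASK = (1 << 48) - 1  # arithmetic modulo 2**48


def lcg_step(state: int) -> int:
    """Advance the generator by one step."""
    return (A * state + C) & MASK


def lcg_jump(state: int, n: int) -> int:
    """Advance the generator by n steps in O(log n) multiplications.

    One step is the affine map x -> A*x + C. Composing affine maps yields
    another affine map, so n steps can be built by repeated squaring.
    """
    mul, add = 1, 0           # accumulated map, starts as the identity
    cur_mul, cur_add = A, C   # map for 2**i steps, starting with i = 0
    while n > 0:
        if n & 1:
            # apply the 2**i-step map on top of the accumulated map
            mul = (cur_mul * mul) & MASK
            add = (cur_mul * add + cur_add) & MASK
        # square the current map (cur_add must be updated before cur_mul)
        cur_add = (cur_mul * cur_add + cur_add) & MASK
        cur_mul = (cur_mul * cur_mul) & MASK
        n >>= 1
    return (mul * state + add) & MASK


def lane_starts(seed: int, lanes: int) -> list:
    """Starting state for each of `lanes` parallel lanes.

    Lane i starts i steps into the sequential stream; each lane can then
    advance by `lanes` steps at a time to interleave with the others.
    """
    return [lcg_jump(seed, i) for i in range(lanes)]
```

Because every lane starts at a state of the original stream, interleaving the lane outputs gives the same numbers a single sequential generator would produce.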

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[69]

M. Perdacher, C. Plant and C. Böhm.
Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition.
IEEE BigData 2020 - IEEE International Conference on Big Data. Virtual, Dec 10-13, 2020. DOI

Abstract

The LU decomposition is an essential element used in many linear algebra applications. Furthermore, it is used in LINPACK to benchmark the performance of modern multi-core processor environments. These processors offer a large memory hierarchy including multiple registers and various levels of cache. Registers or L1 data cache are small in size but also very fast. The L2 or L3 cache memory is usually shared among other cores and larger but slower. For the LU decomposition, the latency of fetching data from the main memory to the registers to perform a calculation also depends on the input matrix’s memory access pattern. Here, we look at the block factorization algorithm, where the LU decomposition performance depends on the performance of the matrix multiplication. In both cases, the LU decomposition and the matrix multiplication, such a matrix is traversed by three nested loops. In this paper, we propose to traverse such loops in an order defined by a space-filling curve. This traversal dramatically improves data locality and offers effective exploitation of the memory hierarchy. Besides the canonical (or line-by-line) access pattern, we demonstrate the traversal in Hilbert, Peano, and Morton order. Our extensive experiments show that the Morton order (or Z-order) and the inverse Morton order (or Z-order) have a better runtime performance compared to the others.
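As a minimal illustration of the Morton (Z-order) traversal mentioned above, the sketch below interleaves the bits of a row and column index and visits the cells of a small grid in that order. It covers only the two-dimensional index mapping; the function names are hypothetical and the code is not the paper's blocked LU implementation.

```python
def morton_encode(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of row and col into a single Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)      # column bits go to even positions
        code |= ((row >> i) & 1) << (2 * i + 1)  # row bits go to odd positions
    return code


def morton_traversal(n: int) -> list:
    """Return all (row, col) cells of an n x n grid sorted by Morton index."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: morton_encode(rc[0], rc[1]))


# Cells that are close in the traversal are also close in the grid:
# the first four cells of a 4 x 4 grid form its top-left 2 x 2 block.
order = morton_traversal(4)
```

Compared with a line-by-line scan, consecutive cells in this order tend to fall into the same small sub-block, which is the locality property the abstract refers to.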

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[68]

S. Geisler, D. Zügner and S. Günnemann.
Reliable Graph Neural Networks via Robust Aggregation.
NeurIPS 2020 - 34th Conference on Neural Information Processing Systems. Virtual, Dec 06-12, 2020. URL

Abstract

Perturbations targeting the graph structure have proven to be extremely effective in reducing the performance of Graph Neural Networks (GNNs), and traditional defenses such as adversarial training do not seem to be able to improve robustness. This work is motivated by the observation that adversarially injected edges effectively can be viewed as additional samples to a node’s neighborhood aggregation function, which results in distorted aggregations accumulating over the layers. Conventional GNN aggregation functions, such as a sum or mean, can be distorted arbitrarily by a single outlier. We propose a robust aggregation function motivated by the field of robust statistics. Our approach exhibits the largest possible breakdown point of 0.5, which means that the bias of the aggregation is bounded as long as the fraction of adversarial edges of a node is less than 50%. Our novel aggregation function, Soft Medoid, is a fully differentiable generalization of the Medoid and therefore lends itself well to end-to-end deep learning. Equipping a GNN with our aggregation improves the robustness with respect to structure perturbations on Cora ML by a factor of 3 (and 5.5 on Citeseer) and by a factor of 8 for low-degree nodes.
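The contrast between a mean and a medoid-style aggregation can be sketched in a few lines. The version below assumes a softmax-weighted relaxation of the medoid, where each point is weighted by the negative sum of its distances to all other points; the `temperature` parameter and the function names are assumptions for illustration, and the paper's exact formulation (for example, edge weighting) may differ.

```python
import math


def mean(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def soft_medoid(points, temperature=1.0):
    """Softmax-weighted average that concentrates on centrally located points."""
    # Summed Euclidean distance from each point to all points.
    totals = [sum(math.dist(p, q) for q in points) for p in points]
    # Softmax over negative distances, shifted by the minimum for stability.
    lowest = min(totals)
    weights = [math.exp(-(t - lowest) / temperature) for t in totals]
    norm = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) / norm
            for d in range(dim)]


# Four clean neighbors plus one adversarial outlier.
neighbors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1000.0, 1000.0]]
plain = mean(neighbors)          # dragged far away by the outlier
robust = soft_medoid(neighbors)  # stays within the clean points
```

A low temperature pushes the result toward the exact medoid, while a high temperature approaches the plain mean, so the parameter trades robustness against smoothness.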

MCML Authors

Daniel Zügner

Dr.

* Former Member

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[67]

O. Shchur, N. Gao, M. Biloš and S. Günnemann.
Fast and Flexible Temporal Point Processes with Triangular Maps.
NeurIPS 2020 - 34th Conference on Neural Information Processing Systems. Virtual, Dec 06-12, 2020. URL

Abstract

Temporal point process (TPP) models combined with recurrent neural networks provide a powerful framework for modeling continuous-time event data. While such models are flexible, they are inherently sequential and therefore cannot benefit from the parallelism of modern hardware. By exploiting the recent developments in the field of normalizing flows, we design TriTPP - a new class of non-recurrent TPP models, where both sampling and likelihood computation can be done in parallel. TriTPP matches the flexibility of RNN-based methods but permits several orders of magnitude faster sampling. This enables us to use the new model for variational inference in continuous-time discrete-state systems. We demonstrate the advantages of the proposed framework on synthetic and real-world datasets.

MCML Authors

Oleksandr Shchur

Dr.

* Former Member

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[66]

Y. Ma and V. Tresp.
A Variational Quantum Circuit Model for Knowledge Graph Embeddings.
QTNML @NeurIPS 2020 - 1st Workshop on Quantum Tensor Networks in Machine Learning at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. PDF

Abstract

Can quantum computing resources facilitate representation learning? In this work, we propose the first quantum Ansatz for statistical relational learning on knowledge graphs using parametric quantum circuits. We propose a variational quantum circuit for modeling knowledge graphs by introducing quantum representations of entities. In particular, latent representations of entities are encoded as coefficients of quantum states, while predicates are characterized by parametric gates acting on the quantum states. We show that quantum representations can be trained efficiently while preserving the quantum advantages. Simulations on classical machines with different datasets show that our proposed quantum circuit Ansatz and quantum representations can achieve comparable results to the state-of-the-art classical models, e.g., RESCAL, DISTMULT. Furthermore, after optimizing the models, the complexity of inductive inference on the knowledge graphs can be reduced with respect to the number of entities.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[65]

J. Busch, E. Faerman, M. Schubert and T. Seidl.
Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering.
SSL @NeurIPS 2020 - Workshop on Self-Supervised Learning - Theory and Practice at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. arXiv GitHub

Abstract

Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings: First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Second, the trained models are transductive and thus cannot be used to cluster out-of-sample data unseen during training. Instead of learning self-expression coefficients directly, we propose a novel metric learning approach that learns a subspace affinity function using a siamese neural network architecture. Consequently, our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets. In addition, we can formally show that our model is still able to exactly recover subspace clusters given an independence assumption. The siamese architecture in combination with a novel geometric classifier further makes our model inductive, allowing it to cluster out-of-sample data. Additionally, non-linear clusters can be detected by simply adding an auto-encoder module to the architecture. The whole model can then be trained end-to-end in a self-supervised manner. This work in progress reports promising preliminary results on the MNIST dataset. In the spirit of reproducible research, we make all code publicly available. In future work we plan to investigate several extensions of our model and to expand experimental evaluation.

MCML Authors

Evgeny Faerman

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[64]

M. Berrendorf and E. Faerman.
mberr/ea-active-learning: Zenodo. Version 1.0.1.
2020. DOI

Abstract

Code for paper ‘Active Learning for Entity Alignment’ (https://arxiv.org/abs/2001.08943)

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

[63]

M. Berrendorf, L. Wacker and E. Faerman.
mberr/ea-sota-comparison: Zenodo. Version v1.1.1.
2020. DOI

Abstract

Code for paper ‘A Critical Assessment of State-of-the-Art in Entity Alignment’.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

[62]

Y. Zhang, Y. Lu and T. Seidl.
KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection.
iiWAS 2020 - 22nd International Conference on Information Integration and Web-based Applications and Services. Chiang Mai, Thailand, Nov 30-Dec 02, 2020. DOI

Abstract

Density-based clustering algorithms are commonly adopted when arbitrarily shaped clusters exist. Usually, they do not need to know the number of clusters a priori, which is a big advantage. Conventional density-based approaches such as DBSCAN utilize two parameters to define density. Recently, novel density-based clustering algorithms have been proposed that reduce the problem complexity to a single parameter k by utilizing the concepts of k Nearest Neighbor (kNN) and Reverse k Nearest Neighbor (RkNN) to define density. However, those kNN-based approaches are either ineffective or inefficient. In this paper, we present a new clustering algorithm, KNNAC, which only requires computing the densities for a chosen subset of points due to the use of active core detection. We empirically show that, compared to other nearest neighbor based clustering approaches (e.g., RECORD, IS-DBSCAN, etc.), KNNAC can provide competitive performance while taking a fraction of the runtime.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[61]

D. Kazempour, A. Beer, P. Kröger and T. Seidl.
I fold you so! An internal evaluation measure for arbitrary oriented subspace clustering through piecewise-linear approximations of manifolds.
ICDMW 2020 - IEEE International Conference on Data Mining Workshops. Sorrento, Italy, Nov 17-20, 2020. DOI

Abstract

In this work we propose SRE, the first internal evaluation measure for arbitrarily oriented subspace clustering results. For this purpose we present a new perspective on the subspace clustering task: the goal we formalize is to compute a clustering which represents the original dataset by minimizing the reconstruction loss from the obtained subspaces, while at the same time minimizing the dimensionality as well as the number of clusters. A fundamental feature of our approach is that it is model-agnostic, i.e., it is independent of the characteristics of any specific subspace clustering method. It is scale invariant and mathematically founded. The experiments show that the SRE scoring better assesses the quality of an arbitrarily oriented subspace clustering compared to commonly used external evaluation measures.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[60]

D. Kazempour, P. Kröger and T. Seidl.
Towards an Internal Evaluation Measure for Arbitrarily Oriented Subspace Clustering.
ICDMW 2020 - IEEE International Conference on Data Mining Workshops. Sorrento, Italy, Nov 17-20, 2020. DOI

Abstract

In the setting of unsupervised machine learning, especially in clustering tasks, the evaluation of novel algorithms or the assessment of a clustering of novel data is challenging. While in the literature the evaluation of new methods is mostly performed on labelled data, there are cases where no labels are at our disposal. In other cases we may not want to trust the “ground truth” labels. In general there exists a spectrum of so-called internal evaluation measures in the literature. Each of the measures is mostly specialized towards a specific clustering model. The model of arbitrarily oriented subspace clusters is a more recent one. To the best of our knowledge, there currently exist no internal evaluation measures tailored to assessing this particular type of clustering. In this work we present the first internal quality measures for arbitrarily oriented subspace clusterings, namely the normalized projected energy (NPE) and the subspace compactness score (SCS). The results from the experiments show that especially NPE is capable of assessing clusterings by considering archetypical properties of arbitrarily oriented subspace clustering.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[59]

D. Kazempour, L. M. Yan, P. Kröger and T. Seidl.
You see a set of wagons - I see one train: Towards a unified view of local and global arbitrarily oriented subspace clusters.
ICDMW 2020 - IEEE International Conference on Data Mining Workshops. Sorrento, Italy, Nov 17-20, 2020. DOI

Abstract

Having data with a high number of features raises the need to detect clusters which exhibit high similarity within subspaces of features. These subspaces can be arbitrarily oriented, which gave rise to arbitrarily oriented subspace clustering (AOSC) algorithms. Among the diversity of such algorithms, some are specialized in detecting clusters which are global, across the entire dataset regardless of any distances, while others are tailored to detecting local clusters. Both of these views (local and global) are obtained separately by each of the algorithms. While, from an algebraic point of view, neither representation can claim to be the true one, it is vital that domain scientists are presented with both views, enabling them to inspect and decide which of the representations is closest to the domain-specific reality. We propose in this work a framework which is capable of detecting locally dense arbitrarily oriented subspace clusters which are embedded within a global one. We also introduce, for the first time, definitions of locally and globally arbitrarily oriented subspace clusters. Our experiments illustrate that this approach has no significant impact on the cluster quality or on the runtime performance, and enables scientists to no longer be limited exclusively to either the local or the global view.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[58]

V. Melnychuk, E. Faerman, I. Manakov and T. Seidl.
Matching the Clinical Reality: Accurate OCT-Based Diagnosis From Few Labels.
CIKMW @CIKM 2020 - Workshop at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020). Galway, Ireland, Oct 19-23, 2020. PDF GitHub

Abstract

Unlabeled data is often abundant in the clinic, making machine learning methods based on semi-supervised learning a good match for this setting. Despite this, they are currently receiving relatively little attention in medical image analysis literature. Instead, most practitioners and researchers focus on supervised or transfer learning approaches. The recently proposed MixMatch and FixMatch algorithms have demonstrated promising results in extracting useful representations while requiring very few labels. Motivated by these recent successes, we apply MixMatch and FixMatch in an ophthalmological diagnostic setting and investigate how they fare against standard transfer learning. We find that both algorithms outperform the transfer learning baseline on all fractions of labelled data. Furthermore, our experiments show that Mean Teacher, which is a component of both algorithms, is not needed for our classification problem, as disabling it leaves the outcome unchanged.

MCML Authors

Valentyn Melnychuk

Artificial Intelligence in Management

Evgeny Faerman

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[57]

Y. Ma, Z. Han and V. Tresp.
Learning with Temporal Knowledge Graphs.
CIKMW @CIKM 2020 - Workshop at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020). Galway, Ireland, Oct 19-23, 2020. Invited talk. PDF

Abstract

Temporal knowledge graphs, also known as episodic or time-dependent knowledge graphs, are large-scale event databases that describe temporally evolving multi-relational data. An episodic knowledge graph can be regarded as a sequence of semantic knowledge graphs annotated with timestamps. In this talk, we review recently developed learning-based algorithms for temporal knowledge graph completion and forecasting.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[56]

T. Seidl.
Keynote: Data Mining on Process Data.
ICPM 2020 - 2nd International Conference on Process Mining. Virtual, Oct 04-09, 2020. DOI

Abstract

Data Mining and Process Mining – is one just a variant of the other, or do worlds separate the two areas from each other? The notions sound so similar but the contents sometimes look different, so respective researchers may get confused in their mutual perception, be it authors or reviewers. The talk recalls commonalities like model-based supervised and unsupervised learning approaches, and it also sheds light on peculiarities in process data and process mining tasks as seen from a data mining perspective. When considering trace data from event log files as time series, as sequences, or as activity sets, quite different data mining techniques apply and may be extended and improved. A particular example is rare pattern mining, which fills a gap between frequent patterns and outlier detection. The task aims at identifying patterns that occur with low frequency but more often than single outliers. Structural deficiencies may cause malfunctions or other undesired behavior which get discarded as outliers in event logs, since they are observed only infrequently. Rare pattern mining may identify these situations, and recent approaches include clustering or ordering non-conformant traces. The talk concludes with some remarks on how to sell process mining papers to the data mining community, and vice versa, in order to improve mutual acceptance, and to increase synergies in the fields.

MCML Authors

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[55]

A. Maldonado, J. Sontheim, F. Richter and T. Seidl.
Performance Skyline: Inferring Process Performance Models from Interval Events.
SA4PM @ICPM 2020 - 1st International Workshop on Streaming Analytics for Process Mining in conjunction with the 2nd International Conference on Process Mining (ICPM 2020). Virtual, Oct 04-09, 2020. DOI

Abstract

Performance mining from event logs is a central task in managing and optimizing business processes. Established analysis techniques work with a single timestamp per event only. However, when available, time interval information enables proper analysis of the duration of individual activities as well as the overall execution runtime. Our novel approach, performance skyline, considers extended events, including start and end timestamps in log files, aiming at the discovery of events that are crucial to the overall duration of real process executions. As a first contribution, our method obtains a geometrical process representation for traces with interval events by using interval-based methods from sequence pattern mining and performance analysis. Secondly, we introduce the performance skyline, which discovers dominating events considering a given heuristic, in this case event duration. As a third contribution, we propose three techniques for statistical analysis of performance skylines and process trace sets, enabling more accurate process discovery, conformance checking, and process enhancement. Experiments on real event logs demonstrate that our contributions are highly suitable for detecting and analyzing the dominant events of a process.

MCML Authors

Andrea Maldonado

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[54]

A. Beer, D. Seeholzer, N. S. Schüler and T. Seidl.
Angle-Based Clustering.
SISAP 2020 - 13th International Conference on Similarity Search and Applications. Virtual, Sep 30-Oct 02, 2020. DOI

Abstract

The amount of data increases steadily, and yet most clustering algorithms perform complex computations for every single data point. Furthermore, Euclidean distance, which is used for most clustering algorithms, is often not the best choice for datasets with arbitrarily shaped clusters or those with high dimensionality. Based on ABOD, we introduce ABC, the first angle-based clustering method. The algorithm first identifies a small part of the data as border points of clusters based on the angle between their neighbors. Those few border points can, with some adjustments, be clustered with well-known clustering algorithms like hierarchical clustering with single linkage or DBSCAN. Residual points can quickly and easily be assigned to the cluster of their nearest border point, so the overall runtime is heavily reduced while the results improve or remain similar.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[53]

Y. Ma.
Learning with relational knowledge in the context of cognition, quantum computing, and causality.
Dissertation 2020. DOI

Abstract

This dissertation explores the use of knowledge graphs, including semantic and episodic graphs, for representing static and evolving human knowledge, and proposes methods for improving knowledge inference. It introduces two quantum machine learning algorithms aimed at speeding up knowledge graph inference, demonstrating significant speedups over classical methods. Additionally, the work addresses causal inference in relational data, specifically in social networks, and proposes causal estimators using graph neural networks to estimate superimposed effects and optimize treatment assignments for network welfare. (Shortened.)

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[52]

A. Beer, D. Kazempour, J. Busch, A. Tekles and T. Seidl.
Grace - Limiting the Number of Grid Cells for Clustering High-Dimensional Data.
LWDA 2020 - Conference on Lernen. Wissen. Daten. Analysen. Bonn, Germany, Sep 09-11, 2020. PDF

Abstract

Using grid-based clustering algorithms on high-dimensional data has the advantage of being able to summarize data points into cells, but usually produces an exponential number of grid cells. In this paper we introduce Grace (using Grid which is adaptive for clustering), a clustering algorithm which limits the number of cells produced depending on the number of points in the dataset. A non-equidistant grid is constructed based on the distribution of points in one-dimensional projections of the data. A density threshold is automatically deduced from the data and used to detect dense cells, which are later combined into clusters. The adaptive grid structure makes an efficient but still accurate clustering of multidimensional data possible. Experiments with synthetic as well as real-world data sets of various size and dimensionality confirm these properties.

MCML Authors

Anna Beer

Dr.

* Former Member

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[51]

C. Plant, S. Biedermann and C. Böhm.
Data Compression as a Comprehensive Framework for Graph Drawing and Representation Learning.
KDD 2020 - 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, California, USA, Aug 23-27, 2020. DOI

Abstract

Embedding a graph into feature space is a promising approach to understand its structure. Embedding into 2D or 3D space enables visualization; representation in higher-dimensional vector space (typically >100D) enables the application of data mining techniques. For the success of knowledge discovery it is essential that the distances between the embedded vertices truly reflect the structure of the graph. Our fundamental idea is to compress the adjacency matrix by predicting the existence of an edge from the Euclidean distance between the corresponding vertices in the embedding, and to use the achieved compression as a quality measure for the embedding. We call this quality measure Predictive Entropy (PE). PE uses a sigmoid function to define the probability which is monotonically decreasing with the Euclidean distance. We use this sigmoid probability to compress the adjacency matrix of the graph by an entropy coding. While PE could be used to assess the result of any graph drawing or representation learning method we particularly use it as objective function in our new method GEMPE (Graph Embedding by Minimizing the Predictive Entropy). We demonstrate in our experiments that GEMPE clearly outperforms comparison methods with respect to quality of the visual result, clustering and node-labeling accuracy on the discovered coordinates.
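The quality measure described in this abstract can be illustrated with a minimal sketch. It assumes an undirected graph given as a 0/1 adjacency matrix and counts the bits an ideal entropy coder would need when each edge is predicted with a sigmoid of the Euclidean distance. The function names and the shape parameters `mu` and `sigma` are hypothetical; the abstract does not specify how the sigmoid is parameterized.

```python
import math

def edge_probability(dist, mu=1.0, sigma=0.5):
    """Sigmoid that decreases monotonically with the Euclidean distance.
    mu and sigma are hypothetical shape parameters (an assumption)."""
    return 1.0 / (1.0 + math.exp((dist - mu) / sigma))

def predictive_entropy(adjacency, coords, mu=1.0, sigma=0.5):
    """Bits needed to code the upper triangle of the adjacency matrix
    when every entry is predicted from the embedding distance."""
    n = len(adjacency)
    bits = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_probability(math.dist(coords[i], coords[j]), mu, sigma)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log2(0)
            bits -= math.log2(p) if adjacency[i][j] else math.log2(1.0 - p)
    return bits

# Path graph 0 - 1 - 2 with two candidate 2D embeddings.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
good = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0)]  # neighbours close, non-neighbours far
bad = [(0.0, 0.0), (2.0, 0.0), (0.1, 0.0)]   # neighbours far, non-neighbours close

print(predictive_entropy(path, good), predictive_entropy(path, bad))
```

An embedding whose distances reflect the graph structure yields a lower coding cost, which is why such a cost can serve both as a quality measure and as an optimization objective.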

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[50]

D. Zügner and S. Günnemann.
Certifiable Robustness of Graph Convolutional Networks under Structure Perturbation.
KDD 2020 - 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, California, USA, Aug 23-27, 2020. DOI

Abstract

Recent works show that message-passing neural networks (MPNNs) can be fooled by adversarial attacks on both the node attributes and the graph structure. Since MPNNs are currently being rapidly adopted in real-world applications, it is thus crucial to improve their reliability and robustness. While there has been progress on robustness certification of MPNNs under perturbation of the node attributes, no existing method can handle structural perturbations. These perturbations are especially challenging because they alter the message passing scheme itself. In this work we close this gap and propose the first method to certify robustness of Graph Convolutional Networks (GCNs) under perturbations of the graph structure. We show how this problem can be expressed as a jointly constrained bilinear program - a challenging, yet well-studied class of problems - and propose a novel branch-and-bound algorithm to obtain lower bounds on the global optimum. These lower bounds are significantly tighter and can certify up to twice as many nodes compared to a standard linear relaxation.

MCML Authors

Daniel Zügner

Dr.

* Former Member

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[49]

C. Böhm.
Space-filling Curves for High-performance Data Mining.
Preprint (Aug. 2020). arXiv

Abstract

Space-filling curves like the Hilbert-curve, Peano-curve and Z-order map natural or real numbers from a two- or higher-dimensional space to a one-dimensional space preserving locality. They have numerous applications like search structures, computer graphics, numerical simulation, cryptography and can be used to make various algorithms cache-oblivious. In this paper, we describe some details of the Hilbert-curve. We define the Hilbert-curve in terms of a finite automaton of Mealy-type which determines from the two-dimensional coordinate space the Hilbert order value and vice versa in a logarithmic number of steps. And we define a context-free grammar to generate the whole curve in a time which is linear in the number of generated coordinate/order value pairs, i.e. a constant time per coordinate pair or order value. We also review two different strategies which enable the generation of curves without the usual restriction to square-like grids where the side-length is a power of two. Finally, we elaborate on a few applications, namely matrix multiplication, Cholesky decomposition, the Floyd-Warshall algorithm, k-Means clustering, and the similarity join.
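To make the mapping between coordinates and Hilbert order values concrete, here is a minimal sketch. It uses the widely known iterative bit-manipulation formulation, not the Mealy automaton or the grammar from the paper, and it assumes a square grid whose side length `n` is a power of two. Both directions take a number of steps logarithmic in `n`.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate or flip a quadrant so that the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Hilbert order value of the cell (x, y) in an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d

def d2xy(n, d):
    """Inverse mapping: cell (x, y) that has Hilbert order value d."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Walk the curve through an 8 x 8 grid in Hilbert order.
curve = [d2xy(8, d) for d in range(64)]
print(curve[:8])
```

Consecutive order values always land in neighbouring cells, which is the locality property that makes the curve useful for cache-friendly traversal orders.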

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[48]

A. Beer, V. Hartmann and T. Seidl.
Orderings of Data - more than a Tripping Hazard.
SSDBM 2020 - 32nd International Conference on Scientific and Statistical Database Management. Vienna, Austria, Jul 07-09, 2020. DOI

Abstract

As data processing techniques get more and more sophisticated every day, many of us researchers often get lost in the details and subtleties of the algorithms we are developing and far too easily seem to forget to look also at the very first steps of every algorithm: the input of the data. Since there are plenty of library functions for this task, we indeed do not have to think about this part of the pipeline anymore. But maybe we should. All data is stored and loaded into a program in some order. In this vision paper we study how ignoring this order can (1) lead to performance issues and (2) make research results unreproducible. We furthermore examine desirable properties of a data ordering and why current approaches are often not suited to tackle the two mentioned problems.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[47]

D. Mautz, C. Plant and C. Böhm.
DeepECT: The Deep Embedded Cluster Tree.
Data Science and Engineering 5 (Jul. 2020). DOI

Abstract

The idea of combining the high representational power of deep learning techniques with clustering methods has gained much attention in recent years. Optimizing a clustering objective and the dataset representation simultaneously has been shown to be advantageous over separately optimizing them. So far, however, all proposed methods have been using a flat clustering strategy, with the actual number of clusters known a priori. In this paper, we propose the Deep Embedded Cluster Tree (DeepECT), the first divisive hierarchical embedded clustering method. The cluster tree does not need to know the actual number of clusters during optimization. Instead, the level of detail to be analyzed can be chosen afterward and for each sub-tree separately. An optional data-augmentation-based extension allows DeepECT to ignore prior-known invariances of the dataset, such as affine transformations in image data. We evaluate and show the advantages of DeepECT in extensive experiments.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[46]

S. Friedl, S. Schmoll, F. Borutta and M. Schubert.
SMART-Env.
MDM 2020 - 21st IEEE International Conference on Mobile Data Management. Versailles, France, Jun 30-Jul 03, 2020. DOI

Abstract

In this work, we present SMART-Env (Spatial Multi-Agent Resource search Training Environment), a spatio-temporal multi-agent environment for evaluating and training different kinds of agents on resource search tasks. We explain how to simulate arbitrary spawning distributions on real-world street graphs, compare agents’ behavior and evaluate their performance over time. Finally, we demonstrate SMART-Env in a taxi dispatching scenario with three different kinds of agents.

MCML Authors

Sabrina Friedl

Dr.

* Former Member

Felix Borutta

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[45]

M. Ali, C. T. Hoyt, L. Vermue, M. Galkin and M. Berrendorf.
pykeen/benchmarking. Version v1.0.
2020. DOI

Abstract

pykeen/benchmarking: Accompanying arXiv announcement (v1.0). Zenodo. Mehdi Ali, Charles Tapley Hoyt, Laurent Vermue, Michael Galkin, & Max Berrendorf. (2020).

MCML Authors

Max Berrendorf

Dr.

* Former Member

[44]

D. Mautz, W. Ye, C. Plant and C. Böhm.
Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans.
ACM Transactions on Knowledge Discovery from Data 14.5 (Jun. 2020). DOI

Abstract

A huge object collection in high-dimensional space can often be clustered in more than one way, for instance, objects could be clustered by their shape or alternatively by their color. Each grouping represents a different view of the dataset. The new research field of non-redundant clustering addresses this class of problems. In this article, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and optionally a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically and in extensive experiments. Further, we propose an extension of Nr-Kmeans that harnesses Hartigan’s dip test to identify the number of clusters for each subspace automatically.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[43]

F. Borutta, D. Kazempour, F. Marty, P. Kröger and T. Seidl.
Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform.
PAKDD 2020 - 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Singapore, May 11-14, 2020. DOI

Abstract

When facing high-dimensional data streams, clustering algorithms quickly reach the boundaries of their usefulness as most of these methods are not designed to deal with the curse of dimensionality. Due to inherent sparsity in high-dimensional data, distances between objects tend to become meaningless since the distances between any two objects measured in the full dimensional space tend to become the same for all pairs of objects. In this work, we present a novel oriented subspace clustering algorithm that is able to deal with such issues and detects arbitrarily oriented subspace clusters in high-dimensional data streams. Data streams generally pose the challenge that the data cannot be stored entirely and hence there is a general demand for suitable data handling strategies for clustering algorithms such that the data can be processed within a single scan. We therefore propose the CASHSTREAM algorithm that unites state-of-the-art stream processing techniques and additionally relies on the Hough transform to detect arbitrarily oriented subspace clusters. Our experiments compare CASHSTREAM to its static counterpart and show that the amount of consumed memory is significantly decreased while there is no loss in terms of runtime.

MCML Authors

Felix Borutta

Dr.

* Former Member

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[42]

J. Klicpera, J. Groß and S. Günnemann.
Directional Message Passing for Molecular Graphs.
ICLR 2020 - 8th International Conference on Learning Representations. Virtual, Apr 26-May 01, 2020. URL

Abstract

Graph neural networks have recently achieved great successes in predicting quantum mechanical properties of molecules. These models represent a molecule as a graph using only the distance between atoms (nodes). They do not, however, consider the spatial direction from one atom to another, despite directional information playing a central role in empirical potentials for molecules, e.g. in angular potentials. To alleviate this limitation we propose directional message passing, in which we embed the messages passed between atoms instead of the atoms themselves. Each message is associated with a direction in coordinate space. These directional message embeddings are rotationally equivariant since the associated directions rotate with the molecule. We propose a message passing scheme analogous to belief propagation, which uses the directional information by transforming messages based on the angle between them. Additionally, we use spherical Bessel functions and spherical harmonics to construct theoretically well-founded, orthogonal representations that achieve better performance than the currently prevalent Gaussian radial basis representations while using fewer than 1/4 of the parameters. We leverage these innovations to construct the directional message passing neural network (DimeNet). DimeNet outperforms previous GNNs on average by 76% on MD17 and by 31% on QM9. Our implementation is available online.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[41]

O. Shchur, M. Biloš and S. Günnemann.
Intensity-Free Learning of Temporal Point Processes.
ICLR 2020 - 8th International Conference on Learning Representations. Virtual, Apr 26-May 01, 2020. Spotlight Presentation. URL

Abstract

Temporal point processes are the dominant paradigm for modeling sequences of events happening at irregular intervals. The standard way of learning in such models is by estimating the conditional intensity function. However, parameterizing the intensity function usually incurs several trade-offs. We show how to overcome the limitations of intensity-based approaches by directly modeling the conditional distribution of inter-event times. We draw on the literature on normalizing flows to design models that are flexible and efficient. We additionally propose a simple mixture model that matches the flexibility of flow-based models, but also permits sampling and computing moments in closed form. The proposed models achieve state-of-the-art performance in standard prediction tasks and are suitable for novel applications, such as learning sequence embeddings and imputing missing data.
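The appeal of a mixture model for inter-event times, as described in this abstract, is that density, mean and sampling are all available in closed form. The sketch below assumes log-normal mixture components with fixed parameters. The abstract does not name the component family, and in a full model the parameters would be conditioned on the event history; all function names here are hypothetical.

```python
import math
import random

def log_density(tau, weights, mus, sigmas):
    """Log of the mixture density at an inter-event time tau > 0."""
    total = 0.0
    for w, m, s in zip(weights, mus, sigmas):
        z = (math.log(tau) - m) / s
        total += w * math.exp(-0.5 * z * z) / (tau * s * math.sqrt(2.0 * math.pi))
    return math.log(total)

def mixture_mean(weights, mus, sigmas):
    """Expected inter-event time in closed form: sum_k w_k * exp(mu_k + sigma_k^2 / 2)."""
    return sum(w * math.exp(m + 0.5 * s * s) for w, m, s in zip(weights, mus, sigmas))

def sample_inter_event_time(weights, mus, sigmas, rng):
    """Pick a component by its weight, then exponentiate a Gaussian draw."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return math.exp(rng.gauss(mus[k], sigmas[k]))

# Illustrative parameters (assumed values, not taken from the paper).
w, m, s = [0.3, 0.7], [0.0, 1.0], [0.5, 0.25]
rng = random.Random(0)
print(mixture_mean(w, m, s), sample_inter_event_time(w, m, s, rng))
```

Because no intensity function has to be integrated, the log-likelihood of a sequence is simply the sum of `log_density` values over its inter-event times.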

MCML Authors

Oleksandr Shchur

Dr.

* Former Member

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[40]

M. Berrendorf, E. Faerman and V. Tresp.
Active Learning for Entity Alignment.
DL4G @WWW 2020 - 5th International Workshop on Deep Learning for Graphs at the International World Wide Web Conference (WWW 2020). Taipei, Taiwan, Apr 21, 2020. arXiv

Abstract

In this work, we propose a novel framework for the labeling of entity alignments in knowledge graph datasets. Different strategies to select informative instances for the human labeler build the core of our framework. We illustrate how the labeling of entity alignments is different from assigning class labels to single instances and how these differences affect the labeling efficiency. Based on these considerations we propose and evaluate different active and passive learning strategies. One of our main findings is that passive learning approaches, which can be efficiently precomputed and deployed more easily, achieve performance comparable to the active learning strategies.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[39]

M. Berrendorf, E. Faerman, L. Vermue and V. Tresp.
Interpretable and Fair Comparison of Link Prediction or Entity Alignment Methods with Adjusted Mean Rank (Extended Abstract).
DL4G @WWW 2020 - 5th International Workshop on Deep Learning for Graphs at the International World Wide Web Conference (WWW 2020). Taipei, Taiwan, Apr 21, 2020. Full paper at WI-AT 2020. DOI

Abstract

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[38]

M. C. Altinigneli, L. Miklautz, C. Böhm and C. Plant.
Hierarchical Quick Shift Guided Recurrent Clustering.
ICDE 2020 - 36th IEEE International Conference on Data Engineering. Dallas, TX, USA, Apr 20-24, 2020. DOI

Abstract

We propose a novel density-based mode-seeking Hierarchical Quick Shift clustering algorithm with an optional Recurrent Neural Network (RNN) to jointly learn the cluster assignments for every sample and the underlying dynamics of the mode-seeking clustering process. As a mode-seeking clustering algorithm, Hierarchical Quick Shift constrains data samples to stay on similar trajectories. All data samples converging to the same local mode are assigned to a common cluster. The RNN enables us to learn quasi-temporal structures during the mode-seeking clustering process. It supports variable density clusters with arbitrary shapes without requiring the expected number of clusters a priori. We evaluate our method in extensive experiments to show the advantages over other density-based clustering algorithms.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[37]

M. Berrendorf, E. Faerman, V. Melnychuk, V. Tresp and T. Seidl.
Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned.
ECIR 2020 - 42nd European Conference on Information Retrieval. Virtual, Apr 14-17, 2020. DOI GitHub

Abstract

In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper and after a thorough audit of the code provided by authors, we concluded that their implementation is different from the architecture described in the paper. In addition, several tricks are required to make the model work and some of them are not very intuitive. We provide an extensive ablation study to quantify the effects these tricks and changes of architecture have on final performance. Furthermore, we examine current evaluation approaches and systematize available benchmark datasets. We believe that people interested in KG matching might profit from our work, as well as novices entering the field.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Valentyn Melnychuk

Artificial Intelligence in Management

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[36]

F. Borutta.
Unsupervised learning on social data.
Dissertation 2020. DOI

Abstract

This thesis addresses several challenges in social data analytics, focusing on methods for clustering, learning from network data, and analyzing dynamic social data. It introduces novel algorithms for correlation clustering on streaming data, hierarchical clustering for social maps, and user identification based on spatio-temporal mobility patterns. Additionally, the thesis presents various node embedding techniques for learning representations from network topology and proposes a graph neural network model for matching nodes across overlapping graphs. (Shortened.)

MCML Authors

Felix Borutta

Dr.

* Former Member

[35]

L. Miklautz, D. Mautz, M. C. Altinigneli, C. Böhm and C. Plant.
Deep embedded non-redundant clustering.
AAAI 2020 - 34th Conference on Artificial Intelligence. New York City, New York, USA, Feb 07-12, 2020. DOI

Abstract

Complex data types like images can be clustered in multiple valid ways. Non-redundant clustering aims at extracting those meaningful groupings by discouraging redundancy between clusterings. Unfortunately, clustering images in pixel space directly has been shown to work unsatisfactorily. This has increased interest in combining the high representational power of deep learning with clustering, termed deep clustering. Algorithms of this type combine the non-linear embedding of an autoencoder with a clustering objective and optimize both simultaneously. None of these algorithms try to find multiple non-redundant clusterings. In this paper, we propose the novel Embedded Non-Redundant Clustering algorithm (ENRC). It is the first algorithm that combines neural-network-based representation learning with non-redundant clustering. ENRC can find multiple highly non-redundant clusterings of different dimensionalities within a data set. This is achieved by (softly) assigning each dimension of the embedded space to the different clusterings. For instance, in image data sets it can group the objects by color, material and shape, without the need for explicit feature engineering. We show the viability of ENRC in extensive experiments and empirically demonstrate the advantage of combining non-linear representation learning with non-redundant clustering.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[34]

D. Davletshina, V. Melnychuk, V. Tran, H. Singla, M. Berrendorf, E. Faerman, M. Fromm and M. Schubert.
Unsupervised Anomaly Detection for X-Ray Images.
Preprint (Jan. 2020). arXiv GitHub

Abstract

Obtaining labels for medical (image) data requires scarce and expensive experts. Moreover, due to ambiguous symptoms, single images rarely suffice to correctly diagnose a medical condition. Instead, it is often necessary to take additional background information such as the patient’s medical history or test results into account. Hence, instead of focusing on uninterpretable black-box systems delivering an uncertain final diagnosis in an end-to-end fashion, we investigate how unsupervised methods trained on images without anomalies can be used to assist doctors in evaluating X-ray images of hands. Our method increases the efficiency of making a diagnosis and reduces the risk of missing important regions. Therefore, we adopt state-of-the-art approaches for unsupervised learning to detect anomalies and show how the outputs of these methods can be explained. To reduce the effect of noise, which often can be mistaken for an anomaly, we introduce a powerful preprocessing pipeline. We provide an extensive evaluation of different approaches and demonstrate empirically that even without labels it is possible to achieve satisfying results on a real-world dataset of X-ray images of hands. We also evaluate the importance of preprocessing and one of our main findings is that without it, most of our approaches perform no better than random.

MCML Authors

Valentyn Melnychuk

Artificial Intelligence in Management

Viet Tran

Biomedical Statistics and Data Science

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Michael Fromm

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[33]

M. Biloš, B. Charpentier and S. Günnemann.
Uncertainty on Asynchronous Time Event Prediction.
NeurIPS 2019 - 33rd Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 08-14, 2019. URL

Abstract

Asynchronous event sequences are the basis of many applications throughout different industries. In this work, we tackle the task of predicting the next event (given a history), and how this prediction changes with the passage of time. Since at some time points (e.g. predictions far into the future) we might not be able to predict anything with confidence, capturing uncertainty in the predictions is crucial. We present two new architectures, WGP-LN and FD-Dir, modelling the evolution of the distribution on the probability simplex with time-dependent logistic normal and Dirichlet distributions. In both cases, the combination of RNNs with either Gaussian process or function decomposition allows expressing rich temporal evolution of the distribution parameters, and naturally captures uncertainty. Experiments on class prediction, time prediction and anomaly detection demonstrate the high performance of our models on various datasets compared to other approaches.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[32]

A. Bojchevski and S. Günnemann.
Certifiable Robustness to Graph Perturbations.
NeurIPS 2019 - 33rd Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 08-14, 2019. URL

Abstract

Despite the exploding interest in graph neural networks there has been little effort to verify and improve their robustness. This is even more alarming given recent findings showing that they are extremely vulnerable to adversarial attacks on both the graph structure and the node attributes. We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation. By exploiting connections to PageRank and Markov decision processes our certificates can be efficiently (and under many threat models exactly) computed. Furthermore, we investigate robust training procedures that increase the number of certifiably robust nodes while maintaining or improving the clean predictive accuracy.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[31]

J. Gasteiger, S. Weißenberger and S. Günnemann.
Diffusion Improves Graph Learning.
NeurIPS 2019 - 33rd Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 08-14, 2019. URL

Abstract

Graph convolution is the core of most Graph Neural Networks (GNNs) and usually approximated by message passing between direct (one-hop) neighbors. In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC). GDC leverages generalized graph diffusion, examples of which are the heat kernel and personalized PageRank. It alleviates the problem of noisy and often arbitrarily defined edges in real graphs. We show that GDC is closely related to spectral-based models and thus combines the strengths of both spatial (message passing) and spectral methods. We demonstrate that replacing message passing with graph diffusion convolution consistently leads to significant performance improvements across a wide range of models on both supervised and unsupervised tasks and a variety of datasets. Furthermore, GDC is not limited to GNNs but can trivially be combined with any graph-based model or algorithm (e.g. spectral clustering) without requiring any changes to the latter or affecting its computational complexity. Our implementation is available online.
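Generalized graph diffusion, as named in this abstract, can be written as a weighted sum of powers of a transition matrix, S = sum_k theta_k T^k. The sketch below truncates that series after a fixed number of terms and uses the row-normalized random-walk matrix T = D^-1 A; both choices are assumptions made for illustration, as is the requirement that no node is isolated. The coefficients theta_k = alpha (1 - alpha)^k give personalized PageRank and theta_k = e^-t t^k / k! give the heat kernel.

```python
import math

def matmul(a, b):
    """Dense matrix product for small square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(adj):
    """Row-normalised random-walk matrix T = D^-1 A (assumes no isolated nodes)."""
    return [[v / sum(row) for v in row] for row in adj]

def diffusion(adj, theta):
    """Truncated generalised graph diffusion S = sum_k theta[k] * T^k."""
    n = len(adj)
    t = transition_matrix(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    s = [[0.0] * n for _ in range(n)]
    for coeff in theta:
        for i in range(n):
            for j in range(n):
                s[i][j] += coeff * power[i][j]
        power = matmul(power, t)
    return s

def ppr_coefficients(alpha, num_terms):
    """Personalised PageRank weights theta_k = alpha * (1 - alpha)^k."""
    return [alpha * (1.0 - alpha) ** k for k in range(num_terms)]

def heat_coefficients(t, num_terms):
    """Heat kernel weights theta_k = exp(-t) * t^k / k!."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(num_terms)]

# Path graph 0 - 1 - 2: nodes 0 and 2 are not directly connected.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
theta = ppr_coefficients(0.15, 50)
S = diffusion(adj, theta)
print(S[0])
```

The resulting matrix assigns a positive weight to the two-hop pair (0, 2), which illustrates how diffusion reaches beyond direct neighbors while the weights still decay with distance in the graph.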

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[30]

E. Faerman, O. Voggenreiter, F. Borutta, T. Emrich, M. Berrendorf and M. Schubert.
Graph Alignment Networks with Node Matching Scores.
NeurIPS 2019 - Workshop on Graph Representation Learning at the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 08-14, 2019. PDF

Abstract

In this work we address the problem of graph node alignment using the example of Map Fusion (MF). Given two partly overlapping road networks, the goal is to match nodes that represent the same locations in both networks. For this task we propose a new model based on Graph Neural Networks (GNN). Existing GNN approaches, which have recently been successfully applied to various tasks for graph-based data, show poor performance for the MF task. We hypothesize that this is mainly caused by graph regions from the non-overlapping areas, as information from those areas negatively affects the learned node representations. Therefore, our model has an additional inductive bias and learns to ignore effects of nodes that do not have a matching in the other graph. Our new model can easily be extended to other graph alignment problems, e.g., for calculating graph similarities, or for the alignment of entities in knowledge graphs, as well.

MCML Authors

Evgeny Faerman

Dr.

* Former Member

Felix Borutta

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[29]

D. Mautz, C. Plant and C. Böhm.
Deep Embedded Cluster Tree.
ICDM 2019 - 19th IEEE International Conference on Data Mining. Beijing, China, Nov 08-11, 2019. DOI

Abstract

The idea of combining the high representational power of deep learning techniques with clustering methods has gained much interest in recent years. Optimizing representation and clustering simultaneously has been shown to have an advantage over optimizing them separately. However, so far all proposed methods have been using a flat clustering strategy, with the true number of clusters known a priori. In this paper, we propose the Deep Embedded Cluster Tree (DeepECT), the first divisive hierarchical embedded clustering method. The cluster tree does not need to know the true number of clusters during optimization. Instead, the level of detail to be analyzed can be chosen afterward and for each sub-tree separately. An optional data-augmentation-based extension allows DeepECT to ignore prior-known invariances of the dataset, such as affine transformations in image data. We evaluate and show the advantages of DeepECT in extensive experiments.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[28]

E. Faerman, M. Rogalla, N. Strauß, A. Krüger, B. Blümel, M. Berrendorf, M. Fromm and M. Schubert.
Spatial Interpolation with Message Passing Framework.
ICDMW 2019 - IEEE International Conference on Data Mining Workshops. Beijing, China, Nov 08-11, 2019. DOI

Abstract

Spatial interpolation is the task of predicting a measurement for any location in a given geographical region. To train a prediction model, we assume that point-wise measurements are available for various locations in the region. In addition, it is often beneficial to consider historic measurements for these locations when training an interpolation model. Typical use cases are the interpolation of weather, pollution or traffic information. In this paper, we introduce a new type of model with strong relational inductive bias based on Message Passing Networks. In addition, we extend our new model to take geomorphological characteristics into account to improve the prediction quality. We provide an extensive evaluation based on a large real-world weather dataset and compare our new approach with classical statistical interpolation techniques and Neural Networks without inductive bias.

MCML Authors

Evgeny Faerman

Dr.

* Former Member

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Max Berrendorf

Dr.

* Former Member

Michael Fromm

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[27]

M. Fromm, M. Berrendorf, E. Faerman, Y. Chen, B. Schüss and M. Schubert.
XD-STOD: Cross-Domain Superresolution for Tiny Object Detection.
ICDMW 2019 - IEEE International Conference on Data Mining Workshops. Beijing, China, Nov 08-11, 2019. DOI

Abstract

Monitoring the restoration of natural habitats after human intervention is an important task in the field of remote sensing. Currently, this requires extensive field studies entailing considerable costs. Unmanned aerial vehicles (UAVs, a.k.a. drones) have the potential to reduce these costs, but generate immense amounts of data which have to be evaluated automatically with special techniques. Especially the automated detection of tree seedlings poses a big challenge, as their size and shape vary greatly across images. In addition, there is a tradeoff between different flying altitudes. Given the same camera equipment, a lower flying altitude yields higher-resolution images, and thus achieving high detection rates is easier. However, the imagery will only cover a limited area. On the other hand, flying at higher altitudes allows for covering larger areas, but makes seedling detection more challenging due to the coarser images. In this paper we investigate the usability of super resolution (SR) networks for the case that we can collect a large amount of coarse imagery at higher flying altitudes, but only a small amount of high-resolution images from lower flying altitudes. We use a collection of high-resolution images taken by a drone at 5m altitude. After training the SR models on these data, we evaluate their applicability to low quality images taken at 30m altitude (in-domain). In addition, we investigate and compare whether approaches trained on highly diverse large data sets can be transferred to these data (cross-domain). We also evaluate the usability of the SR results based on their influence on the detection rate of different object detectors. We found that the features acquired from training on standard SR data sets are transferable to the drone footage. Furthermore, we demonstrate that the detection rate of common object detectors can be improved by SR techniques using both settings, in-domain and cross-domain.

MCML Authors

Michael Fromm

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[26]

F. Lüer, D. Mautz and C. Böhm.
Anomaly Detection in Time Series using Generative Adversarial Networks.
ICDMW 2019 - IEEE International Conference on Data Mining Workshops. Beijing, China, Nov 08-11, 2019. DOI

Abstract

Generative Adversarial Networks (GANs) have been applied to an increasing number of tasks, especially related to image data. A comparably recent advance was their application to the domain of anomaly detection in images and, even more recently, to spatiotemporal data. In this work, a recurrent GAN (RGAN) is applied to cardiovascular data from the MIT-BIH dataset to learn the natural variety of normal sinus rhythms in a healthy individual. The generator is used to reconstruct samples using differently parameterized levels of similarity and thresholds. We find that solely using the generator already allows a surprisingly good anomaly detection performance. Furthermore, we discuss adding the discriminator, which might significantly improve the performance. Future work also includes only using the discriminator, minimizing the time required for inference, which is important for streaming data.

MCML Authors

Christian Böhm

Prof. Dr.

* Former Principal Investigator

[25]

F. Borutta, S. Schmoll and S. Friedl.
Optimizing the Spatio-Temporal Resource Search Problem with Reinforcement Learning.
ACM SIGSPATIAL 2019 - 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Chicago, IL, USA, Nov 05-08, 2019. DOI

Abstract

Collecting spatio-temporal resources is an important goal in many real-world use cases such as finding customers for taxicabs. In this paper, we tackle the resource search problem posed by the GIS Cup 2019 where the objective is to minimize the average search time of taxicabs looking for customers. The main challenge is that the taxicabs may not communicate with each other and the only observation they have is the current time and position. Inspired by radial transit route structures in urban environments, our approach relies on round trips that are used as action space for a downstream reinforcement learning procedure. Our source code is publicly available at https://github.com/Fe18/TripBanditAgent.

MCML Authors

Felix Borutta

Dr.

* Former Member

Sabrina Friedl

Dr.

* Former Member

[24]

F. Borutta, J. Busch, E. Faerman, A. Klink and M. Schubert.
Structural Graph Representations based on Multiscale Local Network Topologies.
WI 2019 - IEEE/WIC/ACM International Conference on Web Intelligence. Thessaloniki, Greece, Oct 14-17, 2019. DOI

Abstract

In many applications, it is required to analyze a graph merely based on its topology. In these cases, nodes can only be distinguished based on their structural neighborhoods and it is common that nodes having the same functionality or role yield similar neighborhood structures. In this work, we investigate two problems: (1) how to create structural node embeddings which describe a node’s role and (2) how important the nodes’ roles are for characterizing entire graphs. To describe the role of a node, we explore the structure within the local neighborhood (or multiple local neighborhoods of various extents) of the node in the vertex domain, compute the visiting probability distribution of nodes in the local neighborhoods and summarize each distribution to a single number by computing its entropy. Furthermore, we argue that the roles of nodes are important to characterize the entire graph. Therefore, we propose to aggregate the role representations to describe whole graphs for graph classification tasks. Our experiments show that our new role descriptors outperform state-of-the-art structural node representations that are usually more expensive to compute. Additionally, we achieve promising results compared to advanced state-of-the-art approaches for graph classification on various benchmark datasets, often outperforming these approaches.
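The role descriptor outlined in the abstract (a visiting probability distribution per local neighborhood, summarized by its entropy) can be sketched as follows. This is a minimal sketch under stated assumptions: the visiting distribution is taken to be that of a plain t-step random walk, the neighborhood extents are the walk lengths 1, 2 and 3, and the function names are hypothetical. The paper's exact construction may differ.

```python
import math


def walk_distribution(adj_list, start, steps):
    """Probability of a simple random walk being at each node after `steps` steps."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, prob in dist.items():
            neighbours = adj_list[node]
            # Mass on a node without neighbours is dropped in this sketch.
            for nb in neighbours:
                nxt[nb] = nxt.get(nb, 0.0) + prob / len(neighbours)
        dist = nxt
    return dist


def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def role_descriptor(adj_list, node, scales=(1, 2, 3)):
    """One entropy value per neighbourhood extent (assumed here: walk length)."""
    return [entropy(walk_distribution(adj_list, node, t)) for t in scales]


# Star graph: the hub and the leaves play different structural roles.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
hub = role_descriptor(star, 0)
leaf = role_descriptor(star, 1)
```

In this toy graph all leaves obtain the same descriptor while the hub obtains a different one, which is the kind of role distinction the abstract refers to. Aggregating such descriptors over all nodes (for instance as a histogram) would be one way to describe a whole graph.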

MCML Authors

Felix Borutta

Dr.

* Former Member

Evgeny Faerman

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[23]

M. Fromm, E. Faerman and T. Seidl.
TACAM: Topic And Context Aware Argument Mining.
WI 2019 - IEEE/WIC/ACM International Conference on Web Intelligence. Thessaloniki, Greece, Oct 14-17, 2019. DOI

Abstract

In this work we address the problem of argument search. The purpose of argument search is the distillation of pro and contra arguments for requested topics from large text corpora. In previous works, the usual approach is to use a standard search engine to extract text parts which are relevant to the given topic and subsequently use an argument recognition algorithm to select arguments from them. The main challenge in the argument recognition task, which is also known as argument mining, is that often sentences containing arguments are structurally similar to purely informative sentences without any stance about the topic. In fact, they only differ semantically. Most approaches use topic or search term information only for the first search step and therefore assume that arguments can be classified independently of a topic. We argue that topic information is crucial for argument mining, since the topic defines the semantic context of an argument. Precisely, we propose different models for the classification of arguments, which take information about a topic of an argument into account. Moreover, to enrich the context of a topic and to let models understand the context of the potential argument better, we integrate information from different external sources such as Knowledge Graphs or pre-trained NLP models. Our evaluation shows that considering topic information, especially in connection with external information, provides a significant performance boost for the argument mining task.

MCML Authors

Michael Fromm

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[22]

A. Beer, J. Lauterbach and T. Seidl.
MORe++: k-Means Based Outlier Removal on High-Dimensional Data.
SISAP 2019 - 12th International Conference on Similarity Search and Applications. Newark, NJ, USA, Oct 02-04, 2019. DOI

Abstract

MORe++ is a k-Means based Outlier Removal method working on high dimensional data. It is simple, efficient and scalable. The core idea is to find local outliers by examining the points of different k-Means clusters separately. In this way, one-dimensional projections of the data become meaningful and make it easy to find one-dimensional outliers, which would otherwise be hidden by points of other clusters. MORe++ does not need any input parameters other than the number of clusters k used for k-Means, and delivers an intuitively accessible degree of outlierness. In extensive experiments it performed well compared to k-Means-- and ORC.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[21]

M. Berrendorf, F. Borutta and P. Kröger.
k-Distance Approximation for Memory-Efficient RkNN Retrieval.
SISAP 2019 - 12th International Conference on Similarity Search and Applications. Newark, NJ, USA, Oct 02-04, 2019. DOI

Abstract

For a given query object, Reverse k-Nearest Neighbor queries retrieve those objects that have the query object among their k-nearest neighbors. However, computing the k-nearest neighbor sets for all points in a database is expensive in terms of computational costs. Therefore, specific index structures have been invented to apply pruning heuristics which aim at reducing the search space. At present, the state-of-the-art index structure for enabling fast RkNN query processing in general metric spaces is the MRkNNCoP-Tree, which uses linear functions to approximate lower and upper bounds on the k-distances to prune the search space. Storing those linear functions results in additional storage costs in O(n), which might be infeasible in situations where storage space is limited, e.g., on mobile devices. In this work, we present a novel index based on the MRkNNCoP-Tree as well as recent developments in the field of neural indexing. By learning a single neural network model that approximates the k-nearest neighbor distance bounds for all points in a database, the storage complexity of the proposed index structure is reduced to O(1) while the index is still able to guarantee exact query results. As shown in our experimental evaluations on synthetic and real-world data sets, our approach can significantly reduce the required storage space in exchange for some growth of the refinement sets when relying on exact query processing.
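To make the query type concrete, the following brute-force sketch implements the Reverse k-Nearest Neighbor definition from the abstract directly. It is the expensive baseline that index structures try to avoid, not the index proposed in the paper; Euclidean distance and the function names are assumptions made for illustration.

```python
import math


def knn(points, idx, k):
    """Indices of the k nearest neighbours of points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def reverse_knn(points, query_idx, k):
    """All points that have points[query_idx] among their k nearest neighbours."""
    return [
        i
        for i in range(len(points))
        if i != query_idx and query_idx in knn(points, i, k)
    ]


# One-dimensional toy data without distance ties.
data = [(0.0,), (1.0,), (3.0,), (10.0,)]
print(reverse_knn(data, query_idx=1, k=1))  # points whose nearest neighbour is (1.0,)
print(reverse_knn(data, query_idx=3, k=1))  # the far point is nobody's nearest neighbour
```

The example also shows that the relation is not symmetric: the point at 10.0 has a nearest neighbour, yet its own reverse nearest neighbour set is empty. Every query recomputes all k-nearest neighbour sets, which is the quadratic cost that motivates bounding the k-distances instead.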

MCML Authors

Max Berrendorf

Dr.

* Former Member

Felix Borutta

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[20]

F. Borutta, P. Kröger and T. Hubauer.
A Generic Summary Structure for Arbitrarily Oriented Subspace Clustering in Data Streams.
SISAP 2019 - 12th International Conference on Similarity Search and Applications. Newark, NJ, USA, Oct 02-04, 2019. DOI

Abstract

Nowadays, as lots of data is gathered in large volumes and with high velocity, the development of algorithms capable of handling complex data streams in (near) real-time is a major challenge. In this work, we present the algorithm CORRSTREAM which tackles the problem of detecting arbitrarily oriented subspace clusters in high-dimensional data streams. The proposed method follows a two-phase approach, where the continuous online phase aggregates data points within a proper microcluster structure that stores all necessary information to define a microcluster’s subspace and is generic enough to cope with a variety of offline procedures. Given several such microclusters, the offline phase is able to build a final clustering model which reveals arbitrarily oriented subspaces in which the data tend to cluster. In our experimental evaluation, we show that CORRSTREAM not only has an acceptable throughput but also outperforms static counterpart algorithms by orders of magnitude when considering the runtime. At the same time, the loss of accuracy is quite small.

MCML Authors

Felix Borutta

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[19]

M. A. X. Hünemörder, D. Kazempour, P. Kröger and T. Seidl.
SIDEKICK: Linear Correlation Clustering with Supervised Background Knowledge.
SISAP 2019 - 12th International Conference on Similarity Search and Applications. Newark, NJ, USA, Oct 02-04, 2019. DOI

Abstract

While explainable AI (XAI) is gaining in popularity, other more traditional machine learning algorithms can also benefit from increased explainability. A semi-supervised approach to correlation clustering opens up a promising design space that might provide such explainability to correlation clustering algorithms. In this work, semi-supervised linear correlation clustering is defined as the task of finding arbitrarily oriented subspace clusters using only a small sample of supervised background knowledge provided by domain experts. This work describes a first foray into this novel approach and provides an implementation of a basic algorithm to perform this task. We have found that even a small amount of supervised background knowledge can significantly improve the quality of correlation clustering in general. We are confident that the results of this work have the potential to inspire several more semi-supervised approaches to correlation clustering in the future.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[18]

D. Kazempour, M. Hünemörder and T. Seidl.
On coMADs and Principal Component Analysis.
SISAP 2019 - 12th International Conference on Similarity Search and Applications. Newark, NJ, USA, Oct 02-04, 2019. DOI

Abstract

Principal Component Analysis (PCA) is a popular method for linear dimensionality reduction. It is often used to discover hidden correlations or to facilitate the interpretation and visualization of data. However, it is liable to suffer from outliers. Strong outliers can skew the principal components and as a consequence lead to a higher reconstruction loss. While there exist several sophisticated approaches to make the PCA more robust, we present an approach which is intriguingly simple: we replace the covariance matrix by a so-called coMAD matrix. The first experiments show that PCA based on the coMAD matrix is more robust towards outliers.
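The replacement described in the abstract can be illustrated with a small sketch of a coMAD matrix. It assumes the co-median definition coMAD(X, Y) = med((X - med X)(Y - med Y)), which is our reading and is not spelled out in the abstract; the function name is hypothetical. The eigendecomposition that PCA would then apply to this matrix is omitted.

```python
from statistics import median


def comad_matrix(rows):
    """Pairwise coMAD of the columns of `rows` (a list of equally long tuples)."""
    cols = list(zip(*rows))
    centered = []
    for col in cols:
        med = median(col)
        centered.append([v - med for v in col])  # centre by the median, not the mean
    d = len(cols)
    return [
        [median(a * b for a, b in zip(centered[i], centered[j])) for j in range(d)]
        for i in range(d)
    ]


clean = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
# Same data, but the last observation is replaced by a strong outlier.
noisy = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 1000)]

print(comad_matrix(clean))
print(comad_matrix(noisy))
```

On this toy data the single outlier leaves the coMAD matrix unchanged, whereas a mean-based covariance entry for the second column would be dominated by it. That robustness of the median is the intuition behind swapping the matrices.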

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[17]

A. Beer, N. S. Schüler and T. Seidl.
A Generator for Subspace Clusters.
LWDA 2019 - Conference on Lernen. Wissen. Daten. Analysen. Berlin, Germany, Sep 30-Oct 02, 2019. PDF

Abstract

We introduce a generator for data containing subspace clusters which is accurately tunable and adjustable to the needs of developers. It is available online and allows users to specify a plethora of characteristics the data should have, while it is simultaneously able to generate meaningful data containing subspace clusters with a minimum of input data.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[16]

D. Kazempour, A. Beer, O. Schrüfer and T. Seidl.
Clustering Trend Data Time-Series through Segmentation of FFT-decomposed Signal Constituents.
LWDA 2019 - Conference on Lernen. Wissen. Daten. Analysen. Berlin, Germany, Sep 30-Oct 02, 2019. PDF

Abstract

When given trend data for different keywords, scientists may want to cluster them in order to detect specific terms which exhibit similar trends. For this purpose, a periodic regression can be performed on each of the time-series. We ask in this work: What if we cluster not simply the regression models of each time-series, but the periodic signal constituents? The impact of such an approach is twofold: first, we would see at a regression level how similar or dissimilar two time-series are regarding their periodic models, and second, we would be able to see similarities based on single signal constituents between different time-series, meaning that although time-series may be different on a regression level, they may be similar on a constituent level, reflecting other periodic influences. The results of this approach reveal commonalities between time series on a constituent level that are not visible at first glance when looking at their plain regression models.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[15]

D. Kazempour, L. M. Yan and T. Seidl.
From Covariance to Comode in context of Principal Component Analysis.
LWDA 2019 - Conference on Lernen. Wissen. Daten. Analysen. Berlin, Germany, Sep 30-Oct 02, 2019. PDF

Abstract

When it comes to the task of dimensionality reduction, the Principal Component Analysis (PCA) is among the most well known methods. Despite its popularity, PCA is prone to outliers which can be traced back to the fact that this method relies on a covariance matrix. Even with the variety of sophisticated methods to enhance the robustness of the PCA, we provide here in this work-in-progress an approach which is intriguingly simple: the covariance matrix is replaced by a so-called comode matrix. Through this minor modification the experiments show that the reconstruction loss is significantly reduced. In this work we introduce the comode and its relation to the MeanShift algorithm, including its bandwidth parameter, compare it in an experiment against the classic covariance matrix and evaluate the impact of the bandwidth hyperparameter on the reconstruction error.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[14]

J. Held, A. Beer and T. Seidl.
Chain-detection Between Clusters.
Datenbank-Spektrum 19 (Sep. 2019). DOI

Abstract

Chains connecting two or more different clusters are a well known problem of clustering algorithms like DBSCAN or Single Linkage Clustering. Since even a small number of points resulting from, e.g., noise can form such a chain and build a bridge between different clusters, it can happen that the results of the clustering algorithm are distorted: several disparate clusters get merged into one. This single-link effect is well known, but to the best of our knowledge there are not yet any satisfying solutions which extract those chains. We present a new algorithm detecting not only straight chains between clusters, but also bent and noisy ones. Users are able to choose between eliminating one-dimensional and higher-dimensional chains connecting clusters to obtain the underlying cluster structure. Also, the desired straightness can be set by the user. As this paper is an extension of ‘Chain-detection for DBSCAN’, we apply our technique not only in combination with DBSCAN but also with single link hierarchical clustering. On a real world dataset containing traffic accidents in Great Britain we were able to detect chains emerging from streets between cities and villages, which led to clusters composed of diverse villages. Additionally, we analyzed the robustness regarding the variance of chains in synthetic experiments.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[13]

S. Schmoll, S. Friedl and M. Schubert.
Scaling the Dynamic Resource Routing Problem.
SSTD 2019 - 16th International Symposium on Spatial and Temporal Databases. Vienna, Austria, Aug 19-21, 2019. DOI

Abstract

Routing to a resource (e.g. a parking spot or charging station) is a probabilistic search problem due to the uncertainty as to whether the resource is available at the time of arrival or not. In recent years, more and more real-time information about the current state of resources has become available in order to facilitate this task. Therefore, we consider the case of a driver receiving online updates about the current situation. In this setting, the problem can be described as a fully observable Markov Decision Process (MDP) which can be used to compute an optimal policy minimizing the expected search time. However, current approaches do not scale beyond a dozen resources in a query. In this paper, we suggest adapting common approximate solutions for solving MDPs. We propose new re-planning and hindsight planning algorithms that redefine the state space and rely on novel cost estimations to find close to optimal results. Unlike exact solutions for computing MDPs, our approximate planners can scale up to hundreds of resources without prohibitive computational costs. We demonstrate the result quality and the scalability of our approaches on two settings describing the search for parking spots and charging stations in an urban environment.

MCML Authors

Sabrina Friedl

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[12]

A. Beer, D. Kazempour, M. Baur and T. Seidl.
Human Learning in Data Science (Poster Extended Abstract).
HCII 2019 - 21st International Conference of Human-Computer Interaction. Orlando, Florida, USA, Jul 26-31, 2019. DOI

Abstract

As machine learning becomes a more and more important area in Data Science, bringing with it a rise of abstractness and complexity, the desire for explainability rises, too. With our work we aim to gain explainability focussing on correlation clustering and try to pursue the original goal of different Data Science tasks: extracting knowledge from data. As well-known tools like Fold-It or GeoTime show, gamification is a very mighty approach, but not only to solve tasks which prove more difficult for machines than for humans. We could also gain knowledge from how players proceed trying to solve those difficult tasks. That is why we developed Straighten it up!, a game in which users try to find the best linear correlations in high dimensional datasets. Finding arbitrarily oriented subspaces in high dimensional data is an exponentially complex task due to the number of potential subspaces with regard to the number of dimensions. Nevertheless, linearly correlated points are, as a simple pattern, easy to track by the human eye. Straighten it up! gives users an overview over two-dimensional projections of a self-chosen dataset. Users decide which subspace they want to examine first, and can draw in arbitrarily many lines fitting the data. An offset inside of which points are assigned to the corresponding line can easily be chosen for every line independently, and users can switch between different projections at any time. We developed a scoring system not only as incentive, but first of all for further examination, based on the density of each cluster, its minimum spanning tree, size of offset, and coverage. By tracking every step of a user we are able to detect common mechanisms and examine differences to state-of-the-art correlation and subspace clustering algorithms, resulting in more comprehensibility.

MCML Authors

Anna Beer

Dr.

* Former Member

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[11]

D. Kazempour, A. Beer and T. Seidl.
Data on RAILs: On interactive generation of artificial linear correlated data (Poster Extended Abstract).
HCII 2019 - 21st International Conference of Human-Computer Interaction. Orlando, Florida, USA, Jul 26-31, 2019. DOI

Abstract

Artificially generated data sets are present in many data mining and machine learning publications in the experimental section. One of the reasons to use synthetic data is that scientists can express their understanding of a “ground truth”, having labels and thus an expectation of what an algorithm should be able to detect. This also permits a degree of control to create data sets which either emphasize the strengths of a method or reveal its weaknesses and thus potential targets for improvement. In order to develop methods which detect linear correlated clusters, generating such artificial clusters is indispensable. This is mostly done by command-line based scripts which may be tedious since they require users to ‘visualize’ in their minds what the correlated clusters have to look like and how they are positioned within the data space. We present in this work RAIL, a generator for Reproducible Artificial Interactive Linear correlated data. With RAIL, users can add multiple planes into a data space and arbitrarily change orientation and position of those planes in an interactive fashion. This is achieved by manipulating the parameters describing each of the planes, giving users immediate feedback in real-time. With this approach scientists no longer need to imagine their data but can interactively explore and design their own artificial data sets containing linear correlated clusters. Another convenient feature in this context is that the data is only generated when the users decide that their design phase is completed. If researchers want to share data, a small file is exchanged containing the parameters which describe the clusters through information such as their Hessian normal form or number of points per cluster, instead of sharing several large csv files.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[10]

A. Beer, D. Kazempour, L. Stephan and T. Seidl.
LUCK - Linear Correlation Clustering Using Cluster Algorithms and a kNN based Distance Function (short paper).
SSDBM 2019 - 31st International Conference on Scientific and Statistical Database Management. Santa Cruz, CA, USA, Jul 23-25, 2019. DOI

Abstract

LUCK allows the use of any distance-based clustering algorithm to find linear correlated data. For that, a novel distance function is introduced, which takes the distribution of the kNN of points into account and corresponds to the probability of two points being part of the same linear correlation. In this work in progress we tested the distance measure with DBSCAN and k-Means, comparing it to the well-known linear correlation clustering algorithms ORCLUS, 4C, COPAC, LMCLUS, and CASH, and obtained good results for difficult synthetic data sets containing crossing or non-continuous correlations.

MCML Authors

Anna Beer

Dr.

* Former Member

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[9]

A. Beer and T. Seidl.
Graph Ordering and Clustering - A Circular Approach.
SSDBM 2019 - 31st International Conference on Scientific and Statistical Database Management. Santa Cruz, CA, USA, Jul 23-25, 2019. DOI

Abstract

As the ordering of data, particularly of graphs, can influence the result of diverse Data Mining tasks performed on it heavily, we introduce the Circle-Index, the first internal quality measurement for orderings of graphs. It is based on a circular arrangement of nodes, but takes in contrast to similar arrangements from the field of, e.g., visual analytics, the edge lengths in this arrangement into account. The minimization of the Circle-Index leads to an arrangement which not only offers a simple way to cluster the data using a constrained MinCut in only linear time, but is also visually convincing. We developed the clustering algorithm CirClu which implements this minimization and MinCut, and compared it with several established clustering algorithms achieving very good results. Simultaneously we compared the Circle-Index with several internal quality measures for clusterings. We observed a strong coherence between the Circle-Index and the matching of achieved clusterings to the respective ground truths in diverse real world datasets.

MCML Authors

Anna Beer

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[8]

D. Kazempour, K. Emmerig, P. Kröger and T. Seidl.
Detecting Global Periodic Correlated Clusters in Event Series based on Parameter Space Transform.
SSDBM 2019 - 31st International Conference on Scientific and Statistical Database Management. Santa Cruz, CA, USA, Jul 23-25, 2019. DOI

Abstract

Periodicities are omnipresent: In nature in the cycles of predator and prey populations, reoccurring patterns regarding our power consumption over the days, or the presence of flu diseases over the year. With regards to the importance of periodicities we ask: Is there a way to detect periodic correlated clusters which are hidden in event series? We propose as a work in progress a method for detecting sinusoidal periodic correlated clusters on event series which relies on parameter space transformation. Our contributions are: Providing the first non-linear correlation clustering algorithm for detecting periodic correlated clusters. Further our method provides an explicit model giving domain experts information on parameters such as amplitude, frequency, phase-shift and vertical-shift of the detected clusters. Beyond that we approach the issue of determining an adequate frequency and phase-shift of the detected correlations given a frequency and phase-shift boundary.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[7]

D. Kazempour and T. Seidl.
On systematic hyperparameter analysis through the example of subspace clustering.
SSDBM 2019 - 31st International Conference on Scientific and Statistical Database Management. Santa Cruz, CA, USA, Jul 23-25, 2019. DOI

Abstract

In publications that describe a clustering method, the chosen hyperparameters are, to our current observation, in many cases determined empirically. In this work in progress, we discuss and propose an approach for systematically exploring hyperparameters and analyzing their effects with regard to the data set. In the context of hyperparameter analysis, we further introduce a modified definition of the term resilience, which here refers to a subset of data points that persists in the same cluster over different hyperparameter settings. To analyze relations among different hyperparameters, we further introduce the concept of dynamic intersection computing.
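One possible reading of the resilience notion in this abstract can be sketched in a few lines. This is an illustration only, not the authors' procedure: the function name `resilient_subsets`, the `min_size` parameter, and the representation of each hyperparameter setting as a plain list of cluster labels are all assumptions.

```python
from collections import defaultdict


def resilient_subsets(labelings, min_size=2):
    """Group point indices that share a cluster under every setting.

    labelings: one label list per hyperparameter setting (assumed format);
    labelings[s][i] is the cluster label of point i under setting s.
    """
    groups = defaultdict(list)
    for i in range(len(labelings[0])):
        # The label sequence of point i across all settings acts as its signature.
        signature = tuple(labels[i] for labels in labelings)
        groups[signature].append(i)
    # Points with identical signatures stayed together in every setting.
    return sorted(g for g in groups.values() if len(g) >= min_size)
```

Cluster labels only need to be consistent within one setting, since points are compared by whether they share a label, not by the label value itself.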

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[6]

A. Bojchevski and S. Günnemann.
Adversarial Attacks on Node Embeddings via Graph Poisoning.
ICML 2019 - 36th International Conference on Machine Learning. Long Beach, CA, USA, Jun 09-15, 2019. URL

Abstract

The goal of network representation learning is to learn low-dimensional node embeddings that capture the graph structure and are useful for solving downstream tasks. However, despite the proliferation of such methods, there is currently no study of their robustness to adversarial attacks. We provide the first adversarial vulnerability analysis on the widely used family of methods based on random walks. We derive efficient adversarial perturbations that poison the network structure and have a negative effect on both the quality of the embeddings and the downstream tasks. We further show that our attacks are transferable since they generalize to many models and are successful even when the attacker is restricted.

MCML Authors

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

[5]

A. Beer, D. Kazempour and T. Seidl.
Rock - Let the points roam to their clusters themselves.
EDBT 2019 - 22nd International Conference on Extending Database Technology. Lisbon, Portugal, Mar 26-29, 2019. PDF

Abstract

In this work we present Rock, a method where the points roam to their clusters using k-NN. Rock is a draft for an algorithm which is capable of detecting non-convex clusters of arbitrary dimension while delivering representatives for each cluster similar to, e.g., Mean Shift or k-Means. Applying Rock, points roam to the mean of their k-NN while k increments in every step. In this way, rather outlying points and noise move to their nearest cluster, while the clusters themselves contract first to their skeletons and then to a single representative point each. Our empirical results on synthetic and real data demonstrate that Rock is able to detect clusters on datasets where either mode-seeking or density-based approaches do not succeed.
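The roaming step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the starting value of k, and the stopping rule (a fixed `k_max`) are hypothetical, and each point counts itself among its own neighbours.

```python
import math


def knn_mean_step(points, k):
    """Move every point to the mean of its k nearest neighbours (itself included)."""
    moved = []
    for p in points:
        nearest = sorted(points, key=lambda q: math.dist(p, q))[:k]
        moved.append(tuple(sum(q[i] for q in nearest) / len(nearest)
                           for i in range(len(p))))
    return moved


def roam(points, k_start=2, k_max=4):
    """Apply the k-NN mean step repeatedly while k grows by one per step.

    k_start and k_max are assumed parameters for this sketch.
    """
    current = [tuple(p) for p in points]
    for k in range(k_start, k_max + 1):
        # All points move simultaneously, based on the previous positions.
        current = knn_mean_step(current, k)
    return current
```

On two well-separated groups of four points each, `roam(points, 2, 4)` contracts every group to (numerically) a single position, which could then serve as the representative of that group.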

MCML Authors

Anna Beer

Dr.

* Former Member

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[4]

D. Kazempour, L. Krombholz, P. Kröger and T. Seidl.
A Galaxy of Correlations - Detecting Linear Correlated Clusters through k-Tuples Sampling using Parameter Space Transform.
EDBT 2019 - 22nd International Conference on Extending Database Technology. Lisbon, Portugal, Mar 26-29, 2019. PDF

Abstract

In different research domains, experiments aim to detect (hyper)linear correlations among multiple features within a given data set. Among the methods that exist for this purpose, one is highly robust against noise and detects linear correlated clusters regardless of any locality assumption. This method is based on parameter space transformation. The currently available parameter-transform-based algorithms detect the clusters by explicitly scanning for intersections of functions in parameter space. This approach comes with drawbacks: it is difficult to analyze aspects that go beyond the sole intersection of functions, such as the area around the intersections, and it is computationally expensive. The work-in-progress method we provide here overcomes these drawbacks by sampling d-dimensional tuples in data space, generating a (hyper)plane from each tuple, and representing this plane as a single point in parameter space. With this approach, we no longer scan for intersection points of functions in parameter space but for dense regions of such parameter vectors. In future work, well-established clustering algorithms can thus be applied in parameter space to detect, e.g., dense regions, modes or hierarchies of linear correlations.
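For the two-dimensional case, the sampling idea in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: it samples pairs of points, and the choice of the Hesse normal form (theta, rho) as the parameter space, the function names, and the `seed` parameter are all hypothetical.

```python
import math
import random


def line_params(p, q):
    """Return (theta, rho) with x*cos(theta) + y*sin(theta) = rho for the line through p and q.

    theta is normalised to [0, pi), so rho may be negative.
    """
    nx, ny = -(q[1] - p[1]), q[0] - p[0]  # a normal to the direction q - p
    theta = math.atan2(ny, nx)
    rho = (p[0] * nx + p[1] * ny) / math.hypot(nx, ny)
    if theta < 0:
        theta, rho = theta + math.pi, -rho
    elif theta >= math.pi:
        theta, rho = theta - math.pi, -rho
    return theta, rho


def sample_parameter_points(points, n_samples, seed=0):
    """Map randomly sampled point pairs to single points in parameter space."""
    rng = random.Random(seed)
    params = []
    for _ in range(n_samples):
        p, q = rng.sample(points, 2)
        if p != q:  # identical coordinates do not define a line
            params.append(line_params(p, q))
    return params
```

Pairs drawn from the same line land on the same (theta, rho), so linearly correlated data shows up as a dense region among the returned parameter points, which a standard clustering algorithm could then pick up.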

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[3]

D. Kazempour and T. Seidl.
Insights into a running clockwork: On interactive process-aware clustering.
EDBT 2019 - 22nd International Conference on Extending Database Technology. Lisbon, Portugal, Mar 26-29, 2019. PDF

Abstract

In recent years, the demand for algorithms that provide not only their results but also a certain degree of explainability has increased. In this paper, we envision a class of clustering algorithms where users can interact not only with the input or output but also intervene in the clustering process itself, which we term process-aware clustering. We further sketch the challenges emerging with this type of algorithm, such as the need for adequate measures that evaluate the progression through the computation process of a clustering method. Beyond explaining how the results are generated, we propose methods tailored to systematically analyzing the hyperparameter space of an algorithm, determining suitable hyperparameters in a more ordered fashion rather than applying a trial-and-error scheme.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[2]

D. Kazempour, M. Kazakov, P. Kröger and T. Seidl.
DICE: Density-based Interactive Clustering and Exploration.
BTW 2019 - 18th Symposium of Database Systems for Business, Technology and Web. Rostock, Germany, Mar 04-08, 2019. DOI

Abstract

Clustering algorithms mostly follow a fixed pipeline: input data and hyperparameter values are provided, the algorithm is executed, and the output files are generated or visualized. In our work, we provide an early prototype of an interactive density-based clustering tool named DICE, in which users can change the hyperparameter settings and immediately observe the resulting clusters. Users can also browse through each detected cluster and obtain statistics as well as a convex hull profile for it. Furthermore, DICE keeps track of the chosen settings, enabling the user to review which hyperparameter values have been chosen previously. DICE can be used not only in the scientific context of analyzing data, but also in didactic settings in which students can learn in an exploratory fashion how a density-based clustering algorithm such as DBSCAN behaves.

MCML Authors

Daniyal Kazempour

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[1]

D. Mautz, W. Ye, C. Plant and C. Böhm.
Discovering Non-Redundant K-means Clusterings in Optimal Subspaces.
KDD 2018 - 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. London, UK, Aug 19-23, 2018. DOI

Abstract

A huge object collection in high-dimensional space can often be clustered in more than one way; for instance, objects could be clustered by their shape or alternatively by their color. Each grouping represents a different view of the data set. The new research field of non-redundant clustering addresses this class of problems. In this paper, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and optionally a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically and in extensive experiments.

MCML Authors

Christian Böhm

Prof. Dr.