Research Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

heads the Chair of Artificial Intelligence and Machine Learning at LMU Munich.

His research interests are centered around methods and theoretical foundations of artificial intelligence, with a specific focus on machine learning and reasoning under uncertainty. He has published more than 300 articles on these topics in top-tier journals and major international conferences, and several of his contributions have been recognized with scientific awards.

Team members @MCML

PhD Students

Clemens Damke

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Jonas Hanselle

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Paul Hofman

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Alireza Javanmardi

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Timo Kaufmann

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Valentin Margraf

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Maximilian Muschalik

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Tobias Oberkofler

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Yusuf Sale

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Mohammad Hossein Shaker

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Joshua Stiller

A3 | Computational Models
→ Group Eyke Hüllermeier

Artificial Intelligence and Machine Learning

Publications @MCML

2025

[103]

Y. Sale, A. Javanmardi and E. Hüllermeier.
Aleatoric and Epistemic Uncertainty in Conformal Prediction.
COPA 2025 - 14th Symposium on Conformal and Probabilistic Prediction with Applications. Egham, UK, Sep 10-12, 2025. To be published.

Abstract

Recently, there has been a particular interest in distinguishing different types of uncertainty in supervised machine learning (ML) settings (H¨ullermeier and Waegeman, 2021). Aleatoric uncertainty captures the inherent randomness in the data-generating process. As it represents variability that cannot be reduced even with more data, it is often referred to as irreducible uncertainty. In contrast, epistemic uncertainty arises from a lack of knowledge about the underlying data-generating process, which—in principle—can be reduced by acquiring additional data or improving the model itself (viz. reducible uncertainty). In parallel, interest in conformal prediction (CP)—both its theory and applications—has become equally vigorous. Conformal Prediction (Vovk et al., 2005) is a model-agnostic framework for uncertainty quantification that provides prediction sets or intervals with rigorous statistical coverage guarantees. Notably, CP is distribution-free and makes only the mild assumption of exchangeability. Under this assumption, it yields prediction intervals that contain the true label with a user-specified probability. Thus, conformal prediction is seen as a promising tool to quantify uncertainty. But how is it related to aleatoric and epistemic uncertainty? In particular, we first analyze how (estimates of) aleatoric and epistemic uncertainty enter into the construction of vanilla CP—that is, how noise and model error jointly shape the global threshold. We then review ‘uncertainty-aware’ extensions that integrate these uncertainty estimates into the CP pipeline.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[102]

J. Hanselle, A. Javanmardi, T. Oberkofler, Y. Sale and E. Hüllermeier.
Conformal Prediction without Nonconformity Scores.
UAI 2025 - 41st Conference on Uncertainty in Artificial Intelligence. Rio de Janeiro, Brazil, Jul 21-25, 2025. To be published.

Abstract

Conformal prediction (CP) is an uncertainty quantification framework that allows for constructing
statistically valid prediction sets. Key to the construction of these sets is the notion of nonconformity function, which assigns a real-valued score to individual data points: only those (hypothetical) data points contribute to a prediction set that sufficiently conform to the data. The point of departure of this work is the observation that CP predictions are invariant against (strictly) monotone transformations of a nonconformity function. In other words, it is only the ordering of the scores that matters, not their quantitative values. Consequently, instead of scoring individual data points, a conformal predictor only needs to be able to compare pairs of data points, deciding which of them is the more conforming one. This suggests an interesting connection between CP and preference learning, in particular learning-to-rank methods, and makes CP amenable to training data in the form of (qualitative) preferences. Elaborating on
this connection, we propose methods for learning (latent) nonconformity functions from data of that
kind and show their usefulness in real-world classification tasks.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Tobias Oberkofler

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[101]

X. Feng, Z. Jiang, T. Kaufmann, E. Hüllermeier, P. Weng and Y. Zhu.
Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL

Abstract

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. However, traditional methods based on pairwise trajectory comparisons face notable challenges, including the difficulty in comparing trajectories with subtle differences and the limitation of conveying only ordinal information, limiting direct inference of preference strength. In this paper, we introduce a novel distinguishability query, allowing humans to express preference strength by comparing two pairs of trajectories. Labelers first indicate which pair is easier to compare, then provide preference feedback only on the easier pair. Our proposed query type directly captures preference strength and is expected to reduce the cognitive load on the labeler. We further connect this query to cardinal utility and difference relations and develop an efficient query selection scheme to achieve better trade-off between query informativeness and easiness. Experimental results demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness in RLHF benchmarks, particularly in classical control settings where preference strength is critical for expected utility maximization.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[100]

R. Sharma, S. Mukherjee, A. Šipka, E. Hüllermeier, S. Vollmer, S. Redyuk and D. A. Selby.
X-Hacking: The Threat of Misguided AutoML.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL

Abstract

Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as Shap values. We show how easily an automated machine learning pipeline can be adapted to exploit model multiplicity at scale: searching a set of ‘defensible’ models with similar predictive performance to find a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for features susceptible to it, versus random sampling. We show the vulnerability of a dataset to X-hacking can be determined by information redundancy among features. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[99]

P. Kolpaczki, T. Nielen and E. Hüllermeier.
Antithetic Sampling for Top-k Shapley Identification.
xAI 2025 - 3rd World Conference on Explainable Artificial Intelligence. Istanbul, Turkey, Jul 09-11, 2025. Preprint. arXiv

Abstract

Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value by viewing features as cooperating players. The Shapley value’s popularity in and outside of explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits practicability. Most works investigate the uniform approximation of all features’ Shapley values, needlessly consuming samples for insignificant features. In contrast, identifying the k most important features can already be sufficiently insightful and yields the potential to leverage algorithmic opportunities connected to the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-k identification problem utilizing a new sampling scheme taking advantage of correlated observations. We conduct experiments to showcase the efficacy of our method in compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-k identification and vice versa.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[98]

Y. Sale and A. Ramdas.
Online Selective Conformal Prediction: Errors and Solutions.
Transactions on Machine Learning Research (Jul. 2025). Preprint. URL

Abstract

In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. In this paper, we evaluate existing calibration selection strategies and pinpoint some fundamental errors in the associated claims that guarantee selection-conditional coverage and control of the false coverage rate (FCR). To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining tradeoffs between valid methods.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

[97]

S. Haas and E. Hüllermeier.
Aleatoric and Epistemic Uncertainty Measures for Ordinal Classification through Binary Reduction.
Preprint (Jul. 2025). arXiv

Abstract

Ordinal classification problems, where labels exhibit a natural order, are prevalent in high-stakes fields such as medicine and finance. Accurate uncertainty quantification, including the decomposition into aleatoric (inherent variability) and epistemic (lack of knowledge) components, is crucial for reliable decision-making. However, existing research has primarily focused on nominal classification and regression. In this paper, we introduce a novel class of measures of aleatoric and epistemic uncertainty in ordinal classification, which is based on a suitable reduction to (entropy- and variance-based) measures for the binary case. These measures effectively capture the trade-off in ordinal classification between exact hit-rate and minimial error distances. We demonstrate the effectiveness of our approach on various tabular ordinal benchmark datasets using ensembles of gradient-boosted trees and multi-layer perceptrons for approximate Bayesian inference. Our method significantly outperforms standard and label-wise entropy and variance-based measures in error detection, as indicated by misclassification rates and mean absolute error. Additionally, the ordinal measures show competitive performance in out-of-distribution (OOD) detection. Our findings highlight the importance of considering the ordinal nature of classification problems when assessing uncertainty.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[96]

V. Margraf, T. Koerner, A. Tornede and M. Wever.
RunAndSchedule2Survive: Algorithm Scheduling Based on Run2Survive.
ACM Transactions on Evolutionary Learning and Optimization Just accepted (Jun. 2025). DOI

Abstract

The algorithm selection problem aims to identify the most suitable algorithm for a given problem instance under specific time constraints, where suitability typically refers to a performance metric such as algorithm runtime. While previous work has employed machine learning techniques to tackle this challenge, methods from survival analysis have proven particularly effective. This paper presents RunAndSchedule2Survive to address the more general and complex problem of algorithm scheduling, where the objective is to allocate computational resources across multiple algorithms to maximize performance within specified time constraints. Our approach combines survival analysis with evolutionary algorithms to optimize algorithm schedules by leveraging runtime distributions modeled as survival functions. Experimental results across various standard benchmarks demonstrate that our approach significantly outperforms previous methods for algorithm scheduling and yields more robust results than its algorithm selection variant. More specifically, RunAndSchedule2Survive achieves superior performance in 20 out of 25 benchmark scenarios, surpassing hitherto state-of-the-art approaches.

MCML Authors

Valentin Margraf

Artificial Intelligence and Machine Learning

[95]

P. Gupta, M. Wever and E. Hüllermeier.
Information Leakage Detection through Approximate Bayes-optimal Prediction.
Information Sciences In Press, Journal Pre-proof.122419 (Jun. 2025). DOI

Abstract

In today’s data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches rely on estimating mutual information (MI) between observable and secret information for detecting ILs, face challenges of the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detect ILs are limited to binary system sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor’s log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method performs superior to state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.

MCML Authors

Marcel Wever

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[94]

P. Hofman, Y. Sale and E. Hüllermeier.
Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks.
Preprint (May. 2025). arXiv

Abstract

We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules, a specific type of loss function, into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one-loss consistently outperforms other uncertainty measures.

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[93]

A. Javanmardi, S. H. Zargarbashi, S. M. A. R. Thies, W. Waegeman, A. Bojchevski and E. Hüllermeier.
Optimal Conformal Prediction under Epistemic Uncertainty.
Preprint (May. 2025). arXiv

Abstract

Conformal prediction (CP) is a popular frequentist framework for representing uncertainty by providing prediction sets that guarantee coverage of the true label with a user-adjustable probability. In most applications, CP operates on confidence scores coming from a standard (first-order) probabilistic predictor (e.g., softmax outputs). Second-order predictors, such as credal set predictors or Bayesian models, are also widely used for uncertainty quantification and are known for their ability to represent both aleatoric and epistemic uncertainty. Despite their popularity, there is still an open question on ``how they can be incorporated into CP’’. In this paper, we discuss the desiderata for CP when valid second-order predictions are available. We then introduce Bernoulli prediction sets (BPS), which produce the smallest prediction sets that ensure conditional coverage in this setting. When given first-order predictions, BPS reduces to the well-known adaptive prediction sets (APS). Furthermore, when the validity assumption on the second-order predictions is compromised, we apply conformal risk control to obtain a marginal coverage guarantee while still accounting for epistemic uncertainty.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[92]

T. Löhr, P. Hofman, F. Mohr and E. Hüllermeier.
Credal Prediction based on Relative Likelihood.
Preprint (May. 2025). arXiv

Abstract

Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner’s epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[91]

J. Wang, P. Gupta, I. Habernal and E. Hüllermeier.
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs.
Preprint (May. 2025). arXiv

Abstract

Recent studies demonstrate that Large Language Models (LLMs) are vulnerable to different prompt-based attacks, generating harmful content or sensitive information. Both closed-source and open-source LLMs are underinvestigated for these attacks. This paper studies effective prompt injection attacks against the 14 most popular open-source LLMs on five attack benchmarks. Current metrics only consider successful attacks, whereas our proposed Attack Success Probability (ASP) also captures uncertainty in the model’s response, reflecting ambiguity in attack feasibility. By comprehensively analyzing the effectiveness of prompt injection attacks, we propose a simple and effective hypnotism attack; results show that this attack causes aligned language models, including Stablelm2, Mistral, Openchat, and Vicuna, to generate objectionable behaviors, achieving around 90% ASP. They also indicate that our ignore prefix attacks can break all 14 open-source LLMs, achieving over 60% ASP on a multi-categorical dataset. We find that moderately well-known LLMs exhibit higher vulnerability to prompt injection attacks, highlighting the need to raise public awareness and prioritize efficient mitigation strategies.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[90]

M. Spliethöver, T. Knebler, F. Fumagalli, M. Muschalik, B. Hammer, E. Hüllermeier and H. Wachsmuth.
Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection.
NAACL 2025 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Albuquerque, NM, USA, Apr 29-May 04, 2025. DOI

Abstract

Recent advances on instruction fine-tuning have led to the development of various prompting techniques for large language models, such as explicit reasoning steps. However, the success of techniques depends on various parameters, such as the task, language model, and context provided. Finding an effective prompt is, therefore, often a trial-and-error process. Most existing approaches to automatic prompting aim to optimize individual techniques instead of compositions of techniques and their dependence on the input. To fill this gap, we propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input. We apply our approach to social bias detection, a highly context-dependent task that requires semantic understanding. We evaluate it with three large language models on three datasets, comparing compositions to individual techniques and other baselines. The results underline the importance of finding an effective prompt composition. Our approach robustly ensures high detection performance, and is best in several settings. Moreover, first experiments on other tasks support its generalizability.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[89]

A. Findeis, T. Kaufmann, E. Hüllermeier, S. Albanie and R. D. Mullins.
Inverse Constitutional AI: Compressing Preferences into Principles.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL GitHub

Abstract

Feedback data is widely used for fine-tuning and evaluating state-of-the-art AI models. Pairwise text preferences, where human or AI annotators select the “better” of two options, are particularly common. Such preferences are used to train (reward) models or to rank models with aggregate statistics. For many applications it is desirable to understand annotator preferences in addition to modelling them – not least because extensive prior work has shown various unintended biases in preference datasets. Yet, preference datasets remain challenging to interpret. Neither black-box reward models nor statistics can answer why one text is preferred over another. Manual interpretation of the numerous (long) response pairs is usually equally infeasible. In this paper, we introduce the Inverse Constitutional AI (ICAI) problem, formulating the interpretation of pairwise text preference data as a compression task. In constitutional AI, a set of principles (a constitution) is used to provide feedback and fine-tune AI models. ICAI inverts this process: given a feedback dataset, we aim to extract a constitution that best enables a large language model (LLM) to reconstruct the original annotations. We propose a corresponding ICAI algorithm and validate its generated constitutions quantitatively based on annotation reconstruction accuracy on several datasets: (a) synthetic feedback data with known principles; (b) AlpacaEval cross-annotated human feedback data; (c) crowdsourced Chatbot Arena data; and (d) PRISM data from diverse demographic groups. As an example application, we further demonstrate the detection of biases in human feedback data. As a short and interpretable representation of the original dataset, generated constitutions have many potential use cases: they may help identify undesirable annotator biases, better understand model performance, scale feedback to unseen data, or assist with adapting AI models to individual user or group preferences.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[88]

M. Muschalik, F. Fumagalli, P. Frazzetto, J. Strotherm, L. Hermes, A. Sperduti, E. Hüllermeier and B. Hammer.
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL

Abstract

Albeit the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to a ML model’s output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings are preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e. the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ’s approximation of SIs on popular GNN architectures and compare with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using a SI-Graph.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[87]

R. Visser, F. Fumagalli, M. Muschalik, E. Hüllermeier and B. Hammer.
Explaining Outliers using Isolation Forest and Shapley Interactions.
ESANN 2025 - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges, Belgium, Apr 23-25, 2025. PDF

Abstract

In unsupervised machine learning, Isolation Forest (IsoForest) is a widely used algorithm for the eﬃcient detection of outliers. Identifying the features responsible for observed anomalies is crucial for practitioners, yet the ensemble nature of IsoForest complicates interpretation and comparison. As a remedy, SHAP is a prevalent method to interpret outlier scoring models by assigning contributions to individual features based on the Shapley Value (SV). However, complex anomalies typically involve interaction of features, and it is paramount for practitioners to distinguish such complex anomalies from simple cases. In this work, we propose Shapley Interactions (SIs) to enrich explanations of outliers with feature interactions. SIs, as an extension of the SV, decompose the outlier score into contributions of individual features and interactions of features up to a specified explanation order. We modify IsoForest to compute SI using TreeSHAP-IQ, an extension of TreeSHAP for tree-based models, using the shapiqpackage. Using a qualitative and quantitative analysis on synthetic and real-world datasets, we demonstrate the benefit of SI and feature interactions for outlier explanations over feature contributions alone.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Artificial Intelligence and Machine Learning

[86]

C. Bülte, Y. Sale, T. Löhr, P. Hofman, G. Kutyniok and E. Hüllermeier.
An Axiomatic Assessment of Entropy- and Variance-based Uncertainty Quantification in Regression.
Preprint (Apr. 2025). arXiv

Abstract

Uncertainty quantification (UQ) is crucial in machine learning, yet most (axiomatic) studies of uncertainty measures focus on classification, leaving a gap in regression settings with limited formal justification and evaluations. In this work, we introduce a set of axioms to rigorously assess measures of aleatoric, epistemic, and total uncertainty in supervised regression. By utilizing a predictive exponential family, we can generalize commonly used approaches for uncertainty representation and corresponding uncertainty measures. More specifically, we analyze the widely used entropy- and variance-based measures regarding limitations and challenges. Our findings provide a principled foundation for UQ in regression, offering theoretical insights and practical guidelines for reliable uncertainty assessment.

MCML Authors

Christopher Bülte

Mathematical Foundations of Artificial Intelligence

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

A2 | Mathematical Foundations

Artificial Intelligence and Machine Learning

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[85]

L. Fichtel, M. Spliethöver, E. Hüllermeier, P. Jimenez, N. Klowait, S. Kopp, A.-C. N. Ngomo, A. Robrecht, I. Scharlau, L. Terfloth, A.-L. Vollmer and H. Wachsmuth.
Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues.
Preprint (Apr. 2025). arXiv

Abstract

The ability to generate explanations that are understood by explainees is the quintessence of explainable artificial intelligence. Since understanding depends on the explainee’s background and needs, recent research has focused on co-constructive explanation dialogues, where the explainer continuously monitors the explainee’s understanding and adapts explanations dynamically. We investigate the ability of large language models (LLMs) to engage as explainers in co-constructive explanation dialogues. In particular, we present a user study in which explainees interact with LLMs, of which some have been instructed to explain a predefined topic co-constructively. We evaluate the explainees’ understanding before and after the dialogue, as well as their perception of the LLMs’ co-constructive behavior. Our results indicate that current LLMs show some co-constructive behaviors, such as asking verification questions, that foster the explainees’ engagement and can improve understanding of a topic. However, their ability to effectively monitor the current understanding and scaffold the explanations accordingly remains limited.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[84]

N. Röhrich, A. Hoffmann, R. Nordsieck, E. Zarbali and A. Javanmardi.
Masked Autoencoder Self Pre-Training for Defect Detection in Microelectronics.
Preprint (Apr. 2025). arXiv

Abstract

Whereas in general computer vision, transformer-based architectures have quickly become the gold standard, microelectronics defect detection still heavily relies on convolutional neural networks (CNNs). We hypothesize that this is due to the fact that a) transformers have an increased need for data and b) labelled image generation procedures for microelectronics are costly, and labelled data is therefore sparse. Whereas in other domains, pre-training on large natural image datasets can mitigate this problem, in microelectronics transfer learning is hindered due to the dissimilarity of domain data and natural images. Therefore, we evaluate self pre-training, where models are pre-trained on the target dataset, rather than another dataset. We propose a vision transformer (ViT) pre-training framework for defect detection in microelectronics based on masked autoencoders (MAE). In MAE, a large share of image patches is masked and reconstructed by the model during pre-training. We perform pre-training and defect detection using a dataset of less than 10.000 scanning acoustic microscopy (SAM) images labelled using transient thermal analysis (TTA). Our experimental results show that our approach leads to substantial performance gains compared to a) supervised ViT, b) ViT pre-trained on natural image datasets, and c) state-of-the-art CNN-based defect detection models used in the literature. Additionally, interpretability analysis reveals that our self pre-trained models, in comparison to ViT baselines, correctly focus on defect-relevant features such as cracks in the solder material. This demonstrates that our approach yields fault-specific feature representations, making our self pre-trained models viable for real-world defect detection in microelectronics.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

[83]

C. Damke and E. Hüllermeier.
Adjusted Count Quantification Learning on Graphs.
Preprint (Mar. 2025). arXiv

Abstract

Quantification learning is the task of predicting the label distribution of a set of instances. We study this problem in the context of graph-structured data, where the instances are vertices. Previously, this problem has only been addressed via node clustering methods. In this paper, we extend the popular Adjusted Classify & Count (ACC) method to graphs. We show that the prior probability shift assumption upon which ACC relies is often not fulfilled and propose two novel graph quantification techniques: Structural importance sampling (SIS) makes ACC applicable in graph domains with covariate shift. Neighborhood-aware ACC improves quantification in the presence of non-homophilic edges. We show the effectiveness of our techniques on multiple graph quantification tasks.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[82]

X. Feng, Z. Jiang, T. Kaufmann, P. Xu, E. Hüllermeier, P. Weng and Y. Zhu.
DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

Defining a reward function is usually a challenging but critical task for the system designer in reinforcement learning, especially when specifying complex behaviors. Reinforcement learning from human feedback (RLHF) emerges as a promising approach to circumvent this. In RLHF, the agent typically learns a reward function by querying a human teacher using pairwise comparisons of trajectory segments. A key question in this domain is how to reduce the number of queries necessary to learn an informative reward function since asking a human teacher too many queries is impractical and costly. To tackle this question, we propose DUO, a novel method for diverse, uncertain, on-policy query generation and selection in RLHF. Our method produces queries that are (1) more relevant for policy training (via an on-policy criterion), (2) more informative (via a principled measure of epistemic uncertainty), and (3) diverse (via a clustering-based filter). Experimental results on a variety of locomotion and robotic manipulation tasks demonstrate that our method can outperform state-of-the-art RLHF methods given the same total budget of queries, while being robust to possibly irrational teachers.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[81]

J. Hanselle, S. Heid, J. Fürnkranz and E. Hüllermeier.
Probabilistic scoring lists for interpretable machine learning.
Machine Learning 114.55 (Feb. 2025). DOI

Abstract

A scoring system is a simple decision model that checks a set of features, adds a certain number of points to a total score for each feature that is satisfied, and finally makes a decision by comparing the total score to a threshold. Scoring systems have a long history of active use in safety-critical domains such as healthcare and justice, where they provide guidance for making objective and accurate decisions. Given their genuine interpretability, the idea of learning scoring systems from data is obviously appealing from the perspective of explainable AI. In this paper, we propose a practically motivated extension of scoring systems called probabilistic scoring lists (PSL), as well as a method for learning PSLs from data. Instead of making a deterministic decision, a PSL represents uncertainty in the form of probability distributions, or, more generally, probability intervals. Moreover, in the spirit of decision lists, a PSL evaluates features one by one and stops as soon as a decision can be made with enough confidence. To evaluate our approach, we conduct case studies in the medical domain and on standard benchmark data.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[80]

M. Jürgens, T. Mortier, E. Hüllermeier, V. Bengs and W. Waegeman.
A calibration test for evaluating set-based epistemic uncertainty representations.
Preprint (Feb. 2025). arXiv

Abstract

The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there is a convex combination of the set’s predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space. Moreover, we learn this combination via proper scoring rules, which inherently optimize for calibration. Building on differentiable, kernel-based estimators of calibration errors, we introduce a nonparametric testing procedure and demonstrate the benefits of capturing instance-level variability on of synthetic and real-world experiments.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

[79]

G. D. Pelegrina, P. Kolpaczki and E. Hüllermeier.
Shapley Value Approximation Based on k-Additive Games.
Preprint (Feb. 2025). arXiv

Abstract

The Shapley value is the prevalent solution for fair division problems in which a payout is to be divided among multiple agents. By adopting a game-theoretic view, the idea of fair division and the Shapley value can also be used in machine learning to quantify the individual contribution of features or data points to the performance of a predictive model. Despite its popularity and axiomatic justification, the Shapley value suffers from a computational complexity that scales exponentially with the number of entities involved, and hence requires approximation methods for its reliable estimation. We propose SVAkADD, a novel approximation method that fits a k-additive surrogate game. By taking advantage of k-additivity, we are able to elicit the exact Shapley values of the surrogate game and then use these values as estimates for the original fair division problem. The efficacy of our method is evaluated empirically and compared to competing methods.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[78]

T. Mortier, A. Javanmardi, Y. Sale, E. Hüllermeier and W. Waegeman.
Conformal Prediction in Hierarchical Classification.
Preprint (Jan. 2025). arXiv

Abstract

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy, and propose two computationally efficient inference algorithms. The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of representation complexity, yielding a more general and combinatorial inference problem, but smaller set sizes. Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms in achieving nominal coverage.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[77]

M. H. Shaker and E. Hüllermeier.
Random Forest Calibration.
Preprint (Jan. 2025). arXiv

Abstract

The Random Forest (RF) classifier is often claimed to be relatively well calibrated when compared with other machine learning methods. Moreover, the existing literature suggests that traditional calibration methods, such as isotonic regression, do not substantially enhance the calibration of RF probability estimates unless supplied with extensive calibration data sets, which can represent a significant obstacle in cases of limited data availability. Nevertheless, there seems to be no comprehensive study validating such claims and systematically comparing state-of-the-art calibration methods specifically for RF. To close this gap, we investigate a broad spectrum of calibration methods tailored to or at least applicable to RF, ranging from scaling techniques to more advanced algorithms. Our results based on synthetic as well as real-world data unravel the intricacies of RF probability estimates, scrutinize the impacts of hyper-parameters, compare calibration methods in a systematic way. We show that a well-optimized RF performs as well as or better than leading calibration approaches.

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

2024

[76]

A. Javanmardi, D. Stutz and E. Hüllermeier.
Conformalized Credal Set Predictors.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[75]

M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer and E. Hüllermeier.
shapiq: Shapley Interactions for Machine Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Abstract

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[74]

N. Saberi, M. H. Shaker, C. R. Duguay, K. A. Scott and E. Hüllermeier.
Uncertainty Estimation of Lake Ice Cover Maps From a Random Forest Classifier Using MODIS TOA Reflectance Data.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (Dec. 2024). DOI

Abstract

This article presents a method to improve the usability of lake ice cover (LIC) maps generated from moderate resolution imaging spectroradiometer (MODIS) top-of-atmosphere reflectance data by providing estimates of aleatoric and epistemic uncertainty. We used a random forest (RF) classifier, which has been shown to have superior performance in classifying lake ice, open water, and clouds, to generate daily LIC maps with inherent (aleatoric) and model (epistemic) uncertainties. RF allows for the learning of different hypotheses (trees), producing diverse predictions that can be utilized to quantify aleatoric and epistemic uncertainty. We use a decomposition of Shannon entropy to quantify these uncertainties and apply pixel-based uncertainty estimation. Our results show that using uncertainty values to reject the classification of uncertain pixels significantly improves recall and precision. The method presented herein is under consideration for integration into the processing chain implemented for the production of daily LIC maps as part of the European Space Agency’s Climate Change Initiative (CCI+) Lakes project.

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[73]

F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer and J. Herbinger.
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory.
Preprint (Dec. 2024). arXiv

Abstract

Feature-based explanations, using perturbations or gradients, are a prevalent tool to understand decisions of black box machine learning models. Yet, differences between these methods still remain mostly unknown, which limits their applicability for practitioners. In this work, we introduce a unified framework for local and global feature-based explanations using two well-established concepts: functional ANOVA (fANOVA) from statistics, and the notion of value and interaction from cooperative game theory. We introduce three fANOVA decompositions that determine the influence of feature distributions, and use game-theoretic measures, such as the Shapley value and interactions, to specify the influence of higher-order interactions. Our framework combines these two dimensions to uncover similarities and differences between a wide range of explanation techniques for features and groups of features. We then empirically showcase the usefulness of our framework on synthetic and real-world datasets.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Julia Herbinger

Dr.

* Former Member

[72]

S. M. A. R. Thies, J. C. Alfaro and V. Bengs.
MORE–PLR: Multi-Output Regression Employed for Partial Label Ranking.
DS 2024 - 27th International Conference on Discovery Science. Pisa, Italy, Oct 14-16, 2024. DOI GitHub

Abstract

The partial label ranking (PLR) problem is a supervised learning scenario where the learner predicts a ranking with ties of the labels for a given input instance. It generalizes the well-known label ranking (LR) problem, which only allows for strict rankings. So far, pre-vious learning approaches for PLR have primarily adapted LR methods to accommodate ties in predictions. This paper proposes using multi-output regression (MOR) to address the PLR problem by treating ranking positions as multivariate targets, an approach that has received little attention in both LR and PLR. To effectively employ this approach, we introduce several post-hoc layers that convert MOR results into a ranking, potentially including ties. This framework produces a range of learning approaches, which we demonstrate in experimental evaluations to be competitive with the current state-of-the-art PLR methods.

MCML Authors

Viktor Bengs

Dr.

* Former Member

[71]

S. Haas, K. Hegestweiler, M. Rapp, M. Muschalik and E. Hüllermeier.
Stakeholder-centric explanations for black-box decisions: an XAI process model and its application to automotive goodwill assessments.
Frontiers in Artificial Intelligence 7 (Oct. 2024). DOI

Abstract

Machine learning has made tremendous progress in predictive performance in recent years. Despite these advances, employing machine learning models in high-stake domains remains challenging due to the opaqueness of many high-performance models. If their behavior cannot be analyzed, this likely decreases the trust in such models and hinders the acceptance of human decision-makers. Motivated by these challenges, we propose a process model for developing and evaluating explainable decision support systems that are tailored to the needs of different stakeholders. To demonstrate its usefulness, we apply the process model to a real-world application in an enterprise context. The goal is to increase the acceptance of an existing black-box model developed at a car manufacturer for supporting manual goodwill assessments. Following the proposed process, we conduct two quantitative surveys targeted at the application’s stakeholders. Our study reveals that textual explanations based on local feature importance best fit the needs of the stakeholders in the considered use case. Specifically, our results show that all stakeholders, including business specialists, goodwill assessors, and technical IT experts, agree that such explanations significantly increase their trust in the decision support system. Furthermore, our technical evaluation confirms the faithfulness and stability of the selected explanation method. These practical findings demonstrate the potential of our process model to facilitate the successful deployment of machine learning models in enterprise settings. The results emphasize the importance of developing explanations that are tailored to the specific needs and expectations of diverse stakeholders.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[70]

C. Damke and E. Hüllermeier.
CUQ-GNN: Committee-Based Graph Uncertainty Quantification Using Posterior Networks.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

In this work, we study the influence of domain-specific characteristics when defining a meaningful notion of predictive uncertainty on graph data. Previously, the so-called Graph Posterior Network (GPN) model has been proposed to quantify uncertainty in node classification tasks. Given a graph, it uses Normalizing Flows (NFs) to estimate class densities for each node independently and converts those densities into Dirichlet pseudo-counts, which are then dispersed through the graph using the personalized Page-Rank (PPR) algorithm. The architecture of GPNs is motivated by a set of three axioms on the properties of its uncertainty estimates. We show that those axioms are not always satisfied in practice and therefore propose the family of Committe-based Uncertainty Quantification Graph Neural Networks (CUQ-GNNs), which combine standard Graph Neural Networks (GNNs) with the NF-based uncertainty estimation of Posterior Networks (PostNets). This approach adapts more flexibly to domain-specific demands on the properties of uncertainty estimates. We compare CUQ-GNN against GPN and other uncertainty quantification approaches on common node classification benchmarks and show that it is effective at producing useful uncertainty estimates.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[69]

R. Fischer, M. Wever, S. Buschjäger and T. Liebig.
MetaQuRe: Meta-learning from Model Quality and Resource Consumption.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

Automated machine learning (AutoML) allows for selecting, parametrizing, and composing learning algorithms for a given data set. While resources play a pivotal role in neural architecture search, it is less pronounced by classical AutoML approaches. In fact, they generally focus on only maximizing predictive quality and disregard the importance of finding resource-efficient solutions. To push resource awareness further, our work explicitly explores how measures such as running time or energy consumption can be better considered in AutoML. Firstly, we propose a novel method for algorithm selection that balances multiple performance aspects (including resource demand) as prioritized by the user with the help of compositional meta-learning. Secondly, to foster research on green meta-learning and AutoML, we release the MetaQuRe data set, which contains information on predictive (Qu)ality and (Re)source consumption of models evaluated across hundreds of data sets and four execution environments. We use this data to put our methodology into practice and conduct an in-depth analysis of how our approach and data set can help in making AutoML more resource-aware, which represents our third contribution. Lastly, we publish MetaQuRe alongside an extensive code base, allowing for reproducing all results, expanding our data with results from custom environments, and exploring MetaQuRe interactively. In short, our work demonstrates both the importance as well as benefits of rethinking AutoML and meta-learning in a resource-aware way, thus paving the path for making future ML solutions more sustainable.

MCML Authors

Marcel Wever

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[68]

A. Vahidi, L. Wimmer, H. A. Gündüz, B. Bischl, E. Hüllermeier and M. Rezaei.
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members, which is challenging for large, over-parameterized deep neural networks. Moreover, ensemble learning has not yet seen such widespread adoption for unsupervised learning and it remains a challenging endeavor for self-supervised or unsupervised representation learning. Motivated by these challenges, we present a novel self-supervised training regime that leverages an ensemble of independent sub-networks, complemented by a new loss function designed to encourage diversity. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty, all achieved with minimal computational overhead compared to traditional deep self-supervised ensembles. To evaluate the effectiveness of our approach, we conducted extensive experiments across various tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, and semi-supervised settings. The results demonstrate that our method significantly improves prediction reliability. Our approach not only achieves excellent accuracy but also enhances calibration, improving on important baseline performance across a wide range of self-supervised architectures in computer vision, natural language processing, and genomics data.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Hüseyin Anil Gündüz

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Mina Rezaei

Dr.

Statistical Learning and Data Science

[67]

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Explaining Change in Models and Data with Global Feature Importance and Effects.
TempXAI @ECML-PKDD 2024 - Tutorial-Workshop Explainable AI for Time Series and Data Streams at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. PDF

Abstract

In dynamic machine learning environments, where data streams continuously evolve, traditional explanation methods struggle to remain faithful to the underlying model or data distribution. Therefore, this work presents a unified framework for efficiently computing incremental model-agnostic global explanations tailored for time-dependent models. By extending static model-agnostic methods such as Permutation Feature Importance, SAGE, and Partial Dependence Plots into the online learning context, the proposed framework enables the continuous updating of explanations as new data becomes available. These incremental variants ensure that global explanations remain relevant while minimizing computational overhead. The framework also addresses key challenges related to data distribution maintenance and perturbation generation in online learning, offering time and memory efficient solutions like geometric reservoir-based sampling for data replacement.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[66]

P. Kolpaczki, E. Hüllermeier and V. Bengs.
Piecewise-Stationary Dueling Bandits.
Transactions on Machine Learning Research (Sep. 2024). URL

Abstract

We study the piecewise-stationary dueling bandits problem with arms, where the time horizon consists of stationary segments, each of which is associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, despite preference matrices and segment lengths being unknown. We propose the Beat the Winner Reset algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-art algorithms. We also show a regret bound for the non-stationary case, without requiring knowledge of or . We further propose and analyze two meta-algorithms, DETECT for weak regret and Monitored Dueling Bandits for strong regret, both based on a detection-window approach that can incorporate any dueling bandit algorithm as a black-box algorithm. Finally, we prove a worst-case lower bound for expected weak regret in the non-stationary case.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

[65]

J. Brandt, M. Wever, V. Bengs and E. Hüllermeier.
Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO.
IJCAI 2024 - 33rd International Joint Conference on Artificial Intelligence. Jeju, Korea, Aug 03-09, 2024. DOI

Abstract

Hyperparameter optimization (HPO) is indispensable for achieving optimal performance in machine learning tasks. A popular class of methods in this regard is based on Successive Halving (SHA), which casts HPO into a pure-exploration multi-armed bandit problem under finite sampling budget constraints. This is accomplished by considering hyperparameter configurations as arms and rewards as the negative validation losses. While enjoying theoretical guarantees as well as working well in practice, SHA comes, however, with several hyperparameters itself, one of which is the maximum budget that can be allocated to evaluate a single arm (hyperparameter configuration). Although there are already solutions to this meta hyperparameter optimization problem, such as the doubling trick or asynchronous extensions of SHA, these are either practically inefficient or lack theoretical guarantees. In this paper, we propose incremental SHA (iSHA), a synchronous extension of SHA, allowing to increase the maximum budget a posteriori while still enjoying theoretical guarantees. Our empirical analysis of HPO problems corroborates our theoretical findings and shows that iSHA is more resource-efficient than existing SHA-based approaches.

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[64]

S. Heid, J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Learning decision catalogues for situated decision making: The case of scoring systems.
International Journal of Approximate Reasoning 171 (Aug. 2024). DOI

Abstract

In this paper, we formalize the problem of learning coherent collections of decision models, which we call decision catalogues, and illustrate it for the case where models are scoring systems. This problem is motivated by the recent rise of algorithmic decision-making and the idea to improve human decision-making through machine learning, in conjunction with the observation that decision models should be situated in terms of their complexity and resource requirements: Instead of constructing a single decision model and using this model in all cases, different models might be appropriate depending on the decision context. Decision catalogues are supposed to support a seamless transition from very simple, resource-efficient to more sophisticated but also more demanding models. We present a general algorithmic framework for inducing such catalogues from training data, which tackles the learning task as a problem of searching the space of candidate catalogues systematically and, to this end, makes use of heuristic search methods. We also present a concrete instantiation of this framework as well as empirical studies for performance evaluation, which, in a nutshell, show that greedy search is an efficient and hard-to-beat strategy for the construction of catalogues of scoring systems.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[63]

S. Dutta, T. Kaufmann, G. Glavaš, I. Habernal, K. Kersting, F. Kreuter, M. Mezini, I. Gurevych, E. Hüllermeier and H. Schütze.
Problem Solving Through Human-AI Preference-Based Cooperation.
Preprint (Aug. 2024). arXiv

Abstract

While there is a widespread belief that artificial general intelligence (AGI) – or even superhuman AI – is imminent, complex problems in expert domains are far from being solved. We argue that such problems require human-AI cooperation and that the current state of the art in generative AI is unable to play the role of a reliable partner due to a multitude of shortcomings, including difficulty to keep track of a complex solution artifact (e.g., a software program), limited support for versatile human preference expression and lack of adapting to human preference in an interactive setting. To address these challenges, we propose HAICo2, a novel human-AI co-construction framework. We take first steps towards a formalization of HAICo2 and discuss the difficult open research problems that it faces.

MCML Authors

Timo Kaufmann

C4 | Computational Social Sciences

Artificial Intelligence and Machine Learning

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

Eyke Hüllermeier

Prof. Dr.

B2 | Natural Language Processing

Artificial Intelligence and Machine Learning

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[62]

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and k-Shapley values (k-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Anne-Laure Boulesteix

Artificial Intelligence and Machine Learning

[61]

M. Herrmann, F. J. D. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix and B. Bischl.
Position: Why We Must Rethink Empirical Research in Machine Learning.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical ML research is fashioned as confirmatory research while it should rather be considered exploratory.

MCML Authors

Moritz Herrmann

Dr.

Transfer Coordinator

Biometry in Molecular Medicine

Giuseppe Casalicchio

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Marcel Wever

Dr.

* Former Member

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

[60]

Y. Sale, V. Bengs, M. Caprio and E. Hüllermeier.
Second-Order Uncertainty Quantification: A Distance-Based Approach.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

In the past couple of years, various approaches to representing and quantifying different types of predictive uncertainty in machine learning, notably in the setting of classification, have been proposed on the basis of second-order probability distributions, i.e., predictions in the form of distributions on probability distributions. A completely conclusive solution has not yet been found, however, as shown by recent criticisms of commonly used uncertainty measures associated with second-order distributions, identifying undesirable theoretical properties of these measures. In light of these criticisms, we propose a set of formal criteria that meaningful uncertainty measures for predictive uncertainty based on second-order distributions should obey. Moreover, we provide a general framework for developing uncertainty measures to account for these criteria, and offer an instantiation based on the Wasserstein distance, for which we prove that all criteria are satisfied.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[59]

X. Feng, Z. Jiang, T. Kaufmann, E. Hüllermeier, P. Weng and Y. Zhu.
Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries.
MHFAIA @ICML 2024 - Workshop on Models of Human Feedback for AI Alignment at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. Traditional methods with pairwise trajectory comparisons face challenges: trajectories with subtle differences are hard to compare, and comparisons are ordinal, limiting direct inference of preference strength. In this paper, we introduce the distinguishability query, where humans compare two pairs of trajectories and indicate which pair is easier to compare and then give preference feedback on the easier pair. This type of query directly infers preference strength and is expected to reduce cognitive load on the labeler. We also connect this query to cardinal utility and difference relations, and develop an efficient query selection scheme to achieve better trade-off between query informativeness and easiness. Experimental results empirically demonstrates the potential of our method for faster, data-efficient learning and improved user-friendliness on RLHF benchmarks.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[58]

P. Hofman, Y. Sale and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty: A Credal Approach.
SPIGM @ICML 2024 - Workshop on Structured Probabilistic Inference & Generative Modeling at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Uncertainty representation and quantification are paramount in machine learning, especially in safety-critical applications. In this paper, we propose a novel framework for the quantification of aleatoric and epistemic uncertainty based on the notion of credal sets, i.e., sets of probability distributions. Thus, we assume a learner that produces (second-order) predictions in the form of sets of probability distributions on outcomes. Practically, such an approach can be realized by means of ensemble learning: Given an ensemble of learners, credal sets are generated by including sufficiently plausible predictors, where plausibility is measured in terms of (relative) likelihood. We provide a formal justification for the framework and introduce new measures of epistemic and aleatoric uncertainty as concrete instantiations. We evaluate these measures both theoretically, by analysing desirable axiomatic properties, and empirically, by comparing them in terms of performance and effectiveness to existing measures of uncertainty in an experimental study.

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[57]

P. Kolpaczki, G. Haselbeck and E. Hüllermeier.
How Much Can Stratification Improve the Approximation of Shapley Values?
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Abstract

Over the last decade, the Shapley value has become one of the most widely applied tools to provide post-hoc explanations for black box models. However, its theoretically justified solution to the problem of dividing a collective benefit to the members of a group, such as features or data points, comes at a price. Without strong assumptions, the exponential number of member subsets excludes an exact calculation of the Shapley value. In search for a remedy, recent works have demonstrated the efficacy of approximations based on sampling with stratification, in which the sample space is partitioned into smaller subpopulations. The effectiveness of this technique mainly depends on the degree to which the allocation of available samples over the formed strata mirrors their unknown variances. To uncover the hypothetical potential of stratification, we investigate the gap in approximation quality caused by the lack of knowledge of the optimal allocation. Moreover, we combine recent advances to propose two state-of-the-art algorithms Adaptive SVARM and Continuous Adaptive SVARM that adjust the sample allocation on-the-fly. The potential of our approach is assessed in an empirical evaluation.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[56]

C. Damke and E. Hüllermeier.
Linear Opinion Pooling for Uncertainty Quantification on Graphs.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL GitHub

Abstract

We address the problem of uncertainty quantification for graph-structured data, or, more specifically, the problem to quantify the predictive uncertainty in (semi-supervised) node classification. Key questions in this regard concern the distinction between two different types of uncertainty, aleatoric and epistemic, and how to support uncertainty quantification by leveraging the structural information provided by the graph topology. Challenging assumptions and postulates of state-of-the-art methods, we propose a novel approach that represents (epistemic) uncertainty in terms of mixtures of Dirichlet distributions and refers to the established principle of linear opinion pooling for propagating information between neighbored nodes in the graph. The effectiveness of this approach is demonstrated in a series of experiments on a variety of graph-structured datasets.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[55]

Y. Sale, P. Hofman, T. Löhr, L. Wimmer, T. Nagler and E. Hüllermeier.
Label-wise Aleatoric and Epistemic Uncertainty Quantification.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL

Abstract

We present a novel approach to uncertainty quantification in classification tasks based on label-wise decomposition of uncertainty measures. This label-wise perspective allows uncertainty to be quantified at the individual class level, thereby improving cost-sensitive decision-making and helping understand the sources of uncertainty. Furthermore, it allows to define total, aleatoric, and epistemic uncertainty on the basis of non-categorical measures such as variance, going beyond common entropy-based measures. In particular, variance-based measures address some of the limitations associated with established methods that have recently been discussed in the literature. We show that our proposed measures adhere to a number of desirable properties. Through empirical evaluation on a variety of benchmark data sets – including applications in the medical domain where accurate uncertainty quantification is crucial – we establish the effectiveness of label-wise uncertainty quantification.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[54]

T. Löhr, M. Ingrisch and E. Hüllermeier.
Towards Aleatoric and Epistemic Uncertainty in Medical Image Classification.
AIME 2024 - 22nd International Conference on Artificial Intelligence in Medicine. Salt Lake City, UT, USA, Jul 09-12, 2024. DOI

Abstract

Medical domain applications require a detailed understanding of the decision making process, in particular when data-driven modeling via machine learning is involved, and quantifying uncertainty in the process adds trust and interpretability to predictive models. However, current uncertainty measures in medical imaging are mostly monolithic and do not distinguish between different sources and types of uncertainty. In this paper, we advocate the distinction between so-called aleatoric and epistemic uncertainty in the medical domain and illustrate its potential in clinical decision making for the case of PET/CT image classification.

MCML Authors

Michael Ingrisch

Prof. Dr.

C1 | Medicine

Clinical Data Science in Radiology

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[53]

A. Javanmardi, O. K. Aimiyekagbon, A. Bender, J. K. Kimotho, W. Sextro and E. Hüllermeier.
Remaining Useful Lifetime Estimation of Bearings Operating under Time-Varying Conditions.
PHME 2024 - 8th European Conference of the Prognostics and Health Management Society 2024. Prague, Czech Republic, Jul 03-05, 2024. DOI

Abstract

This paper investigates the remaining useful lifetime (RUL) estimation of bearings under dynamic, i.e., time-varying, operating conditions (OC). Unlike conventional studies that assume constant OC in bearing accelerated life tests, we introduce a dataset with time-varying OC during run-to-failure experiments, simulating real-world scenarios. We explore data-driven approaches to identify the transition point from a healthy to an unhealthy state and estimate the RUL. Additionally, we examine strategies for integrating OC information to enhance RUL estimations. These methodologies are evaluated through numerical experiments using various machine learning algorithms.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[52]

J. Brandt, B. Haddenhorst, V. Bengs and E. Hüllermeier.
Dueling Bandits with Delayed Feedback.
DataNinja sAIOnARA 2024 - DataNinja sAIOnARA Conference: Shaping Trustworthy AI: Opportunities, Innovation and Achievements for Reliable Approaches. Bielefeld, Germany, Jun 25-27, 2024. DOI

Abstract

Dueling Bandits is a well-studied extension of the Multi-Armed Bandits problem, in which the learner must select two arms in each time step and receives a binary feedback as an outcome of the chosen duel. However, all of the existing best arm identification algorithms for the Dueling Bandits setting assume that the feedback can be observed immediately after selecting the two arms. If this is not the case, the algorithms simply do nothing and wait until the feedback of the recent duel can be observed, which is a waste of runtime. We propose an algorithm that can already start a new duel even if the previous one is not finished and thus is much more time efficient. Our arm selection strategy balances the expected information gain of the chosen duel and the expected delay until we observe the feedback. By theoretically grounded confidence bounds we can ensure that the arms we discard are not the best arms with high probability.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[51]

E. Hüllermeier.
On the Challenge of Quantifying Epistemic Uncertainty in Machine Learning.
SIPTA - The Society for Imprecise Probabilities: Theories and Applications. Virtual, Jun 14, 2024. Invited Talk. PDF

Abstract

n/a

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[50]

S. Haas and E. Hüllermeier.
Conformalized prescriptive machine learning for uncertainty-aware automated decision making: the case of goodwill requests.
International Journal of Data Science and Analytics (Jun. 2024). DOI

Abstract

Due to the inherent presence of uncertainty in machine learning (ML) systems, the usage of ML is until now out of scope for many critical (financial) business processes. One such process is goodwill assessment at car manufacturers, where a large part of goodwill cases is still assessed manually by human experts. To increase the degree of automation while still providing an overall reliable assessment service, we propose a selective uncertainty-aware automated decision making approach based on uncertainty quantification through conformal prediction. In our approach, goodwill requests are still shifted to human experts in case the risk of a wrong assessment is too high. Nevertheless, ML can be introduced into the process with reduced and controllable risk. We hereby determine the risk of wrong ML assessments through two hierarchical conformal predictors that make use of the prediction set and interval size as the main criteria for quantifying uncertainty. We also utilize conformal prediction’s property to output empty prediction sets if no prediction is significant enough and abstain from an automatic decision in that case. Instead of providing mathematical guarantees for limited risk, we focus on the risk vs. degree of automation trade-off and how a business decision maker can select in an a posteriori fashion a trade-off that best suits the business problem at hand from a set of pareto optimal solutions. We also show empirically on a goodwill data set of a BMW National Sales Company that by only selecting certain requests for automated decision making we can significantly increase the accuracy of automatically processed requests. For instance, from 92 to 98% for labor and from 90 to 98% for parts contributions respectively, while still maintaining a degree of automation of approximately 70%.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[49]

T. Kaufmann, J. Blüml, A. Wüst, Q. Delfosse, K. Kersting and E. Hüllermeier.
OCALM: Object-Centric Assessment with Language Models.
Preprint (Jun. 2024). arXiv

Abstract

Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which a desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to directly provide rewards are promising alternatives, allowing non-experts to specify goals for the agent. However, black-box reward models make it difficult to debug the reward. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for RL agents from natural language task descriptions. OCALM uses the extensive world-knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, providing RL agents with the ability to derive policies from task descriptions.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[48]

V. Margraf, M. Wever, S. Gilhuber, G. M. Tavares, T. Seidl and E. Hüllermeier.
ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data.
Preprint (Jun. 2024). arXiv GitHub

Abstract

In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled, aiming to enhance learning algorithms’ efficiency and performance. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing the performance of different query strategies. This particularly holds for the combination of query strategies with different learning algorithms into active learning pipelines and examining the impact of the learning algorithm choice. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure evaluations are done reproducibly, saving exact dataset splits and hyperparameter settings of used algorithms. In total, ALPBench consists of 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings.

MCML Authors

Valentin Margraf

Artificial Intelligence and Machine Learning

Marcel Wever

Dr.

A3 | Computational Models
→ Group Thomas Seidl

* Former Member

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Gabriel Marques Tavares

Dr.

A3 | Computational Models
→ Group Thomas Seidl

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

[47]

A. Vahidi, S. Schosser, L. Wimmer, Y. Li, B. Bischl, E. Hüllermeier and M. Rezaei.
Probabilistic Self-supervised Representation Learning via Scoring Rules Minimization.
ICLR 2024 - 12th International Conference on Learning Representations. Vienna, Austria, May 07-11, 2024. URL GitHub

Abstract

In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN’s convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method’s optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yawei Li

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Mina Rezaei

Dr.

Statistical Learning and Data Science

[46]

V. Bengs, B. Haddenhorst and E. Hüllermeier.
Identifying Copeland Winners in Dueling Bandits with Indifferences.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[45]

P. Kolpaczki, M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
SVARM-IQ: Efficient Approximation of Any-order Shapley Interactions through Stratification.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Abstract

Addressing the limitations of individual attribution scores via the Shapley value (SV), the field of explainable AI (XAI) has recently explored intricate interactions of features or data points. In particular, extensions of the SV, such as the Shapley Interaction Index (SII), have been proposed as a measure to still benefit from the axiomatic basis of the SV. However, similar to the SV, their exact computation remains computationally prohibitive. Hence, we propose with SVARM-IQ a sampling-based approach to efficiently approximate Shapley-based interaction indices of any order. SVARM-IQ can be applied to a broad class of interaction indices, including the SII, by leveraging a novel stratified representation. We provide non-asymptotic theoretical guarantees on its approximation quality and empirically demonstrate that SVARM-IQ achieves state-of-the-art estimation results in practical XAI scenarios on different model classes and application domains.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[44]

P. Hofman, Y. Sale and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty with Proper Scoring Rules.
Preprint (Apr. 2024). arXiv

Abstract

Uncertainty representation and quantification are paramount in machine learning and constitute an important prerequisite for safety-critical applications. In this paper, we propose novel measures for the quantification of aleatoric and epistemic uncertainty based on proper scoring rules, which are loss functions with the meaningful property that they incentivize the learner to predict ground-truth (conditional) probabilities. We assume two common representations of (epistemic) uncertainty, namely, in terms of a credal set, i.e. a set of probability distributions, or a second-order distribution, i.e., a distribution over probability distributions. Our framework establishes a natural bridge between these representations. We provide a formal justification of our approach and introduce new measures of epistemic and aleatoric uncertainty as concrete instantiations.

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[43]

J. Rodemann, F. Croppi, P. Arens, Y. Sale, J. Herbinger, B. Bischl, E. Hüllermeier, T. Augustin, C. J. Walsh and G. Casalicchio.
Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration.
Preprint (Mar. 2024). arXiv

Abstract

Bayesian optimization (BO) with Gaussian processes (GP) has become an indispensable algorithm for black box optimization problems. Not without a dash of irony, BO is often considered a black box itself, lacking ways to provide reasons as to why certain parameters are proposed to be evaluated. This is particularly relevant in human-in-the-loop applications of BO, such as in robotics. We address this issue by proposing ShapleyBO, a framework for interpreting BO’s proposals by game-theoretic Shapley this http URL quantify each parameter’s contribution to BO’s acquisition function. Exploiting the linearity of Shapley values, we are further able to identify how strongly each parameter drives BO’s exploration and exploitation for additive acquisition functions like the confidence bound. We also show that ShapleyBO can disentangle the contributions to exploration into those that explore aleatoric and epistemic uncertainty. Moreover, our method gives rise to a ShapleyBO-assisted human machine interface (HMI), allowing users to interfere with BO in case proposals do not align with human reasoning. We demonstrate this HMI’s benefits for the use case of personalizing wearable robotic devices (assistive back exosuits) by human-in-the-loop BO. Results suggest human-BO teams with access to ShapleyBO can achieve lower regret than teams without.

MCML Authors

Yusuf Sale

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Julia Herbinger

Dr.

* Former Member

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Giuseppe Casalicchio

Dr.

Statistical Learning and Data Science

[42]

P. Kolpaczki, V. Bengs, M. Muschalik and E. Hüllermeier.
Approximating the Shapley Value without Marginal Contributions.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of them revolve around the notion of an agent’s marginal contribution. In this paper, we propose with SVARM and Stratified SVARM two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results including synthetic games as well as common explainability use cases comparing ourselves with state-of-the-art methods.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[41]

J. Lienen and E. Hüllermeier.
Mitigating Label Noise through Data Ambiguation.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

Label noise poses an important challenge in machine learning, especially in deep learning, in which large models with high expressive power dominate the field. Models of that kind are prone to memorizing incorrect labels, thereby harming generalization performance. Many methods have been proposed to address this problem, including robust loss functions and more complex label correction approaches. Robust loss functions are appealing due to their simplicity, but typically lack flexibility, while label correction usually adds substantial complexity to the training setup. In this paper, we suggest to address the shortcomings of both methodologies by ‘ambiguating’ the target information, adding additional, complementary candidate labels in case the learner is not sufficiently convinced of the observed training label. More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold, which deliver imprecise yet more reliable beliefs about the ground-truth, effectively helping the learner to suppress the memorization effect. In an extensive empirical evaluation, our method demonstrates favorable learning behavior on synthetic and real-world noise, confirming the effectiveness in detecting and correcting erroneous training labels.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[40]

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[39]

E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part I.
4OR (Jan. 2024). DOI

Abstract

Multiple criteria decision aiding (MCDA) and preference learning (PL) are established research fields, which have different roots, developed in different communities – the former in the decision sciences and operations research, the latter in AI and machine learning – and have their own agendas in terms of problem setting, assumptions, and criteria of success. In spite of this, they share the major goal of constructing practically useful decision models that either support humans in the task of choosing the best, classifying, or ranking alternatives from a given set, or even automate decision-making by acting autonomously on behalf of the human. Therefore, MCDA and PL can complement and mutually benefit from each other, a potential that has been exhausted only to some extent so far. By elaborating on the connection between MCDA and PL in more depth, our goal is to stimulate further research at the junction of these two fields. To this end, we first review both methodologies, MCDA in this part of the paper and PL in the second part, with the intention of highlighting their most common elements. In the second part, we then compare both methodologies in a systematic way and give an overview of existing work on combining PL and MCDA.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[38]

E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part II.
4OR (Jan. 2024). DOI

Abstract

This article elaborates on the connection between multiple criteria decision aiding (MCDA) and preference learning (PL), two research fields with different roots and developed in different communities. It complements the first part of the paper, in which we started with a review of MCDA. In this part, a similar review will be given for PL, followed by a systematic comparison of both methodologies, as well as an overview of existing work on combining PL and MCDA. Our main goal is to stimulate further research at the junction of these two methodologies.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

2023

[37]

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
SHAP-IQ: Unified Approximation of any-order Shapley Interactions.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

Predominately in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axiom. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[36]

T. Kaufmann, P. Weng, V. Bengs and E. Hüllermeier.
A Survey of Reinforcement Learning from Human Feedback.
Preprint (Dec. 2023). arXiv

Abstract

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model’s capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[35]

Y. Sale, P. Hofman, L. Wimmer, E. Hüllermeier and T. Nagler.
Second-Order Uncertainty Quantification: Variance-Based Measures.
Preprint (Dec. 2023). arXiv

Abstract

Uncertainty quantification is a critical aspect of machine learning models, providing important insights into the reliability of predictions and aiding the decision-making process in real-world applications. This paper proposes a novel way to use variance-based measures to quantify uncertainty on the basis of second-order distributions in classification problems. A distinctive feature of the measures is the ability to reason about uncertainties on a class-based level, which is useful in situations where nuanced decision-making is required. Recalling some properties from the literature, we highlight that the variance-based measures satisfy important (axiomatic) properties. In addition to this axiomatic approach, we present empirical results showing the measures to be effective and competitive to commonly used entropy-based measures.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

[34]

J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Probabilistic Scoring Lists for Interpretable Machine Learning.
DS 2023 - 26th International Conference on Discovery Science. Porto, Portugal, Oct 09-11, 2023. DOI

Abstract

A scoring system is a simple decision model that checks a set of features, adds a certain number of points to a total score for each feature that is satisfied, and finally makes a decision by comparing the total score to a threshold. Scoring systems have a long history of active use in safety-critical domains such as healthcare and justice, where they provide guidance for making objective and accurate decisions. Given their genuine interpretability, the idea of learning scoring systems from data is obviously appealing from the perspective of explainable AI. In this paper, we propose a practically motivated extension of scoring systems called probabilistic scoring lists (PSL), as well as a method for learning PSLs from data. Instead of making a deterministic decision, a PSL represents uncertainty in the form of probability distributions. Moreover, in the spirit of decision lists, a PSL evaluates features one by one and stops as soon as a decision can be made with enough confidence. To evaluate our approach, we conduct a case study in the medical domain.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[33]

J. Brandt, E. Schede, S. Sharma, V. Bengs, E. Hüllermeier and K. Tierney.
Contextual Preselection Methods in Pool-based Realtime Algorithm Configuration.
LWDA 2023 - Conference on Lernen. Wissen. Daten. Analysen. Marburg, Germany, Oct 09-11, 2023. PDF

Abstract

Realtime algorithm configuration is concerned with the task of designing a dynamic algorithm configurator that observes sequentially arriving problem instances of an algorithmic problem class for which it selects suitable algorithm configurations (e.g., minimal runtime) of a specific target algorithm. The Contextual Preselection under the Plackett-Luce (CPPL) algorithm maintains a pool of configurations from which a set of algorithm configurations is selected that are run in parallel on the current problem instance. It uses the well-known UCB selection strategy from the bandit literature, while the pool of configurations is updated over time via a racing mechanism. In this paper, we investigate whether the performance of CPPL can be further improved by using different bandit-based selection strategies as well as a ranking-based strategy to update the candidate pool. Our experimental results show that replacing these components can indeed improve performance again significantly.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[32]

J. Hanselle, J. Kornowicz, S. Heid, K. Thommes and E. Hüllermeier.
Comparing Humans and Algorithms in Feature Ranking: A Case-Study in the Medical Domain.
LWDA 2023 - Conference on Lernen. Wissen. Daten. Analysen. Marburg, Germany, Oct 09-11, 2023. PDF

Abstract

The selection of useful, informative, and meaningful features is a key prerequisite for the successful application of machine learning in practice, especially in knowledge-intense domains like decision support. Here, the task of feature selection, or ranking features by importance, can, in principle, be solved automatically in a data-driven way but also supported by expert knowledge. Besides, one may of course, conceive a combined approach, in which a learning algorithm closely interacts with a human expert. In any case, finding an optimal approach requires a basic understanding of human capabilities in judging the importance of features compared to those of a learning algorithm. Hereto, we conducted a case study in the medical domain, comparing feature rankings based on human judgment to rankings automatically derived from data. The quality of a ranking is determined by the performance of a decision list processing features in the order specified by the ranking, more specifically by so-called probabilistic scoring systems.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[31]

P. Becker and V. Bengs.
Shapley-Based Feature Selection for Online Algorithm Selection.
DynXAI @ECML-PKDD 2023 - Workshop on Explainable Artificial Intelligence: From Static to Dynamic at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Online algorithm selection concerns the task of designing a dynamic algorithm selector that observes sequentially arriving problem instances of an algorithmic problem class for which it must select a suitable algorithm from a given pool of candidate algorithms. Typically the suitability of a candidate algorithm is determined by its average runtime for solving a problem instance. Recent work has shown that multi-armed bandit algorithms can be leveraged for specifying a suitable algorithm selector by taking available feature information of problem instances into account. In this paper, we investigate whether the performance of these bandit-based selection strategies can be further improved by incorporating feature selection. To this end, we use the concept of Shapley values from cooperative game theory to specify the contribution of the features with respect to the suitability of the candidate algorithms and adapt the bandit-based selection strategies to consider only features with the highest contribution. We present two different Shapley value-based approaches and show empirically that UCB-based bandit selection strategies can be improved, while Thompson sampling-based strategies actually deteriorate in terms of average runtime.

MCML Authors

Viktor Bengs

Dr.

* Former Member

[30]

S. Haas and E. Hüllermeier.
Rectifying Bias in Ordinal Observational Data Using Unimodal Label Smoothing.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

This paper proposes a novel approach for modeling observational data in the form of expert ratings, which are commonly given on an ordered (numerical or ordinal) scale. In practice, such ratings are often biased, due to the expert’s preferences, psychological effects, etc. Our approach aims to rectify these biases, thereby preventing machine learning methods from transferring them to models trained on the data. To this end, we make use of so-called label smoothing, which allows for redistributing probability mass from the originally observed rating to other ratings, which are considered as possible corrections. This enables the incorporation of domain knowledge into the standard cross-entropy loss and leads to flexibly configurable models. Concretely, our method is realized for ordinal ratings and allows for arbitrary unimodal smoothings using a binary smoothing relation. Additionally, the paper suggests two practically motivated smoothing heuristics to address common biases in observational data, a time-based smoothing to handle concept drift and a class-wise smoothing based on class priors to mitigate data imbalance. The effectiveness of the proposed methods is demonstrated on four real-world goodwill assessment data sets of a car manufacturer with the aim of automating goodwill decisions. Overall, this paper presents a promising approach for modeling ordinal observational data that can improve decision-making processes and reduce reliance on human expertise.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[29]

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[28]

E. Terzieva, M. Muschalik, P. Hofman and E. Hüllermeier.
Identifying Trends in Feature Attributions During Training of Neural Networks.
ECML-PKDD 2023 - Workshop Uncertainty meets Explainability in Machine Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Abstract

This study investigates the evolving dynamics of commonly used feature attribution (FA) values during training of neural networks. As models transition from a state of high uncertainty to low uncertainty, we show that the features’ significance also changes, which is inline with the general learning theory of deep neural networks. During model training, we compute FA scores through Layer-wise Relevance Propagation (LRP) and Gradient-weighted Class Activation Mapping (Grad-CAM), which are selected for their efficiency and speed of computation. We summarize the attribution scores in terms of the sum of the absolute values of FA scores and their entropy. We further analyze these summary scores in relation to the models’ generalization capabilities. The analysis identifies trends where FA values increase in magnitude while entropy decreases during the training process, regardless of model generalization, suggesting independence of overfitting. This research offers a unique view on the application of FA methods in explainable artificial intelligence (XAI) and raises intriguing questions about their behavior across varying model architectures and datasets, which may have implications for future work combining XAI and uncertainty estimation in machine learning.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[27]

T. Kaufmann, S. Ball, J. Beck, E. Hüllermeier and F. Kreuter.
On the challenges and practices of reinforcement learning from real human feedback.
HLDM @ECML-PKDD 2023 - 1st Workshop on Hybrid Human-Machine Learning and Decision Making at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Abstract

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulties. We describe and discuss some of these challenges together with current practices and opportunities for further research in this paper.

MCML Authors

Timo Kaufmann

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Artificial Intelligence and Machine Learning

Sarah Ball

Social Data Science and AI

Jacob Beck

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Social Data Science and AI

Eyke Hüllermeier

Prof. Dr.

C4 | Computational Social Sciences

Artificial Intelligence and Machine Learning

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

[26]

A. Javanmardi, Y. Sale, P. Hofman and E. Hüllermeier.
Conformal Prediction with Partially Labeled Data.
COPA 2023 - 12th Symposium on Conformal and Probabilistic Prediction with Applications. Limassol, Cyprus, Sep 13-15, 2023. URL

Abstract

While the predictions produced by conformal prediction are set-valued, the data used for training and calibration is supposed to be precise. In the setting of superset learning or learning from partial labels, a variant of weakly supervised learning, it is exactly the other way around: training data is possibly imprecise (set-valued), but the model induced from this data yields precise predictions. In this paper, we combine the two settings by making conformal prediction amenable to set-valued training data. We propose a generalization of the conformal prediction procedure that can be applied to set-valued training and calibration data. We prove the validity of the proposed method and present experimental studies in which it compares favorably to natural baselines.

MCML Authors

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[25]

M. Caprio, Y. Sale, E. Hüllermeier and I. Lee.
A Novel Bayes' Theorem for Upper Probabilities.
Epi UAI 2023 - International Workshop on Epistemic Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Aug 04, 2023. DOI

Abstract

In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes’ posterior probability of a measurable set A, when the prior lies in a class of probability measures and the likelihood is precise. They also give a sufficient condition for such upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential of being applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[24]

S. Henzgen and E. Hüllermeier.
Weighting by Tying: A New Approach to Weighted Rank Correlation.
Preprint (Aug. 2023). arXiv

Abstract

Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall’s tau and Spearman’s rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those on the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighing scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal’s gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighing rank positions in a flexible way.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[23]

Y. Sale, M. Caprio and E. Hüllermeier.
Is the Volume of a Credal Set a Good Measure for Epistemic Uncertainty?
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Abstract

Adequate uncertainty representation and quantification have become imperative in various scientific disciplines, especially in machine learning and artificial intelligence. As an alternative to representing uncertainty via one single probability measure, we consider credal sets (convex sets of probability measures). The geometric representation of credal sets as d-dimensional polytopes implies a geometric intuition about (epistemic) uncertainty. In this paper, we show that the volume of the geometric representation of a credal set is a meaningful measure of epistemic uncertainty in the case of binary classification, but less so for multi-class classification. Our theoretical findings highlight the crucial role of specifying and employing uncertainty measures in machine learning in an appropriate way, and for being aware of possible pitfalls.

MCML Authors

Yusuf Sale

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Artificial Intelligence and Machine Learning

[22]

L. Wimmer, Y. Sale, P. Hofman, B. Bischl and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Abstract

The quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, has recently become quite common in machine learning. While the properties of these measures, which are rooted in information theory, seem appealing at first glance, we identify various incoherencies that call their appropriateness into question. In addition to the measures themselves, we critically discuss the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents. Experiments across different computer vision tasks support our theoretical findings and raise concerns about current practice in uncertainty quantification.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yusuf Sale

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[21]

M. K. Belaid, R. Bornemann, M. Rabus, R. Krestel and E. Hüllermeier.
Compare-xAI: Toward Unifying Functional Testing Methods for Post-hoc XAI Algorithms into a Multi-dimensional Benchmark.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. DOI GitHub

Abstract

In recent years, Explainable AI (xAI) attracted a lot of attention as various countries turned explanations into a legal right. xAI algorithms enable humans to understand the underlying models and explain their behavior, leading to insights through which the models can be analyzed and improved beyond the accuracy metric by, e.g., debugging the learned pattern and reducing unwanted biases. However, the widespread use of xAI and the rapidly growing body of published research in xAI have brought new challenges. A large number of xAI algorithms can be overwhelming and make it difficult for practitioners to choose the correct xAI algorithm for their specific use case. This problem is further exacerbated by the different approaches used to assess novel xAI algorithms, making it difficult to compare them to existing methods. To address this problem, we introduce Compare-xAI, a benchmark that allows for a direct comparison of popular xAI algorithms with a variety of different use cases. We propose a scoring protocol employing a range of functional tests from the literature, each targeting a specific end-user requirement in explaining a model. To make the benchmark results easily accessible, we group the tests into four categories (fidelity, fragility, stability, and stress tests). We present results for 13 xAI algorithms based on 11 functional tests. After analyzing the findings, we derive potential solutions for data science practitioners as workarounds to the found practical limitations. Finally, Compare-xAI is a tentative to unify systematic evaluation and comparison methods for xAI algorithms with a focus on the end-user’s requirements.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[20]

M. Muschalik, F. Fumagalli, R. Jagtani, B. Hammer and E. Hüllermeier.
iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. Best Paper Award. DOI

Abstract

Post-hoc explanation techniques such as the well-established partial dependence plot (PDP), which investigates feature dependencies, are used in explainable artificial intelligence (XAI) to understand black-box machine learning models. While many real-world applications require dynamic models that constantly adapt over time and react to changes in the underlying distribution, XAI, so far, has primarily considered static learning environments, where models are trained in a batch mode and remain unchanged. We thus propose a novel model-agnostic XAI framework called incremental PDP (iPDP) that extends on the PDP to extract time-dependent feature effects in non-stationary learning environments. We formally analyze iPDP and show that it approximates a time-dependent variant of the PDP that properly reacts to real and virtual concept drift. The time-sensitivity of iPDP is controlled by a single smoothing parameter, which directly corresponds to the variance and the approximation error of iPDP in a static learning environment. We illustrate the efficacy of iPDP by showcasing an example application for drift detection and conducting multiple experiments on real-world and synthetic data sets and streams.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[19]

V. Bengs, E. Hüllermeier and W. Waegeman.
On Second-Order Scoring Rules for Epistemic Uncertainty Quantification.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

Abstract

It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal to let the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As a main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[18]

M. Wever, M. Özdogan and E. Hüllermeier.
Cooperative Co-Evolution for Ensembles of Nested Dichotomies for Multi-Class Classification.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

Abstract

In multi-class classification, it can be beneficial to decompose a learning problem into several simpler problems. One such reduction technique is the use of so-called nested dichotomies, which recursively bisect the set of possible classes such that the resulting subsets can be arranged in the form of a binary tree, where each split defines a binary classification problem. Recently, a genetic algorithm for optimizing the structure of such nested dichotomies has achieved state-of-the-art results. Motivated by its success, we propose to extend this approach using a co-evolutionary scheme to optimize both the structure of nested dichotomies and their composition into ensembles through which they are evaluated. Furthermore, we present an experimental study showing this approach to yield ensembles of nested dichotomies at substantially lower cost and, in some cases, even with an improved generalization performance.

MCML Authors

Marcel Wever

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[17]

T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever and E. Hüllermeier.
Towards Green Automated Machine Learning: Status Quo and Future Directions.
Journal of Artificial Intelligence Research 77 (Jun. 2023). DOI

Abstract

Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution — a machine learning pipeline — tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticized for its high resource consumption as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as the expensive large-scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. Therefore, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool w.r.t. their “greenness”, i.e., sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community in a more sustainable AutoML research direction. As part of this, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Marcel Wever

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[16]

A.-K. Wickert, C. Damke, L. Baumgärtner, E. Hüllermeier and M. Mezini.
UnGoML: Automated Classification of unsafe Usages in Go.
MSR 2023 - IEEE/ACM 20th International Conference on Mining Software Repositories. Melbourne, Australia, May 15-16, 2023. FOSS (Free, Open Source Software) Impact Paper Award. DOI GitHub

Abstract

The Go programming language offers strong protection from memory corruption. As an escape hatch of these protections, it provides the unsafe package. Previous studies identified that this unsafe package is frequently used in real-world code for several purposes, e.g., serialization or casting types. Due to the variety of these reasons, it may be possible to refactor specific usages to avoid potential vulnerabilities. However, the classification of unsafe usages is challenging and requires the context of the call and the program’s structure. In this paper, we present the first automated classifier for unsafe usages in Go, UnGoML, to identify what is done with the unsafe package and why it is used. For UnGoML, we built four custom deep learning classifiers trained on a manually labeled data set. We represent Go code as enriched control-flow graphs (CFGs) and solve the label prediction task with one single-vertex and three context-aware classifiers. All three context-aware classifiers achieve a top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore, in a set-valued conformal prediction setting, we achieve accuracies of more than 93% with mean label set sizes of 2 for both dimensions. Thus, UnGoML can be used to efficiently filter unsafe usages for use cases such as refactoring or a security audit.

MCML Authors

Clemens Damke

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[15]

D. Schubert, P. Gupta and M. Wever.
Meta-learning for Automated Selection of Anomaly Detectors for Semi-supervised Datasets.
IDA 2023 - 21st International Symposium on Intelligent Data Analysis. Louvain-la-Neuve, Belgium, Apr 12-14, 2023. DOI

Abstract

In anomaly detection, a prominent task is to induce a model to identify anomalies learned solely based on normal data. Generally, one is interested in finding an anomaly detector that correctly identifies anomalies, i.e., data points that do not belong to the normal class, without raising too many false alarms. Which anomaly detector is best suited depends on the dataset at hand and thus needs to be tailored. The quality of an anomaly detector may be assessed via confusion-based metrics such as the Matthews correlation coefficient (MCC). However, since during training only normal data is available in a semi-supervised setting, such metrics are not accessible. To facilitate automated machine learning for anomaly detectors, we propose to employ meta-learning to predict MCC scores using the metrics that can be computed with normal data only and order anomaly detectors using the predicted scores for selection. First promising results can be obtained considering the hypervolume and the false positive rate as meta-features.

MCML Authors

Marcel Wever

Dr.

* Former Member

[14]

T. Tornede, A. Tornede, L. Fehring, L. Gehring, H. Graf, J. Hanselle, F. Mohr and M. Wever.
PyExperimenter: Easily distribute experiments and track results.
The Journal of Open Source Software 8.86 (Apr. 2023). DOI

Abstract

PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms and in particular is designed to reduce the involved manual effort significantly. It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
The empirical analysis of algorithms is often accompanied by the execution of algorithms for different inputs and variants of the algorithms, specified via parameters, and the measurement of non-functional properties. Since the individual evaluations are usually independent, the evaluation can be performed in a distributed manner on an HPC system. However, setting up, documenting, and evaluating the results of such a study is often file-based. Usually, this requires extensive manual work to create configuration files for the inputs or to read and aggregate measured results from a report file. In addition, monitoring and restarting individual executions is tedious and time-consuming.
PyExperimenter adresses theses challenges by means of a single well defined configuration file and a central database for managing massively parallel evaluations, as well as collecting and aggregating their results. Thereby, PyExperimenter alleviates the aforementioned overhead and allows experiment executions to be defined and monitored with ease.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Marcel Wever

Dr.

* Former Member

[13]

M. K. Belaid, D. E. Mekki, M. Rabus and E. Hüllermeier.
Optimizing Data Shapley Interaction Calculation from $O(2^n)$ to $O(t n^2)$ for KNN models.
Preprint (Apr. 2023). arXiv

Abstract

With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial process in the field of artificial intelligence. The Shapley values have been recognized as an effective method for data valuation, enabling efficient training set summarization, acquisition, and outlier removal. In this paper, we introduce ‘STI-KNN’, an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in $O(t n^2)$ time, which is a significant improvement over the $O(2^n)$ time complexity of baseline methods. By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[12]

E. Hüllermeier.
Representation of Quantification of Uncertainty in Machine Learning.
TRR 165/181 2023 - Scale interactions, data-driven modeling, and uncertainty in weather and climate. Ingolstadt, Germany, Mar 27-30, 2023. Invited Talk. PDF

Abstract

n/a

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[11]

J. Brandt, E. Schede, B. Haddenhorst, V. Bengs, E. Hüllermeier and K. Tierney.
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration.
AAAI 2023 - 37th Conference on Artificial Intelligence. Washington, DC, USA, Feb 07-14, 2023. DOI

Abstract

We study the algorithm configuration (AC) problem, in which one seeks to find an optimal parameter configuration of a given target algorithm in an automated way. Although this field of research has experienced much progress recently regarding approaches satisfying strong theoretical guarantees, there is still a gap between the practical performance of these approaches and the heuristic state-of-the-art approaches. Recently, there has been significant progress in designing AC approaches that satisfy strong theoretical guarantees. However, a significant gap still remains between the practical performance of these approaches and state-of-the-art heuristic methods. To this end, we introduce AC-Band, a general approach for the AC problem based on multi-armed bandits that provides theoretical guarantees while exhibiting strong practical performance. We show that AC-Band requires significantly less computation time than other AC approaches providing theoretical guarantees while still yielding high-quality configurations.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[10]

J. Brandt, M. Wever, D. Iliadis, V. Bengs and E. Hüllermeier.
Iterative Deepening Hyperband.
Preprint (Feb. 2023). arXiv

Abstract

Hyperparameter optimization (HPO) is concerned with the automated search for the most appropriate hyperparameter configuration (HPC) of a parameterized machine learning algorithm. A state-of-the-art HPO method is Hyperband, which, however, has its own parameters that influence its performance. One of these parameters, the maximal budget, is especially problematic: If chosen too small, the budget needs to be increased in hindsight and, as Hyperband is not incremental by design, the entire algorithm must be re-run. This is not only costly but also comes with a loss of valuable knowledge already accumulated. In this paper, we propose incremental variants of Hyperband that eliminate these drawbacks, and show that these variants satisfy theoretical guarantees qualitatively similar to those for the original Hyperband with the ‘right’ budget. Moreover, we demonstrate their practical utility in experiments with benchmark data sets.

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[9]

V. Bengs and E. Hüllermeier.
Multi-armed bandits with censored consumption of resources.
Machine Learning 112.1 (Jan. 2023). DOI

Abstract

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates both the actual amount of consumed resources of each learning round and the optimality of realizable rewards as well as the risk of exceeding the allocated resource limit. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its regret upper bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[8]

P. Gupta, J. P. Drees and E. Hüllermeier.
Automated Side-Channel Attacks using Black-Box Neural Architecture Search.
Preprint at Cryptology ePrint Archive (Jan. 2023). URL

Abstract

The usage of convolutional neural networks (CNNs) to break cryptographic systems through hardware side-channels has enabled fast and adaptable attacks on devices like smart cards and TPMs. Current literature proposes fixed CNN architectures designed by domain experts to break such systems, which is time-consuming and unsuitable for attacking a new system. Recently, an approach using neural architecture search (NAS), which is able to acquire a suitable architecture automatically, has been explored. These works use the secret key information in the attack dataset for optimization and only explore two different search strategies using one-dimensional CNNs. We propose a NAS approach that relies only on using the profiling dataset for optimization, making it fully black-box. Using a large-scale experimental parameter study, we explore which choices for NAS, such as 1-D or 2-D CNNs and search strategy, produce the best results on 10 state-of-the-art datasets for Hamming weight and identity leakage models. We show that applying the random search strategy on 1-D inputs results in a high success rate and retrieves the correct secret key using a single attack trace on two of the datasets. This combination matches the attack efficiency of fixed CNN architectures, outperforming them in 4 out of 10 datasets. Our experiments also point toward the need for repeated attack evaluations of machine learning-based solutions in order to avoid biased performance estimates.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

2022

[7]

S. Legler, T. Janjic, M. H. Shaker and E. Hüllermeier.
Machine learning for estimating parameters of a convective-scale model: A comparison of neural networks and random forests.
GMA - 32nd Workshop of Computational Intelligence of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik. Berlin, Germany, Dec 01-02, 2022. PDF

Abstract

Errors and inaccuracies in the representation of clouds in convection-permitting numerical weather prediction models can be caused by various sources, including the forcing and boundary conditions, the representation of orography, and the accuracy of the numerical schemes determining the evolution of humidity and temperature. Moreover, the parametrization of microphysics and the parametrization of processes in the surface and boundary layers do have a significant influence. These schemes typically contain several tunable parameters that are either non-physical or only crudely known, leading to model errors and imprecision. Furthermore, not accounting for uncertainties in these parameters might lead to overconfidence in the model during forecasting and data assimilation (DA).
Traditionally, the numerical values of model parameters are chosen by manual model tuning. More objectively, they can be estimated from observations by the so-called augmented state approach during the data assimilation [7]. Alternatively, the problem of estimating model parameters has recently been tackled by means of a hybrid approach combining DA with machine learning, more specifically a Bayesian neural network (BNN) [6]. As a proof of concept, this approach has been applied to a one-dimensional modified shallow-water (MSW) model [8].
Even though the BNN is able to accurately estimate the model parameters and their uncertainties, its high computational cost poses an obstacle to its use in operational settings where the grid sizes of the atmospheric fields are much larger than in the simple MSW model. Because random forests (RF) [2] are typically computationally cheaper while still being able to adequately represent uncertainties, we are interested in comparing RFs and BNNs. To this end, we follow [6] and again consider the problem of estimating the three model parameters of the MSW model as a function of the atmospheric state.

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[6]

V. Bengs, E. Hüllermeier and W. Waegeman.
Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner’s (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[5]

J. Brandt, V. Bengs, B. Haddenhorst and E. Hüllermeier.
Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Abstract

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. Unlike existing works, we study this problem in a non-stochastic setting with subset-dependent feedback, i.e., the semi-bandit feedback received could be generated by an oblivious adversary and also might depend on the chosen set of arms. In addition, we consider a general feedback scenario covering both the numerical-based as well as preference-based case and introduce a sound theoretical framework for this setting guaranteeing sensible notions of optimal arms, which a learner seeks to find. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies from aggressive to conservative. Theoretical questions about the sufficient and necessary budget of the algorithm to find the best arm are answered and complemented by deriving lower bounds for any learning algorithm for this problem scenario.

MCML Authors

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[4]

A. Campagner, J. Lienen, E. Hüllermeier and D. Ciucci.
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning.
IJCRS 2022 - International Joint Conference on Rough Sets. Suzhou, China, Nov 11-14, 2022. DOI

Abstract

In this article we introduce and describe SCIKIT-WEAK, a Python library inspired by SCIKIT-LEARN and developed to provide an easy-to-use framework for dealing with weakly supervised and imprecise data learning problems, which, despite their importance in real-world settings, cannot be easily managed by existing libraries. We provide a rationale for the development of such a library, then we discuss its design and the currently implemented methods and classes, which encompass several state-of-the-art algorithms.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[3]

E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
Journal of Artificial Intelligence Research 75 (Oct. 2022). DOI

Abstract

Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[2]

E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
IJCAI-ECAI 2022 - 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence. Vienna, Austria, Jul 23-29, 2022. Extended Abstract. DOI

Abstract

MCML Authors

Marcel Wever

Dr.

* Former Member

Viktor Bengs

Dr.

* Former Member

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

[1]

V.-L. Nguyen, M. H. Shaker and E. Hüllermeier.
How to measure uncertainty in uncertainty sampling for active learning.
Machine Learning 111.1 (2022). DOI

Abstract

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, alongside with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.

MCML Authors

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.