Research Group Eyke Hüllermeier

Link to website at LMU

Eyke Hüllermeier

Prof. Dr.

Principal Investigator

Artificial Intelligence and Machine Learning

Eyke Hüllermeier heads the Chair of Artificial Intelligence and Machine Learning at LMU Munich.

His research interests are centered around methods and theoretical foundations of artificial intelligence, with a specific focus on machine learning and reasoning under uncertainty. He has published more than 300 articles on these topics in top-tier journals and major international conferences, and several of his contributions have been recognized with scientific awards.

Team members @MCML


Yunpu Ma

Artificial Intelligence and Machine Learning

PhD Students

Jonas Hanselle

Artificial Intelligence and Machine Learning

Paul Hofman

Artificial Intelligence and Machine Learning

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Timo Kaufmann

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Valentin Margraf

Artificial Intelligence and Machine Learning

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Publications @MCML


M. Spliethöver, T. Knebler, F. Fumagalli, M. Muschalik, B. Hammer, E. Hüllermeier and H. Wachsmuth.
Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection.
NAACL 2025 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Albuquerque, NM, USA, Apr 29-May 04, 2025. To be published. Preprint available. arXiv

Recent advances on instruction fine-tuning have led to the development of various prompting techniques for large language models, such as explicit reasoning steps. However, the success of techniques depends on various parameters, such as the task, language model, and context provided. Finding an effective prompt is, therefore, often a trial-and-error process. Most existing approaches to automatic prompting aim to optimize individual techniques instead of compositions of techniques and their dependence on the input. To fill this gap, we propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input. We apply our approach to social bias detection, a highly context-dependent task that requires semantic understanding. We evaluate it with three large language models on three datasets, comparing compositions to individual techniques and other baselines. The results underline the importance of finding an effective prompt composition. Our approach robustly ensures high detection performance, and is best in several settings. Moreover, first experiments on other tasks support its generalizability.

MCML Authors
Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

F. Fumagalli, M. Muschalik, P. Frazzetto, J. Strotherm, L. Hermes, A. Sperduti, E. Hüllermeier and B. Hammer.
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv

Despite the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to an ML model’s output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e. the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ’s approximation of SIs on popular GNN architectures and compare with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using an SI-Graph.

MCML Authors
Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Y. Zhang, Z. Ma, Y. Ma, Z. Han, Y. Wu and V. Tresp.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv

LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

MCML Authors
Yao Zhang

Database Systems and Data Mining

Yunpu Ma

Artificial Intelligence and Machine Learning

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

G. D. Pelegrina, P. Kolpaczki and E. Hüllermeier.
Shapley Value Approximation Based on k-Additive Games.
Preprint (Feb. 2025). arXiv

The Shapley value is the prevalent solution for fair division problems in which a payout is to be divided among multiple agents. By adopting a game-theoretic view, the idea of fair division and the Shapley value can also be used in machine learning to quantify the individual contribution of features or data points to the performance of a predictive model. Despite its popularity and axiomatic justification, the Shapley value suffers from a computational complexity that scales exponentially with the number of entities involved, and hence requires approximation methods for its reliable estimation. We propose SVAkADD, a novel approximation method that fits a k-additive surrogate game. By taking advantage of k-additivity, we are able to elicit the exact Shapley values of the surrogate game and then use these values as estimates for the original fair division problem. The efficacy of our method is evaluated empirically and compared to competing methods.
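To make the exponential cost mentioned in the abstract concrete, the following minimal sketch computes exact Shapley values by enumerating every coalition. It is not the SVAkADD method; it only illustrates the textbook definition that approximation methods try to avoid evaluating in full. The function names and the three-player `toy_game` are hypothetical and chosen for illustration; the toy game contains a single pairwise interaction, so it happens to be 2-additive.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values via full coalition enumeration (exponential in n_players)."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size) * factorial(n_players - size - 1) / factorial(n_players)
            )
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i when joining coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_game(coalition):
    """Hypothetical game: additive worths plus a bonus of 4 when players 0 and 1 cooperate."""
    worth = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(worth[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


phi = shapley_values(3, toy_game)
print(phi)
```

In this toy game the pairwise bonus is split equally between players 0 and 1, and the values sum to the worth of the grand coalition (the efficiency axiom). The inner loops visit all 2^(n-1) coalitions per player, which is the scaling that motivates approximation.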

MCML Authors
Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

T. Mortier, A. Javanmardi, Y. Sale, E. Hüllermeier and W. Waegeman.
Conformal Prediction in Hierarchical Classification.
Preprint (Jan. 2025). arXiv

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy, and propose two computationally efficient inference algorithms. The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of representation complexity, yielding a more general and combinatorial inference problem, but smaller set sizes. Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms in achieving nominal coverage.
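For readers unfamiliar with the split conformal framework that the abstract builds on, here is a minimal sketch for ordinary (flat) classification. It does not implement the hierarchical inference algorithms proposed in the paper. The nonconformity score (one minus the predicted probability of the true label), the function names, and the calibration data are assumptions made for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Threshold from calibration scores, where score = 1 - probability of the true label."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label must be included.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical held-out calibration data: predicted class probabilities and true labels.
cal_probs = [
    [0.5, 0.25, 0.25],
    [0.25, 0.75, 0.0],
    [0.125, 0.125, 0.75],
    [0.25, 0.5, 0.25],
]
cal_labels = [0, 1, 2, 0]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
labels = prediction_set([0.5, 0.375, 0.125], threshold)
print(threshold, labels)
```

Under exchangeability of calibration and test data, sets built this way contain the true label with probability at least 1 - alpha. A hierarchical variant would additionally constrain which label sets may be returned.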

MCML Authors
Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. H. Shaker and E. Hüllermeier.
Random Forest Calibration.
Preprint (Jan. 2025). arXiv

The Random Forest (RF) classifier is often claimed to be relatively well calibrated when compared with other machine learning methods. Moreover, the existing literature suggests that traditional calibration methods, such as isotonic regression, do not substantially enhance the calibration of RF probability estimates unless supplied with extensive calibration data sets, which can represent a significant obstacle in cases of limited data availability. Nevertheless, there seems to be no comprehensive study validating such claims and systematically comparing state-of-the-art calibration methods specifically for RF. To close this gap, we investigate a broad spectrum of calibration methods tailored to or at least applicable to RF, ranging from scaling techniques to more advanced algorithms. Our results based on synthetic as well as real-world data unravel the intricacies of RF probability estimates, scrutinize the impacts of hyper-parameters, and compare calibration methods in a systematic way. We show that a well-optimized RF performs as well as or better than leading calibration approaches.

MCML Authors
Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning


A. Javanmardi, D. Stutz and E. Hüllermeier.
Conformalized Credal Set Predictors.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.

MCML Authors
Alireza Javanmardi

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer and E. Hüllermeier.
shapiq: Shapley Interactions for Machine Learning.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research.

MCML Authors
Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer and J. Herbinger.
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory.
Preprint (Dec. 2024). arXiv

Feature-based explanations, using perturbations or gradients, are a prevalent tool to understand decisions of black box machine learning models. Yet, differences between these methods still remain mostly unknown, which limits their applicability for practitioners. In this work, we introduce a unified framework for local and global feature-based explanations using two well-established concepts: functional ANOVA (fANOVA) from statistics, and the notion of value and interaction from cooperative game theory. We introduce three fANOVA decompositions that determine the influence of feature distributions, and use game-theoretic measures, such as the Shapley value and interactions, to specify the influence of higher-order interactions. Our framework combines these two dimensions to uncover similarities and differences between a wide range of explanation techniques for features and groups of features. We then empirically showcase the usefulness of our framework on synthetic and real-world datasets.

MCML Authors
Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

R. Liao, M. Erler, H. Wang, G. Zhai, G. Zhang, Y. Ma and V. Tresp.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI GitHub

In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA.

MCML Authors
Ruotong Liao

Database Systems and Data Mining

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Gengyuan Zhang

Database Systems and Data Mining

Yunpu Ma

Artificial Intelligence and Machine Learning

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

J. Bi, Y. Wang, H. Chen, X. Xiao, A. Hecker, V. Tresp and Y. Ma.
Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering.
Preprint (Nov. 2024). arXiv

Multimodal Large Language Models (MLLMs) have significantly advanced visual tasks by integrating visual representations into large language models (LLMs). The textual modality, inherited from LLMs, equips MLLMs with abilities like instruction following and in-context learning. In contrast, the visual modality enhances performance in downstream tasks by leveraging rich semantic content, spatial information, and grounding capabilities. These intrinsic modalities work synergistically across various visual tasks. Our research initially reveals a persistent imbalance between these modalities, with text often dominating output generation during visual instruction tuning. This imbalance occurs when using both full fine-tuning and parameter-efficient fine-tuning (PEFT) methods. We then found that re-balancing these modalities can significantly reduce the number of trainable parameters required, inspiring a direction for further optimizing visual instruction tuning. We introduce Modality Linear Representation-Steering (MoReS) to achieve the goal. MoReS effectively re-balances the intrinsic modalities throughout the model, where the key idea is to steer visual representations through linear transformations in the visual subspace across each model layer. To validate our solution, we composed LLaVA Steering, a suite of models integrated with the proposed MoReS method. Evaluation results show that the composed LLaVA Steering models require, on average, 500 times fewer trainable parameters than LoRA needs while still achieving comparable performance across three visual benchmarks and eight visual question-answering tasks. Last, we present the LLaVA Steering Factory, an in-house developed platform that enables researchers to quickly customize various MLLMs with component-based architecture for seamlessly integrating state-of-the-art models, and evaluate their intrinsic modality imbalance.

MCML Authors
Volker Tresp

Prof. Dr.

Database Systems and Data Mining

Yunpu Ma


Artificial Intelligence and Machine Learning

S. M. A. R. Thies, J. C. Alfaro and V. Bengs.
MORE–PLR: Multi-Output Regression Employed for Partial Label Ranking.
DS 2024 - 27th International Conference on Discovery Science. Pisa, Italy, Oct 14-16, 2024. To be published. GitHub

The partial label ranking (PLR) problem is a supervised learning scenario where the learner predicts a ranking with ties of the labels for a given input instance. It generalizes the well-known label ranking (LR) problem, which only allows for strict rankings. So far, previous learning approaches for PLR have primarily adapted LR methods to accommodate ties in predictions. This paper proposes using multi-output regression (MOR) to address the PLR problem by treating ranking positions as multivariate targets, an approach that has received little attention in both LR and PLR. To effectively employ this approach, we introduce several post-hoc layers that convert MOR results into a ranking, potentially including ties. This framework produces a range of learning approaches, which we demonstrate in experimental evaluations to be competitive with the current state-of-the-art PLR methods.

MCML Authors

Z. Ding, Y. Li, Y. He, A. Norelli, J. Wu, V. Tresp, Y. Ma and M. Bronstein.
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models.
Preprint (Oct. 2024). arXiv

Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, where its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.

MCML Authors
Zifeng Ding

Database Systems and Data Mining

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

Yunpu Ma


Artificial Intelligence and Machine Learning

Y. Sun, Z. Wu, Y. Ma and V. Tresp.
Quantum Architecture Search with Unsupervised Representation Learning.
Preprint (Oct. 2024). arXiv

Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.

MCML Authors
Yize Sun

Database Systems and Data Mining

Yunpu Ma

Artificial Intelligence and Machine Learning

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

C. Damke and E. Hüllermeier.
CUQ-GNN: Committee-Based Graph Uncertainty Quantification Using Posterior Networks.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

In this work, we study the influence of domain-specific characteristics when defining a meaningful notion of predictive uncertainty on graph data. Previously, the so-called Graph Posterior Network (GPN) model has been proposed to quantify uncertainty in node classification tasks. Given a graph, it uses Normalizing Flows (NFs) to estimate class densities for each node independently and converts those densities into Dirichlet pseudo-counts, which are then dispersed through the graph using the personalized PageRank (PPR) algorithm. The architecture of GPNs is motivated by a set of three axioms on the properties of its uncertainty estimates. We show that those axioms are not always satisfied in practice and therefore propose the family of Committee-based Uncertainty Quantification Graph Neural Networks (CUQ-GNNs), which combine standard Graph Neural Networks (GNNs) with the NF-based uncertainty estimation of Posterior Networks (PostNets). This approach adapts more flexibly to domain-specific demands on the properties of uncertainty estimates. We compare CUQ-GNN against GPN and other uncertainty quantification approaches on common node classification benchmarks and show that it is effective at producing useful uncertainty estimates.

MCML Authors
Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

R. Fischer, M. Wever, S. Buschjäger and T. Liebig.
MetaQuRe: Meta-learning from Model Quality and Resource Consumption.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Automated machine learning (AutoML) allows for selecting, parametrizing, and composing learning algorithms for a given data set. While resources play a pivotal role in neural architecture search, it is less pronounced by classical AutoML approaches. In fact, they generally focus on only maximizing predictive quality and disregard the importance of finding resource-efficient solutions. To push resource awareness further, our work explicitly explores how measures such as running time or energy consumption can be better considered in AutoML. Firstly, we propose a novel method for algorithm selection that balances multiple performance aspects (including resource demand) as prioritized by the user with the help of compositional meta-learning. Secondly, to foster research on green meta-learning and AutoML, we release the MetaQuRe data set, which contains information on predictive (Qu)ality and (Re)source consumption of models evaluated across hundreds of data sets and four execution environments. We use this data to put our methodology into practice and conduct an in-depth analysis of how our approach and data set can help in making AutoML more resource-aware, which represents our third contribution. Lastly, we publish MetaQuRe alongside an extensive code base, allowing for reproducing all results, expanding our data with results from custom environments, and exploring MetaQuRe interactively. In short, our work demonstrates both the importance as well as benefits of rethinking AutoML and meta-learning in a resource-aware way, thus paving the path for making future ML solutions more sustainable.

MCML Authors

S. Gilhuber, A. Beer, Y. Ma and T. Seidl.
FALCUN: A Simple and Efficient Deep Active Learning Strategy.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

We propose FALCUN, a novel deep batch active learning method that is label- and time-efficient. Our proposed acquisition uses a natural, self-adjusting balance of uncertainty and diversity: It slowly transitions from emphasizing uncertain instances at the decision boundary to emphasizing batch diversity. In contrast, established deep active learning methods often have a fixed weighting of uncertainty and diversity, limiting their effectiveness over diverse data sets exhibiting different characteristics. Moreover, to increase diversity, most methods demand intensive search through a deep neural network’s high-dimensional latent embedding space. This leads to high acquisition times when experts are idle while waiting for the next batch for annotation. We overcome this structural problem by exclusively operating on the low-dimensional probability space, yielding much faster acquisition times without sacrificing label efficiency. In extensive experiments, we show FALCUN’s suitability for diverse use cases, including medical images and tabular data. Compared to state-of-the-art methods like BADGE, CLUE, and AlfaMix, FALCUN consistently excels in quality and speed: while FALCUN is among the fastest methods, it has the highest average label efficiency.

MCML Authors
Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining

Anna Beer

* Former Member

Yunpu Ma

Artificial Intelligence and Machine Learning

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining

A. Vahidi, L. Wimmer, H. A. Gündüz, B. Bischl, E. Hüllermeier and M. Rezaei.
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members, which is challenging for large, over-parameterized deep neural networks. Moreover, ensemble learning has not yet seen such widespread adoption for unsupervised learning and it remains a challenging endeavor for self-supervised or unsupervised representation learning. Motivated by these challenges, we present a novel self-supervised training regime that leverages an ensemble of independent sub-networks, complemented by a new loss function designed to encourage diversity. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty, all achieved with minimal computational overhead compared to traditional deep self-supervised ensembles. To evaluate the effectiveness of our approach, we conducted extensive experiments across various tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, and semi-supervised settings. The results demonstrate that our method significantly improves prediction reliability. Our approach not only achieves excellent accuracy but also enhances calibration, improving on important baseline performance across a wide range of self-supervised architectures in computer vision, natural language processing, and genomics data.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Mina Rezaei


Statistical Learning and Data Science

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Explaining Change in Models and Data with Global Feature Importance and Effects.
TempXAI @ECML-PKDD 2024 - Tutorial-Workshop Explainable AI for Time Series and Data Streams at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. PDF

In dynamic machine learning environments, where data streams continuously evolve, traditional explanation methods struggle to remain faithful to the underlying model or data distribution. Therefore, this work presents a unified framework for efficiently computing incremental model-agnostic global explanations tailored for time-dependent models. By extending static model-agnostic methods such as Permutation Feature Importance, SAGE, and Partial Dependence Plots into the online learning context, the proposed framework enables the continuous updating of explanations as new data becomes available. These incremental variants ensure that global explanations remain relevant while minimizing computational overhead. The framework also addresses key challenges related to data distribution maintenance and perturbation generation in online learning, offering time and memory efficient solutions like geometric reservoir-based sampling for data replacement.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Brandt, M. Wever, V. Bengs and E. Hüllermeier.
Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO.
IJCAI 2024 - 33rd International Joint Conference on Artificial Intelligence. Jeju, Korea, Aug 03-09, 2024. DOI

Hyperparameter optimization (HPO) is indispensable for achieving optimal performance in machine learning tasks. A popular class of methods in this regard is based on Successive Halving (SHA), which casts HPO into a pure-exploration multi-armed bandit problem under finite sampling budget constraints. This is accomplished by considering hyperparameter configurations as arms and rewards as the negative validation losses. While it enjoys theoretical guarantees and works well in practice, SHA itself comes with several hyperparameters, one of which is the maximum budget that can be allocated to evaluate a single arm (hyperparameter configuration). Although there are already solutions to this meta hyperparameter optimization problem, such as the doubling trick or asynchronous extensions of SHA, these are either practically inefficient or lack theoretical guarantees. In this paper, we propose incremental SHA (iSHA), a synchronous extension of SHA that allows the maximum budget to be increased a posteriori while still enjoying theoretical guarantees. Our empirical analysis of HPO problems corroborates our theoretical findings and shows that iSHA is more resource-efficient than existing SHA-based approaches.
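
For readers unfamiliar with the bandit view of HPO, the following is a minimal sketch of plain Successive Halving, not of the iSHA extension proposed in the paper. The function name `successive_halving`, the parameters `min_budget` and `eta`, and the toy loss are illustrative assumptions and are not taken from the paper.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Plain Successive Halving (illustrative sketch, not iSHA).

    configs  : hashable hyperparameter configurations (the "arms")
    evaluate : callable (config, budget) -> validation loss (lower is better)
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Evaluate every surviving arm with the current per-arm budget.
        losses = {c: evaluate(c, budget) for c in survivors}
        # Keep the best 1/eta fraction of the arms (at least one).
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=losses.get)[:keep]
        # Surviving arms receive a larger budget in the next round.
        budget *= eta
    return survivors[0]


def toy_loss(config, budget):
    # Hypothetical loss: best at config == 0.3, improves with more budget.
    return (config - 0.3) ** 2 + 1.0 / budget


candidates = [0.0, 0.25, 0.5, 0.75, 1.0, 0.3, 0.9, 0.6]
best = successive_halving(candidates, toy_loss)
```

In this sketch the largest budget any arm can receive is fixed implicitly by the number of configurations and `eta`; the abstract's point is that this maximum has to be chosen up front in SHA.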

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

S. Heid, J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Learning decision catalogues for situated decision making: The case of scoring systems.
International Journal of Approximate Reasoning 171 (Aug. 2024). DOI

In this paper, we formalize the problem of learning coherent collections of decision models, which we call decision catalogues, and illustrate it for the case where models are scoring systems. This problem is motivated by the recent rise of algorithmic decision-making and the idea to improve human decision-making through machine learning, in conjunction with the observation that decision models should be situated in terms of their complexity and resource requirements: Instead of constructing a single decision model and using this model in all cases, different models might be appropriate depending on the decision context. Decision catalogues are supposed to support a seamless transition from very simple, resource-efficient to more sophisticated but also more demanding models. We present a general algorithmic framework for inducing such catalogues from training data, which tackles the learning task as a problem of searching the space of candidate catalogues systematically and, to this end, makes use of heuristic search methods. We also present a concrete instantiation of this framework as well as empirical studies for performance evaluation, which, in a nutshell, show that greedy search is an efficient and hard-to-beat strategy for the construction of catalogues of scoring systems.

MCML Authors

Jonas Hanselle

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and k-Shapley values (k-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.

MCML Authors

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. Herrmann, F. J. D. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix and B. Bischl.
Position: Why We Must Rethink Empirical Research in Machine Learning.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical ML research is fashioned as confirmatory research while it should rather be considered exploratory.

MCML Authors

Moritz Herrmann


Transfer Coordinator

Biometry in Molecular Medicine

Giuseppe Casalicchio


Statistical Learning and Data Science

Matthias Feurer

Prof. Dr.

Statistical Learning and Data Science

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Y. Sale, V. Bengs, M. Caprio and E. Hüllermeier.
Second-Order Uncertainty Quantification: A Distance-Based Approach.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

In the past couple of years, various approaches to representing and quantifying different types of predictive uncertainty in machine learning, notably in the setting of classification, have been proposed on the basis of second-order probability distributions, i.e., predictions in the form of distributions on probability distributions. A completely conclusive solution has not yet been found, however, as shown by recent criticisms of commonly used uncertainty measures associated with second-order distributions, identifying undesirable theoretical properties of these measures. In light of these criticisms, we propose a set of formal criteria that meaningful uncertainty measures for predictive uncertainty based on second-order distributions should obey. Moreover, we provide a general framework for developing uncertainty measures to account for these criteria, and offer an instantiation based on the Wasserstein distance, for which we prove that all criteria are satisfied.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Y. Sun, J. Liu, Z. Wu, Z. Ding, Y. Ma, T. Seidl and V. Tresp.
SA-DQAS: Self-attention Enhanced Differentiable Quantum Architecture Search.
ICML 2024 - Workshop Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators at the 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. PDF

We introduce SA-DQAS, a novel framework that enhances the gradient-based Differentiable Quantum Architecture Search (DQAS) with a self-attention mechanism, aimed at optimizing circuit design for Quantum Machine Learning (QML) challenges. Analogous to a sequence of words in a sentence, a quantum circuit can be viewed as a sequence of placeholders containing quantum gates. In DQAS, each placeholder is independent, whereas the self-attention mechanism in SA-DQAS helps to capture relation and dependency information among the operation candidates placed on placeholders in a circuit. To evaluate and verify the approach, we conduct experiments on job-shop scheduling problems (JSSP), Max-cut problems, and quantum fidelity. Incorporating self-attention improves the stability and performance of the resulting quantum circuits and refines their structural design with higher noise resilience and fidelity. Our research demonstrates the first successful integration of self-attention with DQAS.

MCML Authors

Yize Sun

Database Systems and Data Mining

Zifeng Ding

Database Systems and Data Mining

Yunpu Ma


Artificial Intelligence and Machine Learning

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

P. Kolpaczki, G. Haselbeck and E. Hüllermeier.
How Much Can Stratification Improve the Approximation of Shapley Values?
xAI 2024 - 2nd World Conference on Explainable Artificial Intelligence. Valletta, Malta, Jul 17-19, 2024. DOI

Over the last decade, the Shapley value has become one of the most widely applied tools to provide post-hoc explanations for black box models. However, its theoretically justified solution to the problem of dividing a collective benefit among the members of a group, such as features or data points, comes at a price. Without strong assumptions, the exponential number of member subsets excludes an exact calculation of the Shapley value. In search of a remedy, recent works have demonstrated the efficacy of approximations based on sampling with stratification, in which the sample space is partitioned into smaller subpopulations. The effectiveness of this technique mainly depends on the degree to which the allocation of available samples over the formed strata mirrors their unknown variances. To uncover the hypothetical potential of stratification, we investigate the gap in approximation quality caused by the lack of knowledge of the optimal allocation. Moreover, we combine recent advances to propose two state-of-the-art algorithms, Adaptive SVARM and Continuous Adaptive SVARM, that adjust the sample allocation on-the-fly. The potential of our approach is assessed in an empirical evaluation.
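
To make the cost argument concrete, here is a minimal sketch of the exact Shapley value computed by enumerating all coalitions. It shows why the exact calculation scales exponentially in the number of players; it is not an implementation of Adaptive SVARM or any other method from the paper. The function name `shapley_values` and the toy game `toy_value` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (illustrative).

    players : list of hashable player identifiers
    value   : callable frozenset -> float, the worth of a coalition
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Group coalitions by size; each size class shares one weight.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition):
    # Hypothetical game: "a" and "b" jointly earn 10, "c" adds 2 on its own.
    joint = 10.0 if {"a", "b"} <= coalition else 0.0
    solo = 2.0 if "c" in coalition else 0.0
    return joint + solo


phi = shapley_values(["a", "b", "c"], toy_value)
```

The inner loops visit all 2^(n-1) coalitions per player, which is what sampling-based approximations avoid. Grouping coalitions by size, as the outer loop does, is one possible way to form strata; whether it matches the stratification used in the paper is an assumption here.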

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

C. Damke and E. Hüllermeier.
Linear Opinion Pooling for Uncertainty Quantification on Graphs.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL GitHub

We address the problem of uncertainty quantification for graph-structured data, or, more specifically, the problem of quantifying the predictive uncertainty in (semi-supervised) node classification. Key questions in this regard concern the distinction between two different types of uncertainty, aleatoric and epistemic, and how to support uncertainty quantification by leveraging the structural information provided by the graph topology. Challenging assumptions and postulates of state-of-the-art methods, we propose a novel approach that represents (epistemic) uncertainty in terms of mixtures of Dirichlet distributions and refers to the established principle of linear opinion pooling for propagating information between neighboring nodes in the graph. The effectiveness of this approach is demonstrated in a series of experiments on a variety of graph-structured datasets.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Y. Sale, P. Hofman, T. Löhr, L. Wimmer, T. Nagler and E. Hüllermeier.
Label-wise Aleatoric and Epistemic Uncertainty Quantification.
UAI 2024 - 40th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain, Jul 16-18, 2024. URL

We present a novel approach to uncertainty quantification in classification tasks based on label-wise decomposition of uncertainty measures. This label-wise perspective allows uncertainty to be quantified at the individual class level, thereby improving cost-sensitive decision-making and helping understand the sources of uncertainty. Furthermore, it allows total, aleatoric, and epistemic uncertainty to be defined on the basis of non-categorical measures such as variance, going beyond common entropy-based measures. In particular, variance-based measures address some of the limitations associated with established methods that have recently been discussed in the literature. We show that our proposed measures adhere to a number of desirable properties. Through empirical evaluation on a variety of benchmark data sets – including applications in the medical domain where accurate uncertainty quantification is crucial – we establish the effectiveness of label-wise uncertainty quantification.
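
As an illustration of what a variance-based, label-wise decomposition can look like, the sketch below applies the law of total variance to each label separately, approximating the second-order distribution by a finite ensemble of predicted probability vectors. That this coincides with the exact measures defined in the paper is an assumption; the function name `labelwise_variance_uncertainty` and the example numbers are hypothetical.

```python
def labelwise_variance_uncertainty(ensemble_probs):
    """Per-label total/aleatoric/epistemic uncertainty (illustrative sketch).

    ensemble_probs : list of probability vectors, one per ensemble member,
                     standing in for samples from a second-order distribution.
    """
    m = len(ensemble_probs)
    num_labels = len(ensemble_probs[0])
    result = []
    for label in range(num_labels):
        thetas = [p[label] for p in ensemble_probs]
        mean = sum(thetas) / m
        # Expected Bernoulli variance of the label indicator given theta.
        aleatoric = sum(t * (1.0 - t) for t in thetas) / m
        # Variance of the predicted probability across ensemble members.
        epistemic = sum((t - mean) ** 2 for t in thetas) / m
        # Variance of the label indicator under the averaged prediction.
        total = mean * (1.0 - mean)
        result.append(
            {"total": total, "aleatoric": aleatoric, "epistemic": epistemic}
        )
    return result


# Two hypothetical ensemble members disagreeing on a binary problem.
scores = labelwise_variance_uncertainty([[0.9, 0.1], [0.5, 0.5]])
```

By the law of total variance, the aleatoric and epistemic parts add up to the total for every label, so the decomposition is additive by construction.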

MCML Authors

Paul Hofman

Artificial Intelligence and Machine Learning

Lisa Wimmer

Statistical Learning and Data Science

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

T. Löhr, M. Ingrisch and E. Hüllermeier.
Towards Aleatoric and Epistemic Uncertainty in Medical Image Classification.
AIME 2024 - 22nd International Conference on Artificial Intelligence in Medicine. Salt Lake City, UT, USA, Jul 09-12, 2024. DOI

Medical domain applications require a detailed understanding of the decision making process, in particular when data-driven modeling via machine learning is involved, and quantifying uncertainty in the process adds trust and interpretability to predictive models. However, current uncertainty measures in medical imaging are mostly monolithic and do not distinguish between different sources and types of uncertainty. In this paper, we advocate the distinction between so-called aleatoric and epistemic uncertainty in the medical domain and illustrate its potential in clinical decision making for the case of PET/CT image classification.

MCML Authors

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Z. Ding, H. Cai, J. Wu, Y. Ma, R. Liao, B. Xiong and V. Tresp.
zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. URL

Modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a heated topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasting (TKGF) benchmarks, they face a strong challenge in modeling the unseen zero-shot relations that have no prior graph context. In this paper, we try to mitigate this problem as follows. We first input the text descriptions of KG relations into large language models (LLMs) for generating relation representations, and then introduce them into embedding-based TKGF methods. LLM-empowered representations can capture the semantic information in the relation descriptions. This makes the relations, whether seen or unseen, with similar semantic meanings stay close in the embedding space, enabling TKGF models to recognize zero-shot relations even without any observed graph context. Experimental results show that our approach helps TKGF models to achieve much better performance in forecasting the facts with previously unseen relations, while still maintaining their ability in link forecasting regarding seen relations.

MCML Authors

Zifeng Ding

Database Systems and Data Mining

Yunpu Ma


Artificial Intelligence and Machine Learning

Ruotong Liao

Database Systems and Data Mining

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

R. Liao, X. Jia, Y. Li, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. URL GitHub

The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges arise from the huge gaps between the complex temporal graph data structure and the sequential natural-language expressions LLMs can handle, and between the enormous data sizes of tKGs and the heavy computation costs of finetuning LLMs. We propose a novel retrieval-augmented generation framework named GenTKG, which combines a temporal logical rule-based retrieval strategy and few-shot parameter-efficient instruction tuning to address these two challenges, respectively. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting while requiring low computation resources and extremely limited training data, as few as 16 samples. GenTKG also shows remarkable cross-domain generalizability, with superior performance on unseen datasets without re-training, and in-domain generalizability regardless of the time split in the same dataset. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.

MCML Authors

Ruotong Liao

Database Systems and Data Mining

Yunpu Ma


Artificial Intelligence and Machine Learning

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

A. Findeis, T. Kaufmann, E. Hüllermeier, S. Albanie and R. Mullins.
Inverse Constitutional AI: Compressing Preferences into Principles.
Preprint (Jun. 2024). arXiv GitHub

Feedback data plays an important role in fine-tuning and evaluating state-of-the-art AI models. Often pairwise text preferences are used: given two texts, human (or AI) annotators select the ‘better’ one. Such feedback data is widely used to align models to human preferences (e.g., reinforcement learning from human feedback), or to rank models according to human preferences (e.g., Chatbot Arena). Despite its wide-spread use, prior work has demonstrated that human-annotated pairwise text preference data often exhibits unintended biases. For example, human annotators have been shown to prefer assertive over truthful texts in certain contexts. Models trained or evaluated on this data may implicitly encode these biases in a manner hard to identify. In this paper, we formulate the interpretation of existing pairwise text preference data as a compression task: the Inverse Constitutional AI (ICAI) problem. In constitutional AI, a set of principles (or constitution) is used to provide feedback and fine-tune AI models. The ICAI problem inverts this process: given a dataset of feedback, we aim to extract a constitution that best enables a large language model (LLM) to reconstruct the original annotations. We propose a corresponding initial ICAI algorithm and validate its generated constitutions quantitatively based on reconstructed annotations. Generated constitutions have many potential use-cases – they may help identify undesirable biases, scale feedback to unseen data or assist with adapting LLMs to individual user preferences. We demonstrate our approach on a variety of datasets: (a) synthetic feedback datasets with known underlying principles; (b) the AlpacaEval dataset of cross-annotated human feedback; and (c) the crowdsourced Chatbot Arena data set.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

T. Kaufmann, J. Blüml, A. Wüst, Q. Delfosse, K. Kersting and E. Hüllermeier.
OCALM: Object-Centric Assessment with Language Models.
Preprint (Jun. 2024). arXiv

Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which a desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to directly provide rewards are promising alternatives, allowing non-experts to specify goals for the agent. However, black-box reward models make it difficult to debug the reward. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for RL agents from natural language task descriptions. OCALM uses the extensive world-knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, providing RL agents with the ability to derive policies from task descriptions.

MCML Authors

Timo Kaufmann

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

V. Margraf, M. Wever, S. Gilhuber, G. M. Tavares, T. Seidl and E. Hüllermeier.
ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data.
Preprint (Jun. 2024). arXiv GitHub

In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled, aiming to enhance learning algorithms’ efficiency and performance. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing the performance of different query strategies. This particularly holds for the combination of query strategies with different learning algorithms into active learning pipelines and examining the impact of the learning algorithm choice. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure evaluations are done reproducibly, saving exact dataset splits and hyperparameter settings of used algorithms. In total, ALPBench consists of 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings.

MCML Authors

Valentin Margraf

Artificial Intelligence and Machine Learning

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining

Gabriel Marques Tavares


Database Systems and Data Mining

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

A. Vahidi, S. Schosser, L. Wimmer, Y. Li, B. Bischl, E. Hüllermeier and M. Rezaei.
Probabilistic Self-supervised Learning via Scoring Rules Minimization.
ICLR 2024 - 12th International Conference on Learning Representations. Vienna, Austria, May 07-11, 2024. URL GitHub

In this paper, we propose a novel probabilistic self-supervised learning method via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks, the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN’s convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method’s optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments. On large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.

MCML Authors

Lisa Wimmer

Statistical Learning and Data Science

Yawei Li

Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Mina Rezaei


Statistical Learning and Data Science

V. Bengs, B. Haddenhorst and E. Hüllermeier.
Identifying Copeland Winners in Dueling Bandits with Indifferences.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

MCML Authors

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

P. Kolpaczki, M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
SVARM-IQ: Efficient Approximation of Any-order Shapley Interactions through Stratification.
AISTATS 2024 - 27th International Conference on Artificial Intelligence and Statistics. Valencia, Spain, May 02-04, 2024. URL

Addressing the limitations of individual attribution scores via the Shapley value (SV), the field of explainable AI (XAI) has recently explored intricate interactions of features or data points. In particular, extensions of the SV, such as the Shapley Interaction Index (SII), have been proposed as a measure to still benefit from the axiomatic basis of the SV. However, similar to the SV, their exact computation remains computationally prohibitive. Hence, we propose with SVARM-IQ a sampling-based approach to efficiently approximate Shapley-based interaction indices of any order. SVARM-IQ can be applied to a broad class of interaction indices, including the SII, by leveraging a novel stratified representation. We provide non-asymptotic theoretical guarantees on its approximation quality and empirically demonstrate that SVARM-IQ achieves state-of-the-art estimation results in practical XAI scenarios on different model classes and application domains.

MCML Authors

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

P. Hofman, Y. Sale and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty with Proper Scoring Rules.
Preprint (Apr. 2024). arXiv

Uncertainty representation and quantification are paramount in machine learning and constitute an important prerequisite for safety-critical applications. In this paper, we propose novel measures for the quantification of aleatoric and epistemic uncertainty based on proper scoring rules, which are loss functions with the meaningful property that they incentivize the learner to predict ground-truth (conditional) probabilities. We assume two common representations of (epistemic) uncertainty, namely, in terms of a credal set, i.e. a set of probability distributions, or a second-order distribution, i.e., a distribution over probability distributions. Our framework establishes a natural bridge between these representations. We provide a formal justification of our approach and introduce new measures of epistemic and aleatoric uncertainty as concrete instantiations.

MCML Authors
Link to website

Paul Hofman

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Rodemann, F. Croppi, P. Arens, Y. Sale, J. Herbinger, B. Bischl, E. Hüllermeier, T. Augustin, C. J. Walsh and G. Casalicchio.
Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration.
Preprint (Mar. 2024). arXiv

MCML Authors
Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Link to website

Giuseppe Casalicchio


Statistical Learning and Data Science

P. Kolpaczki, V. Bengs, M. Muschalik and E. Hüllermeier.
Approximating the Shapley Value without Marginal Contributions.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of which revolve around the notion of an agent’s marginal contribution. In this paper, we propose with SVARM and Stratified SVARM two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results including synthetic games as well as common explainability use cases comparing ourselves with state-of-the-art methods.
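To make the exponential cost mentioned in the abstract concrete, the following is a minimal sketch of the classical exact Shapley value, computed from marginal contributions over all coalitions. It is not SVARM or Stratified SVARM, which avoid marginal contributions; the function name `exact_shapley` and the toy game `toy_value` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values via the classical marginal-contribution formula.

    `value` maps a frozenset of players to a number. The loops visit every
    coalition of the other players, so the cost grows exponentially in len(players).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy game: a coalition is worth the square of its size.
def toy_value(coalition):
    return len(coalition) ** 2


phi = exact_shapley(["a", "b", "c"], toy_value)
```

Because the toy game is symmetric, every player receives the same share, and the shares sum to the value of the full coalition.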

MCML Authors
Link to website

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Link to website

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Lienen and E. Hüllermeier.
Mitigating Label Noise through Data Ambiguation.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Label noise poses an important challenge in machine learning, especially in deep learning, in which large models with high expressive power dominate the field. Models of that kind are prone to memorizing incorrect labels, thereby harming generalization performance. Many methods have been proposed to address this problem, including robust loss functions and more complex label correction approaches. Robust loss functions are appealing due to their simplicity, but typically lack flexibility, while label correction usually adds substantial complexity to the training setup. In this paper, we propose to address the shortcomings of both methodologies by ‘ambiguating’ the target information, adding complementary candidate labels in case the learner is not sufficiently convinced of the observed training label. More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold, which deliver imprecise yet more reliable beliefs about the ground-truth, effectively helping the learner to suppress the memorization effect. In an extensive empirical evaluation, our method demonstrates favorable learning behavior on synthetic and real-world noise, confirming the effectiveness in detecting and correcting erroneous training labels.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part I.
4OR (Jan. 2024). DOI

Multiple criteria decision aiding (MCDA) and preference learning (PL) are established research fields, which have different roots, developed in different communities – the former in the decision sciences and operations research, the latter in AI and machine learning – and have their own agendas in terms of problem setting, assumptions, and criteria of success. In spite of this, they share the major goal of constructing practically useful decision models that either support humans in the task of choosing the best, classifying, or ranking alternatives from a given set, or even automate decision-making by acting autonomously on behalf of the human. Therefore, MCDA and PL can complement and mutually benefit from each other, a potential that has been exhausted only to some extent so far. By elaborating on the connection between MCDA and PL in more depth, our goal is to stimulate further research at the junction of these two fields. To this end, we first review both methodologies, MCDA in this part of the paper and PL in the second part, with the intention of highlighting their most common elements. In the second part, we then compare both methodologies in a systematic way and give an overview of existing work on combining PL and MCDA.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part II.
4OR (Jan. 2024). DOI

This article elaborates on the connection between multiple criteria decision aiding (MCDA) and preference learning (PL), two research fields with different roots and developed in different communities. It complements the first part of the paper, in which we started with a review of MCDA. In this part, a similar review will be given for PL, followed by a systematic comparison of both methodologies, as well as an overview of existing work on combining PL and MCDA. Our main goal is to stimulate further research at the junction of these two methodologies.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

P. Gupta, M. Wever and E. Hüllermeier.
Information Leakage Detection through Approximate Bayes-optimal Prediction.
Preprint (Jan. 2024). arXiv

In today’s data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches, which rely on estimating mutual information (MI) between observable and secret information to detect ILs, face challenges related to the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detect ILs are limited to binary system sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor’s log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method outperforms state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning


S. Chen, J. Gu, Z. Han, Y. Ma, P. Torr and V. Tresp.
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL GitHub

Various adaptation methods, such as LoRA, prompts, and adapters, have been proposed to enhance the performance of pre-trained vision-language models in specific domains. As test samples in real-world applications usually differ from adaptation data, the robustness of these adaptation methods against distribution shifts is essential. In this study, we assess the robustness of 11 widely-used adaptation methods across 4 vision-language datasets under multimodal corruptions. Concretely, we introduce 7 benchmark datasets, including 96 visual and 87 textual corruptions, to investigate the robustness of different adaptation methods, the impact of available adaptation examples, and the influence of trainable parameter size during adaptation. Our analysis reveals that: 1) Adaptation methods are more sensitive to text corruptions than visual corruptions. 2) Full fine-tuning does not consistently provide the highest robustness; instead, adapters can achieve better robustness with comparable clean performance. 3) Contrary to expectations, our findings indicate that increasing the amount of adaptation data and the number of parameters does not guarantee enhanced robustness; instead, it results in even lower robustness. We hope this study could benefit future research in the development of robust multimodal adaptation methods.

MCML Authors
Link to website

Shuo Chen

Database Systems and Data Mining

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
SHAP-IQ: Unified Approximation of any-order Shapley Interactions.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL

Predominately in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axiom. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Link to website

Patrick Kolpaczki

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

R. Liao, X. Jia, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
TGL @NeurIPS 2023 - Workshop Temporal Graph Learning at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (TKG) domain, where conventional, carefully designed embedding-based and rule-based models dominate. It remains an open question whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges arise from the huge chasms between the complex graph data structure and the sequential natural expressions LLMs can handle, and between the enormous data volume of TKGs and the heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval-augmented generation framework named GenTKG, which combines a temporal logical rule-based retrieval strategy with lightweight few-shot parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG is a simple but effective, efficient, and generalizable approach that outperforms conventional methods on temporal relational forecasting with extremely limited computation. Our work opens a new frontier for the temporal knowledge graph domain.

MCML Authors
Link to website

Ruotong Liao

Database Systems and Data Mining

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

T. Kaufmann, P. Weng, V. Bengs and E. Hüllermeier.
A Survey of Reinforcement Learning from Human Feedback.
Preprint (Dec. 2023). arXiv

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model’s capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

MCML Authors
Link to website

Timo Kaufmann

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Y. Sale, P. Hofman, L. Wimmer, E. Hüllermeier and T. Nagler.
Second-Order Uncertainty Quantification: Variance-Based Measures.
Preprint (Dec. 2023). arXiv

Uncertainty quantification is a critical aspect of machine learning models, providing important insights into the reliability of predictions and aiding the decision-making process in real-world applications. This paper proposes a novel way to use variance-based measures to quantify uncertainty on the basis of second-order distributions in classification problems. A distinctive feature of the measures is the ability to reason about uncertainties on a class-based level, which is useful in situations where nuanced decision-making is required. Recalling some properties from the literature, we highlight that the variance-based measures satisfy important (axiomatic) properties. In addition to this axiomatic approach, we present empirical results showing the measures to be effective and competitive with commonly used entropy-based measures.

MCML Authors
Link to website

Paul Hofman

Artificial Intelligence and Machine Learning

Link to website

Lisa Wimmer

Statistical Learning and Data Science

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Link to Profile Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
FORECASTTKGQUESTIONS: A Benchmark for Temporal Question Answering and Forecasting over Temporal Knowledge Graphs.
ISWC 2023 - 22nd International Semantic Web Conference. Athens, Greece, Nov 06-11, 2023. DOI

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. Previous related works aim to develop QA systems that answer temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning this period can be fully used for inference. In real-world scenarios, however, it is common that, given knowledge up to the current instant, we want TKGQA systems to answer questions about the future. As humans constantly plan for the future, building forecasting TKGQA systems is important. In this paper, we propose a novel task: forecasting TKGQA, and propose a coupled large-scale TKGQA benchmark dataset, i.e., FORECASTTKGQUESTIONS. It includes three types of forecasting questions, i.e., entity prediction, yes-unknown, and fact reasoning questions. For every question, a timestamp is annotated and QA models only have access to TKG information prior to it for answer inference. We find that previous TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-unknown and fact reasoning questions. To this end, we propose FORECASTTKGQA, a TKGQA model that employs a TKG forecasting module for future inference. Experiments show that it performs well in forecasting TKGQA.

MCML Authors
Link to website

Zifeng Ding

Database Systems and Data Mining

Link to website

Zongyue Li

Spatial Artificial Intelligence

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to website

Shuo Chen

Database Systems and Data Mining

Link to website

Ruotong Liao

Database Systems and Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Probabilistic Scoring Lists for Interpretable Machine Learning.
DS 2023 - 26th International Conference on Discovery Science. Porto, Portugal, Oct 09-11, 2023. DOI

A scoring system is a simple decision model that checks a set of features, adds a certain number of points to a total score for each feature that is satisfied, and finally makes a decision by comparing the total score to a threshold. Scoring systems have a long history of active use in safety-critical domains such as healthcare and justice, where they provide guidance for making objective and accurate decisions. Given their genuine interpretability, the idea of learning scoring systems from data is obviously appealing from the perspective of explainable AI. In this paper, we propose a practically motivated extension of scoring systems called probabilistic scoring lists (PSL), as well as a method for learning PSLs from data. Instead of making a deterministic decision, a PSL represents uncertainty in the form of probability distributions. Moreover, in the spirit of decision lists, a PSL evaluates features one by one and stops as soon as a decision can be made with enough confidence. To evaluate our approach, we conduct a case study in the medical domain.
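As an illustration of the classical scoring system described in the first sentence of the abstract (not the probabilistic scoring lists proposed in the paper), here is a minimal sketch. The feature names, point values, threshold, and the choice of `>=` for the comparison are assumptions made purely for this example.

```python
from typing import Dict


def scoring_system_decision(
    features: Dict[str, bool], points: Dict[str, int], threshold: int
) -> bool:
    """Add the points of every satisfied feature and compare the total to a threshold."""
    total = sum(pts for name, pts in points.items() if features.get(name, False))
    # Assumption: a total equal to the threshold counts as a positive decision.
    return total >= threshold


# Hypothetical features and point values, invented for illustration only.
points = {"age_over_60": 2, "high_blood_pressure": 1, "smoker": 3}
patient = {"age_over_60": True, "high_blood_pressure": False, "smoker": True}

decision = scoring_system_decision(patient, points, threshold=4)
```

Here the satisfied features contribute 2 + 3 = 5 points, which reaches the threshold of 4, so the decision is positive.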

MCML Authors
Link to website

Jonas Hanselle

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Brandt, E. Schede, S. Sharma, V. Bengs, E. Hüllermeier and K. Tierney.
Contextual Preselection Methods in Pool-based Realtime Algorithm Configuration.
LWDA 2023 - Conference on Lernen. Wissen. Daten. Analysen. Marburg, Germany, Oct 09-11, 2023. PDF

Realtime algorithm configuration is concerned with the task of designing a dynamic algorithm configurator that observes sequentially arriving problem instances of an algorithmic problem class for which it selects suitable algorithm configurations (e.g., minimal runtime) of a specific target algorithm. The Contextual Preselection under the Plackett-Luce (CPPL) algorithm maintains a pool of configurations from which a set of algorithm configurations is selected that are run in parallel on the current problem instance. It uses the well-known UCB selection strategy from the bandit literature, while the pool of configurations is updated over time via a racing mechanism. In this paper, we investigate whether the performance of CPPL can be further improved by using different bandit-based selection strategies as well as a ranking-based strategy to update the candidate pool. Our experimental results show that replacing these components can indeed yield further significant performance improvements.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Hanselle, J. Kornowicz, S. Heid, K. Thommes and E. Hüllermeier.
Comparing Humans and Algorithms in Feature Ranking: A Case-Study in the Medical Domain.
LWDA 2023 - Conference on Lernen. Wissen. Daten. Analysen. Marburg, Germany, Oct 09-11, 2023. PDF

The selection of useful, informative, and meaningful features is a key prerequisite for the successful application of machine learning in practice, especially in knowledge-intense domains like decision support. Here, the task of feature selection, or ranking features by importance, can, in principle, be solved automatically in a data-driven way but also supported by expert knowledge. Besides, one may, of course, conceive a combined approach, in which a learning algorithm closely interacts with a human expert. In any case, finding an optimal approach requires a basic understanding of human capabilities in judging the importance of features compared to those of a learning algorithm. To this end, we conducted a case study in the medical domain, comparing feature rankings based on human judgment to rankings automatically derived from data. The quality of a ranking is determined by the performance of a decision list processing features in the order specified by the ranking, more specifically by so-called probabilistic scoring systems.

MCML Authors
Link to website

Jonas Hanselle

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Y. Shen, R. Liao, Z. Han, Y. Ma and V. Tresp.
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models.
Preprint (Oct. 2023). arXiv

While multi-modal models have successfully integrated information from image, video, and audio modalities, integrating the graph modality into large language models (LLMs) remains unexplored. This discrepancy largely stems from the inherent divergence between structured graph data and unstructured text data. Incorporating graph knowledge provides a reliable source of information, enabling potential solutions to address issues in text generation, e.g., hallucination and lack of domain knowledge. To evaluate the integration of graph knowledge into language models, a dedicated dataset is needed. However, there is currently no benchmark dataset specifically designed for multimodal graph-language models. To address this gap, we propose GraphextQA, a question answering dataset with paired subgraphs, retrieved from Wikidata, to facilitate the evaluation and future development of graph-language models. Additionally, we introduce a baseline model called CrossGNN, which conditions answer generation on the paired graphs by cross-attending question-aware graph features at decoding. The proposed dataset is designed to evaluate graph-language models’ ability to understand graphs and make use of them for answer generation. We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.

MCML Authors
Link to website

Ruotong Liao

Database Systems and Data Mining

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

Z. Ding, J. Wu, Z. Li, Y. Ma and V. Tresp.
Improving Few-Shot Inductive Learning on Temporal Knowledge Graphs Using Confidence-Augmented Reinforcement Learning.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI GitHub

Temporal knowledge graph completion (TKGC) aims to predict the missing links among the entities in a temporal knowledge graph (TKG). Most previous TKGC methods only consider predicting the missing links among the entities seen in the training set, while they are unable to achieve great performance in link prediction concerning newly-emerged unseen entities. Recently, a new task, i.e., TKG few-shot out-of-graph (OOG) link prediction, has been proposed, where TKGC models are required to achieve great link prediction performance concerning newly-emerged entities that only have few-shot observed examples. In this work, we propose a TKGC method FITCARL that combines few-shot learning with reinforcement learning to solve this task. In FITCARL, an agent traverses through the whole TKG to search for the prediction answer. A policy network is designed to guide the search process based on the traversed path. To better address the data scarcity problem in the few-shot setting, we introduce a module that computes the confidence of each candidate action and integrate it into the policy for action selection. We also exploit the entity concept information with a novel concept regularizer to boost model performance. Experimental results show that FITCARL achieves state-of-the-art performance on TKG few-shot OOG link prediction.

MCML Authors
Link to website

Zifeng Ding

Database Systems and Data Mining

Link to website

Zongyue Li

Spatial Artificial Intelligence

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

S. Haas and E. Hüllermeier.
Rectifying Bias in Ordinal Observational Data Using Unimodal Label Smoothing.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

This paper proposes a novel approach for modeling observational data in the form of expert ratings, which are commonly given on an ordered (numerical or ordinal) scale. In practice, such ratings are often biased, due to the expert’s preferences, psychological effects, etc. Our approach aims to rectify these biases, thereby preventing machine learning methods from transferring them to models trained on the data. To this end, we make use of so-called label smoothing, which allows for redistributing probability mass from the originally observed rating to other ratings, which are considered as possible corrections. This enables the incorporation of domain knowledge into the standard cross-entropy loss and leads to flexibly configurable models. Concretely, our method is realized for ordinal ratings and allows for arbitrary unimodal smoothings using a binary smoothing relation. Additionally, the paper suggests two practically motivated smoothing heuristics to address common biases in observational data, a time-based smoothing to handle concept drift and a class-wise smoothing based on class priors to mitigate data imbalance. The effectiveness of the proposed methods is demonstrated on four real-world goodwill assessment data sets of a car manufacturer with the aim of automating goodwill decisions. Overall, this paper presents a promising approach for modeling ordinal observational data that can improve decision-making processes and reduce reliance on human expertise.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

E. Terzieva, M. Muschalik, P. Hofman and E. Hüllermeier.
Identifying Trends in Feature Attributions During Training of Neural Networks.
ECML-PKDD 2023 - Workshop Uncertainty meets Explainability in Machine Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI

This study investigates the evolving dynamics of commonly used feature attribution (FA) values during training of neural networks. As models transition from a state of high uncertainty to low uncertainty, we show that the features’ significance also changes, which is in line with the general learning theory of deep neural networks. During model training, we compute FA scores through Layer-wise Relevance Propagation (LRP) and Gradient-weighted Class Activation Mapping (Grad-CAM), which are selected for their efficiency and speed of computation. We summarize the attribution scores in terms of the sum of the absolute values of FA scores and their entropy. We further analyze these summary scores in relation to the models’ generalization capabilities. The analysis identifies trends where FA values increase in magnitude while entropy decreases during the training process, regardless of model generalization, suggesting independence of overfitting. This research offers a unique view on the application of FA methods in explainable artificial intelligence (XAI) and raises intriguing questions about their behavior across varying model architectures and datasets, which may have implications for future work combining XAI and uncertainty estimation in machine learning.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Link to website

Paul Hofman

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

T. Kaufmann, S. Ball, J. Beck, E. Hüllermeier and F. Kreuter.
On the challenges and practices of reinforcement learning from real human feedback.
HLDM @ECML-PKDD 2023 - 1st Workshop on Hybrid Human-Machine Learning and Decision Making at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulties. We describe and discuss some of these challenges together with current practices and opportunities for further research in this paper.

MCML Authors
Link to website

Timo Kaufmann

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Link to Profile Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI

A. Javanmardi, Y. Sale, P. Hofman and E. Hüllermeier.
Conformal Prediction with Partially Labeled Data.
COPA 2023 - 12th Symposium on Conformal and Probabilistic Prediction with Applications. Limassol, Cyprus, Sep 13-15, 2023. URL

While the predictions produced by conformal prediction are set-valued, the data used for training and calibration is supposed to be precise. In the setting of superset learning or learning from partial labels, a variant of weakly supervised learning, it is exactly the other way around: training data is possibly imprecise (set-valued), but the model induced from this data yields precise predictions. In this paper, we combine the two settings by making conformal prediction amenable to set-valued training data. We propose a generalization of the conformal prediction procedure that can be applied to set-valued training and calibration data. We prove the validity of the proposed method and present experimental studies in which it compares favorably to natural baselines.
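For background, the standard split conformal procedure with precise calibration labels can be sketched as follows. This is the baseline setting the abstract starts from, not the paper's generalization to set-valued data; the function names and the nonconformity score (one minus the probability of the true label) are illustrative assumptions.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold for coverage of roughly 1 - alpha.

    cal_probs: one list of class probabilities per calibration example.
    cal_labels: the (precise) true label index of each example.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept every label.
        return float("inf")
    return scores[k - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]
```

The prediction is set-valued: a confident model yields small sets, while an uncertain model or a small `alpha` yields larger ones.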

MCML Authors
Link to website

Alireza Javanmardi

Artificial Intelligence and Machine Learning

Link to website

Paul Hofman

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. Caprio, Y. Sale, E. Hüllermeier and I. Lee.
A Novel Bayes' Theorem for Upper Probabilities.
Epi UAI 2023 - International Workshop on Epistemic Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Aug 04, 2023. DOI

In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes’ posterior probability of a measurable set A, when the prior lies in a class of probability measures and the likelihood is precise. They also give a sufficient condition for such upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential of being applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

S. Henzgen and E. Hüllermeier.
Weighting by Tying: A New Approach to Weighted Rank Correlation.
Preprint (Aug. 2023). arXiv

Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall’s tau and Spearman’s rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those on the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighting scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal’s gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighting rank positions in a flexible way.
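As a point of reference, the unweighted Goodman and Kruskal gamma that the proposed measure relates to can be computed from concordant and discordant pairs. The sketch below covers only this classical coefficient, not the scaled gamma of the paper; the function name is an assumption.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Classical gamma: (C - D) / (C + D) over all item pairs.

    x and y hold the scores (or rank positions) of the same items.
    Pairs tied in x or in y count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)
```

Every pair contributes equally here, which is exactly the property that weighted variants relax.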

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Y. Sale, M. Caprio and E. Hüllermeier.
Is the Volume of a Credal Set a Good Measure for Epistemic Uncertainty?
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

Adequate uncertainty representation and quantification have become imperative in various scientific disciplines, especially in machine learning and artificial intelligence. As an alternative to representing uncertainty via one single probability measure, we consider credal sets (convex sets of probability measures). The geometric representation of credal sets as d-dimensional polytopes implies a geometric intuition about (epistemic) uncertainty. In this paper, we show that the volume of the geometric representation of a credal set is a meaningful measure of epistemic uncertainty in the case of binary classification, but less so for multi-class classification. Our theoretical findings highlight the crucial role of specifying and employing uncertainty measures in machine learning in an appropriate way, and for being aware of possible pitfalls.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

L. Wimmer, Y. Sale, P. Hofman, B. Bischl and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?
UAI 2023 - 39th Conference on Uncertainty in Artificial Intelligence. Pittsburgh, PA, USA, Jul 31-Aug 03, 2023. URL

The quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, has recently become quite common in machine learning. While the properties of these measures, which are rooted in information theory, seem appealing at first glance, we identify various incoherencies that call their appropriateness into question. In addition to the measures themselves, we critically discuss the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents. Experiments across different computer vision tasks support our theoretical findings and raise concerns about current practice in uncertainty quantification.
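The additive decomposition under discussion can be made concrete with a small sketch. It assumes that the second-order distribution is approximated by a finite ensemble of predicted class distributions, which is a common but not the only choice; the function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def decompose(ensemble_probs):
    """Return (total, aleatoric, epistemic) uncertainty for one input.

    total     = entropy of the averaged prediction
    aleatoric = average entropy of the members (conditional entropy)
    epistemic = total - aleatoric (mutual information)
    """
    m = len(ensemble_probs)
    k = len(ensemble_probs[0])
    mean = [sum(p[j] for p in ensemble_probs) / m for j in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in ensemble_probs) / m
    return total, aleatoric, total - aleatoric
```

Two members that both predict (0.5, 0.5) give zero epistemic uncertainty, whereas members predicting (1, 0) and (0, 1) give zero aleatoric uncertainty; in both cases the total is log 2.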

MCML Authors
Link to website

Lisa Wimmer

Statistical Learning and Data Science

Link to website

Paul Hofman

Artificial Intelligence and Machine Learning

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. K. Belaid, R. Bornemann, M. Rabus, R. Krestel and E. Hüllermeier.
Compare-xAI: Toward Unifying Functional Testing Methods for Post-hoc XAI Algorithms into a Multi-dimensional Benchmark.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. DOI GitHub

In recent years, Explainable AI (xAI) has attracted a lot of attention as various countries turned explanations into a legal right. xAI algorithms enable humans to understand the underlying models and explain their behavior, leading to insights through which the models can be analyzed and improved beyond the accuracy metric by, e.g., debugging the learned pattern and reducing unwanted biases. However, the widespread use of xAI and the rapidly growing body of published research in xAI have brought new challenges. A large number of xAI algorithms can be overwhelming and make it difficult for practitioners to choose the correct xAI algorithm for their specific use case. This problem is further exacerbated by the different approaches used to assess novel xAI algorithms, making it difficult to compare them to existing methods. To address this problem, we introduce Compare-xAI, a benchmark that allows for a direct comparison of popular xAI algorithms with a variety of different use cases. We propose a scoring protocol employing a range of functional tests from the literature, each targeting a specific end-user requirement in explaining a model. To make the benchmark results easily accessible, we group the tests into four categories (fidelity, fragility, stability, and stress tests). We present results for 13 xAI algorithms based on 11 functional tests. After analyzing the findings, we derive potential solutions for data science practitioners as workarounds to the found practical limitations. Finally, Compare-xAI is an attempt to unify systematic evaluation and comparison methods for xAI algorithms with a focus on the end-user’s requirements.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

M. Muschalik, F. Fumagalli, R. Jagtani, B. Hammer and E. Hüllermeier.
iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios.
xAI 2023 - 1st World Conference on eXplainable Artificial Intelligence. Lisbon, Portugal, Jul 26-28, 2023. Best Paper Award. DOI

Post-hoc explanation techniques such as the well-established partial dependence plot (PDP), which investigates feature dependencies, are used in explainable artificial intelligence (XAI) to understand black-box machine learning models. While many real-world applications require dynamic models that constantly adapt over time and react to changes in the underlying distribution, XAI, so far, has primarily considered static learning environments, where models are trained in a batch mode and remain unchanged. We thus propose a novel model-agnostic XAI framework called incremental PDP (iPDP) that extends on the PDP to extract time-dependent feature effects in non-stationary learning environments. We formally analyze iPDP and show that it approximates a time-dependent variant of the PDP that properly reacts to real and virtual concept drift. The time-sensitivity of iPDP is controlled by a single smoothing parameter, which directly corresponds to the variance and the approximation error of iPDP in a static learning environment. We illustrate the efficacy of iPDP by showcasing an example application for drift detection and conducting multiple experiments on real-world and synthetic data sets and streams.

MCML Authors
Link to website

Maximilian Muschalik

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

V. Bengs, E. Hüllermeier and W. Waegeman.
On Second-Order Scoring Rules for Epistemic Uncertainty Quantification.
ICML 2023 - 40th International Conference on Machine Learning. Honolulu, Hawaii, Jul 23-29, 2023. URL

It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal to let the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As a main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

A. Giovagnoli, Y. Ma, M. Schubert and V. Tresp.
QNEAT: Natural Evolution of Variational Quantum Circuit Architecture.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

Quantum Machine Learning (QML) is a recent and rapidly evolving field in which the theoretical framework and logic of quantum mechanics are employed to solve machine learning tasks. A variety of techniques with different levels of quantum-classical hybridization have been presented. Here we focus on variational quantum circuits (VQC), which emerged as the most promising candidates for the quantum counterpart of neural networks in the noisy intermediate-scale quantum (NISQ) era. Although showing promising results, VQCs can be hard to train because of issues such as barren plateaus, periodicity of the weights, or the choice of architecture. In this paper, we focus on the last of these problems and, to address it, propose a gradient-free algorithm inspired by natural evolution that optimises both the weights and the architecture of the VQC. In particular, we present a version of the well-known neuroevolution of augmenting topologies (NEAT) algorithm adapted to the case of variational quantum circuits. We test the algorithm on benchmark problems from classical fields of machine learning, namely reinforcement learning and optimization.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

M. Wever, M. Özdogan and E. Hüllermeier.
Cooperative Co-Evolution for Ensembles of Nested Dichotomies for Multi-Class Classification.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

In multi-class classification, it can be beneficial to decompose a learning problem into several simpler problems. One such reduction technique is the use of so-called nested dichotomies, which recursively bisect the set of possible classes such that the resulting subsets can be arranged in the form of a binary tree, where each split defines a binary classification problem. Recently, a genetic algorithm for optimizing the structure of such nested dichotomies has achieved state-of-the-art results. Motivated by its success, we propose to extend this approach using a co-evolutionary scheme to optimize both the structure of nested dichotomies and their composition into ensembles through which they are evaluated. Furthermore, we present an experimental study showing this approach to yield ensembles of nested dichotomies at substantially lower cost and, in some cases, even with an improved generalization performance.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever and E. Hüllermeier.
Towards Green Automated Machine Learning: Status Quo and Future Directions.
Journal of Artificial Intelligence Research 77 (Jun. 2023). DOI

Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution — a machine learning pipeline — tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticized for its high resource consumption as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as the expensive large-scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. Therefore, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool w.r.t. their “greenness”, i.e., sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community in a more sustainable AutoML research direction. As part of this, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.

MCML Authors
Link to website

Jonas Hanselle

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

D. Winkel, N. Strauß, M. Schubert, Y. Ma and T. Seidl.
Constrained Portfolio Management using Action Space Decomposition for Reinforcement Learning.
PAKDD 2023 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka, Japan, May 25-28, 2023. DOI

Financial portfolio managers typically face multi-period optimization tasks such as short-selling or investing at least a particular portion of the portfolio in a specific industry sector. A common approach to tackle these problems is to use constrained Markov decision process (CMDP) methods, which may suffer from sample inefficiency, hyperparameter tuning, and lack of guarantees for constraint violations. In this paper, we propose Action Space Decomposition Based Optimization (ADBO) for optimizing a more straightforward surrogate task that allows actions to be mapped back to the original task. We examine our method on two real-world data portfolio construction tasks. The results show that our new approach consistently outperforms state-of-the-art benchmark approaches for general CMDPs.

MCML Authors
Link to website

David Winkel

Database Systems and Data Mining

Link to website

Niklas Strauß

Spatial Artificial Intelligence

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining

A.-K. Wickert, C. Damke, L. Baumgärtner, E. Hüllermeier and M. Mezini.
UnGoML: Automated Classification of unsafe Usages in Go.
MSR 2023 - IEEE/ACM 20th International Conference on Mining Software Repositories. Melbourne, Australia, May 15-16, 2023. FOSS (Free, Open Source Software) Impact Paper Award. DOI GitHub

The Go programming language offers strong protection from memory corruption. As an escape hatch of these protections, it provides the unsafe package. Previous studies identified that this unsafe package is frequently used in real-world code for several purposes, e.g., serialization or casting types. Due to the variety of these reasons, it may be possible to refactor specific usages to avoid potential vulnerabilities. However, the classification of unsafe usages is challenging and requires the context of the call and the program’s structure. In this paper, we present the first automated classifier for unsafe usages in Go, UnGoML, to identify what is done with the unsafe package and why it is used. For UnGoML, we built four custom deep learning classifiers trained on a manually labeled data set. We represent Go code as enriched control-flow graphs (CFGs) and solve the label prediction task with one single-vertex and three context-aware classifiers. All three context-aware classifiers achieve a top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore, in a set-valued conformal prediction setting, we achieve accuracies of more than 93% with mean label set sizes of 2 for both dimensions. Thus, UnGoML can be used to efficiently filter unsafe usages for use cases such as refactoring or a security audit.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Z. Liu, Y. Ma, M. Schubert, Y. Ouyang, W. Rong and Z. Xiong.
Multimodal Contrastive Transformer for Explainable Recommendation.
IEEE Transactions on Computational Social Systems (May. 2023). DOI

Explanations play an essential role in helping users evaluate results from recommender systems. Various natural language generation methods have been proposed to generate explanations for recommendations. However, they usually suffer from two problems. First, since user-provided review text contains noisy data, the generated explanations may be irrelevant to the recommended items. Second, for lack of supervision signals, most of the generated sentences are similar and therefore cannot meet users’ needs for diversity and personalization. To tackle these problems, we propose a multimodal contrastive transformer (MMCT) model for explainable recommendation, which incorporates multimodal information into the learning process, including sentiment features, item features, item images, and refined user reviews. Meanwhile, we propose a dynamic fusion mechanism during the decoding stage, which generates supervision signals to guide the explanation generation. Additionally, we develop a contrastive objective to generate diverse explainable texts. Comprehensive experiments on two real-world datasets show that the proposed model outperforms comparable explainable recommendation baselines in terms of explanation performance and recommendation performance. Efficiency analysis and robustness analysis verify the advantages of the proposed model. An ablation analysis establishes the relative contributions of the respective components and modalities, and a case study illustrates how the model works.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

D. Schubert, P. Gupta and M. Wever.
Meta-learning for Automated Selection of Anomaly Detectors for Semi-supervised Datasets.
IDA 2023 - 21st International Symposium on Intelligent Data Analysis. Louvain-la-Neuve, Belgium, Apr 12-14, 2023. DOI

In anomaly detection, a prominent task is to induce a model that identifies anomalies while being learned solely from normal data. Generally, one is interested in finding an anomaly detector that correctly identifies anomalies, i.e., data points that do not belong to the normal class, without raising too many false alarms. Which anomaly detector is best suited depends on the dataset at hand and thus needs to be tailored. The quality of an anomaly detector may be assessed via confusion-based metrics such as the Matthews correlation coefficient (MCC). However, since during training only normal data is available in a semi-supervised setting, such metrics are not accessible. To facilitate automated machine learning for anomaly detectors, we propose to employ meta-learning to predict MCC scores using the metrics that can be computed with normal data only and order anomaly detectors using the predicted scores for selection. First promising results can be obtained considering the hypervolume and the false positive rate as meta-features.
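The Matthews correlation coefficient mentioned above is computed from the four cells of a confusion matrix, which is why it needs labeled anomalies. A minimal sketch follows; the function name `mcc` and the convention of returning 0.0 for a zero denominator are assumptions.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; 1 is perfect agreement, 0 is chance level,
    and -1 is total disagreement. A zero denominator (an empty row or
    column) is mapped to 0.0 here by convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

With only normal data at training time, `tp` and `fn` cannot be counted, so the score has to be estimated by other means.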

MCML Authors

T. Tornede, A. Tornede, L. Fehring, L. Gehring, H. Graf, J. Hanselle, F. Mohr and M. Wever.
PyExperimenter: Easily distribute experiments and track results.
The Journal of Open Source Software 8.86 (Apr. 2023). DOI

PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms and in particular is designed to reduce the involved manual effort significantly. It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
The empirical analysis of algorithms is often accompanied by the execution of algorithms for different inputs and variants of the algorithms, specified via parameters, and the measurement of non-functional properties. Since the individual evaluations are usually independent, the evaluation can be performed in a distributed manner on an HPC system. However, setting up, documenting, and evaluating the results of such a study is often file-based. Usually, this requires extensive manual work to create configuration files for the inputs or to read and aggregate measured results from a report file. In addition, monitoring and restarting individual executions is tedious and time-consuming.
PyExperimenter addresses these challenges by means of a single well-defined configuration file and a central database for managing massively parallel evaluations, as well as collecting and aggregating their results. Thereby, PyExperimenter alleviates the aforementioned overhead and allows experiment executions to be defined and monitored with ease.

MCML Authors
Link to website

Jonas Hanselle

Artificial Intelligence and Machine Learning

M. K. Belaid, D. E. Mekki, M. Rabus and E. Hüllermeier.
Optimizing Data Shapley Interaction Calculation from $O(2^n)$ to $O(t n^2)$ for KNN models.
Preprint (Apr. 2023). arXiv

With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial process in the field of artificial intelligence. The Shapley values have been recognized as an effective method for data valuation, enabling efficient training set summarization, acquisition, and outlier removal. In this paper, we introduce ‘STI-KNN’, an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in $O(t n^2)$ time, which is a significant improvement over the $O(2^n)$ time complexity of baseline methods. By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Brandt, E. Schede, B. Haddenhorst, V. Bengs, E. Hüllermeier and K. Tierney.
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration.
AAAI 2023 - 37th Conference on Artificial Intelligence. Washington, DC, USA, Feb 07-14, 2023. DOI

We study the algorithm configuration (AC) problem, in which one seeks to find an optimal parameter configuration of a given target algorithm in an automated way. Recently, there has been significant progress in designing AC approaches that satisfy strong theoretical guarantees. However, a significant gap still remains between the practical performance of these approaches and state-of-the-art heuristic methods. To this end, we introduce AC-Band, a general approach for the AC problem based on multi-armed bandits that provides theoretical guarantees while exhibiting strong practical performance. We show that AC-Band requires significantly less computation time than other AC approaches providing theoretical guarantees while still yielding high-quality configurations.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Brandt, M. Wever, D. Iliadis, V. Bengs and E. Hüllermeier.
Iterative Deepening Hyperband.
Preprint (Feb. 2023). arXiv

Hyperparameter optimization (HPO) is concerned with the automated search for the most appropriate hyperparameter configuration (HPC) of a parameterized machine learning algorithm. A state-of-the-art HPO method is Hyperband, which, however, has its own parameters that influence its performance. One of these parameters, the maximal budget, is especially problematic: If chosen too small, the budget needs to be increased in hindsight and, as Hyperband is not incremental by design, the entire algorithm must be re-run. This is not only costly but also comes with a loss of valuable knowledge already accumulated. In this paper, we propose incremental variants of Hyperband that eliminate these drawbacks, and show that these variants satisfy theoretical guarantees qualitatively similar to those for the original Hyperband with the ‘right’ budget. Moreover, we demonstrate their practical utility in experiments with benchmark data sets.
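To see why the maximal budget matters, consider successive halving, the subroutine that Hyperband runs in several brackets. The sketch below is a simplified, non-incremental version, not the variants proposed in the paper; the names `successive_halving`, `evaluate`, and `eta` are illustrative, and `evaluate(config, budget)` is assumed to return a loss (lower is better).

```python
def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """Keep the best 1/eta of the candidates while multiplying the budget.

    max_budget is fixed up front: if it later turns out to be too small,
    this plain version has to be re-run from scratch with a larger value.
    """
    survivors = list(configs)
    budget = min_budget
    while True:
        # Rank all remaining candidates at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        if len(ranked) == 1 or budget >= max_budget:
            return ranked[0]
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(budget * eta, max_budget)
```

For example, with nine candidate values, a toy loss `abs(c - 0.3) + 1 / budget`, and budgets 1, 3, 9, the procedure narrows nine candidates to three and then to one.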

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

V. Bengs and E. Hüllermeier.
Multi-armed bandits with censored consumption of resources.
Machine Learning 112.1 (Jan. 2023). DOI

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates both the actual amount of consumed resources of each learning round and the optimality of realizable rewards as well as the risk of exceeding the allocated resource limit. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its regret upper bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.
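The interaction protocol described in this abstract can be illustrated with a minimal sketch. The function name `play_round`, the two sampler callables, and the rule that a censored round spends the full limit are assumptions made for illustration only; they are not taken from the paper, and the regret measure and the UCB-inspired algorithm are not reproduced here.

```python
import random

def play_round(sample_reward, sample_cost, limit, rng):
    """Simulate one round of a bandit with censored resource consumption.

    sample_reward, sample_cost: hypothetical callables that take an rng and
    return the chosen arm's random reward and random resource consumption.
    Returns (reward, spent); reward is None when the round is censored.
    """
    cost = sample_cost(rng)
    if cost <= limit:
        # Consumption stayed within the limit: the reward is observed.
        return sample_reward(rng), cost
    # Assumption: a censored round yields no reward and spends the whole limit.
    return None, limit

rng = random.Random(0)
arm_reward = lambda r: r.random()          # hypothetical reward in [0, 1)
arm_cost = lambda r: r.uniform(0.0, 2.0)   # hypothetical consumption in [0, 2]
print(play_round(arm_reward, arm_cost, limit=1.0, rng=rng))
```

A larger limit lowers the risk of censoring but raises the resources spent per round, which is the trade-off the abstract describes.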

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

P. Gupta, J. P. Drees and E. Hüllermeier.
Automated Side-Channel Attacks using Black-Box Neural Architecture Search.
Preprint at Cryptology ePrint Archive (Jan. 2023). URL

The usage of convolutional neural networks (CNNs) to break cryptographic systems through hardware side-channels has enabled fast and adaptable attacks on devices like smart cards and TPMs. Current literature proposes fixed CNN architectures designed by domain experts to break such systems, which is time-consuming and unsuitable for attacking a new system. Recently, an approach using neural architecture search (NAS), which is able to acquire a suitable architecture automatically, has been explored. These works use the secret key information in the attack dataset for optimization and only explore two different search strategies using one-dimensional CNNs. We propose a NAS approach that relies only on using the profiling dataset for optimization, making it fully black-box. Using a large-scale experimental parameter study, we explore which choices for NAS, such as 1-D or 2-D CNNs and search strategy, produce the best results on 10 state-of-the-art datasets for Hamming weight and identity leakage models. We show that applying the random search strategy on 1-D inputs results in a high success rate and retrieves the correct secret key using a single attack trace on two of the datasets. This combination matches the attack efficiency of fixed CNN architectures, outperforming them in 4 out of 10 datasets. Our experiments also point toward the need for repeated attack evaluations of machine learning-based solutions in order to avoid biased performance estimates.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning


S. Legler, T. Janjic, M. H. Shaker and E. Hüllermeier.
Machine learning for estimating parameters of a convective-scale model: A comparison of neural networks and random forests.
GMA - 32nd Workshop of Computational Intelligence of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik. Berlin, Germany, Dec 01-02, 2022. PDF

Errors and inaccuracies in the representation of clouds in convection-permitting numerical weather prediction models can be caused by various sources, including the forcing and boundary conditions, the representation of orography, and the accuracy of the numerical schemes determining the evolution of humidity and temperature. Moreover, the parametrization of microphysics and the parametrization of processes in the surface and boundary layers do have a significant influence. These schemes typically contain several tunable parameters that are either non-physical or only crudely known, leading to model errors and imprecision. Furthermore, not accounting for uncertainties in these parameters might lead to overconfidence in the model during forecasting and data assimilation (DA).
Traditionally, the numerical values of model parameters are chosen by manual model tuning. More objectively, they can be estimated from observations by the so-called augmented state approach during the data assimilation [7]. Alternatively, the problem of estimating model parameters has recently been tackled by means of a hybrid approach combining DA with machine learning, more specifically a Bayesian neural network (BNN) [6]. As a proof of concept, this approach has been applied to a one-dimensional modified shallow-water (MSW) model [8].
Even though the BNN is able to accurately estimate the model parameters and their uncertainties, its high computational cost poses an obstacle to its use in operational settings where the grid sizes of the atmospheric fields are much larger than in the simple MSW model. Because random forests (RF) [2] are typically computationally cheaper while still being able to adequately represent uncertainties, we are interested in comparing RFs and BNNs. To this end, we follow [6] and again consider the problem of estimating the three model parameters of the MSW model as a function of the atmospheric state.

MCML Authors
Link to website

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

S. Gilhuber, P. Jahn, Y. Ma and T. Seidl.
VERIPS: Verified Pseudo-label Selection for Deep Active Learning.
ICDM 2022 - 22nd IEEE International Conference on Data Mining. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI GitHub

Active learning has the power to significantly reduce the amount of labeled data needed to build strong classifiers. Existing active pseudo-labeling methods show high potential in integrating pseudo-labels within the active learning loop but heavily depend on the prediction accuracy of the model. In this work, we propose VERIPS, an algorithm that significantly outperforms existing pseudo-labeling techniques for active learning. At its core, VERIPS uses a pseudo-label verification mechanism that consists of a second network only trained on data approved by the oracle and helps to discard questionable pseudo-labels. In particular, the verifier model eliminates all pseudo-labels for which it disagrees with the actual task model. VERIPS overcomes the problems of poorly performing initial models, e.g., due to imbalanced or too small initial pools, where previous methods select too many incorrect pseudo-labels and recovering takes long or is not possible. Moreover, VERIPS is particularly insensitive to parameter choices that existing approaches suffer from.
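The verification step described in this abstract (discarding pseudo-labels on which the verifier disagrees with the task model) could be sketched roughly as follows. The function name, the array layout, and the confidence `threshold` used to propose pseudo-labels are illustrative assumptions, not details from the paper or its GitHub repository.

```python
import numpy as np

def verified_pseudo_labels(task_probs, verifier_probs, threshold=0.9):
    """Keep pseudo-labels on which the verifier agrees with the task model.

    task_probs, verifier_probs: arrays of shape (n_samples, n_classes) with
    predicted class probabilities for the unlabeled pool.
    threshold: hypothetical confidence cut-off for proposing a pseudo-label.
    Returns (indices, labels) of the accepted pseudo-labels.
    """
    task_probs = np.asarray(task_probs, dtype=float)
    verifier_probs = np.asarray(verifier_probs, dtype=float)
    task_labels = task_probs.argmax(axis=1)
    verifier_labels = verifier_probs.argmax(axis=1)
    # Candidates: samples the task model is confident about (assumption).
    confident = task_probs.max(axis=1) >= threshold
    # Verification: drop every candidate on which the two models disagree.
    agree = task_labels == verifier_labels
    keep = np.flatnonzero(confident & agree)
    return keep, task_labels[keep]
```

In this sketch the verifier's probabilities would come from a second network trained only on oracle-approved labels, as the abstract states.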

MCML Authors
Link to website

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining

Link to website

Philipp Jahn

Database Systems and Data Mining

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining

V. Bengs, E. Hüllermeier and W. Waegeman.
Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner’s (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

J. Brandt, V. Bengs, B. Haddenhorst and E. Hüllermeier.
Finding optimal arms in non-stochastic combinatorial bandits with semi-bandit feedback and finite budget.
NeurIPS 2022 - 36th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. Unlike existing works, we study this problem in a non-stochastic setting with subset-dependent feedback, i.e., the semi-bandit feedback received could be generated by an oblivious adversary and also might depend on the chosen set of arms. In addition, we consider a general feedback scenario covering both the numerical-based as well as preference-based case and introduce a sound theoretical framework for this setting guaranteeing sensible notions of optimal arms, which a learner seeks to find. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies from aggressive to conservative. Theoretical questions about the sufficient and necessary budget of the algorithm to find the best arm are answered and complemented by deriving lower bounds for any learning algorithm for this problem scenario.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

A. Campagner, J. Lienen, E. Hüllermeier and D. Ciucci.
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning.
IJCRS 2022 - International Joint Conference on Rough Sets. Suzhou, China, Nov 11-14, 2022. DOI

In this article we introduce and describe SCIKIT-WEAK, a Python library inspired by SCIKIT-LEARN and developed to provide an easy-to-use framework for dealing with weakly supervised and imprecise data learning problems, which, despite their importance in real-world settings, cannot be easily managed by existing libraries. We provide a rationale for the development of such a library, then we discuss its design and the currently implemented methods and classes, which encompass several state-of-the-art algorithms.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
Journal of Artificial Intelligence Research 75 (Oct. 2022). DOI

Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

C. M. M. Frey, Y. Ma and M. Schubert.
SEA: Graph Shell Attention in Graph Neural Networks.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

A common problem in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes’ representations of the input graph align and become indiscernible. The latest models employing attention mechanisms with Graph Transformer Layers (GTLs) are still restricted to the layer-wise computational workflow of a GNN that are not beyond preventing such effects. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes’ representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node’s representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results while drastically reducing the number of parameters compared to state-of-the-art models.

MCML Authors
Christian Frey


* Former Member

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

S. Gilhuber, M. Berrendorf, Y. Ma and T. Seidl.
Accelerating Diversity Sampling for Deep Active Learning By Low-Dimensional Representations.
IAL @ECML-PKDD 2022 - 6th International Workshop on Interactive Adaptive Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-23, 2022. PDF GitHub

Selecting diverse instances for annotation is one of the key factors of successful active learning strategies. To this end, existing methods often operate on high-dimensional latent representations. In this work, we propose to use the low-dimensional vector of predicted probabilities instead, which can be seamlessly integrated into existing methods. We empirically demonstrate that this considerably decreases the query time, i.e., time to select an instance for annotation, while at the same time improving results. Low query times are relevant for active learning researchers, who use a (fast) oracle for simulated annotation and thus are often constrained by query time. It is also practically relevant when dealing with complex annotation tasks for which only a small pool of skilled domain experts is available for annotation with a limited time budget.
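To make the idea concrete, the sketch below runs a generic diversity criterion on predicted class probabilities instead of latent features. Greedy k-center (farthest-point) selection is used purely as an example of an existing diversity method; it is an assumption and not necessarily one of the methods evaluated in the paper, and the function name and the `first` parameter are hypothetical.

```python
import numpy as np

def k_center_greedy(features, n_select, first=0):
    """Greedy farthest-point selection of n_select diverse rows.

    features: array of shape (n_samples, dim); here the rows are predicted
    class-probability vectors, so dim equals the number of classes.
    first: index of the starting point (hypothetical convenience parameter).
    Assumes n_select <= n_samples.
    """
    features = np.asarray(features, dtype=float)
    selected = [first]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    while len(selected) < n_select:
        nxt = int(min_dist.argmax())  # farthest point from the current selection
        selected.append(nxt)
        dist_to_new = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected
```

Each distance computation costs time proportional to the feature dimension, so replacing a latent vector with hundreds of entries by a probability vector with one entry per class reduces the work per query accordingly.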

MCML Authors
Link to website

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining

Max Berrendorf


* Former Member

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining

Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
Forecasting Question Answering over Temporal Knowledge Graphs.
Preprint (Aug. 2022). arXiv

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases. The only existing TKGQA dataset, i.e., CronQuestions, consists of temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning the same period can be fully used for answer inference, allowing the TKGQA models to use even the future knowledge to answer the questions based on the past facts. In real-world scenarios, however, it is also common that given the knowledge until now, we wish the TKGQA systems to answer the questions asking about the future. As humans constantly seek plans for the future, building TKGQA systems for answering such forecasting questions is important. Nevertheless, this has still been unexplored in previous research. In this paper, we propose a novel task: forecasting question answering over temporal knowledge graphs. We also propose a large-scale TKGQA benchmark dataset, i.e., ForecastTKGQuestions, for this task. It includes three types of questions, i.e., entity prediction, yes-no, and fact reasoning questions. For every forecasting question in our dataset, QA models can only have access to the TKG information before the timestamp annotated in the given question for answer inference. We find that the state-of-the-art TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-no questions and fact reasoning questions. To this end, we propose ForecastTKGQA, a TKGQA model that employs a TKG forecasting module for future inference, to answer all three types of questions. Experimental results show that ForecastTKGQA outperforms recent TKGQA methods on the entity prediction questions, and it also shows great effectiveness in answering the other two types of questions.

MCML Authors
Link to website

Zifeng Ding

Database Systems and Data Mining

Link to website

Zongyue Li

Spatial Artificial Intelligence

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to website

Shuo Chen

Database Systems and Data Mining

Link to website

Ruotong Liao

Database Systems and Data Mining

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
IJCAI-ECAI 2022 - 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence. Vienna, Austria, Jul 23-29, 2022. Extended Abstract. DOI

Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.

MCML Authors
Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning

Z. Liu, Y. Ma, M. Schubert, Y. Ouyang and Z. Xiong.
Multi-Modal Contrastive Pre-training for Recommendation.
ICMR 2022 - ACM International Conference on Multimedia Retrieval. Newark, NJ, USA, Jun 27-30, 2022. DOI

Personalized recommendation plays a central role in various online applications. To provide quality recommendation service, it is of crucial importance to consider multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on the relationship of co-interaction. For users, we propose intra-modal aggregation and inter-modal aggregation to fuse review texts and the structural information of the user graph. For items, we consider three modalities: description text, images, and item graph. Moreover, the description text and image complement each other for the same item. One of them can be used as promising supervision for the other. Therefore, to capture this signal and better exploit the potential correlation of intra-modalities, we propose a self-supervised contrastive inter-modal alignment task to make the textual and visual modalities as similar as possible. Then, we apply inter-modal aggregation to obtain the multi-modal representation of items. Next, we employ a binary cross-entropy loss function to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations using an existing recommendation model. We have performed extensive experiments on three real-world datasets. Experimental results verify the rationality and effectiveness of the proposed method.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

G. Fu, Z. Meng, Z. Han, Z. Ding, Y. Ma, M. Schubert, V. Tresp and R. Wattenhofer.
TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion.
SPNLP @ACL 2022 - 6th ACL Workshop on Structured Prediction for NLP at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). Dublin, Ireland, May 22-27, 2022. DOI

Temporal knowledge graphs store the dynamics of entities and relations during a time period. However, typical temporal knowledge graphs often suffer from incomplete dynamics with missing facts in real-world scenarios. Hence, modeling temporal knowledge graphs to complete the missing facts is important. In this paper, we tackle the temporal knowledge graph completion task by proposing TempCaps, which is a Capsule network-based embedding model for Temporal knowledge graph completion. TempCaps models temporal knowledge graphs by introducing a novel dynamic routing aggregator inspired by Capsule Networks. Specifically, TempCaps builds entity embeddings by dynamically routing retrieved temporal relation and neighbor information. Experimental results demonstrate that TempCaps reaches state-of-the-art performance for temporal knowledge graph completion. Additional analysis also shows that TempCaps is efficient.

MCML Authors
Link to website

Zifeng Ding

Database Systems and Data Mining

Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Matthias Schubert

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

Y. Liu, Y. Ma, M. Hildebrandt, M. Joblin and V. Tresp.
TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs.
AAAI 2022 - 36th Conference on Artificial Intelligence. Virtual, Feb 22-Mar 01, 2022. DOI

Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting – event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

V.-L. Nguyen, M. H. Shaker and E. Hüllermeier.
How to measure uncertainty in uncertainty sampling for active learning.
Machine Learning 111.1 (2022). DOI

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, along with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.
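As a point of reference, the traditional probabilistic variant mentioned in this abstract, uncertainty sampling with Shannon entropy, can be sketched as below. The function names are illustrative assumptions; the epistemic and aleatoric measures compared in the paper are not reproduced here.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of each row of predicted class probabilities."""
    probs = np.asarray(probs, dtype=float)
    safe = np.clip(probs, 1e-12, 1.0)  # avoid log(0); zero entries contribute 0
    return -(probs * np.log(safe)).sum(axis=1)

def query_most_uncertain(probs, n_queries=1):
    """Return indices of the n_queries pool instances with highest entropy."""
    scores = entropy(probs)
    order = np.argsort(-scores, kind="stable")  # descending, ties keep pool order
    return order[:n_queries].tolist()
```

Swapping `entropy` for another score function is all that is needed to instantiate the same sampling loop with a different uncertainty measure.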

MCML Authors
Link to website

Mohammad Hossein Shaker

Artificial Intelligence and Machine Learning

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning


Y. Ma and V. Tresp.
Causal Inference under Networked Interference and Intervention Policy Enhancement.
AISTATS 2021 - 24th International Conference on Artificial Intelligence and Statistics. Virtual, Apr 13-15, 2021. URL

Estimating individual treatment effects from data of randomized experiments is a critical task in causal inference. The Stable Unit Treatment Value Assumption (SUTVA) is usually made in causal inference. However, interference can introduce bias when the assigned treatment on one unit affects the potential outcomes of the neighboring units. This interference phenomenon is known as spillover effect in economics or peer effect in social science. Usually, in randomized experiments or observational studies with interconnected units, one can only observe treatment responses under interference. Hence, the issue of how to estimate the superimposed causal effect and recover the individual treatment effect in the presence of interference becomes a challenging task in causal inference. In this work, we study causal effect estimation under general network interference using Graph Neural Networks, which are powerful tools for capturing node and link dependencies in graphs. After deriving causal effect estimators, we further study intervention policy improvement on the graph under capacity constraint. We give policy regret bounds under network interference and treatment capacity constraint. Furthermore, a heuristic graph structure-dependent error bound for Graph Neural Network-based causal estimators is provided.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining


Y. Ma and V. Tresp.
A Variational Quantum Circuit Model for Knowledge Graph Embeddings.
QTNML @NeurIPS 2020 - 1st Workshop on Quantum Tensor Networks in Machine Learning at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. PDF

Can quantum computing resources facilitate representation learning? In this work, we propose the first quantum Ansatz for statistical relational learning on knowledge graphs using parametric quantum circuits. We propose a variational quantum circuit for modeling knowledge graphs by introducing quantum representations of entities. In particular, latent representations of entities are encoded as coefficients of quantum states, while predicates are characterized by parametric gates acting on the quantum states. We show that quantum representations can be trained efficiently while preserving the quantum advantages. Simulations on classical machines with different datasets show that our proposed quantum circuit Ansatz and quantum representations can achieve comparable results to the state-of-the-art classical models, e.g., RESCAL, DISTMULT. Furthermore, after optimizing the models, the complexity of inductive inference on the knowledge graphs can be reduced with respect to the number of entities.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

Y. Ma, Z. Han and V. Tresp.
Learning with Temporal Knowledge Graphs.
CIKMW @CIKM 2020 - Workshop at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020). Galway, Ireland, Oct 19-23, 2020. Invited talk. PDF

Temporal knowledge graphs, also known as episodic or time-dependent knowledge graphs, are large-scale event databases that describe temporally evolving multi-relational data. An episodic knowledge graph can be regarded as a sequence of semantic knowledge graphs incorporated with timestamps. In this talk, we review recently developed learning-based algorithms for temporal knowledge graph completion and forecasting.

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning

Link to Profile Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems and Data Mining

Y. Ma.
Learning with relational knowledge in the context of cognition, quantum computing, and causality.
Dissertation 2020. DOI

This dissertation explores the use of knowledge graphs, including semantic and episodic graphs, for representing static and evolving human knowledge, and proposes methods for improving knowledge inference. It introduces two quantum machine learning algorithms aimed at speeding up knowledge graph inference, demonstrating significant speedups over classical methods. Additionally, the work addresses causal inference in relational data, specifically in social networks, and proposes causal estimators using graph neural networks to estimate superimposed effects and optimize treatment assignments for network welfare. (Shortened.)

MCML Authors
Link to website

Yunpu Ma


Artificial Intelligence and Machine Learning