
Publications by our Members

2025


[1200]
D. Tschernutter and S. Feuerriegel.
Data-driven dynamic police patrolling: An efficient Monte Carlo tree search.
European Journal of Operational Research 32.1 (Feb. 2025). DOI.
Abstract

Crime is responsible for major financial losses and serious harm to the well-being of individuals, and, hence, a crucial task of police operations is effective patrolling. Yet, existing decision models aimed at police operations do not consider the microscopic routing decisions involved in patrolling and, furthermore, limit the objective to surrogate metrics (e.g., response time) instead of crime prevention. In this paper, we thus formalize the decision problem of dynamic police patrolling as a Markov decision process that models microscopic routing decisions, so that the expected number of prevented crimes is maximized. We experimentally show that standard solution approaches for our decision problem are not scalable to real-world settings. As a remedy, we present a tailored and highly efficient Monte Carlo tree search algorithm. We then demonstrate our algorithm numerically using real-world crime data from Chicago and show that the decision-making by our algorithm offers significant improvements for crime prevention over patrolling tactics from current practice. Informed by our results, we finally discuss implications for improving the patrolling tactics in police operations.
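The search engine behind the approach, Monte Carlo tree search with UCT-style selection, can be sketched on a toy problem. This is a generic textbook UCT sketch, not the paper's tailored algorithm; the line-world MDP, `HOTSPOT`, and all constants are invented for illustration:

```python
import math
import random

random.seed(0)

# Toy stand-in for a patrolling MDP: an agent walks on a line of cells and
# earns reward when it visits a "hotspot" cell. All names and constants
# here are invented for illustration.
N_CELLS, HOTSPOT, HORIZON = 7, 5, 6
ACTIONS = (-1, +1)  # move left / move right

def step(state, action):
    nxt = min(max(state + action, 0), N_CELLS - 1)
    return nxt, (1.0 if nxt == HOTSPOT else 0.0)

def rollout(state, depth):
    """Random playout used to estimate the value below a leaf."""
    total = 0.0
    for _ in range(depth):
        state, r = step(state, random.choice(ACTIONS))
        total += r
    return total

def uct_search(root, iters=2000, c=1.4):
    # Depth-1 UCT for brevity: statistics only for the root's children.
    visits = {a: 0 for a in ACTIONS}
    value = {a: 0.0 for a in ACTIONS}
    for i in range(1, iters + 1):
        # UCT selection: mean value plus an exploration bonus.
        a = max(ACTIONS, key=lambda a: float("inf") if visits[a] == 0 else
                value[a] / visits[a] + c * math.sqrt(math.log(i) / visits[a]))
        s, r = step(root, a)
        ret = r + rollout(s, HORIZON - 1)
        visits[a] += 1
        value[a] += ret
    return max(ACTIONS, key=lambda a: visits[a])

best = uct_search(root=2)  # from cell 2, moving right (toward cell 5) wins
```

A real patrolling model would replace `step` with travel-time dynamics and crime arrival probabilities; the selection rule is the component the paper replaces with a tailored, scalable variant.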

MCML Authors
Stefan Feuerriegel (Prof. Dr.), Artificial Intelligence in Management


2024


[1199]
K. Bieker, H. T. Kussaba, P. Scholl, J. Jung, A. Swikir, S. Haddadin and G. Kutyniok.
Compositional Construction of Barrier Functions for Switched Impulsive Systems.
63rd IEEE Conference on Decision and Control (CDC 2024). Milan, Italy, Dec 16-19, 2024. To be published. Preprint at arXiv.
Abstract

Many systems occurring in real-world applications, such as controlling the motions of robots or modeling the spread of diseases, are switched impulsive systems. To ensure that the system state stays in a safe region (e.g., to avoid collisions with obstacles), barrier functions are widely utilized. As the system dimension increases, deriving suitable barrier functions becomes extremely complex. Fortunately, many systems consist of multiple subsystems, such as different areas where the disease occurs. In this work, we present sufficient conditions for interconnected switched impulsive systems to maintain safety by constructing local barrier functions for the individual subsystems instead of a global one, allowing for much easier and more efficient derivation. To validate our results, we numerically demonstrate their effectiveness using an epidemiological model.

MCML Authors
Gitta Kutyniok (Prof. Dr.), Mathematical Foundations of Artificial Intelligence


[1198]
V. Melnychuk, S. Feuerriegel and M. van der Schaar.
Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published.
Abstract

Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimates of averaged causal quantities, such as the conditional average treatment effect, but also an understanding of the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the individualized (covariate-conditional) level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call AU-learner. We further show that our AU-learner has several strengths in that it satisfies Neyman-orthogonality and is doubly robust. Finally, we propose a fully parametric deep learning instantiation of our AU-learner.

MCML Authors
Valentyn Melnychuk, Artificial Intelligence in Management
Stefan Feuerriegel (Prof. Dr.), Artificial Intelligence in Management


[1197]
E. Ailer, N. Dern, J. Hartford and N. Kilbertus.
Targeted Sequential Indirect Experiment Design.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Scientific hypotheses typically concern specific aspects of complex, imperfectly understood or entirely unknown mechanisms, such as the effect of gene expression levels on phenotypes or how microbial communities influence environmental health. Such queries are inherently causal (rather than purely associational), but in many settings, experiments cannot be conducted directly on the target variables of interest but are instead indirect: they perturb the target variable but do not remove potential confounding factors. If, additionally, the resulting experimental measurements are multi-dimensional and the studied mechanisms nonlinear, the query of interest is generally not identified. We develop an adaptive strategy to design indirect experiments that optimally inform a targeted query about the ground truth mechanism in terms of sequentially narrowing the gap between an upper and lower bound on the query. While the general formulation consists of a bi-level optimization procedure, we derive an efficiently estimable analytical kernel-based estimator of the bounds for the causal effect, a query of key interest, and demonstrate the efficacy of our approach in confounded, multivariate, nonlinear synthetic settings.

MCML Authors
Elisabeth Ailer, Ethics in Systems Design and Machine Learning
Niki Kilbertus (Prof. Dr.), Ethics in Systems Design and Machine Learning


[1196]
R. Dhahri, A. Immer, B. Charpentier, S. Günnemann and V. Fortuin.
Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam’s razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior Hessian approximation used in the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.
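The idea of reusing a pre-computed Hessian approximation as a cheap pruning criterion can be illustrated with a classical Optimal-Brain-Damage-style saliency. This is a stand-in, not the paper's SpaM criterion; the weights and curvature values below are made up:

```python
import numpy as np

# Saliency s_i = 0.5 * H_ii * w_i^2: cheap to evaluate once a diagonal
# Hessian approximation is available (e.g., from a Laplace posterior).
def saliency(weights, hessian_diag):
    return 0.5 * hessian_diag * weights ** 2

def prune_mask(weights, hessian_diag, sparsity):
    """Keep the (1 - sparsity) fraction of weights with highest saliency."""
    s = saliency(weights, hessian_diag)
    k = int(round((1.0 - sparsity) * len(weights)))
    keep = np.argsort(s)[::-1][:k]
    mask = np.zeros(len(weights), dtype=bool)
    mask[keep] = True
    return mask

w = np.array([0.01, -2.0, 0.5, -0.03, 1.0])   # toy weights
h = np.array([10.0, 0.1, 4.0, 50.0, 1.0])     # toy curvature estimates
mask = prune_mask(w, h, sparsity=0.6)          # keeps indices 2 and 4
```

Note the contrast with plain magnitude pruning, which would keep indices 1 and 4 (the largest |w|): the curvature term changes the ranking, which is exactly why a reusable Hessian estimate is valuable.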

MCML Authors
Stephan Günnemann (Prof. Dr.), Data Analytics & Machine Learning
Vincent Fortuin (Dr., Associate), Bayesian Deep Learning


[1195]
L. Eyring, S. Karthik, K. Roth, A. Dosovitskiy and Z. Akata.
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from ‘reward hacking’ and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models. Extensive user studies demonstrate that our model is preferred nearly twice as often compared to the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-α, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time.
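The inference-time recipe, gradient ascent on the initial noise under a reward signal, reduces to a few lines once generator and reward are differentiable. The toy linear "generator" and quadratic "reward" below are invented so that the gradient is available in closed form; in ReNO-style use these would be a frozen one-step T2I model and a preference reward model:

```python
import numpy as np

# Toy stand-ins: G is a frozen "generator", the reward penalizes distance
# of the generated output from a "high-reward" target.
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 3))
target = np.array([1.0, -1.0, 0.5, 2.0])

def reward(z):
    return -float(np.sum((G @ z - target) ** 2))

def reward_grad(z):
    return -2.0 * G.T @ (G @ z - target)

def optimize_noise(z0, lr=0.02, steps=100):
    z = z0.copy()
    for _ in range(steps):
        z = z + lr * reward_grad(z)   # gradient *ascent* on the reward
    return z

z0 = rng.normal(size=3)
z_opt = optimize_noise(z0)            # reward(z_opt) > reward(z0)
```

The generator and reward model stay frozen throughout; only the input noise moves, which is what keeps the method an inference-time enhancement rather than fine-tuning.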

MCML Authors
Luca Eyring, Interpretable and Reliable Machine Learning
Shyamgopal Karthik, Interpretable and Reliable Machine Learning
Karsten Roth, Interpretable and Reliable Machine Learning
Zeynep Akata (Prof. Dr.), Interpretable and Reliable Machine Learning


[1194]
F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.

MCML Authors
Claudio Mayrink Verdun (Dr.), former member
Hannah Laus, Optimization & Data Analysis
Felix Krahmer (Prof. Dr.), Optimization & Data Analysis
Holger Rauhut (Prof. Dr.), Mathematical Data Science and Artificial Intelligence


[1193]
A. Javanmardi, D. Stutz and E. Hüllermeier.
Conformalized Credal Set Predictors.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
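The conformal machinery the method builds on can be shown in its plain split-conformal form for classification; the credal-set construction itself is more involved. The toy calibration data, the score choice, and `alpha` below are illustrative:

```python
import numpy as np

# Split conformal prediction for classification. Score = 1 - p(true class);
# the ceil((n+1)(1-alpha))/n quantile of calibration scores yields sets
# with >= 1 - alpha marginal coverage, without model assumptions.
def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, q):
    return np.where(1.0 - probs <= q)[0]

rng = np.random.default_rng(1)
# Toy calibration data: a 3-class predictor putting 0.8 on the true class.
cal_labels = rng.integers(0, 3, size=200)
cal_probs = np.full((200, 3), 0.1)
cal_probs[np.arange(200), cal_labels] = 0.8
q = conformal_threshold(cal_probs, cal_labels)      # all scores 0.2, so q = 0.2
s = prediction_set(np.array([0.85, 0.1, 0.05]), q)  # -> set containing class 0
```

The paper's contribution is to lift this guarantee from sets of labels to sets of probability distributions (credal sets) when training labels are themselves distributions.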

MCML Authors
Alireza Javanmardi, Artificial Intelligence & Machine Learning
Eyke Hüllermeier (Prof. Dr.), Artificial Intelligence & Machine Learning


[1192]
A. H. Kargaran, F. Yvon and H. Schütze.
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient data only for languages with large dominant communities. However, there is no corpus available that (i) covers a wide range of minority languages; (ii) is generated by an open-source reproducible pipeline; and (iii) is rigorously cleaned from noise, making it trustworthy to use. We present GlotCC, a clean, document-level, 2TB general domain corpus derived from CommonCrawl, covering more than 1000 languages. We make GlotCC and the system used to generate it - including the pipeline, language identification model, and filters - available to the research community.

MCML Authors
Amir Hossein Kargaran, Statistical NLP and Deep Learning
Hinrich Schütze (Prof. Dr.), Statistical NLP and Deep Learning


[1191]
F. Koehler, S. Niedermayr, R. Westermann and N. Thuerey.
APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

We introduce the Autoregressive PDE Emulator Benchmark (APEBench), a comprehensive benchmark suite to evaluate autoregressive neural emulators for solving partial differential equations. APEBench is based on JAX and provides a seamlessly integrated differentiable simulation framework employing efficient pseudo-spectral methods, enabling 46 distinct PDEs across 1D, 2D, and 3D. Facilitating systematic analysis and comparison of learned emulators, we propose a novel taxonomy for unrolled training and introduce a unique identifier for PDE dynamics that directly relates to the stability criteria of classical numerical methods. APEBench enables the evaluation of diverse neural architectures, and unlike existing benchmarks, its tight integration of the solver enables support for differentiable physics training and neural-hybrid emulators. Moreover, APEBench emphasizes rollout metrics to understand temporal generalization, providing insights into the long-term behavior of emulating PDE dynamics. In several experiments, we highlight the similarities between neural emulators and numerical simulators.
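A minimal taste of the pseudo-spectral solvers such a benchmark wraps (not the APEBench API): one exact Fourier-space step of the 1D heat equation, i.e., the kind of ground-truth trajectory an autoregressive emulator is trained to reproduce step by step:

```python
import numpy as np

# One exact step of u_t = nu * u_xx on a periodic domain, computed in
# Fourier space, where diffusion acts as a per-wavenumber decay factor.
def heat_step(u, nu=0.01, dt=0.1, length=2 * np.pi):
    k = 2 * np.pi * np.fft.fftfreq(u.size, d=length / u.size)  # wavenumbers
    u_hat = np.fft.fft(u) * np.exp(-nu * k ** 2 * dt)  # exact diffusion
    return np.fft.ifft(u_hat).real

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u0 = np.sin(x)
u1 = heat_step(u0)   # analytically: exp(-nu * dt) * sin(x)
```

Because the solver is a smooth function of its input, the same code is differentiable under JAX-style tracing, which is what enables the benchmark's differentiable-physics and neural-hybrid training modes.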

MCML Authors
Rüdiger Westermann (Prof. Dr.), Computer Graphics & Visualization
Nils Thuerey (Prof. Dr.), Physics-based Simulation


[1190]
G. Ma, Y. Wang, D. Lim, S. Jegelka and Y. Wang.
A Canonicalization Perspective on Invariant and Equivariant Learning.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods have emerged as a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonicalization perspective that provides an essential and complete view of the design of frames. Canonicalization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods (some are even optimal), both theoretically and empirically. The reduction to the canonicalization perspective further uncovers equivalences between previous methods. These observations suggest that canonicalization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods.
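Canonicalization in its simplest instance: for sets under permutations, sorting is a canonical form, so composing any backbone with it yields exact invariance. The paper's eigenvector frames are far subtler; the linear "backbone" below is a toy:

```python
import numpy as np

# Map every input to the canonical representative of its permutation
# orbit (sorted order), then apply an arbitrary backbone. The composite
# is exactly permutation-invariant by construction.
def canonicalize(x):
    return np.sort(x)

def invariant_model(x, w):
    return float(np.dot(canonicalize(x), w))  # toy linear backbone

w = np.array([1.0, 2.0, 3.0])
a = invariant_model(np.array([3.0, 1.0, 2.0]), w)
b = invariant_model(np.array([2.0, 3.0, 1.0]), w)
# a == b == 14.0: both inputs share the canonical form [1, 2, 3].
```

A frame generalizes this: instead of one canonical form per orbit, it averages the backbone over a small input-dependent subset of group elements, trading computation for robustness.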

MCML Authors
Stefanie Jegelka (Prof. Dr.), Foundations of Deep Neural Networks


[1189]
Y. Ma, V. Melnychuk, J. Schweisthal and S. Feuerriegel.
DiffPO: A causal diffusion model for learning distributions of potential outcomes.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Predicting potential outcomes of interventions from observational data is crucial for decision-making in medicine, but the task is challenging due to the fundamental problem of causal inference. Existing methods are largely limited to point estimates of potential outcomes with no uncertainty quantification; thus, the full information about the distributions of potential outcomes is typically ignored. In this paper, we propose a novel causal diffusion model called DiffPO, which is carefully designed for reliable inferences in medicine by learning the distribution of potential outcomes. In our DiffPO, we leverage a tailored conditional denoising diffusion model to learn complex distributions, where we address the selection bias through a novel orthogonal diffusion loss. Another strength of our DiffPO method is that it is highly flexible (e.g., it can also be used to estimate different causal quantities such as CATE). Across a wide range of experiments, we show that our method achieves state-of-the-art performance.

MCML Authors
Yuchen Ma, Artificial Intelligence in Management
Valentyn Melnychuk, Artificial Intelligence in Management
Jonas Schweisthal, Artificial Intelligence in Management
Stefan Feuerriegel (Prof. Dr.), Artificial Intelligence in Management


[1188]
M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer and E. Hüllermeier.
shapiq: Shapley Interactions for Machine Learning.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research.
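What such a package computes, at its core, are Shapley values. A brute-force reference implementation over all player orderings (feasible only for tiny games, and not the shapiq API) makes the definition concrete:

```python
from itertools import permutations

# Exact Shapley values: average each player's marginal contribution over
# all orderings. Cost grows factorially, which is why packages like
# shapiq rely on structure-exploiting or probabilistic approximations.
def shapley_values(players, value):
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: v / len(orders) for p, v in phi.items()}

# Toy game: worth 1 only when players 0 and 1 cooperate; player 2 is a dummy.
def v(S):
    return 1.0 if {0, 1} <= S else 0.0

phi = shapley_values([0, 1, 2], v)  # {0: 0.5, 1: 0.5, 2: 0.0}
```

Shapley Interactions extend this by assigning joint contributions to groups (here, the pair {0, 1} carries all the value), which is the any-order generalization the package unifies.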

MCML Authors
Maximilian Muschalik, Artificial Intelligence & Machine Learning
Eyke Hüllermeier (Prof. Dr.), Artificial Intelligence & Machine Learning


[1187]
T. Nagler, L. Schneider, B. Bischl and M. Feurer.
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model’s generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout become competitive with standard CV while being computationally cheaper.
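The two protocols being compared, one fixed split reused for every configuration versus a fresh split per configuration, can be sketched with a toy one-parameter ridge fit. The data, candidate configurations, and split sizes are invented; this shows the mechanism, not the paper's evidence:

```python
import numpy as np

# Toy setup: estimate a single ridge-like slope and evaluate candidate
# regularizers lambda on a train/validation holdout.
rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = 2.0 * X + rng.normal(scale=0.5, size=200)

def val_loss(lam, idx_train, idx_val):
    xt, yt = X[idx_train], y[idx_train]
    w = (xt @ yt) / (xt @ xt + lam)          # closed-form ridge slope
    return float(np.mean((y[idx_val] - w * X[idx_val]) ** 2))

def make_split(r):
    idx = r.permutation(200)
    return idx[:150], idx[150:]

lambdas = [0.0, 0.1, 1.0, 10.0]

# Protocol A: one fixed split, reused for every configuration.
tr, va = make_split(np.random.default_rng(1))
best_fixed = min(lambdas, key=lambda lam: val_loss(lam, tr, va))

# Protocol B: reshuffle, i.e., a fresh split for each configuration.
r = np.random.default_rng(1)
best_reshuffled = min(lambdas, key=lambda lam: val_loss(lam, *make_split(r)))
```

Under protocol A, the selected configuration can overfit the idiosyncrasies of the single split; reshuffling injects fresh noise per evaluation, which the paper shows can improve generalization of the selected configuration.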

MCML Authors
Thomas Nagler (Prof. Dr.), Computational Statistics & Data Science
Lennart Schneider, Statistical Learning & Data Science
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science
Matthias Feurer (Prof. Dr.), Statistical Learning & Data Science


[1186]
R. Paolino, S. Maskey, P. Welke and G. Kutyniok.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

We introduce r-loopy Weisfeiler-Leman (r-ℓWL), a novel hierarchy of graph isomorphism tests and a corresponding GNN framework, r-ℓMPNN, that can count cycles up to length r+2. Most notably, we show that r-ℓWL can count homomorphisms of cactus graphs. This strictly extends classical 1-WL, which can only count homomorphisms of trees and, in fact, is incomparable to k-WL for any fixed k. We empirically validate the expressive and counting power of the proposed r-ℓMPNN on several synthetic datasets and present state-of-the-art predictive performance on various real-world datasets.
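The classical 1-WL baseline that the hierarchy strictly extends is easy to implement via color refinement, and its well-known blind spot to cycles (the gap r-ℓWL closes) shows up already on six nodes:

```python
from collections import Counter

# 1-WL color refinement: repeatedly hash each node's color together with
# the sorted multiset of its neighbors' colors. Different histograms
# prove non-isomorphism; equal histograms prove nothing.
def wl_histogram(adj, rounds=3):
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
distinguished = wl_histogram(triangle) != wl_histogram(path3)  # True

# Classic failure case: a 6-cycle vs. two disjoint triangles. Both are
# 2-regular, so 1-WL sees them as identical; counting cycles, as r-lWL
# does, separates them.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
indistinguishable = wl_histogram(hexagon) == wl_histogram(two_triangles)  # True
```

The hexagon/two-triangles pair is exactly the kind of instance where cycle counting adds expressive power beyond 1-WL.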

MCML Authors
Raffaele Paolino, Mathematical Foundations of Artificial Intelligence
Gitta Kutyniok (Prof. Dr.), Mathematical Foundations of Artificial Intelligence


[1185]
K. Roth, V. Udandarao, S. Dziadzio, A. Prabhu, M. Cherti, O. Vinyals, O. Hénaff, S. Albanie, M. Bethge and Z. Akata.
A Practitioner's Guide to Continual Multimodal Pretraining.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly explores scenarios with either (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these two limit cases, as real-world applications often demand adaptation to specific subdomains, tasks or concepts – spread over the entire, varying life cycle of a model. In this work, we complement current perspectives on continual pretraining through a research test bed as well as provide comprehensive guidance for effective continual model updates in such scenarios. We first introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements, constructed over 63 datasets with diverse visual and semantic coverage. Using FoMo-in-Flux, we explore the complex landscape of practical continual pretraining through multiple perspectives: (1) A data-centric investigation of data mixtures and stream orderings that emulate real-world deployment situations, (2) a method-centric investigation ranging from simple fine-tuning and traditional continual learning strategies to parameter-efficient updates and model merging, (3) meta learning rate schedules and mechanistic design choices, and (4) the influence of model and compute scaling. Together, our insights provide a practitioner’s guide to continual multimodal pretraining for real-world deployment.

MCML Authors
Karsten Roth, Interpretable and Reliable Machine Learning
Zeynep Akata (Prof. Dr.), Interpretable and Reliable Machine Learning


[1184]
D. Rügamer, B. X. W. Liew, Z. Altai and A. Stöcker.
A Functional Extension of Semi-Structured Networks.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.

MCML Authors
David Rügamer (Prof. Dr.), Data Science Group


[1183]
R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert and M. Althoff.
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Continuous action spaces in reinforcement learning (RL) are commonly defined as multidimensional intervals. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using proximal policy optimization (PPO), we evaluate our methods on four control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
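The simplest conceivable continuous mask is an affine map from the global action interval onto a state-dependent relevant interval. This hypothetical scheme (not necessarily one of the paper's three methods, and ignoring the policy-gradient corrections the paper derives) illustrates the idea:

```python
# Affine interval masking: rescale the policy's raw action from the
# global interval [low, high] into the relevant interval for the current
# state, so only relevant actions are ever executed.
def mask_action(raw, low, high, rel_low, rel_high):
    t = (raw - low) / (high - low)          # position within global interval
    return rel_low + t * (rel_high - rel_low)

# Global action space [-1, 1]; suppose the relevant set in the current
# state is [0.2, 0.6] (hypothetical numbers from, e.g., system dynamics).
a = mask_action(0.0, -1.0, 1.0, 0.2, 0.6)         # midpoint maps to 0.4
lo_edge = mask_action(-1.0, -1.0, 1.0, 0.2, 0.6)  # boundary maps to 0.2
```

Because the map is surjective onto the relevant interval, the agent keeps full expressiveness over relevant actions while irrelevant ones become unreachable, which is what makes masking attractive for safety-critical deployment.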

MCML Authors
Hanna Krasowski (Dr.), Cyber Physical Systems
Michael Eichelbeck, Cyber Physical Systems
Philipp Gassert, Cyber Physical Systems
Matthias Althoff (Prof. Dr.), Cyber Physical Systems


[1182]
J. Wang, M. Ghahremani, Y. Li, B. Ommer and C. Wachinger.
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model’s precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet.

MCML Authors
Link to Morteza Ghahremani

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to Yitong Li

Yitong Li

Artificial Intelligence in Radiology

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1181]
Y. Wang, K. Hu, S. Gupta, Z. Ye, Y. Wang and S. Jegelka.
Understanding the Role of Equivariance in Self-supervised Learning.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Contrastive learning has been a leading paradigm for self-supervised learning, but it is widely observed that it comes at the price of sacrificing useful features (e.g., colors) by being invariant to data augmentations. Given this limitation, there has been a surge of interest in equivariant self-supervised learning (E-SSL) that learns features to be augmentation-aware. However, even for the simplest rotation prediction method, there is a lack of rigorous understanding of why, when, and how E-SSL learns useful features for downstream tasks. To bridge this gap between practice and theory, we establish an information-theoretic perspective to understand the generalization ability of E-SSL. In particular, we identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks. This synergy effect encourages models to extract class-relevant features to improve their equivariant predictions, which, in turn, benefits downstream tasks requiring semantic features. Based on this perspective, we theoretically analyze the influence of data transformations and reveal several principles for practical designs of E-SSL. Our theory not only aligns well with existing E-SSL methods but also sheds light on new directions by exploring the benefits of model equivariance. We believe that a theoretically grounded understanding of the role of equivariance would inspire more principled and advanced designs in this field.

MCML Authors
Link to Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1180]
D. Winkel, N. Strauß, M. Bernhard, Z. Li, T. Seidl and M. Schubert.
Autoregressive Policy Optimization for Constrained Allocation Tasks.
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark.
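The sequential sampling idea from the abstract can be sketched in a few lines. This is an illustrative toy under simplified per-entity caps, not the paper's method (the de-biasing mechanism and learned policy are omitted); `sample_allocation` and its interval logic are assumptions for illustration:

```python
import random

def sample_allocation(caps, seed=0):
    """Sample w with sum(w) == 1 and w[i] <= caps[i] by sequentially drawing
    each entity's share from the interval that keeps the rest feasible."""
    rng = random.Random(seed)
    n = len(caps)
    remaining = 1.0
    w = []
    for i in range(n):
        tail_cap = sum(caps[i + 1:])          # capacity left for later entities
        low = max(0.0, remaining - tail_cap)  # must allocate at least this now
        high = min(caps[i], remaining)        # cannot exceed the cap or budget
        share = remaining if i == n - 1 else rng.uniform(low, high)
        w.append(share)
        remaining -= share
    return w

# Four entities, each capped (e.g., at most 30% into the first sector).
caps = [0.3, 0.5, 0.4, 0.6]
w = sample_allocation(caps)
```

Sampling entity by entity guarantees the linear constraints by construction, which is the core appeal of the autoregressive formulation over projecting or penalizing violations.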

MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Zongyue Li

Zongyue Li

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[1179]
M. Yau, N. Karalias, E. Lu, J. Xu and S. Jegelka.
Are Graph Neural Networks Optimal Approximation Algorithms?
38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN’s ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.
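Producing a discrete solution from learned embeddings, as the abstract describes for OptGNN, is in the spirit of classic random-hyperplane rounding for SDP relaxations of Max-Cut. A minimal sketch, assuming nothing about OptGNN itself (the function name and the stand-in random embeddings are illustrative):

```python
import numpy as np

def round_embeddings_to_cut(X, edges, trials=50, seed=0):
    """Random-hyperplane rounding: project node embeddings onto random
    directions, assign each node by the sign of its projection, and keep
    the assignment that cuts the most edges."""
    rng = np.random.default_rng(seed)
    best_val, best_assign = -1, None
    for _ in range(trials):
        r = rng.standard_normal(X.shape[1])
        side = (X @ r) >= 0
        val = sum(int(side[u] != side[v]) for u, v in edges)
        if val > best_val:
            best_val, best_assign = val, side
    return best_assign, best_val

# Toy instance: a 4-cycle, whose maximum cut has value 4.
X = np.random.default_rng(1).standard_normal((4, 3))  # stand-in embeddings
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
assign, val = round_embeddings_to_cut(X, edges)
```

With embeddings from a trained model rather than random noise, the same rounding step turns the continuous relaxation into a feasible cut.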

MCML Authors
Link to Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1178]
M. Koshil, T. Nagler, M. Feurer and K. Eggensperger.
Towards Localization via Data Embedding for TabPFN.
3rd Table Representation Learning Workshop (TLR 2024) at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. URL.
Abstract

Prior-data fitted networks (PFNs), especially TabPFN, have shown significant promise in tabular data prediction. However, their scalability is limited by the quadratic complexity of the transformer architecture’s attention across training points. In this work, we propose a method to localize TabPFN, which embeds data points into a learned representation and performs nearest neighbor selection in this space. We evaluate it across six datasets, demonstrating its superior performance over standard TabPFN when scaling to larger datasets. We also explore its design choices and analyze the bias-variance trade-off of this localization method, showing that it reduces bias while maintaining manageable variance. This work opens up a pathway for scaling TabPFN to arbitrarily large tabular datasets.
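The localization step described in the abstract — embed the data, then select nearest neighbors as the in-context training set — can be sketched as follows. Plain standardization stands in for the learned embedding, and `local_context` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def local_context(X_train, y_train, x_query, k=5):
    """Pick the k nearest training points in an embedding space; this subset
    would serve as TabPFN's in-context training set for the query."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-9
    Z = (X_train - mu) / sd            # stand-in for the learned embedding
    zq = (x_query - mu) / sd
    idx = np.argsort(((Z - zq) ** 2).sum(axis=1))[:k]
    return X_train[idx], y_train[idx]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
y = (X[:, 0] > 0).astype(int)
Xc, yc = local_context(X, y, X[0], k=5)  # query is a training point itself
```

Since attention cost is quadratic in the context length, conditioning on k neighbors instead of the full training set is what makes the approach scale.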

MCML Authors
Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science


[1177]
B. Cong, N. Daheim, Y. Shen, D. Cremers, R. Yokota, M. Khan and T. Möllenhoff.
Variational Low-Rank Adaptation Using IVON.
Workshop Fine-Tuning in Modern Machine Learning: Principles and Scalability (FITML 2024) at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in cost. We replace AdamW with the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and the expected calibration error by 4.6%. The accuracy is also better than that of other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models.

MCML Authors
Yuesong Shen

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1176]
Y. Zhang, Y. Li, X. Wang, Q. Shen, B. Plank, B. Bischl, M. Rezaei and K. Kawaguchi.
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models.
Workshop on Machine Learning and Compression at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which, in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alteration to the model’s output – contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% performance of Llama3-8B with 25% layers removed, and 95% performance of Llama3-70B with 30% layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly, we observe intriguing results with FinerCut: 42% (34 out of 80) of the self-attention layers in Llama3-70B can be removed while preserving 99% of its performance – without additional fine-tuning after removal. Moreover, FinerCut provides a tool to inspect the types and locations of pruned layers, allowing one to observe interesting pruning behaviors. For instance, we observe a preference for pruning self-attention layers, often at deeper consecutive decoder layers. We hope our insights inspire future efficient LLM architecture designs.
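The pruning criterion — remove the layer whose absence perturbs the output least — can be illustrated with a toy model where "layers" are simple numpy functions standing in for transformer sub-layers. A greedy sketch under that assumption, not the paper's implementation:

```python
import numpy as np

def forward(layers, x):
    for f in layers:
        x = f(x)
    return x

def prune_layers(layers, x_calib, n_remove):
    """Greedily drop the layer whose removal changes the final output least
    on calibration inputs (measured in L2 distance)."""
    layers = list(layers)
    for _ in range(n_remove):
        base = forward(layers, x_calib)
        errs = [np.linalg.norm(forward(layers[:i] + layers[i + 1:], x_calib) - base)
                for i in range(len(layers))]
        layers.pop(int(np.argmin(errs)))
    return layers

def f_shift(x):   return x + 1.0      # changes the output a lot if removed
def f_near_id(x): return 1.001 * x    # nearly the identity: cheap to remove
f_act = np.tanh

x_calib = np.array([0.5, -1.0, 2.0])
pruned = prune_layers([f_shift, f_near_id, f_act], x_calib, n_remove=1)
```

The near-identity layer is removed first, mirroring the paper's observation that some sub-layers contribute little to the final output.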

MCML Authors
Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[1175]
C. Leiber, N. Strauß, M. Schubert and T. Seidl.
Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters.
6th Workshop on Deep Learning and Clustering (DLC 2024) at the 24th IEEE International Conference on Data Mining (ICDM 2024). Abu Dhabi, UAE, Dec 09-12, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separately from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components.
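The "dying clusters" idea — start from an upper bound and let unneeded clusters vanish during training — can be sketched with a plain k-means-style loop (an illustrative toy, not UNSEEN itself; the `min_size` death rule is an assumption for this sketch):

```python
import numpy as np

def cluster_with_dying(X, k_max=8, min_size=5, iters=20, seed=0):
    """Start from an upper bound k_max and run k-means-style updates;
    clusters that attract fewer than min_size points 'die' and are removed,
    so the number of clusters is estimated during clustering itself."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k_max, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        sizes = np.bincount(assign, minlength=len(C))
        C = np.array([X[assign == j].mean(axis=0)
                      for j in range(len(C)) if sizes[j] >= min_size])
    return C

# Two well-separated blobs; superfluous clusters should die off over time.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(5.0, 0.1, (30, 2))])
C = cluster_with_dying(X, k_max=8)
```

In the deep setting the same mechanism would run on embedded representations inside the clustering objective, rather than on raw coordinates.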

MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1174]
H. Weingärtner, M. Windl, L. L. Chuang and F. Draxler.
Useful but Distracting: Viewer Experience with Keyword Highlights and Time-Synchronization in Captions for Language Learning.
23rd International Conference on Mobile and Ubiquitous Multimedia (MUM 2024). Stockholm, Sweden, Dec 01-04, 2024. To be published.
Abstract

Captions are a valuable scaffold for language learners, aiding comprehension and vocabulary acquisition. Past work has proposed enhancements such as keyword highlights for increased learning gains. However, little is known about learners’ experience with enhanced captions, although this is critical for adoption in everyday life. We conducted a survey and focus group to elicit learner preferences and requirements and implemented a processing pipeline for enhanced captions with keyword highlights, time-synchronized keyword highlights, and keyword captions. A subsequent online study (n = 66) showed that time-synchronized keyword highlights were the preferred design for learning but were perceived as too distracting to replace standard captions in everyday viewing scenarios. We conclude that keyword highlights and time-synchronization are suitable for integrating learning into an entertaining everyday-life activity, but the design should be optimized to provide a more seamless experience.
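The time-synchronized highlight design can be illustrated with a few lines of string processing — a hypothetical rendering helper, not the study's actual pipeline, with `*...*` standing in for visual emphasis:

```python
def render_caption(words, keywords, t):
    """Render a caption at playback time t: a keyword is emphasized
    (wrapped in *...*) only once its timestamp has been reached."""
    out = []
    for word, ts in words:
        out.append(f"*{word}*" if word.lower() in keywords and ts <= t else word)
    return " ".join(out)

# (word, timestamp-in-seconds) pairs for one caption line.
words = [("The", 0.0), ("weather", 0.4), ("is", 0.7), ("gorgeous", 1.1)]
before = render_caption(words, {"gorgeous"}, t=0.8)  # keyword not yet spoken
after = render_caption(words, {"gorgeous"}, t=1.2)   # keyword now highlighted
```

Synchronizing the highlight with the audio is what couples the visual cue to the spoken word — and, per the study, also what makes it both helpful and distracting.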

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media


[1173]
U. Fischer Abaigar, C. Kern, N. Barda and F. Kreuter.
Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector.
Government Information Quarterly 41.4 (Dec. 2024). DOI.
Abstract

AI-driven decision-making systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, these systems face the challenge of aligning machine learning (ML) models with the complex realities of public sector decision-making. In this paper, we examine five key challenges where misalignment can occur, including distribution shifts, label bias, the influence of past decision-making on the data side, as well as competing objectives and human-in-the-loop interaction on the model output side. Our findings suggest that standard ML methods often rely on assumptions that do not fully account for these complexities, potentially leading to unreliable and harmful predictions. To address this, we propose a shift in modeling efforts from focusing solely on predictive accuracy to improving decision-making outcomes. We offer guidance for selecting appropriate modeling frameworks, including counterfactual prediction and policy learning, by considering how the model estimand connects to the decision-maker’s utility. Additionally, we outline technical methods that address specific challenges within each modeling approach. Finally, we argue for the importance of external input from domain experts and stakeholders to ensure that model assumptions and design choices align with real-world policy objectives, taking a step towards harmonizing AI and public sector objectives.

MCML Authors
Link to Unai Fischer Abaigar

Unai Fischer Abaigar

Social Data Science and AI Lab

Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1172]
J. Wang, L. Zuo, S. Peng and B. Plank.
MultiClimate: Multimodal Stance Detection on Climate Change Videos.
3rd Workshop on NLP for Positive Impact (NLP4PI 2024) at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. URL. GitHub.
Abstract

Climate change (CC) has attracted increasing attention in NLP in recent years. However, detecting the stance on CC in multimodal data is understudied and remains challenging due to a lack of reliable datasets. To improve the understanding of public opinions and communication strategies, this paper presents MultiClimate, the first open-source manually-annotated stance detection dataset with 100 CC-related YouTube videos and 4,209 frame-transcript pairs. We deploy state-of-the-art vision and language models, as well as multimodal models for MultiClimate stance detection. Results show that text-only BERT significantly outperforms image-only ResNet50 and ViT. Combining both modalities achieves state-of-the-art results of 0.747/0.749 in accuracy/F1. Our 100M-sized fusion models also beat CLIP and BLIP, as well as the much larger 9B-sized multimodal IDEFICS and text-only Llama3 and Gemma2, indicating that multimodal stance detection remains challenging for large language models.

MCML Authors
Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1171]
K. Hämmerl, A. Manea, G. Vico, J. Helcl and J. Libovický.
CUNI and LMU Submission to the MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval.
4th Multilingual Representation Learning Workshop (MRL 2024) at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published.
Abstract

We present the joint CUNI and LMU submission to the MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval. The shared task objective was to explore how we can deploy modern methods in NLP in multi-lingual low-resource settings, tested on two sub-tasks: Named-entity recognition and question answering. Our solutions to the subtasks are based on data acquisition and model adaptation. We compare the performance of our submitted systems with the translate-test approach which proved to be the most useful in the previous edition of the shared task. Our results show that using more data as well as fine-tuning recent multilingual pre-trained models leads to considerable improvements over the translate-test baseline.

MCML Authors
Link to Katharina Hämmerl

Katharina Hämmerl

Data Analytics & Statistics


[1170]
M. Di Marco and A. Fraser.
Subword Segmentation in LLMs: Looking at Inflection and Consistency.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. URL.
Abstract

The role of subword segmentation in relation to capturing morphological patterns in LLMs is currently not well explored. Ideally, one would train models like GPT using various segmentations and evaluate how well word meanings are captured. Since this is not computationally feasible, we group words according to their segmentation properties and compare how well a model can solve a linguistic task for these groups. We study two criteria: (i) adherence to morpheme boundaries and (ii) the segmentation consistency of the different inflected forms of a lemma. We select word forms with high and low values for these criteria and carry out experiments on GPT-4o’s ability to capture verbal inflection for 10 languages. Our results indicate that in particular the criterion of segmentation consistency can help to predict the model’s ability to recognize and generate the lemma from an inflected form, providing evidence that subword segmentation is relevant.
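The segmentation-consistency criterion can be made concrete with a small check — one plausible formalization chosen for this sketch, not necessarily the paper's exact definition:

```python
def consistent_segmentation(lemma_tokens, inflected_tokens):
    """One plausible formalization of segmentation consistency: an inflected
    form is segmented consistently with its lemma if the lemma's tokens
    appear unchanged as a prefix of the inflected form's token sequence."""
    return inflected_tokens[:len(lemma_tokens)] == lemma_tokens

ok = consistent_segmentation(["walk"], ["walk", "ing"])   # stem preserved
bad = consistent_segmentation(["walk"], ["wal", "king"])  # stem broken up
```

Grouping word forms by such a criterion is what allows comparing model performance between consistently and inconsistently segmented inflections.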

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[1169]
L. Edman, H. Schmid and A. Fraser.
CUTE: Measuring LLMs’ Understanding of Their Tokens.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Large Language Models (LLMs) show remarkable performance on a wide variety of tasks. Most LLMs split text into multi-character tokens and process them as atomic units without direct access to individual characters. This raises the question: To what extent can LLMs learn orthographic information? To answer this, we propose a new benchmark, CUTE, which features a collection of tasks designed to test the orthographic knowledge of LLMs. We evaluate popular LLMs on CUTE, finding that most of them seem to know the spelling of their tokens, yet fail to use this information effectively to manipulate text, calling into question how much of this knowledge is generalizable.
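Benchmarks of this kind probe character-level knowledge through tasks like spelling, counting, and manipulating characters. A toy generator in that spirit — the task names and formats here are illustrative, not CUTE's own:

```python
def orthographic_tasks(word):
    """Generate simple character-level probes in the spirit of CUTE; the
    task names and formats are illustrative, not the benchmark's own."""
    return {
        "spell": " ".join(word),    # spell the word out character by character
        "length": len(word),        # count the characters
        "reverse": word[::-1],      # manipulate the characters
        "contains_e": "e" in word,  # test for a character's presence
    }

tasks = orthographic_tasks("token")
```

Since LLMs see "token" as one or a few atomic units, even these trivial string operations require orthographic knowledge the model was never directly given.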

MCML Authors
Link to Lukas Edman

Lukas Edman

Dr.

Data Analytics & Statistics

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[1168]
W. Lai, V. Hangya and A. Fraser.
Style-Specific Neurons for Steering LLMs in Text Style Transfer.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Text style transfer (TST) aims to modify the style of a text without altering its original meaning. Large language models (LLMs) demonstrate superior performance across multiple tasks, including TST. However, in zero-shot setups, they tend to directly copy a significant portion of the input text to the output without effectively changing its style. To enhance the stylistic variety and fluency of the text, we present sNeuron-TST, a novel approach for steering LLMs using style-specific neurons in TST. Specifically, we identify neurons associated with the source and target styles and deactivate source-style-only neurons to give target-style words a higher probability, aiming to enhance the stylistic diversity of the generated text. However, we find that this deactivation negatively impacts the fluency of the generated text, which we address by proposing an improved contrastive decoding method that accounts for rapid token probability shifts across layers caused by deactivated source-style neurons. Empirical experiments demonstrate the effectiveness of the proposed method on six benchmarks, encompassing formality, toxicity, politics, politeness, authorship, and sentiment.
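The neuron-identification step can be illustrated with toy activation statistics — a simplified stand-in for the paper's procedure, with a hypothetical mean-activation threshold:

```python
import numpy as np

def source_only_neurons(act_src, act_tgt, thresh=0.5):
    """Toy identification of source-style-only neurons: mean activation above
    thresh on source-style inputs but not on target-style inputs. Zeroing
    these would lower the probability of source-style continuations."""
    src_on = act_src.mean(axis=0) > thresh
    tgt_on = act_tgt.mean(axis=0) > thresh
    return np.where(src_on & ~tgt_on)[0]

# Rows are inputs, columns are neurons; neuron 2 fires only for source style.
act_src = np.array([[0.9, 0.1, 0.8], [0.7, 0.2, 0.9]])
act_tgt = np.array([[0.8, 0.1, 0.2], [0.9, 0.3, 0.1]])
idx = source_only_neurons(act_src, act_tgt)
```

Deactivating the returned neurons during generation is the steering step; the paper's contrastive-decoding fix then compensates for the fluency loss this causes.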

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[1167]
Y. Liu, Y. Zhang, Q. Li, T. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Full-parameter fine-tuning has become the go-to choice for adapting language models (LMs) to downstream tasks due to its excellent performance. As LMs grow in size, fine-tuning the full parameters of LMs requires a prohibitively large amount of GPU memory. Existing approaches utilize zeroth-order optimizers to conserve GPU memory, which can potentially compromise the performance of LMs, as non-zeroth-order optimizers tend to converge more readily on most downstream tasks. In this paper, we propose a novel optimizer-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. HiFT can significantly reduce the amount of gradients and optimizer state parameters residing in GPU memory at the same time, thereby reducing GPU memory usage. Our results demonstrate that: (1) HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning. (2) HiFT supports various optimizers including AdamW, AdaGrad, SGD, etc. (3) HiFT can save more than 60% GPU memory compared with standard full-parameter fine-tuning for a 7B model. (4) HiFT enables full-parameter fine-tuning of a 7B model on a single 48GB A6000 GPU at 32-bit precision using the AdamW optimizer, without using any memory-saving techniques.
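The core mechanism — update only one parameter group per step while cycling through groups — can be sketched on a toy quadratic with plain gradient descent (names and setup are illustrative, not HiFT's implementation):

```python
import numpy as np

def hift_step(params, grads, group, lr=0.1):
    """Update only the parameters whose index is in `group`; the others keep
    their values, so optimizer state for inactive groups need not reside in
    GPU memory at that step."""
    return [p - lr * g if i in group else p
            for i, (p, g) in enumerate(zip(params, grads))]

# Minimize f(w) = sum(w_i^2) while cycling through two parameter groups.
params = [np.array(4.0), np.array(-2.0), np.array(1.0), np.array(3.0)]
groups = [{0, 1}, {2, 3}]
for step in range(100):
    grads = [2.0 * p for p in params]   # gradient of the squared norm
    params = hift_step(params, grads, groups[step % len(groups)])
```

All parameters still converge because every group is visited regularly; only the peak memory for gradients and optimizer state shrinks, not the set of trainable parameters.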

MCML Authors
Link to Yongkang Liu

Yongkang Liu

Statistical NLP and Deep Learning

Link to Tong Liu

Tong Liu

Database Systems & Data Mining

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1166]
P. Mondorf and B. Plank.
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie. The objective is to logically deduce each character’s identity based on their statements. The challenge arises from the truth-telling or lying behavior, which influences the logical implications of each statement. Solving these puzzles requires not only direct deductions from individual statements, but also the ability to assess the truthfulness of statements by reasoning through various hypothetical scenarios. As such, knights and knaves puzzles serve as compelling examples of suppositional reasoning. In this paper, we introduce TruthQuest, a benchmark for suppositional reasoning based on the principles of knights and knaves puzzles. Our benchmark presents problems of varying complexity, considering both the number of characters and the types of logical statements involved. Evaluations on TruthQuest show that large language models like Llama 3 and Mixtral-8x7B exhibit significant difficulties solving these tasks. A detailed error analysis of the models’ output reveals that lower-performing models exhibit a diverse range of reasoning errors, frequently failing to grasp the concept of truth and lies. In comparison, more proficient models primarily struggle with accurately inferring the logical implications of potentially false statements.
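For reference, such puzzles have a simple exhaustive solution: enumerate every knight/knave assignment and keep the consistent ones. A generic brute-force solver (not TruthQuest's own tooling):

```python
from itertools import product

def solve(statements):
    """Brute-force a knights-and-knaves puzzle. Each character is a knight
    (truth-teller, True) or a knave (liar, False); statements[i] maps a
    candidate world to the truth value of character i's claim. A world is
    consistent iff every knight's claim is true and every knave's is false."""
    n = len(statements)
    return [w for w in product([True, False], repeat=n)
            if all(statements[i](w) == w[i] for i in range(n))]

# A says: 'B is a knave.'  B says: 'We are both knights.'
solutions = solve([lambda w: not w[1], lambda w: w[0] and w[1]])
```

The contrast between this trivial enumeration and LLM failure rates is what makes the puzzles a sharp probe of suppositional reasoning.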

MCML Authors
Link to Philipp Mondorf

Philipp Mondorf

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1165]
B. Chen, X. Wang, S. Peng, R. Litschko, A. Korhonen and B. Plank.
'Seeing the Big through the Small': Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. URL.
Abstract

Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI), earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent the human judgment distribution (HJD) or using expert linguists to provide detailed explanations for their chosen labels. While the former method provides denser HJD information, obtaining it is resource-intensive. In contrast, the latter offers richer textual information but is challenging to scale up to many human judges. In addition, large language models (LLMs) are increasingly used as evaluators (‘LLM judges’) but with mixed results, and few works aim to study HJDs. This study proposes to exploit LLMs to approximate HJDs using a small number of expert labels and explanations. Our experiments show that a few explanations significantly improve LLMs’ ability to approximate HJDs with and without explicit labels, thereby providing a solution to scale up annotations for HJD. However, fine-tuning smaller soft-label aware models with the LLM-generated model judgment distributions (MJDs) presents partially inconsistent results: while similar in distance, their resulting fine-tuned models and visualized distributions differ substantially. We show the importance of complementing instance-level distance measures with a global-level shape metric and visualization to more effectively evaluate MJDs against human judgment distributions.

MCML Authors
Link to Beiduo Chen

Beiduo Chen

Artificial Intelligence and Computational Linguistics

Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Robert Litschko

Robert Litschko

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1164]
A. Köksal, T. Schick, A. Korhonen and H. Schütze.
LongForm: Effective Instruction Tuning with Reverse Instructions.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. URL. GitHub.
Abstract

Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-sourced datasets with alignment issues, and generating noisy examples via LLMs. We introduce the LongForm-C dataset, which is created by reverse instructions. We generate instructions via LLMs for human-written corpus examples using reverse instructions. First, we select a diverse set of human-written documents from corpora such as C4 and Wikipedia; then we generate instructions for these documents via LLMs. This approach provides a cheaper and cleaner instruction-tuning dataset with natural output that is suitable for long text generation. Our models outperform 10x larger language models without instruction tuning on tasks such as story/recipe generation and long-form question answering. Moreover, LongForm models outperform prior instruction-tuned models such as FLAN-T5 and Alpaca by a large margin, and improve language understanding capabilities further.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1163]
A. Sedova, R. Litschko, D. Frassinelli, B. Roth and B. Plank.
To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. URL.
Abstract

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.

MCML Authors
Link to Robert Litschko

Robert Litschko

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1162]
R. Zhao, A. Köksal, Y. Liu, L. Weissweiler, A. Korhonen and H. Schütze.
SynthEval: Hybrid Behavioral Testing of NLP Models with Synthetic Evaluation.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. URL. GitHub.
Abstract

Traditional benchmarking in NLP typically involves using static held-out test sets. However, this approach often results in an overestimation of performance and lacks the ability to offer comprehensive, interpretable, and dynamic assessments of NLP models. Recently, works like DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) have addressed these limitations through behavioral testing of NLP models with test types generated by a multistep human-annotated pipeline. Unfortunately, manually creating a variety of test types requires much human labor, often at prohibitive cost. In this work, we propose SYNTHEVAL, a hybrid behavioral testing framework that leverages large language models (LLMs) to generate a wide range of test types for a comprehensive evaluation of NLP models. SYNTHEVAL first generates sentences via LLMs using controlled generation, and then identifies challenging examples by comparing the predictions made by LLMs with task-specific NLP models. In the last stage, human experts investigate the challenging examples, manually design templates, and identify the types of failures the task-specific models consistently exhibit. We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that our framework is effective in identifying weaknesses of strong models on these tasks.
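The second stage of the pipeline described above — flagging synthetic sentences on which the LLM and the task-specific model disagree — can be sketched in a few lines. The predictors below are hypothetical stand-ins for illustration only, not the paper's models:

```python
def challenging_examples(sentences, llm_predict, task_model_predict):
    """Stage 2 of a SynthEval-style pipeline: keep the synthetic
    sentences on which the LLM and the task-specific model disagree;
    these are the candidates human experts inspect in the last stage."""
    return [(s, llm_predict(s), task_model_predict(s))
            for s in sentences
            if llm_predict(s) != task_model_predict(s)]

# Hypothetical toy predictors (for illustration only).
def toy_llm(s):
    return "negative" if "not" in s else "positive"

def toy_task_model(s):
    return "positive"  # a model that misses negation

sents = ["a great movie", "not a great movie"]
hard = challenging_examples(sents, toy_llm, toy_task_model)
print(hard)  # [('not a great movie', 'negative', 'positive')]
```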

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1161]
Z. Ding, J. Wu, J. Wu, Y. Xia and V. Tresp.
Temporal Fact Reasoning over Hyper-Relational Knowledge Graphs.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. Meanwhile, as discussed in recent works that focus on temporal KGs (TKGs), world knowledge is ever-evolving, making it important to reason over temporal facts in KGs. Previous mainstream benchmark HKGs do not explicitly specify temporal information for each HKG fact. Therefore, almost all existing HKG reasoning approaches do not devise any module specifically for temporal reasoning. To better study temporal fact reasoning over HKGs, we propose a new type of data structure named hyper-relational TKG (HTKG). Every fact in an HTKG is coupled with a timestamp explicitly indicating its time validity. We develop two new benchmark HTKG datasets, i.e., Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models hyper-relational temporal facts. To support future research on this topic, we open-source our datasets and model.
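The hyper-relational temporal fact structure the abstract introduces — a primary triple plus key-value qualifiers and an explicit timestamp — can be sketched as a plain record type. Field names here are illustrative, not taken from the authors' released code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HTKGFact:
    """A hyper-relational temporal KG fact: a primary (s, r, o) triple,
    optional key-value qualifiers, and a timestamp that explicitly
    indicates the fact's time validity."""
    subject: str
    relation: str
    obj: str
    timestamp: int        # e.g., year of validity
    qualifiers: tuple = ()  # ((key, value), ...) pairs

fact = HTKGFact(
    subject="Marie Curie",
    relation="award received",
    obj="Nobel Prize in Physics",
    timestamp=1903,
    qualifiers=(("together with", "Pierre Curie"),),
)
print(fact.timestamp)  # 1903
```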

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1160]
E. Garces Arias, J. Rodemann, M. Li, C. Heumann and M. Aßenmacher.
Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, sampling with temperature, top-k sampling, nucleus (top-p) sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence and diversity as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy that extends contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks. Our code base, datasets, and models are publicly available.
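Contrastive search scores each candidate token by trading model confidence against a degeneration penalty (similarity to the existing context); adaptive contrastive search makes the penalty weight depend on the model's per-step uncertainty. A toy, stdlib-only sketch of one decoding step — the entropy scaling below is a plausible reading of the abstract, not the authors' exact formula:

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def adaptive_contrastive_step(candidates, max_entropy, base_alpha=0.6):
    """One decoding step of a contrastive-search-style rule with an
    uncertainty-adaptive degeneration penalty. `candidates` is a list of
    (token, model_prob, max_similarity_to_context) tuples; the penalty
    weight grows with the normalized entropy of the candidate
    distribution, so the more uncertain the model, the stronger the push
    away from tokens similar to the existing context."""
    probs = [p for _, p, _ in candidates]
    alpha = base_alpha * entropy(probs) / max_entropy
    scores = {tok: (1 - alpha) * p - alpha * sim
              for tok, p, sim in candidates}
    return max(scores, key=scores.get)

# Toy example: "the" is likely but highly repetitive in context.
cands = [("the", 0.50, 0.95), ("house", 0.30, 0.20), ("garden", 0.20, 0.10)]
tok = adaptive_contrastive_step(cands, max_entropy=math.log(3))
print(tok)  # garden
```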

MCML Authors
Link to Esteban Garces Arias

Esteban Garces Arias

Statistical Learning & Data Science

Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1159]
R. Liao, M. Erler, H. Wang, G. Zhai, G. Zhang, Y. Ma and V. Tresp.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA.

MCML Authors
Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Guangyao Zhai

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1158]
B. Ma, X. Wang, T. Hu, A.-C. Haensch, M. A. Hedderich, B. Plank and F. Kreuter.
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. URL.
Abstract

Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may capture and convey. These cognitive-behavioral traits typically include Attitudes, Opinions, and Values (AOVs). However, measuring AOVs embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of clarity on how different studies are related to each other and how they can be interpreted. This paper aims to bridge this gap by providing a comprehensive overview of recent works on the evaluation of AOVs in LLMs. Moreover, we survey related approaches in different stages of the evaluation pipeline in these works. By doing so, we address the potential and challenges with respect to understanding the model, human-AI alignment, and downstream applications in social sciences. Finally, we provide practical insights into evaluation methods, model enhancement, and interdisciplinary collaboration, thereby contributing to the evolving landscape of evaluating AOVs in LLMs.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Michael Hedderich

Michael Hedderich

Dr.

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1157]
A. Modarressi, A. Köksal and H. Schütze.
Consistent Document-Level Relation Extraction via Counterfactuals.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Many datasets have been developed to train and evaluate document-level relation extraction (RE) models. Most of these are constructed using real-world data. It has been shown that RE models trained on real-world data suffer from factual biases. To evaluate and address this issue, we present CovEReD, a counterfactual data generation approach for document-level relation extraction datasets using entity replacement. We first demonstrate that models trained on factual data exhibit inconsistent behavior: while they accurately extract triples from factual data, they fail to extract the same triples after counterfactual modification. This inconsistency suggests that models trained on factual data rely on spurious signals such as specific entities and external knowledge – rather than on the input context – to extract triples. We show that by generating document-level counterfactual data with CovEReD and training models on them, consistency is maintained with minimal impact on RE performance. We release our CovEReD pipeline as well as Re-DocRED-CF, a dataset of counterfactual RE documents, to assist in evaluating and addressing inconsistency in document-level RE.
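The entity-replacement idea behind CovEReD can be sketched in a few lines: replace entities consistently in both the document and its gold triples, so that a model relying on the context (rather than memorized facts) should still extract the modified triples. Function and data below are illustrative; the actual pipeline additionally handles mentions, coreference, and entity-type constraints:

```python
def make_counterfactual(doc, triples, replacements):
    """Replace entities in a document and its gold relation triples
    consistently, producing a counterfactual RE example."""
    cf_doc = doc
    for old, new in replacements.items():
        cf_doc = cf_doc.replace(old, new)
    cf_triples = [(replacements.get(h, h), r, replacements.get(t, t))
                  for h, r, t in triples]
    return cf_doc, cf_triples

doc = "Albert Einstein was born in Ulm."
triples = [("Albert Einstein", "born_in", "Ulm")]
cf_doc, cf_triples = make_counterfactual(
    doc, triples, {"Albert Einstein": "Marie Curie", "Ulm": "Warsaw"})
print(cf_doc)      # Marie Curie was born in Warsaw.
print(cf_triples)  # [('Marie Curie', 'born_in', 'Warsaw')]
```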

MCML Authors
Link to Ali Modarressi

Ali Modarressi

Statistical NLP and Deep Learning

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1156]
M. Wang, L. Lange, H. Adel, J. Strötgen and H. Schütze.
Better Call SAUL: Fluent and Consistent Language Model Editing with Generation Regularization.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

To ensure large language models contain up-to-date knowledge, they need to be updated regularly. However, model editing is challenging as it might also affect knowledge that is unrelated to the new data. State-of-the-art methods identify parameters associated with specific knowledge and then modify them via direct weight updates. However, these locate-and-edit methods suffer from heavy computational overhead and lack theoretical validation. In contrast, directly fine-tuning the model on requested edits affects the model’s behavior on unrelated knowledge, and significantly damages the model’s generation fluency and consistency. To address these challenges, we propose SAUL, a streamlined model editing method that uses sentence concatenation with augmented random facts for generation regularization. Evaluations on three model editing benchmarks show that SAUL is a practical and reliable solution for model editing, outperforming state-of-the-art methods while maintaining generation quality and reducing computational overhead.

MCML Authors
Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1155]
O. Xhelili, Y. Liu and H. Schütze.
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Multilingual pre-trained models (mPLMs) have shown impressive performance on cross-lingual transfer tasks. However, the transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language, even though the two languages may be related or share parts of their vocabularies. Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method aiming to improve the cross-lingual alignment between languages using diverse scripts. We select two areal language groups, Mediterranean-Amharic-Farsi and South+East Asian Languages, wherein the languages are mutually influenced but use different scripts. We apply our method to these language groups and conduct extensive experiments on a spectrum of downstream tasks. The results show that after PPA, models consistently outperform the original model (up to 50% for some tasks) in English-centric transfer. In addition, when we use languages other than English as sources in transfer, our method obtains even larger improvements.

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1154]
A. Yüksel, A. Köksal, L. K. Senel, A. Korhonen and H. Schütze.
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU, to evaluate LLMs’ understanding of the Turkish language. TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula. These questions are written by curriculum experts, suitable for the high-school curricula in Turkey, covering subjects ranging from natural sciences and math questions to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, mT5), closed-source (GPT-4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. We provide an extensive evaluation, including zero-shot and few-shot evaluation of LLMs, chain-of-thought reasoning, and question difficulty analysis along with model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to provide insights for future LLMs for the Turkish language.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Lütfi Kerem Şenel

Lütfi Kerem Şenel

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1153]
H. Zhang, J. Liu, Z. Han, S. Chen, B. He, V. Tresp, Z. Xu and J. Gu.
Visual Question Decomposition on Multimodal Large Language Models.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Question decomposition has emerged as an effective strategy for prompting Large Language Models (LLMs) to answer complex questions. However, while existing methods primarily focus on unimodal language models, the question decomposition capability of Multimodal Large Language Models (MLLMs) has yet to be explored. To this end, this paper explores visual question decomposition on MLLMs. Specifically, we introduce a systematic evaluation framework including a dataset and several evaluation criteria to assess the quality of the decomposed sub-questions, revealing that existing MLLMs struggle to produce high-quality sub-questions. To address this limitation, we propose a specific finetuning dataset, DecoVQA+, for enhancing the model’s question decomposition capability. Aiming at enabling models to perform appropriate selective decomposition, we propose an efficient finetuning pipeline. The finetuning pipeline consists of our proposed dataset and a training objective for selective decomposition. Finetuned MLLMs demonstrate significant improvements in the quality of sub-questions and the policy of selective question decomposition. Additionally, the models also achieve higher accuracy with selective decomposition on VQA benchmark datasets.

MCML Authors
Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1152]
A. Maarouf, N. Pröllochs and S. Feuerriegel.
The Virality of Hate Speech on Social Media.
27th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2024). San José, Costa Rica, Nov 09-13, 2024. DOI.
Abstract

Online hate speech is responsible for violent attacks such as the Pittsburgh synagogue shooting in 2018, thereby posing a significant threat to vulnerable groups and society in general. However, little is known about what makes hate speech on social media go viral. In this paper, we collect N = 25,219 cascades with 65,946 retweets from X (formerly known as Twitter) and classify them as hateful vs. normal. Using a generalized linear regression, we then estimate differences in the spread of hateful vs. normal content based on author and content variables. We thereby identify important determinants that explain differences in the spreading of hateful vs. normal content. For example, hateful content authored by verified users is disproportionately more likely to go viral than hateful content from non-verified ones: hateful content from a verified user (as opposed to normal content) has a 3.5 times larger cascade size, a 3.2 times longer cascade lifetime, and a 1.2 times larger structural virality. Altogether, we offer novel insights into the virality of hate speech on social media.
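For reference, structural virality — one of the cascade metrics reported above — is commonly defined as the mean shortest-path distance between all pairs of nodes in the diffusion tree: a broadcast-like star cascade scores low, a long chain of person-to-person resharing scores high. A stdlib-only sketch (illustrative, not the paper's analysis code):

```python
from collections import defaultdict, deque

def structural_virality(edges):
    """Mean shortest-path distance over all ordered pairs of nodes in an
    (undirected) retweet tree, computed by BFS from every node."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    nodes = list(graph)
    total, pairs = 0, 0
    for src in nodes:
        dist = {src: 0}
        q = deque([src])
        while q:  # BFS from src
            u = q.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        for v in nodes:
            if v != src:
                total += dist[v]
                pairs += 1
    return total / pairs

star = [(0, 1), (0, 2), (0, 3)]    # root retweeted directly 3 times
chain = [(0, 1), (1, 2), (2, 3)]   # person-to-person resharing
print(structural_virality(star), structural_virality(chain))  # chain > star
```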

MCML Authors
Link to Abdurahman Maarouf

Abdurahman Maarouf

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1151]
A. Bashardoust, S. Feuerriegel and Y. R. Shrestha.
Comparing the Willingness to Share for Human-generated vs. AI-generated Fake News.
27th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2024). San José, Costa Rica, Nov 09-13, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Generative artificial intelligence (AI) presents large risks for society when it is used to create fake news. A crucial factor for fake news to go viral on social media is that users share such content. Here, we aim to shed light on the sharing behavior of users across human-generated vs. AI-generated fake news. Specifically, we study: (1) What is the perceived veracity of human-generated fake news vs. AI-generated fake news? (2) What is the user’s willingness to share human-generated fake news vs. AI-generated fake news on social media? (3) What socio-economic characteristics let users fall for AI-generated fake news? To this end, we conducted a pre-registered, online experiment with N = 988 subjects and 20 fake news items from the COVID-19 pandemic generated by GPT-4 vs. humans. Our findings show that AI-generated fake news is perceived as less accurate than human-generated fake news, but both tend to be shared equally. Further, several socio-economic factors explain who falls for AI-generated fake news.

MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1150]
D. Geissler and S. Feuerriegel.
Analyzing the Strategy of Propaganda using Inverse Reinforcement Learning: Evidence from the 2022 Russian Invasion of Ukraine.
27th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2024). San José, Costa Rica, Nov 09-13, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

The 2022 Russian invasion of Ukraine was accompanied by a large-scale, pro-Russian propaganda campaign on social media. However, the strategy behind the dissemination of propaganda has remained unclear, particularly how the online discourse was strategically shaped by the propagandists’ community. Here, we analyze the strategy of the Twitter community using an inverse reinforcement learning (IRL) approach. Specifically, IRL allows us to model online behavior as a Markov decision process, where the goal is to infer the underlying reward structure that guides propagandists when interacting with users with a supporting or opposing stance toward the invasion. Thereby, we aim to understand empirically whether and how between-user interactions are strategically used to promote the proliferation of Russian propaganda. For this, we leverage a large-scale dataset with 349,455 posts with pro-Russian propaganda from 132,131 users. We show that bots and humans follow a different strategy: bots respond predominantly to pro-invasion messages, suggesting that they seek to drive virality; while messages indicating opposition primarily elicit responses from humans, suggesting that they tend to engage in critical discussions. To the best of our knowledge, this is the first study analyzing the strategy behind propaganda from the 2022 Russian invasion of Ukraine through the lens of IRL.

MCML Authors
Link to Dominique Geißler

Dominique Geißler

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1149]
C. Kern, R. Bach, H. Mautner and F. Kreuter.
When Small Decisions Have Big Impact: Fairness Implications of Algorithmic Profiling Schemes.
ACM Journal on Responsible Computing (Nov. 2024). DOI.
Abstract

Algorithmic profiling is increasingly used in the public sector with the hope of allocating limited public resources more effectively and objectively. One example is the prediction-based profiling of job seekers to guide the allocation of support measures by public employment services. However, empirical evaluations of potential side-effects such as unintended discrimination and fairness concerns are rare in this context. We systematically compare and evaluate statistical models for predicting job seekers’ risk of becoming long-term unemployed concerning subgroup prediction performance, fairness metrics, and vulnerabilities to data analysis decisions. Focusing on Germany as a use case, we evaluate profiling models under realistic conditions using large-scale administrative data. We show that despite achieving high prediction performance on average, profiling models can be considerably less accurate for vulnerable social subgroups. In this setting, different classification policies can have very different fairness implications. We therefore call for rigorous auditing processes before such models are put to practice.

MCML Authors
Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1148]
F. Bongratz, M. Karmann, A. Holz, M. Bonhoeffer, V. Neumaier, S. Deli, B. Schmitz-Koep, C. Zimmer, C. Sorg, M. Thalhammer, D. M. Hedderich and C. Wachinger.
MLV2-Net: Rater-Based Majority-Label Voting for Consistent Meningeal Lymphatic Vessel Segmentation.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

Meningeal lymphatic vessels (MLVs) are responsible for the drainage of waste products from the human brain. An impairment in their functionality has been associated with aging as well as brain disorders like multiple sclerosis and Alzheimer’s disease. However, MLVs have only recently been described for the first time in magnetic resonance imaging (MRI), and their ramified structure renders manual segmentation particularly difficult. Further, as there is no consistent notion of their appearance, human-annotated MLV structures contain a high inter-rater variability that most automatic segmentation methods cannot take into account. In this work, we propose a new rater-aware training scheme for the popular nnU-Net model, and we explore rater-based ensembling strategies for accurate and consistent segmentation of MLVs. This enables us to boost nnU-Net’s performance while obtaining explicit predictions in different annotation styles and a rater-based uncertainty estimation. Our final model, MLV2-Net, achieves a Dice similarity coefficient of 0.806 with respect to the human reference standard. The model further matches the human inter-rater reliability and replicates age-related associations with MLV volume.
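The rater-based majority-label voting in the model's name can be illustrated on toy masks. This sketch shows only the per-voxel voting rule, not the rater-aware nnU-Net training scheme or the uncertainty estimation:

```python
from collections import Counter

def majority_label_vote(rater_masks):
    """Fuse per-rater segmentation masks by per-voxel majority vote.
    `rater_masks` is a list of equally sized label lists (flattened
    voxels); ties are broken by the first-seen label."""
    return [Counter(voxel_labels).most_common(1)[0][0]
            for voxel_labels in zip(*rater_masks)]

# Three raters disagree on a 5-voxel strip (1 = vessel, 0 = background).
raters = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
print(majority_label_vote(raters))  # [1, 1, 0, 0, 1]
```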

MCML Authors
Link to Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1147]
V. Ehm, N. El Amrani, Y. Xie, L. Bastian, M. Gao, W. Wang, L. Sang, D. Cao, Z. Lähner, D. Cremers and F. Bernard.
Beyond Complete Shapes: A Quantitative Evaluation of 3D Shape Matching Algorithms.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. While approaches based on machine learning dominate modern 3D shape matching, almost all existing (learning-based) methods require that at least one of the involved shapes is complete. In contrast, the most challenging and arguably most practically relevant setting of matching partially observed shapes is currently underexplored. One important factor is that existing datasets contain only a small number of shapes (typically below 100), which are unable to serve data-hungry machine learning approaches, particularly in the unsupervised regime. In addition, the type of partiality present in existing datasets is often artificial and far from realistic. To address these limitations and to encourage research on these relevant settings, we provide a generic and flexible framework for the procedural generation of challenging partial shape matching scenarios. Our framework allows for a virtually infinite generation of partial shape matching instances from a finite set of shapes with complete geometry. Further, we manually create cross-dataset correspondences between seven existing (complete geometry) shape matching datasets, leading to a total of 2543 shapes. Based on this, we propose several challenging partial benchmark settings, for which we evaluate respective state-of-the-art methods as baselines.

MCML Authors
Link to Viktoria Ehm

Viktoria Ehm

Computer Vision & Artificial Intelligence

Link to Lennart Bastian

Lennart Bastian

Computer Aided Medical Procedures & Augmented Reality

Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1146]
K. Flöge, M. A. Moeed and V. Fortuin.
Stein Variational Newton Neural Network Ensembles.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

Deep neural network ensembles are powerful tools for uncertainty quantification, which have recently been re-interpreted from a Bayesian perspective. However, current methods inadequately leverage second-order information of the loss landscape, despite the recent availability of efficient Hessian approximations. We propose a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. Our approach uniquely integrates scalable modern Hessian approximations, achieving faster convergence and more accurate posterior distribution approximations. We validate the effectiveness of our method on diverse regression and classification tasks, demonstrating superior performance with a significantly reduced number of training epochs compared to existing ensemble-based methods, while enhancing uncertainty quantification and robustness against overfitting.

MCML Authors
Link to Vincent Fortuin

Vincent Fortuin

Dr.

Associate

Bayesian Deep Learning


[1145]
K. Flöge, S. Udayakumar, J. Sommer, M. Piraud, S. Kesselheim, V. Fortuin, S. Günnemann, K. J. van der Weg, H. Gohlke, A. Bazarova and E. Merdivan.
OneProt: Towards Multi-Modal Protein Foundation Models.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

Recent AI advances have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of modality encoders along protein sequences. It demonstrates strong performance in retrieval tasks and surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction. This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.

MCML Authors
Link to Vincent Fortuin

Vincent Fortuin

Dr.

Associate

Bayesian Deep Learning


[1144]
L. He, E. Nie, H. Schmid, H. Schütze, N. Mesgarani and J. Brennan.
Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning) by distinguishing two LLM evaluation paradigms: psycholinguistic and neurolinguistic. Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs’ true linguistic capabilities. We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers. This method allows for a detailed examination of how LLMs represent form and meaning, and whether these representations are consistent across languages. Our contributions are three-fold: (1) We compare neurolinguistic and psycholinguistic methods, revealing distinct patterns in LLM assessment; (2) We demonstrate that LLMs exhibit higher competence in form compared to meaning, with the latter largely correlated to the former; (3) We present new conceptual minimal pair datasets for Chinese (COMPS-ZH) and German (COMPS-DE), complementing existing English datasets.

MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1143]
P. Janetzky, T. Schlagenhauf and S. Feuerriegel.
Slowing Down Forgetting in Continual Learning.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

A common challenge in continual learning (CL) is catastrophic forgetting, where the performance on old tasks drops after new, additional tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of gradient-based neural networks due to which these converge to margin maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods to slow down forgetting. We further demonstrate the performance gain from our framework across a large series of experiments, including different CL scenarios (class incremental, domain incremental, task incremental learning), different datasets (MNIST, CIFAR10), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.
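As a rough illustration of the replay idea (not the authors' reconstruction procedure, which recovers old samples from the network's margin-maximization bias), mixing reconstructed samples into the current task's batch might look like this; `recl_mix` and its `ratio` parameter are hypothetical names:

```python
def recl_mix(current_batch, reconstructed_old, ratio=0.5):
    """Combine the current task's batch with samples reconstructed from
    previous tasks (hypothetical mixing scheme). The reconstructed
    samples act as an implicit memory buffer drawn from the model itself."""
    k = int(len(current_batch) * ratio)
    return list(current_batch) + list(reconstructed_old[:k])
```

Training would then proceed on the mixed batch, so gradients still reflect old tasks without an explicit stored buffer.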

MCML Authors
Link to Pascal Janetzky

Pascal Janetzky

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1142]
B. Kulynych, J. F. Gomez, G. Kaissis, F. du Pin Calmon and C. Troncoso.
Attack-Aware Noise Calibration for Differential Privacy.
Preprint at arXiv (Nov. 2024). arXiv. URL.
Abstract

Differential privacy (DP) is a widely used approach for mitigating privacy risks when training machine learning models on sensitive data. DP mechanisms add noise during training to limit the risk of information leakage. The scale of the added noise is critical, as it determines the trade-off between privacy and utility. The standard practice is to select the noise scale to satisfy a given privacy budget ε. This privacy budget is in turn interpreted in terms of operational attack risks, such as accuracy, sensitivity, and specificity of inference attacks aimed to recover information about the training data records. We show that first calibrating the noise scale to a privacy budget ε, and then translating ε to attack risk leads to overly conservative risk assessments and unnecessarily low utility. Instead, we propose methods to directly calibrate the noise scale to a desired attack risk level, bypassing the step of choosing ε. For a given notion of attack risk, our approach significantly decreases noise scale, leading to increased utility at the same level of privacy. We empirically demonstrate that calibrating noise to attack sensitivity/specificity, rather than ε, when training privacy-preserving ML models substantially improves model accuracy for the same risk level. Our work provides a principled and practical way to improve the utility of privacy-preserving ML without compromising on privacy.
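The relationship underlying this calibration can be sketched in a few lines (function names are illustrative, not the paper's API): by the standard hypothesis-testing interpretation of DP, any membership-inference attack against an (ε, δ)-DP mechanism with false-positive rate FPR achieves a true-positive rate of at most e^ε·FPR + δ. Inverting this bound gives the largest ε, and hence the smallest noise scale, compatible with a target attack risk:

```python
import math

def tpr_bound(eps, delta, fpr):
    # (eps, delta)-DP caps any membership-inference attack:
    # TPR <= exp(eps) * FPR + delta.
    return min(1.0, math.exp(eps) * fpr + delta)

def max_eps_for_attack_risk(tpr, fpr, delta=0.0):
    # Invert the bound: the largest eps that still limits an attacker
    # to at most `tpr` at false-positive rate `fpr`. A larger
    # admissible eps translates into a smaller noise scale.
    return math.log((tpr - delta) / fpr)
```

Calibrating directly to an attack operating point (e.g. TPR ≤ 0.1 at FPR = 0.01) typically admits a larger ε than a conservatively chosen budget, which is the source of the utility gain the paper reports.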

MCML Authors
Link to Georgios Kaissis

Georgios Kaissis

Dr.

Associate

Privacy-Preserving and Trustworthy AI


[1141]
Z. Li, D. Muhtar, F. Gu, X. Zhang, P. Xiao, G. He and X. Zhu.
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation.
Preprint at arXiv (Nov. 2024). arXiv. GitHub.
Abstract

Automatically and rapidly understanding Earth’s surface is fundamental to our grasp of the living environment and informed decision-making. This underscores the need for a unified system with comprehensive capabilities in analyzing Earth’s surface to address a wide range of human needs. The emergence of multimodal large language models (MLLMs) has great potential in boosting the efficiency and convenience of intelligent Earth observation. These models can engage in human-like conversations, serve as unified platforms for understanding images, follow diverse instructions, and provide insightful feedbacks. In this study, we introduce LHRS-Bot-Nova, an MLLM specialized in understanding remote sensing (RS) images, designed to expertly perform a wide range of RS understanding tasks aligned with human instructions. LHRS-Bot-Nova features an enhanced vision encoder and a novel bridge layer, enabling efficient visual compression and better language-vision alignment. To further enhance RS-oriented vision-language alignment, we propose a large-scale RS image-caption dataset, generated through feature-guided image recaptioning. Additionally, we introduce an instruction dataset specifically designed to improve spatial recognition abilities. Extensive experiments demonstrate superior performance of LHRS-Bot-Nova across various RS image understanding tasks. We also evaluate different MLLM performances in complex RS perception and instruction following using a complicated multi-choice question evaluation benchmark, providing a reliable guide for future model selection and improvement.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1140]
Y. Ma, Q. Khan and D. Cremers.
MA-DV2F: A Multi-Agent Navigation Framework using Dynamic Velocity Vector Field.
Preprint at arXiv (Nov. 2024). arXiv. GitHub.
Abstract

In this paper we propose MA-DV2F: Multi-Agent Dynamic Velocity Vector Field. It is a framework for simultaneously controlling a group of vehicles in challenging environments. DV2F is generated for each vehicle independently and provides a map of reference orientation and speed that a vehicle must attain at any point on the navigation grid such that it safely reaches its target. The field is dynamically updated depending on the speed and proximity of the ego-vehicle to other agents. This dynamic adaptation of the velocity vector field allows prevention of imminent collisions. Experimental results show that MA-DV2F outperforms concurrent methods in terms of safety, computational efficiency and accuracy in reaching the target when scaling to a large number of vehicles.
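A minimal sketch of a velocity vector field of this flavor (illustrative form only; the paper's field construction and its dynamic update rule are more involved): the reference orientation points at the target, and the reference speed shrinks as other agents come within a safety distance. The function name and parameters here are assumptions:

```python
import math

def dv2f_reference(pos, target, others, v_max=1.0, safe_dist=2.0):
    """Reference orientation and speed for the ego-vehicle at `pos`.
    Hypothetical simplification of a dynamic velocity vector field."""
    # Orientation: head straight toward the target.
    heading = math.atan2(target[1] - pos[1], target[0] - pos[0])
    # Speed: scale down linearly as the nearest other agent approaches.
    d_min = min((math.hypot(o[0] - pos[0], o[1] - pos[1]) for o in others),
                default=float("inf"))
    speed = v_max * min(1.0, d_min / safe_dist)
    return heading, speed
```

Because the field is recomputed from the current positions of all agents, slowing near neighbors is what prevents imminent collisions in this toy version.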

MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1139]
S. Rampp, M. Milling, A. Triantafyllopoulos and B. W. Schuller.
Does the Definition of Difficulty Matter? Scoring Functions and their Role for Curriculum Learning.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

Curriculum learning (CL) describes a machine learning training strategy in which samples are gradually introduced into the training process based on their difficulty. Despite a partially contradictory body of evidence in the literature, CL finds popularity in deep learning research due to its promise of leveraging human-inspired curricula to achieve higher model performance. Yet, the subjectivity and biases that follow any necessary definition of difficulty, especially for those found in orderings derived from models or training statistics, have rarely been investigated. To shed more light on the underlying unanswered questions, we conduct an extensive study on the robustness and similarity of the most common scoring functions for sample difficulty estimation, as well as their potential benefits in CL, using the popular benchmark dataset CIFAR-10 and the acoustic scene classification task from the DCASE2020 challenge as representatives of computer vision and computer audition, respectively. We report a strong dependence of scoring functions on the training setting, including randomness, which can partly be mitigated through ensemble scoring. While we do not find a general advantage of CL over uniform sampling, we observe that the ordering in which data is presented for CL-based training plays an important role in model performance. Furthermore, we find that the robustness of scoring functions across random seeds positively correlates with CL performance. Finally, we uncover that models trained with different CL strategies complement each other by boosting predictive power through late fusion, likely due to differences in the learnt concepts. Alongside our findings, we release the aucurriculum toolkit (this https URL), implementing sample difficulty and CL-based training in a modular fashion.
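Two of the ingredients discussed above, ensemble scoring across seeds and an easy-to-hard ordering, can be sketched as follows (a simplified illustration, not the aucurriculum implementation):

```python
def ensemble_scores(scores_per_seed):
    """Average per-sample difficulty scores over several training seeds,
    the kind of ensemble scoring that mitigates seed dependence."""
    n_seeds = len(scores_per_seed)
    n_samples = len(scores_per_seed[0])
    return [sum(run[i] for run in scores_per_seed) / n_seeds
            for i in range(n_samples)]

def easy_to_hard_order(difficulty):
    # A curriculum presents sample indices sorted from easy (low score)
    # to hard (high score).
    return sorted(range(len(difficulty)), key=difficulty.__getitem__)
```

Training would then introduce samples in this order, gradually unlocking the harder portion of the dataset.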

MCML Authors
Link to Manuel Milling

Manuel Milling

Health Informatics

Link to Andreas Triantafyllopoulos

Andreas Triantafyllopoulos

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1138]
O. Wysocki, Y. Tan, T. Froech, Y. Xia, M. Wysocki, L. Hoegner, D. Cremers and C. Holst.
ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset.
Preprint at arXiv (Nov. 2024). arXiv.
Abstract

Facade semantic segmentation is a long-standing challenge in photogrammetry and computer vision. Although the last decades have witnessed the influx of facade segmentation methods, there is a lack of comprehensive facade classes and data covering the architectural variability. In ZAHA, we introduce Level of Facade Generalization (LoFG), novel hierarchical facade classes designed based on international urban modeling standards, ensuring compatibility with real-world challenging classes and uniform methods’ comparison. Realizing the LoFG, we present to date the largest semantic 3D facade segmentation dataset, providing 601 million annotated points at five and 15 classes of LoFG2 and LoFG3, respectively. Moreover, we analyze the performance of baseline semantic segmentation methods on our introduced LoFG classes and data, complementing it with a discussion on the unresolved challenges for facade segmentation. We firmly believe that ZAHA shall facilitate further development of 3D facade semantic segmentation methods, enabling robust segmentation indispensable in creating urban digital twins.

MCML Authors
Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Magdalena Wysocki

Magdalena Wysocki

Computer Aided Medical Procedures & Augmented Reality

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1137]
B. Lange.
The Future Audit Society? Automated Assurance and Auditing.
2nd International Conference on Bridging the Gap Between AI and Reality (AISoLA 2024). Crete, Greece, Oct 30-Nov 03, 2024. To be published.
Abstract

AI audits are a key mechanism for responsible AI governance. AI audits have been proposed in a variety of laws and regulations, standardized frameworks, and guidelines for industry best practices as a mechanism to facilitate public trust in and accountability of AI system developers and deployers. Though AI auditing for the purpose of compliance and assurance with normative requirements currently lacks defined norms and standardized practices, some systematic assurance AI audit methodologies are emerging that are modelled on financial auditing practices. In the spirit of financial audits, which aim to uphold trust in the integrity of the proper function of the financial markets for stakeholders, AI audits, on this line of reasoning, aim to provide assurance to their stakeholders about AI organizations’ ability to govern their algorithms in ways that mitigate harms and uphold human values. Against this backdrop, the nature of the auditing industry is currently evolving. Traditional financial auditing practices are becoming increasingly automated by AI, and, given the complexity of some AI systems themselves and the high degree of assurance that they will require, the future of AI auditing itself will foreseeably be automated. This paper makes a first step toward exploring this picture. I argue that current automated auditing trends run the risk of undermining the justificatory plausibility of auditing as an accountability and trust-facilitating mechanism itself. In particular, I suggest that this leads to a continuous desire for verification, in which the epistemic obscurity of auditing assurance – the nature of the judgment provided by auditors – increases and the operational capability of audits to achieve their aims decreases.

MCML Authors
Link to Ben Lange

Ben Lange

Dr.

JRG Leader Ethics of AI

Ethics of Artificial Intelligence


[1136]
M. Bernhard, T. Hannan, N. Strauß and M. Schubert.
Context Matters: Leveraging Spatiotemporal Metadata for Semi-Supervised Learning on Remote Sensing Images.
27th European Conference on Artificial Intelligence (ECAI 2024). Santiago de Compostela, Spain, Oct 19-24, 2024. DOI. GitHub.
Abstract

Remote sensing projects typically generate large amounts of imagery that can be used to train powerful deep neural networks. However, the amount of labeled images is often small, as remote sensing applications generally require expert labelers. Thus, semi-supervised learning (SSL), i.e., learning with a small pool of labeled and a larger pool of unlabeled data, is particularly useful in this domain. Current SSL approaches generate pseudo-labels from model predictions for unlabeled samples. As the quality of these pseudo-labels is crucial for performance, utilizing additional information to improve pseudo-label quality yields a promising direction. For remote sensing images, geolocation and recording time are generally available and provide a valuable source of information as semantic concepts, such as land cover, are highly dependent on spatiotemporal context, e.g., due to seasonal effects and vegetation zones. In this paper, we propose to exploit spatiotemporal metainformation in SSL to improve the quality of pseudo-labels and, therefore, the final model performance. We show that directly adding the available metadata to the input of the predictor at test time degenerates the prediction quality for metadata outside the spatiotemporal distribution of the training set. Thus, we propose a teacher-student SSL framework where only the teacher network uses metainformation to improve the quality of pseudo-labels on the training set. Correspondingly, our student network benefits from the improved pseudo-labels but does not receive metadata as input, making it invariant to spatiotemporal shifts at test time. Furthermore, we propose methods for encoding and injecting spatiotemporal information into the model and introduce a novel distillation mechanism to enhance the knowledge transfer between teacher and student. Our framework, dubbed Spatiotemporal SSL, can be easily combined with several state-of-the-art SSL methods, resulting in significant and consistent improvements on the BigEarthNet and EuroSAT benchmarks.
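The teacher-student wiring can be sketched in a few lines (an illustrative simplification; the paper's metadata encoders, injection methods, and distillation mechanism are abstracted away, and all names here are hypothetical): only the teacher sees metadata, and only its confident predictions become pseudo-labels for the metadata-free student.

```python
def pseudo_label_batch(teacher, images, metadata, threshold=0.9):
    """Generate pseudo-labels with a metadata-aware teacher.
    `teacher(image, meta)` returns class probabilities; the student
    later trains on (image, label) pairs only, without metadata."""
    pseudo = []
    for img, meta in zip(images, metadata):
        probs = teacher(img, meta)
        conf = max(probs)
        if conf >= threshold:  # keep only confident predictions
            pseudo.append((img, probs.index(conf)))
    return pseudo
```

Because the student never consumes metadata, it stays usable at test time for locations and seasons outside the training distribution.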

MCML Authors
Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Tanveer Hannan

Tanveer Hannan

Database Systems & Data Mining

Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[1135]
Y. Liu, F. Shi, D. Wang, Y. Zhang and H. Schütze.
ChatZero: Zero-Shot Cross-Lingual Dialogue Generation via Pseudo-Target Language.
27th European Conference on Artificial Intelligence (ECAI 2024). Santiago de Compostela, Spain, Oct 19-24, 2024. DOI.
Abstract

Although large language models (LLMs) show amazing capabilities, many of the exciting applications discovered for LLMs fall short in low-resource languages. Besides, most existing methods depend on large-scale dialogue corpora, so building systems for dialogue generation in a zero-shot scenario remains a considerable challenge. To address this challenge, we propose ChatZero, a novel end-to-end zero-shot dialogue generation model based on a cross-lingual code-switching method. First, we construct a code-switching language and a pseudo-target language with placeholders. Then, for cross-lingual semantic transfer, we employ unsupervised contrastive learning to minimize the semantic gap between the source language, code-switching language, and pseudo-target language, which are mutually positive examples in the high-dimensional semantic space. Experiments on the multilingual DailyDialog and DSTC7-AVSD datasets demonstrate that ChatZero can achieve more than 90% of the original performance in the zero-shot case compared to supervised learning, and achieve state-of-the-art performance compared with other baselines.

MCML Authors
Link to Yongkang Liu

Yongkang Liu

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1134]
J. Nam, I. Chalkidis and M. Rezaei.
Hyperbolic Contrastive Learning for Document Representations – A Multi-View Approach with Paragraph-Level Similarities.
27th European Conference on Artificial Intelligence (ECAI 2024). Santiago de Compostela, Spain, Oct 19-24, 2024. DOI.
Abstract

Self-supervised learning (SSL) has gained prominence due to the increasing availability of unlabeled data and advances in computational efficiency, leading to revolutionized natural language processing with pre-trained language models like BERT and GPT. Representation learning, a core concept in SSL, aims to reduce data dimensionality while preserving meaningful aspects. Conventional SSL methods typically embed data in Euclidean space. However, recent research has revealed that alternative geometries can hold even richer representations, unlocking more meaningful insights from the data. Motivated by this, we propose two novel methods for integrating hyperbolic geometry into self-supervised learning for efficient document embedding. First, we present a method directly incorporating hyperbolic geometry into the standard Euclidean contrastive learning framework. Additionally, we propose a multi-view hyperbolic contrastive learning framework contrasting both documents and paragraphs. Our findings demonstrate that contrasting only paragraphs, rather than entire documents, can lead to superior efficiency and effectiveness.

MCML Authors
Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[1133]
M. Aßenmacher, L. Karrlein, P. Schiele and C. Heumann.
Introducing wwm-german-18k - Can LLMs Crack the Million? (Or Win at Least 500 Euros?).
7th International Conference on Natural Language and Speech Processing (ICNLSP 2024). Trento, Italy, Oct 19-20, 2024. URL.
Abstract

Language-specific evaluation of large language models (LLMs) for multiple-choice question answering (MCQA) is an important means to test their abilities for a multitude of different dimensions. With a data set assembled from questions from the German variant of ‘Who Wants to Be a Millionaire?’ we evaluate a set of German models and ChatGPT concerning factual/commonsense knowledge, syntactic abilities, and logical reasoning, amongst others. We contribute this new MCQA data set, extracted from the show’s episodes and designed to evaluate the ability of models to answer this diverse range of questions. To ensure data quality, we describe our preprocessing, encompassing data cleaning, deduplication, and the creation of stratified splits. Furthermore, we fine-tune a set of German LLMs and prompt ChatGPT to provide baseline results. Our findings reveal that these models achieve (partly) satisfactory performance on questions of lower difficulty levels (≤ 1000 euros). As the difficulty increases, performance steadily declines, highlighting the challenging nature of the later stages of the game. We contribute to the ongoing efforts to advance the capabilities of LLMs in comprehending and answering questions by providing a valuable resource for German MCQA research as well as further insights into the limitations of current LLMs.

MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1132]
Z. Xian, L. Zellner, G. M. Tavares and T. Seidl.
CC-HIT: Creating Counterfactuals from High-Impact Transitions.
4th International Workshop on Leveraging Machine Learning in Process Mining (ML4PM 2024) at the 6th International Conference on Process Mining (ICPM 2024). Lyngby, Denmark, Oct 14-18, 2024. PDF.
Abstract

Reliable process information, especially regarding trace durations, is crucial for smooth execution. Without it, maintaining a process becomes costly. While many predictive systems aim to identify inefficiencies, they often focus on individual process instances, missing the global perspective. It is essential not only to detect where delays occur but also to pinpoint specific activity transitions causing them. To address this, we propose CC-HIT (Creating Counterfactuals from High-Impact Transitions), which identifies temporal dependencies across the entire process. By focusing on activity transitions, we provide deeper insights into relational impacts, enabling faster resolution of inefficiencies. CC-HIT highlights the most influential transitions on process performance, offering actionable insights for optimization. We validate this method using the BPIC 2020 dataset, demonstrating its effectiveness compared to existing approaches.

MCML Authors
Link to Zhicong Xian

Zhicong Xian

Database Systems & Data Mining

Link to Gabriel Marques Tavares

Gabriel Marques Tavares

Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1131]
S. Rauch, C. M. M. Frey, L. Zellner and T. Seidl.
Process-Aware Bayesian Networks for Sequential Event Log Queries.
6th International Conference on Process Mining (ICPM 2024). Lyngby, Denmark, Oct 14-18, 2024. To be published.
MCML Authors
Link to Simon Rauch

Simon Rauch

Database Systems & Data Mining

Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1130]
A. Maldonado, S. A. Aryasomayajula, C. M. M. Frey and T. Seidl.
iGEDI: interactive Generating Event Data with Intentional Features.
Demo Tracks at the 6th International Conference on Process Mining (ICPM 2024). Lyngby, Denmark, Oct 14-18, 2024. To be published.
MCML Authors
Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining

Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1129]
A. Maldonado.
Data-Driven Approaches Towards Transparent Benchmarking of Process Mining Tasks.
Doctoral Consortium at the 6th International Conference on Process Mining (ICPM 2024). Lyngby, Denmark, Oct 14-18, 2024. To be published.
MCML Authors
Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining


[1128]
P. Mondorf and B. Plank.
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models – A Survey.
Conference on Language Modeling (COLM 2024). Philadelphia, PA, USA, Oct 07-09, 2024. To be published. Preprint at arXiv. arXiv.
MCML Authors
Link to Philipp Mondorf

Philipp Mondorf

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1127]
X. Wang, C. Hu, B. Ma, P. Rottger and B. Plank.
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think.
Conference on Language Modeling (COLM 2024). Philadelphia, PA, USA, Oct 07-09, 2024. To be published. Preprint at arXiv. arXiv.
MCML Authors
Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1126]
D. Grzech, L. Le Folgoc, M. F. Azampour, A. Vlontzos, B. Glocker, N. Navab, J. A. Schnabel and B. Kainz.
Unsupervised Similarity Learning for Image Registration with Energy-Based Models.
11th International Workshop on Biomedical Image Registration (WBIR 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

We present a new model for deformable image registration, which learns in an unsupervised way a data-specific similarity metric. The proposed method consists of two neural networks, one that maps pairs of input images to transformations which align them, and one that provides the similarity metric whose maximisation guides the image alignment. We parametrise the similarity metric as an energy-based model, which is simple to train and allows us to improve the accuracy of image registration compared to other models with learnt similarity metrics by taking advantage of a more general mathematical formulation, as well as larger datasets. We also achieve substantial improvement in the accuracy of inter-patient image registration on MRI scans from the OASIS dataset compared to models that rely on traditional functions.

MCML Authors
Link to Mohammad Farid Azampour

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[1125]
B. Jian, J. Pan, M. Ghahremani, D. Rückert, C. Wachinger and B. Wiestler.
Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration.
11th International Workshop on Biomedical Image Registration (WBIR 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

VoxelMorph, proposed in 2018, utilizes Convolutional Neural Networks (CNNs) to address medical image registration problems. In 2021 TransMorph advanced this approach by replacing CNNs with Attention mechanisms, claiming enhanced performance. More recently, the rise of Mamba with selective state space models has led to MambaMorph, which substituted Attention with Mamba blocks, asserting superior registration. These developments prompt a critical question: does chasing the latest computational trends with “more advanced” computational blocks genuinely enhance registration accuracy, or is it merely hype? Furthermore, the role of classic high-level registration-specific designs, such as coarse-to-fine pyramid mechanism, correlation calculation, and iterative optimization, warrants scrutiny, particularly in differentiating their influence from the aforementioned low-level computational blocks. In this study, we critically examine these questions through a rigorous evaluation in brain MRI registration. We employed modularized components for each block and ensured unbiased comparisons across all methods and designs to disentangle their effects on performance. Our findings indicate that adopting “advanced” computational elements fails to significantly improve registration accuracy. Instead, well-established registration-specific designs offer fair improvements, enhancing results by a marginal 1.5% over the baseline. Our findings emphasize the importance of rigorous, unbiased evaluation and contribution disentanglement of all low- and high-level registration components, rather than simply following the computer vision trends with “more advanced” computational blocks. We advocate for simpler yet effective solutions and novel evaluation metrics that go beyond conventional registration accuracy, warranting further research across various organs and modalities.

MCML Authors
Link to Bailiang Jian

Bailiang Jian

Artificial Intelligence in Radiology

Link to Morteza Ghahremani

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology

Link to Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[1124]
F. De Benetti, Y. Yeganeh, C. Belka, S. Corradini, N. Navab, C. Kurz, G. Landry, S. Albarqouni and T. Wendler.
CloverNet – Leveraging Planning Annotations for Enhanced Procedural MR Segmentation: An Application to Adaptive Radiation Therapy.
13th International Workshop on Clinical Image-Based Procedures (CLIP 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

In radiation therapy (RT), an accurate delineation of the regions of interest (ROI) and organs at risk (OAR) allows for a more targeted irradiation with reduced side effects. The current clinical workflow for combined MR-linear accelerator devices (MR-linacs) requires the acquisition of a planning MR volume (MR-P), in which the ROI and OAR are accurately segmented by the clinical team. These segmentation maps (S-P) are transferred to the MR acquired on the day of the RT fraction (MR-Fx) using registration, followed by time-consuming manual corrections. The goal of this paper is to enable accurate automatic segmentation of MR-Fx using S-P without clinical workflow disruption. We propose a novel UNet-based architecture, CloverNet, that takes as inputs MR-Fx and S-P in two separate encoder branches, whose latent spaces are concatenated in the bottleneck to generate an improved segmentation of MR-Fx. CloverNet improves the absolute Dice Score by 3.73% (relative +4.34%, p<0.001) when compared with conventional 3D UNet. Moreover, we believe this approach is potentially applicable to other longitudinal use cases in which a prior segmentation of the ROI is available.

MCML Authors
Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1123]
A. H. Berger, L. Lux, N. Stucki, V. Bürgin, S. Shit, A. Banaszaka, D. Rückert, U. Bauer and J. C. Paetzold.
Topologically faithful multi-class segmentation in medical images.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology, making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.
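The projection step described above, reducing an N-class segmentation to N binary problems so that 1-parameter persistent homology applies, can be sketched as follows (a simplified illustration on nested lists, not the authors' loss implementation):

```python
def one_vs_rest(label_map, num_classes):
    """Project an N-class label map into N binary masks, one per class.
    Each mask can then be scored with a single-class topological loss
    (e.g. a Betti matching term) and the per-class losses summed."""
    return [[[1 if v == c else 0 for v in row] for row in label_map]
            for c in range(num_classes)]
```

In the paper's setting, each binary mask feeds a persistence-barcode matching term, keeping the training computationally feasible compared to multi-parameter persistence.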

MCML Authors
Link to Laurin Lux

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to Nico Stucki

Nico Stucki

Applied Topology and Geometry

Link to Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Ulrich Bauer

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry


[1122]
M. Domínguez, Y. Velikova, N. Navab and M. F. Azampour.
Diffusion as Sound Propagation: Physics-Inspired Model for Ultrasound Image Generation.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI. GitHub.
Abstract

Deep learning (DL) methods typically require large datasets to effectively learn data distributions. However, in the medical field, data is often limited in quantity, and acquiring labeled data can be costly. To mitigate this data scarcity, data augmentation techniques are commonly employed. Among these techniques, generative models play a pivotal role in expanding datasets. However, when it comes to ultrasound (US) imaging, the authenticity of generated data often diminishes due to the oversight of ultrasound physics.
We propose a novel approach to improve the quality of generated US images by introducing a physics-based diffusion model that is specifically designed for this image modality. The proposed model incorporates an US-specific scheduler scheme that mimics the natural behavior of sound wave propagation in ultrasound imaging. Our analysis demonstrates how the proposed method aids in modeling the attenuation dynamics in US imaging. We present both qualitative and quantitative results based on standard generative model metrics, showing that our proposed method results in overall more plausible images.

MCML Authors

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality


[1121]
S. M. Fischer, L. Felsner, R. Osuala, J. Kiechle, D. M. Lang, J. C. Peeken and J. A. Schnabel.
Progressive Growing of Patch Size: Resource-Efficient Curriculum Learning for Dense Prediction Tasks.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI. GitHub.
Abstract

In this work, we introduce Progressive Growing of Patch Size, a resource-efficient implicit curriculum learning approach for dense prediction tasks. Our curriculum approach is defined by growing the patch size during model training, which gradually increases the task’s difficulty. We integrated our curriculum into the nnU-Net framework and evaluated the methodology on all 10 tasks of the Medical Segmentation Decathlon. With our approach, we are able to substantially reduce runtime, computational costs, and emissions of network training compared to classical constant patch size training. In our experiments, the curriculum approach resulted in improved convergence. We are able to outperform standard nnU-Net training, which is trained with constant patch size, in terms of Dice Score on 7 out of 10 MSD tasks while only spending roughly 50% of the original training runtime. To the best of our knowledge, our Progressive Growing of Patch Size is the first successful employment of a sample-length curriculum in the form of patch size in the field of computer vision.

MCML Authors

Johannes Kiechle

Computational Imaging and AI in Medicine

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[1120]
Y. Li, I. Yakushev, D. M. Hedderich and C. Wachinger.
PASTA: Pathology-Aware MRI to PET Cross-Modal Translation with Diffusion Models.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI. GitHub.
Abstract

Positron emission tomography (PET) is a well-established functional imaging technique for diagnosing brain disorders. However, PET’s high costs and radiation exposure limit its widespread use. In contrast, magnetic resonance imaging (MRI) does not have these limitations. Although it also captures neurodegenerative changes, MRI is a less sensitive diagnostic tool than PET. To close this gap, we aim to generate synthetic PET from MRI. Herewith, we introduce PASTA, a novel pathology-aware image translation framework based on conditional diffusion models. Compared to the state-of-the-art methods, PASTA excels in preserving both structural and pathological details in the target modality, which is achieved through its highly interactive dual-arm architecture and multi-modal condition integration. A cycle exchange consistency and volumetric generation strategy elevate PASTA’s capability to produce high-quality 3D PET scans. Our qualitative and quantitative results confirm that the synthesized PET scans from PASTA not only reach the best quantitative scores but also preserve the pathology correctly. For Alzheimer’s classification, the performance of synthesized scans improves over MRI by 4%, almost reaching the performance of actual PET.

MCML Authors

Yitong Li

Artificial Intelligence in Radiology

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1119]
E. Özsoy, C. Pellegrini, M. Keicher and N. Navab.
ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI. GitHub.
Abstract

Every day, countless surgeries are performed worldwide, each within the distinct settings of operating rooms (ORs) that vary not only in their setups but also in the personnel, tools, and equipment used. This inherent diversity poses a substantial challenge for achieving a holistic understanding of the OR, as it requires models to generalize beyond their initial training datasets. To reduce this gap, we introduce ORacle, an advanced vision-language model designed for holistic OR domain modeling, which incorporates multi-view and temporal capabilities and can leverage external knowledge during inference, enabling it to adapt to previously unseen surgical scenarios. This capability is further enhanced by our novel data augmentation framework, which significantly diversifies the training dataset, ensuring ORacle’s proficiency in applying the provided knowledge effectively. In rigorous testing, in scene graph generation, and downstream tasks on the 4D-OR dataset, ORacle not only demonstrates state-of-the-art performance but does so while requiring less data than existing models. Furthermore, its adaptability is displayed through its ability to interpret unseen views, actions, and appearances of tools and equipment. This demonstrates ORacle’s potential to significantly enhance the scalability and affordability of OR domain modeling and opens a pathway for future advancements in surgical data science.

MCML Authors

Ege Özsoy

Computer Aided Medical Procedures & Augmented Reality

Chantal Pellegrini

Computer Aided Medical Procedures & Augmented Reality

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1118]
A. Reithmeir, L. Felsner, R. Braren, J. A. Schnabel and V. A. Zimmer.
Data-Driven Tissue- and Subject-Specific Elastic Regularization for Medical Image Registration.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI. GitHub.
Abstract

Physics-inspired regularization is desired for intra-patient image registration since it can effectively capture the biomechanical characteristics of anatomical structures. However, a major challenge lies in the reliance on physical parameters: Parameter estimations vary widely across the literature, and the physical properties themselves are inherently subject-specific. In this work, we introduce a novel data-driven method that leverages hypernetworks to learn the tissue-dependent elasticity parameters of an elastic regularizer. Notably, our approach facilitates the estimation of patient-specific parameters without the need to retrain the network. We evaluate our method on three publicly available 2D and 3D lung CT and cardiac MR datasets. We find that with our proposed subject-specific tissue-dependent regularization, a higher registration quality is achieved across all datasets compared to using a global regularizer.

MCML Authors

Anna Reithmeir

Computational Imaging and AI in Medicine

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[1117]
O. Tmenova, Y. Velikova, M. Saleh and N. Navab.
Deep Spectral Methods for Unsupervised Ultrasound Image Interpretation.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

Ultrasound imaging is challenging to interpret due to non-uniform intensities, low contrast, and inherent artifacts, necessitating extensive training for non-specialists. Advanced representation with clear tissue structure separation could greatly assist clinicians in mapping underlying anatomy and distinguishing between tissue layers. Decomposing an image into semantically meaningful segments is mainly achieved using supervised segmentation algorithms. Unsupervised methods are beneficial, as acquiring large labeled datasets is difficult and costly, but, despite their advantages, they remain underexplored in ultrasound. This paper proposes a novel unsupervised deep learning strategy tailored to ultrasound to obtain easily interpretable tissue separations. We integrate key concepts from unsupervised deep spectral methods, which combine spectral graph theory with deep learning methods. We utilize self-supervised transformer features for spectral clustering to generate meaningful segments based on ultrasound-specific metrics and shape and positional priors, ensuring semantic consistency across the dataset. We evaluate our unsupervised deep learning strategy on three ultrasound datasets, showcasing qualitative results across anatomical contexts without label requirements. We also conduct a comparative analysis against other clustering algorithms to demonstrate superior segmentation performance, boundary preservation, and label consistency.

MCML Authors

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1116]
H. Zerouaoui, G. P. Oderinde, R. Lefdali, K. Echihabi, S. P. Akpulu, N. A. Agbon, A. S. Musa, Y. Yeganeh, A. Farshad and N. Navab.
AMONuSeg: A Histological Dataset for African Multi-organ Nuclei Semantic Segmentation.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI. GitHub.
Abstract

Nuclei semantic segmentation is a key component for advancing machine learning and deep learning applications in digital pathology. However, most existing segmentation models are trained and tested on high-quality data acquired with expensive equipment, such as whole slide scanners, which are not accessible to most pathologists in developing countries. These pathologists rely on low-resource data acquired with low-precision microscopes, smartphones, or digital cameras, which have different characteristics and challenges than high-resource data. Therefore, there is a gap between the state-of-the-art segmentation models and the real-world needs of low-resource settings. This work aims to bridge this gap by presenting the first fully annotated African multi-organ dataset for histopathology nuclei semantic segmentation acquired with a low-precision microscope. We also evaluate state-of-the-art segmentation models, including spectral feature extraction encoder and vision transformer-based models, and stain normalization techniques for color normalization of Hematoxylin and Eosin-stained histopathology slides. Our results provide important insights for future research on nuclei histopathology segmentation with low-resource data.

MCML Authors

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1115]
F. Bongratz, J. Fecht, A.-M. Rickmann and C. Wachinger.
V2C-Long: Longitudinal Cortex Reconstruction with Spatiotemporal Correspondence.
27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. To be published. Preprint at arXiv.
Abstract

Reconstructing the cortex from longitudinal MRI is indispensable for analyzing morphological changes in the human brain. Despite the recent disruption of cortical surface reconstruction with deep learning, challenges arising from longitudinal data are still persistent. Especially the lack of strong spatiotemporal point correspondence hinders downstream analyses due to the introduced noise. To address this issue, we present V2C-Long, the first dedicated deep learning-based cortex reconstruction method for longitudinal MRI. In contrast to existing methods, V2C-Long surfaces are directly comparable in a cross-sectional and longitudinal manner. We establish strong inherent spatiotemporal correspondences via a novel composition of two deep mesh deformation networks and fast aggregation of feature-enhanced within-subject templates. The results on internal and external test data demonstrate that V2C-Long yields cortical surfaces with improved accuracy and consistency compared to previous methods. Finally, this improvement manifests in higher sensitivity to regional cortical atrophy in Alzheimer’s disease.

MCML Authors

Fabian Bongratz

Artificial Intelligence in Radiology

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1114]
D. Bani-Harouni, N. Navab and M. Keicher.
MAGDA: Multi-agent Guideline-Driven Diagnostic Assistance.
2nd International Workshop on Foundation Models for General Medical AI (MedAGI 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

In emergency departments, rural hospitals, or clinics in less developed regions, clinicians often lack fast image analysis by trained radiologists, which can have a detrimental effect on patients’ healthcare. Large Language Models (LLMs) have the potential to alleviate some pressure from these clinicians by providing insights that can help them in their decision-making. While these LLMs achieve high test results on medical exams showcasing their great theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new approach for zero-shot guideline-driven decision support. We model a system of multiple LLM agents augmented with a contrastive vision-language model that collaborate to reach a patient diagnosis. After providing the agents with simple diagnostic guidelines, they will synthesize prompts and screen the image for findings following these guidelines. Finally, they provide understandable chain-of-thought reasoning for their diagnosis, which is then self-refined to consider inter-dependencies between diseases. As our method is zero-shot, it is adaptable to settings with rare diseases, where training data is limited, but expert-crafted disease descriptions are available. We evaluate our method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showcasing performance improvement over existing zero-shot methods and generalizability to rare diseases.

MCML Authors

David Bani-Harouni

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality


[1113]
D. Daum, R. Osuala, A. Riess, G. Kaissis, J. A. Schnabel and M. Di Folco.
On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models.
4th International Workshop on Deep Generative Models (DGM4 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. To be published. Preprint at arXiv.
Abstract

Generally, the small size of public medical imaging datasets, coupled with stringent privacy concerns, hampers the advancement of data-hungry deep learning models in medical imaging. This study addresses these challenges for 3D cardiac MRI images in the short-axis view. We propose Latent Diffusion Models that generate synthetic images conditioned on medical attributes, while ensuring patient privacy through differentially private model training. To our knowledge, this is the first work to apply and quantify differential privacy in 3D medical image generation. We pre-train our models on public data and finetune them with differential privacy on the UK Biobank dataset. Our experiments reveal that pre-training significantly improves model performance, achieving a Fréchet Inception Distance (FID) of 26.77 at ϵ=10, compared to 92.52 for models without pre-training. Additionally, we explore the trade-off between privacy constraints and image quality, investigating how tighter privacy budgets affect output controllability and may lead to degraded performance. Our results demonstrate that proper consideration during training with differential privacy can substantially improve the quality of synthetic cardiac MRI images, but there are still notable challenges in achieving consistent medical realism.

MCML Authors

Georgios Kaissis

Dr.

Associate

Privacy-Preserving and Trustworthy AI

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[1112]
F. Dülmer, W. Simson, M. F. Azampour, M. Wysocki, A. Karlas and N. Navab.
PHOCUS: Physics-Based Deconvolution for Ultrasound Resolution Enhancement.
5th International Workshop on Advances in Simplifying Medical Ultrasound (ASMUS 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI.
Abstract

Ultrasound is widely used in medical diagnostics allowing for accessible and powerful imaging but suffers from resolution limitations due to diffraction and the finite aperture of the imaging system, which restricts diagnostic use. The impulse function of an ultrasound imaging system is called the point spread function (PSF), which is convolved with the spatial distribution of reflectors in the image formation process. Recovering high-resolution reflector distributions by removing image distortions induced by the convolution process improves image clarity and detail. Conventionally, deconvolution techniques attempt to rectify the imaging system’s dependent PSF, working directly on the radio-frequency (RF) data. However, RF data is often not readily accessible. Therefore, we introduce a physics-based deconvolution process using a modeled PSF, working directly on the more commonly available B-mode images. By leveraging Implicit Neural Representations (INRs), we learn a continuous mapping from spatial locations to their respective echogenicity values, effectively compensating for the discretized image space. Our contribution consists of a novel methodology for retrieving a continuous echogenicity map directly from a B-mode image through a differentiable physics-based rendering pipeline for ultrasound resolution enhancement. We qualitatively and quantitatively evaluate our approach on synthetic data, demonstrating improvements over traditional methods in metrics such as PSNR and SSIM. Furthermore, we show qualitative enhancements on an ultrasound phantom and an in-vivo acquisition of a carotid artery.

MCML Authors

Felix Dülmer

Computer Aided Medical Procedures & Augmented Reality

Walter Simson

Dr.

* Former member

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Magdalena Wysocki

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1111]
Ç. Köksal, G. Ghazaei, F. Holm, A. Farshad and N. Navab.
SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction.
6th Workshop on GRaphs in biomedicAl Image anaLysis (GRAIL 2024) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. arXiv.
Abstract

Graph-based holistic scene representations facilitate surgical workflow understanding and have recently demonstrated significant success. However, this task is often hindered by the limited availability of densely annotated surgical scene data. In this work, we introduce an end-to-end framework for the generation and optimization of surgical scene graphs on a downstream task. Our approach leverages the flexibility of graph-based spectral clustering and the generalization capability of foundation models to generate unsupervised scene graphs with learnable properties. We reinforce the initial spatial graph with sparse temporal connections using local matches between consecutive frames to predict temporally consistent clusters across a temporal neighborhood. By jointly optimizing the spatiotemporal relations and node features of the dynamic scene graph with the downstream task of phase segmentation, we address the costly and annotation-burdensome task of semantic scene comprehension and scene graph generation in surgical videos using only weak surgical phase labels. Further, by incorporating effective intermediate scene representation disentanglement steps within the pipeline, our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow recognition.

MCML Authors

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1110]
P. O. Schenk and C. Kern.
Connecting algorithmic fairness to quality dimensions in machine learning in official statistics and survey production.
AStA Wirtschafts- und Sozialstatistisches Archiv 18 (Oct. 2024). DOI.
Abstract

National Statistical Organizations (NSOs) increasingly draw on Machine Learning (ML) to improve the timeliness and cost-effectiveness of their products. When introducing ML solutions, NSOs must ensure that high standards with respect to robustness, reproducibility, and accuracy are upheld as codified, e.g., in the Quality Framework for Statistical Algorithms (QF4SA; Yung et al. 2022, Statistical Journal of the IAOS). At the same time, a growing body of research focuses on fairness as a pre-condition of a safe deployment of ML to prevent disparate social impacts in practice. However, fairness has not yet been explicitly discussed as a quality aspect in the context of the application of ML at NSOs. We employ the QF4SA quality framework and present a mapping of its quality dimensions to algorithmic fairness. We thereby extend the QF4SA framework in several ways: First, we investigate the interaction of fairness with each of these quality dimensions. Second, we argue for fairness as its own, additional quality dimension, beyond what is contained in the QF4SA so far. Third, we emphasize and explicitly address data, both on its own and its interaction with applied methodology. In parallel with empirical illustrations, we show how our mapping can contribute to methodology in the domains of official statistics, algorithmic fairness, and trustworthy machine learning.

MCML Authors

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab


[1109]
Y. Wang, C. M. Albrecht and X. Zhu.
Multilabel-Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining.
IEEE Transactions on Geoscience and Remote Sensing 62 (Oct. 2024). DOI. GitHub.
Abstract

Self-supervised pretraining on large-scale satellite data has raised great interest in building Earth observation (EO) foundation models. However, many important resources beyond pure satellite imagery, such as land-cover-land-use products that provide free global semantic information, as well as vision foundation models that hold strong knowledge of the natural world, are not widely studied. In this work, we show these free additional resources not only help resolve common contrastive learning bottlenecks but also significantly boost the efficiency and effectiveness of EO pretraining. Specifically, we first propose soft contrastive learning (SoftCon) that optimizes cross-scene soft similarity based on land-cover-generated multilabel supervision, naturally solving the issue of multiple positive samples and too strict positive matching in complex scenes. Second, we revisit and explore cross-domain continual pretraining for both multispectral and synthetic aperture radar (SAR) imagery, building efficient EO foundation models from the strongest vision models such as DINOv2. Adapting simple weight-initialization and Siamese masking strategies into our SoftCon framework, we demonstrate impressive continual pretraining performance even when the input modalities are not aligned. Without prohibitive training, we produce multispectral and SAR foundation models that achieve significantly better results in 10 out of 11 downstream tasks than most existing SOTA models. For example, our ResNet50/ViT-S achieve 84.8/85.0 linear probing mAP scores on BigEarthNet-10%, which are better than most existing ViT-L models; under the same setting, our ViT-B sets a new record of 86.8 in multispectral and 82.5 in SAR, the latter even better than many multispectral models.

MCML Authors

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1108]
M. M. Heimer, Y. Dikhtyar, B. F. Hoppe, F. L. Herr, A. T. Stüber, T. Burkard, E. Zöller, M. P. Fabritius, L. Unterrainer, L. Adams, A. Thurner, D. Kaufmann, T. Trzaska, M. Kopp, O. Hamer, K. Maurer, I. Ristow, M. S. May, A. Tufman, J. Spiro, M. Brendel, M. Ingrisch, J. Ricke and C. C. Cyran.
Software-assisted structured reporting and semi-automated TNM classification for NSCLC staging in a multicenter proof of concept study.
Insights into Imaging 15.258 (Oct. 2024). DOI.
Abstract

In this multi-center study, we proposed a structured reporting (SR) framework for non-small cell lung cancer (NSCLC) and developed a software-assisted tool to automatically translate image-based findings and annotations into TNM classifications. The aim of this study was to validate the software-assisted SR tool for NSCLC, assess its potential clinical impact in a proof-of-concept study, and evaluate current reporting standards in participating institutions.

MCML Authors

Theresa Stüber

Clinical Data Science in Radiology

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[1107]
M. Rauscher, A. Scagliotti and F. Pagginelli Patricio.
Shortest-path recovery from signature with an optimal control approach.
Mathematics of Control, Signals, and Systems (Oct. 2024). DOI.
MCML Authors

Alessandro Scagliotti

Applied Numerical Analysis


[1106]
V. Blaschke, B. Kovačić, S. Peng and B. Plank.
MaiBaam Annotation Guidelines.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

This document provides the annotation guidelines for MaiBaam, a Bavarian corpus manually annotated with part-of-speech (POS) tags, syntactic dependencies, and German lemmas. MaiBaam belongs to the Universal Dependencies (UD) project, and our annotations elaborate on the general and German UD version 2 guidelines. In this document, we detail how to preprocess and tokenize Bavarian data, provide an overview of the POS tags and dependencies we use, explain annotation decisions that would also apply to closely related languages like German, and lastly we introduce and motivate decisions that are specific to Bavarian grammar.

MCML Authors

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1105]
H. Chen, H. Li, Y. Zhang, G. Zhang, J. Bi, P. Torr, J. Gu, D. Krompass and V. Tresp.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional FL. Despite these promising prospects, existing methods face challenges due to client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDM) have shown remarkable advancements in synthesizing high-quality images through pretraining on large-scale datasets, thereby presenting a potential solution to overcome these issues. However, directly applying pretrained LDM to heterogeneous OSFL results in significant distribution shifts in synthetic data, leading to performance degradation in classification models trained on such data. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in LDM’s pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both instance-level and concept-level. Hereby, FedBiP synthesizes images following the client’s local data distribution without compromising the privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. Our method is validated through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.

MCML Authors

Hang Li

Database Systems & Data Mining

Yao Zhang

Database Systems & Data Mining

Gengyuan Zhang

Database Systems & Data Mining

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1104]
Q. Chen, X. Wang, P. Mondorf, M. Hedderich and B. Plank.
Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Tree of Thoughts (ToT) is a reasoning strategy for Large Language Models (LLMs) that employs a generator to suggest reasoning steps and a discriminator to decide which steps to implement. ToT demonstrates strong performance on reasoning tasks, often surpassing simple methods such as Input-Output (IO) prompting and Chain-of-Thought (CoT) reasoning. However, ToT does not consistently outperform such simpler methods across all models, leaving large knowledge gaps on the conditions under which ToT is most beneficial. In this paper, we analyze the roles of the generator and discriminator separately to better understand the conditions when ToT is beneficial. We find that the generator plays a more critical role than the discriminator in driving the success of ToT. Scaling the generator leads to notable improvements in ToT performance, even when using a smaller model as the discriminator, whereas scaling the discriminator with a fixed generator yields only marginal gains. Our results show that models across different scales exhibit comparable discrimination capabilities, yet differ significantly in their generative performance for ToT.

MCML Authors

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Philipp Mondorf

Artificial Intelligence and Computational Linguistics

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1103]
Z. Ding, Y. Li, Y. He, A. Norelli, J. Wu, V. Tresp, Y. Ma and M. Bronstein.
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, where its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art performance in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning


[1102]
L. Edman, L. Bylinina, F. Ghorbanpour and A. Fraser.
Are BabyLMs Second Language Learners?.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge (Warstadt et al. 2023). Rather than pursuing a first language learning (L1) paradigm, we approach the challenge from a second language (L2) learning perspective. In L2 learning, there is a stronger focus on learning explicit linguistic information, such as grammatical notions, definitions of words or different ways of expressing a meaning. This makes L2 learning potentially more efficient and concise. We approximate this using data from Wiktionary, grammar examples either generated by an LLM or sourced from grammar books, and paraphrase data. We find that explicit information about word meaning (in our case, Wiktionary) does not boost model performance, while grammatical information can give a small improvement. The most impactful data ingredient is sentence paraphrases, with our two best models being trained on 1) a mix of paraphrase data and data from the BabyLM pretraining dataset, and 2) exclusively paraphrase data.

MCML Authors
Link to Lukas Edman

Lukas Edman

Dr.

Data Analytics & Statistics

Link to Faeze Ghorbanpour

Faeze Ghorbanpour

Data Analytics & Statistics

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[1101]
F. Eichin, C. Schuster, G. Groh and M. A. Hedderich.
Semantic Component Analysis: Discovering Patterns in Short Texts Beyond Topics.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Topic modeling is a key method in text analysis, but existing approaches either assume one topic per document or fail to scale efficiently for large, noisy datasets of short texts. We introduce Semantic Component Analysis (SCA), a novel topic modeling technique that overcomes these limitations by discovering multiple, nuanced semantic components beyond a single topic in short texts, which we accomplish by introducing a decomposition step into the clustering-based topic modeling framework. Evaluated on multiple Twitter datasets, SCA matches the state-of-the-art method BERTopic in coherence and diversity while uncovering at least twice as many semantic components and maintaining a noise rate close to zero. It remains scalable and effective across languages, including an underrepresented one.

MCML Authors
Link to Florian Eichin

Florian Eichin

Artificial Intelligence and Computational Linguistics

Link to Michael Hedderich

Michael Hedderich

Dr.

Artificial Intelligence and Computational Linguistics


[1100]
L. Fang, Y. Wang, Z. Liu, C. Zhang, S. Jegelka, J. Gao, B. Ding and Y. Wang.
What is Wrong with Perplexity for Long-context Language Modeling?.
Preprint at arXiv (Oct. 2024). arXiv. GitHub.
Abstract

Handling long-context inputs is crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. We find that PPL overlooks key tokens, which are essential for long-context understanding, by averaging across all tokens and thereby obscuring the true performance of models in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them. Our experiments demonstrate that LongPPL strongly correlates with performance on various long-context benchmarks (e.g., Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. Additionally, we introduce LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens, leading to consistent improvements across diverse benchmarks. In summary, these contributions offer deeper insights into the limitations of PPL and present effective solutions for accurately evaluating and enhancing the long-context capabilities of LLMs.
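
The core observation — that averaging log-probabilities over all tokens can hide failures on a few key tokens — can be sketched in a few lines. The key-token positions below are supplied by hand; the paper identifies them with a long-short context contrastive method.

```python
import math

# Minimal sketch of key-token-focused perplexity: instead of averaging
# negative log-likelihoods over all tokens, restrict the average to a
# chosen set of "key" positions. Numbers are illustrative.

def perplexity(log_probs, key_positions=None):
    if key_positions is None:                # standard PPL: all tokens
        key_positions = range(len(log_probs))
    selected = [log_probs[i] for i in key_positions]
    return math.exp(-sum(selected) / len(selected))

log_probs = [-0.1, -0.2, -5.0, -0.1]         # per-token log-likelihoods
print(perplexity(log_probs))                 # standard PPL, dominated by easy tokens
print(perplexity(log_probs, key_positions=[2]))  # far larger: the hard key token
```

Three easy tokens mask the one badly predicted token in the standard average, which is exactly the failure mode the paper attributes to PPL on long contexts.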

MCML Authors
Link to Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1099]
M. Fornasier, P. Heid and G. Sodini.
Approximation Theory, Computing, and Deep Learning on the Wasserstein Space.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. We delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on efficiently approximating pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional’s Euler-Lagrange equation. We furnish explicit and quantitative bounds on generalization errors for each of these solutions. We leverage the theory of metric Sobolev spaces and combine it with techniques from optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. At equal accuracy, our constructive solutions significantly enhance evaluation speed, surpassing that of state-of-the-art methods by several orders of magnitude. This allows evaluations over large datasets, including training, several times faster than traditional optimal transport algorithms. Our analytically designed deep learning architecture slightly outperforms the test error of state-of-the-art CNN architectures on datasets of images.

MCML Authors
Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Link to Pascal Heid

Pascal Heid

Dr.

Applied Numerical Analysis


[1098]
H. Funk, R. Ludwig, H. Kuechenhoff and T. Nagler.
Towards more realistic climate model outputs: A multivariate bias correction based on zero-inflated vine copulas.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Climate model large ensembles are an essential research tool for analysing and quantifying natural climate variability and providing robust information for rare extreme events. The models’ simulated representations of reality are susceptible to bias due to incomplete understanding of physical processes. This paper aims to correct the bias of five climate variables from the CRCM5 Large Ensemble over Central Europe at a 3-hourly temporal resolution. At this high temporal resolution, two variables, precipitation and radiation, exhibit a high share of zero inflation. We propose a novel bias-correction method, VBC (Vine copula bias correction), that models and transfers multivariate dependence structures for zero-inflated margins in the data from its error-prone model domain to a reference domain. VBC estimates the model and reference distribution using vine copulas and corrects the model distribution via (inverse) Rosenblatt transformation. To deal with the variables’ zero-inflated nature, we develop a new vine density decomposition that accommodates such variables and employs an adequately randomized version of the Rosenblatt transform. This novel approach allows for more accurate modelling of multivariate zero-inflated climate data. Compared with state-of-the-art correction methods, VBC is generally the best-performing correction and the most accurate method for correcting zero-inflated events.
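
A one-dimensional toy can convey the zero-inflation handling: zeros get randomized ranks (a randomized probability-integral transform), positive values get their empirical CDF value, and both are pushed through the inverse reference CDF. VBC does this multivariately with vine copulas; everything below, including the synthetic gamma "precipitation" data, is an illustrative simplification.

```python
import numpy as np

# Univariate sketch of quantile-mapping bias correction for a
# zero-inflated variable (e.g. precipitation). Not the paper's method,
# only its zero-inflation idea in one dimension.

def bias_correct(model, reference, rng):
    p0_model = np.mean(model == 0)
    p0_ref = np.mean(reference == 0)
    # Randomized PIT: zeros receive uniform ranks on [0, P(model = 0)],
    # positive values receive their empirical CDF value.
    u = np.where(model == 0,
                 rng.uniform(0.0, p0_model, size=model.size),
                 np.array([np.mean(model <= x) for x in model]))
    # Inverse reference CDF: low ranks become zero, the rest is quantile-mapped.
    return np.where(u <= p0_ref, 0.0, np.quantile(reference, u))

rng = np.random.default_rng(0)
reference = np.concatenate([np.zeros(60), rng.gamma(2.0, 2.0, size=140)])
model = np.concatenate([np.zeros(120), rng.gamma(2.0, 4.0, size=80)])  # biased
corrected = bias_correct(model, reference, rng)
print(np.mean(corrected == 0))  # zero share now close to the reference's 0.3
```

The biased model drizzles zeros too often (60% vs. 30%); after correction the zero share, like the positive quantiles, follows the reference distribution.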

MCML Authors
Link to Henri Funk

Henri Funk

Statistical Consulting Unit (StaBLab)

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science


[1097]
E. Garces Arias, M. Li, C. Heumann and M. Aßenmacher.
Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Decoding strategies for large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Since LLMs produce probability distributions over the entire vocabulary, various decoding methods have been developed to transform these probabilities into coherent and fluent text, each with its own set of hyperparameters. In this study, we present a large-scale, comprehensive analysis of how hyperparameter selection affects text quality in open-ended text generation across multiple LLMs, datasets, and evaluation metrics. Through an extensive sensitivity analysis, we provide practical guidelines for hyperparameter tuning and demonstrate the substantial influence of these choices on text quality. Using three established datasets, spanning factual domains (e.g., news) and creative domains (e.g., fiction), we show that hyperparameter tuning significantly impacts generation quality, though its effects vary across models and tasks. We offer in-depth insights into these effects, supported by both human evaluations and a synthesis of widely-used automatic evaluation metrics.
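
Two of the most common decoding hyperparameters studied in such analyses, temperature and nucleus (top-p) truncation, can be demonstrated on a made-up next-token distribution; the four-token vocabulary and probabilities below are arbitrary.

```python
import numpy as np

# Sketch of temperature scaling and nucleus (top-p) truncation applied
# to a next-token probability distribution.

def adjust(probs, temperature=1.0, top_p=1.0):
    logits = np.log(probs) / temperature           # temperature scaling
    p = np.exp(logits - logits.max())
    p /= p.sum()
    order = np.argsort(p)[::-1]                    # nucleus truncation: keep the
    keep = np.cumsum(p[order]) - p[order] < top_p  # smallest high-mass head
    out = np.zeros_like(p)
    out[order[keep]] = p[order[keep]]
    return out / out.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(adjust(probs, temperature=0.5))  # sharper distribution
print(adjust(probs, top_p=0.7))        # tail tokens zeroed out
```

Low temperature concentrates mass on likely tokens (more deterministic text), while top-p removes the unreliable tail; the paper's point is that the best settings vary across models and tasks.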

MCML Authors
Link to Esteban Garces Arias

Esteban Garces Arias

Statistical Learning & Data Science

Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1096]
P. Gassert and M. Althoff.
Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Reinforcement learning (RL) is not yet competitive for many cyber-physical systems, such as robotics, process automation, and power systems, as training on a system with physical components cannot be accelerated, and simulation models do not exist or suffer from a large simulation-to-reality gap. During the long training time, expensive equipment cannot be used and might even be damaged due to inappropriate actions of the reinforcement learning agent. Our novel approach addresses exactly this problem: We train the reinforcement learning agent in a so-called shadow mode with the assistance of an existing conventional controller, which does not have to be trained and instantaneously performs reasonably well. In shadow mode, the agent relies on the controller to provide action samples and guidance towards favourable states to learn the task, while simultaneously estimating for which states the learned agent will receive a higher reward than the conventional controller. The RL agent will then control the system for these states and all other regions remain under the control of the existing controller. Over time, the RL agent will take over for an increasing number of states, while leaving control to the baseline, where it cannot surpass its performance. Thus, we keep regret during training low and improve the performance compared to only using conventional controllers or reinforcement learning. We present and evaluate two mechanisms for deciding whether to use the RL agent or the conventional controller. The usefulness of our approach is demonstrated for a reach-avoid task, for which we are able to effectively train an agent, where standard approaches fail.
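
The per-state switching rule at the heart of shadow mode reduces to a simple comparison of estimated returns; the callables below are illustrative placeholders, not the paper's two proposed decision mechanisms.

```python
# Minimal sketch of shadow-mode control switching: the RL agent acts
# only in states where its estimated return beats the conventional
# controller's; elsewhere the baseline stays in charge.

def shadow_mode_action(state, rl_policy, controller, rl_value, ctrl_value):
    if rl_value(state) > ctrl_value(state):
        return rl_policy(state)   # RL is estimated to outperform here
    return controller(state)      # otherwise fall back to the baseline

# Toy value estimates: RL is only worth using in high-numbered states.
action = shadow_mode_action(2, lambda s: "rl", lambda s: "safe",
                            lambda s: float(s), lambda s: 1.0)
print(action)  # → rl
```

Because the baseline handles every state where the RL estimate does not dominate, regret during training stays low, which is the mechanism the abstract describes.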

MCML Authors
Link to Philipp Gassert

Philipp Gassert

Cyber Physical Systems

Link to Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[1095]
K. Gatmiry, N. Saunshi, S. Reddi, S. Jegelka and S. Kumar.
On the Role of Depth and Looping for In-Context Learning with Task Diversity.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

The intriguing in-context learning (ICL) abilities of deep Transformer models have lately garnered significant attention. By studying in-context linear regression on unimodal Gaussian data, recent empirical and theoretical works have argued that ICL emerges from Transformers’ abilities to simulate learning algorithms like gradient descent. However, these works fail to capture the remarkable ability of Transformers to learn multiple tasks in context. To this end, we study in-context learning for linear regression with diverse tasks, characterized by data covariance matrices with condition numbers ranging in [1, κ], and highlight the importance of depth in this setting. More specifically, (a) we show theoretical lower bounds of log(κ) (or √κ) linear attention layers in the unrestricted (or restricted) attention setting, and (b) we show that multilayer Transformers can indeed solve such tasks with a number of layers that matches the lower bounds. However, we show that this expressivity of multilayer Transformers comes at the price of robustness. In particular, multilayer Transformers are not robust even to distributional shifts as small as O(e^{-L}) in Wasserstein distance, where L is the depth of the network. We then demonstrate that Looped Transformers – a special class of multilayer Transformers with weight-sharing – not only exhibit similar expressive power but are also provably robust under mild assumptions. Besides out-of-distribution generalization, we also show that Looped Transformers are the only models that exhibit a monotonic behavior of loss with respect to depth.

MCML Authors
Link to Stefanie Jegelka

Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1094]
H. Hauger, P. Scholl and G. Kutyniok.
Robust identifiability for symbolic recovery of differential equations.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Recent advancements in machine learning have transformed the discovery of physical laws, moving from manual derivation to data-driven methods that simultaneously learn both the structure and parameters of governing equations. This shift introduces new challenges regarding the validity of the discovered equations, particularly concerning their uniqueness and, hence, identifiability. While the issue of non-uniqueness has been well-studied in the context of parameter estimation, it remains underexplored for algorithms that recover both structure and parameters simultaneously. Early studies have primarily focused on idealized scenarios with perfect, noise-free data. In contrast, this paper investigates how noise influences the uniqueness and identifiability of physical laws governed by partial differential equations (PDEs). We develop a comprehensive mathematical framework to analyze the uniqueness of PDEs in the presence of noise and introduce new algorithms that account for noise, providing thresholds to assess uniqueness and identifying situations where excessive noise hinders reliable conclusions. Numerical experiments demonstrate the effectiveness of these algorithms in detecting uniqueness despite the presence of noise.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1093]
K. Hess and S. Feuerriegel.
Stabilized Neural Prediction of Potential Outcomes in Continuous Time.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Due to the broad range of social media platforms, the requirements of abusive language detection systems are varied and ever-changing. Already a large set of annotated corpora with different properties and label sets were created, such as hate or misogyny detection, but the form and targets of abusive speech are constantly evolving. Since, the annotation of new corpora is expensive, in this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection. Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain. We propose a two-step approach: first we train our model in a multitask fashion. We then carry out few-shot adaptation to the target requirements. Our experiments show that using already existing datasets and only a few-shots of the target task the performance of models improve both monolingually and across languages. Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset and can benefit from knowledge about labels which are not directly used for the target task.

MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1092]
A. H. Kargaran, A. Modarressi, N. Nikeghbal, J. Diesner, F. Yvon and H. Schütze.
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

English-centric large language models (LLMs) often show strong multilingual capabilities. However, the multilingual performance of these models remains unclear and is not thoroughly evaluated for many languages. Most benchmarks for multilinguality focus on classic NLP tasks, or cover a minimal number of languages. We introduce MEXA, a method for assessing the multilingual capabilities of pre-trained English-centric LLMs using parallel sentences, which are available for more languages than existing downstream tasks. MEXA leverages the fact that English-centric LLMs use English as a kind of pivot language in their intermediate layers. It computes the alignment between English and non-English languages using parallel sentences to evaluate the transfer of language understanding from English to other languages. This alignment can be used to estimate model performance in other languages. We conduct studies using various parallel datasets (FLORES-200 and Bible), models (Llama family, Gemma family, Mistral, and OLMo), and established downstream tasks (Belebele, m-MMLU, and m-ARC). We explore different methods to compute embeddings in decoder-only models. Our results show that MEXA, in its default settings, achieves a statistically significant average Pearson correlation of 0.90 with three established downstream tasks across nine models and two parallel datasets. This suggests that MEXA is a reliable method for estimating the multilingual capabilities of English-centric LLMs, providing a clearer understanding of their multilingual potential and the inner workings of LLMs.
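
The alignment idea behind such parallel-sentence evaluation can be illustrated with a retrieval toy: given embeddings of parallel sentences, score how often each non-English sentence's nearest English neighbour is its true translation. The random vectors below are stand-ins for a model's intermediate-layer representations, and the metric is a simplified proxy, not MEXA's exact score.

```python
import numpy as np

# Toy cross-lingual alignment score: fraction of sentences whose nearest
# English neighbour (by cosine similarity) is the correct parallel pair.

def alignment_score(eng, other):
    eng = eng / np.linalg.norm(eng, axis=1, keepdims=True)
    other = other / np.linalg.norm(other, axis=1, keepdims=True)
    sims = other @ eng.T                      # cosine similarity matrix
    return float((sims.argmax(axis=1) == np.arange(len(other))).mean())

rng = np.random.default_rng(0)
eng = rng.normal(size=(100, 32))
aligned = eng + 0.1 * rng.normal(size=eng.shape)  # a well-aligned "language"
print(alignment_score(eng, aligned))              # close to 1.0
```

A language whose representations sit close to their English counterparts scores near 1, which is the signal the method correlates with downstream task performance.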

MCML Authors
Link to Amir Hossein Kargaran

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Ali Modarressi

Ali Modarressi

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1091]
S. Karnik, A. Veselovska, M. Iwen and F. Krahmer.
Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.
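
The phenomenon is easiest to see in the classic matrix analogue of the problem (the paper treats the tubal tensor setting): gradient descent on an overparametrized symmetric factorization with a small random initialization drifts to a low-rank solution. Sizes, step size, and seed below are arbitrary choices for the demonstration.

```python
import numpy as np

# Matrix-case toy of implicit regularization: fit a rank-2 PSD target A
# with a full-size factor U via gradient descent on ||U U^T - A||_F^2.
# Despite the overparametrization, small initialization biases U U^T
# towards a low-rank solution.

rng = np.random.default_rng(1)
n, r = 10, 2
G = rng.normal(size=(n, r))
A = G @ G.T                            # rank-2 PSD target
A /= np.linalg.norm(A, ord=2)          # normalize the spectral norm

U = 1e-3 * rng.normal(size=(n, n))     # overparametrized, small random init
lr = 0.1
for _ in range(4000):
    U -= lr * 2.0 * (U @ U.T - A) @ U  # gradient of ||U U^T - A||_F^2 / 2

s = np.linalg.svd(U @ U.T, compute_uv=False)
print(s[:3].round(4))  # roughly two non-negligible singular values
```

With a large initialization the same iteration lands in the lazy regime and the low-rank bias disappears, which is why the small-random-initialization assumption is emphasized in the abstract.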

MCML Authors
Link to Hanna Veselovska

Hanna Veselovska

Dr.

Optimization & Data Analysis

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[1090]
B. Kühbacher, F. Iglesias-Suarez, N. Kilbertus and V. Eyring.
Towards Physically Consistent Deep Learning For Climate Model Parameterizations.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Climate models play a critical role in understanding and projecting climate change. Due to their complexity, their horizontal resolution of about 40-100 km remains too coarse to resolve processes such as clouds and convection, which need to be approximated via parameterizations. These parameterizations are a major source of systematic errors and large uncertainties in climate projections. Deep learning (DL)-based parameterizations, trained on data from computationally expensive short, high-resolution simulations, have shown great promise for improving climate models in that regard. However, their lack of interpretability and tendency to learn spurious non-physical correlations result in reduced trust in the climate simulation. We propose an efficient supervised learning framework for DL-based parameterizations that leads to physically consistent models with improved interpretability and negligible computational overhead compared to standard supervised training. First, key features determining the target physical processes are uncovered. Subsequently, the neural network is fine-tuned using only those relevant features. We show empirically that our method robustly identifies a small subset of the inputs as actual physical drivers, therefore removing spurious non-physical relationships. This results in neural networks that are physically consistent and interpretable by design, while maintaining the predictive performance of unconstrained black-box DL-based parameterizations.

MCML Authors
Link to Birgit Kühbacher

Birgit Kühbacher

Ethics in Systems Design and Machine Learning

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1089]
J. Lan, D. Frassinelli and B. Plank.
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Large vision-language models frequently struggle to accurately predict responses provided by multiple human annotators, particularly when those responses exhibit human uncertainty. In this study, we focus on the Visual Question Answering (VQA) task, and we comprehensively evaluate how well the state-of-the-art vision-language models correlate with the distribution of human responses. To do so, we categorize our samples based on their levels (low, medium, high) of human uncertainty in disagreement (HUD) and employ not only accuracy but also three new human-correlated metrics in VQA, to investigate the impact of HUD. To better align models with humans, we also verify the effect of common calibration and human calibration. Our results show that even BEiT3, currently the best model for this task, struggles to capture the multi-label distribution inherent in diverse human responses. Additionally, we observe that the commonly used accuracy-oriented calibration technique adversely affects BEiT3’s ability to capture HUD, further widening the gap between model predictions and human distributions. In contrast, we show the benefits of calibrating models towards human distributions for VQA, better aligning model confidence with human uncertainty. Our findings highlight that for VQA, the consistent alignment between human responses and model predictions is understudied and should become the next crucial target of future studies.
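
The gap the abstract describes — a confident model facing genuinely divided annotators — can be quantified by comparing the two answer distributions; total variation distance below is a generic stand-in, not one of the paper's proposed human-correlated metrics, and the numbers are invented.

```python
import numpy as np

# Compare a model's answer distribution with the empirical distribution
# of human responses for one VQA item with high human disagreement.

human = np.array([5, 3, 2]) / 10.0        # 10 annotators over 3 answers
model = np.array([0.90, 0.07, 0.03])      # overconfident model

tvd = 0.5 * np.abs(human - model).sum()   # 0 = perfect match, 1 = disjoint
print(round(tvd, 2))  # → 0.4
```

Accuracy alone would call this prediction correct (the top answer matches), while the distributional view exposes that the model misses the human uncertainty — the mismatch the paper argues should be measured and calibrated for.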

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1088]
Y. Li, M. Ghahremani, Y. Wally and C. Wachinger.
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Diagnosing dementia, particularly for Alzheimer’s Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study.

MCML Authors
Link to Yitong Li

Yitong Li

Artificial Intelligence in Radiology

Link to Morteza Ghahremani

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[1087]
S. Lüpke, Y. Yeganeh, E. Adeli, N. Navab and A. Farshad.
Physics-Informed Latent Diffusion for Multimodal Brain MRI Synthesis.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Recent advances in generative models for medical imaging have shown promise in representing multiple modalities. However, the variability in modality availability across datasets limits the general applicability of the synthetic data they produce. To address this, we present a novel physics-informed generative model capable of synthesizing a variable number of brain MRI modalities, including those not present in the original dataset. Our approach utilizes latent diffusion models and a two-step generative process: first, unobserved physical tissue property maps are synthesized using a latent diffusion model, and then these maps are combined with a physical signal model to generate the final MRI scan. Our experiments demonstrate the efficacy of this approach in generating unseen MR contrasts and preserving physical plausibility. Furthermore, we validate the distributions of generated tissue properties by comparing them to those measured in real brain tissue.

MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1086]
G. Manten, C. Casolo, E. Ferrucci, S. Mogensen, C. Salvi and N. Kilbertus.
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Inferring the causal structure underlying stochastic dynamical systems from observational data holds great promise in domains ranging from science and health to finance. Such processes can often be accurately modeled via stochastic differential equations (SDEs), which naturally imply causal relationships via ‘which variables enter the differential of which other variables’. In this paper, we develop conditional independence (CI) constraints on coordinate processes over selected intervals that are Markov with respect to the acyclic dependence graph (allowing self-loops) induced by a general SDE model. We then provide a sound and complete causal discovery algorithm, capable of handling both fully and partially observed data, and uniquely recovering the underlying or induced ancestral graph by exploiting time directionality assuming a CI oracle. Finally, to make our algorithm practically usable, we also propose a flexible, consistent signature kernel-based CI test to infer these constraints from data. We extensively benchmark the CI test in isolation and as part of our causal discovery algorithms, outperforming existing approaches in SDE models and beyond.

MCML Authors
Link to Cecilia Casolo

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1085]
P. Mondorf, S. Wold and B. Plank.
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions via subnetworks that can be composed to perform more complex tasks. Recent developments in mechanistic interpretability have made progress in identifying subnetworks, often referred to as circuits, which represent the minimal computational subgraph responsible for a model’s behavior on specific tasks. However, most studies focus on identifying circuits for individual tasks without investigating how functionally similar circuits relate to each other. To address this gap, we examine the modularity of neural networks by analyzing circuits for highly compositional subtasks within a transformer-based language model. Specifically, given a probabilistic context-free grammar, we identify and compare circuits responsible for ten modular string-edit operations. Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness. Moreover, we demonstrate that the circuits identified can be reused and combined through subnetwork set operations to represent more complex functional capabilities of the model.
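
Once circuits are represented as sets of computational-graph nodes, both the overlap measurement and the composition via subnetwork set operations are plain set algebra; the node names below are invented for illustration.

```python
# Toy illustration of comparing and composing circuits as node sets.

circuit_copy   = {"attn.0.1", "mlp.2", "attn.3.4"}   # circuit for one string-edit task
circuit_delete = {"attn.0.1", "mlp.2", "mlp.5"}      # circuit for a related task

# Node overlap between functionally similar circuits (Jaccard index).
jaccard = len(circuit_copy & circuit_delete) / len(circuit_copy | circuit_delete)

# Union as a candidate subnetwork for a task composing both operations.
combined = circuit_copy | circuit_delete

print(round(jaccard, 2))  # → 0.5
```

In the paper, such unions and intersections of identified subnetworks are then evaluated for faithfulness on the composed tasks; this snippet only shows the bookkeeping.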

MCML Authors
Philipp Mondorf

Artificial Intelligence and Computational Linguistics

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1084]
Y. Ozyurt, S. Feuerriegel and M. Sachan.
Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Knowledge tracing (KT) is a popular approach for modeling students’ learning progress over time, which can enable more personalized and adaptive learning. However, existing KT approaches face two major limitations: (1) they rely heavily on expert-defined knowledge concepts (KCs) in questions, which is time-consuming and prone to errors; and (2) KT methods tend to overlook the semantics of both questions and the given KCs. In this work, we address these challenges and present KCQRL, a framework for automated knowledge concept annotation and question representation learning that can improve the effectiveness of any existing KT model. First, we propose an automated KC annotation process using large language models (LLMs), which generates question solutions and then annotates KCs in each solution step of the questions. Second, we introduce a contrastive learning approach to generate semantically rich embeddings for questions and solution steps, aligning them with their associated KCs via a tailored false negative elimination approach. These embeddings can be readily integrated into existing KT models, replacing their randomly initialized embeddings. We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets, where we achieve consistent performance improvements.

MCML Authors
Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1083]
Y. Ozyurt, S. Feuerriegel and C. Zhang.
Document-Level In-Context Few-Shot Relation Extraction via Pre-Trained Language Models.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Document-level relation extraction aims at inferring structured human knowledge from textual documents. State-of-the-art methods for this task use pre-trained language models (LMs) via fine-tuning, yet fine-tuning is computationally expensive and cannot adapt to new relation types or new LMs. As a remedy, we leverage the generalization capabilities of pre-trained LMs and present a novel framework for document-level in-context few-shot relation extraction. Our framework has three strengths: it eliminates the need (1) for named entity recognition and (2) for human annotations of documents, and (3) it can be updated to new LMs without re-training. We evaluate our framework using DocRED, the largest publicly available dataset for document-level relation extraction, and demonstrate that our framework achieves state-of-the-art performance. We further show that our framework actually performs much better than the original labels from the development set of DocRED. Finally, we conduct an extensive benchmark demonstrating the effectiveness of our framework, achieving state-of-the-art results across six relation extraction datasets and outperforming more than 30 baseline methods. Unlike our framework, the baseline methods have large computational overhead (e.g., from fine-tuning). To the best of our knowledge, we are the first to reformulate the document-level relation extraction task as a tailored in-context few-shot learning paradigm.

MCML Authors
Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1082]
T. Putterman, D. Lim, Y. Gelberg, S. Jegelka and H. Maron.
Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Low-rank adaptations (LoRAs) have revolutionized the finetuning of large foundation models, enabling efficient adaptation even with limited computational resources. The resulting proliferation of LoRAs presents exciting opportunities for applying machine learning techniques that take these low-rank weights themselves as inputs. In this paper, we investigate the potential of Learning on LoRAs (LoL), a paradigm where LoRA weights serve as input to machine learning models. For instance, an LoL model that takes in LoRA weights as inputs could predict the performance of the finetuned model on downstream tasks, detect potentially harmful finetunes, or even generate novel model edits without traditional training methods. We first identify the inherent parameter symmetries of low rank decompositions of weights, which differ significantly from the parameter symmetries of standard neural networks. To efficiently process LoRA weights, we develop several symmetry-aware invariant or equivariant LoL models, using tools such as canonicalization, invariant featurization, and equivariant layers. We finetune thousands of text-to-image diffusion models and language models to collect datasets of LoRAs. In numerical experiments on these datasets, we show that our LoL architectures are capable of processing low rank weight decompositions to predict CLIP score, finetuning data attributes, finetuning data membership, and accuracy on downstream tasks.

MCML Authors
Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1081]
P. Scholl, A. Bacho, H. Boche and G. Kutyniok.
Symbolic Recovery of Differential Equations: The Identifiability Problem.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Symbolic recovery of differential equations is the ambitious attempt at automating the derivation of governing equations with the use of machine learning techniques. In contrast to classical methods which assume the structure of the equation to be known and focus on the estimation of specific parameters, these algorithms aim to learn the structure and the parameters simultaneously. While the uniqueness and, therefore, the identifiability of parameters of governing equations are a well-addressed problem in the field of parameter estimation, it has not been investigated for symbolic recovery. However, this problem should be even more present in this field since the algorithms aim to cover larger spaces of governing equations. In this paper, we investigate under which conditions a solution of a differential equation does not uniquely determine the equation itself. For various classes of differential equations, we provide both necessary and sufficient conditions for a function to uniquely determine the corresponding differential equation. We then use our results to devise numerical algorithms aiming to determine whether a function solves a differential equation uniquely. Finally, we provide extensive numerical experiments showing that our algorithms can indeed guarantee the uniqueness of the learned governing differential equation, without assuming any knowledge about the analytic form of the function, thereby ensuring the reliability of the learned equation.

MCML Authors
Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1080]
T. Schwarz, C. Casolo and N. Kilbertus.
Uncertainty-Aware Optimal Treatment Selection for Clinical Time Series.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

In personalized medicine, the ability to predict and optimize treatment outcomes across various time frames is essential. Additionally, the ability to select cost-effective treatments within specific budget constraints is critical. Despite recent advancements in estimating counterfactual trajectories, a direct link to optimal treatment selection based on these estimates is missing. This paper introduces a novel method integrating counterfactual estimation techniques and uncertainty quantification to recommend personalized treatment plans adhering to predefined cost constraints. Our approach is distinctive in its handling of continuous treatment variables and its incorporation of uncertainty quantification to improve prediction reliability. We validate our method using two simulated datasets, one focused on the cardiovascular system and the other on COVID-19. Our findings indicate that our method has robust performance across different counterfactual estimation baselines, showing that introducing uncertainty quantification in these settings helps the current baselines in finding more reliable and accurate treatment selection. The robustness of our method across various settings highlights its potential for broad applicability in personalized healthcare solutions.

MCML Authors
Cecilia Casolo

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1079]
J. Schweisthal, D. Frauen, M. Schröder, K. Hess, N. Kilbertus and S. Feuerriegel.
Learning Representations of Instruments for Partial Identification of Treatment Effects.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Reliable estimation of treatment effects from observational data is important in many disciplines such as medicine. However, estimation is challenging when unconfoundedness as a standard assumption in the causal inference literature is violated. In this work, we leverage arbitrary (potentially high-dimensional) instruments to estimate bounds on the conditional average treatment effect (CATE). Our contributions are three-fold: (1) We propose a novel approach for partial identification through a mapping of instruments to a discrete representation space so that we yield valid bounds on the CATE. This is crucial for reliable decision-making in real-world applications. (2) We derive a two-step procedure that learns tight bounds using a tailored neural partitioning of the latent instrument space. As a result, we avoid instability issues due to numerical approximations or adversarial training. Furthermore, our procedure aims to reduce the estimation variance in finite-sample settings to yield more reliable estimates. (3) We show theoretically that our procedure obtains valid bounds while reducing estimation variance. We further perform extensive experiments to demonstrate the effectiveness across various settings. Overall, our procedure offers a novel path for practitioners to make use of potentially high-dimensional instruments (e.g., as in Mendelian randomization).

MCML Authors
Jonas Schweisthal

Artificial Intelligence in Management

Dennis Frauen

Artificial Intelligence in Management

Maresa Schröder

Artificial Intelligence in Management

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1078]
K. Schwethelm, J. Kaiser, J. Kuntzer, M. Yigitsoy, D. Rueckert and G. Kaissis.
Differentially Private Active Learning: Balancing Effective Data Selection and Privacy.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Active learning (AL) is a widely used technique for optimizing data labeling in machine learning by iteratively selecting, labeling, and training on the most informative data. However, its integration with formal privacy-preserving methods, particularly differential privacy (DP), remains largely underexplored. While some works have explored differentially private AL for specialized scenarios like online learning, the fundamental challenge of combining AL with DP in standard learning settings has remained unaddressed, severely limiting AL’s applicability in privacy-sensitive domains. This work addresses this gap by introducing differentially private active learning (DP-AL) for standard learning settings. We demonstrate that naively integrating DP-SGD training into AL presents substantial challenges in privacy budget allocation and data utilization. To overcome these challenges, we propose step amplification, which leverages individual sampling probabilities in batch creation to maximize data point participation in training steps, thus optimizing data utilization. Additionally, we investigate the effectiveness of various acquisition functions for data selection under privacy constraints, revealing that many commonly used functions become impractical. Our experiments on vision and natural language processing tasks show that DP-AL can improve performance for specific datasets and model architectures. However, our findings also highlight the limitations of AL in privacy-constrained environments, emphasizing the trade-offs between privacy, model accuracy, and data selection accuracy.

MCML Authors
Georgios Kaissis

Dr.

Associate

Privacy-Preserving and Trustworthy AI


[1077]
R. Shim and B. Plank.
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP for English often works with Indian English or African-American Vernacular English as homogeneous categories (Faisal et al., 2024; Ziems et al., 2023), yet even within one variety there is substantial variation. We examine within-dialect variation and show that performance critically varies within categories. We measure speech-to-text performance on Italian dialects, and empirically observe a geographical performance disparity. This disparity correlates substantially (-0.5) with linguistic similarity to the highest performing dialect variety. We cross-examine our results against dialectometry methods, and interpret the performance disparity to be due to a bias towards dialects that are more similar to the standard variety in the speech-to-text model examined. We additionally leverage geostatistical methods to predict zero-shot performance at unseen sites, and find the incorporation of geographical information to substantially improve prediction performance, indicating there to be geographical structure in the performance distribution.

MCML Authors
Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1076]
Q. Sun, A. Akman, X. Jing, M. Milling and B. W. Schuller.
Audio-based Kinship Verification Using Age Domain Conversion.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we design the notion of an ‘age-standardised domain’ wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.

MCML Authors
Manuel Milling

Health Informatics

Björn Schuller

Prof. Dr.

Health Informatics


[1075]
Y. Sun, Z. Wu, Y. Ma and V. Tresp.
Quantum Architecture Search with Unsupervised Representation Learning.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.

MCML Authors
Yize Sun

Database Systems & Data Mining

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1074]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning with the Gromov-Monge Gap.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.

MCML Authors
Luca Eyring

Interpretable and Reliable Machine Learning

Karsten Roth

Interpretable and Reliable Machine Learning

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1073]
X. Wang, C. Hu, P. Röttger and B. Plank.
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Training a language model to be both helpful and harmless requires careful calibration of refusal behaviours: Models should refuse to follow malicious instructions or give harmful advice (e.g. ‘how do I kill someone?’), but they should not refuse safe requests, even if they superficially resemble unsafe ones (e.g. ‘how do I kill a Python process?’). Avoiding such false refusal, as prior work has shown, is challenging even for highly-capable language models. In this paper, we propose a simple and surgical method for mitigating false refusal in language models via single vector ablation. For a given model, we extract a false refusal vector and show that ablating this vector reduces false refusal rate without negatively impacting model safety and general model capabilities. We also show that our approach can be used for fine-grained calibration of model safety. Our approach is training-free and model-agnostic, making it useful for mitigating the problem of false refusal in current and future language models.

MCML Authors
Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1072]
A. White, A. Büttner, M. Gelbrecht, V. Duruisseaux, N. Kilbertus, F. Hellmann and N. Boers.
Projected Neural Differential Equations for Learning Constrained Dynamics.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Neural differential equations offer a powerful approach for learning dynamics from data. However, they do not impose known constraints that should be obeyed by the learned model. It is well-known that enforcing constraints in surrogate models can enhance their generalizability and numerical stability. In this paper, we introduce projected neural differential equations (PNDEs), a new method for constraining neural differential equations based on projection of the learned vector field to the tangent space of the constraint manifold. In tests on several challenging examples, including chaotic dynamical systems and state-of-the-art power grid models, PNDEs outperform existing methods while requiring fewer hyperparameters. The proposed approach demonstrates significant potential for enhancing the modeling of constrained dynamical systems, particularly in complex domains where accuracy and reliability are essential.

MCML Authors
Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[1071]
M. Yau, E. Akyürek, J. Mao, J. B. Tenenbaum, S. Jegelka and J. Andreas.
Learning Linear Attention in Polynomial Time.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Previous research has explored the computational expressivity of Transformer models in simulating Boolean circuits or Turing machines. However, the learnability of these simulators from observational data has remained an open question. Our study addresses this gap by providing the first polynomial-time learnability results (specifically strong, agnostic PAC learning) for single-layer Transformers with linear attention. We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS. As a consequence, the problem of learning any linear transformer may be converted into the problem of learning an ordinary linear predictor in an expanded feature space, and any such predictor may be converted back into a multiheaded linear transformer. Moving to generalization, we show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent (up to trivial symmetries) to the linear Transformer that generated the data, thereby guaranteeing the learned model will correctly generalize across all inputs. Finally, we provide examples of computations expressible via linear attention and therefore polynomial-time learnable, including associative memories, finite automata, and a class of Universal Turing Machines (UTMs) with polynomially bounded computation histories. We empirically validate our theoretical findings on three tasks: learning random linear attention networks, key–value associations, and learning to execute finite automata. Our findings bridge a critical gap between theoretical expressivity and learnability of Transformers, and show that flexible and general models of computation are efficiently learnable.

MCML Authors
Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1070]
Y. Yeganeh, R. Lazuardi, A. Shamseddin, E. Dari, Y. Thirani, N. Navab and A. Farshad.
VISAGE: Video Synthesis using Action Graphs for Surgery.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts the future frames given only a single initial frame, and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures, which enables various applications in SDS.

MCML Authors
Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1069]
Q. Zhang, Y. Wang, J. Cui, X. Pan, Q. Lei, S. Jegelka and Y. Wang.
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness.
Preprint at arXiv (Oct. 2024). arXiv.
Abstract

Deep learning models often suffer from a lack of interpretability due to polysemanticity, where individual neurons are activated by multiple unrelated semantics, resulting in unclear attributions of model behavior. Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability but are commonly believed to compromise accuracy. In this work, we challenge the prevailing belief of the accuracy-interpretability tradeoff, showing that monosemantic features not only enhance interpretability but also bring concrete gains in model performance. Across multiple robust learning scenarios, including input and label noise, few-shot learning, and out-of-domain generalization, our results show that models leveraging monosemantic features significantly outperform those relying on polysemantic features. Furthermore, we provide empirical and theoretical understanding of the robustness gains of feature monosemanticity. Our preliminary analysis suggests that monosemanticity, by promoting better separation of feature representations, leads to more robust decision boundaries. This diverse evidence highlights the generality of monosemanticity in improving model robustness. As a first step in this new direction, we embark on exploring the learning benefits of monosemanticity beyond interpretability, supporting the long-standing hypothesis of linking interpretability and robustness.

MCML Authors
Stefanie Jegelka

Prof. Dr.

Foundations of Deep Neural Networks


[1068]
P. Scholl, M. Iskandar, S. Wolf, J. Lee, A. Bacho, A. Dietrich, A. Albu-Schäffer and G. Kutyniok.
Learning-based adaption of robotic friction models.
Robotics and Computer-Integrated Manufacturing 89 (Oct. 2024). DOI.
MCML Authors
Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1067]
C. Kern, M. P. Kim and A. Zhou.
Multi-Accurate CATE is Robust to Unknown Covariate Shifts.
Transactions on Machine Learning Research (Oct. 2024). URL.
Abstract

Estimating heterogeneous treatment effects is important to tailor treatments to those individuals who would most likely benefit. However, conditional average treatment effect predictors may often be trained on one population but possibly deployed on different, possibly unknown populations. We use methodology for learning multi-accurate predictors to post-process CATE T-learners (differenced regressions) to become robust to unknown covariate shifts at the time of deployment. The method works in general for pseudo-outcome regression, such as the DR-learner. We show how this approach can combine (large) confounded observational and (smaller) randomized datasets by learning a confounded predictor from the observational dataset, and auditing for multi-accuracy on the randomized controlled trial. We show improvements in bias and mean squared error in simulations with increasingly larger covariate shift, and on a semi-synthetic case study of a parallel large observational study and smaller randomized controlled experiment. Overall, we establish a connection between methods developed for multi-distribution learning and achieve appealing desiderata (e.g. external validity) in causal inference and machine learning.

MCML Authors
Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab


[1066]
J. W. Grootjen, P. Thallhammer and T. Kosch.
Your Eyes on Speed: Using Pupil Dilation to Adaptively Select Speed-Reading Parameters in Virtual Reality.
ACM International Conference on Mobile Human-Computer Interaction (MobileHCI 2024). Melbourne, Australia, Sep 30-Oct 03, 2024. DOI. GitHub.
Abstract

Rapid Serial Visual Presentation (RSVP) improves the reading speed for optimizing the user’s information processing capabilities on Virtual Reality (VR) devices. Yet, the user’s RSVP reading performance changes over time while the reading speed remains static. In this paper, we evaluate pupil dilation as a physiological metric to assess the mental workload of readers in real-time. We assess mental workload under different background lighting and RSVP presentation speeds to estimate the optimal color that discriminates the pupil diameter across varying RSVP presentation speeds. We discovered that a gray background provides the best contrast for reading at various presentation speeds. Then, we conducted a second study to evaluate the classification accuracy of mental workload for different presentation speeds. We find that pupil dilation relates to mental workload when reading with RSVP. We discuss how pupil dilation can be used to adapt the RSVP speed in future VR applications to optimize information intake.

MCML Authors
Jesse Grootjen

Human-Centered Ubiquitous Media


[1065]
Y. Weiss, S. Villa, J. W. Grootjen, M. Hoppe, Y. Kale and F. Müller.
Exploring Redirection and Shifting Techniques to Mask Hand Movements from Shoulder-Surfing Attacks during PIN Authentication in Virtual Reality.
ACM International Conference on Mobile Human-Computer Interaction (MobileHCI 2024). Melbourne, Australia, Sep 30-Oct 03, 2024. DOI.
Abstract

The proliferation of mobile Virtual Reality (VR) headsets shifts our interaction with virtual worlds beyond our living rooms into shared spaces. Consequently, we are entrusting more and more personal data to these devices, calling for strong security measures and authentication. However, the standard authentication method of such devices - entering PINs via virtual keyboards - is vulnerable to shoulder-surfing, as movements to enter keys can be monitored by an unnoticed observer. To address this, we evaluated masking techniques to obscure VR users’ input during PIN authentication by diverting their hand movements. Through two experimental studies, we demonstrate that these methods increase users’ security against shoulder-surfing attacks from observers without excessively impacting their experience and performance. With these discoveries, we aim to enhance the security of future VR authentication without disrupting the virtual experience or necessitating additional hardware or training of users.

MCML Authors
Link to Jesse Grootjen

Jesse Grootjen

Human-Centered Ubiquitous Media


[1064]
M. Windl, M. Schlegel and S. Mayer.
Exploring Users’ Mental Models and Privacy Concerns During Interconnected Interactions.
ACM International Conference on Mobile Human-Computer Interaction (MobileHCI 2024). Melbourne, Australia, Sep 30-Oct 03, 2024. DOI.
Abstract

Users frequently use their smartphones in combination with other smart devices, for example, when streaming music to smart speakers or controlling smart appliances. During these interconnected interactions, user data gets handled and processed by several entities that employ different data protection practices or are subject to different regulations. Users need to understand these processes to inform themselves in the right places and make informed privacy decisions. We conducted an online survey (N=120) to investigate whether users have accurate mental models about interconnected interactions. We found that users consider scenarios more privacy-concerning when multiple devices are involved. Yet, we also found that most users do not fully comprehend the privacy-relevant processes in interconnected interactions. Our results show that current privacy information methods are insufficient and that users must be better educated to make informed privacy decisions. Finally, we advocate for restricting data processing to the app layer and better encryption to reduce users’ data protection responsibilities.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[1063]
A. Christensen, N. Mojab, K. Patel, K. Ahuja, Z. Akata, O. Winther, O. Gonzalez-Franco and A. Colaco.
Geometry Fidelity for Spherical Images.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. DOI.
Abstract

Spherical or omni-directional images offer an immersive visual format appealing to a wide range of computer vision applications. However, geometric properties of spherical images pose a major challenge for models and metrics designed for ordinary 2D images. Here, we show that direct application of Fréchet Inception Distance (FID) is insufficient for quantifying geometric fidelity in spherical images. We introduce two quantitative metrics accounting for geometric constraints, namely Omnidirectional FID (OmniFID) and Discontinuity Score (DS). OmniFID is an extension of FID tailored to additionally capture field-of-view requirements of the spherical format by leveraging cubemap projections. DS is a kernel-based seam alignment score of continuity across borders of 2D representations of spherical images. In experiments, OmniFID and DS quantify geometry fidelity issues that are undetected by FID.
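The paper’s Discontinuity Score is kernel-based; a much-simplified sketch of the underlying idea, measuring mismatch across the wrap-around seam of an equirectangular image, could look like this (illustrative only, not the authors’ metric):

```python
import numpy as np

def seam_discontinuity(equirect):
    """Toy seam score for an equirectangular image of shape (H, W, C):
    mean absolute difference between the first and last pixel columns.
    A spherical image that wraps around seamlessly scores near zero."""
    left = equirect[:, 0, ...].astype(np.float64)
    right = equirect[:, -1, ...].astype(np.float64)
    return float(np.abs(left - right).mean())
```

A generator that treats the spherical image as an ordinary 2D picture tends to produce a visible seam, which a score of this kind detects while FID does not.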

MCML Authors
Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1062]
V. T. Hu, S. A. Baumann, M. Gui, O. Grebenkova, P. Ma, J. Fischer and B. Ommer.
ZigMa: A DiT-style Zigzag Mamba Diffusion Model.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. DOI.
Abstract

Diffusion models have long been hampered by scalability and quadratic-complexity issues, especially within transformer-based architectures. In this study, we leverage the long-sequence modeling capability of the State-Space Model Mamba to extend its applicability to visual data generation. First, we identify a critical oversight in most current Mamba-based vision methods: the lack of consideration for spatial continuity in Mamba’s scan scheme. Second, building on this insight, we introduce Zigzag Mamba, a simple, plug-and-play, DiT-style solution with minimal parameter overhead that outperforms Mamba-based baselines and demonstrates improved speed and memory utilization compared to transformer-based baselines; moreover, its heterogeneous layerwise scan incurs no additional memory or speed burden as more scan paths are considered. Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets such as FacesHQ, UCF101, MultiModal-CelebA-HQ, and MS COCO.

MCML Authors
Link to Olga Grebenkova

Olga Grebenkova

Machine Vision & Learning

Link to Pingchuan Ma

Pingchuan Ma

Machine Vision & Learning

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1061]
W. Huang, Y. Shi, Z. Xiong and X. Zhu.
Representation Enhancement-Stabilization: Reducing Bias-Variance of Domain Generalization.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. DOI. GitHub.
Abstract

Domain Generalization (DG) focuses on enhancing the generalization of deep learning models trained on multiple source domains to adapt to unseen target domains. This paper explores DG through the lens of bias-variance decomposition, uncovering that test errors in DG predominantly arise from cross-domain bias and variance. Inspired by this insight, we introduce a Representation Enhancement-Stabilization (RES) framework, comprising a Representation Enhancement (RE) module and a Representation Stabilization (RS) module. In RE, a novel set of feature frequency augmentation techniques is used to progressively reduce cross-domain bias during feature extraction. Furthermore, in RS, a novel Mutual Exponential Moving Average (MEMA) strategy is designed to stabilize model optimization for diminishing cross-domain variance during training. Collectively, the whole RES method can significantly enhance model generalization. We evaluate RES on five benchmark datasets and the results show that it outperforms multiple advanced DG methods.
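The bias-variance decomposition the paper builds on can be made concrete for squared error with a toy ensemble of domain-specific predictors (a generic sketch of the standard decomposition, not the RES implementation):

```python
import numpy as np

def bias_variance_decomposition(predictions, y_true):
    """Decompose the mean squared error of an ensemble of predictors
    (e.g. models trained on different source domains) into squared bias
    and variance. `predictions` has shape (n_models, n_samples)."""
    mean_pred = predictions.mean(axis=0)
    bias_sq = float(((mean_pred - y_true) ** 2).mean())
    variance = float(predictions.var(axis=0).mean())
    return bias_sq, variance
```

In the noise-free case the mean squared error equals the sum of the two terms, so reducing cross-domain bias (RE) and cross-domain variance (RS) directly reduces test error.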

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1060]
Y. Wang, C. M. Albrecht, N. A. A. Braham, C. Liu, Z. Xiong and X. Zhu.
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. DOI. GitHub.
Abstract

The increasing availability of multi-sensor data sparks wide interest in multimodal self-supervised learning. However, most existing approaches learn only common representations across modalities while ignoring intra-modal training and modality-unique representations. We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning. By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities. We evaluate DeCUR in three common multimodal scenarios (radar-optical, RGB-elevation, and RGB-depth), and demonstrate its consistent improvement regardless of architectures and for both multimodal and modality-missing settings. With thorough experiments and comprehensive analysis, we hope this work can provide valuable insights and raise more interest in researching the hidden relationships of multimodal representations.
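DeCUR’s multimodal redundancy reduction follows the Barlow-Twins-style idea of pushing a cross-correlation matrix between two embedding batches toward the identity; a minimal sketch of that core loss term (function name and weighting are illustrative, not the authors’ code):

```python
import numpy as np

def redundancy_reduction_loss(z1, z2, off_diag_weight=0.005):
    """Barlow-Twins-style redundancy reduction between two embedding
    batches of shape (N, D): push their cross-correlation matrix toward
    the identity (aligned views, decorrelated dimensions)."""
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    c = z1.T @ z2 / len(z1)  # (D, D) cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return float(on_diag + off_diag_weight * off_diag)
```

DeCUR applies this kind of term both across and within modalities, on embedding blocks decoupled into common and unique parts.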

MCML Authors
Link to Chenying Liu

Chenying Liu

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[1059]
F. Hoppe, C. M. Verdun, H. Laus, S. Endt, M. I. Menzel, F. Krahmer and H. Rauhut.
Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published.
MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Hannah Laus

Hannah Laus

Optimization & Data Analysis

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[1058]
Y. Mansour, X. Zhong, S. Caglar and R. Heckel.
TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published.
MCML Authors
Link to Reinhard Heckel

Reinhard Heckel

Prof. Dr.

Machine Learning


[1057]
J. S. Fischer, M. Gui, P. Ma, N. Stracke, S. A. Baumann and B. Ommer.
FMBoost: Boosting Latent Diffusion with Flow Matching.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
Abstract

Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which is only partially alleviated by latent diffusion. To this end, flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate our FMBoost approach, which introduces flow matching between a frozen diffusion model and a convolutional decoder that enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping the small latent space to a high-dimensional one, producing high-resolution images. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, state-of-the-art high-resolution image synthesis is achieved at 1024×1024 pixels with minimal computational cost. Cascading FMBoost optionally boosts this further to 2048×2048 pixels. Importantly, this approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easily integrable into the various diffusion model frameworks.

MCML Authors
Link to Pingchuan Ma

Pingchuan Ma

Machine Vision & Learning

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1056]
T. Hannan, M. M. Islam, T. Seidl and G. Bertasius.
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Tanveer Hannan

Tanveer Hannan

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1055]
L. Härenstam-Nielsen, L. Sang, A. Saroha, N. Araslanov and D. Cremers.
DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

Neural implicit surfaces can be used to recover accurate 3D geometry from imperfect point clouds. In this work, we show that state-of-the-art techniques work by minimizing an approximation of a one-sided Chamfer distance. This shape metric is not symmetric, as it only ensures that the point cloud is near the surface but not vice versa. As a consequence, existing methods can produce inaccurate reconstructions with spurious surfaces. Although one approach against spurious surfaces has been widely used in the literature, we theoretically and experimentally show that it is equivalent to regularizing the surface area, resulting in over-smoothing. As a more appealing alternative, we propose DiffCD, a novel loss function corresponding to the symmetric Chamfer distance. In contrast to previous work, DiffCD also assures that the surface is near the point cloud, which eliminates spurious surfaces without the need for additional regularization. We experimentally show that DiffCD reliably recovers a high degree of shape detail, substantially outperforming existing work across varying surface complexity and noise levels.
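The distinction between one-sided and symmetric Chamfer distance is easy to make concrete on point sets (a direct sketch of the standard definitions; DiffCD itself optimizes a differentiable approximation over an implicit surface):

```python
import numpy as np

def chamfer_one_sided(a, b):
    """Mean distance from each point in `a` to its nearest neighbour in `b`."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise (|a|, |b|)
    return float(d.min(axis=1).mean())

def chamfer_symmetric(a, b):
    """Symmetric Chamfer distance: additionally penalizes points of `b`
    (e.g. a spurious surface sample) that lie far from `a`."""
    return chamfer_one_sided(a, b) + chamfer_one_sided(b, a)
```

The one-sided metric is blind to extra points in `b`, which is exactly how spurious surfaces go unpenalized; the symmetric version catches them.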

MCML Authors
Link to Abhishek Saroha

Abhishek Saroha

Computer Vision & Artificial Intelligence

Link to Nikita Araslanov

Nikita Araslanov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1054]
J. M. Kim, J. Bader, S. Alaniz, C. Schmid and Z. Akata.
DataDream: Few-shot Guided Dataset Generation.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

While text-to-image diffusion models have been shown to achieve state-of-the-art results in image synthesis, they have yet to prove their effectiveness in downstream applications. Previous work has proposed to generate data for image classifier training given limited real data access. However, these methods struggle to generate in-distribution images or depict fine-grained features, thereby hindering the generalization of classification models trained on synthetic datasets. We propose DataDream, a framework for synthesizing classification datasets that more faithfully represents the real data distribution when guided by few-shot examples of the target classes. DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model. We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets. We demonstrate the efficacy of DataDream through extensive experiments, surpassing state-of-the-art classification accuracy with few-shot data across 7 out of 10 datasets, while being competitive on the other 3. Additionally, we provide insights into the impact of various factors, such as the number of real-shot and generated images as well as the fine-tuning compute on model performance.

MCML Authors
Link to Jae Myung Kim

Jae Myung Kim

Interpretable and Reliable Machine Learning

Link to Jessica Bader

Jessica Bader

Interpretable and Reliable Machine Learning

Link to Stephan Alaniz

Stephan Alaniz

Dr.

Interpretable and Reliable Machine Learning

Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[1053]
D. Kotovenko, O. Grebenkova, N. Sarafianos, A. Paliwal, P. Ma, O. Poursaeed, S. Mohan, Y. Fan, Y. Li, R. Ranjan and B. Ommer.
WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Olga Grebenkova

Olga Grebenkova

Machine Vision & Learning

Link to Pingchuan Ma

Pingchuan Ma

Machine Vision & Learning

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1052]
B. Liao, Z. Zhao, L. Chen, H. Li, D. Cremers and P. Liu.
GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Haoang Li

Haoang Li

Dr.

* Former member

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1051]
M. Mahajan, F. Hofherr and D. Cremers.
MeshFeat: Multi-Resolution Features for Neural Fields on Meshes.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Florian Hofherr

Florian Hofherr

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1050]
P. Müller, G. Kaissis and D. Rückert.
ChEX: Interactive Localization and Region Description in Chest X-rays.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv. GitHub.
Abstract

Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX’s interactive capabilities.

MCML Authors
Link to Georgios Kaissis

Georgios Kaissis

Dr.

Associate

Privacy-Preserving and Trustworthy AI

Link to Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[1049]
N. Stracke, S. A. Baumann, J. Susskind, M. A. Bautista and B. Ommer.
CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control and Altering of T2I Models.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[1048]
S. Weber, J. H. Hong and D. Cremers.
Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Simon Weber

Simon Weber

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1047]
L. Yang, L. Hoyer, M. Weber, T. Fischer, D. Dai, L. Leal-Taixé, D. Cremers, M. Pollefeys and L. Van Gool.
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1046]
G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, Sep 29-Oct 04, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Guangyao Zhai

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Benjamin Busam

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[1045]
F. Hoppe, C. M. Verdun, F. Krahmer, M. Menzel and H. Rauhut.
With or Without Replacement? Improving Confidence in Fourier Imaging.
International Workshop on the Theory of Computational Sensing and its Applications to Radar, Multimodal Sensing and Imaging (CoSeRa 2024). Santiago de Compostela, Spain, Sep 18-20, 2024. DOI.
Abstract

Over the last few years, debiased estimators have been proposed in order to establish rigorous confidence intervals for high-dimensional problems in machine learning and data science. The core argument is that the error of these estimators with respect to the ground truth can be expressed as a Gaussian variable plus a remainder term that vanishes as long as the dimension of the problem is sufficiently high. Thus, uncertainty quantification (UQ) can be performed exploiting the Gaussian model. Empirically, however, the remainder term cannot be neglected in many realistic situations of moderately-sized dimensions, in particular in certain structured measurement scenarios such as Magnetic Resonance Imaging (MRI). This, in turn, can downgrade the advantage of the UQ methods as compared to non-UQ approaches such as the standard LASSO. In this paper, we present a method to improve the debiased estimator by sampling without replacement. Our approach leverages our recent results on the randomness structure of certain sampling schemes, which show how a transition between sampling with and without replacement can lead to a weighted reconstruction scheme with improved performance for the standard LASSO. In this paper, we illustrate how this reweighted sampling idea can also improve the debiased estimator and, consequently, provide a better method for UQ in Fourier imaging.

MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[1044]
H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
On the Robustness of Global Feature Effect Explanations.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
Abstract

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.
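A partial dependence plot, one of the explanation types whose robustness is bounded here, is computed by fixing one feature to each grid value and averaging the model’s predictions (the standard definition, sketched for a model given as a callable):

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Partial dependence of `model` on one feature: for each grid value,
    overwrite that feature's column for every row and average the predictions."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        pd_values.append(model(X_mod).mean())
    return np.array(pd_values)
```

Because each point of the curve is a mean over perturbed copies of the data, small shifts in the data or model propagate directly into the explanation, which is what the robustness bounds quantify.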

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[1043]
C. Damke and E. Hüllermeier.
CUQ-GNN: Committee-Based Graph Uncertainty Quantification Using Posterior Networks.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
Abstract

In this work, we study the influence of domain-specific characteristics when defining a meaningful notion of predictive uncertainty on graph data. Previously, the so-called Graph Posterior Network (GPN) model has been proposed to quantify uncertainty in node classification tasks. Given a graph, it uses Normalizing Flows (NFs) to estimate class densities for each node independently and converts those densities into Dirichlet pseudo-counts, which are then dispersed through the graph using the personalized PageRank (PPR) algorithm. The architecture of GPNs is motivated by a set of three axioms on the properties of its uncertainty estimates. We show that those axioms are not always satisfied in practice and therefore propose the family of Committee-based Uncertainty Quantification Graph Neural Networks (CUQ-GNNs), which combine standard Graph Neural Networks (GNNs) with the NF-based uncertainty estimation of Posterior Networks (PostNets). This approach adapts more flexibly to domain-specific demands on the properties of uncertainty estimates. We compare CUQ-GNN against GPN and other uncertainty quantification approaches on common node classification benchmarks and show that it is effective at producing useful uncertainty estimates.

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1042]
R. Fischer, M. Wever, S. Buschjäger and T. Liebig.
MetaQuRe: Meta-learning from Model Quality and Resource Consumption.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
Abstract

Automated machine learning (AutoML) allows for selecting, parametrizing, and composing learning algorithms for a given data set. While resource awareness plays a pivotal role in neural architecture search, it is less pronounced in classical AutoML approaches. In fact, they generally focus on only maximizing predictive quality and disregard the importance of finding resource-efficient solutions. To push resource awareness further, our work explicitly explores how measures such as running time or energy consumption can be better considered in AutoML. Firstly, we propose a novel method for algorithm selection that balances multiple performance aspects (including resource demand) as prioritized by the user with the help of compositional meta-learning. Secondly, to foster research on green meta-learning and AutoML, we release the MetaQuRe data set, which contains information on predictive (Qu)ality and (Re)source consumption of models evaluated across hundreds of data sets and four execution environments. We use this data to put our methodology into practice and conduct an in-depth analysis of how our approach and data set can help in making AutoML more resource-aware, which represents our third contribution. Lastly, we publish MetaQuRe alongside an extensive code base, allowing for reproducing all results, expanding our data with results from custom environments, and exploring MetaQuRe interactively. In short, our work demonstrates both the importance as well as benefits of rethinking AutoML and meta-learning in a resource-aware way, thus paving the path for making future ML solutions more sustainable.

MCML Authors

[1041]
S. Gilhuber, A. Beer, Y. Ma and T. Seidl.
FALCUN: A Simple and Efficient Deep Active Learning Strategy.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
Abstract

We propose FALCUN, a novel deep batch active learning method that is label- and time-efficient. Our proposed acquisition uses a natural, self-adjusting balance of uncertainty and diversity: It slowly transitions from emphasizing uncertain instances at the decision boundary to emphasizing batch diversity. In contrast, established deep active learning methods often have a fixed weighting of uncertainty and diversity, limiting their effectiveness over diverse data sets exhibiting different characteristics. Moreover, to increase diversity, most methods demand intensive search through a deep neural network’s high-dimensional latent embedding space. This leads to high acquisition times when experts are idle while waiting for the next batch for annotation. We overcome this structural problem by exclusively operating on the low-dimensional probability space, yielding much faster acquisition times without sacrificing label efficiency. In extensive experiments, we show FALCUN’s suitability for diverse use cases, including medical images and tabular data. Compared to state-of-the-art methods like BADGE, CLUE, and AlfaMix, FALCUN consistently excels in quality and speed: while FALCUN is among the fastest methods, it has the highest average label efficiency.
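A rough sketch of the kind of acquisition described here, operating purely on the low-dimensional probability space and trading off uncertainty against diversity, might look as follows (the fixed `diversity_weight` is a hypothetical simplification; FALCUN’s balance is self-adjusting):

```python
import numpy as np

def select_batch(probs, batch_size, diversity_weight=0.5):
    """Greedy acquisition on softmax outputs `probs` of shape (N, C).

    Uncertainty is 1 minus the max class probability; diversity is the
    distance in probability space to the nearest already-selected point.
    """
    uncertainty = 1.0 - probs.max(axis=1)
    selected = [int(np.argmax(uncertainty))]  # seed with the most uncertain point
    while len(selected) < batch_size:
        # distance of every point to its nearest selected neighbour
        dists = np.linalg.norm(
            probs[:, None, :] - probs[selected][None, :, :], axis=-1
        ).min(axis=1)
        score = (1 - diversity_weight) * uncertainty + diversity_weight * dists
        score[selected] = -np.inf  # never pick the same point twice
        selected.append(int(np.argmax(score)))
    return selected
```

Working in the C-dimensional probability space rather than the network’s latent space is what keeps acquisition cheap.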

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1040]
P. Jahn, C. M. M. Frey, A. Beer, C. Leiber and T. Seidl.
Data with Density-Based Clusters: A Generator for Systematic Evaluation of Clustering Algorithms.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
MCML Authors
Link to Philipp Jahn

Philipp Jahn

Database Systems & Data Mining

Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1039]
Y. Liu, E. Nie, S. Feng, Z. Hua, Z. Ding, D. Wang, Y. Zhang and H. Schütze.
A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI. GitHub.
MCML Authors
Link to Yongkang Liu

Yongkang Liu

Statistical NLP and Deep Learning

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1038]
F. Stermann, I. Chalkidis, A. Vahidi, B. Bischl and M. Rezaei.
Attention-Driven Dropout: A Simple Method to Improve Self-supervised Contrastive Sentence Embeddings.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
Abstract

Self-contrastive learning has proven effective for vision and natural language tasks. It aims to learn aligned data representations by encoding similar and dissimilar sentence pairs without human annotation. Therefore, data augmentation plays a crucial role in the learned embedding quality. However, in natural language processing (NLP), creating augmented samples for unsupervised contrastive learning is challenging since random editing may modify the semantic meanings of sentences and thus affect learning good representations. In this paper, we introduce a simple yet effective approach dubbed ADD (Attention-Driven Dropout) to generate better-augmented views of sentences to be used in self-contrastive learning. Given a sentence and a Pre-trained Transformer Language Model (PLM), such as RoBERTa, we use the aggregated attention scores of the PLM to remove the less “informative” tokens from the input. We consider two alternative algorithms based on NaiveAggregation across layers/heads and AttentionRollout [1]. Our approach significantly improves the overall performance of various self-supervised contrastive-based methods, including SimCSE [14], DiffCSE [10], and InfoCSE [33] by facilitating the generation of high-quality positive pairs required by these methods. Through empirical evaluations on multiple Semantic Textual Similarity (STS) and Transfer Learning tasks, we observe enhanced performance across the board.
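The core operation, dropping the least-attended tokens to form an augmented view, can be sketched as follows (a toy illustration assuming one aggregated attention score per token, as in the naive-aggregation variant):

```python
import numpy as np

def attention_dropout(tokens, attn_scores, drop_ratio=0.2):
    """Build an augmented view by removing the least-attended tokens.

    `attn_scores` holds one aggregated attention weight per token,
    e.g. averaged over layers and heads."""
    n_drop = int(len(tokens) * drop_ratio)
    if n_drop == 0:
        return list(tokens)
    keep = np.argsort(attn_scores, kind="stable")[n_drop:]  # most-attended tokens
    keep.sort()  # restore the original token order
    return [tokens[i] for i in keep]
```

Unlike random token deletion, this is biased toward removing tokens the model already deems uninformative, so the augmented view is more likely to preserve sentence semantics.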

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[1037]
A. Vahidi, L. Wimmer, H. A. Gündüz, B. Bischl, E. Hüllermeier and M. Rezaei.
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI.
Abstract

Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members, which is challenging for large, over-parameterized deep neural networks. Moreover, ensemble learning has not yet seen such widespread adoption for unsupervised learning and it remains a challenging endeavor for self-supervised or unsupervised representation learning. Motivated by these challenges, we present a novel self-supervised training regime that leverages an ensemble of independent sub-networks, complemented by a new loss function designed to encourage diversity. Our method efficiently builds a sub-model ensemble with high diversity, leading to well-calibrated estimates of model uncertainty, all achieved with minimal computational overhead compared to traditional deep self-supervised ensembles. To evaluate the effectiveness of our approach, we conducted extensive experiments across various tasks, including in-distribution generalization, out-of-distribution detection, dataset corruption, and semi-supervised settings. The results demonstrate that our method significantly improves prediction reliability. Our approach not only achieves excellent accuracy but also enhances calibration, improving on important baseline performance across a wide range of self-supervised architectures in computer vision, natural language processing, and genomics data.

MCML Authors
Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Hüseyin Anil Gündüz

Hüseyin Anil Gündüz

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[1036]
M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Explaining Change in Models and Data with Global Feature Importance and Effects.
Tutorial-Workshop Explainable AI for Time Series and Data Streams (TempXAI 2024) at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. PDF.
Abstract

In dynamic machine learning environments, where data streams continuously evolve, traditional explanation methods struggle to remain faithful to the underlying model or data distribution. Therefore, this work presents a unified framework for efficiently computing incremental model-agnostic global explanations tailored for time-dependent models. By extending static model-agnostic methods such as Permutation Feature Importance, SAGE, and Partial Dependence Plots into the online learning context, the proposed framework enables the continuous updating of explanations as new data becomes available. These incremental variants ensure that global explanations remain relevant while minimizing computational overhead. The framework also addresses key challenges related to data distribution maintenance and perturbation generation in online learning, offering time and memory efficient solutions like geometric reservoir-based sampling for data replacement.
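The incremental variant of Permutation Feature Importance mentioned above can be sketched as follows. This is a simplified illustration, not the authors' exact estimator: per-feature reservoirs of past values supply the perturbations, and an exponentially decayed running mean replaces the static average:

```python
import random

class IncrementalPFI:
    """Minimal sketch of incremental permutation feature importance
    (illustrative; not the paper's exact estimator).

    For each arriving (x, y), the model's loss on x is compared with its
    loss on a copy of x whose feature j is replaced by a value drawn from
    a reservoir of past observations; the difference is folded into an
    exponentially decayed running mean per feature."""

    def __init__(self, model, loss, n_features, alpha=0.01, reservoir_size=100):
        self.model, self.loss, self.alpha = model, loss, alpha
        self.importance = [0.0] * n_features
        self.reservoirs = [[] for _ in range(n_features)]
        self.size = reservoir_size
        self.seen = 0

    def update(self, x, y):
        base = self.loss(self.model(x), y)
        for j, res in enumerate(self.reservoirs):
            if res:  # perturb feature j with a stored past value
                x_perm = list(x)
                x_perm[j] = random.choice(res)
                delta = self.loss(self.model(x_perm), y) - base
                self.importance[j] += self.alpha * (delta - self.importance[j])
        self._reservoir_add(x)

    def _reservoir_add(self, x):
        # classic reservoir sampling keeps a uniform sample of the stream
        self.seen += 1
        for j, res in enumerate(self.reservoirs):
            if len(res) < self.size:
                res.append(x[j])
            elif random.random() < self.size / self.seen:
                res[random.randrange(self.size)] = x[j]
```

On a stream where only feature 0 drives the target, `importance[0]` grows while `importance[1]` stays at zero.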

MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[1035]
A. Maldonado, C. M. M. Frey, G. M. Tavares, N. Rehwald and T. Seidl.
GEDI: Generating Event Data with Intentional Features for Benchmarking Process Mining.
22nd International Conference on Business Process Management (BPM 2024). Krakow, Poland, Sep 01-06, 2024. To be published. Preprint available. PDF.
MCML Authors
Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining

Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Gabriel Marques Tavares

Gabriel Marques Tavares

Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[1034]
L. Haliburton, J. Leusmann, R. Welsch, S. Ghebremedhin, P. Isaakidis, A. Schmidt and S. Mayer.
Uncovering labeler bias in machine learning annotation tasks.
AI and Ethics (Sep. 2024). DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[1033]
A. Maarouf, S. Feuerriegel and N. Pröllochs.
A fused large language model for predicting startup success.
European Journal of Operational Research (Sep. 2024). In press. DOI.
Abstract

Investors are continuously seeking profitable investment opportunities in startups and, hence, for effective decision-making, need to predict a startup’s probability of success. Nowadays, investors can use not only various fundamental information about a startup (e.g., the age of the startup, the number of founders, and the business sector) but also textual description of a startup’s innovation and business model, which is widely available through online venture capital (VC) platforms such as Crunchbase. To support the decision-making of investors, we develop a machine learning approach with the aim of locating successful startups on VC platforms. Specifically, we develop, train, and evaluate a tailored, fused large language model to predict startup success. Thereby, we assess to what extent self-descriptions on VC platforms are predictive of startup success. Using 20,172 online profiles from Crunchbase, we find that our fused large language model can predict startup success, with textual self-descriptions being responsible for a significant part of the predictive power. Our work provides a decision support tool for investors to find profitable investment opportunities.

MCML Authors
Link to Abdurahman Maarouf

Abdurahman Maarouf

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1032]
M. Milling, S. Liu, A. Triantafyllopoulos, I. Aslan and B. W. Schuller.
Audio Enhancement for Computer Audition—An Iterative Training Paradigm Using Sample Importance.
IEEE Internet of Things Journal 39 (Sep. 2024). DOI.
Abstract

Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front-end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we make use of the sample-wise performance measure as an indication of sample importance. In experiments, we consider four representative applications to evaluate our training paradigm, i.e., ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications are associated with speech and nonspeech tasks concerning semantic and non-semantic features, transient and global information, and the experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios, for a wide range of computer audition tasks in everyday-life noisy environments.
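The idea of weighting samples by their importance to the target task can be sketched in a few lines. The abstract does not specify the exact weighting, so the softmax-over-task-losses scheme and function name below are our own assumption for illustration:

```python
import numpy as np

def importance_weighted_loss(enh_losses, task_losses, temperature=1.0):
    """Sketch of sample-importance weighting for joint enhancement/task
    training (illustrative; the paper's exact weighting may differ).

    Hard samples -- those with a high downstream task loss -- receive a
    larger weight on their enhancement loss via a softmax over the
    per-sample task losses."""
    w = np.exp(np.asarray(task_losses) / temperature)
    w = w / w.sum()                      # normalized importance weights
    return float(np.sum(w * np.asarray(enh_losses)))
```

With equal task losses this reduces to the plain mean; a sample the downstream model struggles on dominates the weighted objective.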

MCML Authors
Link to Manuel Milling

Manuel Milling

Health Informatics

Link to Andreas Triantafyllopoulos

Andreas Triantafyllopoulos

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1031]
W. Jiang, M. Windl, B. Tag, Z. Sarsenbayeva and S. Mayer.
An Immersive and Interactive VR Dataset to Elicit Emotions.
IEEE Transactions on Visualization and Computer Graphics 30.11 (Sep. 2024). DOI.
Abstract

Images and videos are widely used to elicit emotions; however, their visual appeal differs from real-world experiences. With virtual reality becoming more realistic, immersive, and interactive, we envision virtual environments to elicit emotions effectively, rapidly, and with high ecological validity. This work presents the first interactive virtual reality dataset to elicit emotions. We created five interactive virtual environments based on corresponding validated 360° videos and validated their effectiveness with 160 participants. Our results show that our virtual environments successfully elicit targeted emotions. Compared with the existing methods using images or videos, our dataset allows virtual reality researchers and practitioners to integrate their designs effectively with emotion elicitation settings in an immersive and interactive way.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[1030]
H. J. Coyle-Asbil, L. Burk, M. Brandes, B. Brandes, C. Buck, M. N. Wright and L. A. Vallis.
Energy Expenditure Prediction in Preschool Children: A Machine Learning Approach Using Accelerometry and External Validation.
Physiological Measurement 45.9 (Sep. 2024). DOI.
Abstract

Objective. This study aimed to develop convolutional neural network (CNN) models to predict the energy expenditure (EE) of children from raw accelerometer data. Additionally, this study sought to externally validate the CNN models in addition to the linear regression (LM), random forest (RF), and fully connected neural network (FcNN) models published in Steenbock et al (2019 J. Meas. Phys. Behav. 2 94–102). Approach. Included in this study were 41 German children (3.0–6.99 years) for the training and internal validation, who were equipped with GENEActiv, GT3X+, and activPAL accelerometers. The external validation dataset consisted of 39 Canadian children (3.0–5.99 years) who were equipped with OPAL, GT9X, GENEActiv, and GT3X+ accelerometers. EE was recorded simultaneously in both datasets using a portable metabolic unit. The protocols consisted of semi-structured activities ranging from low to high intensities. Root mean square error (RMSE) values were calculated and used to evaluate model performance. Main results. (1) The CNNs outperformed the LM (13.17%–23.81% lower mean RMSE values), FcNN (8.13%–27.27% lower RMSE values) and the RF models (3.59%–18.84% lower RMSE values) on the internal dataset. (2) In contrast, when applied to the external Canadian dataset, the CNN models had consistently higher RMSE values compared to the LM, FcNN, and RF models. Significance. Although CNNs can enhance EE prediction accuracy, their ability to generalize to new datasets and accelerometer brands/models is more limited compared to LM, RF, and FcNN models.
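The evaluation metric and the reported relative improvements follow directly from two small helpers (a sketch with our own function names; the study's pipeline is not reproduced here):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, the evaluation metric used in the study."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def relative_improvement(rmse_baseline, rmse_model):
    """Percent reduction in RMSE of one model over a baseline,
    e.g. 'the CNNs had 13.17%-23.81% lower mean RMSE than the LM'."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline
```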

MCML Authors
Link to Lukas Burk

Lukas Burk

Statistical Learning & Data Science


[1029]
A. Bashardoust, Y. Feng, D. Geissler, S. Feuerriegel and Y. R. Shrestha.
The Effect of Education in Prompt Engineering: Evidence from Journalists.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Large language models (LLMs) are increasingly used in daily work. In this paper, we analyze whether training in prompt engineering can improve the interactions of users with LLMs. For this, we conducted a field experiment where we asked journalists to write short texts before and after training in prompt engineering. We then analyzed the effect of training on three dimensions: (1) the user experience of journalists when interacting with LLMs, (2) the accuracy of the texts (assessed by a domain expert), and (3) the reader perception, such as clarity, engagement, and other text quality dimensions (assessed by non-expert readers). Our results show: (1) Our training improved the perceived expertise of journalists but also decreased the perceived helpfulness of LLM use. (2) The effect on accuracy varied by the difficulty of the task. (3) There is a mixed impact of training on reader perception across different text quality dimensions.

MCML Authors
Link to Dominique Geißler

Dominique Geißler

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1028]
L. Cheng, J. Hu, H. Yan, M. Gladkova, T. Huang, Y.-H. Liu, D. Cremers and H. Li.
Physically-Based Photometric Bundle Adjustment in Non-Lambertian Environments.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Photometric bundle adjustment (PBA) is widely used in estimating the camera pose and 3D geometry by assuming a Lambertian world. However, the assumption of photometric consistency is often violated, since non-diffuse reflection is common in real-world environments, and this photometric inconsistency significantly affects the reliability of existing PBA methods. To solve this problem, we propose a novel physically-based PBA method. Specifically, we introduce physically-based weights accounting for material, illumination, and light path. These weights distinguish pixel pairs with different levels of photometric inconsistency. We also design corresponding models for material estimation based on sequential images and illumination estimation based on point clouds. In addition, we establish the first SLAM-related dataset of non-Lambertian scenes with complete ground truth of illumination and material. Extensive experiments demonstrate that our PBA method outperforms existing approaches in accuracy.
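The weighting idea can be illustrated with a toy photometric objective. This is a sketch only — the paper's weight models are estimated from images and point clouds, whereas here the per-pixel weights are simply given inputs in [0, 1]:

```python
import numpy as np

def weighted_photometric_error(i_ref, i_tgt, w_material, w_illum, w_path):
    """Sketch of a physically-weighted photometric objective (illustrative;
    not the paper's exact formulation). Pixel pairs that are likely
    photometrically inconsistent (non-Lambertian) are down-weighted by
    per-pixel material, illumination, and light-path weights in [0, 1]."""
    w = w_material * w_illum * w_path           # combined confidence
    r = i_ref - i_tgt                           # photometric residuals
    return float(np.sum(w * r * r))
```

A weight of zero on an inconsistent pixel removes its residual from the objective entirely.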

MCML Authors
Link to Mariia Gladkova

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Link to Haoang Li

Haoang Li

Dr.

* Former member


[1027]
M. Fornasier and L. Sun.
A PDE Framework of Consensus-Based Optimization for Objectives with Multiple Global Minimizers.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Consensus-based optimization (CBO) is an agent-based derivative-free method for non-smooth global optimization that has been introduced in 2017, leveraging a surprising interplay between stochastic exploration and the Laplace principle. In addition to its versatility and effectiveness in handling high-dimensional, non-convex, and non-smooth optimization problems, this approach lends itself well to theoretical analysis. Indeed, its dynamics are governed by a degenerate nonlinear Fokker–Planck equation, whose large time behavior explains the convergence of the method. Recent results provide guarantees of convergence under the restrictive assumption of a unique global minimizer for the objective function. In this work, we propose a novel and simple variation of CBO to tackle non-convex optimization problems with multiple global minimizers. Despite the simplicity of this new model, its analysis is particularly challenging because of its nonlinearity and nonlocal nature. We prove the existence of solutions of the corresponding nonlinear Fokker–Planck equation and we show exponential concentration in time to the set of minimizers made of multiple smooth, convex, and compact components. Our proofs require combining several ingredients, such as delicate geometrical arguments, new variants of a quantitative Laplace principle, ad hoc regularizations and approximations, and regularity theory for parabolic equations. Ultimately, this result suggests that the corresponding CBO algorithm, formulated as an Euler-Maruyama discretization of the underlying empirical stochastic process, tends to converge to multiple global minimizers.
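For orientation, the standard (single-minimizer) CBO particle dynamics that the paper's variation builds on can be written as follows; this is the textbook formulation, not the paper's modified model:

```latex
dX_t^i = -\lambda \left( X_t^i - v_\alpha(\widehat{\rho}_t^N) \right) dt
         + \sigma \left| X_t^i - v_\alpha(\widehat{\rho}_t^N) \right| dW_t^i ,
\qquad
v_\alpha(\widehat{\rho}_t^N)
  = \frac{\sum_{i=1}^{N} X_t^i \, e^{-\alpha f(X_t^i)}}
         {\sum_{i=1}^{N} e^{-\alpha f(X_t^i)}} ,
```

where $f$ is the objective, $v_\alpha$ is the weighted consensus point (which, by the Laplace principle, concentrates near the global minimizer as $\alpha \to \infty$), and the Euler–Maruyama discretization of this SDE yields the implementable algorithm.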

MCML Authors
Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis


[1026]
X. Jing, K. Zhou, A. Triantafyllopoulos and B. W. Schuller.
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

While current emotional text-to-speech (TTS) systems can generate highly intelligible emotional speech, achieving fine control over emotion rendering of the output speech still remains a significant challenge. In this paper, we introduce ParaEVITS, a novel emotional TTS framework that leverages the compositionality of natural language to enhance control over emotional rendering. By incorporating a text-audio encoder inspired by ParaCLAP, a contrastive language-audio pretraining (CLAP) model for computational paralinguistics, the diffusion model is trained to generate emotional embeddings based on textual emotional style descriptions. Our framework first trains on reference audio using the audio encoder, then fine-tunes a diffusion model to process textual inputs from ParaCLAP’s text encoder. During inference, speech attributes such as pitch, jitter, and loudness are manipulated using only textual conditioning. Our experiments demonstrate that ParaEVITS effectively controls emotion rendering without compromising speech quality. Speech demos are publicly available.

MCML Authors
Link to Andreas Triantafyllopoulos

Andreas Triantafyllopoulos

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1025]
A. Köksal, M. Thaler, A. Imani, A. Üstün, A. Korhonen and H. Schütze.
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions.
Preprint at arXiv (Sep. 2024). arXiv. GitHub.
Abstract

Instruction tuning enhances large language models (LLMs) by aligning them with human preferences across diverse tasks. Traditional approaches to create instruction tuning datasets face serious challenges for low-resource languages due to their dependence on data annotation. This work introduces a novel method, Multilingual Reverse Instructions (MURI), which generates high-quality instruction tuning datasets for low-resource languages without requiring human annotators or pre-existing multilingual models. Utilizing reverse instructions and a translation pipeline, MURI produces instruction-output pairs from existing human-written texts in low-resource languages. This method ensures cultural relevance and diversity by sourcing texts from different native domains and applying filters to eliminate inappropriate content. Our dataset, MURI-IT, includes more than 2 million instruction-output pairs across 200 languages. Evaluation by native speakers and fine-tuning experiments with mT5 models demonstrate the approach’s effectiveness for both NLU and open-ended generation.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1024]
T. Liu, Z. Lai, G. Zhang, P. Torr, V. Demberg, V. Tresp and J. Gu.
Multimodal Pragmatic Jailbreak on Text-to-image Models.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers text-to-image (T2I) models to generate an image with visual text, where the image and the text, although considered safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate current diffusion-based T2I models under such jailbreaks. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from this type of jailbreak, with rates of unsafe generation ranging from 8% to 74%. In real-world scenarios, various filters, such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and find that, while current classifiers may be effective for single-modality detection, they fail to work against our jailbreak. Our work provides a foundation for further development towards more secure and reliable T2I models.

MCML Authors
Link to Tong Liu

Tong Liu

Database Systems & Data Mining

Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1023]
Y. Liu, M. Wang, A. H. Kargaran, A. Imani, O. Xhelili, H. Ye, C. Ma, F. Yvon and H. Schütze.
How Transliterations Improve Crosslingual Alignment.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives on both original and transliterated data can improve crosslingual alignment. This improvement further leads to better crosslingual transfer performance. However, it remains unclear how and why a better crosslingual alignment is achieved, as this technique only involves transliterations, and does not use any parallel data. This paper attempts to explicitly evaluate the crosslingual alignment and identify the key elements in transliteration-based approaches that contribute to better performance. For this, we train multiple models under varying setups for two pairs of related languages: (1) Polish and Ukrainian and (2) Hindi and Urdu. To assess alignment, we define four types of similarities based on sentence representations. Our experiments show that adding transliterations alone improves the overall similarities, even for random sentence pairs. With the help of auxiliary alignment objectives, especially the contrastive objective, the model learns to distinguish matched from random pairs, leading to better alignments. However, we also show that better alignment does not always yield better downstream performance, suggesting that further research is needed to clarify the connection between alignment and performance.
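One way to probe alignment as described above is to compare the similarity of matched translation pairs against mismatched pairs. The paper defines four similarity types; the gap below is a simplified illustration in that spirit, with our own function name:

```python
import numpy as np

def alignment_gap(src, tgt):
    """Illustrative crosslingual alignment probe (a simplification; the
    paper defines four similarity types): mean cosine similarity of
    matched sentence pairs minus that of mismatched pairs.

    src, tgt: (n, d) arrays of sentence representations, where row i of
    src is the translation of row i of tgt."""
    s = src / np.linalg.norm(src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = s @ t.T                        # (n, n) pairwise cosine similarities
    matched = np.mean(np.diag(sim))      # translation pairs
    n = sim.shape[0]
    mismatched = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(matched - mismatched)
```

A large positive gap indicates the model distinguishes matched from random pairs — the behavior the contrastive objective encourages; note the abstract's caveat that a high score here need not translate into better downstream performance.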

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Amir Hossein Kargaran

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1022]
Y. Liu, H. Ye, C. Ma, M. Wang and H. Schütze.
LangSAMP: Language-Script Aware Multilingual Pretraining.
Preprint at arXiv (Sep. 2024). arXiv. GitHub.
Abstract

Recent multilingual pretrained language models (mPLMs) often avoid using language embeddings – learnable vectors assigned to different languages. These embeddings are discarded for two main reasons: (1) mPLMs are expected to have a single, unified parameter set across all languages, and (2) they need to function seamlessly as universal text encoders without requiring language IDs as input. However, this removal increases the burden on token embeddings to encode all language-specific information, which may hinder the model’s ability to produce more language-neutral representations. To address this challenge, we propose Language-Script Aware Multilingual Pretraining (LangSAMP), a method that incorporates both language and script embeddings to enhance representation learning while maintaining a simple architecture. Specifically, we integrate these embeddings into the output of the transformer blocks before passing the final representations to the language modeling head for prediction. We apply LangSAMP to the continual pretraining of XLM-R on a highly multilingual corpus covering more than 500 languages. The resulting model consistently outperforms the baseline. Extensive analysis further shows that language/script embeddings encode language/script-specific information, which improves the selection of source languages for crosslingual transfer.
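The core architectural move — integrating language and script embeddings into the transformer output before the language-modeling head — can be sketched as a simple addition. This is our simplification of the mechanism; the paper's exact integration may differ:

```python
import numpy as np

def langsamp_head_input(hidden, lang_emb, script_emb):
    """Simplified sketch of the LangSAMP idea: language and script
    embeddings are folded into the transformer-block output before the
    final representations reach the language-modeling head, so token
    embeddings need not encode this information themselves.
    (Additive combination is our assumption for illustration.)

    hidden:     (seq_len, d) output of the last transformer block
    lang_emb:   (d,) learnable embedding of the input's language
    script_emb: (d,) learnable embedding of the input's script
    """
    return hidden + lang_emb + script_emb   # broadcast over positions
```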

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1021]
Y. Ma, A. Li, Q. Khan and D. Cremers.
Enhancing the Performance of Multi-Vehicle Navigation in Unstructured Environments using Hard Sample Mining.
Preprint at arXiv (Sep. 2024). arXiv. GitHub.
Abstract

Contemporary research in autonomous driving has demonstrated tremendous potential in emulating the traits of human driving. However, these approaches primarily cater to areas with well-built road infrastructure and appropriate traffic management systems, so in the absence of traffic signals or in unstructured environments, such self-driving algorithms are expected to fail. This paper proposes a strategy for autonomously navigating multiple vehicles in close proximity to their desired destinations, without traffic rules, in unstructured environments. Graph Neural Networks (GNNs) have demonstrated good utility for this task of multi-vehicle control. Among the different alternatives for training GNNs, supervised methods have proven to be most data-efficient, albeit requiring ground-truth labels. However, these labels may not always be available, particularly in unstructured environments without traffic regulations, and a tedious optimization process may be required to determine them while ensuring that the vehicles reach their desired destinations and do not collide with each other or any obstacles. To expedite the training process, it is therefore essential to reduce the optimization time and select for labeling only those samples that add the most value to the training. In this paper, we propose a warm-start method that first uses a pre-trained model trained on a simpler subset of data. Inference is then done on more complicated scenarios to determine the hard samples wherein the model faces the greatest predicament, measured by the difficulty vehicles encounter in reaching their desired destination without collision. Experimental results demonstrate that mining for hard samples in this manner reduces the requirement for supervised training data tenfold.
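The mining step described above amounts to scoring scenarios with a pre-trained model and keeping the hardest ones for labeling. A minimal sketch, with a caller-supplied difficulty measure standing in for the paper's reach-the-goal-without-collision criterion:

```python
def mine_hard_samples(model, scenarios, difficulty, budget):
    """Sketch of the warm-start hard-sample mining loop (illustrative;
    names are our own). A model pre-trained on simple data is run on more
    complex scenarios; the `budget` scenarios it struggles on most are
    selected for costly ground-truth labeling.

    difficulty(model, scenario) -> float, higher = harder."""
    scored = [(difficulty(model, sc), i) for i, sc in enumerate(scenarios)]
    scored.sort(reverse=True)                 # hardest scenarios first
    return [scenarios[i] for _, i in scored[:budget]]
```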

MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1020]
D. Ostermeier, J. Külz and M. Althoff.
Automatic Geometric Decomposition for Analytical Inverse Kinematics.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Calculating the inverse kinematics (IK) is fundamental for motion planning in robotics. Compared to numerical or learning-based approaches, analytical IK provides higher efficiency and accuracy. However, existing analytical approaches require manual intervention, are ill-conditioned, or rely on time-consuming symbolic manipulation. In this paper, we propose a fast and stable method that enables automatic online derivation and computation of analytical inverse kinematics. Our approach is based on remodeling the kinematic chain of a manipulator to automatically decompose its IK into pre-solved geometric subproblems. We exploit intersecting and parallel joint axes to assign a given manipulator to a certain kinematic class and the corresponding subproblem decomposition. In numerical experiments, we demonstrate that our decomposition is orders of magnitude faster in deriving the IK than existing tools that employ symbolic manipulation. Following this one-time derivation, our method matches and even surpasses baselines, such as IKFast, in terms of speed and accuracy during the online computation of explicit IK solutions. Finally, we provide a C++ toolbox with Python wrappers that, for the first time, enables plug-and-play analytical IK within less than a millisecond.
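To make the notion of a "pre-solved geometric subproblem" concrete, here is the classic analytical IK of a planar two-link arm via the law of cosines — the kind of closed-form building block such decompositions reduce to. This is a textbook example, not the paper's toolbox:

```python
import math

def planar_2r_ik(x, y, l1, l2, elbow_up=True):
    """Analytical IK for a planar two-link arm with link lengths l1, l2
    (a classic geometric subproblem; illustrative, not the paper's code).
    Returns joint angles (q1, q2) reaching end-effector position (x, y),
    or None if the target is out of reach."""
    r2 = x * x + y * y
    c2 = (r2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)   # law of cosines
    if abs(c2) > 1.0:
        return None                                  # target unreachable
    q2 = math.acos(c2)
    if elbow_up:
        q2 = -q2                                     # pick one of two branches
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2),
                                       l1 + l2 * math.cos(q2))
    return q1, q2
```

Because the solution is closed-form, it evaluates in microseconds — the property that makes fully analytical IK attractive over numerical solvers.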

MCML Authors
Link to Jonathan Külz

Jonathan Külz

Cyber Physical Systems

Link to Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[1019]
A. Ranne, L. Kuang, Y. Velikova, N. Navab and F. Baena.
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

In minimally invasive endovascular procedures, contrast-enhanced angiography remains the most robust imaging technique. However, it is at the expense of the patient and clinician’s health due to prolonged radiation exposure. As an alternative, interventional ultrasound has notable benefits such as being radiation-free, fast to deploy, and having a small footprint in the operating room. Yet, ultrasound is hard to interpret, and highly prone to artifacts and noise. Additionally, interventional radiologists must undergo extensive training before they become qualified to diagnose and treat patients effectively, leading to a shortage of staff, and a lack of open-source datasets. In this work, we seek to address both problems by introducing a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images, without demanding any labeled data. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism, and is capable of learning feature changes across time and space. To facilitate training, we used synthetic ultrasound data based on physics-driven catheter insertion simulations, and translated the data into a unique CT-Ultrasound common domain, CACTUSS, to improve the segmentation performance. We generated ground truth segmentation masks by computing the optical flow between adjacent frames using FlowNet2, and performed thresholding to obtain a binary map estimate. Finally, we validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms, thus demonstrating its potential for applications to clinical data in the future.

MCML Authors
Link to Yordanka Velikova

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1018]
H. Schulz-Kümpel, S. Fischer, T. Nagler, A.-L. Boulesteix, B. Bischl and R. Hornung.
Constructing Confidence Intervals for 'the' Generalization Error – a Comprehensive Benchmark Study.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

When assessing the quality of prediction models in machine learning, confidence intervals (CIs) for the generalization error, which measures predictive performance, are a crucial tool. Luckily, there exist many methods for computing such CIs and new promising approaches are continuously being proposed. Typically, these methods combine various resampling procedures, most popular among them cross-validation and bootstrapping, with different variance estimation techniques. Unfortunately, however, there is currently no consensus on when any of these combinations may be most reliably employed and how they generally compare. In this work, we conduct the first large-scale study comparing CIs for the generalization error - empirically evaluating 13 different methods on a total of 18 tabular regression and classification problems, using four different inducers and a total of eight loss functions. We give an overview of the methodological foundations and inherent challenges of constructing CIs for the generalization error and provide a concise review of all 13 methods in a unified framework. Finally, the CI methods are evaluated in terms of their relative coverage frequency, width, and runtime. Based on these findings, we are able to identify a subset of methods that we would recommend. We also publish the datasets as a benchmarking suite on OpenML and our code on GitHub to serve as a basis for further studies.
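As a concrete example of the kind of method being benchmarked, here is the naive cross-validation interval — one of the simplest candidates (a sketch: it uses a normal quantile in place of a t-quantile, and such intervals are known to undercover because fold estimates are correlated):

```python
import math
import statistics

def cv_interval(fold_losses, alpha=0.05):
    """Naive cross-validation confidence interval for the generalization
    error (illustrative sketch, not a recommended method from the study).

    fold_losses: per-fold mean losses from k-fold cross-validation."""
    k = len(fold_losses)
    mean = statistics.fmean(fold_losses)
    se = statistics.stdev(fold_losses) / math.sqrt(k)   # naive standard error
    # normal quantile as a rough stand-in for the t-quantile in this sketch
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * se, mean + z * se
```

The benchmark's coverage-frequency criterion asks how often such an interval actually contains the true generalization error.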

MCML Authors
Link to Hannah Schulz-Kümpel

Hannah Schulz-Kümpel

Biometry in Molecular Medicine

Link to Sebastian Fischer

Sebastian Fischer

Statistical Learning & Data Science

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[1017]
A. Stephan, D. Zhu, M. Aßenmacher, X. Shen and B. Roth.
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks.
Preprint at arXiv (Sep. 2024). arXiv.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1016]
L. von der Heyde, A.-C. Haensch and A. Wenz.
United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections.
Preprint at arXiv (Sep. 2024). arXiv.
MCML Authors
Link to Leah von der Heyde

Leah von der Heyde

Social Data Science and AI Lab


[1015]
M. Weber, L. Yu, Q. Yu, X. Deng, X. Shen, D. Cremers and L.-C. Chen.
MaskBit: Embedding-free Image Generation via Bit Tokens.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions: Firstly, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Secondly, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed details. The second contribution demonstrates that embedding-free image generation using bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of a mere 305M parameters.
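
The "bit token" idea — representing each quantized code as a short binary vector rather than an index into a learned embedding table — can be illustrated with a minimal index-to-bits mapping. The helper names are ours, and the actual binary quantizer in the paper is learned; this toy shows only the representation itself:

```python
def to_bits(index, k):
    """Represent a quantizer code index as k binary 'bit tokens' (MSB first)."""
    assert 0 <= index < 2 ** k
    return [(index >> (k - 1 - i)) & 1 for i in range(k)]

def from_bits(bits):
    """Recover the integer code index from its bit-token representation."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

bits = to_bits(11, k=4)  # the code index 11 as four bit tokens
```

With k bits the codebook has 2**k entries, and the bits themselves carry the semantics, so the generator can operate on them directly without an embedding lookup.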

MCML Authors
Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[1014]
Ç. Yapar, R. Levie, G. Kutyniok and G. Caire.
Dataset of Pathloss and ToA Radio Maps With Localization Application.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

In this article, we present a collection of radio map datasets in dense urban settings, which we generated and made publicly available. The datasets include simulated pathloss/received signal strength (RSS) and time of arrival (ToA) radio maps over a large collection of realistic dense urban settings in real city maps. The two main applications of the presented dataset are 1) learning methods that predict the pathloss from input city maps (namely, deep learning-based simulations) and 2) wireless localization. The fact that the RSS and ToA maps are computed by the same simulations over the same city maps allows for a fair comparison of RSS- and ToA-based localization methods.
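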

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[1013]
I. Ziegler, A. Köksal, D. Elliott and H. Schütze.
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation.
Preprint at arXiv (Sep. 2024). arXiv.
Abstract

Building high-quality datasets for specialized tasks is a time-consuming and resource-intensive process that often requires specialized domain knowledge. We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets, given a small number of user-written few-shots that demonstrate the task to be performed. Given the few-shot examples, we use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents. Lastly, instruction-tuned large language models (LLMs) augment the retrieved documents into custom-formatted task samples, which then can be used for fine-tuning. We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks: biology question-answering (QA), medicine QA and commonsense QA as well as summarization. Our experiments show that CRAFT-based models outperform or achieve comparable performance to general LLMs for QA tasks, while CRAFT-based summarization models outperform models trained on human-curated data by 46 preference points.
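
The retrieval step of such a pipeline can be sketched with a toy bag-of-words similarity search. The function names and the scoring scheme are ours; CRAFT itself retrieves from large-scale web-crawled corpora with far stronger similarity models:

```python
import math
from collections import Counter

def tokens(text):
    """Crude whitespace tokenizer with light punctuation stripping."""
    return [t.strip("?.,!").lower() for t in text.split()]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(few_shots, corpus, k=2):
    """Rank corpus documents by similarity to the few-shot task examples."""
    query = Counter(t for doc in few_shots for t in tokens(doc))
    ranked = sorted(corpus, key=lambda d: cosine(query, Counter(tokens(d))),
                    reverse=True)
    return ranked[:k]

corpus = [
    "mitochondria produce energy in the cell",
    "the stock market fell sharply today",
    "dna replication occurs before cell division",
]
top = retrieve(["What organelle produces energy?", "How does a cell divide?"], corpus)
```

The retrieved human-written documents would then be handed to an instruction-tuned LLM to be rewritten into task-formatted training samples.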

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1012]
D. Tschernutter, M. Kraus and S. Feuerriegel.
A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions.
Transactions on Machine Learning Research (Sep. 2024). URL.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1011]
A. Triantafyllopoulos, A. Gebhard, M. Milling, S. Rampp and B. W. Schuller.
An Automatic Analysis of Ultrasound Vocalisations for the Prediction of Interaction Context in Captive Egyptian Fruit Bats.
32nd European Signal Processing Conference (EUSIPCO 2024). Lyon, France, Aug 26-30, 2024. URL.
MCML Authors
Link to Andreas Triantafyllopoulos

Andreas Triantafyllopoulos

Health Informatics

Link to Alexander Gebhard

Alexander Gebhard

Health Informatics

Link to Manuel Milling

Manuel Milling

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[1010]
T. Decker, A. Koebler, M. Lebacher, I. Thon, V. Tresp and F. Buettner.
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance.
30th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2024). Barcelona, Spain, Aug 25-29, 2024. DOI.
Abstract

Monitoring and maintaining machine learning models are among the most critical challenges in translating recent advances in the field into real-world applications. However, current monitoring methods lack the capability to provide actionable insights that answer why the performance of a particular model really degraded. In this work, we propose a novel approach to explain the behavior of a black-box model under feature shifts by attributing an estimated performance change to interpretable input characteristics. We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation (XPE). We analyze the underlying assumptions and demonstrate the superiority of our approach over several baselines on different data sets across various data modalities such as images, audio, and tabular data. We also indicate how the generated results can lead to valuable insights, enabling explanatory model monitoring by revealing potential root causes for model deterioration and guiding toward actionable countermeasures.

MCML Authors
Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[1009]
M. Kuzmanovic, D. Frauen, T. Hatt and S. Feuerriegel.
Causal Machine Learning for Cost-Effective Allocation of Development Aid.
30th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2024). Barcelona, Spain, Aug 25-29, 2024. DOI.
MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[1008]
S. Urchs, V. Thurner, M. Aßenmacher, C. Heumann and S. Thiemichen.
Detecting Gender Discrimination on Actor Level Using Linguistic Discourse Analysis.
5th Workshop on Gender Bias in Natural Language Processing (GeBNLP 2024). Bangkok, Thailand, Aug 16, 2024. URL.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[1007]
J. Pavlopoulos, V. Kougia, E. Garces Arias, P. Platanou, S. Shabalin, K. Liagkou, E. Papadatos, H. Essler, J.-B. Camps and F. Fischer.
Challenging Error Correction in Recognised Byzantine Greek.
1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Automatic correction of errors in Handwritten Text Recognition (HTR) output poses persistent challenges yet to be fully resolved. In this study, we introduce a shared task aimed at addressing this challenge, which attracted 271 submissions, yielding only a handful of promising approaches. This paper presents the datasets, the most effective methods, and an experimental analysis in error-correcting HTRed manuscripts and papyri in Byzantine Greek, the language that followed Classical and preceded Modern Greek. By using recognised and transcribed data from seven centuries, the two best-performing methods are compared, one based on a neural encoder-decoder architecture and the other based on engineered linguistic rules. We show that the recognition error rate can be reduced by both, up to 2.5 points at the level of characters and up to 15 at the level of words, while also elucidating their respective strengths and weaknesses.

MCML Authors
Link to Esteban Garces Arias

Esteban Garces Arias

Statistical Learning & Data Science


[1006]
A. Dimmelmeier, H. Doll, M. Schierholz, E. Kormanyos, M. Fehr, B. Ma, J. Beck, A. Fraser and F. Kreuter.
Informing climate risk analysis using textual information - A research agenda.
1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

We present a research agenda focused on efficiently extracting, assuring quality, and consolidating textual company sustainability information to address urgent climate change decision-making needs. Starting from the goal to create integrated FAIR (Findable, Accessible, Interoperable, Reusable) climate-related data, we identify research needs pertaining to the technical aspects of information extraction as well as to the design of the integrated sustainability datasets that we seek to compile. Regarding extraction, we leverage technological advancements, particularly in large language models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines, to unlock the underutilized potential of unstructured textual information contained in corporate sustainability reports. In applying these techniques, we review key challenges, which include the retrieval and extraction of CO2 emission values from PDF documents, especially from unstructured tables and graphs therein, and the validation of automatically extracted data through comparisons with human-annotated values. We also review how existing use cases and practices in climate risk analytics relate to choices of what textual information should be extracted and how it could be linked to existing structured data.

MCML Authors
Link to Malte Schierholz

Malte Schierholz

Dr.

Social Data Science and AI Lab

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Jacob Beck

Jacob Beck

Social Data Science and AI Lab

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[1005]
S. Zhou, S. Peng and B. Plank.
CLIMATELI: Evaluating Entity Linking on Climate Change Data.
1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights into CC. We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia. Using CLIMATELI (CLIMATe Entity LInking), we evaluate existing entity linking (EL) systems on the CC topic across various genres and propose automated filtering methods for CC entities. We find that the performance of EL models notably lags behind humans at both token and entity levels. Testing within the scope of retaining or excluding non-nominal and/or non-CC entities particularly impacts the models’ performances.

MCML Authors
Link to Shijia Zhou

Shijia Zhou

Artificial Intelligence and Computational Linguistics

Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1004]
A. Yüksel, A. Köksal, L. K. Senel, A. Korhonen and H. Schütze.
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish.
1st Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. Invited talk. arXiv. GitHub.
Abstract

Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU, to evaluate LLMs’ understanding of the Turkish language. TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula. These questions are written by curriculum experts, suitable for the high-school curricula in Turkey, covering subjects ranging from natural sciences and math questions to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, MT5), closed-source (GPT 4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. We provide an extensive evaluation, including zero-shot and few-shot evaluation of LLMs, chain-of-thought reasoning, and question difficulty analysis along with model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to provide insights for future LLMs for the Turkish language.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Lütfi Kerem Şenel

Lütfi Kerem Şenel

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1003]
M. Windl, J. Leusmann, A. Schmidt, S. S. Feger and S. Mayer.
Privacy Communication Patterns for Domestic Robots.
20th Symposium on Usable Privacy and Security (SOUPS 2024). Philadelphia, PA, USA, Aug 11-13, 2024. URL.
Abstract

Future domestic robots will become integral parts of our homes. They will have various sensors that continuously collect data and varying locomotion and interaction capabilities, enabling them to access all rooms and physically manipulate the environment. This raises many privacy concerns. We investigate how such concerns can be mitigated, using all possibilities enabled by the robot’s novel locomotion and interaction abilities. First, we found that privacy concerns increase with advanced locomotion and interaction capabilities through an online survey (N=90). Second, we conducted three focus groups (N=22) to construct 86 patterns to communicate the states of microphones, cameras, and the internet connectivity of domestic robots. Lastly, we conducted a large-scale online survey (N=1720) to understand which patterns perform best regarding trust, privacy, understandability, notification qualities, and user preference. Our final set of communication patterns will guide developers and researchers to ensure a privacy-preserving future with domestic robots.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[1002]
V. Blaschke, C. Purschke, H. Schütze and B. Plank.
What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations’ needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German – a group of varieties that is heterogeneous in terms of prestige and standardization. We survey speakers of these varieties (N=327) and present their opinions on hypothetical language technologies for their dialects. Although attitudes vary among subgroups of our respondents, we find that respondents are especially in favour of potential NLP tools that work with dialectal input (especially audio input) such as virtual assistants, and less so for applications that produce dialectal output such as machine translation or spellcheckers.

MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[1001]
A. H. Kargaran, F. Yvon and H. Schütze.
MaskLID: Code-Switching Language Identification through Iterative Masking.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI. GitHub.
Abstract

We present MaskLID, a simple, yet effective, code-switching (CS) language identification (LID) method. MaskLID does not require any training and is designed to complement current high-performance sentence-level LIDs. Sentence-level LIDs are classifiers trained on monolingual texts to provide single labels, typically using a softmax layer to turn scores into probabilities. However, in cases where a sentence is composed in both L1 and L2 languages, the LID classifier often only returns the dominant label L1. To address this limitation, MaskLID employs a strategy to mask text features associated with L1, allowing the LID to classify the text as L2 in the next round. This method uses the LID itself to identify the features that require masking and does not rely on any external resource. In this work, we explore the use of MaskLID for two open-source LIDs (GlotLID and OpenLID), that are both based on the FastText architecture.
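
The core loop can be sketched with a toy lexicon-based LID standing in for the FastText-based classifiers. The lexicons and function names below are illustrative stand-ins, not part of MaskLID:

```python
from collections import Counter

# toy lexicons standing in for the feature weights of a real sentence-level LID
LEXICON = {
    "eng": {"the", "is", "good", "very"},
    "deu": {"das", "ist", "sehr", "gut"},
}

def lid(tokens):
    """Return (best_language, supporting_tokens) for a token list."""
    support = {lang: [t for t in tokens if t in vocab]
               for lang, vocab in LEXICON.items()}
    best = max(support, key=lambda lang: len(support[lang]))
    return best, support[best]

def masklid(sentence, rounds=2):
    """Classify, mask the winning language's features, classify again --
    the iterative-masking loop at the heart of the method."""
    tokens = sentence.lower().split()
    labels = []
    for _ in range(rounds):
        lang, support = lid(tokens)
        if not support:
            break
        labels.append(lang)
        tokens = [t for t in tokens if t not in support]  # mask L1 features
    return labels

labels = masklid("das ist very good")  # a code-switched German/English toy input
```

After the dominant language's features are masked, the second pass is free to surface the embedded language that the single-label classifier would otherwise hide.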

MCML Authors
Link to Amir Hossein Kargaran

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[1000]
T. Liu, I. Škrjanec and V. Demberg.
Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the 'right reasons'?
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

A wide body of evidence shows that human language processing difficulty is predicted by the information-theoretic measure surprisal, a word’s negative log probability in context. However, it is still unclear how to best estimate these probabilities needed for predicting human processing difficulty – while a long-standing belief held that models with lower perplexity would provide more accurate estimates of word predictability, and therefore lead to better reading time predictions, recent work has shown that for very large models, psycholinguistic predictive power decreases. One reason could be that language models might be more confident of their predictions than humans, because they have had exposure to several magnitudes more data. In this paper, we test what effect temperature-scaling of large language model (LLM) predictions has on surprisal estimates and their predictive power of reading times of English texts. Firstly, we show that calibration of large language models typically improves with model size, i.e. poorer calibration cannot account for poorer fit to reading times. Secondly, we find that temperature-scaling probabilities lead to a systematically better fit to reading times (up to 89% improvement in delta log likelihood), across several reading time corpora. Finally, we show that this improvement in fit is chiefly driven by words that are composed of multiple subword tokens.
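
Temperature-scaling itself is a one-line change to the softmax. A small sketch of surprisal (in bits) under a scaled distribution, with toy logits of our own:

```python
import math

def surprisal(logits, target, temperature=1.0):
    """Surprisal (negative log2-probability) of `target` under a
    temperature-scaled softmax over `logits` (a dict token -> score)."""
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())                      # for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled.values()))
    return -(scaled[target] - log_z) / math.log(2)

logits = {"cat": 4.0, "dog": 2.0, "car": 0.5}
s1 = surprisal(logits, "dog", temperature=1.0)
s2 = surprisal(logits, "dog", temperature=2.0)  # flatter distribution
```

With temperature above 1 the distribution flattens, lowering the surprisal of lower-ranked tokens and raising it for the top token — the calibration knob whose effect on reading-time fit the paper measures.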

MCML Authors
Link to Tong Liu

Tong Liu

Database Systems & Data Mining


[999]
Y. Liu, C. Ma, H. Ye and H. Schütze.
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

The world’s more than 7000 languages are written in at least 293 scripts. Due to various reasons, many closely related languages use different scripts, which poses a difficulty for multilingual pretrained language models (mPLMs) in learning crosslingual knowledge through lexical overlap. As a consequence, mPLMs are faced with a script barrier: representations from different scripts are located in different subspaces, which can result in crosslingual transfer involving languages of different scripts performing suboptimally. To address this problem, we propose TransliCo, a framework that optimizes the Transliteration Contrastive Modeling (TCM) objective to fine-tune an mPLM by contrasting sentences in its training data and their transliterations in a unified script (in our case Latin), which enhances uniformity in the representation space for different scripts. Using Glot500-m, an mPLM pretrained on over 500 languages, as our source model, we fine-tune it on a small portion (5%) of its training data, and refer to the resulting model as Furina. We show that Furina not only better aligns representations from distinct scripts but also outperforms the original Glot500-m on various zero-shot crosslingual transfer tasks. Additionally, we achieve consistent improvement in a case study on the Indic group where the languages exhibit areal features but use different scripts. We make our code and models publicly available.

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[998]
P. Mondorf and B. Plank.
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that logically follow, given the truth value of the information provided. Recent progress in the domain of large language models (LLMs) has showcased their capability in executing deductive reasoning tasks. Nonetheless, a significant portion of research primarily assesses the accuracy of LLMs in solving such tasks, often overlooking a deeper analysis of their reasoning behavior. In this study, we draw upon principles from cognitive psychology to examine inferential strategies employed by LLMs, through a detailed evaluation of their responses to propositional logic problems. Our findings indicate that LLMs display reasoning patterns akin to those observed in humans, including strategies like supposition following or chain construction. Moreover, our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning, with more advanced models tending to adopt strategies more frequently than less sophisticated ones. Importantly, we assert that a model’s accuracy, that is the correctness of its final conclusion, does not necessarily reflect the validity of its reasoning process. This distinction underscores the necessity for more nuanced evaluation procedures in the field.

MCML Authors
Link to Philipp Mondorf

Philipp Mondorf

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[997]
L. K. Senel, B. Fetahu, D. Yoshida, Z. Chen, G. Castellucci, N. Vedula, J. I. Choi and S. Malmasi.
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Recommender systems are widely used to suggest engaging content, and Large Language Models (LLMs) have given rise to generative recommenders. Such systems can directly generate items, including for open-set tasks like question suggestion. While the world knowledge of LLMs enables good recommendations, improving the generated content through user feedback is challenging as continuously fine-tuning LLMs is prohibitively expensive. We present a training-free approach for optimizing generative recommenders by connecting user feedback loops to LLM-based optimizers. We propose a generative explore-exploit method that can not only exploit generated items with known high engagement, but also actively explore and discover hidden population preferences to improve recommendation quality. We evaluate our approach on question generation in two domains (e-commerce and general knowledge), and model user feedback with Click Through Rate (CTR). Experiments show our LLM-based explore-exploit approach can iteratively improve recommendations, and consistently increase CTR. Ablation analysis shows that generative exploration is key to learning user preferences, avoiding the pitfalls of greedy exploit-only approaches. A human evaluation strongly supports our quantitative findings.

MCML Authors
Link to Lütfi Kerem Şenel

Lütfi Kerem Şenel

Statistical NLP and Deep Learning


[996]
C. Tomani, D. Vilar, M. Freitag, C. Cherry, S. Naskar, M. Finkelstein, X. Garcia and D. Cremers.
Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by decoding to optimize a utility function backed by a metric or quality-estimation signal, as is done by Minimum Bayes Risk (MBR) or Quality-Aware decoding. The main disadvantage of these approaches is that they require an additional model to calculate the utility function during decoding, significantly increasing the computational cost. In this paper, we propose to make the NMT models themselves quality-aware by training them to estimate the quality of their own output. Using this approach for MBR decoding we can drastically reduce the size of the candidate list, resulting in a speed-up of two orders of magnitude. When applying our method to MAP decoding we obtain quality gains similar or even superior to quality-reranking approaches, but with the efficiency of single-pass decoding.
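
Plain MBR decoding — the procedure the paper accelerates — can be sketched with a toy unigram-overlap utility standing in for a learned quality metric (function names are ours):

```python
def mbr_decode(candidates, utility):
    """Minimum Bayes Risk decoding: choose the candidate with the highest
    average utility against all other candidates used as pseudo-references."""
    def expected_utility(c):
        others = [r for r in candidates if r is not c]
        return sum(utility(c, r) for r in others) / len(others)
    return max(candidates, key=expected_utility)

def overlap_utility(hyp, ref):
    """Toy utility: unigram F1 overlap (stand-in for a learned metric)."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    p = len(h & r) / len(h)
    rec = len(h & r) / len(r)
    return 2 * p * rec / (p + rec) if p + rec else 0.0

cands = ["the cat sat", "the cat sat down", "a dog ran"]
best = mbr_decode(cands, overlap_utility)
```

The cost of this procedure is quadratic in the candidate-list size; a model that scores its own candidates, as proposed in the paper, lets the list be pruned drastically before the pairwise comparison.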

MCML Authors
Link to Christian Tomani

Christian Tomani

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[995]
L. Weber-Genzel, S. Peng, M.-C. De Marneffe and B. Plank.
VariErr NLI: Separating Annotation Error from Human Label Variation.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Human label variation arises when annotators assign different labels to the same item for valid reasons, while annotation errors occur when labels are assigned for invalid reasons. These two issues are prevalent in NLP benchmarks, yet existing research has studied them in isolation. To the best of our knowledge, there exists no prior work that focuses on teasing apart error from signal, especially in cases where signal is beyond black-and-white. To fill this gap, we introduce a systematic methodology and a new dataset, VariErr (variation versus error), focusing on the NLI task in English. We propose a 2-round annotation procedure with annotators explaining each label and subsequently judging the validity of label-explanation pairs. VariErr contains 7,732 validity judgments on 1,933 explanations for 500 re-annotated MNLI items. We assess the effectiveness of various automatic error detection (AED) methods and GPTs in uncovering errors versus human label variation. We find that state-of-the-art AED methods significantly underperform GPTs and humans. While GPT-4 is the best system, it still falls short of human performance. Our methodology is applicable beyond NLI, offering fertile ground for future research on error versus plausible variation, which in turn can yield better and more trustworthy NLP systems.

MCML Authors
Link to Leon Weber-Genzel

Leon Weber-Genzel

Dr.

* Former member

Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[994]
S. Xu, S. T.y.s.s, O. Ichim, B. Plank and M. Grabmair.
Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

In legal decisions, split votes (SV) occur when judges cannot reach a unanimous decision, posing a difficulty for lawyers who must navigate diverse legal arguments and opinions. In high-stakes domains, understanding the alignment of perceived difficulty between humans and AI systems is crucial to build trust. However, existing NLP calibration methods focus on a classifier’s awareness of predictive performance, measured against the human majority class, overlooking inherent human label variation (HLV). This paper explores split votes as naturally observable human disagreement and value pluralism. We collect judges’ vote distributions from the European Court of Human Rights (ECHR), and present SV-ECHR, a case outcome classification (COC) dataset with SV information. We build a taxonomy of disagreement with SV-specific subcategories. We further assess the alignment of perceived difficulty between models and humans, as well as confidence- and human-calibration of COC models. We observe limited alignment with the judge vote distribution. To our knowledge, this is the first systematic exploration of calibration to human judgements in legal NLP. Our study underscores the necessity for further research on measuring and enhancing model calibration considering HLV in legal decision tasks.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[993]
M. Aßenmacher, A. Stephan, L. Weissweiler, E. Çano, I. Ziegler, M. Härttrich, B. Bischl, B. Roth, C. Heumann and H. Schütze.
Collaborative Development of Modular Open Source Educational Resources for Natural Language Processing.
6th Workshop on Teaching NLP (TeachingNLP 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
Abstract

In this work, we present a collaboratively and continuously developed open-source educational resource (OSER) for teaching natural language processing at two different universities. We shed light on the principles we followed for the initial design of the course and the rationale for ongoing developments, followed by a reflection on the inter-university collaboration for designing and maintaining teaching material. When reflecting on the latter, we explicitly emphasize the considerations that need to be made when facing heterogeneous groups and when having to accommodate multiple examination regulations within one single course framework. Relying on the fundamental principles of OSER developments as defined by Bothmann et al. (2023) proved to be an important guideline during this process. The final part pertains to open-sourcing our teaching material, coping with the increasing speed of developments in the field, and integrating the course digitally, also addressing conflicting priorities and challenges we are currently facing.

MCML Authors
Matthias Aßenmacher (Dr.), Statistical Learning & Data Science
Leonie Weissweiler (Dr.), former member
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science
Hinrich Schütze (Prof. Dr.), Statistical NLP and Deep Learning


[992]
L. Christ, S. Amiriparian, M. Milling, I. Aslan and B. W. Schuller.
Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children’s stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .8221 for valence and .7125 for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict.
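For context on the reported scores, the Concordance Correlation Coefficient used as the evaluation metric can be computed as follows; this is a minimal NumPy sketch of the standard definition, not the authors' evaluation code:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient (Lin, 1989):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Perfect agreement yields 1.0; a constant offset lowers the score,
# even though Pearson correlation would still be 1.
print(ccc([0.1, 0.4, 0.7], [0.1, 0.4, 0.7]))  # 1.0
print(ccc([0.1, 0.4, 0.7], [0.3, 0.6, 0.9]))  # 0.75
```

Unlike plain correlation, CCC penalizes systematic bias and scale differences, which is why it is a common choice for continuous valence/arousal prediction.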

MCML Authors
Shahin Amiriparian (Dr.), Health Informatics
Manuel Milling, Health Informatics
Björn Schuller (Prof. Dr.), Health Informatics


[991]
K. Hämmerl, J. Libovický and A. Fraser.
Understanding Cross-Lingual Alignment—A Survey.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, has been an active field of research in recent years. We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field. We present different understandings of cross-lingual alignment and their limitations. We provide a qualitative summary of results from a number of surveyed papers. Finally, we discuss how these insights may be applied not only to encoder models, where this topic has been heavily studied, but also to encoder-decoder or even decoder-only models, and argue that an effective trade-off between language-neutral and language-specific information is key.

MCML Authors
Katharina Hämmerl, Data Analytics & Statistics
Alexander Fraser (Prof. Dr.), Data Analytics & Statistics


[990]
W. Lai, M. Mesgar and A. Fraser.
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

To democratize large language models (LLMs) to most natural languages, it is imperative to make these models capable of understanding and generating texts in many languages, in particular low-resource ones. While recent multilingual LLMs demonstrate remarkable performance in such capabilities, these LLMs still support a limited number of human languages due to the lack of training data for low resource languages. Moreover, these LLMs are not yet aligned with human preference for downstream tasks, which is crucial for the success of LLMs in English. In this paper, we introduce xLLaMA-100 and xBLOOM-100 (collectively xLLMs-100), which scale the multilingual capabilities of LLaMA and BLOOM to 100 languages. To do so, we construct two datasets: a multilingual instruction dataset including 100 languages, which represents the largest language coverage to date, and a cross-lingual human feedback dataset encompassing 30 languages. We perform multilingual instruction tuning on the constructed instruction data and further align the LLMs with human feedback using the DPO algorithm on our cross-lingual human feedback dataset. We evaluate the multilingual understanding and generating capabilities of xLLMs-100 on five multilingual benchmarks. Experimental results show that xLLMs-100 consistently outperforms its peers across the benchmarks by considerable margins, defining a new state-of-the-art multilingual LLM that supports 100 languages.

MCML Authors
Alexander Fraser (Prof. Dr.), Data Analytics & Statistics


[989]
A. Maarouf, D. Bär, D. Geissler and S. Feuerriegel.
HQP: A human-annotated dataset for detecting online propaganda.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Online propaganda poses a severe threat to the integrity of societies. However, existing datasets for detecting online propaganda have a key limitation: they were annotated using weak labels that can be noisy and even incorrect. To address this limitation, our work makes the following contributions: (1) We present HQP: a novel dataset (N=30000) for detecting online propaganda with high-quality labels. To the best of our knowledge, HQP is the first large-scale dataset for detecting online propaganda that was created through human annotation. (2) We show empirically that state-of-the-art language models fail in detecting online propaganda when trained with weak labels (AUC: 64.03). In contrast, state-of-the-art language models can accurately detect online propaganda when trained with our high-quality labels (AUC: 92.25), which is an improvement of 44%. (3) We show that prompt-based learning using a small sample of high-quality labels can still achieve a reasonable performance (AUC: 80.27) while significantly reducing the cost of labeling. (4) We extend HQP to HQP+ to test how well propaganda across different contexts can be detected. Crucially, our work highlights the importance of high-quality labels for sensitive NLP tasks such as propaganda detection.

MCML Authors
Abdurahman Maarouf, Artificial Intelligence in Management
Dominik Bär, Artificial Intelligence in Management
Dominique Geißler, Artificial Intelligence in Management
Stefan Feuerriegel (Prof. Dr.), Artificial Intelligence in Management


[988]
X. Wang, B. Ma, C. Hu, L. Weber-Genzel, P. Röttger, F. Kreuter, D. Hovy and B. Plank.
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions to limit the response space. The model is then evaluated by ranking the candidate answers by the log probability of the first token prediction. However, first tokens may not consistently reflect the final response output, due to models’ diverse response styles such as starting with ‘Sure’ or refusing to answer. Consequently, first-token evaluation is not indicative of model behaviour when interacting with users. But by how much? We evaluate how aligned first-token evaluation is with the text output along several dimensions, namely final option choice, refusal rate, choice distribution and robustness under prompt perturbation. Our results show that the two approaches are severely misaligned on all dimensions, reaching mismatch rates over 60%. Models heavily fine-tuned on conversational or safety data are especially impacted. Crucially, models remain misaligned even when we increasingly constrain prompts, i.e., force them to start with an option letter or example template. Our findings i) underscore the importance of inspecting the text output as well and ii) caution against relying solely on first-token evaluation.
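The mismatch the paper measures can be illustrated schematically. The log-probabilities and generated texts below are hypothetical toy data, and the regex parser is a simplification, not the paper's evaluation code:

```python
import re

def first_token_choice(logprobs):
    """Rank candidate options by the log probability of the first token."""
    return max(logprobs, key=logprobs.get)

def parsed_text_choice(text, options="ABCD"):
    """Extract the option letter actually given in the generated text."""
    m = re.search(r"\b([%s])\b" % options, text)
    return m.group(1) if m else None  # None models e.g. a refusal

# (first-token log-probs, generated text) -- hypothetical examples.
responses = [
    ({"A": -0.2, "B": -1.5, "C": -2.0, "D": -3.0}, "A, because ..."),
    ({"A": -0.5, "B": -0.9, "C": -1.2, "D": -2.0}, "Sure! The answer is C."),
    ({"A": -2.0, "B": -0.3, "C": -1.0, "D": -1.5}, "I cannot answer that."),
]
mismatches = sum(
    first_token_choice(lp) != parsed_text_choice(txt) for lp, txt in responses
)
mismatch_rate = mismatches / len(responses)  # 2 of 3 disagree here
```

The second and third items show the two failure modes the abstract names: a conversational prefix ('Sure!') and a refusal, both of which decouple the first token from the final answer.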

MCML Authors
Xinpeng Wang, Artificial Intelligence and Computational Linguistics
Bolei Ma, Social Data Science and AI Lab
Leon Weber-Genzel (Dr.), former member
Frauke Kreuter (Prof. Dr.), Social Data Science and AI Lab
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics


[987]
P. Wicke and L. Wachowiak.
Exploring Spatial Schemas in Large Language Models.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI. GitHub.
Abstract

Despite the ubiquity of large language models (LLMs) in AI research, the question of embodiment in LLMs remains underexplored, distinguishing them from embodied systems in robotics where sensory perception directly informs physical action. Our investigation navigates the intriguing terrain of whether LLMs, despite their non-embodied nature, effectively capture implicit human intuitions about fundamental, spatial building blocks of language. We employ insights from spatial cognitive foundations developed through early sensorimotor experiences, guiding our exploration through the reproduction of three psycholinguistic experiments. Surprisingly, correlations between model outputs and human responses emerge, revealing adaptability without a tangible connection to embodied experiences. Notable distinctions include polarized language model responses and reduced correlations in vision language models. This research contributes to a nuanced understanding of the interplay between language, spatial experiences, and the computations made by large language models.

MCML Authors
Philipp Wicke (Dr.), Statistical NLP and Deep Learning


[986]
S. Yuan, E. Nie, M. Färber, H. Schmid and H. Schütze.
GNNAVI: Navigating the Information Flow in Large Language Models by Graph Neural Network.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

Large Language Models (LLMs) exhibit strong In-Context Learning (ICL) capabilities when prompts with demonstrations are applied to them. However, fine-tuning still remains crucial to further enhance their adaptability. Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality. We address this issue by introducing GNNavi, a prompt-based parameter-efficient fine-tuning (PEFT) approach. GNNavi leverages insights into ICL’s information flow dynamics, which indicate that label words act in prompts as anchors for information propagation. GNNavi employs a Graph Neural Network (GNN) layer to precisely guide the aggregation and distribution of information flow during the processing of prompts by hardwiring the desired information flow into the GNN. Our experiments on text classification tasks with GPT-2 and Llama2 show that GNNavi surpasses standard prompt-based fine-tuning methods in few-shot settings by updating just 0.2% to 0.5% of parameters. We compare GNNavi with prevalent PEFT approaches, such as prefix tuning, LoRA and Adapter, in terms of performance and efficiency. Our analysis reveals that GNNavi enhances information flow and ensures a clear aggregation process.

MCML Authors
Ercong Nie, Statistical NLP and Deep Learning
Hinrich Schütze (Prof. Dr.), Statistical NLP and Deep Learning


[985]
M. Zhang, V. Gautam, M. Wang, J. Alabi, X. Shen, D. Klakow and M. Mosbach.
The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

In-context learning is a popular inference strategy where large language models solve a task using only a few labeled demonstrations without needing any parameter updates. Although there have been extensive studies on English in-context learning, multilingual in-context learning remains under-explored, and we lack an in-depth understanding of the role of demonstrations in this context. To address this gap, we conduct a multidimensional analysis of multilingual in-context learning, experimenting with 5 models from different model families, 9 datasets covering classification and generation tasks, and 56 typologically diverse languages. Our results reveal that the effectiveness of demonstrations varies significantly across models, tasks, and languages. We also find that strong instruction-following models including Llama 2-Chat, GPT-3.5, and GPT-4 are largely insensitive to the quality of demonstrations. Instead, a carefully crafted template often eliminates the benefits of demonstrations for some tasks and languages altogether. These findings show that the importance of demonstrations might be overestimated. Our work highlights the need for granular evaluation across multiple axes towards a better understanding of in-context learning.

MCML Authors
Mingyang Wang, Statistical NLP and Deep Learning


[984]
B. Ma.
Evaluating Lexical Aspect with Large Language Models.
Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. DOI.
Abstract

In this study, we explore the proficiency of large language models (LLMs) in understanding two key lexical aspects: duration (durative/stative) and telicity (telic/atelic). Through experiments on datasets featuring sentences, verbs, and verb positions, we prompt the LLMs to identify aspectual features of verbs in sentences. Our findings reveal that certain LLMs, particularly those closed-source ones, are able to capture information on duration and telicity, albeit with some performance variations and weaker results compared to the baseline. By employing prompts at three levels (sentence-only, sentence with verb, and sentence with verb and its position), we demonstrate that integrating verb information generally enhances performance in aspectual feature recognition, though it introduces instability. We call for future research to look deeper into methods aimed at optimizing LLMs for aspectual feature comprehension.

MCML Authors
Bolei Ma, Social Data Science and AI Lab


[983]
P. Wicke, L. Hirlimann and J. M. Cunha.
Using Analogical Reasoning to Prompt LLMs for their Intuitions of Abstract Spatial Schemas.
1st Workshop on Analogical Abstraction in Cognition, Perception, and Language (Analogy-ANGLE 2024) at the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024). Jeju, Korea, Aug 03-09, 2024. PDF.
MCML Authors
Philipp Wicke (Dr.), Statistical NLP and Deep Learning


[982]
J. Brandt, M. Wever, V. Bengs and E. Hüllermeier.
Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO.
33rd International Joint Conference on Artificial Intelligence (IJCAI 2024). Jeju, Korea, Aug 03-09, 2024. DOI.
Abstract

Hyperparameter optimization (HPO) is indispensable for achieving optimal performance in machine learning tasks. A popular class of methods in this regard is based on Successive Halving (SHA), which casts HPO into a pure-exploration multi-armed bandit problem under finite sampling budget constraints. This is accomplished by considering hyperparameter configurations as arms and rewards as the negative validation losses. While enjoying theoretical guarantees as well as working well in practice, SHA comes, however, with several hyperparameters itself, one of which is the maximum budget that can be allocated to evaluate a single arm (hyperparameter configuration). Although there are already solutions to this meta hyperparameter optimization problem, such as the doubling trick or asynchronous extensions of SHA, these are either practically inefficient or lack theoretical guarantees. In this paper, we propose incremental SHA (iSHA), a synchronous extension of SHA, which allows the maximum budget to be increased a posteriori while still enjoying theoretical guarantees. Our empirical analysis of HPO problems corroborates our theoretical findings and shows that iSHA is more resource-efficient than existing SHA-based approaches.
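For intuition, plain Successive Halving, on which iSHA builds, can be sketched in a few lines. The loss function, configurations, and budget schedule below are toy stand-ins, not the paper's algorithm:

```python
def successive_halving(configs, loss, min_budget=1, eta=2):
    """Plain SHA: evaluate all arms at the current budget, keep the
    best 1/eta fraction, then multiply the budget by eta."""
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: loss(c, budget))
        configs = ranked[: max(1, len(configs) // eta)]  # promote best arms
        budget *= eta
    return configs[0]

# Toy HPO problem: arms are learning rates; validation loss improves
# (1/budget term shrinks) as more budget, e.g. epochs, is spent.
def toy_loss(lr, budget):
    return (lr - 0.1) ** 2 + 1.0 / budget

best = successive_halving([0.001, 0.01, 0.05, 0.08, 0.1, 0.2, 0.5, 1.0],
                          toy_loss)  # converges to lr = 0.1
```

Note that the sketch fixes the budget schedule up front; iSHA's contribution is precisely that the maximum budget can be raised after the fact, reusing earlier evaluations, while retaining guarantees.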

MCML Authors
Viktor Bengs (Dr.), Artificial Intelligence & Machine Learning
Eyke Hüllermeier (Prof. Dr.), Artificial Intelligence & Machine Learning


[981]
J. G. Wiese, L. Wimmer, T. Papamarkou, B. Bischl, S. Günnemann and D. Rügamer.
Towards Efficient Posterior Sampling in Deep Neural Networks via Symmetry Removal (Extended Abstract).
33rd International Joint Conference on Artificial Intelligence (IJCAI 2024). Jeju, Korea, Aug 03-09, 2024. DOI.
Abstract

Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.

MCML Authors
Lisa Wimmer, Statistical Learning & Data Science
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science
Stephan Günnemann (Prof. Dr.), Data Analytics & Machine Learning
David Rügamer (Prof. Dr.), Data Science Group


[980]
L. Bothmann and K. Peters.
Fairness als Qualitätskriterium im Maschinellen Lernen – Rekonstruktion des philosophischen Konzepts und Implikationen für die Nutzung außergesetzlicher Merkmale bei qualifizierten Mietspiegeln.
AStA Wirtschafts- und Sozialstatistisches Archiv 18 (Aug. 2024). DOI.
Abstract

With the increased use of machine learning (ML) models within automated decision-making systems, the demands on the quality of ML models are growing. Pure prediction quality is no longer the sole quality criterion; in particular, there is an increasing demand to consider fairness aspects. This paper pursues two goals. First, it summarizes the current fairness discussion in the field of ML (fairML) and describes the most recent developments, especially with respect to the philosophical foundations of the concept of fairness within ML. On the other hand, the question is addressed to what extent so-called ‘extra-legal’ characteristics may be used in the compilation of qualified rent indices. A recent proposal by Kauermann and Windmann (AStA Wirtschafts- und Sozialstatistisches Archiv, Volume 17, 2023) on using extra-legal features in qualified rent indices includes a model-based imputation method, which we contrast with the legal requirements. Finally, we show which alternatives from the field of fairML could be used and outline the different basic philosophical assumptions behind the various methods.

MCML Authors
Ludwig Bothmann (Dr.), Statistical Learning & Data Science


[979]
D. Schalk, R. Rehms, V. S. Hoffmann, B. Bischl and U. Mansmann.
Distributed non-disclosive validation of predictive models by a modified ROC-GLM.
BMC Medical Research Methodology 24.190 (Aug. 2024). DOI.
Abstract

Distributed statistical analyses provide a promising approach for privacy protection when analyzing data distributed over several databases. Instead of directly operating on data, the analyst receives anonymous summary statistics, which are combined into an aggregated result. Further, in discrimination model (prognosis, diagnosis, etc.) development, it is key to evaluate a trained model w.r.t. to its prognostic or predictive performance on new independent data. For binary classification, quantifying discrimination uses the receiver operating characteristics (ROC) and its area under the curve (AUC) as aggregation measure. We are interested to calculate both as well as basic indicators of calibration-in-the-large for a binary classification task using a distributed and privacy-preserving approach…
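The modified ROC-GLM itself is beyond a short sketch, but the AUC that the distributed protocol ultimately aggregates equals the Mann–Whitney statistic. A toy illustration of that equivalence follows; it operates on raw scores and is emphatically not the privacy-preserving procedure of the paper:

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie),
    estimated over all positive/negative pairs."""
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            wins += p > n   # correctly ordered pair
            ties += p == n  # tied pair counts half
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# 3 of the 4 pos/neg pairs are ordered correctly -> AUC = 0.75.
auc = auc_mann_whitney([0.8, 0.3], [0.5, 0.1])
```

In the distributed setting, the point of the ROC-GLM reformulation is to recover this quantity from anonymous summary statistics rather than from pooled individual-level scores as done here.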

MCML Authors
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science


[978]
F. Drost, E. Dorigatti, A. Straub, P. Hilgendorf, K. I. Wagner, K. Heyer, M. López Montes, B. Bischl, D. H. Busch, K. Schober and B. Schubert.
Predicting T cell receptor functionality against mutant epitopes.
Cell Genomics 4.9 (Aug. 2024). DOI.
Abstract

Cancer cells and pathogens can evade T cell receptors (TCRs) via mutations in immunogenic epitopes. TCR cross-reactivity (i.e., recognition of multiple epitopes with sequence similarities) can counteract such escape but may cause severe side effects in cell-based immunotherapies through targeting self-antigens. To predict the effect of epitope point mutations on T cell functionality, we here present the random forest-based model Predicting T Cell Epitope-Specific Activation against Mutant Versions (P-TEAM). P-TEAM was trained and tested on three datasets with TCR responses to single-amino-acid mutations of the model epitope SIINFEKL, the tumor neo-epitope VPSVWRSSL, and the human cytomegalovirus antigen NLVPMVATV, totaling 9,690 unique TCR-epitope interactions. P-TEAM was able to accurately classify T cell reactivities and quantitatively predict T cell functionalities for unobserved single-point mutations and unseen TCRs. Overall, P-TEAM provides an effective computational tool to study T cell responses against mutated epitopes.

MCML Authors
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science


[977]
A. Mittermeier, M. Aßenmacher, B. Schachtner, S. Grosu, V. Dakovic, V. Kandratovich, B. Sabel and M. Ingrisch.
Automatische ICD-10-Codierung.
Die Radiologie 64 (Aug. 2024). DOI.
MCML Authors
Andreas Mittermeier (Dr.), Clinical Data Science in Radiology
Matthias Aßenmacher (Dr.), Statistical Learning & Data Science
Balthasar Schachtner (Dr.), Clinical Data Science in Radiology
Michael Ingrisch (Prof. Dr.), Clinical Data Science in Radiology


[976]
T. Rajapakshe, R. Rana, S. Khalifa, B. Sisman, B. W. Schuller and C. Busso.
emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition.
IEEE Access 12 (Aug. 2024). DOI.
Abstract

Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. The Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports the selection of CNN and LSTM coupling to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations in conjunction using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN. Instead, we let DARTS choose the best layer order inside the DARTS cell. We demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM by evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets.

MCML Authors
Björn Schuller (Prof. Dr.), Health Informatics


[975]
S. Heid, J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Learning decision catalogues for situated decision making: The case of scoring systems.
International Journal of Approximate Reasoning 171 (Aug. 2024). DOI.
Abstract

In this paper, we formalize the problem of learning coherent collections of decision models, which we call decision catalogues, and illustrate it for the case where models are scoring systems. This problem is motivated by the recent rise of algorithmic decision-making and the idea to improve human decision-making through machine learning, in conjunction with the observation that decision models should be situated in terms of their complexity and resource requirements: Instead of constructing a single decision model and using this model in all cases, different models might be appropriate depending on the decision context. Decision catalogues are supposed to support a seamless transition from very simple, resource-efficient to more sophisticated but also more demanding models. We present a general algorithmic framework for inducing such catalogues from training data, which tackles the learning task as a problem of searching the space of candidate catalogues systematically and, to this end, makes use of heuristic search methods. We also present a concrete instantiation of this framework as well as empirical studies for performance evaluation, which, in a nutshell, show that greedy search is an efficient and hard-to-beat strategy for the construction of catalogues of scoring systems.
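A scoring system in the sense used here assigns small integer points to binary features and thresholds the sum. The features, point values, and threshold below are hypothetical, purely to make the object concrete; a decision catalogue would hold several such systems of increasing complexity:

```python
# Hypothetical scoring system: integer points per binary feature,
# decision by thresholding the total score.
SCORES = {"fever": 2, "cough": 1, "short_breath": 3}
THRESHOLD = 4

def decide(case):
    """Sum the points of the features present and compare to the threshold."""
    total = sum(pts for feat, pts in SCORES.items() if case.get(feat))
    return total >= THRESHOLD  # e.g. "refer for further testing"

print(decide({"fever": True, "short_breath": True}))  # True (score 5)
print(decide({"cough": True}))                        # False (score 1)
```

The appeal, as the abstract notes, is that such models are cheap to apply by hand; the learning problem is then to induce a coherent collection of them, from very simple to more demanding, rather than a single model.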

MCML Authors
Jonas Hanselle, Artificial Intelligence & Machine Learning
Eyke Hüllermeier (Prof. Dr.), Artificial Intelligence & Machine Learning


[974]
F. Ott, L. Heublein, D. Rügamer, B. Bischl and C. Mutschler.
Fusing structure from motion and simulation-augmented pose regression from optical flow for challenging indoor environments.
Journal of Visual Communication and Image Representation 103 (Aug. 2024). DOI.
Abstract

The localization of objects is essential in many applications, such as robotics, virtual and augmented reality, and warehouse logistics. Recent advancements in deep learning have enabled localization using monocular cameras. Traditionally, structure from motion (SfM) techniques predict an object’s absolute position from a point cloud, while absolute pose regression (APR) methods use neural networks to understand the environment semantically. However, both approaches face challenges from environmental factors like motion blur, lighting changes, repetitive patterns, and featureless areas. This study addresses these challenges by incorporating additional information and refining absolute pose estimates with relative pose regression (RPR) methods. RPR also struggles with issues like motion blur. To overcome this, we compute the optical flow between consecutive images using the Lucas–Kanade algorithm and use a small recurrent convolutional network to predict relative poses. Combining absolute and relative poses is difficult due to differences between global and local coordinate systems. Current methods use pose graph optimization (PGO) to align these poses. In this work, we propose recurrent fusion networks to better integrate absolute and relative pose predictions, enhancing the accuracy of absolute pose estimates. We evaluate eight different recurrent units and create a simulation environment to pre-train the APR and RPR networks for improved generalization. Additionally, we record a large dataset of various scenarios in a challenging indoor environment resembling a warehouse with transportation robots. Through hyperparameter searches and experiments, we demonstrate that our recurrent fusion method outperforms PGO in effectiveness.
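The Lucas–Kanade step mentioned above solves a small least-squares system on image gradients. A single-window NumPy sketch for a global translation follows; the synthetic blob image is an illustration, not the paper's pipeline, which computes flow between consecutive camera frames:

```python
import numpy as np

def lucas_kanade_global(I1, I2):
    """Estimate one (u, v) flow vector from the brightness-constancy
    equation Ix*u + Iy*v + It = 0, solved by least squares over all pixels."""
    Iy, Ix = np.gradient(I1)      # np.gradient returns d/drow, d/dcol
    It = I2 - I1                  # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic frames: a smooth Gaussian blob shifted right by one pixel.
yy, xx = np.mgrid[0:64, 0:64]
I1 = np.exp(-((xx - 32.0) ** 2 + (yy - 32.0) ** 2) / (2 * 6.0 ** 2))
I2 = np.roll(I1, 1, axis=1)       # translate +1 in x
u, v = lucas_kanade_global(I1, I2)  # u close to 1, v close to 0
```

Real implementations solve this per local window (and often coarse-to-fine), which is what makes the method struggle under motion blur and featureless regions, the very failure cases the paper targets.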

MCML Authors
David Rügamer (Prof. Dr.), Data Science Group
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science


[973]
H. Boche, V. Fojtik, A. Fono and G. Kutyniok.
Computability of Classification and Deep Learning: From Theoretical Limits to Practical Feasibility through Quantization.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

The unwavering success of deep learning in the past decade led to the increasing prevalence of deep learning methods in various application fields. However, the downsides of deep learning, most prominently its lack of trustworthiness, may not be compatible with safety-critical or high-responsibility applications requiring stricter performance guarantees. Recently, several instances of deep learning applications have been shown to be subject to theoretical limitations of computability, undermining the feasibility of performance guarantees when employed on real-world computers. We extend the findings by studying computability in the deep learning framework from two perspectives: From an application viewpoint in the context of classification problems and a general limitation viewpoint in the context of training neural networks. In particular, we show restrictions on the algorithmic solvability of classification problems that also render the algorithmic detection of failure in computations in a general setting infeasible. Subsequently, we prove algorithmic limitations in training deep neural networks even in cases where the underlying problem is well-behaved. Finally, we end with a positive observation, showing that in quantized versions of classification and deep network training, computability restrictions do not arise or can be overcome to a certain degree.

MCML Authors
Vit Fojtik, Mathematical Foundations of Artificial Intelligence
Gitta Kutyniok (Prof. Dr.), Mathematical Foundations of Artificial Intelligence


[972]
T. Boege, M. Drton, B. Hollering, S. Lumpp, P. Misra and D. Schkoda.
Conditional Independence in Stationary Diffusions.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

Stationary distributions of multivariate diffusion processes have recently been proposed as probabilistic models of causal systems in statistics and machine learning. Motivated by these developments, we study stationary multivariate diffusion processes with a sparsely structured drift. Our main result gives a characterization of the conditional independence relations that hold in a stationary distribution. The result draws on a graphical representation of the drift structure and pertains to conditional independence relations that hold generally as a consequence of the drift’s sparsity pattern.

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[971]
Y. Liang, O. Zadorozhnyi and M. Drton.
Kernel-Based Differentiable Learning of Non-Parametric Directed Acyclic Graphical Models.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model. This model selection problem can be challenging due to its large combinatorial search space, particularly when dealing with non-parametric causal models. Recent research has sought to bypass the combinatorial search by reformulating causal discovery as a continuous optimization problem, employing constraints that ensure the acyclicity of the graph. In non-parametric settings, existing approaches typically rely on finite-dimensional approximations of the relationships between nodes, resulting in a score-based continuous optimization problem with a smooth acyclicity constraint. In this work, we develop an alternative approximation method by utilizing reproducing kernel Hilbert spaces (RKHS) and applying general sparsity-inducing regularization terms based on partial derivatives. Within this framework, we introduce an extended RKHS representer theorem. To enforce acyclicity, we advocate the log-determinant formulation of the acyclicity constraint and show its stability. Finally, we assess the performance of our proposed RKHS-DAGMA procedure through simulations and illustrative data analyses.
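The log-determinant acyclicity constraint the abstract advocates takes the DAGMA-style form h(W) = −log det(sI − W∘W) + d·log s, which vanishes exactly when the weighted adjacency matrix W encodes an acyclic graph (for s larger than the spectral radius of W∘W). A minimal numpy sketch of just this constraint (illustrative only; the paper's RKHS machinery is not shown):

```python
import numpy as np

def logdet_acyclicity(W, s=1.0):
    """Log-determinant acyclicity measure in the DAGMA style.

    h(W) = -log det(s*I - W*W) + d*log(s), where W*W squares entries
    elementwise. h(W) = 0 iff the graph of W is acyclic; h(W) > 0 otherwise
    (assuming s exceeds the spectral radius of the squared matrix).
    """
    d = W.shape[0]
    M = s * np.eye(d) - W * W  # elementwise square keeps entries nonnegative
    sign, logabsdet = np.linalg.slogdet(M)
    assert sign > 0, "s must dominate the spectral radius of W*W"
    return -logabsdet + d * np.log(s)

# A DAG (strictly upper-triangular adjacency) has h(W) = 0
W_dag = np.array([[0.0, 0.5], [0.0, 0.0]])
# A two-node cycle has h(W) > 0
W_cyc = np.array([[0.0, 0.5], [0.5, 0.0]])
```

Because h is smooth in W, it can be used as a differentiable penalty inside a continuous score-based optimization, which is what makes the reformulation of the combinatorial DAG search tractable.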

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[970]
J. Meier, L. Scalerandi, O. Dhaouadi, J. Kaiser, N. Araslanov and D. Cremers.
CARLA Drone: Monocular 3D Object Detection from a Different Perspective.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

Existing techniques for monocular 3D detection have a serious restriction. They tend to perform well only on a limited set of benchmarks, faring well either on ego-centric car views or on traffic camera views, but rarely on both. To encourage progress, this work advocates for an extended evaluation of 3D detection frameworks across different camera perspectives. We make two key contributions. First, we introduce the CARLA Drone dataset, CDrone. Simulating drone views, it substantially expands the diversity of camera perspectives in existing benchmarks. Despite its synthetic nature, CDrone represents a real-world challenge. To show this, we confirm that previous techniques struggle to perform well both on CDrone and a real-world 3D drone dataset. Second, we develop an effective data augmentation pipeline called GroundMix. Its distinguishing element is the use of the ground for creating 3D-consistent augmentation of a training image. GroundMix significantly boosts the detection accuracy of a lightweight one-stage detector. In our expanded evaluation, we achieve the average precision on par with or substantially higher than the previous state of the art across all tested datasets.

MCML Authors
Link to Johannes Meier

Johannes Meier

Computer Vision & Artificial Intelligence

Link to Nikita Araslanov

Nikita Araslanov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[969]
D. Schkoda, E. Robeva and M. Drton.
Causal Discovery of Linear Non-Gaussian Causal Models with Unobserved Confounding.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

We consider linear non-Gaussian structural equation models that involve latent confounding. In this setting, the causal structure is identifiable, but, in general, it is not possible to identify the specific causal effects. Instead, a finite number of different causal effects result in the same observational distribution. Most existing algorithms for identifying these causal effects use overcomplete independent component analysis (ICA), which often suffers from convergence to local optima. Furthermore, the number of latent variables must be known a priori. To address these issues, we propose an algorithm that operates recursively rather than using overcomplete ICA. The algorithm first infers a source, estimates the effect of the source and its latent parents on their descendants, and then eliminates their influence from the data. For both source identification and effect size estimation, we use rank conditions on matrices formed from higher-order cumulants. We prove asymptotic correctness under the mild assumption that locally, the number of latent variables never exceeds the number of observed variables. Simulation studies demonstrate that our method achieves comparable performance to overcomplete ICA even though it does not know the number of latents in advance.

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[968]
D. Strieder and M. Drton.
Identifying Total Causal Effects in Linear Models under Partial Homoscedasticity.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

A fundamental challenge of scientific research is inferring causal relations based on observed data. One commonly used approach involves utilizing structural causal models that postulate noisy functional relations among interacting variables. A directed graph naturally represents these models and reflects the underlying causal structure. However, classical identifiability results suggest that, without conducting additional experiments, this causal graph can only be identified up to a Markov equivalence class of indistinguishable models. Recent research has shown that focusing on linear relations with equal error variances can enable the identification of the causal structure from mere observational data. Nonetheless, practitioners are often primarily interested in the effects of specific interventions, rendering the complete identification of the causal structure unnecessary. In this work, we investigate the extent to which less restrictive assumptions of partial homoscedasticity are sufficient for identifying the causal effects of interest. Furthermore, we construct mathematically rigorous confidence regions for total causal effects under structure uncertainty and explore the performance gain of relying on stricter error assumptions in a simulation study.

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[967]
Y. Zhang, Z. Ma, Y. Ma, Z. Han, Y. Wu and V. Tresp.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration.
Preprint at arXiv (Aug. 2024). arXiv.
Abstract

LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.
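For context, the selection step of the classical MCTS that WebPilot extends is typically UCB1 applied to tree nodes (UCT), trading off exploitation of high-value children against exploration of rarely visited ones. A minimal sketch (the exploration constant `c` and the `(total_value, visit_count)` representation are illustrative choices, not taken from the paper):

```python
import math

def uct_select(children, c=1.4):
    """Return the index of the child maximizing the UCB1 score.

    children: list of (total_value, visit_count) pairs. Unvisited children
    score infinity so they are expanded before any revisits.
    """
    parent_visits = sum(n for _, n in children)

    def score(child):
        q, n = child
        if n == 0:
            return float("inf")
        # exploitation (mean value) + exploration bonus
        return q / n + c * math.sqrt(math.log(parent_visits) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

The abstract's point is that this vanilla rule alone struggles in web environments with vast action spaces and stochastic transitions, motivating WebPilot's global/local dual optimization on top of it.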

MCML Authors
Link to Yao Zhang

Yao Zhang

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[966]
R. Klaar, M. Rabe, A. T. Stüber, S. Hering, S. Corradini, C. Eze, S. Marschner, C. Belka, G. Landry, J. Dinkel and C. Kurz.
MRI-based ventilation and perfusion imaging to predict radiation-induced pneumonitis in lung tumor patients at a 0.35T MR-Linac.
Radiotherapy and Oncology (Aug. 2024). DOI.
Abstract

Radiation-induced pneumonitis (RP), diagnosed 6–12 weeks after treatment, is a complication of lung tumor radiotherapy. So far, clinical and dosimetric parameters have not been reliable in predicting RP. We propose using non-contrast enhanced magnetic resonance imaging (MRI) based functional parameters acquired over the treatment course for patient stratification for improved follow-up.

MCML Authors
Link to Theresa Stüber

Theresa Stüber

Clinical Data Science in Radiology


[965]
E. Bergman, M. Feurer, A. Bahram, A. R. Balef, L. Purucker, S. Segel, M. Lindauer, F. Hutter and K. Eggensperger.
AMLTK: A Modular AutoML Toolkit in Python.
The Journal of Open Source Software 9.100 (Aug. 2024). DOI.
Abstract

Machine Learning is a core building block in novel data-driven applications. Practitioners face many ambiguous design decisions while developing practical machine learning (ML) solutions. Automated machine learning (AutoML) facilitates the development of machine learning applications by providing efficient methods for optimizing hyperparameters, searching for neural architectures, or constructing whole ML pipelines (Hutter et al., 2019). Thereby, design decisions such as the choice of modelling, pre-processing, and training algorithm are crucial to obtaining well-performing solutions. By automatically obtaining ML solutions, AutoML aims to lower the barrier to leveraging machine learning and reduce the time needed to develop or adapt ML solutions for new domains or data.
Highly performant software packages for automatically building ML pipelines given data, so-called AutoML systems, are available and can be used off-the-shelf. Typically, AutoML systems evaluate ML models sequentially to return a well-performing single best model or multiple models combined into an ensemble. Existing AutoML systems are typically highly engineered monolithic software developed for specific use cases to perform well and robustly under various conditions…

MCML Authors
Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science


[964]
M. Bini, K. Roth, Z. Akata and A. Khoreva.
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL. GitHub.
Abstract

Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effective, parameter-efficient, and hyperparameter-robust adaptation, we propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections. By design, ETHER transformations require a minimal number of parameters, are less likely to deteriorate model performance, and exhibit robustness to hyperparameter and learning rate choices. In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters (∼10-100 times lower than LoRA or OFT) across multiple image synthesis and natural language tasks without exhaustive hyperparameter tuning. Finally, we investigate the recent emphasis on Hyperspherical Energy retention for adaptation and raise questions on its practical utility.
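For intuition, the hyperplane reflections at the heart of ETHER are Householder matrices, which are orthogonal (and involutory), so multiplying pretrained weights by them preserves norms — one reason such transformations are unlikely to degrade the pretrained model. A minimal numpy sketch of the building block (illustrative, not the authors' implementation):

```python
import numpy as np

def householder(u):
    """Reflection across the hyperplane orthogonal to u: H = I - 2*u*u^T.

    Only the d parameters of u are needed to define the d x d transform,
    which is what makes the parameterization so compact.
    """
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return np.eye(u.size) - 2.0 * np.outer(u, u)

H = householder([1.0, 2.0, 3.0, 4.0])
```

H is orthogonal (H Hᵀ = I) and its own inverse (H H = I), with determinant −1, as any reflection must be.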

MCML Authors
Link to Massimo Bini

Massimo Bini

Interpretable and Reliable Machine Learning

Link to Karsten Roth

Karsten Roth

Interpretable and Reliable Machine Learning

Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[963]
K. Bouchiat, A. Immer, H. Yèche, G. Ratsch and V. Fortuin.
Improving Neural Additive Models with Bayesian Principles.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.

MCML Authors
Link to Vincent Fortuin

Vincent Fortuin

Dr.

Associate

Bayesian Deep Learning


[962]
T. Decker, A. R. Bhattarai, J. Gu, V. Tresp and F. Buettner.
Provably Better Explanations with Optimized Aggregation of Feature Attributions.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by combining multiple explanations across distinct methods or their variations. For this purpose, we propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria such as robustness or faithfulness to the model behavior. Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual methods and existing baselines.

MCML Authors
Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[961]
S. Eckman, B. Plank and F. Kreuter.
Position: Insights from Survey Methodology can Improve Training Data.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

Whether future AI models are fair, trustworthy, and aligned with the public’s interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data is difficult, and few AI/ML researchers are trained in data collection methods. Recent research in data-centric AI has shown that higher quality training data leads to better performing models, making this the right moment to introduce AI/ML researchers to the field of survey methodology, the science of data collection. We summarize insights from the survey methodology literature and discuss how they can improve the quality of training and feedback data. We also suggest collaborative research ideas into how biases in data collection can be mitigated, making models more accurate and human-centric.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[960]
D. Frauen, V. Melnychuk and S. Feuerriegel.
Fair Off-Policy Learning from Observational Data.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

Algorithmic decision-making in practice must be fair for legal, ethical, and societal reasons. To achieve this, prior research has contributed various approaches that ensure fairness in machine learning predictions, while comparatively little effort has focused on fairness in decision-making, specifically off-policy learning. In this paper, we propose a novel framework for fair off-policy learning: we learn decision rules from observational data under different notions of fairness, where we explicitly assume that observational data were collected under a different – potentially discriminatory – behavioral policy. Importantly, our framework applies to different fairness notions for off-policy learning, where fairness is formalized based on actions or policy values. As our main contribution, we propose a neural network-based framework to learn optimal policies under different fairness notions. We further provide theoretical guarantees in the form of generalization bounds for the finite-sample version of our framework. We demonstrate the effectiveness of our framework through extensive numerical experiments using both simulated and real-world data. Altogether, our work enables algorithmic decision-making in a wide array of practical applications where fairness must be ensured.

MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[959]
F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and k-Shapley values (k-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.
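The classical result this abstract builds on — that the Shapley value is the solution of a weighted least squares problem with the Shapley kernel weights — can be checked directly on a toy game. The sketch below is illustrative (the game `v` is invented for the demonstration; this is the standard KernelSHAP construction, not the paper's KernelSHAP-IQ extension to interactions):

```python
import itertools
import math

import numpy as np

def exact_shapley(v, n):
    """Shapley values by averaging marginal contributions over all orderings."""
    phi = np.zeros(n)
    for perm in itertools.permutations(range(n)):
        S = frozenset()
        for i in perm:
            phi[i] += v(S | {i}) - v(S)
            S = S | {i}
    return phi / math.factorial(n)

def shapley_via_wls(v, n):
    """Shapley values as the solution of a weighted least squares problem.

    Fits v(S) ~ v(empty) + sum_{i in S} beta_i over all proper coalitions,
    weighted by the Shapley kernel, under the efficiency constraint
    sum(beta) = v(N) - v(empty) (handled via a Lagrange multiplier).
    """
    players = range(n)
    coalitions = [S for r in range(1, n)
                  for S in itertools.combinations(players, r)]
    Z = np.array([[1.0 if i in S else 0.0 for i in players]
                  for S in coalitions])
    y = np.array([v(frozenset(S)) - v(frozenset()) for S in coalitions])
    # Shapley kernel weight for a coalition of size k out of n players
    w = np.array([(n - 1) / (math.comb(n, len(S)) * len(S) * (n - len(S)))
                  for S in coalitions])
    A = Z.T @ (w[:, None] * Z)
    b = Z.T @ (w * y)
    ones = np.ones(n)
    c = v(frozenset(players)) - v(frozenset())
    A_inv = np.linalg.inv(A)
    lam = (ones @ A_inv @ b - c) / (ones @ A_inv @ ones)
    return A_inv @ (b - lam * ones)

# Additive game with one pairwise interaction between players 0 and 1
a = [1.0, 2.0, 3.0, 4.0]
v = lambda S: sum(a[i] for i in S) + (3.0 if 0 in S and 1 in S else 0.0)
```

Both routes agree exactly; extending this WLS characterization from Shapley values to Shapley interactions is precisely the open problem the paper addresses.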

MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[958]
M. Herrmann, F. J. D. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix and B. Bischl.
Position: Why We Must Rethink Empirical Research in Machine Learning.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical ML research is fashioned as confirmatory research while it should rather be considered exploratory.

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[957]
F. Karl, M. Kemeter, G. Dax and P. Sierak.
Position: Embracing Negative Results in Machine Learning.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
MCML Authors
Link to Florian Karl

Florian Karl

Statistical Learning & Data Science


[956]
M. Lindauer, F. Karl, A. Klier, J. Moosbauer, A. Tornede, A. C. Mueller, F. Hutter, M. Feurer and B. Bischl.
Position: A Call to Action for a Human-Centered AutoML Paradigm.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive performance. This focused progress, while substantial, raises questions about how well AutoML has met its broader, original goals. In this position paper, we argue that a key to unlocking AutoML’s full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems, including their diverse roles, expectations, and expertise. We envision a more human-centered approach in future AutoML research, promoting the collaborative design of ML systems that tightly integrates the complementary strengths of human expertise and AutoML methodologies.

MCML Authors
Link to Florian Karl

Florian Karl

Statistical Learning & Data Science

Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[955]
T. Papamarkou, M. Skoularidou, K. Palla, L. Aitchison, J. Arbel, D. Dunson, M. Filippone, V. Fortuin, P. Hennig, J. M. Hernández-Lobato, A. Hubin, A. Immer, T. Karaletsos, M. E. Khan, A. Kristiadi, Y. Li, S. Mandt, C. Nemeth, M. A. Osborne, T. G. J. Rudner, D. Rügamer, Y. W. Teh, M. Welling, A. G. Wilson and R. Zhang.
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

MCML Authors
Link to Vincent Fortuin

Vincent Fortuin

Dr.

Associate

Bayesian Deep Learning

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[954]
D. Rügamer, C. Kolb, T. Weber, L. Kook and T. Nagler.
Generalizing orthogonalization for models with non-linearities.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms’ application. It was, for instance, shown that neural networks can deduce racial information solely from a patient’s X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the “orthogonalization” or “normalization” of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method’s effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.
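The linear baseline the abstract contrasts with — orthogonalizing representations against protected attributes by projecting onto the orthogonal complement of their column space — can be sketched in a few lines (an illustrative linear correction; the paper's contribution is extending this beyond linear models):

```python
import numpy as np

def orthogonalize(Z, X):
    """Remove the component of Z linearly explainable from X.

    Z: (samples, features) representation to be corrected.
    X: (samples, attributes) protected attributes.
    Projects Z onto the orthogonal complement of col(X), so the result
    has zero linear correlation with the protected attributes.
    """
    P = X @ np.linalg.pinv(X)  # orthogonal projector onto col(X)
    return Z - P @ Z

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Z = X @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(50, 5))
R = orthogonalize(Z, X)
```

After the projection, X carries no linear information about R (XᵀR = 0), which is exactly the guarantee that breaks down once non-linearities such as ReLU activations enter the model.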

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Chris Kolb

Chris Kolb

Statistical Learning & Data Science

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science


[953]
Y. Sale, V. Bengs, M. Caprio and E. Hüllermeier.
Second-Order Uncertainty Quantification: A Distance-Based Approach.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[952]
J. Schweisthal, D. Frauen, M. van der Schaar and S. Feuerriegel.
Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
MCML Authors
Link to Jonas Schweisthal

Jonas Schweisthal

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[951]
Y. Shen, N. Daheim, B. Cong, P. Nickl, G. M. Marconi, C. Bazan, R. Yokota, I. Gurevych, D. Cremers, M. E. Khan and T. Möllenhoff.
Variational Learning is Effective for Large Deep Networks.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL. GitHub.
MCML Authors
Yuesong Shen

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[950]
E. Sommer, L. Wimmer, T. Papamarkou, L. Bothmann, B. Bischl and D. Rügamer.
Connecting the Dots: Is Mode Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks’ parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a Bayesian deep ensemble approach as an effective solution with competitive performance and uncertainty quantification.

MCML Authors
Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[949]
D. Tramontano, Y. Kivva, S. Salehkaleybar, M. Drton and N. Kiyavash.
Causal Effect Identification in LiNGAM Models with Latent Confounders.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[948]
Y. Sun, J. Liu, Z. Wu, Z. Ding, Y. Ma, T. Seidl and V. Tresp.
SA-DQAS: Self-attention Enhanced Differentiable Quantum Architecture Search.
Workshop Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. PDF.
MCML Authors
Link to Yize Sun

Yize Sun

Database Systems & Data Mining

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[947]
U. Fischer Abaigar, C. Kern and F. Kreuter.
The Missing Link: Allocation Performance in Causal Machine Learning.
Workshop Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. arXiv. URL.
Abstract

Automated decision-making (ADM) systems are being deployed across a diverse range of critical problem areas such as social welfare and healthcare. Recent work highlights the importance of causal ML models in ADM systems, but implementing them in complex social environments poses significant challenges. Research on how these challenges impact the performance in specific downstream decision-making tasks is limited. Addressing this gap, we make use of a comprehensive real-world dataset of jobseekers to illustrate how the performance of a single CATE model can vary significantly across different decision-making scenarios and highlight the differential influence of challenges such as distribution shifts on predictions and allocations.

MCML Authors
Link to Unai Fischer Abaigar

Unai Fischer Abaigar

Social Data Science and AI Lab

Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[946]
S. Dandl, K. Blesch, T. Freiesleben, G. König, J. Kapar, B. Bischl and M. N. Wright.
CountARFactuals – Generating plausible model-agnostic counterfactual explanations with adversarial random forests.
2nd World Conference on Explainable Artificial Intelligence (xAI 2024). Valletta, Malta, Jul 17-19, 2024. DOI.
Abstract

Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. Giving insight into the model’s behavior, they hint users towards possible actions and give grounds for contesting decisions. As a crucial factor in achieving these goals, counterfactuals must be plausible, i.e., describing realistic alternative scenarios within the data manifold. This paper leverages a recently developed generative modeling technique – adversarial random forests (ARFs) – to efficiently generate plausible counterfactuals in a model-agnostic way. ARFs can serve as a plausibility measure or directly generate counterfactual explanations. Our ARF-based approach surpasses the limitations of existing methods that aim to generate plausible counterfactual explanations: It is easy to train and computationally highly efficient, handles continuous and categorical data naturally, and allows integrating additional desiderata such as sparsity in a straightforward manner.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[945]
F. K. Ewald, L. Bothmann, M. N. Wright, B. Bischl, G. Casalicchio and G. König.
A Guide to Feature Importance Methods for Scientific Inference.
2nd World Conference on Explainable Artificial Intelligence (xAI 2024). Valletta, Malta, Jul 17-19, 2024. DOI.
Abstract

While machine learning (ML) models are increasingly used due to their high predictive power, their use in understanding the data-generating process (DGP) is limited. Understanding the DGP requires insights into feature-target associations, which many ML models cannot directly provide due to their opaque internal mechanisms. Feature importance (FI) methods provide useful insights into the DGP under certain conditions. Since the results of different FI methods have different interpretations, selecting the correct FI method for a concrete use case is crucial and still requires expert knowledge. This paper serves as a comprehensive guide to help understand the different interpretations of global FI methods. Through an extensive review of FI methods and providing new proofs regarding their interpretation, we facilitate a thorough understanding of these methods and formulate concrete recommendations for scientific inference. We conclude by discussing options for FI uncertainty estimation and point to directions for future research aiming at full statistical inference from black-box ML models.
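As a concrete instance of the global FI methods the guide surveys, permutation feature importance measures the rise in loss after shuffling a single feature column. The toy model, data, and helper below are hypothetical, purely to illustrate the mechanism:

```python
import random

# Hypothetical toy "model": a fixed linear rule that uses feature 0 and ignores feature 1.
def model(rows):
    return [2.0 * x0 + 0.0 * x1 for x0, x1 in rows]

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(rows, y, feature_idx, n_repeats=20, seed=0):
    """Average rise in loss after shuffling one feature column."""
    rng = random.Random(seed)
    base = mse(y, model(rows))
    rises = []
    for _ in range(n_repeats):
        col = [r[feature_idx] for r in rows]
        rng.shuffle(col)
        permuted = [tuple(col[i] if j == feature_idx else value for j, value in enumerate(r))
                    for i, r in enumerate(rows)]
        rises.append(mse(y, model(permuted)) - base)
    return sum(rises) / len(rises)

rows = [(float(i), float(i % 3)) for i in range(30)]
y = model(rows)  # the model is perfect on this data, so the baseline loss is zero
imp0 = permutation_importance(rows, y, 0)
imp1 = permutation_importance(rows, y, 1)
print(imp0, imp1)  # imp0 is large; imp1 is exactly 0.0 (feature 1 is never used)
```

Because the model ignores feature 1 entirely, its importance is exactly zero — a statement about the model, which the paper carefully distinguishes from statements about the data-generating process.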

MCML Authors
Link to Fiona Ewald

Fiona Ewald

Statistical Learning & Data Science

Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[944]
P. Kolpaczki, G. Haselbeck and E. Hüllermeier.
How Much Can Stratification Improve the Approximation of Shapley Values?
2nd World Conference on Explainable Artificial Intelligence (xAI 2024). Valletta, Malta, Jul 17-19, 2024. DOI.
Abstract

Over the last decade, the Shapley value has become one of the most widely applied tools to provide post-hoc explanations for black box models. However, its theoretically justified solution to the problem of dividing a collective benefit among the members of a group, such as features or data points, comes at a price. Without strong assumptions, the exponential number of member subsets excludes an exact calculation of the Shapley value. In search of a remedy, recent works have demonstrated the efficacy of approximations based on sampling with stratification, in which the sample space is partitioned into smaller subpopulations. The effectiveness of this technique mainly depends on the degree to which the allocation of available samples over the formed strata mirrors their unknown variances. To uncover the hypothetical potential of stratification, we investigate the gap in approximation quality caused by the lack of knowledge of the optimal allocation. Moreover, we combine recent advances to propose two state-of-the-art algorithms, Adaptive SVARM and Continuous Adaptive SVARM, that adjust the sample allocation on-the-fly. The potential of our approach is assessed in an empirical evaluation.
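To make the stratification idea concrete, the sketch below estimates a Shapley value by sampling coalitions separately within each coalition-size stratum, using a uniform sample allocation (the paper's adaptive, variance-aware allocation is not shown); the three-player game is made up:

```python
import itertools
import math
import random

def v(S):
    """Toy cooperative game: the value of coalition S (a frozenset of players)."""
    return len(S) ** 2

def exact_shapley(i, players):
    """Exact Shapley value of player i via the weighted sum over all subsets."""
    others = [p for p in players if p != i]
    n = len(players)
    total = 0.0
    for k in range(n):
        for S in itertools.combinations(others, k):
            S = frozenset(S)
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            total += w * (v(S | {i}) - v(S))
    return total

def stratified_shapley(i, players, samples_per_stratum=200, seed=0):
    """Stratified Monte Carlo estimate: one stratum per coalition size,
    with a uniform (non-adaptive) sample allocation across strata."""
    rng = random.Random(seed)
    others = [p for p in players if p != i]
    n = len(players)
    est = 0.0
    for k in range(n):
        contrib = 0.0
        for _ in range(samples_per_stratum):
            S = frozenset(rng.sample(others, k))
            contrib += v(S | {i}) - v(S)
        est += contrib / samples_per_stratum / n  # each size stratum has weight 1/n
    return est

players = [0, 1, 2]
print(exact_shapley(0, players), stratified_shapley(0, players))  # both ~ 3.0
```

In this toy game the marginal contribution depends only on coalition size, so every stratum has zero variance and the stratified estimate matches the exact value — the degenerate best case that an optimal allocation aims to approach.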

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[943]
D. Rundel, J. Kobialka, C. von Crailsheim, M. Feurer, T. Nagler and D. Rügamer.
Interpretable Machine Learning for TabPFN.
2nd World Conference on Explainable Artificial Intelligence (xAI 2024). Valletta, Malta, Jul 17-19, 2024. DOI. GitHub.
Abstract

The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes. The TabPFN model, a special case of PFNs for tabular data, is able to achieve state-of-the-art performance on a variety of classification tasks while producing posterior predictive distributions in mere seconds by in-context learning without the need for learning parameters or hyperparameter tuning. This makes TabPFN a very attractive option for a wide range of domain applications. However, a major drawback of the method is its lack of interpretability. Therefore, we propose several adaptations of popular interpretability methods that we specifically design for TabPFN. By taking advantage of the unique properties of the model, our adaptations allow for more efficient computations than existing implementations. In particular, we show how in-context learning facilitates the estimation of Shapley values by avoiding approximate retraining and enables the use of Leave-One-Covariate-Out (LOCO) even when working with large-scale Transformers. In addition, we demonstrate how data valuation methods can be used to address scalability challenges of TabPFN.

MCML Authors
Link to David Rundel

David Rundel

Statistical Learning & Data Science

Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[942]
C. A. Scholbeck, H. Funk and G. Casalicchio.
Algorithm-Agnostic Feature Attributions for Clustering.
2nd World Conference on Explainable Artificial Intelligence (xAI 2024). Valletta, Malta, Jul 17-19, 2024. DOI.
Abstract

Understanding how assignments of instances to clusters can be attributed to the features can be vital in many applications. However, research to provide such feature attributions has been limited. Clustering algorithms with built-in explanations are scarce. Common algorithm-agnostic approaches involve dimension reduction and subsequent visualization, which transforms the original features used to cluster the data; or training a supervised learning classifier on the found cluster labels, which adds additional and intractable complexity. We present FACT (feature attributions for clustering), an algorithm-agnostic framework that preserves the integrity of the data and does not introduce additional models. As the defining characteristic of FACT, we introduce a set of work stages: sampling, intervention, reassignment, and aggregation. Furthermore, we propose two novel FACT methods: SMART (scoring metric after permutation) measures changes in cluster assignments by custom scoring functions after permuting selected features; IDEA (isolated effect on assignment) indicates local and global changes in cluster assignments after making uniform changes to selected features.
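A stripped-down version of the SMART idea — permute one feature and score how many cluster assignments change under a fixed nearest-centroid rule — might look as follows (centroids, data, and the 0/1 scoring are invented for illustration; the framework supports custom scoring functions):

```python
import random

# Hypothetical fixed centroids from a fitted clustering; assignment is nearest-centroid.
CENTROIDS = [(0.0, 0.0), (10.0, 0.0)]

def assign(points):
    def dist2(pt, c):
        return sum((p - q) ** 2 for p, q in zip(pt, c))
    return [min(range(len(CENTROIDS)), key=lambda c: dist2(pt, CENTROIDS[c]))
            for pt in points]

def smart_score(points, feature_idx, seed=0):
    """Fraction of points whose cluster assignment changes after permuting one feature."""
    rng = random.Random(seed)
    base = assign(points)
    col = [p[feature_idx] for p in points]
    rng.shuffle(col)
    permuted = [tuple(col[i] if j == feature_idx else x for j, x in enumerate(p))
                for i, p in enumerate(points)]
    changed = sum(a != b for a, b in zip(base, assign(permuted)))
    return changed / len(points)

# The two clusters separate along feature 0 only; feature 1 is pure noise.
points = [(0.5, 3.0), (0.2, -2.0), (9.8, 1.0), (10.3, -1.5)]
print(smart_score(points, 0))  # may exceed 0: feature 0 drives the clustering
print(smart_score(points, 1))  # 0.0: feature 1 never affects assignments
```

Permuting feature 1 reshuffles a coordinate on which both centroids agree, so no point is ever reassigned and its attribution is exactly zero.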

MCML Authors
Link to Christian Scholbeck

Christian Scholbeck

Statistical Learning & Data Science

Link to Henri Funk

Henri Funk

Statistical Consulting Unit (StaBLab)

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[941]
S. Dandl, M. Becker, B. Bischl, G. Casalicchio and L. Bothmann.
mlr3summary: Concise and interpretable summaries for machine learning models.
Demo Track of the 2nd World Conference on Explainable Artificial Intelligence (xAI 2024). Valletta, Malta, Jul 17-19, 2024. arXiv.
Abstract

This work introduces a novel R package for concise, informative summaries of machine learning models. We take inspiration from the summary function for (generalized) linear models in R, but extend it in several directions: First, our summary function is model-agnostic and provides a unified summary output also for non-parametric machine learning models; Second, the summary output is more extensive and customizable – it comprises information on the dataset, model performance, model complexity, model’s estimated feature importances, feature effects, and fairness metrics; Third, models are evaluated based on resampling strategies for unbiased estimates of model performances, feature importances, etc. Overall, the clear, structured output should help to enhance and expedite the model selection process, making it a helpful tool for practitioners and researchers alike.

MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[940]
C. Damke and E. Hüllermeier.
Linear Opinion Pooling for Uncertainty Quantification on Graphs.
40th Conference on Uncertainty in Artificial Intelligence (UAI 2024). Barcelona, Spain, Jul 16-18, 2024. URL. GitHub.
Abstract

We address the problem of uncertainty quantification for graph-structured data, or, more specifically, the problem to quantify the predictive uncertainty in (semi-supervised) node classification. Key questions in this regard concern the distinction between two different types of uncertainty, aleatoric and epistemic, and how to support uncertainty quantification by leveraging the structural information provided by the graph topology. Challenging assumptions and postulates of state-of-the-art methods, we propose a novel approach that represents (epistemic) uncertainty in terms of mixtures of Dirichlet distributions and refers to the established principle of linear opinion pooling for propagating information between neighbored nodes in the graph. The effectiveness of this approach is demonstrated in a series of experiments on a variety of graph-structured datasets.
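The pooling principle itself is simple: for plain categorical predictive distributions (rather than the mixtures of Dirichlet distributions used in the paper), a linear opinion pool is just a convex combination of the neighbors' opinions:

```python
def linear_opinion_pool(opinions, weights):
    """Convex combination of class-probability vectors (weights sum to 1)."""
    k = len(opinions[0])
    return [sum(w * p[c] for w, p in zip(weights, opinions)) for c in range(k)]

# Two neighbors' predictive distributions over three classes, equally trusted:
pooled = linear_opinion_pool([[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]], [0.5, 0.5])
print(pooled)  # [0.5, 0.35, 0.15] up to floating-point rounding
```

Since each input is a probability vector and the weights are convex, the pooled output is again a valid distribution; the paper applies the same principle at the level of Dirichlet mixtures to propagate epistemic uncertainty along graph edges.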

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[939]
L. Kook, P. Schiele, C. Kolb, D. Dold, M. Arpogaus, C. Fritz, P. Baumann, P. Kopper, T. Pielok, E. Dorigatti and D. Rügamer.
How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression.
40th Conference on Uncertainty in Artificial Intelligence (UAI 2024). Barcelona, Spain, Jul 16-18, 2024. URL.
Abstract

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.

MCML Authors
Link to Chris Kolb

Chris Kolb

Statistical Learning & Data Science

Link to Tobias Pielok

Tobias Pielok

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[938]
Y. Sale, P. Hofman, T. Löhr, L. Wimmer, T. Nagler and E. Hüllermeier.
Label-wise Aleatoric and Epistemic Uncertainty Quantification.
40th Conference on Uncertainty in Artificial Intelligence (UAI 2024). Barcelona, Spain, Jul 16-18, 2024. URL.
MCML Authors
Link to Paul Hofman

Paul Hofman

Artificial Intelligence & Machine Learning

Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[937]
T. Löhr, M. Ingrisch and E. Hüllermeier.
Towards Aleatoric and Epistemic Uncertainty in Medical Image Classification.
22nd International Conference on Artificial Intelligence in Medicine (AIME 2024). Salt Lake City, UT, USA, Jul 09-12, 2024. DOI.
Abstract

Medical domain applications require a detailed understanding of the decision making process, in particular when data-driven modeling via machine learning is involved, and quantifying uncertainty in the process adds trust and interpretability to predictive models. However, current uncertainty measures in medical imaging are mostly monolithic and do not distinguish between different sources and types of uncertainty. In this paper, we advocate the distinction between so-called aleatoric and epistemic uncertainty in the medical domain and illustrate its potential in clinical decision making for the case of PET/CT image classification.

MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[936]
S. Dandl, M. Becker, B. Bischl, G. Casalicchio and L. Bothmann.
mlr3summary: Concise and interpretable summaries for machine learning models.
International R User Conference (useR! 2024). Salzburg, Austria, Jul 08-22, 2024. arXiv. GitHub.
Abstract

This work introduces a novel R package for concise, informative summaries of machine learning models. We take inspiration from the summary function for (generalized) linear models in R, but extend it in several directions: First, our summary function is model-agnostic and provides a unified summary output also for non-parametric machine learning models; Second, the summary output is more extensive and customizable – it comprises information on the dataset, model performance, model complexity, model’s estimated feature importances, feature effects, and fairness metrics; Third, models are evaluated based on resampling strategies for unbiased estimates of model performances, feature importances, etc. Overall, the clear, structured output should help to enhance and expedite the model selection process, making it a helpful tool for practitioners and researchers alike.

MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[935]
S. Fischer and M. Binder.
mlr3torch - Deep Learning in R.
International R User Conference (useR! 2024). Salzburg, Austria, Jul 08-22, 2024. GitHub.
Abstract

mlr3torch is a deep learning framework for the mlr3 ecosystem built on top of torch. It allows users to easily build, train, and evaluate deep learning models in a few lines of code, without needing to worry about low-level details. Off-the-shelf learners are readily available, but custom architectures can be defined by connecting PipeOpTorch operators in an mlr3pipelines::Graph.

MCML Authors
Link to Sebastian Fischer

Sebastian Fischer

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[934]
B. Ronval, S. Nijssen and L. Bothmann.
Can generative AI-based data balancing mitigate unfairness issues in Machine Learning?
3rd European Workshop on Algorithmic Fairness (EWAF 2024). Mainz, Germany, Jul 01-03, 2024. URL.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[933]
J. Shin, M. A. Hedderich, B. J. Rey, A. Lucero and A. Oulasvirta.
Understanding Human-AI Workflows for Generating Personas.
ACM Conference on Designing Interactive Systems (DIS 2024). Copenhagen, Denmark, Jul 01-05, 2024. DOI.
Abstract

One barrier to deeper adoption of user-research methods is the amount of labor required to create high-quality representations of collected data. Trained user researchers need to analyze datasets and produce informative summaries pertaining to the original data. While Large Language Models (LLMs) could assist in generating summaries, they are known to hallucinate and produce biased responses. In this paper, we study human–AI workflows that differently delegate subtasks in user research between human experts and LLMs. Studying persona generation as our case, we found that LLMs are not good at capturing key characteristics of user data on their own. Better results are achieved when we leverage human skill in grouping user data by their key characteristics and exploit LLMs for summarizing pre-grouped data into personas. Personas generated via this collaborative approach can be more representative and empathy-evoking than ones generated by human experts or LLMs alone. We also found that LLMs could mimic generated personas and enable interaction with personas, thereby helping user researchers empathize with them. We conclude that LLMs, by facilitating the analysis of user data, may promote widespread application of qualitative methods in user research.

MCML Authors
Link to Michael Hedderich

Michael Hedderich

Dr.

Artificial Intelligence and Computational Linguistics


[932]
M. Windl and S. S. Feger.
Designing Interactive Privacy Labels for Advanced Smart Home Device Configuration Options.
ACM Conference on Designing Interactive Systems (DIS 2024). Copenhagen, Denmark, Jul 01-05, 2024. DOI.
Abstract

Labels inform smart home users about the privacy of devices before purchase and during use. Yet, current privacy labels fail to fully reflect the impact of advanced device configuration options like sensor state control. Based on the successful implementation of related privacy and security labels, we designed extended static and interactive labels that reflect sensor states and device connectivity. We first conducted expert interviews (N=10) that informed the final label design. Second, we ran an online survey (N=160) to assess the interpretation and usability of the novel interactive privacy label. Lastly, we conducted a second survey (N=120) to investigate how well our interactive labels educate users about sensor configuration. We found that most participants successfully used the interactive label and retrieved sensor information more efficiently and correctly. We discuss our findings in the context of a potential shift in label use toward control and use-case-based interaction.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media


[931]
W. Qiu, Y. Feng, Y. Li, Y. Chang, K. Qian, B. Hu, Y. Yamamoto and B. W. Schuller.
Fed-MStacking: Heterogeneous Federated Learning With Stacking Misaligned Labels for Abnormal Heart Sound Detection.
IEEE Journal of Biomedical and Health Informatics 28.9 (Jul. 2024). DOI.
Abstract

Ubiquitous sensing has been widely applied in smart healthcare, providing an opportunity for intelligent heart sound auscultation. However, smart devices contain sensitive information, raising user privacy concerns. To this end, federated learning (FL) has been adopted as an effective solution, enabling decentralised learning without data sharing, thus preserving data privacy in the Internet of Health Things (IoHT). Nevertheless, traditional FL requires the same architectural models to be trained across local clients and global servers, leading to a lack of model heterogeneity and client personalisation. For medical institutions with private data clients, this study proposes Fed-MStacking, a heterogeneous FL framework that incorporates a stacking ensemble learning strategy to support clients in building their own models. The secondary objective of this study is to address scenarios involving local clients with data characterised by inconsistent labelling. Specifically, the local client contains only one case type, and the data cannot be shared within or outside the institution. To train a global multi-class classifier, we aggregate missing class information from all clients at each institution and build meta-data, which then participates in FL training via a meta-learner. We apply the proposed framework to a multi-institutional heart sound database. The experiments utilise random forests (RFs), feedforward neural networks (FNNs), and convolutional neural networks (CNNs) as base classifiers. The results show that the heterogeneous stacking of local models performs better compared to homogeneous stacking.

MCML Authors
Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[930]
F. Fan, Y. Shi and X. Zhu.
Land Cover Classification From Sentinel-2 Images With Quantum-Classical Convolutional Neural Networks.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17 (Jul. 2024). DOI.
Abstract

Exploiting machine learning techniques to automatically classify multispectral remote sensing imagery plays a significant role in deriving changes on the Earth’s surface. However, the computation power required to manage large Earth observation data and apply sophisticated machine learning models for this analysis purpose has become an intractable bottleneck. Leveraging quantum computing provides a possibility to tackle this challenge in the future. This article focuses on land cover classification by analyzing Sentinel-2 images with quantum computing. Two hybrid quantum-classical deep learning frameworks are proposed. Both models exploit quantum computing to extract features efficiently from multispectral images and classical computing for final classification. As proof of concept, numerical simulation results on the LCZ42 dataset through the TensorFlow Quantum platform verify our models’ validity. The experiments indicate that our models can extract features more effectively compared with their classical counterparts, specifically, the convolutional neural network (CNN) model. Our models demonstrated improvements, with an average test accuracy increase of 4.5% and 3.3%, respectively, in comparison to the CNN model. In addition, our proposed models exhibit better transferability and robustness than CNN models.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[929]
Z. Xiong, S. Chen, Y. Shi and X. Zhu.
Self-Supervised Pretraining With Monocular Height Estimation for Semantic Segmentation.
IEEE Transactions on Geoscience and Remote Sensing 62 (Jul. 2024). DOI. GitHub.
Abstract

Monocular height estimation (MHE) is key for generating 3-D city models, essential for swift disaster response. Moving beyond the traditional focus on performance enhancement, our study breaks new ground by probing the interpretability of MHE networks. We have pioneeringly discovered that neurons within MHE models demonstrate selectivity for both height and semantic classes. This insight sheds light on the complex inner workings of MHE models and inspires innovative strategies for leveraging elevation data more effectively. Informed by this insight, we propose a pioneering framework that employs MHE as a self-supervised pretraining method for remote sensing (RS) imagery. This approach significantly enhances the performance of semantic segmentation tasks. Furthermore, we develop a disentangled latent transformer (DLT) module that leverages explainable deep representations from pretrained MHE networks for unsupervised semantic segmentation. Our method demonstrates the significant potential of MHE tasks in developing foundation models for sophisticated pixel-level semantic analyses. Additionally, we present a new dataset designed to benchmark the performance of both semantic segmentation and height estimation tasks.

MCML Authors
Link to Sining Chen

Sining Chen

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[928]
W. Yu, X. Zhang, S. Das, X. Zhu and P. Ghamisi.
MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification.
IEEE Transactions on Geoscience and Remote Sensing 62 (Jul. 2024). DOI. GitHub.
Abstract

Change detection (CD) from remote sensing (RS) images using deep learning has been widely investigated in the literature. It is typically regarded as a pixelwise labeling task that aims to classify each pixel as changed or unchanged. Although per-pixel classification networks in encoder-decoder structures have shown dominance, they still suffer from imprecise boundaries and incomplete object delineation at various scenes. For high-resolution RS images, partly or totally changed objects are more worthy of attention rather than a single pixel. Therefore, we revisit the CD task from the mask prediction and classification perspective and propose mask classification-based CD (MaskCD) to detect changed areas by adaptively generating categorized masks from input image pairs. Specifically, it utilizes a cross-level change representation perceiver (CLCRP) to learn multiscale change-aware representations and capture spatiotemporal relations from encoded features by exploiting deformable multihead self-attention (DeformMHSA). Subsequently, a masked cross-attention-based detection transformers (MCA-DETRs) decoder is developed to accurately locate and identify changed objects based on masked cross-attention and self-attention (SA) mechanisms. It reconstructs the desired changed objects by decoding the pixelwise representations into learnable mask proposals and making final predictions from these candidates. Experimental results on five benchmark datasets demonstrate the proposed approach outperforms other state-of-the-art models.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[927]
J. Beddrich, E. Chenchene, M. Fornasier, H. Huang and B. Wohlmuth.
Constrained Consensus-Based Optimization and Numerical Heuristics for the Few Particle Regime.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Consensus-based optimization (CBO) is a versatile multi-particle optimization method for performing nonconvex and nonsmooth global optimizations in high dimensions. Proofs of global convergence in probability have been achieved for a broad class of objective functions in unconstrained optimizations. In this work, we adapt the algorithm for solving constrained optimizations on compact and unbounded domains with boundary by leveraging emerging reflective boundary conditions. In particular, we close a relevant gap in the literature by providing a global convergence proof for the many-particle regime, including convergence rates. On the one hand, for the sake of minimizing running cost, it is desirable to keep the number of particles small. On the other hand, reducing the number of particles implies a diminished capability of exploration of the algorithm. Hence numerical heuristics are needed to ensure convergence of CBO in the few-particle regime. In this work, we also significantly improve the convergence and complexity of CBO by utilizing an adaptive region control mechanism and by choosing geometry-specific random noise. In particular, by combining a hierarchical noise structure with a multigrid finite element method, we are able to compute global minimizers for a constrained p-Allen-Cahn problem with obstacles, a very challenging variational problem.
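For orientation, a bare-bones unconstrained CBO loop — without the paper's reflective boundary conditions, adaptive region control, or hierarchical noise, and with illustrative hyperparameters — can be sketched as:

```python
import math
import random

def cbo_minimize(f, dim=2, n_particles=50, steps=400,
                 alpha=30.0, lam=1.0, sigma=0.7, dt=0.05, seed=0):
    """Plain unconstrained CBO: particles drift toward a weighted consensus point
    and carry multiplicative (anisotropic) noise that vanishes at consensus."""
    rng = random.Random(seed)
    X = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n_particles)]
    for _ in range(steps):
        w = [math.exp(-alpha * f(x)) for x in X]  # Gibbs-type weights favor low f
        tot = sum(w)
        c = [sum(wi * x[d] for wi, x in zip(w, X)) / tot for d in range(dim)]
        for x in X:
            for d in range(dim):
                diff = x[d] - c[d]
                x[d] += -lam * diff * dt + sigma * diff * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    # Recompute the consensus point of the final particle ensemble.
    w = [math.exp(-alpha * f(x)) for x in X]
    tot = sum(w)
    return [sum(wi * x[d] for wi, x in zip(w, X)) / tot for d in range(dim)]

sphere = lambda x: sum(t * t for t in x)
minimizer = cbo_minimize(sphere)
print(minimizer)  # close to the global minimizer [0.0, 0.0]
```

Because the noise is proportional to the distance from consensus, exploration shrinks as the ensemble collapses — which is precisely why, with few particles, the extra heuristics developed in the paper become necessary.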

MCML Authors
Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis


[926]
F. Bongratz, V. Golkov, L. Mautner, L. Della Libera, F. Heetmeyer, F. Czaja, J. Rodemann and D. Cremers.
How to Choose a Reinforcement-Learning Algorithm.
Preprint at arXiv (Jul. 2024). arXiv. GitHub.
Abstract

The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we streamline the process of choosing reinforcement-learning algorithms and action-distribution families. We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods.

MCML Authors
Link to Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[925]
H. Chen, D. Krompass, J. Gu and V. Tresp.
FedPop: Federated Population-based Hyperparameter Tuning.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their ’training-after-tuning’ framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both the client and server sides. Compared with prior tuning methods, FedPop employs an online ’tuning-while-training’ framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets, including full-sized Non-IID ImageNet-1K, demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP-tuning methods in FL.

MCML Authors
Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[924]
M. Dani, M. J. Prakash, Z. Akata and S. Liebe.
SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Large Language Models have shown promising results in their ability to encode general medical knowledge in standard medical question-answering datasets. However, their potential application in clinical practice requires evaluation in domain-specific tasks, where benchmarks are largely missing. In this study (semioLLM), we test the ability of state-of-the-art LLMs (GPT-3.5, GPT-4, Mixtral 8x7B, and Qwen-72chat) to leverage their internal knowledge and reasoning for epilepsy diagnosis. Specifically, we obtain likelihood estimates linking unstructured text descriptions of seizures to seizure-generating brain regions, using an annotated clinical database containing 1269 entries. We evaluate the LLM’s performance, confidence, reasoning, and citation abilities in comparison to clinical evaluation. Models achieve above-chance classification performance with prompt engineering significantly improving their outcome, with some models achieving close-to-clinical performance and reasoning. However, our analyses also reveal significant pitfalls with several models being overly confident while showing poor performance, as well as exhibiting citation errors and hallucinations. In summary, our work provides the first extensive benchmark comparing current SOTA LLMs in the medical domain of epilepsy and highlights their ability to leverage unstructured texts from patients’ medical history to aid diagnostic processes in health care.

MCML Authors
Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[923]
D. Frauen, K. Hess and S. Feuerriegel.
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Estimating heterogeneous treatment effects (HTEs) over time is crucial in many disciplines such as personalized medicine. For example, electronic health records are commonly collected over several time periods and then used to personalize treatment decisions. Existing works for this task have mostly focused on model-based learners (i.e., learners that adapt specific machine-learning models). In contrast, model-agnostic learners – so-called meta-learners – are largely unexplored. In our paper, we propose several meta-learners that are model-agnostic and thus can be used in combination with arbitrary machine learning models (e.g., transformers) to estimate HTEs over time. Here, our focus is on learners that can be obtained via weighted pseudo-outcome regressions, which allows for efficient estimation by targeting the treatment effect directly. We then provide a comprehensive theoretical analysis that characterizes the different learners and that allows us to offer insights into when specific learners are preferable. Finally, we confirm our theoretical insights through numerical experiments. In sum, while meta-learners are already state-of-the-art for the static setting, we are the first to propose a comprehensive set of meta-learners for estimating HTEs in the time-varying setting.

MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[922]
D. Geissler, A. Maarouf and S. Feuerriegel.
Analyzing User Characteristics of Hate Speech Spreaders on Social Media.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Hate speech on social media threatens the mental and physical well-being of individuals and contributes to real-world violence. Resharing is an important driver behind the spread of hate speech on social media. Yet, little is known about who reshares hate speech and what their characteristics are. In this paper, we analyze the role of user characteristics in hate speech resharing across different types of hate speech (e.g., political hate). For this, we proceed as follows: First, we cluster hate speech posts using large language models to identify different types of hate speech. Then we model the effects of user attributes on users’ probability to reshare hate speech using an explainable machine learning model. To do so, we apply debiasing to control for selection bias in our observational social media data and further control for the latent vulnerability of users to hate speech. We find that, all else equal, users with fewer followers, fewer friends, fewer posts, and older accounts share more hate speech. This shows that users with little social influence tend to share more hate speech. Further, we find substantial heterogeneity across different types of hate speech. For example, racist and misogynistic hate is spread mostly by users with little social influence. In contrast, political anti-Trump and anti-right-wing hate is reshared by users with larger social influence. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies.

MCML Authors
Link to Dominique Geißler

Dominique Geißler

Artificial Intelligence in Management

Link to Abdurahman Maarouf

Abdurahman Maarouf

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[921]
M. Gerczuk, S. Amiriparian, J. Lutz, W. Strube, I. Papazova, A. Hasan and B. W. Schuller.
Exploring Gender-Specific Speech Patterns in Automatic Suicide Risk Assessment.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

In emergency medicine, timely intervention for patients at risk of suicide is often hindered by delayed access to specialised psychiatric care. To bridge this gap, we introduce a speech-based approach for automatic suicide risk assessment. Our study involves a novel dataset comprising speech recordings of 20 patients who read neutral texts. We extract four speech representations encompassing interpretable and deep features. Further, we explore the impact of gender-based modelling and phrase-level normalisation. By applying gender-exclusive modelling, features extracted from an emotion fine-tuned wav2vec2.0 model can be utilised to discriminate high from low suicide risk with a balanced accuracy of 81%. Finally, our analysis reveals a discrepancy in the relationship between speech characteristics and suicide risk for female and male subjects. For men in our dataset, suicide risk increases together with agitation, while the voice characteristics of female subjects point in the opposite direction.
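Balanced accuracy, the metric reported above, matters for a small clinical sample with uneven class sizes. A minimal stand-alone implementation (with toy labels, not the study's data) shows why it is preferred over plain accuracy:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# With 4 high-risk and 1 low-risk subjects, always predicting "high"
# reaches 0.8 plain accuracy but only 0.5 balanced accuracy.
score = balanced_accuracy([1, 1, 1, 1, 0], [1, 1, 1, 1, 1])
```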

MCML Authors
Link to Maurice Gerczuk

Maurice Gerczuk

Health Informatics

Link to Shahin Amiriparian

Shahin Amiriparian

Dr.

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[920]
F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.
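The classical construction the paper improves on can be sketched numerically. The toy setup below assumes an identity population covariance (so the debiasing matrix M can be approximated by the identity) and a known noise level — both simplifying assumptions; the paper's contribution is precisely a data-driven correction for the bias term this asymptotic construction neglects.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse linear model y = X beta + noise, with known noise level sigma
# and identity population covariance -- both simplifying assumptions.
n, p, sigma = 800, 20, 1.0
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + sigma * rng.normal(size=n)

# LASSO via plain cyclic coordinate descent (illustrative, not tuned).
lam = 0.1
b = np.zeros(p)
col_sq = (X ** 2).sum(axis=0) / n
for _ in range(200):
    for j in range(p):
        r = y - X @ b + X[:, j] * b[j]        # partial residual
        z = X[:, j] @ r / n
        b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]

# Debiasing step: b_d = b + (1/n) M X^T (y - X b); M is taken to be the
# identity here because the population covariance is the identity.
b_d = b + X.T @ (y - X @ b) / n

# Asymptotic Gaussian 95% confidence intervals around the debiased estimate.
half = 1.96 * sigma / np.sqrt(n)
covered = float(np.mean((b_d - half <= beta) & (beta <= b_d + half)))
```

In real finite-sample problems the neglected bias term shrinks these intervals too aggressively, which is the failure mode the paper's non-asymptotic adjustment addresses.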

MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Hannah Laus

Hannah Laus

Optimization & Data Analysis

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[919]
T. Hummel, S. Karthik, M.-I. Georgescu and Z. Akata.
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval.
Preprint at arXiv (Jul. 2024). arXiv. GitHub.
Abstract

In Composed Video Retrieval, a video and a textual description which modifies the video content are provided as inputs to the model. The aim is to retrieve the relevant video with the modified content from a database of videos. In this challenging task, the first step is to acquire large-scale training datasets and collect high-quality benchmarks for evaluation. In this work, we introduce EgoCVR, a new evaluation benchmark for fine-grained Composed Video Retrieval using large-scale egocentric video datasets. EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding. We find that existing Composed Video Retrieval frameworks do not achieve the necessary high-quality temporal video understanding for this task. To address this shortcoming, we adapt a simple training-free method, propose a generic re-ranking framework for Composed Video Retrieval, and demonstrate that this achieves strong results on EgoCVR.
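The shortlist-then-re-rank pattern can be illustrated generically. The sketch below uses random vectors as stand-ins for video and text embeddings and is not the paper's EgoCVR pipeline or its models; it only shows the two-stage structure of a training-free re-ranking framework.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy gallery: 100 candidate "video" embeddings; a visual query and a
# "modification text" embedding (hypothetical stand-ins for real encoders).
gallery = rng.normal(size=(100, 32))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
target = 7                                       # index of the relevant video
query_visual = gallery[target] + 0.1 * rng.normal(size=32)
query_text = gallery[target] + 0.05 * rng.normal(size=32)

def rerank(gallery, q_vis, q_txt, k=20):
    """Stage 1: shortlist the top-k candidates by visual similarity.
    Stage 2: re-rank the shortlist by similarity to the text query."""
    vis_scores = gallery @ q_vis
    shortlist = np.argsort(-vis_scores)[:k]
    txt_scores = gallery[shortlist] @ q_txt
    return shortlist[np.argsort(-txt_scores)]

ranking = rerank(gallery, query_visual, query_text)
```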

MCML Authors
Link to Shyamgopal Karthik

Shyamgopal Karthik

Interpretable and Reliable Machine Learning

Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[918]
D. Köhler, D. Rügamer and M. Schmid.
Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount to ensure user acceptance and fairness. The need for interpretable ML models has boosted research in the field of interpretable machine learning (IML). Here we propose a novel approach for the functional decomposition of black-box predictions, which is considered a core concept of IML. The idea of our method is to replace the prediction function by a surrogate model consisting of simpler subfunctions. Similar to additive regression models, these functions provide insights into the direction and strength of the main feature contributions and their interactions. Our method is based on a novel concept termed stacked orthogonality, which ensures that the main effects capture as much functional behavior as possible and do not contain information explained by higher-order interactions. Unlike earlier functional IML approaches, it is neither affected by extrapolation nor by hidden feature interactions. To compute the subfunctions, we propose an algorithm based on neural additive modeling and an efficient post-hoc orthogonalization procedure.
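The classical grid-based functional ANOVA decomposition that the stacked-orthogonality idea refines can be written in a few lines. The black-box function below is a toy stand-in, and the averaging construction shown is the textbook baseline, not the paper's neural additive method.

```python
import numpy as np

# Evaluate a black-box prediction function on a uniform grid (toy stand-in).
x1 = np.linspace(-1.0, 1.0, 51)
x2 = np.linspace(-1.0, 1.0, 51)
G1, G2 = np.meshgrid(x1, x2, indexing="ij")
F = np.sin(2.0 * G1) + G2 ** 2 + 0.5 * G1 * G2

# Classical functional ANOVA decomposition by averaging:
f0 = F.mean()                               # intercept
f1 = F.mean(axis=1) - f0                    # main effect of x1
f2 = F.mean(axis=0) - f0                    # main effect of x2
f12 = F - f0 - f1[:, None] - f2[None, :]    # pure interaction term

# Each component is zero-mean (orthogonal), and they reconstruct F exactly.
```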

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[917]
P. Lin, A. F. T. Martins and H. Schütze.
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, and general-purpose tasks, e.g., text classification. Building upon these findings, our comprehensive study aims to identify the most effective strategies for leveraging parallel corpora. We investigate the impact of parallel corpora quality and quantity, training objectives, and model size on the performance of multilingual large language models enhanced with parallel corpora across diverse languages and tasks. Our analysis reveals several key insights: (i) filtering noisy translations is essential for effectively exploiting parallel corpora, while language identification and short sentence filtering have little effect; (ii) even a corpus containing just 10K parallel sentences can yield results comparable to those obtained from much larger datasets; (iii) employing only the machine translation objective yields the best results among various training objectives and their combinations; (iv) larger multilingual language models benefit more from parallel corpora than smaller models due to their stronger capacity for cross-task transfer. Our study offers valuable insights into the optimal utilization of parallel corpora to enhance multilingual large language models, extending the generalizability of previous findings from limited languages and tasks to a broader range of scenarios.

MCML Authors
Link to Peiqin Lin

Peiqin Lin

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[916]
C. Ma, Y. Liu, H. Ye and H. Schütze.
Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Decoder-only large language models (LLMs) excel in high-resource languages across various tasks through few-shot or even zero-shot in-context learning (ICL). However, their performance often does not transfer well to low-resource languages, especially those written in non-Latin scripts. Inspired by recent work that leverages transliteration in encoder-only models, we investigate whether transliteration is also effective in improving LLMs’ performance for low-resource languages written in non-Latin scripts. To this end, we propose three prompt templates, where the target-language text is represented in (1) its original script, (2) Latin script, or (3) both. We apply these methods to several representative LLMs of different sizes on various tasks including text classification and sequential labeling. Our findings show that the effectiveness of transliteration varies by task type and model size. For instance, all models benefit from transliterations for sequential labeling (with increases of up to 25%).
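The three template variants can be sketched as plain string construction. The tiny transliteration table below is a stand-in for a real transliteration tool, and the template wording is illustrative, not the paper's exact prompts.

```python
# Toy transliteration table -- a stand-in for a real transliteration tool.
CYR2LAT = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"}

def transliterate(text: str) -> str:
    return "".join(CYR2LAT.get(ch, ch) for ch in text)

def build_prompts(text: str) -> dict:
    """The three studied variants: original script, Latin script, or both."""
    latin = transliterate(text)
    return {
        "original": f"Text: {text}\nLabel:",
        "latin": f"Text: {latin}\nLabel:",
        "both": f"Text: {text}\nRomanized: {latin}\nLabel:",
    }

prompts = build_prompts("привет")
```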

MCML Authors
Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[915]
M. Schröder, D. Frauen, J. Schweisthal, K. Heß, V. Melnychuk and S. Feuerriegel.
Conformal Prediction for Causal Effects of Continuous Treatments.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Uncertainty quantification of causal effects is crucial for safety-critical applications such as personalized medicine. A powerful approach for this is conformal prediction, which has several practical benefits due to model-agnostic finite-sample guarantees. Yet, existing methods for conformal prediction of causal effects are limited to binary/discrete treatments and make highly restrictive assumptions such as known propensity scores. In this work, we provide a novel conformal prediction method for potential outcomes of continuous treatments. We account for the additional uncertainty introduced through propensity estimation so that our conformal prediction intervals are valid even if the propensity score is unknown. Our contributions are three-fold: (1) We derive finite-sample prediction intervals for potential outcomes of continuous treatments. (2) We provide an algorithm for calculating the derived intervals. (3) We demonstrate the effectiveness of the conformal prediction intervals in experiments on synthetic and real-world datasets. To the best of our knowledge, we are the first to propose conformal prediction for continuous treatments when the propensity score is unknown and must be estimated from data.
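The finite-sample guarantee rests on split conformal prediction, which the sketch below shows in its plain regression form on toy data; the paper's contribution is extending this building block to potential outcomes of continuous treatments with an estimated propensity, which the sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression data, split into a proper training and a calibration set.
n = 2000
x = rng.uniform(-3.0, 3.0, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)
x_tr, y_tr, x_cal, y_cal = x[:1000], y[:1000], x[1000:], y[1000:]

# Any fitted predictor works; a crude polynomial fit stands in here.
coef = np.polyfit(x_tr, y_tr, deg=5)

def predict(t):
    return np.polyval(coef, t)

# Split conformal: the empirical (1 - alpha) quantile of the calibration
# residuals yields a finite-sample-valid interval half-width q.
alpha = 0.1
scores = np.abs(y_cal - predict(x_cal))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = float(np.sort(scores)[k - 1])

# Prediction interval for a new point: [mu(x) - q, mu(x) + q].
x_new = 1.2
interval = (predict(x_new) - q, predict(x_new) + q)
```

The coverage guarantee holds no matter how poor the base predictor is, which is what makes the approach model-agnostic.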

MCML Authors
Link to Maresa Schröder

Maresa Schröder

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Jonas Schweisthal

Jonas Schweisthal

Artificial Intelligence in Management

Link to Konstantin Heß

Konstantin Heß

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[914]
F. Sergeev, P. Malsot, G. Rätsch and V. Fortuin.
Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information.
Preprint at arXiv (Jul. 2024). arXiv.
Abstract

Knowing which features of a multivariate time series to measure and when is a key task in medicine, wearables, and robotics. Better acquisition policies can reduce costs while maintaining or even improving the performance of downstream predictors. Inspired by the maximization of conditional mutual information, we propose an approach to train acquirers end-to-end using only the downstream loss. We show that our method outperforms a random acquisition policy and matches a model with an unrestrained budget, but does not yet overtake a static acquisition strategy. We highlight the assumptions and outline avenues for future work.
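The paper trains acquirers end-to-end through the downstream loss; the sketch below only illustrates the underlying information-gain intuition with a plain (unconditional) mutual information estimate on toy binary features, which is a deliberate simplification of the conditional quantity the method targets.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in nats) between discrete arrays."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

rng = np.random.default_rng(5)
label = rng.integers(0, 2, size=5000)
informative = label ^ (rng.random(5000) < 0.1)    # noisy copy of the label
irrelevant = rng.integers(0, 2, size=5000)        # independent noise

# Greedy acquisition: measure the feature with the larger estimated MI.
gain = {"informative": mutual_info(informative, label),
        "irrelevant": mutual_info(irrelevant, label)}
```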

MCML Authors
Link to Vincent Fortuin

Vincent Fortuin

Dr.

Associate

Bayesian Deep Learning


[913]
L. von der Heyde, A.-C. Haensch and A. Wenz.
Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion.
Preprint at arXiv (Jul. 2024). arXiv.
MCML Authors
Link to Leah von der Heyde

Leah von der Heyde

Social Data Science and AI Lab


[912]
Y. Xia, R. Ding, Z. Qin, G. Zhan, K. Zhou, L. Yang, H. Dong and D. Cremers.
TARGO: Benchmarking Target-driven Object Grasping under Occlusions.
Preprint at arXiv (Jul. 2024). arXiv. GitHub.
Abstract

Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object’s grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) We are the first to study the occlusion level of grasping. 2) We set up an evaluation benchmark consisting of large-scale synthetic data and part of real-world data, and we evaluated five grasp models and found that even the current SOTA model suffers when the occlusion level increases, leaving grasping under occlusion still a challenge. 3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost the performance of grasping under occlusion and generalized to the real world. 4) We further propose a transformer-based grasping model involving a shape completion module, termed TARGO-Net, which performs most robustly as occlusion increases.

MCML Authors
Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[911]
M. Fornasier, T. Klock and K. Riedl.
Consensus-Based Optimization Methods Converge Globally.
SIAM Journal on Optimization 34.3 (Jul. 2024). DOI.
Abstract

In this paper we study consensus-based optimization (CBO), which is a multiagent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. Based on an experimentally supported intuition that, on average, CBO performs a gradient descent of the squared Euclidean distance to the global minimizer, we devise a novel technique for proving the convergence to the global minimizer in mean-field law for a rich class of objective functions. The result unveils internal mechanisms of CBO that are responsible for the success of the method. In particular, we prove that CBO performs a convexification of a large class of optimization problems as the number of optimizing agents goes to infinity. Furthermore, we improve prior analyses by requiring mild assumptions about the initialization of the method and by covering objectives that are merely locally Lipschitz continuous. As a core component of this analysis, we establish a quantitative nonasymptotic Laplace principle, which may be of independent interest. From the result of CBO convergence in mean-field law, it becomes apparent that the hardness of any global optimization problem is necessarily encoded in the rate of the mean-field approximation, for which we provide a novel probabilistic quantitative estimate. The combination of these results allows us to obtain probabilistic global convergence guarantees of the numerical CBO method.
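The CBO dynamics analyzed here are simple to state: particles drift toward a Gibbs-weighted consensus point and diffuse with a noise term proportional to their distance from it. The sketch below is a minimal isotropic Euler-Maruyama discretization on a toy 1D nonconvex objective; the parameter choices are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(6)

def f(x):
    """Nonconvex 1D objective; the global minimizer sits near x = 2."""
    return (x - 2.0) ** 2 + 0.5 * np.sin(5.0 * x) + 0.5

def consensus_point(x, alpha):
    w = np.exp(-alpha * (f(x) - f(x).min()))   # stabilized Gibbs weights
    return np.sum(w * x) / np.sum(w)

# Isotropic CBO, discretized with an Euler-Maruyama scheme.
N, steps, dt = 200, 400, 0.02
alpha, lam, sigma = 50.0, 1.0, 0.7
x = rng.uniform(-5.0, 5.0, size=N)

for _ in range(steps):
    v = consensus_point(x, alpha)
    drift = -lam * (x - v) * dt                            # pull to consensus
    noise = sigma * np.abs(x - v) * np.sqrt(dt) * rng.normal(size=N)
    x = x + drift + noise

minimizer_estimate = consensus_point(x, alpha)
```

As the ensemble concentrates, the multiplicative noise vanishes, which is the mechanism behind the mean-field convergence to the global minimizer studied in the paper.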

MCML Authors
Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis


[910]
F. Quinzan, C. Casolo, K. Muandet, Y. Luo and N. Kilbertus.
Learning Counterfactually Invariant Predictors.
Transactions on Machine Learning Research (Jul. 2024). URL.
Abstract

Notions of counterfactual invariance (CI) have proven essential for predictors that are fair, robust, and generalizable in the real world. We propose graphical criteria that yield a sufficient condition for a predictor to be counterfactually invariant in terms of a conditional independence in the observational distribution. In order to learn such predictors, we propose a model-agnostic framework, called Counterfactually Invariant Prediction (CIP), building on the Hilbert-Schmidt Conditional Independence Criterion (HSCIC), a kernel-based conditional dependence measure. Our experimental results demonstrate the effectiveness of CIP in enforcing counterfactual invariance across various simulated and real-world datasets including scalar and multi-variate settings.

MCML Authors
Link to Cecilia Casolo

Cecilia Casolo

Ethics in Systems Design and Machine Learning

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[909]
F. Karl, J. Thomas, J. Elstner, R. Gross and B. Bischl.
Automated Machine Learning.
Unlocking Artificial Intelligence (Jul. 2024). DOI.
Abstract

In the past few years automated machine learning (AutoML) has gained a lot of traction in the data science and machine learning community. AutoML aims at reducing the partly repetitive work of data scientists and enabling domain experts to construct machine learning pipelines without extensive knowledge in data science. This chapter presents a comprehensive review of the current leading AutoML methods and sets AutoML in an industrial context. To this end we present the typical components of an AutoML system, give an overview of the state-of-the-art and highlight challenges to industrial application by presenting several important topics such as AutoML for time series data, AutoML in unsupervised settings, AutoML with multiple evaluation criteria, or interactive human-in-the-loop methods. Finally, the connection to Neural Architecture Search (NAS) is presented and a brief review with special emphasis on hardware-aware NAS is given.

MCML Authors
Link to Florian Karl

Florian Karl

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[908]
C. M. Verdun, O. Melnyk, F. Krahmer and P. Jung.
Fast, blind, and accurate: Tuning-free sparse regression with global linear convergence.
37th Annual Conference on Learning Theory (COLT 2024). Edmonton, Canada, Jun 30-Jul 03, 2024. URL.
MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[907]
F. Fan, Y. Shi and X. Zhu.
Urban Land Cover Classification with Efficient Hybrid Quantum Machine Learning Model.
IEEE Congress on Evolutionary Computation (CEC 2024). Yokohama, Japan, Jun 30-Jul 05, 2024. DOI.
Abstract

Urban land cover classification aims to derive crucial information from earth observation data and categorize it into specific land uses. To achieve accurate classification, sophisticated machine learning models trained with large earth observation data are employed, but the required computation power has become a bottleneck. Quantum computing might tackle this challenge in the future. However, representing images into quantum states for analysis with quantum computing is challenging due to the high demand for quantum resources. To tackle this challenge, we propose a hybrid quantum neural network that can effectively represent and classify remote sensing imagery with reduced quantum resources. Our model was evaluated on the Local Climate Zone (LCZ)-based land cover classification task using the TensorFlow Quantum platform, and the experimental results indicate its validity for accurate urban land cover classification.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[906]
C. Naumzik, A. Kongsted, W. Vach and S. Feuerriegel.
Data-driven subgrouping of patient trajectories with chronic diseases: Evidence from low back pain.
5th AHLI Conference on Health, Inference, and Learning (CHIL 2024). New York City, NY, USA, Jun 27-28, 2024. URL.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[905]
C. Cipriani, A. Scagliotti and T. Wöhrer.
A minimax optimal control approach for robust neural ODEs.
European Control Conference (ECC 2024). Stockholm, Sweden, Jun 25-28, 2024. DOI.
Abstract

In this paper, we address the adversarial training of neural ODEs from a robust control perspective. This is an alternative to the classical training via empirical risk minimization, and it is widely used to enforce reliable outcomes for input perturbations. Neural ODEs allow the interpretation of deep neural networks as discretizations of control systems, unlocking powerful tools from control theory for the development and the understanding of machine learning. In this specific case, we formulate the adversarial training with perturbed data as a minimax optimal control problem, for which we derive first order optimality conditions in the form of Pontryagin’s Maximum Principle. We provide a novel interpretation of robust training leading to an alternative weighted technique, which we test on a low-dimensional classification task.

MCML Authors
Link to Cristina Cipriani

Cristina Cipriani

Applied Numerical Analysis

Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[904]
P. Piccirilli, A. Fraser and S. Schulte im Walde.
VOLIMET: A Parallel Corpus of Literal and Metaphorical Verb-Object Pairs for English–German and English–French.
13th Joint Conference on Lexical and Computational Semantics (*SEM 2024) co-located with NAACL 2024. Mexico City, Mexico, Jun 20-21, 2024. DOI.
Abstract

The interplay of cultural and linguistic elements that characterizes metaphorical language poses a substantial challenge for both human comprehension and machine processing. This challenge goes beyond monolingual settings and becomes particularly complex in translation, even more so in automatic translation. We present VOLIMET, a corpus of 2,916 parallel sentences containing gold standard alignments of metaphorical verb-object pairs and their literal paraphrases, e.g., tackle/address question, from English to German and French. On the one hand, the parallel nature of our corpus enables us to explore monolingual patterns for metaphorical vs. literal uses in English. On the other hand, we investigate different aspects of cross-lingual translations into German and French and the extent to which metaphoricity and literalness in the source language are transferred to the target languages. Monolingually, our findings reveal clear preferences in using metaphorical or literal uses of verb-object pairs. Cross-lingually, we observe a rich variability in translations as well as different behaviors for our two target languages.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[903]
H. Chen, J. Büssing, D. Rügamer and E. Nie.
Leveraging (Sentence) Transformer Models with Contrastive Learning for Identifying Machine-Generated Text.
18th International Workshop on Semantic Evaluation (SemEval 2024) co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 20-21, 2024. URL.
Abstract

This paper outlines our approach to SemEval-2024 Task 8 (Subtask B), which focuses on discerning machine-generated text from human-written content, while also identifying the text sources, i.e., from which Large Language Model (LLM) the target text is generated. Our detection system is built upon Transformer-based techniques, leveraging various pre-trained language models (PLMs), including sentence transformer models. Additionally, we incorporate Contrastive Learning (CL) into the classifier to improve its detection capabilities and employ data augmentation methods. Ultimately, our system achieves a peak accuracy of 76.96% on the test set of the competition, configured using a sentence transformer model integrated with the CL methodology.
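One standard formulation of the contrastive objective used in such systems is the NT-Xent loss, sketched below in plain numpy on toy embeddings; the paper's exact loss and training setup may differ.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss for paired embeddings (rows of z1 and z2)."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(z1)
    loss = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)              # index of i's positive partner
        logits = np.delete(sim[i], i)      # drop the self-similarity term
        target = j if j < i else j - 1
        loss += -logits[target] + np.log(np.sum(np.exp(logits)))
    return loss / (2 * n)

rng = np.random.default_rng(7)
loss_aligned = nt_xent(np.eye(3), np.eye(3))     # positives coincide exactly
loss_random = nt_xent(rng.normal(size=(3, 8)), rng.normal(size=(3, 8)))
```

Pulling paired views together and pushing other samples apart tightens the embedding space, which is what improves the downstream detection classifier.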

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning


[902]
S. Zhou, H. Shan, B. Plank and R. Litschko.
MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness.
18th International Workshop on Semantic Evaluation (SemEval 2024) co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 20-21, 2024. URL.
Abstract

This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. The task aims to detect the semantic relatedness of two sentences from the same language. For the cross-lingual approach we developed a set of linguistics-inspired models trained with several task-specific strategies. We 1) utilize language vectors for the selection of donor languages; 2) investigate the multi-source approach for training; 3) use transliteration of non-Latin scripts to study the impact of the ‘script gap’; 4) opt for machine translation for data augmentation. We additionally compare the performance of XLM-RoBERTa and Furina with the same training strategy. Our submission achieved first place on the C8 (Kinyarwanda) test set.

MCML Authors
Link to Shijia Zhou

Shijia Zhou

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Robert Litschko

Robert Litschko

Artificial Intelligence and Computational Linguistics


[901]
M. Brahimi, B. Haefner, Z. Ye, B. Goldluecke and D. Cremers.
Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
Abstract

Neural approaches have shown a significant progress on camera-based reconstruction. But they require either a fairly dense sampling of the viewing sphere, or pre-training on an existing dataset, thereby limiting their generalizability. In contrast, photometric stereo (PS) approaches have shown great potential for achieving high-quality reconstruction under sparse viewpoints. Yet, they are impractical because they typically require tedious laboratory conditions, are restricted to dark rooms, and often multi-staged, making them subject to accumulated errors. To address these shortcomings, we propose an end-to-end uncalibrated multi-view PS framework for reconstructing high-resolution shapes acquired from sparse viewpoints in a real-world environment. We relax the dark room assumption, and allow a combination of static ambient lighting and dynamic near LED lighting, thereby enabling easy data capture outside the lab. Experimental validation confirms that it outperforms existing baseline approaches in the regime of sparse viewpoints by a large margin. This allows to bring high-accuracy 3D reconstruction from the dark room to the real world, while maintaining a reasonable data capture complexity.

MCML Authors
Link to Zhenzhang Ye

Zhenzhang Ye

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[900]
Y. Chen, Y. Di, G. Zhai, F. Manhardt, C. Zhang, R. Zhang, F. Tombari, N. Navab and B. Busam.
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
Abstract

Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with DINOv2 features to establish a consistent object representation under SE(3) transformations, facilitating the mapping from camera space to the pre-defined canonical space, thus further enhancing pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state-of-the-art. Moreover, on a more complex dataset HouseCat6D which provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin.

MCML Authors
Link to Guangyao Zhai

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Benjamin Busam

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[899]
V. Ehm, M. Gao, P. Eisenberger, D. Cremers and F. Bernard.
Partial-to-Partial Shape Matching with Geometric Consistency.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI. GitHub.
Abstract

Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. A prominent challenge is the partial-to-partial shape matching setting, which occurs when the shapes to match are only observed incompletely (e.g. from 3D scanning). Although partial-to-partial matching is a highly relevant setting in practice, it is rarely explored. Our work bridges the gap between existing (rather artificial) 3D full shape matching and partial-to-partial real-world settings by exploiting geometric consistency as a strong constraint. We demonstrate that it is indeed possible to solve this challenging problem in a variety of settings. For the first time, we achieve geometric consistency for partial-to-partial matching, which is realized by a novel integer non-linear program formalism building on triangle product spaces, along with a new pruning algorithm based on linear integer programming. Further, we generate a new inter-class dataset for partial-to-partial shape matching. We show that our method outperforms current SOTA methods on both an established intra-class dataset and our novel inter-class dataset.

MCML Authors
Link to Viktoria Ehm

Viktoria Ehm

Computer Vision & Artificial Intelligence

Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[898]
M. Ghahremani, M. Khateri, B. Jian, B. Wiestler, E. Adeli and C. Wachinger.
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Morteza Ghahremani

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to Bailiang Jian

Bailiang Jian

Artificial Intelligence in Radiology

Link to Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[897]
K. Han, D. Muhle, F. Wimbauer and D. Cremers.
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Dominik Muhle

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to Felix Wimbauer

Felix Wimbauer

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[896]
J. Huang, H. Yu, K.-T. Yu, N. Navab, S. Ilic and B. Busam.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Junwen Huang

Junwen Huang

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Benjamin Busam

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[895]
H. Jung, S.-C. Wu, P. Ruhkamp, G. Zhai, H. Schieber, G. Rizzoli, P. Wang, H. Zhao, L. Garattoni, D. Roth, S. Meier, N. Navab and B. Busam.
HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Guangyao Zhai

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Benjamin Busam

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[894]
H. Li, C. Shen, P. Torr, V. Tresp and J. Gu.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI. GitHub.
Abstract

Diffusion-based models have gained significant popularity for text-to-image generation due to their exceptional image-generation capabilities. A risk with these models is the potential generation of inappropriate content, such as biased or harmful images. However, the underlying reasons for generating such undesired content from the perspective of the diffusion model’s internal representation remain unclear. Previous work interprets vectors in an interpretable latent space of diffusion models as semantic concepts. However, existing approaches cannot discover directions for arbitrary concepts, such as those related to inappropriate concepts. In this work, we propose a novel self-supervised approach to find interpretable latent directions for a given concept. With the discovered vectors, we further propose a simple approach to mitigate inappropriate generation. Extensive experiments have been conducted to verify the effectiveness of our mitigation approach, namely, for fair generation, safe generation, and responsible text-enhancing generation.

MCML Authors
Link to Hang Li

Hang Li

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[893]
S. Weber, T. Dagès, M. Gao and D. Cremers.
Finsler-Laplace-Beltrami Operators with Application to Shape Analysis.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Simon Weber

Simon Weber

Computer Vision & Artificial Intelligence

Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[892]
S. Weber, B. Zöngür, N. Araslanov and D. Cremers.
Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Simon Weber

Simon Weber

Computer Vision & Artificial Intelligence

Link to Nikita Araslanov

Nikita Araslanov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[891]
F. Wimbauer, B. Wu, E. Schoenfeld, X. Dai, J. Hou, Z. He, A. Sanakoyeu, P. Zhang, S. Tsai, J. Kohler, C. Rupprecht, D. Cremers, P. Vajda and J. Wang.
Cache Me if You Can: Accelerating Diffusion Models through Block Caching.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
MCML Authors
Link to Felix Wimbauer

Felix Wimbauer

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[890]
Y. Xia, L. Shi, Z. Ding, J. F. Henriques and D. Cremers.
Text2Loc: 3D Point Cloud Localization from Natural Language.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI. GitHub.
MCML Authors
Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[889]
I. Obadic, A. Levering, L. Pennig, D. Oliveira, D. Marcos and X. Zhu.
Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes.
Workshop at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. DOI.
Abstract

Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the interplay between representation learning using an additional task-specific contrastive loss and post-hoc concept explainability for socioeconomic studies. Our results on two different geographical locations and tasks indicate that the task-specific pretraining imposes a continuous ordering of the latent space embeddings according to the socioeconomic outcomes. This improves the model’s interpretability as it enables the latent space of the model to associate urban concepts with continuous intervals of socioeconomic outcomes. Further, we illustrate how analyzing the model’s conceptual sensitivity for the intervals of socioeconomic outcomes can shed light on new insights for urban studies.

MCML Authors
Link to Ivica Obadic

Ivica Obadic

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[888]
C. Reich, O. Hahn, D. Cremers, S. Roth and B. Debnath.
A Perspective on Deep Vision Performance with Standard Image and Video Codecs.
Workshop at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Seattle, WA, USA, Jun 17-21, 2024. PDF.
MCML Authors
Link to Christoph Reich

Christoph Reich

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[887]
H. Ye, Y. Liu, C. Ma and H. Schütze.
MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer.
5th Workshop on Insights from Negative Results in NLP at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Transformer-based pre-trained language models (PLMs) have achieved remarkable performance in various natural language processing (NLP) tasks. However, pre-training such models can take considerable resources that are almost only available to high-resource languages. In contrast, static word embeddings are easier to train in terms of computing resources and the amount of data required. In this paper, we introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer), a novel and challenging task that is especially relevant to low-resource languages for which static word embeddings are available. To tackle the task, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language. In this way, we can train the PLM on source-language training data and perform zero-shot transfer to the target language by simply swapping the embedding layer. However, through extensive experiments on two classification datasets, we show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines. In this paper, we attempt to explain this negative result and provide several thoughts on possible improvement.

MCML Authors
Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[886]
P. Resnik, B. Ma, A. Hoyle, P. Goel, R. Sarkar, M. Gearing, A.-C. Haensch and F. Kreuter.
TOPCAT: Topic-Oriented Protocol for Content Analysis of Text – A Preliminary Study.
6th Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024) at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Identifying constructs in text data is a labor-intensive task in social science research. Despite the potential richness of open-ended survey responses, the complexity of analyzing them often leads researchers to underutilize or ignore them entirely. While topic modeling offers a technological solution, qualitative researchers may remain skeptical of its rigor. In this paper, we introduce TOPCAT: Topic-Oriented Protocol for Content Analysis of Text, a systematic approach that integrates off-the-shelf topic modeling with human decision-making and curation. Our method aims to provide a viable solution for topicalizing open-ended responses in survey research, ensuring both efficiency and trustworthiness. We present the TOPCAT protocol, define an evaluation process, and demonstrate its effectiveness using open-ended responses from a U.S. survey on COVID-19 impact. Our findings suggest that TOPCAT enables efficient and rigorous qualitative analysis, offering a promising avenue for future research in this domain. Furthermore, our findings challenge the adequacy of expert coding schemes as "gold" standards, emphasizing the subjectivity inherent in qualitative content interpretation.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[885]
Y. Zhang, V. Hangya and A. Fraser.
A Study of the Class Imbalance Problem in Abusive Language Detection.
8th Workshop on Online Abuse and Harms (WOAH 2024) at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. DOI.
Abstract

Abusive language detection has drawn increasing interest in recent years. However, a less systematically explored obstacle is label imbalance, i.e., the amount of abusive data is much lower than non-abusive data, leading to performance issues. The aim of this work is to conduct a comprehensive comparative study of popular methods for addressing the class imbalance issue. We explore 10 well-known approaches on 8 datasets with distinct characteristics: binary or multi-class, moderately or largely imbalanced, focusing on various types of abuse, etc. Additionally, we propose two novel methods specialized for abuse detection: AbusiveLexiconAug and ExternalDataAug, which enrich the training data using abusive lexicons and external abusive datasets, respectively. We conclude that: 1) our AbusiveLexiconAug approach, random oversampling, and focal loss are the most versatile methods on various datasets; 2) focal loss tends to yield peak model performance; 3) oversampling and focal loss provide promising results for binary datasets and small multi-class sets, while undersampling and weighted cross-entropy are more suitable for large multi-class sets; 4) most methods are sensitive to hyperparameters, yet our suggested choice of hyperparameters provides a good starting point.
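One of the study's most versatile remedies, focal loss, down-weights well-classified examples so that the rare abusive class dominates training. A minimal sketch of binary focal loss following the standard formulation; the `gamma` and `alpha` values are common illustrative defaults, not the paper's tuned hyperparameters:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive (abusive) class.
    y: true label (1 = abusive, 0 = non-abusive).
    The (1 - pt)^gamma factor shrinks the loss of easy examples,
    so training gradients concentrate on the rare, hard class.
    """
    pt = p if y == 1 else 1.0 - p           # probability of the true class
    weight = alpha if y == 1 else 1.0 - alpha
    return -weight * (1.0 - pt) ** gamma * math.log(pt)

# A confident, correct prediction contributes almost nothing to the loss...
easy = focal_loss(0.95, 1)
# ...while a misclassified minority-class example dominates it.
hard = focal_loss(0.05, 1)
print(easy < hard)  # True
```

With `gamma=0` and `alpha=0.5` this reduces (up to a constant factor) to ordinary cross-entropy, which is why focal loss is often described as a generalization of it.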

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[884]
B. Deiseroth, M. Meuer, N. Gritsch, C. Eichenberg, P. Schramowski, M. Aßenmacher and K. Kersting.
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. DOI.
Abstract

Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach to assessing compressed LLMs, addressing the limitations of traditional perplexity or accuracy measures that fail to accurately reflect text generation quality. DTMs measure token divergences that allow deeper insights into the subtleties of model compression, in particular, when evaluating components’ impacts individually. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that 25% of all attention components can be pruned beyond 90% on the Llama-2 model family, still keeping SOTA performance. For quantization, FDTM suggests that more than 80% of parameters can be naively transformed to int8 without special outlier management. These evaluations indicate the necessity of choosing appropriate compressions for parameters individually—and that FDTM can identify those—while standard metrics result in deteriorated outcomes.
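The First Divergent Token Metric can be read as the position at which a compressed model's greedy decoding first departs from the reference model's output: the later the divergence, the better the compression preserved generation quality. A simplified sketch of that idea; the function name and token-list interface are hypothetical, not the paper's implementation:

```python
def first_divergent_token(reference, compressed):
    """Index of the first token where the compressed model's greedy
    output differs from the reference model's output."""
    for i, (a, b) in enumerate(zip(reference, compressed)):
        if a != b:
            return i
    # No divergence within the compared prefix.
    return min(len(reference), len(compressed))

ref = ["The", "cat", "sat", "on", "the", "mat"]
out = ["The", "cat", "sat", "on", "a", "rug"]
print(first_divergent_token(ref, out))  # 4
```

Averaged over many prompts, such a position-of-first-divergence score ranks candidate compressions (pruned components, quantized parameters) far more sensitively than aggregate perplexity, which can mask early divergences.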

MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[883]
Z. Ding, H. Cai, J. Wu, Y. Ma, R. Liao, B. Xiong and V. Tresp.
zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a heated topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasting (TKGF) benchmarks, they face a strong challenge in modeling the unseen zero-shot relations that have no prior graph context. In this paper, we try to mitigate this problem as follows. We first input the text descriptions of KG relations into large language models (LLMs) for generating relation representations, and then introduce them into embedding-based TKGF methods. LLM-empowered representations can capture the semantic information in the relation descriptions. This makes the relations, whether seen or unseen, with similar semantic meanings stay close in the embedding space, enabling TKGF models to recognize zero-shot relations even without any observed graph context. Experimental results show that our approach helps TKGF models to achieve much better performance in forecasting the facts with previously unseen relations, while still maintaining their ability in link forecasting regarding seen relations.

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[882]
R. Liao, X. Jia, Y. Li, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL. GitHub.
Abstract

The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods dominate. It remains an open question whether pre-trained LLMs can understand structured temporal relational data and replace these methods as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges arise from the huge chasm between complex temporal graph data structures and the sequential natural expressions LLMs can handle, and between the enormous data sizes of tKGs and the heavy computation costs of finetuning LLMs. To address these two challenges, we propose GenTKG, a novel retrieval-augmented generation framework combining a temporal logical rule-based retrieval strategy and few-shot parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting with low computation resources, using extremely limited training data of as few as 16 samples. GenTKG also exhibits remarkable cross-domain generalizability, outperforming on unseen datasets without re-training, and in-domain generalizability regardless of the time split within the same dataset. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.

MCML Authors
Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[881]
M. Wang, H. Adel, L. Lange, J. Strötgen and H. Schütze.
Rehearsal-Free Modular and Compositional Continual Learning for Language Models.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Continual learning aims at incrementally acquiring new knowledge while not forgetting existing knowledge. To overcome catastrophic forgetting, methods are either rehearsal-based, i.e., they store data examples from previous tasks for data replay, or they isolate parameters dedicated to each task. However, rehearsal-based methods raise privacy and memory issues, and parameter-isolation continual learning does not consider interaction between tasks, thus hindering knowledge transfer. In this work, we propose MoCL, a rehearsal-free Modular and Compositional Continual Learning framework which continually adds new modules to language models and composes them with existing modules. Experiments on various benchmarks show that MoCL outperforms the state of the art and effectively facilitates knowledge transfer.

MCML Authors
Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[880]
Y. Liu, P. Lin, M. Wang and H. Schütze.
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining.
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining. However, this method usually randomly initializes the embeddings of new subwords and introduces substantially more embedding parameters to the model, thus weakening the efficiency. To address these issues, we propose a novel framework: One For All (OFA), which wisely initializes the embeddings of unseen subwords and thus can adapt a PLM to multiple languages efficiently and effectively. OFA takes advantage of external well-aligned multilingual static word vectors and injects the alignment knowledge into the subword embeddings. In addition, OFA applies matrix factorization and replaces the cumbersome embeddings with two lower-dimensional matrices, which largely reduces the number of parameters. We show OFA accelerates the convergence of continued pretraining, which is environmentally friendly as it generates a much smaller carbon footprint. Through extensive experiments, we demonstrate OFA can achieve competitive or better performance than default continued pretraining baselines on a wide range of crosslingual downstream tasks. We make our code and models publicly available.
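The matrix-factorization step can be illustrated in isolation: a vocabulary-by-dimension embedding matrix is replaced by two lower-dimensional factors, shrinking the parameter count. A minimal sketch using truncated SVD on random data; the sizes and the SVD-based factorization are illustrative assumptions, not OFA's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k = 2_000, 64, 16              # vocab size, hidden dim, reduced dim

E = rng.standard_normal((V, d))      # stand-in for an embedding matrix

# Truncated SVD gives the best rank-k factorization E ~ P @ U,
# shrinking V*d embedding parameters to V*k + k*d.
U_, S, Vt = np.linalg.svd(E, full_matrices=False)
P = U_[:, :k] * S[:k]                # V x k: per-token coordinates
U = Vt[:k]                           # k x d: shared projection

before = V * d
after = V * k + k * d
print(round(after / before, 2))      # 0.26 – far fewer embedding parameters
```

New subwords then only need a k-dimensional coordinate vector (initialized from aligned multilingual word vectors in OFA), while the shared k-by-d projection is reused across the whole vocabulary.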

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Peiqin Lin

Peiqin Lin

Statistical NLP and Deep Learning

Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[879]
L. Haliburton, S. Ghebremedhin, R. Welsch, A. Schmidt and S. Mayer.
Investigating Labeler Bias in Face Annotation for Machine Learning.
3rd International Conference on Hybrid Human-Artificial Intelligence (HHAI 2024). Malmö, Sweden, Jun 10-14, 2024. DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[878]
L. Mayer, C. Heumann and M. Aßenmacher.
Can OpenSource beat ChatGPT? - A Comparative Study of Large Language Models for Text-to-Code Generation.
Swiss Text Analytics Conference (SwissText 2024). Chur, Switzerland, Jun 10-11, 2024. To be published. Preprint at arXiv.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[877]
C. Liu, C. M. Albrecht, Y. Wang and X. Zhu.
Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation.
IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2024). Athens, Greece, Jun 07-12, 2024. DOI.
Abstract

Compared to supervised deep learning, self-supervision provides remote sensing a tool to reduce the amount of exact, human-crafted geospatial annotations. While image-level information for unsupervised pretraining efficiently works for various classification downstream tasks, the performance on pixel-level semantic segmentation lags behind in terms of model accuracy. On the contrary, many easily available label sources (e.g., automatic labeling tools and land cover land use products) exist, which can provide a large amount of noisy labels for segmentation model training. In this work, we propose to exploit noisy semantic segmentation maps for model pretraining. Our experiments provide insights on robustness per network layer. The transfer learning settings test the cases when the pretrained encoders are fine-tuned for different label classes and decoders. The results from two datasets indicate the effectiveness of task-specific supervised pretraining with noisy labels. Our findings pave new avenues to improved model accuracy and novel pretraining strategies for efficient remote sensing image segmentation.

MCML Authors
Link to Chenying Liu

Chenying Liu

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[876]
Q. Zhang, Y. Wang and X. Zhu.
Deep-Learning-Based Large-Scale Forest Height Generation.
IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2024). Athens, Greece, Jun 07-12, 2024. DOI.
Abstract

The vegetation height has been identified as a key biophysical parameter to justify the role of forests in the carbon cycle and ecosystem productivity. Therefore, consistent and large-scale forest height is essential for managing terrestrial ecosystems, mitigating climate change, and preventing biodiversity loss. Since spaceborne multispectral instruments, Light Detection and Ranging (LiDAR), and Synthetic Aperture Radar (SAR) have been widely used for large-scale Earth observation for years, this paper explores the possibility of generating large-scale and high-accuracy forest heights with the synergy of the Sentinel-1, Sentinel-2, and ICESat-2 data. A Forest Height Generative Adversarial Network (FH-GAN) is developed to retrieve forest height from Sentinel-1 and Sentinel-2 images sparsely supervised by the ICESat-2 data. This model is made up of a cascaded forest height generator and coherence generator, where the output of the forest height generator is fed into the spatial discriminator to regularize spatial details, and the coherence generator is connected to a coherence discriminator to refine the vertical details. A progressive strategy further underpins the generator to boost the accuracy of multi-source forest height estimation. Results indicate that FH-GAN achieves the best RMSE of 2.10 m at a large scale compared with the LVIS reference and the best RMSE of 6.16 m compared with the ICESat-2 reference.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[875]
J. W. Grootjen, H. Weingärtner and S. Mayer.
Investigating the Effects of Eye-Tracking Interpolation Methods on Model Performance of LSTM.
9th International Workshop on Pervasive Eye Tracking and Mobile Eye-Based Interaction (PETMEI 2024) at the ACM Symposium on Eye Tracking Research and Applications (ETRA 2024). Glasgow, Scotland, Jun 04-07, 2024. DOI.
MCML Authors
Link to Jesse Grootjen

Jesse Grootjen

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[874]
S. Jaime and C. Kern.
Ethnic Classifications in Algorithmic Fairness: Concepts, Measures and Implications in Practice.
7th ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2024). Rio de Janeiro, Brazil, Jun 03-06, 2024. DOI.
Abstract

We address the challenges and implications of ensuring fairness in algorithmic decision-making (ADM) practices related to ethnicity. Expanding beyond the U.S.-centric approach to race, we provide an overview of ethnic classification schemes in European countries and emphasize how the distinct approaches to ethnicity in Europe can impact fairness assessments in ADM. Drawing on large-scale German survey data, we highlight differences in ethnic disadvantage across subpopulations defined by different measures of ethnicity. We build prediction models in the labor market, health, and finance domains and investigate the fairness implications of different ethnic classification schemes across multiple prediction tasks and fairness metrics. Our results show considerable variation in fairness scores across ethnic classifications, where error disparities for the same model can be twice as large when using different operationalizations of ethnicity. We argue that ethnic classifications differ in their ability to identify ethnic disadvantage across ADM domains and advocate for context-sensitive operationalizations of ethnicity and its transparent reporting in fair machine learning (ML) applications.
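The core measurement, how error disparity shifts when the same model is assessed under different operationalizations of ethnicity, can be sketched with a toy example. The group labels and error rates below are invented for illustration only:

```python
def error_disparity(errors_by_group):
    """Largest gap in error rates across groups: a simple
    group-fairness metric (max-minus-min error rate)."""
    rates = list(errors_by_group.values())
    return max(rates) - min(rates)

# The same model's per-group error rates, measured under two
# hypothetical classification schemes of the population.
coarse = {"majority": 0.10, "minority": 0.18}
fine = {"majority": 0.10, "group_a": 0.12, "group_b": 0.30}

print(round(error_disparity(coarse), 2))  # 0.08
print(round(error_disparity(fine), 2))    # 0.3 - 0.1 = 0.2
```

Aggregating heterogeneous subgroups into one "minority" category averages away the disadvantage concentrated in `group_b`, so the coarse scheme reports a much smaller disparity for the identical model, which is the paper's central point about classification choices.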

MCML Authors
Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab


[873]
J. Simson, A. Fabris and C. Kern.
Lazy Data Practices Harm Fairness Research.
7th ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2024). Rio de Janeiro, Brazil, Jun 03-06, 2024. DOI.
Abstract

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations, (2) the widespread exclusion of minorities during data preprocessing, and (3) a lack of transparency about consequential yet overlooked dataset processing choices. We further note additional factors, such as limitations in publicly available data, privacy considerations and a general lack of awareness that further contribute to these issues. Through exemplary analyses on the usage of popular datasets, we demonstrate how opaque data choices significantly impact minorities, fairness metrics, and the resulting model comparison. To address these challenges, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.

MCML Authors
Link to Jan Simson

Jan Simson

Social Data Science and AI Lab

Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab


[872]
J. Simson, F. Pfisterer and C. Kern.
One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions.
7th ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2024). Rio de Janeiro, Brazil, Jun 03-06, 2024. DOI.
Abstract

A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a system’s design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are made implicitly, without knowing exactly how they will influence the final system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit decisions during design and evaluation into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible “universes” of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can investigate the variability and robustness of fairness scores and see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand fairness implications of design and evaluation decisions using an exemplary case study of predicting public health care coverage for vulnerable populations. Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model. This is problematic, as a nefarious actor could optimise or “hack” a fairness metric to portray a discriminating model as fair merely by changing how it is evaluated. We illustrate how a multiverse analysis can help to address this issue.
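The core mechanic of a multiverse analysis can be sketched in a few lines (decision names and options below are illustrative, not the paper's actual pipeline): every combination of design/evaluation choices defines one "universe", each of which is then evaluated.

```python
from itertools import product

# Hypothetical pipeline decisions; each combination is one "universe".
decisions = {
    "threshold": [0.4, 0.5, 0.6],
    "exclude_missing": [True, False],
    "eval_subset": ["all", "adults_only"],
}

def evaluate_universe(universe):
    # Placeholder: a real multiverse analysis would fit and evaluate a
    # model under this decision combination and record fairness scores.
    return {"universe": universe, "fairness": None, "performance": None}

universes = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
results = [evaluate_universe(u) for u in universes]
print(len(results))  # 3 * 2 * 2 = 12 universes
```

Plotting the distribution of fairness scores over all universes then reveals how sensitive a "fair" verdict is to these choices.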

MCML Authors
Link to Jan Simson

Jan Simson

Social Data Science and AI Lab

Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab


[871]
J. Guo, D. Hong, Z. Liu and X. Zhu.
Continent-wide urban tree canopy fine-scale mapping and coverage assessment in South America with high-resolution satellite images.
ISPRS Journal of Photogrammetry and Remote Sensing 212 (Jun. 2024). DOI.
Abstract

Urban development in South America has experienced significant growth and transformation over the past few decades. South America’s urban development and trees are closely interconnected, and tree cover within cities plays a vital role in shaping sustainable and resilient urban landscapes. However, knowledge of urban tree canopy (UTC) coverage in the South American continent remains limited. In this study, we used high-resolution satellite images and developed a semi-supervised deep learning method to create UTC data for 888 South American cities. The proposed semi-supervised method can leverage both labeled and unlabeled data during training. By incorporating labeled data for guidance and utilizing unlabeled data to explore underlying patterns, the algorithm enhances model robustness and generalization for urban tree canopy detection across South America, with an average overall accuracy of 94.88% for the tested cities. Based on the created UTC products, we successfully assessed the UTC coverage for each city. Statistical results showed that the UTC coverage in South America is between 0.76% and 69.53%, and the average UTC coverage is approximately 19.99%. Among the 888 cities, only 357 cities that accommodate approximately 48.25% of the total population have UTC coverage greater than 20%, while the remaining 531 cities that accommodate approximately 51.75% of the total population have UTC coverage less than 20%. Natural factors (climatic and geographical) play a very important role in determining UTC coverage, followed by human activity factors (economy and urbanization level). We expect that the findings of this study and the created UTC dataset will help formulate policies and strategies to promote sustainable urban forestry, thus further improving the quality of life of residents in South America.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[870]
S. M. Fischer, J. Kiechle, D. M. Lang, J. C. Peeken and J. A. Schnabel.
Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge.
Machine Learning for Biomedical Imaging 2 (Jun. 2024). DOI. GitHub.
MCML Authors
Link to Johannes Kiechle

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[869]
D. Bär, F. Pierri, G. De Francisci Morales and S. Feuerriegel.
Systematic discrepancies in the delivery of political ads on Facebook and Instagram.
PNAS Nexus (Jun. 2024). DOI.
Abstract

Political advertising on social media has become a central element in election campaigns. However, granular information about political advertising on social media was previously unavailable, thus raising concerns regarding fairness, accountability, and transparency in the electoral process. In this article, we analyze targeted political advertising on social media via a unique, large-scale dataset of over 80,000 political ads from Meta during the 2021 German federal election, with more than a billion impressions. For each political ad, our dataset records granular information about targeting strategies, spending, and actual impressions. We then study (i) the prevalence of targeted ads across the political spectrum; (ii) the discrepancies between targeted and actual audiences due to algorithmic ad delivery; and (iii) which targeting strategies on social media attain a wide reach at low cost. We find that targeted ads are prevalent across the entire political spectrum. Moreover, there are considerable discrepancies between targeted and actual audiences, and systematic differences in the reach of political ads (in impressions-per-EUR) among parties, where the algorithm favors ads from populists over others.

MCML Authors
Link to Dominik Bär

Dominik Bär

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[868]
S. Amiriparian, F. Packań, M. Gerczuk and B. W. Schuller.
ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals. To further enhance SER performance across various languages and domains, we propose a novel twofold approach. First, we gather EmoSet++, a comprehensive multi-lingual, multi-cultural speech emotion corpus with 37 datasets, 150,907 samples, and a total duration of 119.5 hours. Second, we introduce ExHuBERT, an enhanced version of HuBERT achieved by backbone extension and fine-tuning on EmoSet++. We duplicate each encoder layer and its weights, then freeze the first duplicate, integrating an extra zero-initialized linear layer and skip connections to preserve functionality and ensure its adaptability for subsequent fine-tuning. Our evaluation on unseen datasets shows the efficacy of ExHuBERT, setting a new benchmark for various SER tasks.

MCML Authors
Link to Shahin Amiriparian

Shahin Amiriparian

Dr.

Health Informatics

Link to Filip Packań

Filip Packań

Health Informatics

Link to Maurice Gerczuk

Maurice Gerczuk

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[867]
S. Ball, F. Kreuter and N. Panickssery.
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Conversational large language models are trained to refuse to answer harmful questions. However, emergent jailbreaking techniques can still elicit unsafe outputs, presenting an ongoing challenge for model alignment. To better understand how different jailbreak types circumvent safeguards, this paper analyses model activations on different jailbreak inputs. We find that it is possible to extract a jailbreak vector from a single class of jailbreaks that works to mitigate jailbreak effectiveness from other semantically-dissimilar classes. This may indicate that different kinds of effective jailbreaks operate via a similar internal mechanism. We investigate a potential common mechanism of harmfulness feature suppression, and find evidence that effective jailbreaks noticeably reduce a model’s perception of prompt harmfulness. These findings offer actionable insights for developing more robust jailbreak countermeasures and lay the groundwork for a deeper, mechanistic understanding of jailbreak dynamics in language models.
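The "jailbreak vector" idea described above can be illustrated with a toy, self-contained sketch (not the paper's code; dimensions, data, and the steering step are invented for illustration): the vector is the difference of mean activations between jailbreak and plain prompts, and subtracting it moves a new jailbreak activation back toward the plain distribution.

```python
import random
random.seed(0)

D = 8  # toy hidden-state dimension

def randvec(mu):
    return [random.gauss(mu, 0.1) for _ in range(D)]

# Hypothetical residual-stream activations for the two prompt classes.
jailbreak_acts = [randvec(1.0) for _ in range(50)]
plain_acts = [randvec(0.0) for _ in range(50)]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# "Jailbreak vector": difference of the two class means.
jailbreak_vector = [a - b for a, b in zip(mean(jailbreak_acts), mean(plain_acts))]

def norm(v):
    return sum(x * x for x in v) ** 0.5

new_act = randvec(1.0)  # activation on an unseen jailbreak prompt
steered = [a - b for a, b in zip(new_act, jailbreak_vector)]  # subtract the vector
print(norm(steered) < norm(new_act))  # True: steering shrinks the activation
```

That a vector extracted from one jailbreak class also mitigates semantically dissimilar classes is the paper's empirical finding; the sketch only shows the mean-difference construction.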

MCML Authors
Link to Sarah Ball

Sarah Ball

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[866]
H. Baniecki, G. Casalicchio, B. Bischl and P. Biecek.
Efficient and Accurate Explanation Estimation with Distribution Compression.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Exact computation of various machine learning explanations requires numerous model evaluations and in extreme cases becomes impractical. The computational cost of approximation increases with an ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad spectrum of algorithms for explanation estimation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm for more efficient and accurate explanation estimation. CTE uses distribution compression through kernel thinning to obtain a data sample that best approximates the marginal distribution. We show that CTE improves the estimation of removal-based local and global explanations with negligible computational overhead. It often achieves an on-par explanation approximation error using 2-3x fewer samples, i.e., requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.
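A conceptual sketch of the compress-then-explain pattern (kernel thinning itself is not reproduced here; a simple quantile-based pick stands in for it, and the "explanation" is a placeholder background average):

```python
import random
random.seed(1)

data = sorted(random.gauss(0, 1) for _ in range(1000))

def iid_sample(xs, m):
    # The standard approach: draw m background points i.i.d.
    return random.sample(xs, m)

def compressed_sample(xs, m):
    # Stand-in for kernel thinning: evenly spaced quantile midpoints,
    # a sample chosen to cover the marginal distribution systematically.
    step = len(xs) // m
    return xs[step // 2::step][:m]

def explain(background):
    # Placeholder: removal-based explanations average model outputs over
    # a background sample; here just the background mean.
    return sum(background) / len(background)

true_value = sum(data) / len(data)
err_iid = abs(explain(iid_sample(data, 20)) - true_value)
err_cte = abs(explain(compressed_sample(data, 20)) - true_value)
print(round(err_iid, 4), round(err_cte, 4))
```

The compressed sample approximates the marginal with the same budget of 20 model evaluations, which is the mechanism behind CTE's reported savings.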

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[865]
A. Bavaresco, R. Bernardi, L. Bertolazzi, D. Elliott, R. Fernández, A. Gatt, E. Ghaleb, M. Giulianelli, M. Hanna, A. Koller, A. F. T. Martins, P. Mondorf, V. Neplenbroek, S. Pezzelle, B. Plank, D. Schlangen, A. Suglia, A. K. Surikuchi, E. Takmaz and A. Testoni.
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human annotations, and comprehensively evaluate 11 current LLMs, covering both open-weight and proprietary models, for their ability to replicate the annotations. Our evaluations show that each LLM exhibits a large variance across datasets in its correlation to human judgments. We conclude that LLMs are not yet ready to systematically replace human judges in NLP.

MCML Authors
Link to Philipp Mondorf

Philipp Mondorf

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[864]
L. Burk, J. Zobolas, B. Bischl, A. Bender, M. N. Wright and R. Sonabend.
A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on high-dimensional data. Additionally, they may lack appropriate tuning or evaluation procedures, or are qualitative reviews, rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable conclusions. We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets. The benchmark tunes for both a discrimination measure and a proper scoring rule to assess performance in different settings. Evaluating on 8 survival metrics, we assess discrimination, calibration, and overall predictive performance of the tested models. Using discrimination measures, we find that no method significantly outperforms the Cox model. However, (tuned) Accelerated Failure Time models were able to achieve significantly better results with respect to overall predictive performance as measured by the right-censored log-likelihood. Machine learning methods that performed comparably well include Oblique Random Survival Forests under discrimination, and Cox-based likelihood-boosting under overall predictive performance. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for practitioners.

MCML Authors
Link to Lukas Burk

Lukas Burk

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[863]
L. Christ, S. Amiriparian, F. Hawighorst, A.-K. Schill, A. Boutalikakis, L. Graf-Vlachy, A. König and B. W. Schuller.
This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Flattery is an important aspect of human communication that facilitates social bonding, shapes perceptions, and influences behavior through strategic compliments and praise, leveraging the power of speech to build rapport effectively. Its automatic detection can thus enhance the naturalness of human-AI interactions. To meet this need, we present a novel audio textual dataset comprising 20 hours of speech and train machine learning models for automatic flattery detection. In particular, we employ pretrained AST, Wav2Vec2, and Whisper models for the speech modality, and Whisper TTS models combined with a RoBERTa text classifier for the textual modality. Subsequently, we build a multimodal classifier by combining text and audio representations. Evaluation on unseen test data demonstrates promising results, with Unweighted Average Recall scores reaching 82.46% in audio-only experiments, 85.97% in text-only experiments, and 87.16% using a multimodal approach.

MCML Authors
Link to Shahin Amiriparian

Shahin Amiriparian

Dr.

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[862]
A. Findeis, T. Kaufmann, E. Hüllermeier, S. Albanie and R. Mullins.
Inverse Constitutional AI: Compressing Preferences into Principles.
Preprint at arXiv (Jun. 2024). arXiv. GitHub.
Abstract

Feedback data plays an important role in fine-tuning and evaluating state-of-the-art AI models. Often pairwise text preferences are used: given two texts, human (or AI) annotators select the ‘better’ one. Such feedback data is widely used to align models to human preferences (e.g., reinforcement learning from human feedback), or to rank models according to human preferences (e.g., Chatbot Arena). Despite its widespread use, prior work has demonstrated that human-annotated pairwise text preference data often exhibits unintended biases. For example, human annotators have been shown to prefer assertive over truthful texts in certain contexts. Models trained or evaluated on this data may implicitly encode these biases in a manner hard to identify. In this paper, we formulate the interpretation of existing pairwise text preference data as a compression task: the Inverse Constitutional AI (ICAI) problem. In constitutional AI, a set of principles (or constitution) is used to provide feedback and fine-tune AI models. The ICAI problem inverts this process: given a dataset of feedback, we aim to extract a constitution that best enables a large language model (LLM) to reconstruct the original annotations. We propose a corresponding initial ICAI algorithm and validate its generated constitutions quantitatively based on reconstructed annotations. Generated constitutions have many potential use-cases – they may help identify undesirable biases, scale feedback to unseen data or assist with adapting LLMs to individual user preferences. We demonstrate our approach on a variety of datasets: (a) synthetic feedback datasets with known underlying principles; (b) the AlpacaEval dataset of cross-annotated human feedback; and (c) the crowdsourced Chatbot Arena dataset.

MCML Authors
Link to Timo Kaufmann

Timo Kaufmann

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[861]
L. Hirlimann, S. Zhang, H. Schütze and P. Wicke.
Robustness Testing of Multi-Modal Models in Varied Home Environments for Assistive Robots.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

The development of assistive robotic agents to support household tasks is advancing, yet the underlying models often operate in virtual settings that do not reflect real-world complexity. For assistive care robots to be effective in diverse environments, their models must be robust and integrate multiple modalities. Consider a caretaker needing assistance in a dimly lit room or navigating around a newly installed glass door. Models relying solely on visual input might fail in low light, while those using depth information could avoid the door. This demonstrates the necessity for models that can process various sensory inputs. Our ongoing study evaluates state-of-the-art robotic models in the AI2Thor virtual environment. We introduce disturbances, such as dimmed lighting and mirrored walls, to assess their impact on modalities like movement or vision, and object recognition. Our goal is to gather input from the Geriatronics community to understand and model the challenges faced by practitioners.

MCML Authors
Link to Shengqiang Zhang

Shengqiang Zhang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Philipp Wicke

Philipp Wicke

Dr.

Statistical NLP and Deep Learning


[860]
T. Kaufmann, J. Blüml, A. Wüst, Q. Delfosse, K. Kersting and E. Hüllermeier.
OCALM: Object-Centric Assessment with Language Models.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which a desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to directly provide rewards are promising alternatives, allowing non-experts to specify goals for the agent. However, black-box reward models make it difficult to debug the reward. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for RL agents from natural language task descriptions. OCALM uses the extensive world-knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, providing RL agents with the ability to derive policies from task descriptions.

MCML Authors
Link to Timo Kaufmann

Timo Kaufmann

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[859]
F. Krahmer and A. Veselovska.
The Mathematics of Dots and Pixels: On the Theoretical Foundations of Image Halftoning.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

The evolution of image halftoning, from its analog roots to contemporary digital methodologies, encapsulates a fascinating journey marked by technological advancements and creative innovations. Yet the theoretical understanding of halftoning is much more recent. In this article, we explore various approaches towards shedding light on the design of halftoning approaches and why they work. We discuss both halftoning in a continuous domain and on a pixel grid. We start by reviewing the mathematical foundation of the so-called electrostatic halftoning method, which departed from the heuristic of considering the black dots of the halftoned image as charged particles attracted by the grey values of the image in combination with mutual repulsion. Such an attraction-repulsion model can be mathematically represented via an energy functional in a reproducing kernel Hilbert space allowing for a rigorous analysis of the resulting optimization problem as well as a convergence analysis in a suitable topology. A second class of methods that we discuss in detail is the class of error diffusion schemes, arguably among the most popular halftoning techniques due to their ability to work directly on a pixel grid and their ease of application. The main idea of these schemes is to choose the locations of the black pixels via a recurrence relation designed to agree with the image in terms of the local averages. We discuss some recent mathematical understanding of these methods that is based on a connection to Sigma-Delta quantizers, a popular class of algorithms for analog-to-digital conversion.

MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Hanna Veselovska

Hanna Veselovska

Dr.

Optimization & Data Analysis


[858]
P. Lin, A. F. T. Martins and H. Schütze.
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples.
Preprint at arXiv (Jun. 2024). arXiv. GitHub.
Abstract

Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of English. However, adapting these methods to other languages, especially low-resource ones, poses challenges due to the scarcity of cross-lingual retrievers and annotated data. Thus, we introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the challenge of cross-lingual in-context learning using only annotated English data. XAMPLER first trains a retriever based on Glot500, a multilingual small language model, using positive and negative English examples constructed from the predictions of a multilingual large language model, i.e., MaLA500. Leveraging the cross-lingual capacity of the retriever, it can directly retrieve English examples as few-shot examples for in-context learning of target languages. Experiments on the multilingual text classification benchmark SIB200 with 176 languages show that XAMPLER substantially improves the in-context learning performance across languages.
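The retrieval step at the heart of XAMPLER can be illustrated with a toy sketch (the two-dimensional "embeddings", example sentences, and labels below are invented stand-ins for Glot500 representations): the target-language query is embedded and the most similar English example is selected as a few-shot demonstration.

```python
def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# Hypothetical English example pool: sentence -> (embedding, label).
english_pool = {
    "The match ended 2-1.": ([0.9, 0.1], "sports"),
    "Parliament passed the bill.": ([0.1, 0.9], "politics"),
}

query_embedding = [0.8, 0.2]  # embedding of a target-language query

best = max(english_pool, key=lambda s: cosine(english_pool[s][0], query_embedding))
print(best)  # The match ended 2-1.
```

In the actual method the retriever is trained on English examples labeled positive or negative by an LLM's predictions, and the retrieved English examples are used cross-lingually.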

MCML Authors
Link to Peiqin Lin

Peiqin Lin

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[857]
C. Ma, A. ImaniGooghari, H. Ye, R. Pei, E. Asgari and H. Schütze.
Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

While natural language processing tools have been developed extensively for some of the world’s languages, a significant portion of the world’s over 7000 languages are still neglected. One reason for this is that evaluation datasets do not yet cover a wide range of languages, including low-resource and endangered ones. We aim to address this issue by creating a text classification dataset encompassing a large number of languages, many of which currently have little to no annotated data available. We leverage parallel translations of the Bible to construct such a dataset by first developing applicable topics and employing a crowdsourcing tool to collect annotated data. By annotating the English side of the data and projecting the labels onto other languages through aligned verses, we generate text classification datasets for more than 1500 languages. We extensively benchmark several existing multilingual language models using our dataset. To facilitate the advancement of research in this area, we will release our dataset and code.

MCML Authors
Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[856]
V. Margraf, M. Wever, S. Gilhuber, G. M. Tavares, T. Seidl and E. Hüllermeier.
ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data.
Preprint at arXiv (Jun. 2024). arXiv. GitHub.
Abstract

In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled, aiming to enhance learning algorithms’ efficiency and performance. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing the performance of different query strategies. This particularly holds for the combination of query strategies with different learning algorithms into active learning pipelines and examining the impact of the learning algorithm choice. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure evaluations are done reproducibly, saving exact dataset splits and hyperparameter settings of used algorithms. In total, ALPBench consists of 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings.
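A generic active-learning pipeline of the kind benchmarked here can be sketched as follows (the 1-D threshold learner and uncertainty-sampling query strategy are illustrative choices, not ALPBench's API):

```python
import random
random.seed(0)

# Toy pool: 1-D points, labeled by the oracle rule x > 0.5.
pool = [(x, x > 0.5) for x in (random.random() for _ in range(100))]
labeled, unlabeled = list(pool[:5]), list(pool[5:])

def train(labeled):
    # Toy learner: decision threshold halfway between the class means.
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def query(threshold, unlabeled):
    # Uncertainty sampling: pick the point closest to the boundary.
    return min(unlabeled, key=lambda xy: abs(xy[0] - threshold))

for _ in range(10):          # labeling budget of 10 queries
    point = query(train(labeled), unlabeled)
    unlabeled.remove(point)  # the oracle's label travels with the point here
    labeled.append(point)

print(len(labeled))  # 15
```

An active learning *pipeline* in the paper's sense is exactly such a (learner, query strategy) pair; the benchmark's contribution is evaluating many such pairs reproducibly.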

MCML Authors
Link to Valentin Margraf

Valentin Margraf

Artificial Intelligence & Machine Learning

Link to Gabriel Marques Tavares

Gabriel Marques Tavares

Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[855]
E. Nie, B. Shao, Z. Ding, M. Wang, H. Schmid and H. Schütze.
BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning.
Preprint at arXiv (Jun. 2024). arXiv. GitHub.
Abstract

Large language models (LLMs) possess extensive parametric knowledge, but this knowledge is difficult to update with new information because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods, inspired by in-context learning (ICL), have shown great promise and allow LLMs to be treated as black boxes. In the past, KE was primarily employed in English contexts, whereas the potential for cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark for evaluating cross-lingual KE on 53 diverse languages across three KE task types. We also propose a gradient-free KE method called Multilingual In-context Knowledge Editing (MIKE) and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research in cross-lingual KE.

MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[854]
J. Senoner, S. Schallmoser, B. Kratzwald, S. Feuerriegel and T. Netland.
Explainable AI improves task performance in human-AI collaboration.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Artificial intelligence (AI) provides considerable opportunities to assist human work. However, one crucial challenge of human-AI collaboration is that many AI algorithms operate in a black-box manner where how the AI makes predictions remains opaque. This makes it difficult for humans to validate a prediction made by AI against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI as a decision aid improves task performance in human-AI collaboration. To test this hypothesis, we analyze the effect of augmenting domain experts with explainable AI in the form of visual heatmaps. We then compare participants that were either supported by (a) black-box AI or (b) explainable AI, where the latter supports them to follow AI predictions when the AI is accurate or overrule the AI when the AI predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed N=9,600 assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed N=5,650 assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI instead of black-box AI. For example, in the manufacturing setting, we find that augmenting participants with explainable AI (as opposed to black-box AI) leads to a five-fold decrease in the median error rate of human decisions, which gives a significant improvement in task performance.

MCML Authors
Link to Simon Schallmoser

Simon Schallmoser

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[853]
R. Sonabend, J. Zobolas, P. Kopper, L. Burk and A. Bender.
Examining properness in the external validation of survival models with squared and logarithmic losses.
Preprint at arXiv (Jun. 2024). arXiv.
MCML Authors
Link to Lukas Burk

Lukas Burk

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[852]
L. Thede, K. Roth, O. J. Hénaff, M. Bethge and Z. Akata.
Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

With the advent and recent ubiquity of foundation models, continual learning (CL) has recently shifted from continual training from scratch to the continual adaptation of pretrained models, seeing particular success on rehearsal-free CL benchmarks (RFCL). To achieve this, most proposed methods adapt and restructure parameter-efficient finetuning techniques (PEFT) to suit the continual nature of the problem. Based most often on input-conditional query-mechanisms or regularizations on top of prompt- or adapter-based PEFT, these PEFT-style RFCL (P-RFCL) approaches report peak performances; often convincingly outperforming existing CL techniques. However, on the other end, critical studies have recently highlighted competitive results by training on just the first task or via simple non-parametric baselines. Consequently, questions arise about the relationship between methodological choices in P-RFCL and their reported high benchmark scores. In this work, we tackle these questions to better understand the true drivers behind strong P-RFCL performances, their placement w.r.t. recent first-task adaptation studies, and their relation to preceding CL standards such as EWC or SI. In particular, we show: (1) P-RFCL techniques relying on input-conditional query mechanisms work not because, but rather despite them by collapsing towards standard PEFT shortcut solutions. (2) Indeed, we show how most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline. (3) Using this baseline, we identify the implicit bound on tunable parameters when deriving RFCL approaches from PEFT methods as a potential denominator behind P-RFCL efficacy. Finally, we (4) better disentangle continual versus first-task adaptation, and (5) motivate standard RFCL techniques such as EWC or SI in light of recent P-RFCL methods.

MCML Authors
Link to Karsten Roth

Karsten Roth

Interpretable and Reliable Machine Learning

Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[851]
M. Wang, H. Adel, L. Lange, J. Strötgen and H. Schütze.
Learn it or Leave it: Module Composition and Pruning for Continual Learning.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

In real-world environments, continual learning is essential for machine learning models, as they need to acquire new knowledge incrementally without forgetting what they have already learned. While pretrained language models have shown impressive capabilities on various static tasks, applying them to continual learning poses significant challenges, including avoiding catastrophic forgetting, facilitating knowledge transfer, and maintaining parameter efficiency. In this paper, we introduce MoCL-P, a novel lightweight continual learning method that addresses these challenges simultaneously. Unlike traditional approaches that continuously expand parameters for newly arriving tasks, MoCL-P integrates task representation-guided module composition with adaptive pruning, effectively balancing knowledge integration and computational overhead. Our evaluation across three continual learning benchmarks with up to 176 tasks shows that MoCL-P achieves state-of-the-art performance and improves parameter efficiency by up to three times, demonstrating its potential for practical applications where resource requirements are constrained.

MCML Authors
Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[850]
G. Zhang, M. L. A. Fok, Y. Xia, Y. Tang, D. Cremers, P. Torr, V. Tresp and J. Gu.
Localizing Events in Videos with Multimodal Queries.
Preprint at arXiv (Jun. 2024). arXiv.
Abstract

Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current research is that semantic queries are typically in natural language that depicts the semantics of the target event. This setting overlooks the potential for multimodal semantic queries composed of images and texts. To address this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, along with a new evaluation dataset ICQ-Highlight. Our new benchmark aims to evaluate how well models can localize an event given a multimodal semantic query that consists of a reference image, which depicts the event, and a refinement text to adjust the images’ semantics. To systematically benchmark model performance, we include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains. We propose 3 adaptation methods that tailor existing models to our new setting and evaluate 10 SOTA models, ranging from specialized to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.

MCML Authors
Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[849]
J. Ramjith, A. Bender, K. C. B. Roes and M. A. Jonker.
Recurrent events analysis with piece-wise exponential additive mixed models.
Statistical Modelling 24.3 (Jun. 2024). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[848]
J. Kiechle, S. M. Fischer, D. M. Lang, M. Folco, S. C. Foreman, V. K. N. Rösner, A.-K. Lohse, C. Mogler, C. Knebel, M. R. Makowski, K. Woertler, S. E. Combs, H. R. Duerr, A. S. Gersing, J. C. Peeken and J. A. Schnabel.
Unifying local and global shape descriptors to grade soft-tissue sarcomas using graph convolutional networks.
IEEE 20th International Symposium on Biomedical Imaging (ISBI 2024). Athens, Greece, May 27-30, 2024. DOI.
MCML Authors
Link to Johannes Kiechle

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[847]
N. Stolt-Ansó, V. Sideri-Lampretsa, M. Dannecker and D. Rückert.
Intensity-based 3D motion correction for cardiac MR images.
IEEE 20th International Symposium on Biomedical Imaging (ISBI 2024). Athens, Greece, May 27-30, 2024. DOI.
Abstract

Cardiac magnetic resonance (CMR) image acquisition requires subjects to hold their breath while 2D cine images are acquired. This process assumes that the heart remains in the same position across all slices. However, differences in breathhold positions or patient motion introduce 3D slice misalignments. In this work, we propose an algorithm that simultaneously aligns all SA and LA slices by maximizing the pair-wise intensity agreement between their intersections. Unlike previous works, our approach is formulated as a subject-specific optimization problem and requires no prior knowledge of the underlying anatomy. We quantitatively demonstrate that the proposed method is robust against a large range of rotations and translations by synthetically misaligning 10 motion-free datasets and aligning them back using the proposed method.

MCML Authors
Link to Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[846]
Y. Zhang, N. Stolt-Ansó, J. Pan, W. Huang, K. Hammernik and D. Rückert.
Direct Cardiac Segmentation from Undersampled K-Space using Transformers.
IEEE 20th International Symposium on Biomedical Imaging (ISBI 2024). Athens, Greece, May 27-30, 2024. DOI.
Abstract

The prevailing deep learning-based methods of predicting cardiac segmentation involve reconstructed magnetic resonance (MR) images. The heavy dependency of segmentation approaches on image quality significantly limits the acceleration rate in fast MR reconstruction. Moreover, the practice of treating reconstruction and segmentation as separate sequential processes leads to artifact generation and information loss in the intermediate stage. These issues pose a great risk to achieving high-quality outcomes. To leverage the redundant k-space information overlooked in this dual-step pipeline, we introduce a novel approach to directly deriving segmentations from sparse k-space samples using a transformer (DiSK). DiSK operates by globally extracting latent features from 2D+time k-space data with attention blocks and subsequently predicting the segmentation label of query points. We evaluate our model under various acceleration factors (ranging from 4 to 64) and compare against two image-based segmentation baselines. Our model consistently outperforms the baselines in Dice and Hausdorff distances across foreground classes for all presented sampling rates.

MCML Authors
Link to Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[845]
V. Blaschke, B. Kovačić, S. Peng, H. Schütze and B. Plank.
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack in ‘within-language breadth’: most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap, we present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in UD, covering multiple text genres (wiki, fiction, grammar examples, social, non-fiction). We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers’ orthographies. Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries. We provide baseline parsing and POS tagging results, which are lower than results obtained on German and vary substantially between different graph-based parsers. To support further research on Bavarian syntax, we make our dataset, language-specific guidelines and code publicly available.

MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[844]
V. Hangya and A. Fraser.
How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Due to the broad range of social media platforms, the requirements of abusive language detection systems are varied and ever-changing. Already a large set of annotated corpora with different properties and label sets were created, such as hate or misogyny detection, but the form and targets of abusive speech are constantly evolving. Since the annotation of new corpora is expensive, in this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection. Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain. We propose a two-step approach: first we train our model in a multitask fashion. We then carry out few-shot adaptation to the target requirements. Our experiments show that using already existing datasets and only a few shots of the target task, the performance of models improves both monolingually and across languages. Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset and can benefit from knowledge about labels which are not directly used for the target task.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[843]
A. H. Kargaran, F. Yvon and H. Schütze.
GlotScript: A Resource and Tool for Low Resource Writing System Identification.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL. GitHub.
Abstract

We present GlotScript, an open resource and tool for low resource writing system identification. GlotScript-R is a resource that provides the attested writing systems for more than 7,000 languages. It is compiled by aggregating information from existing writing system resources. GlotScript-T is a writing system identification tool that covers all 161 Unicode 15.0 scripts. For an input text, it returns its script distribution where scripts are identified by ISO 15924 codes. We also present two use cases for GlotScript. First, we demonstrate that GlotScript can help clean multilingual corpora such as mC4 and OSCAR. Second, we analyze the tokenization of a number of language models such as GPT-4 using GlotScript and provide insights on the coverage of low resource scripts and languages by each language model. We hope that GlotScript will become a useful resource for work on low resource languages in the NLP community.

MCML Authors
Link to Amir Hossein Kargaran

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[842]
A. Köksal, S. Severini and H. Schütze.
SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different domains and languages when gold data are not available. This addresses the important scenario of missing gold data alignments for low-resource languages.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[841]
M. Marco and A. Fraser.
Analyzing the Understanding of Morphologically Complex Words in Large Language Models.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

We empirically study the ability of a Large Language Model (gpt-3.5-turbo-instruct) to understand morphologically complex words. In our experiments, we looked at a variety of tasks to analyse German compounds with regard to compositional word formation and derivation, such as identifying the head noun of existing and novel compounds, identifying the shared verb stem between two words, or recognizing words constructed with inappropriately used derivation morphemes as invalid. Our results show that the language model is generally capable of solving most tasks, except for the task of identifying ill-formed word forms. While the model demonstrated a good overall understanding of complex words and their word-internal structure, the results also suggest that there is no formal knowledge of derivational rules, but rather an interpretation of the observed word parts to derive the meaning of a word.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[840]
D. R. Mortensen, V. Izrailevitch, Y. Xiao, H. Schütze and L. Weissweiler.
Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Lexical-syntactic flexibility, in the form of conversion (or zero-derivation) is a hallmark of English morphology. In conversion, a word with one part of speech is placed in a non-prototypical context, where it is coerced to behave as if it had a different part of speech. However, while this process affects a large part of the English lexicon, little work has been done to establish the degree to which language models capture this type of generalization. This paper reports the first study on the behavior of large language models with reference to conversion. We design a task for testing lexical-syntactic flexibility—the degree to which models can generalize over words in a construction with a non-prototypical part of speech. This task is situated within a natural language inference paradigm. We test the abilities of five language models—two proprietary models (GPT-3.5 and GPT-4) and three open source models (Mistral 7B, Falcon 40B, and Llama 2 70B). We find that GPT-4 performs best on the task, followed by GPT-3.5, but that the open source language models are also able to perform it and that the 7-billion parameter Mistral displays as little difference between its baseline performance on the natural language inference task and the non-prototypical syntactic category task as the massive GPT-4.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member


[839]
C. Müller and B. Plank.
IndirectQA: Understanding Indirect Answers to Implicit Polar Questions in French and Spanish.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Polar questions are common in dialogue and expect exactly one of two answers (yes/no). It is however not uncommon for speakers to bypass these expected choices and answer, for example, ‘Islands are generally by the sea’ to the question: ‘An island? By the sea?’. While such answers are natural in spoken dialogues, conversational systems still struggle to interpret them. Seminal work to interpret indirect answers was done in recent years—but only for English and with strict question formulations. In this work, we present a new corpus for French and Spanish—IndirectQA—where we mine subtitle data for indirect answers to study the labeling task with six different labels, while broadening polar questions to include also implicit polar questions (statements that trigger a yes/no-answer which are not necessarily formulated as a question). We opted for subtitles since they are a readily available source of conversation in various languages, but also come with peculiarities and challenges which we will discuss. Overall, we provide the first results on French and Spanish. They show that the task is challenging: the baseline accuracy scores drop from 61.43 on English to 44.06 for French and Spanish.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[838]
S. Peng, Z. Sun, H. Shan, M. Kolm, V. Blaschke, E. Artemova and B. Plank.
Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs from standard German in lexical distribution, syntactic construction, and entity information. We conduct in-domain, cross-domain, sequential, and joint experiments on two Bavarian and three German corpora and present the first comprehensive NER results on Bavarian. Incorporating knowledge from the larger German NER (sub-)datasets notably improves on bar-wiki and moderately on bar-tweet. Inversely, training first on Bavarian contributes slightly to the seminal German CoNLL 2006 corpus. Moreover, with gold dialect labels on Bavarian tweets, we assess multi-task learning between five NER and two Bavarian-German dialect identification tasks and achieve NER SOTA on bar-wiki. We substantiate the necessity of our low-resource BarNER corpus and the importance of diversity in dialects, genres, and topics in enhancing model performance.

MCML Authors
Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[837]
L. Weissweiler, N. Böbel, K. Guiller, S. Herrera, W. Scivetti, A. Lorenzi, N. Melnik, A. Bhatia, H. Schütze, L. Levin, A. Zeldes, J. Nivre, W. Croft and N. Schneider.
UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements – for example, interrogative sentences with special markers and/or word orders – are not labeled holistically. We argue for (i) augmenting UD annotations with a ‘UCxn’ annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Nora Schneider

Nora Schneider

Ethics in Systems Design and Machine Learning


[836]
M. Winkler, V. Juozapaityte, R. van der Goot and B. Plank.
Slot and Intent Detection Resources for Bavarian and Lithuanian: Assessing Translations vs Natural Queries to Digital Assistants.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Digital assistants perform well in high-resource languages like English, where tasks like slot and intent detection (SID) are well-supported. Many recent SID datasets start including multiple language varieties. However, it is unclear how realistic these translated datasets are. Therefore, we extend one such dataset, namely xSID-0.4, to include two underrepresented languages: Bavarian, a German dialect, and Lithuanian, a Baltic language. Both language variants have limited speaker populations and are often not included in multilingual projects. In addition to translations we provide “natural” queries to digital assistants generated by native speakers. We further include utterances from another dataset for Bavarian to build the richest SID dataset available today for a low-resource dialect without standard orthography. We then set out to evaluate models trained on English in a zero-shot scenario on our target language variants. Our evaluation reveals that translated data can produce overly optimistic scores. However, the error patterns in translated and natural datasets are highly similar. Cross-dataset experiments demonstrate that data collection methods influence performance, with scores lower than those achieved with single-dataset translations. This work contributes to enhancing SID datasets for underrepresented languages, yielding NaLiBaSID, a new evaluation dataset for Bavarian and Lithuanian.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[835]
S. Zhou, L. Weissweiler, T. He, H. Schütze, D. R. Mortensen and L. Levin.
Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias. We then create further challenging sub-tasks in an effort to explain this failure. From a Computational Linguistics perspective, we identify a group of constructions with three classes of adjectives which cannot be distinguished by surface features. This enables us to probe for LLM’s understanding of these constructions in various ways, and we find that they fail in a variety of ways to distinguish between them, suggesting that they don’t adequately represent their meaning or capture the lexical properties of phrasal heads.

MCML Authors
Link to Shijia Zhou

Shijia Zhou

Artificial Intelligence and Computational Linguistics

Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[834]
A. Beer, O. Palotás, A. Maldonado, A. Draganov and I. Assent.
DROPP: Structure-aware PCA for Ordered Data.
40th IEEE International Conference on Data Engineering (ICDE 2024). Utrecht, Netherlands, May 13-17, 2024. DOI.
Abstract

Ordered data arises in many areas, e.g., in molecular dynamics and other spatial-temporal trajectories. While data points that are close in this order are related, common dimensionality reduction techniques cannot capture this relation or order. Thus, the information is lost in the low-dimensional representations. We introduce DROPP, which incorporates order into dimensionality reduction by adapting a Gaussian kernel function across the ordered covariances between data points. We find underlying principal components that are characteristic of the process that generated the data. In extensive experiments, we show DROPP’s advantages over other dimensionality reduction techniques on synthetic as well as real-world data sets from molecular dynamics and climate research: The principal components of different data sets that were generated by the same underlying mechanism are very similar to each other. They can, thus, be used for dimensionality reduction with low reconstruction errors along a set of data sets, allowing an explainable visual comparison of different data sets as well as good compression even for unseen data.

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining


[833]
Y. Velikova, M. F. Azampour, W. Simson, M. Esposito and N. Navab.
Implicit Neural Representations for Breathing-compensated Volume Reconstruction in Robotic Ultrasound Aorta Screening.
IEEE International Conference on Robotics and Automation (ICRA 2024). Yokohama, Japan, May 13-17, 2024. To be published. Preprint at arXiv. arXiv.
MCML Authors
Link to Yordanka Velikova

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Mohammad Farid Azampour

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Walter Simson

Walter Simson

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[832]
J. W. Grootjen, H. Weingärtner and S. Mayer.
Uncovering and Addressing Blink-Related Challenges in Using Eye Tracking for Interactive Systems.
Conference on Human Factors in Computing Systems (CHI 2024). Honolulu, Hawaii, May 11-16, 2024. DOI.
MCML Authors
Link to Jesse Grootjen

Jesse Grootjen

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[831]
L. Haliburton, I. Damen, C. Lallemand, J. Niess, A. Ahtinen and P. W. Woźniak.
Office Wellbeing by Design: Don’t Stand for Anything Less.
Conference on Human Factors in Computing Systems (CHI 2024). Honolulu, Hawaii, May 11-16, 2024. DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media


[830]
L. Haliburton, D. J. Grüning, F. Riedel, A. Schmidt and N. Terzimehić.
A Longitudinal In-the-Wild Investigation of Design Frictions to Prevent Smartphone Overuse.
Conference on Human Factors in Computing Systems (CHI 2024). Honolulu, Hawaii, May 11-16, 2024. DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[829]
S. Sakel, T. Blenk, A. Schmidt and L. Haliburton.
The Social Journal: Investigating Technology to Support and Reflect on Meaningful Social Interactions.
Conference on Human Factors in Computing Systems (CHI 2024). Honolulu, Hawaii, May 11-16, 2024. DOI.
MCML Authors
Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media

Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media


[828]
S. d'Ascoli, S. Becker, P. Schwaller, A. Mathis and N. Kilbertus.
ODEFormer: Symbolic Regression of Dynamical Systems with Transformers.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL. GitHub.
Abstract

We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory. We perform extensive evaluations on two datasets: (i) the existing ‘Strogatz’ dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference.

MCML Authors
Link to Sören Becker

Sören Becker

Ethics in Systems Design and Machine Learning

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[827]
L. Eyring, D. Klein, T. Palla, N. Kilbertus, Z. Akata and F. J. Theis.
Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

In optimal transport (OT), a Monge map is known as a mapping that transports a source distribution to a target distribution in the most cost-efficient way. Recently, multiple neural estimators for Monge maps have been developed and applied in diverse unpaired domain translation tasks, e.g. in single-cell biology and computer vision. However, the classic OT framework enforces mass conservation, which makes it prone to outliers and limits its applicability in real-world scenarios. The latter can be particularly harmful in OT domain translation tasks, where the relative position of a sample within a distribution is explicitly taken into account. While unbalanced OT tackles this challenge in the discrete setting, its integration into neural Monge map estimators has received limited attention. We propose a theoretically grounded method to incorporate unbalancedness into any Monge map estimator. We improve existing estimators to model cell trajectories over time and to predict cellular responses to perturbations. Moreover, our approach seamlessly integrates with the OT flow matching (OT-FM) framework. While we show that OT-FM performs competitively in image translation, we further improve performance by incorporating unbalancedness (UOT-FM), which better preserves relevant features. We hence establish UOT-FM as a principled method for unpaired image translation.

MCML Authors
Link to Luca Eyring

Luca Eyring

Interpretable and Reliable Machine Learning

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning

Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[826]
D. Frauen, F. Imrie, A. Curth, V. Melnychuk, S. Feuerriegel and M. van der Schaar.
A Neural Framework for Generalized Causal Sensitivity Analysis.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Unobserved confounding is common in many applications, making causal inference from observational data challenging. As a remedy, causal sensitivity analysis is an important tool to draw causal conclusions under unobserved confounding with mathematical guarantees. In this paper, we propose NeuralCSA, a neural framework for generalized causal sensitivity analysis. Unlike previous work, our framework is compatible with (i) a large class of sensitivity models, including the marginal sensitivity model, f-sensitivity models, and Rosenbaum’s sensitivity model; (ii) different treatment types (i.e., binary and continuous); and (iii) different causal queries, including (conditional) average treatment effects and simultaneous effects on multiple outcomes. This generality is achieved by learning a latent distribution shift that corresponds to a treatment intervention using two conditional normalizing flows. We provide theoretical guarantees that NeuralCSA is able to infer valid bounds on the causal query of interest and also demonstrate this empirically using both simulated and real-world data.

MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[825]
K. Hess, V. Melnychuk, D. Frauen and S. Feuerriegel.
Bayesian Neural Controlled Differential Equations for Treatment Effect Estimation.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Treatment effect estimation in continuous time is crucial for personalized medicine. However, existing methods for this task are limited to point estimates of the potential outcomes, whereas uncertainty estimates have been ignored. Needless to say, uncertainty quantification is crucial for reliable decision-making in medical applications. To fill this gap, we propose a novel Bayesian neural controlled differential equation (BNCDE) for treatment effect estimation in continuous time. In our BNCDE, the time dimension is modeled through a coupled system of neural controlled differential equations and neural stochastic differential equations, where the neural stochastic differential equations allow for tractable variational Bayesian inference. Thereby, for an assigned sequence of treatments, our BNCDE provides meaningful posterior predictive distributions of the potential outcomes. To the best of our knowledge, ours is the first tailored neural method to provide uncertainty estimates of treatment effects in continuous time. As such, our method is of direct practical value for promoting reliable decision-making in medicine.

MCML Authors
Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[824]
C. Koke and D. Cremers.
HoloNets: Spectral Convolutions do extend to Directed Graphs.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Within the graph learning community, conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs: Only there could the existence of a well-defined graph Fourier transform be guaranteed, so that information may be translated between spatial and spectral domains. Here we show this traditional reliance on the graph Fourier transform to be superfluous and – making use of certain advanced tools from complex analysis and spectral theory – extend spectral convolutions to directed graphs. We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based. In order to thoroughly test the developed theory, we conduct experiments in real world settings, showcasing that directed spectral convolutional networks provide new state-of-the-art results for heterophilic node classification on many datasets and – as opposed to baselines – may be rendered stable to resolution-scale varying topological perturbations.

MCML Authors
Link to Christian Koke

Christian Koke

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[823]
V. Melnychuk, D. Frauen and S. Feuerriegel.
Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic refutation framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATE is non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose a neural refutation framework which performs partial identification of CATE or, equivalently, aims at estimating lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our refutation framework is of direct relevance in practice where the validity of CATE estimation is of importance.

MCML Authors
Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[822]
M. Schröder, D. Frauen and S. Feuerriegel.
Causal Fairness under Unobserved Confounding: A Neural Sensitivity Framework.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Fairness of machine learning predictions is widely required in practice for legal, ethical, and societal reasons. Existing work typically focuses on settings without unobserved confounding, even though unobserved confounding can lead to severe violations of causal fairness and, thus, unfair predictions. In this work, we analyze the sensitivity of causal fairness to unobserved confounding. Our contributions are three-fold. First, we derive bounds for causal fairness metrics under different sources of unobserved confounding. This enables practitioners to examine the sensitivity of their machine learning models to unobserved confounding in fairness-critical applications. Second, we propose a novel neural framework for learning fair predictions, which allows us to offer worst-case guarantees of the extent to which causal fairness can be violated due to unobserved confounding. Third, we demonstrate the effectiveness of our framework in a series of experiments, including a real-world case study about predicting prison sentences. To the best of our knowledge, ours is the first work to study causal fairness under unobserved confounding. To this end, our work is of direct practical value as a refutation strategy to ensure the fairness of predictions in high-stakes applications.

MCML Authors
Link to Maresa Schröder

Maresa Schröder

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[821]
S. Solonets, D. Sinitsyn, L. Von Stumberg, N. Araslanov and D. Cremers.
An Analytical Solution to Gauss-Newton Loss for Direct Image Alignment.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Direct image alignment is a widely used technique for relative 6DoF pose estimation between two images, but its accuracy strongly depends on pose initialization. Therefore, recent end-to-end frameworks increase the convergence basin of the learned feature descriptors with special training objectives, such as the Gauss-Newton loss. However, the training data may exhibit bias toward a specific type of motion and pose initialization, thus limiting the generalization of these methods. In this work, we derive a closed-form solution to the expected optimum of the Gauss-Newton loss. The solution is agnostic to the underlying feature representation and allows us to dynamically adjust the basin of convergence according to our assumptions about the uncertainty in the current estimates. These properties allow for effective control over the convergence in the alignment process. Despite using self-supervised feature embeddings, our solution achieves compelling accuracy w.r.t. the state-of-the-art direct image alignment methods trained end-to-end with pose supervision, and demonstrates improved robustness to pose initialization. Our analytical solution exposes some inherent limitations of end-to-end learning with the Gauss-Newton loss, and establishes an intriguing connection between direct image alignment and feature-matching approaches.

MCML Authors
Link to Sergei Solonets

Sergei Solonets

Computer Vision & Artificial Intelligence

Link to Daniil Sinitsyn

Daniil Sinitsyn

Computer Vision & Artificial Intelligence

Link to Nikita Araslanov

Nikita Araslanov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[820]
A. Vahidi, S. Schoßer, L. Wimmer, Y. Li, B. Bischl, E. Hüllermeier and M. Rezaei.
Probabilistic Self-supervised Learning via Scoring Rules Minimization.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL. GitHub.
Abstract

In this paper, we propose a novel method for probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks, the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network’s representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN’s convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method’s optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets such as ImageNet-O and ImageNet-C. ProSMIN thereby demonstrates its scalability and real-world applicability.

MCML Authors
Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[819]
L. Zellner, S. Rauch, J. Sontheim and T. Seidl.
On Diverse and Precise Recommendations for Small and Medium-Sized Enterprises.
28th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2024). Taipei, Taiwan, May 07-10, 2024. DOI. GitHub.
Abstract

Recommender systems are a popular and common means to extract relevant information for users. Small and medium-sized enterprises make up a large share of overall business activity, yet their demand for recommender systems has received comparatively little attention. Different conditions, such as the small amount of data, lower computational capabilities, and users frequently not possessing an account, call for a different and potentially more small-scale recommender system. The quality requirements remain similar: high accuracy and high diversity are certainly an advantage. We provide multiple solutions with different variants based solely on information contained in event-based sequences and temporal information. Our code is available at GitHub. We conduct experiments on four different datasets with an increasing set of items to show a possible range for scalability. The promising results show the applicability of these grammar-based recommender system variants and leave the final decision on which recommender to choose to the users and their ultimate goals.

MCML Authors
Link to Simon Rauch

Simon Rauch

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[818]
C. Liu, C. Albrecht, Y. Wang and X. Zhu.
CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation.
2nd Workshop Machine Learning for Remote Sensing (ML4RS 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. PDF.
Abstract

We study the potential of noisy labels y to pretrain semantic segmentation models in a multi-modal learning framework for geospatial applications. Specifically, we propose a novel Cross-modal Sample Selection method (CromSS) that utilizes the class distributions P^{(d)}(x,c) over pixels x and classes c modelled by multiple sensors/modalities d of a given geospatial scene. Consistency of predictions across sensors d is jointly informed by the entropy of P^{(d)}(x,c). We determine the noisy-label sampling by the confidence of each sensor d in the noisy class label, P^{(d)}(x,c=y(x)). To verify the performance of our approach, we conduct experiments with Sentinel-1 (radar) and Sentinel-2 (optical) satellite imagery from the globally-sampled SSL4EO-S12 dataset. We pair those scenes with 9-class noisy labels sourced from the Google Dynamic World project for pretraining. Transfer learning evaluations (downstream task) on the DFC2020 dataset confirm the effectiveness of the proposed method for remote sensing image segmentation.
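As background to the quantities named in the abstract, the per-pixel entropy of a sensor's class distribution and its confidence in a noisy label can be computed directly. A minimal NumPy sketch, illustrative only — the array shapes and the function name are assumptions, not the CromSS implementation:

```python
import numpy as np

def entropy_and_confidence(p, noisy_label):
    """Per-pixel entropy of a sensor's class distribution p (shape
    [n_pixels, n_classes]) and its confidence in the given noisy label.
    Hypothetical sketch of the quantities H(P(x, .)) and P(x, c=y(x))."""
    eps = 1e-12
    entropy = -(p * np.log(p + eps)).sum(axis=1)     # H(P(x, .))
    confidence = p[np.arange(len(p)), noisy_label]   # P(x, c = y(x))
    return entropy, confidence

p = np.array([[0.90, 0.05, 0.05],   # confident pixel
              [0.34, 0.33, 0.33]])  # uncertain pixel
H, conf = entropy_and_confidence(p, np.array([0, 0]))
print(H, conf)  # first pixel: low entropy, high confidence in label 0
```

A confident sensor yields low entropy and a high confidence score, which is the kind of signal the abstract describes for cross-sensor consistency and noisy-label sampling.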

MCML Authors
Link to Chenying Liu

Chenying Liu

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[817]
R. Kohli, M. Feurer, B. Bischl, K. Eggensperger and F. Hutter.
Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning.
Workshop on Data-centric Machine Learning Research (DMLR 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Data in tabular form makes up a large part of real-world ML applications, and thus, there has been a strong interest in developing novel deep learning (DL) architectures for supervised learning on tabular data in recent years. As a result, there is a debate as to whether DL methods are superior to the ubiquitous ensembles of boosted decision trees. Typically, the advantage of one model class over the other is claimed based on an empirical evaluation, where different variations of both model classes are compared on a set of benchmark datasets that supposedly resemble relevant real-world tabular data. While the landscape of state-of-the-art models for tabular data has changed, one factor has remained largely constant over the years: the datasets. Here, we examine 30 recent publications and the 187 different datasets they use, in terms of age, study size, and relevance. We found that the average study used fewer than 10 datasets and that half of the datasets are older than 20 years. Our insights raise questions about the conclusions drawn from previous studies and urge the research community to develop and publish additional recent, challenging, and relevant datasets and ML tasks for supervised learning on tabular data.

MCML Authors
Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[816]
S. Chen, Z. Han, B. He, M. Buckley, P. Torr, V. Tresp and J. Gu.
Understanding and Improving In-Context Learning on Vision-language Models.
Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Recently, in-context learning (ICL) on large language models (LLMs) has received great attention, and this technique can also be applied to vision-language models (VLMs) built upon LLMs. These VLMs can respond to queries by conditioning responses on a series of multimodal demonstrations, which comprise images, queries, and answers. Though ICL has been extensively studied on LLMs, its research on VLMs remains limited. The inclusion of additional visual information in the demonstrations motivates the following research questions: which of the two modalities in the demonstration is more significant? How can we select effective multimodal demonstrations to enhance ICL performance? This study investigates the significance of both visual and language information. Our findings indicate that ICL in VLMs is predominantly driven by the textual information in the demonstrations whereas the visual information in the demonstrations barely affects the ICL performance. Subsequently, we provide an understanding of the findings by analyzing the model information flow and comparing model inner states given different ICL settings. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demonstrations and shows better ICL performance. Extensive experiments are conducted to support our findings, understanding, and improvement of the ICL performance of VLMs.

MCML Authors
Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[815]
S. Chen, Z. Han, B. He, Z. Ding, W. Yu, P. Torr, V. Tresp and J. Gu.
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?.
Workshop on Secure and Trustworthy Large Language Models (SeT LLM 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.
Abstract

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and have revealed the vulnerable safeguards of LLMs. Some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates performance reproduction and fair comparison. Moreover, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs, such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source LLMs and MLLMs. (2) Llama2 and Qwen-VL-Chat are more robust compared to other open-source models. (3) The transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods.

MCML Authors
Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[814]
J. Kiechle, S. C. Foreman, S. Fischer, D. Rusche, V. K. N. Rösner, A.-K. Lohse, C. Mogler, C. Knebel, S. E. Combs, M. R. Makowski, K. Woertler, D. M. Lang, J. A. Schnabel, A. S. Gersing and J. C. Peeken.
Investigating the role of morphology in deep learning-based liposarcoma grading.
Annual Meeting of the European Society for Radiotherapy and Oncology (ESTRO 2024). Glasgow, UK, May 03-07, 2024. URL.
MCML Authors
Link to Johannes Kiechle

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Stefan Fischer

Stefan Fischer

Computational Imaging and AI in Medicine

Link to Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[813]
V. Bengs, B. Haddenhorst and E. Hüllermeier.
Identifying Copeland Winners in Dueling Bandits with Indifferences.
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). Valencia, Spain, May 02-04, 2024. URL.
Abstract

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst-case sample complexity.
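For context, once the pairwise preference probabilities are known exactly, the Copeland winner(s) follow by counting duels won. A minimal sketch of the target concept — not the POCOWISTA sampling algorithm — where an indifference (p = 0.5) is credited half a win, a common convention assumed here:

```python
import numpy as np

def copeland_winners(p):
    """Copeland winners for a known preference matrix p, where p[i, j]
    is the probability that arm i is preferred to arm j. An arm scores
    1 per rival it beats (p > 0.5) and, by an assumed convention, 1/2
    per indifference (p == 0.5). Illustrative only."""
    wins = (p > 0.5).astype(float) + 0.5 * np.isclose(p, 0.5)
    np.fill_diagonal(wins, 0.0)          # an arm does not duel itself
    scores = wins.sum(axis=1)
    return np.flatnonzero(np.isclose(scores, scores.max()))

# 3 arms: arm 0 beats both rivals; arms 1 and 2 are mutually indifferent
p = np.array([[0.5, 0.7, 0.8],
              [0.3, 0.5, 0.5],
              [0.2, 0.5, 0.5]])
print(copeland_winners(p))  # [0]
```

The bandit problem in the paper is harder precisely because p must be estimated from noisy ternary duels; the sketch only shows the quantity the algorithm targets.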

MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[812]
D. Dold, D. Rügamer, B. Sick and O. Dürr.
Bayesian Semi-structured Subspace Inference.
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). Valencia, Spain, May 02-04, 2024. URL.
Abstract

Semi-structured regression models enable the joint modeling of interpretable structured and complex unstructured feature effects. The structured model part is inspired by statistical models and can be used to infer the input-output relationship for features of particular importance. The complex unstructured part defines an arbitrary deep neural network and thereby provides enough flexibility to achieve competitive prediction performance. While these models can also account for aleatoric uncertainty, there is still a lack of work on accounting for epistemic uncertainty. In this paper, we address this problem by presenting a Bayesian approximation for semi-structured regression models using subspace inference. To this end, we extend subspace inference for joint posterior sampling from a full parameter space for structured effects and a subspace for unstructured effects. Apart from this hybrid sampling scheme, our method allows for tunable complexity of the subspace and can capture multiple minima in the loss landscape. Numerical experiments validate our approach’s efficacy in recovering structured effect parameter posteriors in semi-structured models and approaching the full-space posterior distribution of MCMC for increasing subspace dimension. Further, our approach exhibits competitive predictive performance across simulated and real-world datasets.

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[811]
P. Kolpaczki, M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
SVARM-IQ: Efficient Approximation of Any-order Shapley Interactions through Stratification.
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). Valencia, Spain, May 02-04, 2024. URL.
Abstract

Addressing the limitations of individual attribution scores via the Shapley value (SV), the field of explainable AI (XAI) has recently explored intricate interactions of features or data points. In particular, extensions of the SV, such as the Shapley Interaction Index (SII), have been proposed as a measure to still benefit from the axiomatic basis of the SV. However, similar to the SV, their exact computation remains computationally prohibitive. Hence, we propose with SVARM-IQ a sampling-based approach to efficiently approximate Shapley-based interaction indices of any order. SVARM-IQ can be applied to a broad class of interaction indices, including the SII, by leveraging a novel stratified representation. We provide non-asymptotic theoretical guarantees on its approximation quality and empirically demonstrate that SVARM-IQ achieves state-of-the-art estimation results in practical XAI scenarios on different model classes and application domains.
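As background, the simplest sampling-based Shapley estimator averages marginal contributions over random permutations. The sketch below shows this plain baseline — not the stratified SVARM-IQ estimator — on a made-up additive game where the exact values are known:

```python
import random

def shapley_mc(value, n, n_samples=200, seed=0):
    """Permutation-sampling estimate of Shapley values for a cooperative
    game `value: frozenset -> float` over players 0..n-1. A generic
    baseline for illustration, not the SVARM-IQ method."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(n_samples):
        rng.shuffle(players)
        coalition = set()
        v_prev = value(frozenset(coalition))
        for p in players:
            coalition.add(p)
            v_new = value(frozenset(coalition))
            phi[p] += v_new - v_prev   # marginal contribution of p
            v_prev = v_new
    return [x / n_samples for x in phi]

# Additive game: player i contributes i+1, so phi_i = i+1 exactly
vals = shapley_mc(lambda S: sum(i + 1 for i in S), n=3)
print(vals)  # [1.0, 2.0, 3.0]
```

For non-additive games this estimator needs many samples; SVARM-IQ's contribution is precisely a more sample-efficient, stratified scheme that also covers higher-order interaction indices such as the SII.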

MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[810]
N. Palm and T. Nagler.
An Online Bootstrap for Time Series.
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). Valencia, Spain, May 02-04, 2024. URL.
MCML Authors
Link to Nicolai Palm

Nicolai Palm

Computational Statistics & Data Science

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science


[809]
D. Rügamer.
Scalable Higher-Order Tensor Product Spline Models.
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). Valencia, Spain, May 02-04, 2024. URL.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[808]
Z. Ye, G. Peyré, D. Cremers and P. Ablin.
Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization.
27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024). Valencia, Spain, May 02-04, 2024. URL. GitHub.
Abstract

Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT). We study the error of the IFT method as a function of the error in the resolution of the inner problem. We analyze two strategies to reduce this error: preconditioning the IFT formula and reparameterizing the inner problem. We give a detailed account of the impact of these two modifications on the error, highlighting the role played by higher-order derivatives of the functionals at stake. Our theoretical findings explain when super efficiency, namely reaching an error on the hypergradient that depends quadratically on the error on the inner problem, is achievable, and compare the two approaches when this is impossible. Numerical evaluations on hyperparameter tuning for regression problems substantiate our theoretical findings.
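For a toy bilevel problem with a closed-form inner solution, the IFT hypergradient can be checked against finite differences. A minimal sketch — the specific quadratic objectives below are invented for illustration, not taken from the paper:

```python
# Toy bilevel problem (hypothetical):
#   inner:  x*(lam) = argmin_x g(x, lam),  g = 0.5*(x - a)**2 + 0.5*lam*x**2
#   outer:  F(lam)  = 0.5*(x*(lam) - b)**2
# IFT hypergradient: dF/dlam = -(dF/dx*) * (d2g/dx2)^(-1) * (d2g/dx dlam)
a, b, lam = 3.0, 1.0, 0.5

x_star = a / (1.0 + lam)         # exact inner solution
dF_dx = x_star - b               # outer gradient w.r.t. x*
d2g_dx2 = 1.0 + lam              # inner Hessian
d2g_dxdlam = x_star              # mixed second derivative
hypergrad_ift = -dF_dx * d2g_dxdlam / d2g_dx2

# Finite-difference check of dF/dlam
eps = 1e-6
F = lambda l: 0.5 * (a / (1.0 + l) - b) ** 2
hypergrad_fd = (F(lam + eps) - F(lam - eps)) / (2 * eps)
print(hypergrad_ift, hypergrad_fd)  # nearly identical
```

The paper's analysis concerns what happens when x*(lam) is only known approximately; here the inner solution is exact, so the IFT formula matches finite differences up to discretization error.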

MCML Authors
Link to Zhenzhang Ye

Zhenzhang Ye

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[807]
A. Solderer, S. P. Hicklin, M. Aßenmacher, A. Ender and P. R. Schmidlin.
Influence of an allogenic collagen scaffold on implant sites with thin supracrestal tissue height: a randomized clinical trial.
Clinical Oral Investigations 28.313 (May. 2024). DOI.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[806]
A. Kazemi, A. Rasouli-Saravani, M. Gharib, T. Albuquerque, S. Eslami and P. J. Schüffler.
A systematic review of machine learning-based tumor-infiltrating lymphocytes analysis in colorectal cancer: Overview of techniques, performance metrics, and clinical outcomes.
Computers in Biology and Medicine 173 (May. 2024). DOI.
Abstract

The incidence of colorectal cancer (CRC), one of the deadliest cancers around the world, is increasing. Tissue microenvironment (TME) features such as tumor-infiltrating lymphocytes (TILs) can have a crucial impact on diagnosis or decision-making for treating patients with CRC. While clinical studies showed that TILs improve the host immune response, leading to a better prognosis, inter-observer agreement for quantifying TILs is not perfect. Incorporating machine learning (ML)-based applications in clinical routine may improve diagnostic reliability. Recently, ML has shown potential for making progress in routine clinical procedures. We aim to systematically review ML-based TILs analysis in CRC histological images. Deep learning (DL) and non-DL techniques can aid pathologists in identifying TILs, and automated TIL assessment is associated with patient outcomes. However, a large multi-institutional CRC dataset with a diverse and multi-ethnic population is necessary to generalize ML methods.

MCML Authors
Link to Peter Schüffler

Peter Schüffler

Prof. Dr.

Associate

Computational Pathology


[805]
K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber, J. Topalis, T. Weber, P. Wesp, B. O. Sabel, J. Ricke and M. Ingrisch.
ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports.
European Radiology 34 (May. 2024). DOI.
MCML Authors
Link to Katharina Jeblick

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Link to Balthasar Schachtner

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to Jakob Dexl

Jakob Dexl

Clinical Data Science in Radiology

Link to Andreas Mittermeier

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Theresa Stüber

Theresa Stüber

Clinical Data Science in Radiology

Link to Philipp Wesp

Philipp Wesp

Clinical Data Science in Radiology

Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[804]
W. Qiu, C. Quan, L. Zhu, Y. Yu, Z. Wang, Y. Ma, M. Sun, Y. Chang, K. Qian, B. Hu, Y. Yamamoto and B. W. Schuller.
Heart Sound Abnormality Detection From Multi-Institutional Collaboration: Introducing a Federated Learning Framework.
IEEE Transactions on Biomedical Engineering 71.10 (May. 2024). DOI.
Abstract

Objective: Early diagnosis of cardiovascular diseases is a crucial task in medical practice. With the application of computer audition in the healthcare field, artificial intelligence (AI) has been applied to clinical non-invasive intelligent auscultation of heart sounds to provide rapid and effective pre-screening. However, AI models generally require large amounts of data, which may cause privacy issues. Unfortunately, it is difficult to collect large amounts of healthcare data from a single centre. Methods: In this study, we propose federated learning (FL) optimisation strategies for the practical application in multi-centre institutional heart sound databases. The horizontal FL is mainly employed to tackle the privacy problem by aligning the feature spaces of FL participating institutions without information leakage. In addition, techniques based on deep learning have poor interpretability due to their “black-box” property, which limits the feasibility of AI in real medical data. To this end, vertical FL is utilised to address the issues of model interpretability and data scarcity. Conclusion: Experimental results demonstrate that the proposed FL framework can achieve good performance for heart sound abnormality detection while taking personal privacy protection into account. Moreover, using the federated feature space is beneficial to balance the interpretability of the vertical FL and the privacy of the data. Significance: This work realises the potential of FL from research to clinical practice, and is expected to have extensive application in the federated smart medical system.
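The horizontal-FL aggregation the abstract alludes to can be illustrated with a minimal FedAvg-style weighted average of client parameters; this is a standard textbook sketch under assumed data, not the paper's actual optimisation strategy.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client model parameters, weighted by local dataset size,
    without any raw patient data leaving the institutions."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# three hypothetical institutions with differently sized heart-sound databases
clients = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 4.0])]
sizes = [100, 50, 50]
agg = fedavg(clients, sizes)
print(agg)  # weighted average of the three parameter vectors
```

Only model parameters are exchanged; each institution's recordings stay local, which is the privacy property the paper builds on.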

MCML Authors
Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[803]
H. Krasowski and M. Althoff.
Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea.
IEEE Transactions on Intelligent Vehicles Early Access (May. 2024). DOI.
Abstract

For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.
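The action-masking step described above can be sketched in isolation: disallowed actions get their logits set to negative infinity before normalisation, so the policy can only select verified actions. This is an illustrative sketch; in the paper, the verified set comes from set-based prediction combined with the rule statechart, not from a hand-written mask.

```python
import numpy as np

def masked_action_probs(logits, mask):
    """Restrict a stochastic policy to rule-compliant actions by masking logits."""
    masked = np.where(mask, logits, -np.inf)  # forbidden actions get -inf
    z = masked - masked.max()                 # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([True, False, True, False])  # only actions 0 and 2 verified safe
p = masked_action_probs(logits, mask)
print(p)  # all probability mass lies on actions 0 and 2
```

Because forbidden actions receive zero probability, the agent complies with the formalized rules during both exploration and deployment.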

MCML Authors
Link to Hanna Krasowski

Hanna Krasowski

Dr.

Cyber Physical Systems

Link to Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[802]
V. G. Duque, A. Marquardt, Y. Velikova, L. Lacourpaille, A. Nordez, M. Crouzier, H. J. Lee, D. Mateus and N. Navab.
Ultrasound segmentation analysis via distinct and completed anatomical borders.
International Journal of Computer Assisted Radiology and Surgery 19 (May. 2024). DOI.
Abstract

Segmenting ultrasound images is important for precise area and/or volume calculations, ensuring reliable diagnosis and effective treatment evaluation for diseases. Recently, many segmentation methods have been proposed and shown impressive performance. However, currently, there is no deeper understanding of how networks segment target regions or how they define the boundaries. In this paper, we present a new approach that analyzes ultrasound segmentation networks in terms of learned borders because border delimitation is challenging in ultrasound.

MCML Authors
Link to Yordanka Velikova

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Hong Joo Lee

Hong Joo Lee

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[801]
M.-A. Gafencu, Y. Velikova, M. Saleh, T. Ungi, N. Navab, T. Wendler and M. F. Azampour.
Shape completion in the dark: completing vertebrae morphology from 3D ultrasound.
International Journal of Computer Assisted Radiology and Surgery 19 (May. 2024). DOI.
Abstract

Ultrasound (US) imaging, while advantageous for its radiation-free nature, is challenging to interpret due to only partially visible organs and a lack of complete 3D information. While performing US-based diagnosis or investigation, medical professionals therefore create a mental map of the 3D anatomy. In this work, we aim to replicate this process and enhance the visual representation of anatomical structures.

MCML Authors
Link to Yordanka Velikova

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Mohammad Farid Azampour

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality


[800]
K. Hechinger, C. Koller, X. Zhu and G. Kauermann.
Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty.
Preprint at arXiv (May. 2024). arXiv.
Abstract

Uncertainty in machine learning models is a timely and vast field of research. In supervised learning, uncertainty can already occur in the first stage of the training process, the annotation phase. This scenario is particularly evident when some instances cannot be definitively classified. In other words, there is inevitable ambiguity in the annotation step and hence, not necessarily a ‘ground truth’ associated with each instance. The main idea of this work is to drop the assumption of a ground truth label and instead embed the annotations into a multidimensional space. This embedding is derived from the empirical distribution of annotations in a Bayesian setup, modeled via a Dirichlet-Multinomial framework. We estimate the model parameters and posteriors using a stochastic Expectation Maximization algorithm with Markov Chain Monte Carlo steps. The methods developed in this paper readily extend to various situations where multiple annotators independently label instances. To showcase the generality of the proposed approach, we apply our approach to three benchmark datasets for image classification and Natural Language Inference. Besides the embeddings, we can investigate the resulting correlation matrices, which reflect the semantic similarities of the original classes very well for all three exemplary datasets.
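The embedding idea can be illustrated with the conjugate special case: under a symmetric Dirichlet prior, an instance's annotation counts map to a posterior-mean point in the probability simplex. This sketch shows only that intuition; the paper fits a full Dirichlet-Multinomial model with stochastic EM and MCMC steps.

```python
import numpy as np

def annotation_embedding(counts, alpha0=1.0):
    """Posterior-mean embedding of one instance's annotations under a
    symmetric Dirichlet(alpha0) prior (conjugate update, illustrative only)."""
    counts = np.asarray(counts, dtype=float)
    alpha_post = counts + alpha0           # Dirichlet posterior parameters
    return alpha_post / alpha_post.sum()   # point in the probability simplex

# 10 annotators labelling one ambiguous image over 3 classes
e = annotation_embedding([7, 2, 1])
print(e)  # the ambiguity is retained instead of being argmax'd away
```

The point is that the embedding preserves annotation ambiguity that a single "ground truth" label would discard.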

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation

Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[799]
K. Hess, D. Frauen, V. Melnychuk and S. Feuerriegel.
G-Transformer for Conditional Average Potential Outcome Estimation over Time.
Preprint at arXiv (May. 2024). arXiv.
Abstract

Estimating potential outcomes for treatments over time based on observational data is important for personalized decision-making in medicine. Yet, existing neural methods for this task either (1) do not perform proper adjustments for time-varying confounders, or (2) suffer from large estimation variance. In order to address both limitations, we introduce the G-transformer (GT). Our GT is a novel, neural end-to-end model which adjusts for time-varying confounders, and provides low-variance estimation of conditional average potential outcomes (CAPOs) over time. Specifically, our GT is the first neural model to perform regression-based iterative G-computation for CAPOs in the time-varying setting. We evaluate the effectiveness of our GT across various experiments. In sum, this work represents a significant step towards personalized decision-making from electronic health records.

MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[798]
Z. Li, S. S. Cranganore, N. Youngblut and N. Kilbertus.
Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity.
Preprint at arXiv (May. 2024). arXiv.
Abstract

Leveraging the vast genetic diversity within microbiomes offers unparalleled insights into complex phenotypes, yet the task of accurately predicting and understanding such traits from genomic data remains challenging. We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences. Based on our model, we develop attribution techniques to elucidate gene interaction effects that drive microbial adaptation to diverse environments. We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats. We not only demonstrate solid predictive performance, but also how sequence-level information of entire genomes allows us to identify gene associations underlying complex phenotypes. Our attribution recovers known important interaction networks and proposes new candidates for experimental follow up.

MCML Authors
Link to Zhufeng Li

Zhufeng Li

Ethics in Systems Design and Machine Learning

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[797]
Y. Liu, C. Ma, H. Ye and H. Schütze.
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data.
Preprint at arXiv (May. 2024). arXiv. GitHub.
MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[796]
A. Scagliotti.
Minimax problems for ensembles of affine-control systems.
Preprint at arXiv (May. 2024). arXiv.
MCML Authors
Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[795]
P. Scholl, K. Bieker, H. Hauger and G. Kutyniok.
ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization.
Preprint at arXiv (May. 2024). arXiv.
Abstract

The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually complicated and involve various hyperparameters. In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We theoretically analyze the expressivity of ParFam and demonstrate its performance with extensive numerical experiments based on the common SR benchmark suite SRBench, showing that we achieve state-of-the-art results. Moreover, we present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam, accelerating the optimization process by up to two orders of magnitude.
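The core idea, replacing the discrete search over formulas by continuous optimisation over a parametric family, can be sketched with a hypothetical two-term family and an off-the-shelf global optimizer; the family, the target law, and the optimizer choice below are assumptions for illustration, not the authors' setup.

```python
import numpy as np
from scipy.optimize import differential_evolution

# target law to recover (unknown to the optimizer)
x = np.linspace(-2, 2, 100)
y = 1.5 * np.sin(2.0 * x) + 0.5 * x**2

def family(p, x):
    # parametric family covering many candidate symbolic expressions at once
    a, b, c = p
    return a * np.sin(b * x) + c * x**2

def loss(p):
    return np.mean((family(p, x) - y) ** 2)

# a global optimizer searches the continuous parameter space
res = differential_evolution(loss, bounds=[(-3, 3)] * 3, seed=0)
print(res.x, res.fun)  # recovered parameters; loss near zero
```

Once the parameters are found, the symbolic expression is read off directly from the family, which is what makes the continuous relaxation useful for SR.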

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[794]
X. Zhu, Z. Xiong, Y. Wang, A. Stewart, K. Heidler, Y. Wang, Z. Yuan, T. Dujardin, Q. Xu and Y. Shi.
On the Foundations of Earth and Climate Foundation Models.
Preprint at arXiv (May. 2024). arXiv.
Abstract

Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an environmental- and human-centric manner. We further shed light on the way forward to achieve the ideal model and on how to evaluate Earth foundation models. What comes after foundation models? Energy efficient adaptation, adversarial defenses, and interpretability are among the emerging directions.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation

Link to Adam Stewart

Adam Stewart

Dr.

Data Science in Earth Observation

Link to Qingsong Xu

Qingsong Xu

Data Science in Earth Observation


[793]
R. Debelak, T. Koch, M. Aßenmacher and C. Stachl.
From Embeddings to Explainability: A Tutorial on Transformer-Based Text Analysis for Social and Behavioral Scientists.
Preprint at PsyArXiv (May. 2024). DOI.
Abstract

Large language models and their use for text analysis have had a significant impact on psychology and the social and behavioral sciences in general. Key applications include the analysis of texts, such as social media posts, to infer psychological characteristics, as well as survey and interview analysis. In this tutorial paper, we demonstrate the use of the Python-based natural language processing software package transformers (and related modules from the Hugging Face Ecosystem) that allow for the automated classification of text inputs in a practical exercise. In doing so, we rely on pretrained transformer models which can be fine-tuned to a specific task and domain. The first proposed application of this model class is to use it as a feature extractor, allowing for the transformation of written text into real-valued numerical vectors (called ’embeddings’) that capture a text’s semantic meaning. These vectors can, in turn, be used as input for a subsequent machine-learning model. The second presented application of transformer models is the end-to-end training (so-called ‘fine-tuning’) of the model. This results in a direct prediction of the label within the same model that directly maps the text to the embeddings. While in the second case, results are usually better and training works more seamlessly, the model itself is often not directly interpretable. We showcase an alleviation of this issue via the application of post-hoc interpretability methods by calculating SHAP values and applying local interpretable model-agnostic explanations (LIME) in an attempt to explain the model’s inner workings.

MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[792]
A. F. Thielmann, A. Reuter, T. Kneib, D. Rügamer and B. Säfken.
Interpretable Additive Tabular Transformer Networks.
Transactions on Machine Learning Research (May. 2024). URL.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[791]
N. Strauß and M. Schubert.
Spatial-Aware Deep Reinforcement Learning for the Traveling Officer Problem.
SIAM International Conference on Data Mining (SDM 2024). Houston, TX, USA, Apr 18-20, 2024. DOI.
MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[790]
P. Dettling, M. Drton and M. Kolar.
On the Lasso for Graphical Continuous Lyapunov Models.
3rd Conference on Causal Learning and Reasoning (CLeaR 2024). Los Angeles, CA, USA, Apr 01-03, 2024. URL.
Abstract

Graphical continuous Lyapunov models offer a new perspective on modeling causally interpretable dependence structure in multivariate data by treating each independent observation as a one-time cross-sectional snapshot of a temporal process. Specifically, the models assume that the observations are cross-sections of independent multivariate Ornstein-Uhlenbeck processes in equilibrium. The Gaussian equilibrium exists under a stability assumption on the drift matrix, and the equilibrium covariance matrix is determined by the continuous Lyapunov equation. Each graphical continuous Lyapunov model assumes the drift matrix to be sparse, with a support determined by a directed graph. A natural approach to model selection in this setting is to use an ℓ1-regularization technique that, based on a given sample covariance matrix, seeks to find a sparse approximate solution to the Lyapunov equation. We study the model selection properties of the resulting lasso technique to arrive at a consistency result. Our detailed analysis reveals that the involved irrepresentability condition is surprisingly difficult to satisfy. While this may prevent asymptotic consistency in model selection, our numerical experiments indicate that even if the theoretical requirements for consistency are not met, the lasso approach is able to recover relevant structure of the drift matrix and is robust to aspects of model misspecification.
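The model's defining equation can be checked numerically for a small sparse drift matrix: the equilibrium covariance of the Ornstein-Uhlenbeck process solves the continuous Lyapunov equation. This sketch shows only the model setup (with an assumed drift matrix), not the lasso estimation studied in the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# sparse, stable drift matrix M; its support encodes the directed graph
M = np.array([[-1.0, 0.0, 0.0],
              [ 0.5, -1.0, 0.0],
              [ 0.0, 0.5, -1.0]])
C = np.eye(3)  # volatility term

# equilibrium covariance Sigma satisfies M Sigma + Sigma M^T + C = 0
Sigma = solve_continuous_lyapunov(M, -C)
print(np.allclose(M @ Sigma + Sigma @ M.T + C, 0))
```

The lasso approach analyzed in the paper runs this map in reverse: given a sample estimate of Sigma, it seeks a sparse approximate solution M of the same equation.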

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[789]
K. Göbler, T. Windisch, M. Drton, T. Pychynski, M. Roth and S. Sonntag.
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery.
3rd Conference on Causal Learning and Reasoning (CLeaR 2024). Los Angeles, CA, USA, Apr 01-03, 2024. URL.
Abstract

Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real and complex data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To tackle these challenges, we introduce causalAssembly, a semisynthetic data generator designed to facilitate the benchmarking of causal discovery methods. The tool is built using a complex real-world dataset comprised of measurements collected along an assembly line in a manufacturing setting. For these measurements, we establish a partial set of ground truth causal relationships through a detailed study of the physics underlying the processes carried out in the assembly line. The partial ground truth is sufficiently informative to allow for estimation of a full causal graph by mere nonparametric regression. To overcome potential confounding and privacy concerns, we use distributional random forests to estimate and represent conditional distributions implied by the ground truth causal graph. These conditionals are combined into a joint distribution that strictly adheres to a causal model over the observed variables. Sampling from this distribution, causalAssembly generates data that are guaranteed to be Markovian with respect to the ground truth. Using our tool, we showcase how to benchmark several well-known causal discovery algorithms.

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[788]
D. Strieder and M. Drton.
Dual Likelihood for Causal Inference under Structure Uncertainty.
3rd Conference on Causal Learning and Reasoning (CLeaR 2024). Los Angeles, CA, USA, Apr 01-03, 2024. URL.
Abstract

Knowledge of the underlying causal relations is essential for inferring the effect of interventions in complex systems. In a widely studied approach, structural causal models postulate noisy functional relations among interacting variables, where the underlying causal structure is then naturally represented by a directed graph whose edges indicate direct causal dependencies. In the typical application, this underlying causal structure must be learned from data, and thus, the remaining structure uncertainty needs to be incorporated into causal inference in order to draw reliable conclusions. In recent work, test inversions provide an ansatz to account for this data-driven model choice and, therefore, combine structure learning with causal inference. In this article, we propose the use of dual likelihood to greatly simplify the treatment of the involved testing problem. Indeed, dual likelihood leads to a closed-form solution for constructing confidence regions for total causal effects that rigorously capture both sources of uncertainty: causal structure and numerical size of nonzero effects. The proposed confidence regions can be computed with a bottom-up procedure starting from sink nodes. To render the causal structure identifiable, we develop our ideas in the context of linear causal relations with equal error variances.

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[787]
H. A. Gündüz, R. Mreches, J. Moosbauer, G. Robertson, X.-Y. To, E. A. Franzosa, C. Huttenhower, M. Rezaei, A. C. McHardy, B. Bischl, P. C. Münch and M. Binder.
Optimized model architectures for deep learning on genomic data.
Communications Biology 7.1 (Apr. 2024). DOI.
Abstract

The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.

MCML Authors
Link to Hüseyin Anil Gündüz

Hüseyin Anil Gündüz

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[786]
M. Herrmann, D. Kazempour, F. Scheipl and P. Kröger.
Enhancing cluster analysis via topological manifold learning.
Data Mining and Knowledge Discovery 38 (Apr. 2024). DOI.
MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[785]
C. Koller, P. Jung and X. Zhu.
Can Land Cover Classification Models Benefit From Distance-Aware Architectures?
IEEE Geoscience and Remote Sensing Magazine 21 (Apr. 2024). DOI. GitHub.
Abstract

The quantification of predictive uncertainties helps to understand where the existing models struggle to find the correct prediction. A useful quality control tool is the task of detecting out-of-distribution (OOD) data by examining the model’s predictive uncertainty. For this task, deterministic single forward pass frameworks have recently been established as deep learning models and have shown competitive performance in certain tasks. The unique combination of spectrally normalized weight matrices and residual connection networks with an approximate Gaussian process (GP) output layer can here offer the best trade-off between performance and complexity. We utilize this framework with a refined version that adds spectral batch normalization and an inducing points approximation of the GP for the task of OOD detection in remote sensing image classification. This is an important task in the field of remote sensing, because it provides an evaluation of how reliable the model’s predictive uncertainty estimates are. By performing experiments on the benchmark datasets Eurosat and So2Sat LCZ42, we can show the effectiveness of the proposed adaptions to the residual networks (ResNets). Depending on the chosen dataset, the proposed methodology achieves OOD detection performance up to 16% higher than previously considered distance-aware networks. Compared with other uncertainty quantification methodologies, the results are on the same level and exceed them in certain experiments by up to 2%. In particular, spectral batch normalization, which normalizes the batched data as opposed to normalizing the network weights by the spectral normalization (SN), plays a crucial role and leads to performance gains of up to 3% in every single experiment.
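The spectral normalization (SN) ingredient can be sketched in isolation: a weight matrix is rescaled to unit spectral norm, which bounds the layer's Lipschitz constant and underpins the distance-awareness discussed above. This is an illustrative one-matrix sketch; the paper applies it layer-wise inside ResNets together with spectral batch normalization and a GP output layer.

```python
import numpy as np

def spectral_normalize(W):
    """Scale a weight matrix to unit spectral norm (largest singular value 1)."""
    sigma = np.linalg.norm(W, 2)  # ord=2 gives the largest singular value
    return W / sigma

W = np.array([[3.0, 0.0],
              [4.0, 0.0]])
Wn = spectral_normalize(W)
print(np.linalg.norm(Wn, 2))  # unit spectral norm after rescaling
```

Bounding each layer this way keeps distances in feature space informative, which is what makes the GP output layer's uncertainty useful for OOD detection.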

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[784]
X. Li, C. Wen, Y. Hu, Z. Yuan and X. Zhu.
Vision-Language Models in Remote Sensing: Current progress and future trends.
IEEE Geoscience and Remote Sensing Magazine 62 (Apr. 2024). DOI.
Abstract

The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-4) have sparked a wave of interest and research in the field of large language models (LLMs) for artificial general intelligence (AGI). These models provide intelligent solutions that are closer to human thinking, enabling us to use general artificial intelligence (AI) to solve problems in various applications. However, in the field of remote sensing (RS), the scientific literature on the implementation of AGI remains relatively scant. Existing AI-related research in RS focuses primarily on visual-understanding tasks while neglecting the semantic understanding of the objects and their relationships. This is where vision-LMs (VLMs) excel as they enable reasoning about images and their associated textual descriptions, allowing for a deeper understanding of the underlying semantics. VLMs can go beyond visual recognition of RS images and can model semantic relationships as well as generate natural language descriptions of the image. This makes them better suited for tasks that require both visual and textual understanding, such as image captioning and visual question answering (VQA). This article provides a comprehensive review of the research on VLMs in RS, summarizing the latest progress, highlighting current challenges, and identifying potential research opportunities. Specifically, we review the application of VLMs in mainstream RS tasks, including image captioning, text-based image generation, text-based image retrieval (TBIR), VQA, scene classification, semantic segmentation, and object detection. For each task, we analyze representative works and discuss research progress. Finally, we summarize the limitations of existing works and provide possible directions for future development. This review aims to provide a comprehensive overview of the current research progress of VLMs in RS (see Figure 1 ), and to inspire further research in this exciting and promising field.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[783]
K. Qian, Y. Wang, P. Jung, Y. Shi and X. Zhu.
HyperLISTA-ABT: An Ultralight Unfolded Network for Accurate Multicomponent Differential Tomographic SAR Inversion.
IEEE Transactions on Geoscience and Remote Sensing 62 (Apr. 2024). DOI.
Abstract

Deep neural networks based on unrolled iterative algorithms have achieved remarkable success in sparse reconstruction applications, such as synthetic aperture radar (SAR) tomographic inversion (TomoSAR). However, the currently available deep learning-based TomoSAR algorithms are limited to 3-D reconstruction. The extension of deep learning-based algorithms to 4-D imaging, i.e., differential TomoSAR (D-TomoSAR) applications, is impeded mainly due to the high-dimensional weight matrices required by the network designed for D-TomoSAR inversion, which typically contain millions of freely trainable parameters. Learning such huge number of weights requires an enormous number of training samples, resulting in a large memory burden and excessive time consumption. To tackle this issue, we propose an efficient and accurate algorithm called HyperLISTA-ABT. The weights in HyperLISTA-ABT are determined in an analytical way according to a minimum coherence criterion, trimming the model down to an ultra-light one with only three hyperparameters. Additionally, HyperLISTA-ABT improves the global thresholding by utilizing an adaptive blockwise thresholding (ABT) scheme, which applies block-coordinate techniques and conducts thresholding in local blocks, so that weak expressions and local features can be retained in the shrinkage step layer by layer. Simulations were performed and demonstrated the effectiveness of our approach, showing that HyperLISTA-ABT achieves superior computational efficiency with no significant performance degradation compared to the state-of-the-art methods. Real data experiments showed that a high-quality 4-D point cloud could be reconstructed over a large area by the proposed HyperLISTA-ABT with affordable computational resources and in a fast time.
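The adaptive blockwise thresholding (ABT) idea can be illustrated with a toy shrinkage step: each local block gets its own threshold, so weak but locally significant coefficients survive where a single global threshold would erase them. The per-block threshold rule below is a hypothetical stand-in; the paper derives its thresholds from a minimum-coherence criterion.

```python
import numpy as np

def blockwise_soft_threshold(x, block_size, kappa=0.5):
    """Soft-threshold each block with a threshold adapted to that block's
    magnitudes, so locally weak features are retained layer by layer."""
    out = np.zeros_like(x)
    for start in range(0, len(x), block_size):
        blk = x[start:start + block_size]
        tau = kappa * np.max(np.abs(blk))  # per-block threshold (illustrative rule)
        out[start:start + block_size] = np.sign(blk) * np.maximum(np.abs(blk) - tau, 0)
    return out

x = np.array([5.0, 0.1, -0.2, 0.8, 0.6, -0.05])
out = blockwise_soft_threshold(x, block_size=3)
print(out)  # the weaker second block keeps its two dominant coefficients
```

A single global threshold of 2.5 would zero out the entire second block; the blockwise scheme preserves its local structure, which is the motivation for ABT.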

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[782]
Y. Lee, H. Boche and G. Kutyniok.
Computability of Optimizers.
IEEE Transactions on Information Theory 70.4 (Apr. 2024). DOI.
Abstract

Optimization problems are a staple of today’s scientific and technical landscape. However, at present, solvers of such problems are almost exclusively run on digital hardware. Using Turing machines as a mathematical model for any type of digital hardware, in this paper, we analyze fundamental limitations of this conceptual approach of solving optimization problems. Since in most applications, the optimizer itself is of significantly more interest than the optimal value of the corresponding function, we will focus on computability of the optimizer. In fact, we will show that in various situations the optimizer is unattainable on Turing machines and consequently on digital computers. Moreover, even worse, there does not exist a Turing machine, which approximates the optimizer itself up to a certain constant error. We prove such results for a variety of well-known problems from very different areas, including artificial intelligence, financial mathematics, and information theory, often deriving the even stronger result that such problems are not Banach-Mazur computable, also not even in an approximate sense.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[781]
J. Guo, D. Hong and X. Zhu.
High-resolution satellite images reveal the prevalent positive indirect impact of urbanization on urban tree canopy coverage in South America.
Landscape and Urban Planning 247 (Apr. 2024). DOI.
Abstract

Trees in urban areas act as carbon sinks and provide ecosystem services for residents. However, the impact of urbanization on tree coverage in South America remains poorly understood. Here, we make use of very high-resolution satellite imagery to derive urban tree coverage for 882 cities in South America and develop a tree coverage impacted (TCI) coefficient to quantify the direct and indirect impacts of urbanization on urban tree canopy (UTC) coverage. The direct effect refers to the change in tree cover due to the rise in urban intensity compared to scenarios with extremely low levels of urbanization, while the indirect impact refers to the change in tree coverage resulting from human management practices and alterations in urban environments. Our study revealed the negative direct impacts and prevalent positive indirect impacts of urbanization on UTC coverage. In South America, 841 cities exhibit positive indirect impacts, while only 41 cities show negative indirect impacts. The prevalent positive indirect effects can offset approximately 48% of the direct loss of tree coverage due to increased urban intensity, with full offsets achieved in Argentinian and arid regions of South America. In addition, human activity factors play the most important role in determining the indirect effects of urbanization on UTC coverage, followed by climatic and geographic factors. These findings will help us understand the impact of urbanization on UTC coverage along the urban intensity gradient and formulate policies and strategies to promote sustainable urban development in South America.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[780]
S. Feuerriegel, D. Frauen, V. Melnychuk, J. Schweisthal, K. Hess, A. Curth, S. Bauer, N. Kilbertus, I. S. Kohane and M. van der Schaar.
Causal machine learning for predicting treatment outcomes.
Nature Medicine 30 (Apr. 2024). DOI.
Abstract

Causal machine learning (ML) offers flexible, data-driven methods for predicting treatment outcomes including efficacy and toxicity, thereby supporting the assessment and safety of drugs. A key benefit of causal ML is that it allows for estimating individualized treatment effects, so that clinical decision-making can be personalized to individual patient profiles. Causal ML can be used in combination with both clinical trial data and real-world data, such as clinical registries and electronic health records, but caution is needed to avoid biased or incorrect predictions. In this Perspective, we discuss the benefits of causal ML (relative to traditional statistical or ML approaches) and outline the key components and steps. Finally, we provide recommendations for the reliable use of causal ML and effective translation into the clinic.

MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Jonas Schweisthal

Jonas Schweisthal

Artificial Intelligence in Management

Link to Stefan Bauer

Stefan Bauer

Prof. Dr.

Algorithmic Machine Learning & Explainable AI

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning


[779]
S. Amiriparian, M. Gerczuk, J. Lutz, W. Strube, I. Papazova, A. Hasan, A. Kathan and B. W. Schuller.
Non-Invasive Suicide Risk Prediction Through Speech Analysis.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

The delayed access to specialized psychiatric assessments and care for patients at risk of suicidal tendencies in emergency departments creates a notable gap in timely intervention, hindering the provision of adequate mental health support during critical situations. To address this, we present a non-invasive, speech-based approach for automatic suicide risk assessment. For our study, we collected a novel speech recording dataset from 20 patients. We extract three sets of features, including wav2vec, interpretable speech and acoustic features, and deep learning-based spectral representations. We proceed by conducting a binary classification to assess suicide risk in a leave-one-subject-out fashion. Our most effective speech model achieves a balanced accuracy of 66.2%. Moreover, we show that integrating our speech model with a series of patients’ metadata, such as the history of suicide attempts or access to firearms, improves the overall result. The metadata integration yields a balanced accuracy of 94.4%, marking an absolute improvement of 28.2%, demonstrating the efficacy of our proposed approaches for automatic suicide risk assessment in emergency medicine.

MCML Authors
Link to Shahin Amiriparian

Shahin Amiriparian

Dr.

Health Informatics

Link to Maurice Gerczuk

Maurice Gerczuk

Health Informatics

Link to Alexander Kathan

Alexander Kathan

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[778]
V. Gkolemis, C. Diou, E. Ntoutsi, T. Dalamagas, B. Bischl, J. Herbinger and G. Casalicchio.
Effector: A Python package for regional explanations.
Preprint at arXiv (Apr. 2024). arXiv. GitHub.
Abstract

Global feature effect methods explain a model by outputting one plot per feature. The plot shows the average effect of the feature on the output, like the effect of age on the annual income. However, average effects may be misleading when derived from local effects that are heterogeneous, i.e., they significantly deviate from the average. To decrease the heterogeneity, regional effects provide multiple plots per feature, each representing the average effect within a specific subspace. For interpretability, subspaces are hyperrectangles defined by a chain of logical rules, like age’s effect on annual income separately for males and females and different levels of professional experience. We introduce Effector, a Python library dedicated to regional feature effects. Effector implements well-established global effect methods, assesses the heterogeneity of each method and, based on that, provides regional effects. Effector automatically detects subspaces where regional effects have reduced heterogeneity. All global and regional effect methods share a common API, facilitating comparisons between them. Moreover, the library’s interface is extensible so new methods can be easily added and benchmarked.
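The idea behind regional effects can be illustrated without the library itself: when local effects are heterogeneous, the global average hides them, but averaging within a rule-defined subspace recovers them. A minimal numpy sketch (the toy model and function names are illustrative, not Effector's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(X):
    # Toy model with heterogeneous local effects: feature 0 helps
    # when feature 1 is positive and hurts when it is negative.
    return X[:, 0] * np.sign(X[:, 1])

def average_local_effect(X, feature, eps=1e-4):
    # Central finite-difference estimate of the local effect of one
    # feature, averaged over the data points in X.
    X_hi, X_lo = X.copy(), X.copy()
    X_hi[:, feature] += eps
    X_lo[:, feature] -= eps
    return np.mean((model(X_hi) - model(X_lo)) / (2 * eps))

X = rng.normal(size=(10_000, 2))

global_effect = average_local_effect(X, feature=0)              # ~0: misleading
regional_pos = average_local_effect(X[X[:, 1] > 0], feature=0)  # ~+1
regional_neg = average_local_effect(X[X[:, 1] <= 0], feature=0) # ~-1
```

The global average is near zero even though feature 0 strongly affects every single prediction; splitting on the rule "feature 1 > 0" yields two homogeneous regional effects.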

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[777]
P. Hofman, Y. Sale and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty with Proper Scoring Rules.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

Uncertainty representation and quantification are paramount in machine learning and constitute an important prerequisite for safety-critical applications. In this paper, we propose novel measures for the quantification of aleatoric and epistemic uncertainty based on proper scoring rules, which are loss functions with the meaningful property that they incentivize the learner to predict ground-truth (conditional) probabilities. We assume two common representations of (epistemic) uncertainty, namely, in terms of a credal set, i.e. a set of probability distributions, or a second-order distribution, i.e., a distribution over probability distributions. Our framework establishes a natural bridge between these representations. We provide a formal justification of our approach and introduce new measures of epistemic and aleatoric uncertainty as concrete instantiations.
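For intuition on the second-order representation, the special case of the log score reduces to the familiar entropy-based decomposition: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the expected entropy of the individual predictions, and their difference is the epistemic part. A sketch for a finite set of predicted distributions, e.g. an ensemble (this is the classic decomposition, not the paper's new measures):

```python
import numpy as np

def entropy(p, axis=-1):
    # Shannon entropy in nats; clipping avoids log(0).
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def uncertainty_decomposition(member_probs):
    # member_probs: (n_members, n_classes) -- a finite second-order
    # representation, e.g. the predictions of an ensemble.
    mean_pred = member_probs.mean(axis=0)
    total = entropy(mean_pred)                 # total uncertainty
    aleatoric = entropy(member_probs).mean()   # expected entropy
    epistemic = total - aleatoric              # mutual information, >= 0
    return total, aleatoric, epistemic

# Agreement -> epistemic ~ 0; disagreement -> epistemic > 0.
agree = np.array([[0.9, 0.1], [0.9, 0.1]])
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
```

When the members agree, all uncertainty is aleatoric; when they disagree maximally, the averaged prediction is uniform and most of the uncertainty is epistemic.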

MCML Authors
Link to Paul Hofman

Paul Hofman

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[776]
Y. Li, T. Wolf, S. Pölsterl, I. Yakushev, D. Hedderich and C. Wachinger.
From Barlow Twins to Triplet Training: Differentiating Dementia with Limited Data.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

Differential diagnosis of dementia is challenging due to overlapping symptoms, with structural magnetic resonance imaging (MRI) being the primary method for diagnosis. Despite the clinical value of computer-aided differential diagnosis, research has been limited, mainly due to the absence of public datasets that contain diverse types of dementia. This leaves researchers with small in-house datasets that are insufficient for training deep neural networks (DNNs). Self-supervised learning shows promise for utilizing unlabeled MRI scans in training, but small batch sizes for volumetric brain scans make its application challenging. To address these issues, we propose Triplet Training for differential diagnosis with limited target data. It consists of three key stages: (i) self-supervised pre-training on unlabeled data with Barlow Twins, (ii) self-distillation on task-related data, and (iii) fine-tuning on the target dataset. Our approach significantly outperforms traditional training strategies, achieving a balanced accuracy of 75.6%. We further provide insights into the training process by visualizing changes in the latent space after each step. Finally, we validate the robustness of Triplet Training in terms of its individual components in a comprehensive ablation study.
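Stage (i) relies on the published Barlow Twins objective, which pushes the cross-correlation matrix of two embedding views toward the identity: diagonal entries to 1 (invariance) and off-diagonal entries to 0 (redundancy reduction). A compact numpy sketch of that loss (the trade-off weight `lam` is illustrative):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    # z_a, z_b: (batch, dim) embeddings of two augmented views.
    n, d = z_a.shape
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = (z_a.T @ z_b) / n                     # empirical cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Embedding the two views of the same scan identically drives the loss toward zero, while unrelated embeddings keep the invariance term large.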

MCML Authors
Link to Yitong Li

Yitong Li

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[775]
P. Lin, S. Ji, J. Tiedemann, A. F. T. Martins and H. Schütze.
MaLA-500: Massive Language Adaptation of Large Language Models.
Preprint at arXiv (Apr. 2024). arXiv. GitHub.
MCML Authors
Link to Peiqin Lin

Peiqin Lin

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[774]
Y. Mansour and R. Heckel.
GAMA-IR: Global Additive Multidimensional Averaging for Fast Image Restoration.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

Deep learning-based methods have shown remarkable success for various image restoration tasks such as denoising and deblurring. The current state-of-the-art networks are relatively deep and utilize (variants of) self-attention mechanisms. Those networks are significantly slower than shallow convolutional networks, which, however, perform worse. In this paper, we introduce an image restoration network that is both fast and yields excellent image quality. The network is designed to minimize the latency and memory consumption when executed on a standard GPU, while maintaining state-of-the-art performance. The network is a simple shallow network with an efficient block that implements global additive multidimensional averaging operations. This block can capture global information and enable a large receptive field even when used in shallow networks with minimal computational overhead. Through extensive experiments and evaluations on diverse tasks, we demonstrate that our network achieves comparable or even superior results to existing state-of-the-art image restoration networks with less latency. For instance, we exceed the state-of-the-art result on real-world SIDD denoising by 0.11 dB, while being 2 to 10 times faster.
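One plausible reading of such an additive averaging block, sketched here purely as our interpretation of the abstract rather than the paper's exact architecture: add back means taken along each dimension of the feature map, which gives every output position global context in linear time, without forming an attention matrix.

```python
import numpy as np

def additive_multidim_averaging(x):
    # x: (C, H, W) feature map. Add back averages taken along each
    # spatial dimension, so every output position receives global
    # information in O(C*H*W) time.
    row_mean = x.mean(axis=1, keepdims=True)          # (C, 1, W)
    col_mean = x.mean(axis=2, keepdims=True)          # (C, H, 1)
    global_mean = x.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
    return x + row_mean + col_mean + global_mean
```

A single bright pixel then influences every other position of the output, i.e. the receptive field of this one cheap block is already the whole image.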

MCML Authors
Link to Reinhard Heckel

Reinhard Heckel

Prof. Dr.

Machine Learning


[773]
S. Maskey, G. Kutyniok and R. Levie.
Generalization Bounds for Message Passing Networks on Mixture of Graphons.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

We study the generalization capabilities of Message Passing Neural Networks (MPNNs), a prevalent class of Graph Neural Networks (GNN). We derive generalization bounds specifically for MPNNs with normalized sum aggregation and mean aggregation. Our analysis is based on a data generation model incorporating a finite set of template graphons. Each graph within this framework is generated by sampling from one of the graphons with a certain degree of perturbation. In particular, we extend previous MPNN generalization results to a more realistic setting, which includes the following modifications: 1) we analyze simple random graphs with Bernoulli-distributed edges instead of weighted graphs; 2) we sample both graphs and graph signals from perturbed graphons instead of clean graphons; and 3) we analyze sparse graphs instead of dense graphs. In this more realistic and challenging scenario, we provide a generalization bound that decreases as the average number of nodes in the graphs increases. Our results imply that MPNNs with higher complexity than the size of the training set can still generalize effectively, as long as the graphs are sufficiently large.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[772]
A. Modarressi, A. Köksal, A. Imani, M. Fayyaz and H. Schütze.
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory.
Preprint at arXiv (Apr. 2024). arXiv.
MCML Authors
Link to Ali Modarressi

Ali Modarressi

Statistical NLP and Deep Learning

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[771]
L. Rottkamp and M. Schubert.
A Time-Inhomogeneous Markov Model for Resource Availability under Sparse Observations.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

Accurate spatio-temporal information about the current situation is crucial for smart city applications such as modern routing algorithms. Often, this information describes the state of stationary resources, e.g. the availability of parking bays, charging stations or the amount of people waiting for a vehicle to pick them up near a given location. To exploit this kind of information, predicting future states of the monitored resources is often mandatory because a resource might change its state within the time until it is needed. To train an accurate predictive model, it is often not possible to obtain a continuous time series on the state of the resource. For example, the information might be collected from traveling agents visiting the resource with an irregular frequency. Thus, it is necessary to develop methods which work on sparse observations for training and prediction. In this paper, we propose time-inhomogeneous discrete Markov models to allow accurate prediction even when the frequency of observation is very rare. Our new model is able to blend recent observations with historic data and also provide useful probabilistic estimates for future states. Since resource availability in a city is typically time-dependent, our Markov model is time-inhomogeneous and cyclic within a predefined time interval. To train our model, we propose a modified Baum-Welch algorithm. Evaluations on real-world datasets of parking bay availability show that our new method indeed yields good results compared to methods trained on complete data and non-cyclic variants.
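The prediction step of such a cyclic, time-inhomogeneous model can be sketched in a few lines: with one transition matrix per time slot of the cycle, propagating a belief forward from a sparse observation is a chain of matrix products with wrap-around slot indexing. The slot granularity and matrices below are illustrative; fitting them from sparse observations (the paper's modified Baum-Welch algorithm) is not shown.

```python
import numpy as np

def predict_state(belief, transition_by_slot, start_slot, n_steps):
    # belief: probability vector over resource states (e.g. free/occupied).
    # transition_by_slot: row-stochastic matrices, one per slot of the
    # cycle; the model is cyclic, so slot indices wrap around.
    n_slots = len(transition_by_slot)
    for t in range(n_steps):
        P = transition_by_slot[(start_slot + t) % n_slots]
        belief = belief @ P
    return belief

# Two states (free, occupied) and a two-slot "day": busy, then quiet.
busy = np.array([[0.5, 0.5], [0.1, 0.9]])
quiet = np.array([[0.9, 0.1], [0.6, 0.4]])
b = predict_state(np.array([1.0, 0.0]), [busy, quiet], start_slot=0, n_steps=4)
```

Starting from a certain observation ("free"), the belief after four steps is a proper probability distribution that already blends the busy and quiet regimes of the cycle.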

MCML Authors
Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[770]
A. Triantafyllopoulos and B. W. Schuller.
Expressivity and Speech Synthesis.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence (AI) research. From the very beginning, the community has not only aimed to synthesise high-fidelity speech that accurately conveys the semantic meaning of an utterance, but also to colour it with inflections that cover the same range of affective expressions that humans are capable of. After many years of research, it appears that we are on the cusp of achieving this when it comes to single, isolated utterances. This unveils an abundance of potential avenues to explore when it comes to combining these single utterances with the aim of synthesising more complex, longer-term behaviours. In the present chapter, we outline the methodological advances that brought us so far and sketch out the ongoing efforts to reach that coveted next level of artificial expressivity. We also discuss the societal implications coupled with rapidly advancing expressive speech synthesis (ESS) technology and highlight ways to mitigate those risks and ensure the alignment of ESS capabilities with ethical norms.

MCML Authors
Link to Andreas Triantafyllopoulos

Andreas Triantafyllopoulos

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[769]
T. Weber, J. Dexl, D. Rügamer and M. Ingrisch.
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition.
Preprint at arXiv (Apr. 2024). arXiv.
MCML Authors
Link to Jakob Dexl

Jakob Dexl

Clinical Data Science in Radiology

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[768]
X. Zhu, Q. Li, Y. Shi, Y. Wang, A. Stewart and J. Prexl.
GlobalBuildingMap -- Unveiling the Mystery of Global Buildings.
Preprint at arXiv (Apr. 2024). arXiv.
Abstract

Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. This built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide is lacking. To this end, by using a big data analytics approach and nearly 800,000 satellite images, we generated the highest resolution and highest accuracy building map ever created: the GlobalBuildingMap (GBM). A joint analysis of building maps and solar potentials indicates that rooftop solar energy can supply the global energy consumption need at a reasonable cost. Specifically, if solar panels were placed on the roofs of all buildings, they could supply 1.1-3.3 times – depending on the efficiency of the solar device – the global energy consumption in 2020, which is the year with the highest consumption on record. We also identified a clear geospatial correlation between building areas and key socioeconomic variables, which indicates our global building map can serve as an important input to modeling global socioeconomic needs and drivers.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation

Link to Adam Stewart

Adam Stewart

Dr.

Data Science in Earth Observation


[767]
C. Gruber, K. Hechinger, M. Aßenmacher, G. Kauermann and B. Plank.
More Labels or Cases? Assessing Label Variation in Natural Language Inference.
3rd Workshop on Understanding Implicit and Underspecified Language (UnImplicit 2024). Malta, Mar 21, 2024. URL.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science

Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[766]
S. Peng, Z. Sun, S. Loftus and B. Plank.
Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations.
3rd Workshop on Understanding Implicit and Underspecified Language (UnImplicit 2024). Malta, Mar 21, 2024. URL.
Abstract

Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of human label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially the case for high-quality datasets and beyond English CoNLL03. This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian. We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions. We survey student annotations on a subset of difficult entities and substantiate the feasibility and necessity of manifold annotations for understanding named entity ambiguities from a distributional perspective.

MCML Authors
Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[765]
A. Maronikolakis, A. Köksal and H. Schütze.
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks.
4th Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI 2024). St. Julian's, Malta, Mar 21, 2024. URL.
MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[764]
V. Ehm, P. Roetzer, M. Eisenberger, M. Gao, F. Bernard and D. Cremers.
Geometrically Consistent Partial Shape Matching.
11th International Conference on 3D Vision (3DV 2024). Davos, Switzerland, Mar 18-21, 2024. DOI. GitHub.
Abstract

Finding correspondences between 3D shapes is a crucial problem in computer vision and graphics, which is relevant, for example, for tasks like shape interpolation, pose transfer, or texture transfer. An often neglected but essential property of matchings is geometric consistency, which means that neighboring triangles in one shape are consistently matched to neighboring triangles in the other shape. Moreover, while in practice one often only has access to partial observations of a 3D shape (e.g. due to occlusion, or scanning artifacts), there do not exist any methods that directly address geometrically consistent partial shape matching. In this work we fill this gap by proposing to integrate state-of-the-art deep shape features into a novel integer linear programming partial shape matching formulation. Our optimization yields a globally optimal solution on low resolution shapes, which we then refine using a coarse-to-fine scheme. We show that our method can find more reliable results on partial shapes in comparison to existing geometrically consistent algorithms (for which one first has to fill missing parts with a dummy geometry). Moreover, our matchings are substantially smoother than learning-based state-of-the-art shape matching methods.

MCML Authors
Link to Viktoria Ehm

Viktoria Ehm

Computer Vision & Artificial Intelligence

Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[763]
A. Hayler, F. Wimbauer, D. Muhle, C. Rupprecht and D. Cremers.
S4C: Self-Supervised Semantic Scene Completion with Neural Fields.
11th International Conference on 3D Vision (3DV 2024). Davos, Switzerland, Mar 18-21, 2024. DOI.
Abstract

3D semantic scene understanding is a fundamental challenge in computer vision. It enables mobile agents to autonomously plan and navigate arbitrary environments. Semantic scene completion (SSC) formalizes this challenge as jointly estimating dense geometry and semantic information from sparse observations of a scene. Current methods for SSC are generally trained on 3D ground truth based on aggregated LiDAR scans. This process relies on special sensors and annotation by hand which are costly and do not scale well. To overcome this issue, our work presents the first self-supervised approach to SSC called S4C that does not rely on 3D ground truth data. Our proposed method can reconstruct a scene from a single image and only relies on videos and pseudo segmentation ground truth generated from an off-the-shelf image segmentation network during training. Unlike existing methods, which use discrete voxel grids, we represent scenes as implicit semantic fields. This formulation allows querying any point within the camera frustum for occupancy and semantic class. Our architecture is trained through rendering-based self-supervised losses. Nonetheless, our method achieves performance close to fully supervised state-of-the-art methods. Additionally, our method demonstrates strong generalization capabilities and can synthesize accurate segmentation maps for far away viewpoints.

MCML Authors
Link to Felix Wimbauer

Felix Wimbauer

Computer Vision & Artificial Intelligence

Link to Dominik Muhle

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[762]
S. Klenk, M. Motzet, L. Koestler and D. Cremers.
Deep Event Visual Odometry.
11th International Conference on 3D Vision (3DV 2024). Davos, Switzerland, Mar 18-21, 2024. DOI.
Abstract

Event cameras offer the exciting possibility of tracking the camera’s pose during high-speed motion and in adverse lighting conditions. Despite this promise, existing event-based monocular visual odometry (VO) approaches demonstrate limited performance on recent benchmarks. To address this limitation, some methods resort to additional sensors such as IMUs, stereo event cameras, or frame-based cameras. Nonetheless, these additional sensors limit the application of event cameras in real-world devices since they increase cost and complicate system requirements. Moreover, relying on a frame-based camera makes the system susceptible to motion blur and HDR. To remove the dependency on additional sensors and to push the limits of using only a single event camera, we present Deep Event VO (DEVO), the first monocular event-only system with strong performance on a large number of real-world benchmarks. DEVO sparsely tracks selected event patches over time. A key component of DEVO is a novel deep patch selection mechanism tailored to event data. We significantly decrease the state-of-the-art pose tracking error on seven real-world benchmarks by up to 97% compared to event-only methods and often surpass or are close to stereo or inertial methods.

MCML Authors
Link to Simon Klenk

Simon Klenk

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[761]
E. Artemova, V. Blaschke and B. Plank.
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages. We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data. Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets. Our perturbation rules cover 18 distinct language phenomena, enabling us to explore the impact of each perturbation on slot and intent performance. Using these new datasets, we conduct an experimental evaluation across six different transformers. Here, we demonstrate that when applied to colloquial varieties, ToD systems maintain their intent recognition performance, losing 6% (4.62 percentage points) in accuracy on average. However, they exhibit a significant drop in slot detection, with a decrease of 31% (21 percentage points) in slot F1 score. Our findings are further supported by a transfer experiment from Standard American English to synthetic Urban African American Vernacular English.

MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[760]
J. Baan, R. Fernández, B. Plank and W. Aziz.
Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good representation of uncertainty by evaluating the quality of their predictive distribution over outcomes. We identify two main perspectives that drive starkly different evaluation protocols. The first treats predictive probability as an indication of model confidence; the second as an indication of human label variation. We discuss their merits and limitations, and take the position that both are crucial for trustworthy and fair NLP systems, but that exploiting a single predictive distribution is limiting. We recommend tools and highlight exciting directions towards models with disentangled representations of uncertainty about predictions and uncertainty about human labels.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[759]
J. Beck, S. Eckman, B. Ma, R. Chew and F. Kreuter.
Order Effects in Annotation Tasks: Further Evidence of Annotation Sensitivity.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

The data-centric revolution in AI has revealed the importance of high-quality training data for developing successful AI models. However, annotations are sensitive to annotator characteristics, training materials, and to the design and wording of the data collection instrument. This paper explores the impact of observation order on annotations. We find that annotators’ judgments change based on the order in which they see observations. We use ideas from social psychology to motivate hypotheses about why this order effect occurs. We believe that insights from social science can help AI researchers improve data and model quality.

MCML Authors
Link to Jacob Beck

Jacob Beck

Social Data Science and AI Lab

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[758]
P. Lin, C. Hu, Z. Zhang, A. Martins and H. Schütze.
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Recent multilingual pretrained language models (mPLMs) have been shown to encode strong language-specific signals, which are not explicitly provided during pretraining. It remains an open question whether it is feasible to employ mPLMs to measure language similarity, and subsequently use the similarity results to select source languages for boosting cross-lingual transfer. To investigate this, we propose mPLM-Sim, a language similarity measure that induces the similarities across languages from mPLMs using multi-parallel corpora. Our study shows that mPLM-Sim exhibits moderately high correlations with linguistic similarity measures, such as lexicostatistics, genealogical language family, and geographical sprachbund. We also conduct a case study on languages with low correlation and observe that mPLM-Sim yields more accurate similarity results. Additionally, we find that similarity results vary across different mPLMs and different layers within an mPLM. We further investigate whether mPLM-Sim is effective for zero-shot cross-lingual transfer by conducting experiments on both low-level syntactic tasks and high-level semantic tasks. The experimental results demonstrate that mPLM-Sim is capable of selecting better source languages than linguistic measures, resulting in a 1%-2% improvement in zero-shot cross-lingual transfer performance.

MCML Authors
Link to Peiqin Lin

Peiqin Lin

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[757]
B. Ma, E. Nie, S. Yuan, H. Schmid, M. Färber, F. Kreuter and H. Schütze.
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based method for token-level sequence labeling tasks. The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer, especially for languages that are typologically different from the source language English. Our method also attains state-of-the-art performance when employed with the mT5 model. Besides, our exploratory study in multilingual large language models shows that ToPro performs much better than the current in-context learning method. Overall, the performance improvements show that ToPro could potentially serve as a novel and simple benchmarking method for sequence labeling tasks.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[756]
L. K. Şenel, B. Ebing, K. Baghirova, H. Schütze and G. Glavaš.
Kardeş-NLU: Transfer to Low-Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Cross-lingual transfer (XLT) driven by massively multilingual language models (mmLMs) has been shown largely ineffective for low-resource (LR) target languages with little (or no) representation in mmLM’s pretraining, especially if they are linguistically distant from the high-resource (HR) source language. Much of the recent focus in XLT research has been dedicated to LR language families, i.e., families without any HR languages (e.g., families of African languages or indigenous languages of the Americas). In this work, in contrast, we investigate a configuration that is arguably of practical relevance for more of the world’s languages: XLT to LR languages that do have a close HR relative. To explore the extent to which a HR language can facilitate transfer to its LR relatives, we (1) introduce Kardeş-NLU, an evaluation benchmark with language understanding datasets in five LR Turkic languages: Azerbaijani, Kazakh, Kyrgyz, Uzbek, and Uyghur; and (2) investigate (a) intermediate training and (b) fine-tuning strategies that leverage Turkish in XLT to these target languages. Our experimental results show that both - integrating Turkish in intermediate training and in downstream fine-tuning - yield substantial improvements in XLT to LR Turkic languages. Finally, we benchmark cutting-edge instruction-tuned large language models on Kardeş-NLU, showing that their performance is highly task- and language-dependent.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[755]
M. Zhang, R. van der Goot, M.-Y. Kan and B. Plank.
NNOSE: Nearest Neighbor Occupational Skill Extraction.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity of occupational skill datasets by combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, Nearest Neighbor Occupational Skill Extraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This improves skill extraction without additional fine-tuning. Crucially, we observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30% span-F1 in cross-dataset settings.
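The retrieval-augmentation idea described above can be sketched in the style of kNN-augmented classifiers: interpolate the tagger's label distribution with one induced from nearest neighbours in a cross-dataset datastore of (representation, label) pairs. All names, parameters, and the interpolation weight below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def knn_augmented_probs(query, datastore_keys, datastore_labels,
                        model_probs, n_labels, k=8, temperature=1.0,
                        lam=0.3):
    """Hedged sketch of kNN retrieval augmentation: mix the model's
    label distribution with a distribution built from the labels of
    the k nearest datastore entries (softmax over negative distances)."""
    d = np.linalg.norm(datastore_keys - query, axis=1)  # L2 distances
    idx = np.argsort(d)[:k]                             # k nearest entries
    w = np.exp(-d[idx] / temperature)
    w /= w.sum()                                        # neighbour weights
    knn_probs = np.zeros(n_labels)
    for weight, label in zip(w, datastore_labels[idx]):
        knn_probs[label] += weight                      # vote per neighbour
    return lam * knn_probs + (1 - lam) * model_probs    # interpolation
```

With `lam=0`, the sketch reduces to the plain model; increasing `lam` shifts mass toward labels that similar spans carry elsewhere in the datastore, which is the mechanism by which infrequent skills benefit.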

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[754]
M. Zhang, R. van der Goot and B. Plank.
Entity Linking in the Job Market Domain.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

In Natural Language Processing, entity linking (EL) has centered around Wikipedia, yet it remains underexplored for the job market domain. Disambiguating skill mentions can help us get insight into the current labor market demands. In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014). Previous efforts linked coarse-grained (full) sentences to a corresponding ESCO skill. In this work, we link more fine-grained span-level mentions of skills. We tune two high-performing neural EL models, a bi-encoder (Wu et al., 2020) and an autoregressive model (Cao et al., 2021), on a synthetically generated mention–skill pair dataset and evaluate them on a human-annotated skill-linking benchmark. Our findings reveal that both models are capable of linking implicit mentions of skills to their correct taxonomy counterparts. Empirically, BLINK outperforms GENRE in strict evaluation, but GENRE performs better in loose evaluation (accuracy@k).

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[753]
A. Sorensen, S. Peng, B. Plank and R. van der Goot.
EEVEE: An Easy Annotation Tool for Natural Language Processing.
18th Linguistic Annotation Workshop (LAW 2024) at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets. There is a wide variety of tools available; setting up these tools is however a hindrance. We propose EEVEE, an annotation tool focused on simplicity, efficiency, and ease of use. It can run directly in the browser (no setup required) and uses tab-separated files (as opposed to character offsets or task-specific formats) for annotation. It allows for annotation of multiple tasks on a single dataset and supports four task-types: sequence labeling, span labeling, text classification and seq2seq.

MCML Authors
Link to Siyao Peng

Siyao Peng

Dr.

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[752]
L. Weber-Genzel, R. Litschko, E. Artemova and B. Plank.
Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?.
18th Linguistic Annotation Workshop (LAW 2024) at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Instruction tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality problems in gold standard labels. So far, however, the application of AED methods has been limited to classification tasks. It is an open question how well AED methods generalize to language generation settings, which are becoming more widespread via LLMs. In this paper, we present a first and novel benchmark for AED on instruction tuning data: DONKII. It comprises three instruction-tuning datasets enriched with error annotations by experts and semi-automatic methods. We also provide a novel taxonomy of error types for instruction-tuning data. We find that all three datasets contain clear errors, which sometimes propagate directly into instruction-tuned LLMs. We propose four AED baselines for the generative setting and evaluate them extensively on the newly introduced dataset. Our results show that the choice of the right AED method and model size is indeed crucial and derive practical recommendations for how to use AED methods to clean instruction-tuning data.

MCML Authors
Link to Leon Weber-Genzel

Leon Weber-Genzel

Dr.

* Former member

Link to Robert Litschko

Robert Litschko

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[751]
M. Zaiss, J. R. Rajput, H. N. Dang, V. Golkov, D. Cremers, F. Knoll and A. Maier.
GPT4MR: Exploring GPT-4 as an MR Sequence and Reconstruction Programming Assistant.
German Conference on Medical Image Computing - Bildverarbeitung für die Medizin (BVM 2024). Erlangen, Germany, Mar 10-12, 2024. DOI.
Abstract

In this study, we explore the potential of generative pre-trained transformer (GPT), as a coding assistant for MRI sequence programming using the Pulseq framework. The programming of MRI sequences is traditionally a complex and time-consuming task, and the Pulseq standard has recently simplified this process. It allows researchers to define and generate complex pulse sequences used in MRI experiments. Leveraging GPT-4’s capabilities in natural language generation, we adapted it for MRI sequence programming, creating a specialized assistant named GPT4MR. Our tests involved generating various MRI sequences, revealing that GPT-4, guided by a tailored prompt, outperformed GPT-3.5, producing fewer errors and demonstrating improved reasoning. Despite limitations in handling complex sequences, GPT4MR corrected its own errors and successfully generated code with step-by-step instructions. The study showcases GPT4MR’s ability to accelerate MRI sequence development, even for novel ideas absent in its training set. While further research and improvement are needed to address complexity limitations, a well-designed prompt enhances performance. The findings propose GPT4MR as a valuable MRI sequence programming assistant, streamlining prototyping and development. The future prospect involves integrating a PyPulseq plugin into lightweight, open-source LLMs, potentially revolutionizing MRI sequence development and prototyping.

MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[750]
S. Dandl, C. Haslinger, T. Hothorn, H. Seibold, E. Sverdrup, S. Wager and A. Zeileis.
What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?.
Annals of Applied Statistics 18.1 (Mar. 2024). DOI.
Abstract

Estimation of heterogeneous treatment effects (HTE) is of prime importance in many disciplines, from personalized medicine to economics among many others. Random forests have been shown to be a flexible and powerful approach to HTE estimation in both randomized trials and observational studies. In particular “causal forests” introduced by Athey, Tibshirani and Wager (Ann. Statist. 47 (2019) 1148–1178), along with the R implementation in package grf were rapidly adopted. A related approach, called ‘model-based forests’ that is geared toward randomized trials and simultaneously captures effects of both prognostic and predictive variables, was introduced by Seibold, Zeileis and Hothorn (Stat. Methods Med. Res. 27 (2018) 3104–3125) along with a modular implementation in the R package model4you.
Neither procedure is directly applicable to the estimation of individualized predictions of excess postpartum blood loss caused by a cesarean section in comparison to vaginal delivery. Clearly, randomization is hardly possible in this setup, and thus model-based forests lack clinical trial data to address this question. On the other hand, the skewed and interval-censored postpartum blood loss observations violate assumptions made by causal forests. Here we present a tailored model-based forest for skewed and interval-censored data to infer possible predictive prepartum characteristics and their impact on excess postpartum blood loss caused by a cesarean section.
As a methodological basis, we propose a unifying view on causal and model-based forests that goes beyond the theoretical motivations and investigates which computational elements make causal forests so successful and how these can be blended with the strengths of model-based forests. To do so, we show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss. This theoretical insight allows us to implement several flavors of ‘model-based causal forests’ and dissect their different elements in silico.
The original causal forests and model-based forests are compared with the new blended versions in a benchmark study exploring both randomized trials and observational settings. In the randomized setting, both approaches performed similarly. If confounding was present in the data-generating process, we found local centering of the treatment indicator with the corresponding propensities to be the main driver for good performance. Local centering of the outcome was less important and might be replaced or enhanced by simultaneous split selection with respect to both prognostic and predictive effects. This lays the foundation for future research combining random forests for HTE estimation with other types of models.

MCML Authors

[749]
F. Coens, N. Knops, I. Tieken, S. Vogelaar, A. Bender, J. J. Kim, K. Krupka, L. Pape, A. Raes, B. Tönshoff, A. Prytula and C. Registry.
Time-Varying Determinants of Graft Failure in Pediatric Kidney Transplantation in Europe.
Clinical Journal of the American Society of Nephrology 19.3 (Mar. 2024). DOI.
Abstract

Little is known about the time-varying determinants of kidney graft failure in children. We performed a retrospective study of primary pediatric kidney transplant recipients (younger than 18 years) from the Eurotransplant registry (1990-2020). Piece-wise exponential additive mixed models were applied to analyze time-varying recipient, donor, and transplant risk factors. Primary outcome was death-censored graft failure.

MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[748]
W. H. Hartl, P. Kopper, L. Xu, L. Heller, M. Mironov, R. Wang, A. G. Day, G. Elke, H. Küchenhoff and A. Bender.
Relevance of Protein Intake for Weaning in the Mechanically Ventilated Critically Ill: Analysis of a Large International Database.
Critical Care Medicine 50.3 (Mar. 2024). DOI.
Abstract

The association between protein intake and the need for mechanical ventilation (MV) is controversial. We aimed to investigate the associations between protein intake and outcomes in ventilated critically ill patients.

MCML Authors
Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[747]
M. Maritsch, S. Föll, V. Lehmann, N. Styger, C. Bérubé, M. Kraus, S. Feuerriegel, T. Kowatsch, T. Züger, E. Fleisch, F. Wortmann and C. Stettler.
Smartwatches for non-invasive hypoglycaemia detection during cognitive and psychomotor stress.
Diabetes, Obesity and Metabolism 26.3 (Mar. 2024). DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[746]
Q. Li, L. Mou, Y. Sun, Y. Hua, Y. Shi and X. Zhu.
A Review of Building Extraction From Remote Sensing Imagery: Geometrical Structures and Semantic Attributes.
IEEE Transactions on Geoscience and Remote Sensing 62 (Mar. 2024). DOI.
Abstract

In the remote sensing community, extracting buildings from remote sensing imagery has triggered great interest. While many studies have been conducted, a comprehensive review of these approaches that are applied to optical and synthetic aperture radar (SAR) imagery is still lacking. Therefore, we provide an in-depth review of both early efforts and recent advances, which are aimed at extracting geometrical structures or semantic attributes of buildings, including building footprint generation, building facade segmentation, roof segment and superstructure segmentation, building height retrieval, building-type classification, building change detection, and annotation data correction. Furthermore, a list of corresponding benchmark datasets is given. Finally, challenges and outlooks of existing approaches as well as promising applications are discussed to enhance comprehension within this realm of research.

MCML Authors
Link to Yao Sun

Yao Sun

Dr.

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[745]
Z. Yuan, L. Mou, Y. Hua and X. Zhu.
RRSIS: Referring Remote Sensing Image Segmentation.
IEEE Transactions on Geoscience and Remote Sensing 62 (Mar. 2024). DOI. GitHub.
Abstract

Localizing desired objects from remote sensing images is of great use in practical applications. Referring image segmentation, which aims at segmenting out the objects to which a given expression refers, has been extensively studied in natural images. However, almost no research attention is given to this task of remote sensing imagery. Considering its potential for real-world applications, in this article, we introduce referring remote sensing image segmentation (RRSIS) to fill in this gap and make some insightful explorations. Specifically, we created a new dataset, called RefSegRS, for this task, enabling us to evaluate different methods. Afterward, we benchmark referring image segmentation methods of natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that utilizes linguistic features to adaptively enhance multiscale visual features by integrating both deep and shallow features. The proposed dataset, benchmarking results, and the designed LGCE module provide insights into the design of a better RRSIS model.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[744]
B. X. Liew, F. Pfisterer, D. Rügamer and X. Zhai.
Strategies to optimise machine learning classification performance when using biomechanical features.
Journal of Biomechanics 165 (Mar. 2024). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[743]
M. M. Amin and B. W. Schuller.
On Prompt Sensitivity of ChatGPT in Affective Computing.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing. However, accessing these emerging capabilities is facilitated through prompt engineering. Despite the existence of some prompting techniques, the field is still rapidly evolving and many prompting ideas still require investigation. In this work, we introduce a method to evaluate and investigate the sensitivity of the performance of foundation models based on different prompts or generation parameters. We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection. First, we carry out a sensitivity analysis on pivotal parameters in auto-regressive text generation, specifically the temperature parameter T and the top-p parameter in Nucleus sampling, dictating how conservative or creative the model should be during generation. Furthermore, we explore the efficacy of several prompting ideas, where we explore how giving different incentives or structures affect the performance. Our evaluation takes into consideration performance measures on the affective computing tasks, and the effectiveness of the model to follow the stated instructions, hence generating easy-to-parse responses to be smoothly used in downstream applications.

MCML Authors
Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[742]
R. Bailo, A. Barbaro, S. N. Gomes, K. Riedl, T. Roith, C. Totzeck and U. Vaes.
CBX: Python and Julia packages for consensus-based interacting particle methods.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

We introduce CBXPy and ConsensusBasedX.jl, Python and Julia implementations of consensus-based interacting particle systems (CBX), which generalise consensus-based optimization methods (CBO) for global, derivative-free optimisation. The raison d'être of our libraries is twofold: on the one hand, to offer high-performance implementations of CBX methods that the community can use directly, while on the other, providing a general interface that can accommodate and be extended to further variations of the CBX family. Python and Julia were selected as the leading high-level languages in terms of usage and performance, as well as for their popularity among the scientific computing community. Both libraries have been developed with a common ethos, ensuring a similar API and core functionality, while leveraging the strengths of each language and writing idiomatic code.
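As a rough illustration of the CBO dynamics that the CBX family generalises, the following NumPy sketch implements a standard (anisotropic) consensus-based update with an Euler-Maruyama discretization. It is a toy re-implementation under common textbook parameter choices, not the CBXPy or ConsensusBasedX.jl API.

```python
import numpy as np

def consensus_based_optimization(f, dim=2, n_particles=50, steps=600,
                                 dt=0.01, lam=1.0, sigma=0.7, alpha=30.0,
                                 seed=0):
    """Toy sketch of consensus-based optimization: particles drift
    toward a softmin-weighted consensus point and receive noise
    scaled componentwise by their distance to it."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=(n_particles, dim))
    for _ in range(steps):
        fx = f(x)                              # objective value per particle
        w = np.exp(-alpha * (fx - fx.min()))   # numerically stable softmin weights
        consensus = (w[:, None] * x).sum(axis=0) / w.sum()
        diff = x - consensus
        noise = rng.standard_normal(size=x.shape)
        # drift toward consensus, exploration noise vanishing at consensus
        x = x - lam * diff * dt + sigma * np.abs(diff) * np.sqrt(dt) * noise
    return consensus

# usage: quadratic bowl with minimum at the origin
x_star = consensus_based_optimization(lambda x: (x ** 2).sum(axis=-1))
```

The method is derivative-free: only evaluations of `f` enter the update, which is what makes library implementations like those described above broadly applicable.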

MCML Authors
Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis


[741]
S. A. Baumann, F. Krause, M. Neumayr, N. Stracke, V. Hu and B. Ommer.
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions.
Preprint at arXiv (Mar. 2024). arXiv. GitHub.
Abstract

In recent years, advances in text-to-image (T2I) diffusion models have substantially elevated the quality of their generated images. However, achieving fine-grained control over attributes remains a challenge due to the limitations of natural language prompts (such as no continuous set of intermediate descriptions existing between 'person' and 'old person'). Even though many methods were introduced that augment the model or generation process to enable such control, methods that do not require a fixed reference image are limited to either enabling global fine-grained attribute expression control or coarse attribute expression control localized to specific subjects, not both simultaneously. We show that there exist directions in the commonly used token-level CLIP text embeddings that enable fine-grained subject-specific control of high-level attributes in text-to-image models. Based on this observation, we introduce one efficient optimization-free and one robust optimization-based method to identify these directions for specific attributes from contrastive text prompts. We demonstrate that these directions can be used to augment the prompt text input with fine-grained control over attributes of specific subjects in a compositional manner (control over multiple attributes of a single subject) without having to adapt the diffusion model.

MCML Authors
Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[740]
A. Davtyan, S. Sameni, B. Ommer and P. Favaro.
Enabling Visual Composition and Animation in Unsupervised Video Generation.
Preprint at arXiv (Mar. 2024). arXiv. GitHub.
Abstract

In this work we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, at inference our model is capable of both composing scenes of predefined object parts and animating them in a plausible and controlled way. This is achieved by conditioning video generation on a randomly selected subset of local pre-trained self-supervised features during training. We call our model CAGE for visual Composition and Animation for video GEneration. We conduct a series of experiments to demonstrate capabilities of CAGE in various settings.

MCML Authors
Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[739]
M. Gui, J. S. Fischer, U. Prestel, P. Ma, D. Kotovenko, O. Grebenkova, S. Baumann, V. T. Hu and B. Ommer.
DepthFM: Fast Monocular Depth Estimation with Flow Matching.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

Monocular depth estimation is crucial for numerous downstream vision tasks and applications. Current discriminative approaches to this problem are limited due to blurry artifacts, while state-of-the-art generative methods suffer from slow sampling due to their SDE nature. Rather than starting from noise, we seek a direct mapping from input image to depth map. We observe that this can be effectively framed using flow matching, since its straight trajectories through solution space offer efficiency and high quality. Our study demonstrates that a pre-trained image diffusion model can serve as an adequate prior for a flow matching depth model, allowing efficient training on only synthetic data to generalize to real images. We find that an auxiliary surface normals loss further improves the depth estimates. Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates. On standard benchmarks of complex natural scenes, our lightweight approach exhibits state-of-the-art performance at favorable low computational cost despite only being trained on little synthetic data.

MCML Authors
Link to Pingchuan Ma

Pingchuan Ma

Machine Vision & Learning

Link to Olga Grebenkova

Olga Grebenkova

Machine Vision & Learning

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[738]
P. Kopper, D. Rügamer, R. Sonabend, B. Bischl and A. Bender.
Training Survival Models using Scoring Rules.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

Survival Analysis provides critical insights for partially incomplete time-to-event data in various domains. It is also an important example of probabilistic machine learning. The probabilistic nature of the predictions can be exploited by using (proper) scoring rules in the model fitting process instead of likelihood-based optimization. Our proposal does so in a generic manner and can be used for a variety of model classes. We establish different parametric and non-parametric sub-frameworks that allow different degrees of flexibility. Incorporated into neural networks, it leads to a computationally efficient and scalable optimization routine, yielding state-of-the-art predictive performance. Finally, we show that using our framework, we can recover various parametric models and demonstrate that optimization works equally well when compared to likelihood-based methods.
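As a minimal illustration of a proper scoring rule in this setting, the sketch below computes the Brier score at a fixed horizon for uncensored event times. The paper's framework is more general (it handles censoring and a variety of scoring rules and model classes), so this is only a hypothetical toy, not the authors' method.

```python
import numpy as np

def brier_score(event_times, surv_probs, t):
    """Brier score at horizon t for uncensored data: mean squared
    difference between the event-free indicator 1{T_i > t} and the
    model's predicted survival probability S(t | x_i)."""
    outcome = (event_times > t).astype(float)  # 1 if still event-free at t
    return float(np.mean((outcome - surv_probs) ** 2))
```

A perfect forecaster attains a score of 0; minimizing such a score during model fitting, rather than a likelihood, is the core idea the abstract describes.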

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[737]
Y.-J. Li, M. Gladkova, Y. Xia, R. Wang and D. Cremers.
VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

Recent works on the global place recognition treat the task as a retrieval problem, where an off-the-shelf global descriptor is commonly designed in image-based and LiDAR-based modalities. However, it is non-trivial to perform accurate image-LiDAR global place recognition since extracting consistent and robust global descriptors from different domains (2D images and 3D point clouds) is challenging. To address this issue, we propose a novel Voxel-Cross-Pixel (VXP) approach, which establishes voxel and pixel correspondences in a self-supervised manner and brings them into a shared feature space. Specifically, VXP is trained in a two-stage manner that first explicitly exploits local feature correspondences and enforces similarity of global descriptors. Extensive experiments on the three benchmarks (Oxford RobotCar, ViViD++ and KITTI) demonstrate our method surpasses the state-of-the-art cross-modal retrieval by a large margin.

MCML Authors
Link to Mariia Gladkova

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[736]
B. Lorenz, A. Bacho and G. Kutyniok.
Error Estimation for Physics-informed Neural Networks Approximating Semilinear Wave Equations.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

This paper provides rigorous error bounds for physics-informed neural networks approximating the semilinear wave equation. We provide bounds for the generalization and training error in terms of the width of the network’s layers and the number of training points for a tanh neural network with two hidden layers. Our main result is a bound of the total error in the H1([0,T];L2(Ω))-norm in terms of the training error and the number of training points, which can be made arbitrarily small under some assumptions. We illustrate our theoretical bounds with numerical experiments.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[735]
J. Rodemann, F. Croppi, P. Arens, Y. Sale, J. Herbinger, B. Bischl, E. Hüllermeier, T. Augustin, C. J. Walsh and G. Casalicchio.
Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

In today’s data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches rely on estimating mutual information (MI) between observable and secret information for detecting ILs, face challenges of the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detect ILs are limited to binary system sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor’s log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method performs superior to state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[734]
A. Saroha, M. Gladkova, C. Curreli, D. Muhle, T. Yenamandra and D. Cremers.
Gaussian Splatting in Style.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

3D scene stylization extends the work of neural style transfer to 3D. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across multiple views. A vast majority of the previous works achieve this by training a 3D model for every stylized image and a set of multi-view images. In contrast, we propose a novel architecture trained on a collection of style images that, at test time, produces real time high-quality stylized novel views. We choose the underlying 3D scene representation for our model as 3D Gaussian splatting. We take the 3D Gaussians and process them using a multi-resolution hash grid and a tiny MLP to obtain stylized views. The MLP is conditioned on different style codes for generalization to different styles during test time. The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods, including geometric consistency and a fast training and rendering regime. This enables our method to be useful for various practical use cases, such as augmented or virtual reality. We demonstrate that our method achieves state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.

MCML Authors
Link to Abhishek Saroha

Abhishek Saroha

Computer Vision & Artificial Intelligence

Link to Mariia Gladkova

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to Cecilia Curreli

Cecilia Curreli

Computer Vision & Artificial Intelligence

Link to Dominik Muhle

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to Tarun Yenamandra

Tarun Yenamandra

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[733]
C. Wachinger, D. Hedderich and F. Bongratz.
Stochastic Cortical Self-Reconstruction.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

Magnetic resonance imaging (MRI) is critical for diagnosing neurodegenerative diseases, yet accurately assessing mild cortical atrophy remains a challenge due to its subtlety. Automated cortex reconstruction, paired with healthy reference ranges, aids in pinpointing pathological atrophy, yet their generalization is limited by biases from image acquisition and processing. We introduce the concept of stochastic cortical self-reconstruction (SCSR) that creates a subject-specific healthy reference by taking MRI-derived thicknesses as input and, therefore, implicitly accounting for potential confounders. SCSR randomly corrupts parts of the cortex and self-reconstructs them from the remaining information. Trained exclusively on healthy individuals, repeated self-reconstruction generates a stochastic reference cortex for assessing deviations from the norm. We present three implementations of this concept: XGBoost applied on parcels, and two autoencoders on vertex level – one based on a multilayer perceptron and the other using a spherical U-Net. These models were trained on healthy subjects from the UK Biobank and subsequently evaluated across four public Alzheimer’s datasets. Finally, we deploy the model on clinical in-house data, where deviation maps’ high spatial resolution aids in discriminating between four types of dementia.

MCML Authors
Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology

Link to Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Radiology


[732]
L. Weissweiler, A. Köksal and H. Schütze.
Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena.
Preprint at arXiv (Mar. 2024). arXiv.
Abstract

Argument Structure Constructions (ASCs) are one of the most well-studied construction groups, providing a unique opportunity to demonstrate the usefulness of Construction Grammar (CxG). For example, the caused-motion construction (CMC, "She sneezed the foam off her cappuccino") demonstrates that constructions must carry meaning, otherwise the fact that "sneeze" in this context causes movement cannot be explained. We form the hypothesis that this remains challenging even for state-of-the-art Large Language Models (LLMs), for which we devise a test based on substituting the verb with a prototypical motion verb. To be able to perform this test at statistically significant scale, in the absence of adequate CxG corpora, we develop a novel pipeline of NLP-assisted collection of linguistically annotated text. We show how dependency parsing and GPT-3.5 can be used to significantly reduce annotation cost and thus enable the annotation of rare phenomena at scale. We then evaluate GPT, Gemini, Llama2 and Mistral models for their understanding of the CMC using the newly collected corpus. We find that all models struggle with understanding the motion component that the CMC adds to a sentence.

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[731]
S. Dandl, A. Bender and T. Hothorn.
Heterogeneous Treatment Effect Estimation for Observational Data Using Model-Based Forests.
Statistical Methods in Medical Research 33.3 (Mar. 2024). DOI.
Abstract

The estimation of heterogeneous treatment effects has attracted considerable interest in many disciplines, most prominently in medicine and economics. Contemporary research has so far primarily focused on continuous and binary responses where heterogeneous treatment effects are traditionally estimated by a linear model, which allows the estimation of constant or heterogeneous effects even under certain model misspecifications. More complex models for survival, count, or ordinal outcomes require stricter assumptions to reliably estimate the treatment effect. Most importantly, the noncollapsibility issue necessitates the joint estimation of treatment and prognostic effects. Model-based forests allow simultaneous estimation of covariate-dependent treatment and prognostic effects, but only for randomized trials. In this paper, we propose modifications to model-based forests to address the confounding issue in observational data. In particular, we evaluate an orthogonalization strategy originally proposed by Robinson (1988, Econometrica) in the context of model-based forests targeting heterogeneous treatment effect estimation in generalized linear models and transformation models. We found that this strategy reduces confounding effects in a simulated study with various outcome distributions. We demonstrate the practical aspects of heterogeneous treatment effect estimation for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect of Riluzole on the progress of Amyotrophic Lateral Sclerosis.
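The residual-on-residual idea behind Robinson's orthogonalization can be illustrated in a few lines. This is a simplified sketch with an entirely synthetic data-generating process and linear nuisance models; the paper plugs such residuals into model-based forests to estimate covariate-dependent effects, not a single slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))                    # observed confounders
propensity = 1 / (1 + np.exp(-X[:, 0]))        # treatment depends on X
T = rng.binomial(1, propensity).astype(float)  # confounded treatment
tau = 2.0                                      # true treatment effect
Y = X[:, 0] + 0.5 * X[:, 1] + tau * T + rng.normal(size=n)

def residualize(v, X):
    """Partial out X from v via least squares (Robinson's orthogonalization)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, v, rcond=None)
    return v - Xd @ beta

# Regress the outcome residual on the treatment residual: confounding
# through X is removed, so the slope recovers tau (approximately).
Yr, Tr = residualize(Y, X), residualize(T, X)
tau_hat = (Tr @ Yr) / (Tr @ Tr)
print(round(tau_hat, 2))
```

A naive regression of Y on T alone would be biased upward here, because units with large X[:, 0] are both more likely to be treated and have larger outcomes.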

MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[730]
H. Chen, Y. Zhang, D. Krompass, J. Gu and V. Tresp.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning.
38th Conference on Artificial Intelligence (AAAI 2024). Vancouver, Canada, Feb 20-27, 2024. DOI.
Abstract

Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Adapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.
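The communication saving from combining PEFT with FL is easy to see in code: only adapter parameters are averaged across clients, while the frozen backbone never leaves the device. This is a minimal, hypothetical sketch of PEFT-style federated averaging (the key names and weighting are illustrative), not FedDAT's dual-adapter or distillation scheme itself:

```python
import numpy as np

def fedavg_adapters(client_states, sizes):
    """Size-weighted average of only the adapter parameters.

    The frozen backbone weights are excluded, so they are never
    communicated to the server.
    """
    total = sum(sizes)
    adapter_keys = [k for k in client_states[0] if k.startswith("adapter.")]
    return {
        k: sum(w * state[k] for w, state in zip(sizes, client_states)) / total
        for k in adapter_keys
    }

# Two clients with local dataset sizes 1 and 3.
clients = [
    {"adapter.w": np.array([1.0, 2.0]), "backbone.w": np.array([9.0])},
    {"adapter.w": np.array([3.0, 4.0]), "backbone.w": np.array([9.0])},
]
print(fedavg_adapters(clients, sizes=[1, 3]))  # adapter.w -> [2.5, 3.5]
```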

MCML Authors
Link to Yao Zhang

Yao Zhang

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[729]
P. Kolpaczki, V. Bengs, M. Muschalik and E. Hüllermeier.
Approximating the Shapley Value without Marginal Contributions.
38th Conference on Artificial Intelligence (AAAI 2024). Vancouver, Canada, Feb 20-27, 2024. DOI.
Abstract

The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of which revolve around the notion of an agent’s marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results including synthetic games as well as common explainability use cases comparing ourselves with state-of-the-art methods.
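For context, the marginal-contribution view that SVARM departs from is easy to state: average a player's marginal contribution over player orderings, either exactly (exponential cost) or via sampled permutations. The sketch below uses a hypothetical toy game and is the standard baseline, not the paper's algorithm:

```python
import itertools
import random

def shapley_exact(n, value):
    """Exact Shapley values by enumerating all n! orderings."""
    phi = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        coalition = set()
        for player in perm:
            before = value(coalition)
            coalition.add(player)
            phi[player] += value(coalition) - before
    return [p / len(perms) for p in phi]

def shapley_permutation_mc(n, value, samples=2000, seed=0):
    """Monte Carlo estimate from sampled orderings (marginal contributions)."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(samples):
        perm = list(range(n))
        rng.shuffle(perm)
        coalition = set()
        for player in perm:
            before = value(coalition)
            coalition.add(player)
            phi[player] += value(coalition) - before
    return [p / samples for p in phi]

# Toy cooperative game: v(S) = (sum of player weights in S) ** 2.
weights = [1.0, 2.0, 3.0]
v = lambda S: sum(weights[i] for i in S) ** 2
print(shapley_exact(3, v))  # [6.0, 12.0, 18.0]
```

A useful sanity check is efficiency: per ordering the contributions telescope, so the estimates always sum to v(N), here 36.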

MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[728]
J. Lienen and E. Hüllermeier.
Mitigating Label Noise through Data Ambiguation.
38th Conference on Artificial Intelligence (AAAI 2024). Vancouver, Canada, Feb 20-27, 2024. DOI.
Abstract

Label noise poses an important challenge in machine learning, especially in deep learning, in which large models with high expressive power dominate the field. Models of that kind are prone to memorizing incorrect labels, thereby harming generalization performance. Many methods have been proposed to address this problem, including robust loss functions and more complex label correction approaches. Robust loss functions are appealing due to their simplicity, but typically lack flexibility, while label correction usually adds substantial complexity to the training setup. In this paper, we suggest to address the shortcomings of both methodologies by ‘ambiguating’ the target information, adding additional, complementary candidate labels in case the learner is not sufficiently convinced of the observed training label. More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold, which deliver imprecise yet more reliable beliefs about the ground-truth, effectively helping the learner to suppress the memorization effect. In an extensive empirical evaluation, our method demonstrates favorable learning behavior on synthetic and real-world noise, confirming the effectiveness in detecting and correcting erroneous training labels.
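The ambiguation step can be pictured as turning a single observed label into a set-valued target whenever the model is not sufficiently convinced of it. The helper below is a hypothetical illustration of threshold-based candidate-set construction, not the paper's exact superset-learning procedure:

```python
import numpy as np

def ambiguate_targets(probs, observed, threshold=0.3):
    """Build set-valued targets: keep the observed label and add every
    class whose predicted probability reaches the threshold."""
    candidates = []
    for p, y in zip(probs, observed):
        s = {y} | {k for k, pk in enumerate(p) if pk >= threshold}
        candidates.append(sorted(s))
    return candidates

probs = np.array([[0.05, 0.90, 0.05],   # model confident in class 1
                  [0.40, 0.35, 0.25]])  # model undecided
print(ambiguate_targets(probs, observed=[0, 2]))  # [[0, 1], [0, 1, 2]]
```

In the first instance the observed label 0 conflicts with a confident prediction for class 1, so the target becomes the candidate set {0, 1} rather than forcing the learner to memorize a possibly incorrect label.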

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[727]
M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles.
38th Conference on Artificial Intelligence (AAAI 2024). Vancouver, Canada, Feb 20-27, 2024. DOI.
Abstract

While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.

MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[726]
T. N. Wolf, F. Bongratz, A.-M. Rickmann, S. Pölsterl and C. Wachinger.
Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning.
38th Conference on Artificial Intelligence (AAAI 2024). Vancouver, Canada, Feb 20-27, 2024. DOI.
Abstract

Explaining predictions of black-box neural networks is crucial when applied to decision-critical tasks. Thus, attribution maps are commonly used to identify important image regions, despite prior work showing that humans prefer explanations based on similar examples. To this end, ProtoPNet learns a set of class-representative feature vectors (prototypes) for case-based reasoning. During inference, similarities of latent features to prototypes are linearly classified to form predictions and attribution maps are provided to explain the similarity. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations using the example of ProtoPNet. We show that such architectures allow the extraction of faithful explanations. However, we prove that the attribution maps used to explain the similarities violate the axioms. We propose a new procedure to extract explanations for trained ProtoPNets, named ProtoPFaith. Conceptually, these explanations are Shapley values, calculated on the similarity scores of each prototype. They allow to faithfully answer which prototypes are present in an unseen image and quantify each pixel’s contribution to that presence, thereby complying with all axioms. The theoretical violations of ProtoPNet manifest in our experiments on three datasets (CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet, ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a qualitative difference between the explanations given by ProtoPNet and ProtoPFaith. Additionally, we quantify the explanations with the Area Over the Perturbation Curve, on which ProtoPFaith outperforms ProtoPNet on all experiments by a factor >10^3.

MCML Authors
Link to Tom Nuno Wolf

Tom Nuno Wolf

Artificial Intelligence in Radiology

Link to Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[725]
A. Reithmeir, J. A. Schnabel and V. A. Zimmer.
Learning physics-inspired regularization for medical image registration with hypernetworks.
SPIE Medical Imaging: Image Processing 2024. San Diego, CA, USA, Feb 18-22, 2024. DOI.
MCML Authors
Link to Anna Reithmeir

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[724]
H. Weerts, F. Pfisterer, M. Feurer, K. Eggensperger, E. Bergman, N. Awad, J. Vanschoren, M. Pechenizkiy, B. Bischl and F. Hutter.
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML.
Journal of Artificial Intelligence Research 79 (Feb 17, 2024). DOI.
MCML Authors
Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[723]
R. van Koningsbruggen, L. Haliburton, B. Rossmy, C. George, E. Hornecker and B. Hengeveld.
Metaphors and 'Tacit' Data: the Role of Metaphors in Data and Physical Data Representations.
18th International Conference on Tangible, Embedded, and Embodied Interaction. Cork, Ireland, Feb 11-14, 2024. DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media


[722]
S. Wiegrebe, P. Kopper, R. Sonabend, B. Bischl and A. Bender.
Deep learning for survival analysis: a review.
Artificial Intelligence Review 57.65 (Feb. 2024). DOI.
Abstract

The influx of deep learning (DL) techniques into the field of survival analysis in recent years has led to substantial methodological progress; for instance, learning from unstructured or high-dimensional data such as images, text or omics data. In this work, we conduct a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes. In summary, the reviewed methods often address only a small subset of tasks relevant to time-to-event data—e.g., single-risk right-censored data—and neglect to incorporate more complex settings.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[721]
S. Feuerriegel, J. Hartmann, C. Janiesch and P. Zschech.
Generative AI.
Business and Information Systems Engineering 66.1 (Feb. 2024). DOI.
Abstract

In this Catchword article, we provide a conceptualization of generative AI as an entity in socio-technical systems and provide examples of models, systems, and applications. Based on that, we introduce limitations of current generative AI and provide an agenda for BISE research. Previous papers discuss generative AI around specific methods such as language models (e.g., Teubner et al. 2023; Dwivedi et al. 2023; Schöbel et al. 2023) or specific applications such as marketing (e.g., Peres et al. 2023), innovation management (Burger et al. 2023), scholarly research (e.g., Susarla et al. 2023; Davison et al. 2023), and education (e.g., Kasneci et al. 2023; Gimpel et al. 2023). Different from these works, we focus on generative AI in the context of information systems, and, to this end, we discuss several opportunities and challenges that are unique to the BISE community and make suggestions for impactful directions for BISE research.

MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[720]
C. A. Scholbeck, G. Casalicchio, C. Molnar, B. Bischl and C. Heumann.
Marginal Effects for Non-Linear Prediction Functions.
Data Mining and Knowledge Discovery 38 (Feb. 2024). DOI.
Abstract

Beta coefficients for linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations for feature effects, either in the shape of derivatives of the prediction function or forward differences in prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to deal with the non-linearities found in black box models. We introduce a new class of marginal effects termed forward marginal effects. We argue to abandon derivatives in favor of better-interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the non-linearity of prediction functions, we introduce a non-linearity measure for marginal effects. We argue against summarizing feature effects of a non-linear prediction function in a single metric such as the average marginal effect. Instead, we propose to partition the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates.
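A forward marginal effect is just the forward difference in the prediction caused by shifting one feature by a chosen step, fME(x) = f(x + h·e_j) − f(x). The sketch below uses a hypothetical non-linear toy model to show why the effect varies across the feature space:

```python
import numpy as np

def forward_marginal_effect(predict, X, feature, step):
    """Forward difference in prediction from shifting one feature by `step`."""
    X_shifted = X.copy()
    X_shifted[:, feature] += step
    return predict(X_shifted) - predict(X)

# Non-linear toy model: f(x) = x0^2 + x1.
f = lambda X: X[:, 0] ** 2 + X[:, 1]
X = np.array([[1.0, 0.0],
              [2.0, 0.0]])
print(forward_marginal_effect(f, X, feature=0, step=1.0))  # [3. 5.]
```

The same unit step yields effects 3 and 5 at the two points, which is exactly why the paper argues against summarizing a non-linear prediction function in a single average marginal effect and proposes conditional effects on feature subspaces instead.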

MCML Authors
Link to Christian Scholbeck

Christian Scholbeck

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[719]
T. Li, K. Heidler, L. Mou, Á. Ignéczi, X. Zhu and J. L. Bamber.
A high-resolution calving front data product for marine-terminating glaciers in Svalbard.
Earth System Science Data 16.2 (Feb. 2024). DOI.
Abstract

The mass loss of glaciers outside the polar ice sheets has been accelerating during the past several decades and has been contributing to global sea-level rise. However, many of the mechanisms of this mass loss process are not well understood, especially the calving dynamics of marine-terminating glaciers, in part due to a lack of high-resolution calving front observations. Svalbard is an ideal site to study the climate sensitivity of glaciers as it is a region that has been undergoing amplified climate variability in both space and time compared to the global mean. Here we present a new high-resolution calving front dataset of 149 marine-terminating glaciers in Svalbard, comprising 124 919 glacier calving front positions during the period 1985–2023 (https://doi.org/10.5281/zenodo.10407266, Li et al., 2023). This dataset was generated using a novel automated deep-learning framework and multiple optical and SAR satellite images from Landsat, Terra-ASTER, Sentinel-2, and Sentinel-1 satellite missions. The overall calving front mapping uncertainty across Svalbard is 31 m. The newly derived calving front dataset agrees well with recent decadal calving front observations between 2000 and 2020 (Kochtitzky and Copland, 2022) and an annual calving front dataset between 2008 and 2022 (Moholdt et al., 2022). The calving fronts between our product and the latter deviate by 32 ± 65 m on average. The R2 of the glacier calving front change rates between these two products is 0.98, indicating an excellent match. Using this new calving front dataset, we identified widespread calving front retreats during the past four decades, across most regions in Svalbard except for a handful of glaciers draining the ice caps Vestfonna and Austfonna on Nordaustlandet. In addition, we identified complex patterns of glacier surging events overlaid with seasonal calving cycles. These data and findings provide insights into understanding glacier calving mechanisms and drivers. This new dataset can help improve estimates of glacier frontal ablation as a component of the integrated mass balance of marine-terminating glaciers.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[718]
C. Cipriani, M. Fornasier and A. Scagliotti.
From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks.
European Journal of Applied Mathematics (Feb. 2024). DOI.
Abstract

The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks, which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modelling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularisation, resulting in potentially non-convex cost landscapes. While the global results obtained for high Tikhonov regularisation may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.

MCML Authors
Link to Cristina Cipriani

Cristina Cipriani

Applied Numerical Analysis

Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[717]
B. X. W. Liew, D. Rügamer and A. V. Birn-Jeffery.
Neuromechanical stabilisation of the centre of mass during running.
Gait and Posture 108 (Feb. 2024). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[716]
A. Mallol-Ragolta and B. W. Schuller.
Coupling Sentiment and Arousal Analysis Towards an Affective Dialogue Manager.
IEEE Access 12 (Feb. 2024). DOI.
Abstract

We present the technologies and host components developed to power a speech-based dialogue manager with affective capabilities. The overall goal is that the system adapts its response to the sentiment and arousal level of the user inferred by analysing the linguistic and paralinguistic information embedded in his or her interaction. A linguistic-based, dedicated sentiment analysis component determines the body of the system response. A paralinguistic-based, dedicated arousal recognition component adjusts the energy level to convey in the affective system response. The sentiment analysis model is trained using the CMU-MOSEI dataset and implements a hierarchical contextual attention fusion network, which scores an Unweighted Average Recall (UAR) of 79.04% on the test set when tackling the task as a binary classification problem. The arousal recognition model is trained using the MSP-Podcast corpus. This model extracts the Mel-spectrogram representations of the speech signals, which are exploited with a Convolutional Neural Network (CNN) trained from scratch, and scores a UAR of 61.11% on the test set when tackling the task as a three-class classification problem. Furthermore, we highlight two sample dialogues implemented at the system back-end to detail how the sentiment and arousal inferences are coupled to determine the affective system response. These are also showcased in a proof of concept demonstrator. We publicly release the trained models to provide the research community with off-the-shelf sentiment analysis and arousal recognition tools.

MCML Authors
Link to Adria Mallol-Ragolta

Adria Mallol-Ragolta

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[715]
Y. Xie, X. Yuan, X. Zhu and J. Tian.
Multimodal Co-Learning for Building Change Detection: A Domain Adaptation Framework Using VHR Images and Digital Surface Models.
IEEE Transactions on Geoscience and Remote Sensing 62 (Feb. 2024). DOI.
Abstract

In this article, we propose a multimodal co-learning framework for building change detection. This framework can be adopted to jointly train a Siamese bitemporal image network and a height difference (HDiff) network with labeled source data and unlabeled target data pairs. Three co-learning combinations (vanilla co-learning, fusion co-learning, and detached fusion co-learning) are proposed and investigated with two types of co-learning loss functions within our framework. Our experimental results demonstrate that the proposed methods are able to take advantage of unlabeled target data pairs and, therefore, enhance the performance of single-modal neural networks on the target data. In addition, our synthetic-to-real experiments demonstrate that the recently published synthetic dataset, Simulated Multimodal Aerial Remote Sensing (SMARS), is feasible to be used in real change detection scenarios, where the optimal result achieves an F1 score of 79.29%.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[714]
P. Gijsbers, M. L. P. Bueno, S. Coors, E. LeDell, S. Poirier, J. Thomas, B. Bischl and J. Vanschoren.
AMLB: an AutoML Benchmark.
Journal of Machine Learning Research 25.101 (Feb. 2024). URL.
Abstract

Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[713]
A. Bonfanti, G. Bruno and C. Cipriani.
The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks.
Preprint at arXiv (Feb. 2024). arXiv.
Abstract

The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show how the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief. Another significant difference from the linear regime is that, even in the idealistic infinite-width limit, the Hessian does not vanish and hence it cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both linear and nonlinear cases, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.
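The object at the heart of this analysis is the empirical NTK, the Gram matrix of parameter gradients of the network output (standard definition, stated here for context):

```latex
\Theta_\theta(x, x') = \bigl\langle \nabla_\theta u_\theta(x),\; \nabla_\theta u_\theta(x') \bigr\rangle
```

For linear PDEs, the kernel governing the residual dynamics is approximately constant during training in the infinite-width limit, which linearizes the training dynamics; the paper's point is that this constancy fails for nonlinear PDEs, so the Hessian term cannot be dropped and second-order optimization becomes attractive.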

MCML Authors
Link to Cristina Cipriani

Cristina Cipriani

Applied Numerical Analysis


[712]
M. Drton, A. Grosdos, I. Portakal and N. Sturma.
Algebraic Sparse Factor Analysis.
Preprint at arXiv (Feb. 2024). arXiv.
Abstract

Factor analysis is a statistical technique that explains correlations among observed random variables with the help of a smaller number of unobserved factors. In traditional full factor analysis, each observed variable is influenced by every factor. However, many applications exhibit interesting sparsity patterns, that is, each observed variable only depends on a subset of the factors. In this paper, we study such sparse factor analysis models from an algebro-geometric perspective. Under mild conditions on the sparsity pattern, we examine the dimension of the set of covariance matrices that corresponds to a given model. Moreover, we study algebraic relations among the covariances in sparse two-factor models. In particular, we identify cases in which a Gröbner basis for these relations can be derived via a 2-delightful term order and joins of toric edge ideals.
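The covariance structure such models study can be written down directly. A minimal numerical sketch (not the paper's algebraic machinery): the model covariance is Sigma = Lambda Lambdaᵀ + Psi, where zeros in the loading matrix Lambda encode which observed variable depends on which factor; the loading values below are invented.

```python
import numpy as np

def sparse_factor_covariance(loadings, mask, noise_vars):
    """Covariance implied by a sparse factor model.

    mask[i, j] = 1 if observed variable i loads on factor j, else 0.
    """
    Lam = loadings * mask  # enforce the sparsity pattern
    return Lam @ Lam.T + np.diag(noise_vars)
```

In a two-factor model where two variables share no factor, their model covariance is exactly zero, which is one source of the algebraic relations the paper characterizes.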

MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[711]
F. Friedrich, K. Hämmerl, P. Schramowski, M. Brack, J. Libovicky, K. Kersting and A. Fraser.
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You.
Preprint at arXiv (Feb. 2024). arXiv.
Abstract

Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment, and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this technology. However, our results show that multilingual models suffer from significant gender biases just as monolingual models do. Furthermore, the natural expectation that multilingual models will provide similar results across languages does not hold up. Instead, there are important differences between languages. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. We use MAGBIG to investigate the effect of multilingualism on gender bias in T2I models. To this end, we construct multilingual prompts requesting portraits of people with a certain occupation or trait. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages. Furthermore, we investigate prompt engineering strategies, such as indirect, neutral formulations, to mitigate these biases. Unfortunately, these approaches have limited success and result in worse text-to-image alignment. Consequently, we call for more research into diverse representations across languages in image generators, as well as into steerability to address biased model behavior.

MCML Authors
Link to Katharina Hämmerl

Katharina Hämmerl

Data Analytics & Statistics

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[710]
A. Höhl, I. Obadic, M. Á. F. Torres, H. Najjar, D. Oliveira, Z. Akata, A. Dengel and X. Zhu.
Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing.
Preprint at arXiv (Feb. 2024). arXiv.
Abstract

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still missing. In this paper, we address this issue by performing a systematic review to identify the key trends of how explainable AI is used in Remote Sensing and shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights in Remote Sensing, and reflect on the approaches used for explainable AI methods evaluation. Our review provides a complete summary of the state-of-the-art in the field. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field of explainable AI in Remote Sensing.

MCML Authors
Link to Ivica Obadic

Ivica Obadic

Data Science in Earth Observation

Link to Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[709]
E. Nie, S. Yuan, B. Ma, H. Schmid, M. Färber, F. Kreuter and H. Schütze.
Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models.
Preprint at arXiv (Feb. 2024). arXiv.
Abstract

Despite the predominance of English in their training data, English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks, raising questions about the depth and nature of their cross-lingual capabilities. This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence labeling tasks. Diverging from the single text-to-text prompt, our method generates for each token of the input sentence an individual prompt which asks for its linguistic label. We assess our method on the Universal Dependencies part-of-speech tagging dataset for 38 languages, utilizing both English-centric and multilingual LLMs. Our findings show that decomposed prompting surpasses the iterative prompting baseline in efficacy and efficiency under zero- and few-shot settings. Further analysis reveals the influence of evaluation methods and the use of instructions in prompts. Our multilingual investigation shows that English-centric language models perform better on average than multilingual models. Our study offers insights into the multilingual transferability of English-centric LLMs, contributing to the understanding of their multilingual linguistic knowledge.
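The core construction, one prompt per input token, is simple to sketch. The template wording below is our own illustration, not the exact prompt from the paper.

```python
def decomposed_prompts(sentence_tokens):
    """Build one POS-tagging prompt per token of the input sentence."""
    sentence = " ".join(sentence_tokens)
    return [
        f'Sentence: "{sentence}"\n'
        f'What is the part-of-speech tag of the word "{tok}"?'
        for tok in sentence_tokens
    ]
```

Each prompt carries the full sentence as context but queries a single token's label, in contrast to one text-to-text prompt for the whole sequence.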

MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[708]
D. Schalk, B. Bischl and D. Rügamer.
Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models.
Statistics and Computing 34.31 (Feb. 2024). DOI.
Abstract

Various privacy-preserving frameworks that respect the individual’s privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB). Making use of CWB allows us to reframe the GAMM estimation as a distributed fitting of base learners using the $L_2$-loss. In order to account for the heterogeneity of different data location sites, we propose a distributed version of a row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaptation of CWB preserves all the important properties of the original algorithm, such as an unbiased feature selection and the feasibility to fit models in high-dimensional feature spaces, and yields model estimates equivalent to those of CWB on pooled data. In addition to a derivation of the equivalence of both algorithms, we also showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
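The component-wise selection step that CWB is built on can be sketched on a single site. This is a minimal illustration of component-wise gradient boosting with the L2 loss, not the paper's distributed algorithm: each base learner is a univariate linear fit to the current residuals, and only the best-fitting component is updated per iteration.

```python
import numpy as np

def cwb_l2(X, y, n_iter=100, lr=0.1):
    """Component-wise L2 boosting with univariate linear base learners."""
    n, p = X.shape
    coefs = np.zeros(p)
    pred = np.zeros(n)
    for _ in range(n_iter):
        resid = y - pred  # negative gradient of the L2 loss
        best_j, best_beta, best_sse = 0, 0.0, np.inf
        for j in range(p):
            xj = X[:, j]
            beta = (xj @ resid) / (xj @ xj)  # least-squares fit, no intercept
            sse = np.sum((resid - beta * xj) ** 2)
            if sse < best_sse:
                best_j, best_beta, best_sse = j, beta, sse
        coefs[best_j] += lr * best_beta  # update only the winning component
        pred = X @ coefs
    return coefs
```

Because only one component is touched per iteration, features that never win stay at exactly zero, which is the unbiased-selection property the abstract mentions.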

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[707]
C. Geldhauser and H. Diebel-Fischer.
Is diverse and inclusive AI trapped in the gap between reality and algorithmizability?
Northern Lights Deep Learning Conference (NLDL 2024). Tromsø, Norway, Jan 09-11, 2024. URL.
Abstract

We investigate the preconditions of an operationalization of ethics using the example of algorithmization, i.e., the mathematical implementation, of the concepts of fairness and diversity in AI. From a non-technical, ethical point of view, this implementation entails two major drawbacks: (1) it narrows big concepts down to a single model that is deemed manageable, and (2) it hides unsolved problems of humanity in a system that could be mistaken for the ‘solution’ to these problems. We encourage extra caution when dealing with such issues and vote for human oversight.

MCML Authors
Link to Carina Geldhauser

Carina Geldhauser

Dr.

* Former member


[706]
M. Bernhard, R. Amoroso, Y. Kindermann, M. Schubert, L. Baraldi, R. Cucchiara and V. Tresp.
What's Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI. GitHub.
Abstract

Semantic segmentation represents a fundamental task in computer vision with various application areas such as autonomous driving, medical imaging, or remote sensing. For evaluating and comparing semantic segmentation models, the mean intersection over union (mIoU) is currently the gold standard. However, while mIoU serves as a valuable benchmark, it does not offer insights into the types of errors incurred by a model. Moreover, different types of errors may have different impacts on downstream applications. To address this issue, we propose an intuitive method for the systematic categorization of errors, thereby enabling a fine-grained analysis of semantic segmentation models. Since we assign each erroneous pixel to precisely one error type, our method seamlessly extends the popular IoU-based evaluation by shedding more light on the false positive and false negative predictions. Our approach is model- and dataset-agnostic, as it does not rely on additional information besides the predicted and ground-truth segmentation masks. In our experiments, we demonstrate that our method accurately assesses model strengths and weaknesses on a quantitative basis, thus reducing the dependence on time-consuming qualitative model inspection. We analyze a variety of state-of-the-art semantic segmentation models, revealing systematic differences across various architectural paradigms. Exploiting the gained insights, we showcase that combining two models with complementary strengths in a straightforward way is sufficient to consistently improve mIoU, even for models setting the current state of the art on ADE20K.
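The starting point of such an analysis, splitting erroneous pixels into false positives and false negatives rather than reporting one IoU number, can be sketched for a binary mask. This illustration does not reproduce the paper's finer-grained error taxonomy.

```python
import numpy as np

def iou_with_error_split(pred, gt):
    """IoU plus FP/FN pixel counts for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)   # predicted foreground, ground truth background
    fn = np.sum(~pred & gt)   # missed ground-truth foreground
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return {"iou": iou, "false_positive": int(fp), "false_negative": int(fn)}
```

Two models with identical IoU can have very different FP/FN balances, which is why a decomposition like this is informative for downstream applications.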

MCML Authors
Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[705]
M. Brahimi, B. Haefner, T. Yenamandra, B. Goldluecke and D. Cremers.
SupeRVol: Super-Resolution Shape and Reflectance Estimation in Inverse Volume Rendering.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI.
Abstract

We propose an end-to-end inverse rendering pipeline called SupeRVol that allows us to recover 3D shape and material parameters from a set of color images in a super-resolution manner. To this end, we represent both the bidirectional reflectance distribution function’s (BRDF) parameters and the signed distance function (SDF) by multi-layer perceptrons (MLPs). In order to obtain both the surface shape and its reflectance properties, we resort to a differentiable volume renderer with a physically based illumination model that allows us to decouple reflectance and lighting. This physical model takes into account the effect of the camera’s point spread function, thereby enabling a reconstruction of shape and material in super-resolution quality. Experimental validation confirms that SupeRVol achieves state-of-the-art performance in terms of inverse rendering quality. It generates reconstructions that are sharper than the individual input images, making this method ideally suited for 3D modeling from low-resolution imagery.

MCML Authors
Link to Tarun Yenamandra

Tarun Yenamandra

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[704]
S. Klenk, D. Bonello, L. Koestler, N. Araslanov and D. Cremers.
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI.
MCML Authors
Link to Simon Klenk

Simon Klenk

Computer Vision & Artificial Intelligence

Link to Nikita Araslanov

Nikita Araslanov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[703]
U. Sahin, H. Li, Q. Khan, D. Cremers and V. Tresp.
Enhancing Multimodal Compositional Reasoning of Visual Language Models With Generative Negative Mining.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI. GitHub.
MCML Authors
Link to Hang Li

Hang Li

Database Systems & Data Mining

Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[702]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI Reconstruction.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI.
MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[701]
T. Yenamandra, A. Tewari, N. Yang, F. Bernard, C. Theobalt and D. Cremers.
FIRe: Fast Inverse Rendering Using Directional and Signed Distance Functions.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI.
MCML Authors
Link to Tarun Yenamandra

Tarun Yenamandra

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[700]
G. Zhang, Y. Zhang, K. Zhang and V. Tresp.
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024). Waikoloa, Hawaii, Jan 04-08, 2024. DOI.
MCML Authors
Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[699]
E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part I.
4OR (Jan. 2024). DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[698]
E. Hüllermeier and R. Slowinski.
Preference learning and multiple criteria decision aiding: Differences, commonalities, and synergies -- Part II.
4OR (Jan. 2024). DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[697]
G. Casalicchio and L. Burk.
Evaluation and Benchmarking.
Applied Machine Learning Using mlr3 in R I.3 (Jan. 2024). DOI.
Abstract

Machine learning models can only be deployed in practice if they are robustly evaluated to estimate a model’s generalization performance, i.e. how well it will perform on new data. Resampling strategies, including cross-validation and bootstrapping, can be used to estimate the generalization performance. Models can be compared to one another using a benchmark experiment, which makes use of the same resampling strategies and measures to fairly compare models and to help practitioners decide which model to use in practice.
This chapter introduces resampling strategies in mlr3, including cross-validation, repeated cross-validation, leave-one-out, bootstrapping, and custom strategies. These are then demonstrated with the resample() function, which is used to resample a single learner with a given strategy. Benchmarking is then introduced and the benchmark() function is demonstrated for comparing multiple learners. The chapter concludes with a deep dive into binary classification evaluation, including ROC analysis and the Area Under the Curve metric.
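The k-fold split that resample() performs is language-agnostic; a plain-Python sketch of the idea (the chapter itself uses mlr3 in R): shuffle indices once, deal them into k folds, and use each fold as the test set exactly once.

```python
import random

def kfold_indices(n, k, seed=0):
    """Return k (train, test) index splits for n observations."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # deal shuffled indices into k folds
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits
```

Every observation appears in exactly one test set, so averaging the k test-set scores estimates generalization performance without ever scoring a model on its own training data.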

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Lukas Burk

Lukas Burk

Statistical Learning & Data Science


[696]
M. Becker, L. Schneider and S. Fischer.
Hyperparameter Optimization.
Applied Machine Learning Using mlr3 in R II.4 (Jan. 2024). DOI.
Abstract

Machine learning models include parameters and hyperparameters. The former refers to model coefficients that are estimated during training. The latter are parameters that are set by the user and affect how the model is fit or how it makes predictions. Setting hyperparameters manually is arduous and error-prone; instead, hyperparameter optimization (HPO) automates this ‘tuning’ procedure to reduce bias. When performing HPO there are many considerations, including which tuning algorithm to use, how long to tune for, and which measures to optimize. Moreover, users have to decide which hyperparameters to tune and over what configurations. Finally, one has to be careful to make use of nested resampling to prevent leakage of information from training to testing datasets that can occur when resampling and tuning simultaneously. This chapter begins by introducing mlr3tuning and its functionality for tuning learners. This includes Tuners for configuring and running optimization algorithms, TuningInstances for storing results, and Terminators for controlling when to stop the HPO process. The chapter provides a practical example of tuning hyperparameters of a support vector machine, including introducing logarithmic transformations. The AutoTuner class is also introduced, which is used for automating nested resampling to reduce bias in tuning.
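The logarithmic transformation mentioned above can be sketched with a simple random-search tuner (a plain-Python illustration, not mlr3's API): sampling uniformly in log-space explores a hyperparameter spanning several orders of magnitude evenly per decade, instead of concentrating almost all samples near the upper end.

```python
import math
import random

def random_search(objective, low, high, n_trials=50, seed=0):
    """Minimize objective(x) over [low, high], sampling x log-uniformly."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(n_trials):
        # uniform in log-space, then transform back to the original scale
        x = math.exp(rng.uniform(math.log(low), math.log(high)))
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

For an honest performance estimate, each candidate's score would be computed with resampling, and the whole search would itself sit inside an outer resampling loop (nested resampling), as the chapter explains.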

MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Sebastian Fischer

Sebastian Fischer

Statistical Learning & Data Science


[695]
L. Schneider and M. Becker.
Advanced Tuning Methods and Black Box Optimization.
Applied Machine Learning Using mlr3 in R II.5 (Jan. 2024). DOI.
Abstract

Automated tuning can be error-prone, and it is very likely that models will crash during the tuning process; it is therefore essential to have reliable methods of encapsulating errors to prevent large experiments from failing and losing intermediate results. This chapter therefore begins by introducing fallback learners and encapsulation methods, which are returned to in ‘Advanced Technical Aspects of mlr3’.
Models can be tuned with respect to one or multiple measures. In general, when tuning to multiple measures there will be a trade-off between them, and therefore there will not be one optimal hyperparameter configuration; instead, the aim is to estimate configurations that are not Pareto-dominated by any other. This chapter introduces multi-objective tuning and concepts including Pareto optimality.
Some tuning methods are more advanced than others, including Hyperband and Bayesian optimization. Hyperband is a multi-fidelity tuner that makes use of fidelity parameters, which provide a trade-off between model runtime and performance accuracy. Bayesian optimization is a sample-efficient black-box optimization algorithm that is highly flexible and allows users fine-grained control over tuning large search spaces. This chapter introduces mlr3hyperband and the concept of fidelity parameters, and then mlr3mbo and bbotk to discuss black-box optimization and Bayesian optimization.
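Pareto dominance, the key concept in the multi-objective part of the chapter, fits in a few lines of plain Python (the chapter's implementation lives in mlr3, this is only the definition): a configuration is kept if no other configuration is at least as good on every objective and strictly better on at least one, assuming all objectives are minimized.

```python
def pareto_front(points):
    """Return the non-dominated points; each point is a tuple of objectives."""
    def dominates(a, b):
        # a dominates b: no worse everywhere, strictly better somewhere
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The result is a set of trade-off configurations rather than a single winner, from which a practitioner then picks according to their priorities.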

MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Marc Becker

Marc Becker

Statistical Learning & Data Science


[694]
M. Binder and F. Pfisterer.
Sequential Pipelines.
Applied Machine Learning Using mlr3 in R II.7 (Jan. 2024). DOI.
Abstract

Computational pipelines provide a layer of abstraction for swapping in and out different elements of the pipeline. In machine learning this can be useful for swapping algorithms, as well as for common operations for data preprocessing and model post-processing. Many real-world machine learning applications involve more than just fitting a single model at a time: it is often beneficial or even necessary to preprocess data for feature engineering and compatibility with learners. In many cases it is also useful to combine predictions of multiple models in ensembles. By defining these workflows as computational objects, it is then possible to treat them like models to be trained/tested and even tuned. This chapter introduces mlr3pipelines, a dataflow programming language that can be used to define machine learning processes from simple building blocks. The chapter focuses on sequential pipelines, in which data passes from one operation to another in a linear sequence and each operation has one input and one output. The chapter introduces PipeOp and Graph, which are the building blocks of a pipeline, and provides some concrete examples with PCA.
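The sequential case, one input and one output per operation, reduces to function composition. A minimal Python sketch of the idea (analogous in spirit to chaining PipeOps into a Graph, but not mlr3pipelines' API):

```python
class SequentialPipeline:
    """Data flows through the steps in order; each step maps data -> data."""

    def __init__(self, *steps):
        self.steps = steps

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data
```

Treating the whole pipeline as one object is what makes it trainable and tunable like a single model:

```python
pipe = SequentialPipeline(
    lambda xs: [x - min(xs) for x in xs],  # shift to zero minimum
    lambda xs: [x * 2 for x in xs],        # rescale
)
```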

MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[693]
M. Binder, F. Pfisterer, M. Becker and M. N. Wright.
Non-sequential Pipelines and Tuning.
Applied Machine Learning Using mlr3 in R II.8 (Jan. 2024). DOI.
Abstract

Real-world applications often require complicated pipelines that do not progress sequentially. For example, many experiments have demonstrated that bagging is a powerful method to improve model performance. Bagging can be thought of as a non-sequential pipeline where a learner is replicated, each separate learner is trained and makes predictions, and their results are combined. This is non-sequential as data is not flowing sequentially through the pipeline but is instead passed to all learners (who may then subsample the data) and then recombined, thus creating a pipeline where operations have multiple inputs and outputs. Pipeline operations also have hyperparameters that can be set and tuned to improve model performance. Moreover, the choice of operations to include in a pipeline can itself be tuned, known as combined algorithm selection and hyperparameter optimization (CASH).
This chapter looks at more advanced uses of mlr3pipelines. This is put into practice by demonstrating how to build a bagging and stacking pipeline from scratch, as well as how to access common pipelines that are readily available in mlr3pipelines. The chapter then looks at tuning pipelines and CASH.
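The bagging pipeline described above can be sketched in plain Python (the chapter builds it from mlr3pipelines operators instead): train each replicated learner on a bootstrap sample and average the predictions. The fit/predict callables here are hypothetical stand-ins for a learner.

```python
import random
import statistics

def bag_predict(fit, predict, X, y, x_new, n_bags=25, seed=0):
    """Bagging: bootstrap-train n_bags copies of a learner, average predictions.

    fit: (X, y) -> model; predict: (model, x) -> prediction.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_bags):
        # bootstrap sample: draw len(X) indices with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        preds.append(predict(model, x_new))
    return statistics.mean(preds)
```

Each learner sees a different resample of the data (multiple inputs), and their outputs are recombined at the end (multiple outputs into one), which is exactly what makes the pipeline non-sequential.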

MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Marc Becker

Marc Becker

Statistical Learning & Data Science


[692]
M. Lang, S. Fischer and R. Sonabend.
Advanced Technical Aspects of mlr3.
Applied Machine Learning Using mlr3 in R IV.10 (Jan. 2024). DOI.
Abstract

Parallelization is often required to efficiently run machine learning models, which means models are run simultaneously on multiple CPU cores, CPUs, or computational nodes. This chapter begins by demonstrating how mlr3 uses the future package for parallelization and how different ‘plans’ can be applied to mlr3 experiments.
In large machine learning experiments, it is common for a model to error during training or predicting. This is because the algorithms have to process arbitrary data, and not all eventualities can always be handled. It is therefore imperative to have robust methods for encapsulating and dealing with errors. This chapter builds on what has been briefly seen in Chapter 5 to discuss error handling and logging, including how to make use of fallback learners in experiments.
Large experiments may also require data to be handled in different formats, or to be kept out of memory rather than loaded all at once. This chapter discusses different ‘backends’ that can be used for mlr3 Tasks, including interfacing with DuckDB and SQL.
Finally, this chapter demonstrates how to extend classes in mlr3 by using the Measure class as an example. This may be of particular interest to readers who want to create new Measures or Learners.
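The fallback-learner pattern the chapter builds on can be sketched language-agnostically (plain Python here, not mlr3's encapsulation machinery): wrap training so that an exception in the main learner triggers a simple baseline instead of failing the whole experiment. fit_main and fit_fallback are hypothetical callables of the form (X, y) -> model.

```python
def fit_with_fallback(fit_main, fit_fallback, X, y):
    """Try the main learner; on any error, train the fallback instead.

    Returns (model, which) so the caller can log which path was taken.
    """
    try:
        return fit_main(X, y), "main"
    except Exception:
        return fit_fallback(X, y), "fallback"
```

In a benchmark over thousands of resampling iterations, this guarantees every iteration yields some prediction, and the log of "fallback" cases doubles as an error report.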

MCML Authors

[691]
S. Fischer, M. Lang and M. Becker.
Large-Scale Benchmarking.
Applied Machine Learning Using mlr3 in R IV.11 (Jan. 2024). DOI.
Abstract

In the field of machine learning, benchmark experiments are used to evaluate and compare the performance of algorithms. To draw robust conclusions, benchmark experiments often have to be ‘large-scale’, which means including many datasets, learners, and possibly measures. Finding datasets can be difficult, and the choice of dataset impacts the conclusions that can be drawn. Conducting large-scale benchmark experiments is also complex, as they are usually computationally intensive. It is therefore common to make use of high-performance computing clusters to run the experiment efficiently. Finally, once these experiments are run, analysis usually requires more than a single score from a given performance measure, and therefore statistical tests are often employed.
This chapter introduces mlr3oml for interfacing with the OpenML database to access data and tasks. It then continues by discussing how to run experiments on high-performance computing clusters using batchtools and mlr3batchmark. Finally, mlr3benchmark is introduced for statistical analysis, including Friedman tests and critical difference diagrams.
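The Friedman test at the core of that analysis compares learners by their per-dataset ranks. A plain-Python sketch of the classical test statistic for k learners on n datasets (mlr3benchmark computes this for you; ties are assumed absent here):

```python
def friedman_statistic(rank_matrix):
    """Friedman chi-square statistic.

    rank_matrix[i][j] = rank of learner j on dataset i (1 = best).
    Statistic: 12n / (k(k+1)) * sum_j (avg_rank_j - (k+1)/2)^2.
    """
    n = len(rank_matrix)
    k = len(rank_matrix[0])
    avg = [sum(row[j] for row in rank_matrix) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in avg)
```

Under the null hypothesis that all learners perform alike, the statistic is approximately chi-square distributed with k-1 degrees of freedom; a significant result is then typically followed by post-hoc tests and a critical difference diagram.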

MCML Authors
Link to Sebastian Fischer

Sebastian Fischer

Statistical Learning & Data Science

Link to Marc Becker

Marc Becker

Statistical Learning & Data Science


[690]
S. Dandl, P. Biecek, G. Casalicchio and M. N. Wright.
Model Interpretation.
Applied Machine Learning Using mlr3 in R IV.12 (Jan. 2024). DOI.
Abstract

The increasing availability of data and software frameworks to create predictive models has allowed the widespread adoption of machine learning in many applications. However, high predictive performance of such models often comes at the cost of interpretability. Machine learning interpretation methods can be useful for several purposes: 1) gaining global insights into a model (e.g., feature importance); 2) model improvement if flaws were identified (e.g., unexpected reliance on a certain feature); 3) understanding individual predictions. Several model-agnostic methods have been developed, including feature permutation, Shapley values, and LIME.
This chapter presents the packages iml, counterfactuals, and DALEX, which implement model-agnostic interpretation methods. Throughout the chapter, an XGBoost model is trained on the German credit dataset to understand how predictions are made and why. The chapter starts by discussing the iml package and the theory behind the discussed methods, as well as how to practically use the interface. It then moves to counterfactuals and the benefits of counterfactual analysis, including the What-If and MOC methods. Finally, DALEX is introduced, which includes similar methods to iml but with a different design, hence users can make use of either package depending on their design preference.
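Permutation feature importance, one of the model-agnostic methods covered, is easy to sketch in plain Python (iml and DALEX implement refined versions of this): shuffle one feature column and measure how much the model's error grows; the model here is any callable mapping a row to a prediction.

```python
import random

def permutation_importance(model, X, y, feature, seed=0):
    """Increase in MSE after permuting one feature column of X."""
    def mse(rows):
        return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(rows)

    base = mse(X)
    shuffled_col = [row[feature] for row in X]
    random.Random(seed).shuffle(shuffled_col)
    X_perm = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, shuffled_col)]
    return mse(X_perm) - base  # importance = error increase
```

A feature the model never uses yields an importance of zero, since permuting it cannot change any prediction.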

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[689]
J. Gertheiss, D. Rügamer, B. Liew and S. Greven.
Functional Data Analysis: An Introduction and Recent Developments.
Biometrical Journal (2024). To be published. Preprint at arXiv. arXiv. GitHub.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[688]
B. Bischl, R. Sonabend, L. Kotthoff and M. Lang.
Applied Machine Learning Using mlr3 in R.
Chapman and Hall/CRC (Jan. 2024). DOI.
Abstract

mlr3 is an award-winning ecosystem of R packages that have been developed to enable state-of-the-art machine learning capabilities in R. Applied Machine Learning Using mlr3 in R gives an overview of flexible and robust machine learning methods, with an emphasis on how to implement them using mlr3 in R. It covers various key topics, including basic machine learning tasks, such as building and evaluating a predictive model; hyperparameter tuning of machine learning approaches to obtain peak performance; building machine learning pipelines that perform complex operations such as pre-processing followed by modelling followed by aggregation of predictions; and extending the mlr3 ecosystem with custom learners, measures, or pipeline components. The book is primarily aimed at researchers, practitioners, and graduate students who use machine learning or who are interested in using it. It can be used as a textbook for an introductory or advanced machine learning class that uses R, as a reference for people who work with machine learning methods, and in industry for exploratory experiments in machine learning.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[687]
A. Farshad.
Learning to Learn Neural Representations with Limited Data and Supervision.
Dissertation 2024. URL.
Abstract

Learning to learn is a powerful paradigm that enables machine learning models to leverage the previously learned features for new tasks and domains more effectively. This thesis explores different aspects of learning to learn from data, models, and semantics, and shows how they can enhance various computer vision and medical imaging tasks. In the first part of the thesis, we present novel and fundamental research on learning to learn from data, and in the second part, we investigate the use of high-level semantics in generative models.

MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[686]
L. Haliburton.
Designing behavior change technologies for workplace wellbeing.
Dissertation 2024. DOI.
Abstract

Advances in technology have made humans more productive at work but often at the cost of wellbeing, with issues like sedentary behavior, social isolation, and excessive screen time affecting modern knowledge workers. Despite efforts to introduce healthy interventions, such as standing desks, uptake remains low due to the intention-behavior gap. This thesis explores ways to design technology that encourages healthy behaviors, using passive and active behavior change methods to motivate users, and proposes a design framework for ethical behavior change technologies that promote a healthier, more productive workplace. (Shortened).

MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media


[685]
H. Krasowski.
Guaranteeing Complex Safety Specifications for Autonomous Vehicles via Reinforcement Learning with Formal Methods.
Dissertation 2024. URL.
Abstract

Reinforcement learning (RL) solves complicated motion planning tasks for autonomous vehicles. Current RL methods lack safety guarantees. This dissertation combines RL with formal methods that verify safety specifications so that only verified actions are executed. The safe RL approaches are developed for autonomous vehicles and their complex safety specifications. The evaluation confirms the safety guarantees and real-time capability.

MCML Authors
Link to Hanna Krasowski

Hanna Krasowski

Dr.

Cyber Physical Systems


[684]
C. Leiber.
Clustering in transformed feature spaces by analyzing distinct modes.
Dissertation 2024. DOI.
Abstract

The growing availability of data demands clustering methods that can extract valuable information without requiring costly annotations, especially for large, high-dimensional datasets. This dissertation develops subspace and deep clustering approaches, leveraging methods like the Dip-test of unimodality and Minimum Description Length principle to identify and encode relevant features and clusters automatically, even in complex datasets. By incorporating these techniques into neural networks and refining them through a novel parameter-free approach, the research offers robust clustering tools that perform well without prior knowledge of the number of clusters, all implemented in the open-source package ClustPy. (Shortened).

MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining


[683]
P. Wesp.
Application of machine learning in CT colonography and radiological age assessment: enhancing traditional diagnostics in radiology.
Dissertation 2024. DOI.
Abstract

Machine learning can address limitations in radiology where traditional methods fall short, as shown by this work’s focus on two clinical problems: differentiating premalignant from benign colorectal polyps and continuous age prediction through clavicle ossification in CT scans. For colorectal polyps, a random forest classifier and CNN models enabled non-invasive differentiation between benign and premalignant types in CT colonography, potentially supporting more precise cancer prevention. For age assessment, a deep learning model trained on automatically detected clavicle regions achieved superior accuracy compared to human estimates, demonstrating machine learning’s potential to enhance radiological diagnostics in complex cases. (Shortened).

MCML Authors
Link to Philipp Wesp

Philipp Wesp

Clinical Data Science in Radiology


[682]
J. Xie, Y. Shi, D. Ni, M. Milling, S. Liu, J. Zhang, K. Qian and B. W. Schuller.
Automatic Bird Sound Source Separation Based on Passive Acoustic Devices in Wild Environment.
IEEE Internet of Things Journal 11.9 (Jan. 2024). DOI.
Abstract

The Internet of Things (IoT)-based passive acoustic monitoring (PAM) has shown great potential in large-scale remote bird monitoring. However, field recordings often contain overlapping signals, making precise bird information extraction challenging. To address this challenge, first, the interchannel spatial feature is chosen as complementary information to the spectral feature to obtain additional spatial correlations between the sources. Then, an end-to-end model named BACPPNet is built based on Deeplabv3plus and enhanced with the polarized self-attention mechanism to estimate the spectral magnitude mask (SMM) for separating bird vocalizations. Finally, the separated bird vocalizations are recovered from SMMs and the spectrogram of mixed audio using the inverse short-time Fourier transform (ISTFT). We evaluate our proposed method utilizing the generated mixed data set. Experiments have shown that our method can separate bird vocalizations from mixed audio with root mean square error (RMSE), source-to-distortion ratio (SDR), source-to-interference ratio (SIR), source-to-artifact ratio (SAR), and short-time objective intelligibility (STOI) values of 2.82, 10.00 dB, 29.90 dB, 11.08 dB, and 0.66, respectively, which are better than existing methods. Furthermore, the average classification accuracy of the separated bird vocalizations drops the least. This indicates that our method outperforms other compared separation methods in bird sound separation and preserves the fidelity of the separated sound sources, which might help us better understand wild bird sound recordings.

MCML Authors
Link to Manuel Milling

Manuel Milling

Health Informatics

Link to Björn Schuller

Björn Schuller

Prof. Dr.

Health Informatics


[681]
T. Yang, J. Maly, S. Dirksen and G. Caire.
Plug-In Channel Estimation With Dithered Quantized Signals in Spatially Non-Stationary Massive MIMO Systems.
IEEE Transactions on Communications 72.1 (Jan. 2024). DOI.
Abstract

As the array dimension of massive MIMO systems increases to unprecedented levels, two problems occur. First, the spatial stationarity assumption along the antenna elements is no longer valid. Second, the large array size results in an unacceptably high power consumption if high-resolution analog-to-digital converters are used. To address these two challenges, we consider a Bussgang linear minimum mean square error (BLMMSE)-based channel estimator for large scale massive MIMO systems with one-bit quantizers and a spatially non-stationary channel. Whereas other works usually assume that the channel covariance is known at the base station, we consider a plug-in BLMMSE estimator that uses an estimate of the channel covariance and rigorously analyze the distortion produced by using an estimated, rather than the true, covariance. To cope with the spatial non-stationarity, we introduce dithering into the quantized signals and provide a theoretical error analysis. In addition, we propose an angular domain fitting procedure which is based on solving an instance of non-negative least squares. For the multi-user data transmission phase, we further propose a BLMMSE-based receiver to handle one-bit quantized data signals. Our numerical results show that the performance of the proposed BLMMSE channel estimator is very close to the oracle-aided scheme with ideal knowledge of the channel covariance matrix. The BLMMSE receiver outperforms the conventional maximum-ratio-combining and zero-forcing receivers in terms of the resulting ergodic sum rate.

MCML Authors
Link to Johannes Maly

Johannes Maly

Prof. Dr.

Associate

Mathematical Data Science and Artificial Intelligence


[680]
F. Zhang, Y. Shi, Z. Xiong and X. Zhu.
Few-Shot Object Detection in Remote Sensing: Lifting the Curse of Incompletely Annotated Novel Objects.
IEEE Transactions on Geoscience and Remote Sensing 62 (Jan. 2024). DOI. GitHub.
Abstract

Object detection (OD) is an essential and fundamental task in computer vision (CV) and satellite image processing. Existing deep learning methods have achieved impressive performance thanks to the availability of large-scale annotated datasets. Yet, in real-world applications, the availability of labels is limited. Few-shot OD (FSOD) has therefore emerged as a promising direction, which aims at enabling the model to detect novel objects with only a few of them annotated. However, many existing FSOD algorithms overlook a critical issue: when an input image contains multiple novel objects and only a subset of them are annotated, the unlabeled objects will be considered as background during training. This can cause confusion and severely impact the model’s ability to recall novel objects. To address this issue, we propose a self-training-based FSOD (ST-FSOD) approach, which incorporates the self-training mechanism into the few-shot fine-tuning process. ST-FSOD aims to enable the discovery of novel objects that are not annotated and take them into account during training. On the one hand, we devise a two-branch region proposal network (RPN) to separate the proposal extraction of base and novel objects. On the other hand, we incorporate the student-teacher mechanism into the RPN and the region-of-interest (RoI) head to include those highly confident yet unlabeled targets as pseudolabels. Experimental results demonstrate that our proposed method outperforms the state of the art in various FSOD settings by a large margin.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[679]
P. Wesp, B. M. Schachtner, K. Jeblick, J. Topalis, M. Weber, F. Fischer, R. Penning, J. Ricke, M. Ingrisch and B. O. Sabel.
Radiological age assessment based on clavicle ossification in CT: enhanced accuracy through deep learning.
International Journal of Legal Medicine (Jan. 2024). DOI.
MCML Authors
Link to Philipp Wesp

Philipp Wesp

Clinical Data Science in Radiology

Link to Katharina Jeblick

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[678]
L. Kook, P. F. M. Baumann, O. Dürr, B. Sick and D. Rügamer.
Estimating Conditional Distributions with Neural Networks using R package deeptrafo.
Journal of Statistical Software (2024). To be published. Preprint at arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[677]
F. Bongratz, A.-M. Rickmann and C. Wachinger.
Neural deformation fields for template-based reconstruction of cortical surfaces from MRI.
Medical Image Analysis 93 (Jan. 2024). DOI.
Abstract

The reconstruction of cortical surfaces is a prerequisite for quantitative analyses of the cerebral cortex in magnetic resonance imaging (MRI). Existing segmentation-based methods separate the surface registration from the surface extraction, which is computationally inefficient and prone to distortions. We introduce Vox2Cortex-Flow (V2C-Flow), a deep mesh-deformation technique that learns a deformation field from a brain template to the cortical surfaces of an MRI scan. To this end, we present a geometric neural network that models the deformation-describing ordinary differential equation in a continuous manner. The network architecture comprises convolutional and graph-convolutional layers, which allows it to work with images and meshes at the same time. V2C-Flow is not only very fast, requiring less than two seconds to infer all four cortical surfaces, but also establishes vertex-wise correspondences to the template during reconstruction. In addition, V2C-Flow is the first approach for cortex reconstruction that models white matter and pial surfaces jointly, therefore avoiding intersections between them. Our comprehensive experiments on internal and external test data demonstrate that V2C-Flow results in cortical surfaces that are state-of-the-art in terms of accuracy. Moreover, we show that the established correspondences are more consistent than in FreeSurfer and that they can directly be utilized for cortex parcellation and group analyses of cortical thickness.
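The continuous deformation idea behind V2C-Flow — vertices of a template flowing along a learned field described by an ordinary differential equation — can be illustrated with a toy forward-Euler integration. The field, step count, and all names below are illustrative stand-ins of ours, not the paper's network:

```python
import numpy as np

def integrate_deformation(vertices, field, steps=10, h=0.1):
    """Forward-Euler integration of dv/dt = field(v): each template vertex
    flows along the deformation field; correspondences to the template are
    kept for free because vertices are only moved, never re-indexed."""
    v = vertices.copy()
    for _ in range(steps):
        v = v + h * field(v)
    return v

# toy stand-in for a learned field: a contracting shift toward x = 2
field = lambda v: np.array([1.0, 0.0, 0.0]) - 0.5 * v

template = np.zeros((4, 3))  # four template vertices at the origin
deformed = integrate_deformation(template, field)
```

In V2C-Flow the field would come from a graph-convolutional network conditioned on the MRI scan; here it is a hand-crafted linear map so the trajectory is easy to follow.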

MCML Authors
Link to Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[676]
V. Lehmann, T. Zueger, M. Maritsch, M. Notter, S. Schallmoser, C. Bérubé, C. Albrecht, M. Kraus, S. Feuerriegel, E. Fleisch, T. Kowatsch, S. Lagger, M. Laimer, F. Wortmann and C. Stettler.
Machine Learning to Infer a Health State Using Biomedical Signals - Detection of Hypoglycemia in People with Diabetes while Driving Real Cars.
NEJM AI (Jan. 2024). DOI.
MCML Authors
Link to Simon Schallmoser

Simon Schallmoser

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[675]
D. Zhu, Q. Khan and D. Cremers.
Multi-vehicle trajectory prediction and control at intersections using state and intention information.
Neurocomputing 574 (Jan. 2024). DOI. GitHub.
MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[674]
M. M. Mandl, S. Hoffmann, S. Bieringer, A. E. Jacob, M. Kraft, S. Lemster and A.-L. Boulesteix.
Raising awareness of uncertain choices in empirical data analysis: A teaching concept towards replicable research practices.
PLOS Computational Biology 20.3 (2024). DOI.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[673]
H. Boch, A. Fono and G. Kutyniok.
Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency Requirement.
Preprint at arXiv (Jan. 2024). arXiv.
Abstract

Deep learning still has drawbacks in terms of trustworthiness, which describes a comprehensible, fair, safe, and reliable method. To mitigate the potential risk of AI, clear obligations associated with trustworthiness have been proposed via regulatory guidelines, e.g., in the European AI Act. Therefore, a central question is to what extent trustworthy deep learning can be realized. Establishing the described properties constituting trustworthiness requires that the factors influencing an algorithmic computation can be retraced, i.e., the algorithmic implementation is transparent. Motivated by the observation that the current evolution of deep learning models necessitates a change in computing technology, we derive a mathematical framework which enables us to analyze whether a transparent implementation in a computing model is feasible. As an example, we apply our trustworthiness framework to analyze deep learning approaches for inverse problems in digital and analog computing models represented by Turing and Blum-Shub-Smale Machines, respectively. Based on previous results, we find that Blum-Shub-Smale Machines have the potential to establish trustworthy solvers for inverse problems under fairly general conditions, whereas Turing machines cannot guarantee trustworthiness to the same degree.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[672]
L. Bothmann, K. Peters and B. Bischl.
What Is Fairness? On the Role of Protected Attributes and Fictitious Worlds.
Preprint at arXiv (Jan. 2024). arXiv.
Abstract

A growing body of literature in fairness-aware machine learning (fairML) aims to mitigate machine learning (ML)-related unfairness in automated decision-making (ADM) by defining metrics that measure fairness of an ML model and by proposing methods to ensure that trained ML models achieve low scores on these metrics. However, the underlying concept of fairness, i.e., the question of what fairness is, is rarely discussed, leaving a significant gap between centuries of philosophical discussion and the recent adoption of the concept in the ML community. In this work, we try to bridge this gap by formalizing a consistent concept of fairness and by translating the philosophical considerations into a formal framework for the training and evaluation of ML models in ADM systems. We argue that fairness problems can arise even without the presence of protected attributes (PAs), and point out that fairness and predictive performance are not irreconcilable opposites, but that the latter is necessary to achieve the former. Furthermore, we argue why and how causal considerations are necessary when assessing fairness in the presence of PAs by proposing a fictitious, normatively desired (FiND) world in which PAs have no causal effects. In practice, this FiND world must be approximated by a warped world in which the causal effects of the PAs are removed from the real-world data. Finally, we achieve greater linguistic clarity in the discussion of fairML. We outline algorithms for practical applications and present illustrative experiments on COMPAS data.

MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[671]
S. Dirksen and J. Maly.
Tuning-free one-bit covariance estimation using data-driven dithering.
Preprint at arXiv (Jan. 2024). arXiv.
Abstract

We consider covariance estimation of any subgaussian distribution from finitely many i.i.d. samples that are quantized to one bit of information per entry. Recent work has shown that a reliable estimator can be constructed if uniformly distributed dithers on [−λ,λ] are used in the one-bit quantizer. This estimator enjoys near-minimax optimal, non-asymptotic error estimates in the operator and Frobenius norms if λ is chosen proportional to the largest variance of the distribution. However, this quantity is not known a priori, and in practice λ needs to be carefully tuned to achieve good performance. In this work we resolve this problem by introducing a tuning-free variant of this estimator, which replaces λ by a data-driven quantity. We prove that this estimator satisfies the same non-asymptotic error estimates, up to small (logarithmic) losses and a slightly worse probability estimate. We also show that by using refined data-driven dithers that vary per entry of each sample, one can construct an estimator satisfying the same estimation error bound as the sample covariance of the samples before quantization, again up to logarithmic losses. Our proofs rely on a new version of the Burkholder-Rosenthal inequalities for matrix martingales, which is expected to be of independent interest.
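The fixed-λ estimator that this work makes tuning-free can be sketched in NumPy: each sample is quantized twice with independent uniform dithers on [−λ,λ], and λ² times the cross-correlation of the two sign patterns estimates the second moment. The choice λ = 6, the sample size, and all names are illustrative assumptions of ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_bit_cov(X, lam):
    """Dithered one-bit covariance sketch: E[lam * sign(x + tau)] = x for
    |x| <= lam when tau is uniform on [-lam, lam], so rescaled sign
    correlations recover the second-moment matrix."""
    n, d = X.shape
    t1 = rng.uniform(-lam, lam, size=(n, d))  # first independent dither
    t2 = rng.uniform(-lam, lam, size=(n, d))  # second independent dither
    S1 = np.sign(X + t1)                      # one bit per entry
    S2 = np.sign(X + t2)
    M = lam**2 * (S1.T @ S2) / n
    return (M + M.T) / 2                      # symmetrize

# zero-mean Gaussian samples with a known covariance
C = np.array([[1.0, 0.5], [0.5, 2.0]])
X = rng.multivariate_normal(np.zeros(2), C, size=500_000)
C_hat = one_bit_cov(X, lam=6.0)  # lam must dominate typical entry sizes
```

Tuning `lam` is exactly the issue the paper removes: too small a value clips the signal, too large a value inflates the variance of the sign correlations.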

MCML Authors
Link to Johannes Maly

Johannes Maly

Prof. Dr.

Associate

Mathematical Data Science and Artificial Intelligence


[670]
P. Gupta, M. Wever and E. Hüllermeier.
Information Leakage Detection through Approximate Bayes-optimal Prediction.
Preprint at arXiv (Jan. 2024). arXiv.
Abstract

In today’s data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches, which rely on estimating mutual information (MI) between observable and secret information to detect ILs, face challenges such as the curse of dimensionality, convergence issues, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detecting ILs are limited to binary sensitive system information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor’s log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method outperforms state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[669]
M. M. Mandl, A. S. Becker-Pennrich, L. C. Hinske, S. Hoffmann and A.-L. Boulesteix.
Addressing researcher degrees of freedom through minP adjustment.
Preprint at arXiv (Jan. 2024). arXiv.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[668]
Z. S. Dunias, B. Van Calster, D. Timmerman, A.-L. Boulesteix and M. van Smeden.
A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study.
Statistics in Medicine (Jan. 2024). DOI.
Abstract

Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.
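The procedure the study recommends — standard (non-repeated) 5-fold cross-validation that minimizes out-of-sample prediction error, without the 1SE rule — can be sketched in a few lines of NumPy for a ridge model. The synthetic data, penalty grid, and ridge-only setup are illustrative choices of ours, not the study's design:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic low-dimensional regression data (sizes are illustrative)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Standard k-fold CV estimate of out-of-sample mean squared error."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return np.mean(errs)

grid = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_error(X, y, lam) for lam in grid}
best_lam = min(scores, key=scores.get)  # minimize CV error; no 1SE rule
```

The 1SE variant the authors warn against would instead pick the largest penalty whose CV error lies within one standard error of the minimum, which over-shrinks and miscalibrates in low-dimensional settings.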

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[667]
M. Wünsch, C. Sauer, P. Callahan, L. C. Hinske and A.-L. Boulesteix.
From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis.
Wiley Interdisciplinary Reviews: Computational Statistics 16.1 (Jan. 2024). DOI.
Abstract

Gene set analysis (GSA), a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In the last years, a multitude of methods have been developed for this task. However, clear guidance is lacking: choosing the right method is the first hurdle a researcher is confronted with. No less challenging than overcoming this so-called method uncertainty is the procedure of preprocessing, from knowing which steps are required to selecting a corresponding approach from the plethora of valid options to create the accepted input object (data preprocessing uncertainty), with clear guidance again being scarce. Here, we provide a practical guide through all steps required to conduct GSA, beginning with a concise overview of a selection of established methods, including Gene Set Enrichment Analysis and Database for Annotation, Visualization, and Integrated Discovery (DAVID). We thereby lay a special focus on reviewing and explaining the necessary preprocessing steps for each method under consideration (e.g., the necessity of a transformation of the RNA sequencing data)—an essential aspect that is typically paid only limited attention to in both existing reviews and applications. To raise awareness of the spectrum of uncertainties, our review is accompanied by an extensive overview of the literature on valid approaches for each step and illustrative R code demonstrating the complex analysis pipelines. It ends with a discussion and recommendations to both users and developers to ensure that the results of GSA are, despite the above-mentioned uncertainties, replicable and transparent.

MCML Authors
Link to Christina Sauer (née Nießl)

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


2023


[666]
H. A. Gündüz, S. Giri, M. Binder, B. Bischl and M. Rezaei.
Uncertainty Quantification for Deep Learning Models Predicting the Regulatory Activity of DNA Sequences.
22nd IEEE International Conference on Machine Learning and Applications (ICMLA 2023). Jacksonville, Florida, USA, Dec 15-17, 2023. DOI.
Abstract

The field of computational biology has been enhanced by deep learning models, which hold great promise for revolutionizing domains such as protein folding and drug discovery. Recent studies have underscored the tremendous potential of these models, particularly in the realm of gene regulation and the more profound understanding of the non-coding regions of the genome. On the other hand, this raises significant concerns about the reliability and efficacy of such models, which have their own biases by design, along with those learned from the data. Uncertainty quantification allows us to measure where the system is confident and know when it can be trusted. In this paper, we study several uncertainty quantification methods with respect to a multi-target regression task, specifically predicting regulatory activity profiles using DNA sequence data. Using the Basenji model, we investigate how such methods can improve in-domain generalization, out-of-distribution detection, and provide coverage guarantees on prediction intervals.

MCML Authors
Link to Hüseyin Anil Gündüz

Hüseyin Anil Gündüz

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[665]
M. von Zahn, O. Hinz and S. Feuerriegel.
Locating disparities in machine learning.
IEEE International Conference on Big Data (IEEE BigData 2023). Sorrento, Italy, Dec 15-18, 2023. DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[664]
M. Singh, A. Fono and G. Kutyniok.
Expressivity of Spiking Neural Networks through the Spike Response Model.
1st Workshop on Unifying Representations in Neural Models (UniReps 2023) at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[663]
S. Chen, J. Gu, Z. Han, Y. Ma, P. Torr and V. Tresp.
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL. GitHub.
MCML Authors
Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[662]
D. Frauen, V. Melnychuk and S. Feuerriegel.
Sharp Bounds for Generalized Causal Sensitivity Analysis.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[661]
F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
SHAP-IQ: Unified Approximation of any-order Shapley Interactions.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[660]
M. Ghahremani and C. Wachinger.
RegBN: Batch Normalization of Multimodal Data with Regularization.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL. GitHub.
MCML Authors
Link to Morteza Ghahremani

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[659]
C. Kümmerle and J. Maly.
Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
Abstract

We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogeneous low-dimensional structures from linear observations. Focusing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods.
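The reweighting idea underlying IRLS is easiest to see in the classic vector-sparsity case (not the paper's simultaneously structured variant): each step solves a weighted least-squares problem whose weights penalize the previous iterate's small entries, with the smoothing parameter annealed toward zero. The dimensions, annealing schedule, and all names below are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def irls_sparse(A, b, iters=50, eps=1.0):
    """Classic IRLS for sparse recovery subject to Ax = b: minimize
    sum x_i^2 / (|x_i^prev| + eps), whose constrained minimizer has the
    closed form D A' (A D A')^-1 b with D = diag(|x^prev| + eps)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # least-norm start
    for _ in range(iters):
        D = np.diag(np.abs(x) + eps)              # inverse weights
        x = D @ A.T @ np.linalg.solve(A @ D @ A.T, b)
        eps = max(eps * 0.5, 1e-8)                # anneal the smoothing
    return x

# recover a 5-sparse vector in R^100 from 40 Gaussian measurements
n, m, s = 100, 40, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
b = A @ x_true
x_hat = irls_sparse(A, b)
```

The paper's algorithm replaces the single sparsity surrogate with a built-in balance of row-sparsity and rank surrogates on matrices; this sketch only shows the mechanics of one reweighted step.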

MCML Authors
Link to Johannes Maly

Johannes Maly

Prof. Dr.

Associate

Mathematical Data Science and Artificial Intelligence


[658]
S. Maskey, R. Paolino, A. Bacho and G. Kutyniok.
A Fractional Graph Laplacian Approach to Oversmoothing.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL. GitHub.
MCML Authors
Link to Raffaele Paolino

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[657]
V. Melnychuk, D. Frauen and S. Feuerriegel.
Partial Counterfactual Identification of Continuous Outcomes with a Curvature Sensitivity Model.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[656]
S. Šćepanović, I. Obadic, S. Joglekar, L. Giustarini, C. Nattero, D. Quercia and X. Zhu.
MedSat: A Public Health Dataset for England Featuring Medical Prescriptions and Satellite Imagery.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
Abstract

As extreme weather events become more frequent, understanding their impact on human health becomes increasingly crucial. However, the utilization of Earth Observation to effectively analyze the environmental context in relation to health remains limited. This limitation is primarily due to the lack of fine-grained spatial and temporal data in public and population health studies, hindering a comprehensive understanding of health outcomes. Additionally, obtaining appropriate environmental indices across different geographical levels and timeframes poses a challenge. For the years 2019 (pre-COVID) and 2020 (COVID), we collected spatio-temporal indicators for all Lower Layer Super Output Areas in England. These indicators included: i) 111 sociodemographic features linked to health in existing literature, ii) 43 environmental point features (e.g., greenery and air pollution levels), iii) 4 seasonal composite satellite images each with 11 bands, and iv) prescription prevalence associated with five medical conditions (depression, anxiety, diabetes, hypertension, and asthma), opioids and total prescriptions. We combined these indicators into a single MEDSAT dataset, the availability of which presents an opportunity for the machine learning community to develop new techniques specific to public health. These techniques would address challenges such as handling large and complex data volumes, performing effective feature engineering on environmental and sociodemographic factors, capturing spatial and temporal dependencies in the models, addressing imbalanced data distributions, developing novel computer vision methods for health modeling based on satellite imagery, ensuring model explainability, and achieving generalization beyond the specific geographical region.

MCML Authors
Link to Ivica Obadic

Ivica Obadic

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[655]
J. Schweisthal, D. Frauen, V. Melnychuk and S. Feuerriegel.
Reliable Off-Policy Learning for Dosage Combinations.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Jonas Schweisthal

Jonas Schweisthal

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[654]
N. Sturma, C. Squires, M. Drton and C. Uhler.
Unpaired Multi-Domain Causal Representation Learning.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[653]
G. Zhai, E. P. Örnek, S.-C. Wu, Y. Di, F. Tombari, N. Navab and B. Busam.
CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Guangyao Zhai

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Benjamin Busam

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[652]
S. Zhang, P. Wicke, L. K. Senel, L. Figueredo, A. Naceri, S. Haddadin, B. Plank and H. Schütze.
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation.
6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Shengqiang Zhang

Shengqiang Zhang

Statistical NLP and Deep Learning

Link to Philipp Wicke

Philipp Wicke

Dr.

Statistical NLP and Deep Learning

Link to Lütfi Kerem Şenel

Lütfi Kerem Şenel

Statistical NLP and Deep Learning

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[651]
X. Li, E. Nie and S. Liang.
From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL.
Workshop Instruction Tuning and Instruction Following at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Sheng Liang

Sheng Liang

Statistical NLP and Deep Learning


[650]
R. Liao, X. Jia, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
Workshop New Frontiers in Graph Learning (GLFrontiers 2023) at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[649]
Y. Zhang, Y. Li, H. Brown, M. Rezaei, B. Bischl, P. Torr, A. Khakzar and K. Kawaguchi.
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments.
Workshop XAI in Action: Past, Present, and Future Applications (XAIA 2023) at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
Abstract

Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.

MCML Authors
Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member


[648]
M. F. Azampour, Y. Velikova, E. Fatemizadeh, S. P. Dakua and N. Navab.
Self-supervised Probe Pose Regression via Optimized Ultrasound Representations for US-CT Fusion.
International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2023). Cambridge, UK, Dec 09-10, 2023. DOI. GitHub.
Abstract

Aligning 2D ultrasound images with 3D CT scans of the liver holds significant clinical value in enhancing diagnostic precision, surgical planning, and treatment delivery. Conventional approaches primarily rely on optimization techniques, which often have a limited capture range and are susceptible to initialization errors. To address these limitations, we define the problem as “probe pose regression” and leverage deep learning for a more robust and efficient solution for liver US-CT registration without access to paired data. The proposed method is a three-part framework that combines ultrasound rendering, generative model and pose regression. In the first stage, we exploit a differentiable ultrasound rendering model designed to synthesize ultrasound images given segmentation labels. We let the downstream task optimize the rendering parameters, enhancing the performance of the overall method. In the second stage, a generative model bridges the gap between real and rendered ultrasound images, enabling application on real B-mode images. Finally, we use a patient-specific pose regression network, trained self-supervised with only synthetic images and their known poses. We use ultrasound, and CT scans from a dual-modality human abdomen phantom to validate the proposed method.
Our experimental results indicate that the proposed method can estimate probe poses within an acceptable error margin, which can later be fine-tuned using conventional methods. This capability confirms that the proposed framework can serve as a reliable initialization step for US-CT fusion and achieve fully automated US-CT fusion when coupled with conventional methods.

MCML Authors
Link to Mohammad Farid Azampour

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Yordanka Velikova

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[647]
X. Li, E. Nie and S. Liang.
Crosslingual Retrieval Augmented In-context Learning for Bangla.
1st Workshop on Bangla Language Processing (BLP-2023). Singapore, Dec 07, 2023. DOI.
MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Sheng Liang

Sheng Liang

Statistical NLP and Deep Learning


[646]
V. Hangya, S. Severini, R. Ralev, A. Fraser and H. Schütze.
Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages.
3rd Workshop on Multi-lingual Representation Learning (MRL 2023) at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Very low-resource languages, having only a few million tokens worth of data, are not well-supported by multilingual NLP approaches due to poor quality cross-lingual word representations. Recent work showed that good crosslingual performance can be achieved if a source language is related to the low-resource target language. However, not all language pairs are related. In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach, that incorporates intermediate related languages to bridge the gap between the distant source and target. We build MWEs one language at a time by starting from the resource rich source and sequentially adding each language in the chain till we reach the target. We extend a semi-joint bilingual approach to multiple languages in order to eliminate the main weakness of previous works, i.e., independently trained monolingual embeddings, by anchoring the target language around the multilingual space. We evaluate our method on bilingual lexicon induction for 4 language families, involving 4 very low-resource (≤ 5M tokens) and 4 moderately low-resource (≤ 50M) target languages, showing improved performance in both categories. Additionally, our analysis reveals the importance of good quality embeddings for intermediate languages as well as the importance of leveraging anchor points from all languages in the multilingual space.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[645]
Z. Zhang, H. Yang, B. Ma, D. Rügamer and E. Nie.
Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models.
BabyLM Challenge at 27th Conference on Computational Natural Language Learning (CoNLL 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning


[644]
M. Di Marco, K. Hämmerl and A. Fraser.
A Study on Accessing Linguistic Information in Pre-Trained Language Models by Using Prompts.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

We study whether linguistic information in pre-trained multilingual language models can be accessed by human language: So far, there is no easy method to directly obtain linguistic information and gain insights into the linguistic principles encoded in such models. We use the technique of prompting and formulate linguistic tasks to test the LM’s access to explicit grammatical principles and study how effective this method is at providing access to linguistic features. Our experiments on German, Icelandic and Spanish show that some linguistic properties can in fact be accessed through prompting, whereas others are harder to capture.

MCML Authors
Link to Katharina Hämmerl

Katharina Hämmerl

Data Analytics & Statistics

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[643]
E. Garces Arias, V. Pai, M. Schöffel, C. Heumann and M. Aßenmacher.
Automatic transcription of handwritten Old Occitan language.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

While existing neural network-based approaches have shown promising results in Handwritten Text Recognition (HTR) for high-resource languages and standardized/machine-written text, their application to low-resource languages often presents challenges, resulting in reduced effectiveness. In this paper, we propose an innovative HTR approach that leverages the Transformer architecture for recognizing handwritten Old Occitan language. Given the limited availability of data, which comprises only word pairs of graphical variants and lemmas, we develop and rely on elaborate data augmentation techniques for both text and image data. Our model combines a custom-trained Swin image encoder with a BERT text decoder, which we pre-train using a large-scale augmented synthetic data set and fine-tune on the small human-labeled data set. Experimental results reveal that our approach surpasses the performance of current state-of-the-art models for Old Occitan HTR, including open-source Transformer-based models such as a fine-tuned TrOCR and commercial applications like Google Cloud Vision. To nurture further research and development, we make our models, data sets, and code publicly available.

MCML Authors
Link to Esteban Garces Arias

Esteban Garces Arias

Statistical Learning & Data Science

Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[642]
M. Giulianelli, J. Baan, W. Aziz, R. Fernández and B. Plank.
What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of output strings shaped by a generation system’s predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator’s calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model’s representation of uncertainty.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[641]
N. Kassner, O. Tafjord, A. Sabharwal, K. Richardson, H. Schütze and P. Clark.
Language Models with Rationality.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent ‘beliefs’. This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs. Our approach, which we call REFLEX, is to add a rational, self-reflecting layer on top of the LLM. First, given a question, we construct a belief graph using a backward-chaining process to materialize relevant model beliefs (including beliefs about answer candidates) and their inferential relationships. Second, we identify and minimize contradictions in that graph using a formal constraint reasoner. We find that REFLEX significantly improves consistency (by 8%-11% absolute) without harming overall answer accuracy, resulting in answers supported by faithful chains of reasoning drawn from a more consistent belief system. This suggests a new style of system architecture in which an LLM extended with a rational layer can provide an interpretable window into system beliefs, add a systematic reasoning capability, and repair latent inconsistencies present in the LLM.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[640]
R. Litschko, M. Müller-Eberstein, R. van der Goot, L. Weber-Genzel and B. Plank.
Establishing Trustworthiness: Rethinking Tasks and Model Evaluation.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades. Traditionally, facets of linguistic intelligence have been compartmentalized into tasks with specialized model architectures and corresponding evaluation protocols. With the advent of large language models (LLMs) the community has witnessed a dramatic shift towards general purpose, task-agnostic approaches powered by generative models. As a consequence, the traditional compartmentalized notion of language tasks is breaking down, followed by an increasing challenge for evaluation and analysis. At the same time, LLMs are being deployed in more real-world scenarios, including previously unforeseen zero-shot setups, increasing the need for trustworthy and reliable systems. Therefore, we argue that it is time to rethink what constitutes tasks and model evaluation in NLP, and pursue a more holistic view on language, placing trustworthiness at the center. Towards this goal, we review existing compartmentalized approaches for understanding the origins of a model’s functional capacity, and provide recommendations for more multi-faceted evaluation protocols.

MCML Authors
Link to Robert Litschko

Robert Litschko

Artificial Intelligence and Computational Linguistics

Link to Leon Weber-Genzel

Leon Weber-Genzel

Dr.

* Former member

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[639]
M. Wang, H. Adel, L. Lange, J. Strötgen and H. Schütze.
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Most languages of the world pose low-resource challenges to natural language processing models. With multilingual training, knowledge can be shared among languages. However, not all languages positively influence each other and it is an open research question how to select the most suitable set of languages for multilingual training and avoid negative interference among languages whose characteristics or data distributions are not compatible. In this paper, we propose GradSim, a language grouping method based on gradient similarity. Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains compared to other similarity measures and it is better correlated with cross-lingual model performance. As a result, we set the new state of the art on AfriSenti, a benchmark dataset for sentiment analysis on low-resource African languages. In our extensive analysis, we further reveal that besides linguistic features, the topics of the datasets play an important role for language grouping and that lower layers of transformer models encode language-specific features while higher layers capture task-specific information.

MCML Authors
Link to Mingyang Wang

Mingyang Wang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[638]
X. Wang and B. Plank.
ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Label aggregation such as majority voting is commonly used to resolve annotator disagreement in dataset creation. However, this may disregard minority values and opinions. Recent studies indicate that learning from individual annotations outperforms learning from aggregated labels, though they require a considerable amount of annotation. Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement. We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation. By designing and evaluating acquisition functions with annotator-specific heads on two datasets, we show that group-level entropy works generally well on both datasets. Importantly, it achieves performance in terms of both prediction and uncertainty estimation comparable to full-scale training from disagreement, while saving 70% of the annotation budget.

MCML Authors
Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[637]
L. Weissweiler, V. Hofmann, A. Kantharuban, A. Cai, R. Dutt, A. Hengle, A. Kabra, A. Kulkarni, A. Vijayakumar, H. Yu, H. Schütze, K. Oflazer and D. Mortensen.
Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko’s (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results—through the lens of morphology—cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.

MCML Authors
Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[636]
S. Xu, S. T.y.s.s, O. Ichim, I. Risini, B. Plank and M. Grabmair.
From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well-known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset RaVE: Rationale Variation in ECHR, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. To our knowledge, this is the first work in the legal NLP that focuses on human label variation. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. We further assess the explainability of state-of-the-art COC models on RaVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case’s facts supposedly relevant for its outcome.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[635]
A. H. Kargaran, A. Imani, F. Yvon and H. Schütze.
GlotLID: Language Identification for Low-Resource Languages.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
Abstract

Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable and (iii) efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguage vs varieties and in general noisy data. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures.

MCML Authors
Link to Amir Hossein Kargaran

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[634]
A. Köksal, T. Schick and H. Schütze.
MEAL: Stable and Active Learning for Few-Shot Prompting.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
Abstract

Few-shot classification has made great strides due to foundation models that, through priming and prompting, are highly effective few-shot learners. However, this approach has high variance both across different sets of few shots (data selection) and across different finetuning runs (run variability). This is problematic not only because it impedes the fair comparison of different approaches, but especially because it makes few-shot learning too unreliable for many real-world applications. To alleviate these issues, we make two contributions for more stable and effective few-shot learning: First, we propose novel ensembling methods and show that they substantially reduce run variability. Second, we introduce a new active learning (AL) criterion for data selection and present the first AL-based approach specifically tailored towards prompt-based learning. In our experiments, we show that our combined method, MEAL (Multiprompt finetuning and prediction Ensembling with Active Learning), improves overall performance of prompt-based finetuning by 2.3 points on five diverse tasks.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[633]
A. Köksal, O. Yalcin, A. Akbiyik, M. T. Kilavuz, A. Korhonen and H. Schütze.
Language-Agnostic Bias Detection in Language Models with Bias Probing.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
Abstract

Pretrained language models (PLMs) are key components in NLP, but they contain strong social biases. Quantifying these biases is challenging because current methods focusing on fill-the-mask objectives are sensitive to slight changes in input. To address this, we propose a bias probing technique called LABDet, for evaluating social bias in PLMs with a robust and language-agnostic method. For nationality as a case study, we show that LABDet “surfaces” nationality bias by training a classifier on top of a frozen PLM on non-nationality sentiment detection. We find consistent patterns of nationality bias across monolingual PLMs in six languages that align with historical and political context. We also show for English BERT that bias surfaced by LABDet correlates well with bias in the pretraining data; thus, our work is one of the few studies that directly links pretraining data to PLM behavior. Finally, we verify LABDet’s reliability and applicability to different templates and languages through an extensive set of robustness checks.

MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[632]
W. Lai, A. Chronopoulou and A. Fraser.
Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Despite advances in multilingual neural machine translation (MNMT), we argue that there are still two major challenges in this area: data imbalance and representation degeneration. The data imbalance problem refers to the imbalance in the amount of parallel corpora for all language pairs, especially for long-tail languages (i.e., very low-resource languages). The representation degeneration problem refers to the problem of encoded tokens tending to appear only in a small subspace of the full space available to the MNMT model. To solve these two issues, we propose Bi-ACL, a framework which only requires target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model. We define two modules, named bidirectional autoencoder and bidirectional contrastive learning, which we combine with an online constrained beam search and a curriculum learning sampling strategy. Extensive experiments show that our proposed method is more effective than strong baselines both in long-tail languages and in high-resource languages. We also demonstrate that our approach is capable of transferring knowledge between domains and languages in zero-shot scenarios.

MCML Authors
Link to Alexandra Chronopoulou

Alexandra Chronopoulou

Dr.

* Former member

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[631]
Y. Liu, H. Ye, L. Weissweiler, R. Pei and H. Schütze.
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

In comparative linguistics, colexification refers to the phenomenon of a lexical form conveying two or more distinct meanings. Existing work on colexification patterns relies on annotated word lists, limiting scalability and usefulness in NLP. In contrast, we identify colexification patterns of more than 2,000 concepts across 1,335 languages directly from an unannotated parallel corpus. We then propose simple and effective methods to build multilingual graphs from the colexification patterns: ColexNet and ColexNet+. ColexNet’s nodes are concepts and its edges are colexifications. In ColexNet+, concept nodes are additionally linked through intermediate nodes, each representing an ngram in one of 1,334 languages. We use ColexNet+ to train high-quality multilingual embeddings that are well-suited for transfer learning. In our experiments, we first show that ColexNet achieves high recall on CLICS, a dataset of crosslingual colexifications. We then evaluate our embeddings on roundtrip translation, sentence retrieval and sentence classification and show that they surpass several transfer learning baselines. This demonstrates the benefits of using colexification as a source of information in multilingual NLP.

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[630]
M. Müller-Eberstein, R. van der Goot, B. Plank and I. Titov.
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP); however, there has been limited understanding of how and when various types of linguistic information emerge and interact during training. Leveraging a novel information-theoretic probing suite, which enables direct comparisons not only of task performance but also of representational subspaces, we analyze nine tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Across these phases, syntactic knowledge is acquired rapidly after 0.5% of full training. Continued performance improvements primarily stem from the acquisition of open-domain knowledge, while semantics and reasoning tasks benefit from later boosts to long-range contextualization and higher specialization. Measuring cross-task similarity further reveals that linguistically related tasks share information throughout training, and do so more during the critical phase of learning than before or after. Our findings have implications for model interpretability, multi-task learning, and learning from limited data.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[629]
E. Nie, H. Schmid and H. Schütze.
Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
Abstract

Pretrained multilingual encoder models can directly perform zero-shot multilingual tasks or linguistic probing by reformulating the input examples into cloze-style prompts. This is accomplished by predicting the probabilities of the label words at the masked token position, without requiring any updates to the model parameters. However, the performance of this method is limited by the model’s bias toward predicting label words that occurred frequently during pretraining; these words typically receive high probabilities. To address this issue, we combine the models with calibration techniques that modify the probabilities of label words predicted by the models. We first validate the effectiveness of a proposed simple calibration method together with other existing techniques on monolingual encoders in both zero- and few-shot scenarios. We subsequently employ these calibration techniques on multilingual encoders, resulting in substantial performance improvements across a wide range of tasks.

MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[628]
L. Haliburton, B. Rossmy, A. Schmidt and C. George.
An Exploration of Hidden Data: Identifying and Physicalizing Personal Virtual Data to Extend Co-located Communication.
22nd International Conference on Mobile and Ubiquitous Multimedia (MUM 2023). Vienna, Austria, Dec 03-06, 2023. DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[627]
J. Rausch, G. Rashiti, M. Gusev, C. Zhang and S. Feuerriegel.
DSG: An End-to-End Document Structure Generator.
23rd IEEE International Conference on Data Mining (ICDM 2023). Shanghai, China, Dec 01-04, 2023. DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[626]
F. Karl, T. Pielok, J. Moosbauer, F. Pfisterer, S. Coors, M. Binder, L. Schneider, J. Thomas, J. Richter, M. Lang, E. C. Garrido-Merchán, J. Branke and B. Bischl.
Multi-Objective Hyperparameter Optimization in Machine Learning—An Overview.
ACM Transactions on Evolutionary Learning and Optimization 3.4 (Dec. 2023). DOI.
MCML Authors
Link to Florian Karl

Florian Karl

Statistical Learning & Data Science

Link to Tobias Pielok

Tobias Pielok

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[625]
D. Geissler, D. Bär, N. Pröllochs and S. Feuerriegel.
Russian propaganda on social media during the 2022 invasion of Ukraine.
EPJ Data Science (Dec. 2023). DOI.
MCML Authors
Link to Dominique Geißler

Dominique Geißler

Artificial Intelligence in Management

Link to Dominik Bär

Dominik Bär

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[624]
C. Leiber, L. Miklautz, C. Plant and C. Böhm.
Benchmarking Deep Clustering Algorithms With ClustPy.
IEEE International Conference on Data Mining Workshops (ICDMW 2023). Shanghai, China, Dec 01-04, 2023. DOI. GitHub.
MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[623]
F. Xu, Y. Shi, P. Ebel, W. Yang and X. Zhu.
Multimodal and Multiresolution Data Fusion for High-Resolution Cloud Removal: A Novel Baseline and Benchmark.
IEEE Transactions on Geoscience and Remote Sensing 62 (Dec. 2023). DOI. GitHub.
Abstract

Cloud removal (CR) is a significant and challenging problem in remote sensing, and in recent years, there have been notable advancements in this area. However, two major issues still hinder the development of CR: the unavailability of high-resolution imagery for existing datasets and the absence of evaluation regarding the semantic meaningfulness of the generated structures. In this article, we introduce M3R-CR, a benchmark dataset for high-resolution CR with multimodal and multiresolution data fusion. M3R-CR is the first public dataset for CR to feature globally sampled high-resolution optical observations, paired with radar measurements and pixel-level land-cover annotations. With this dataset, we consider the problem of CR in high-resolution optical remote-sensing imagery by integrating multimodal and multiresolution information. In this context, we have to take into account the alignment errors caused by the multiresolution nature, along with the more pronounced misalignment issues in high-resolution images due to inherent imaging mechanism differences and other factors. Existing multimodal data fusion-based methods, which assume the image pairs are aligned accurately at the pixel level, are thus not appropriate for this problem. To this end, we design a new baseline named Align-CR to perform the low-resolution synthetic aperture radar (SAR) image-guided high-resolution optical image CR. It gradually warps and fuses the features of the multimodal and multiresolution data during the reconstruction process, effectively mitigating concerns associated with misalignment. In the experiments, we evaluate the performance of CR by analyzing the quality of visually pleasing textures using image reconstruction (IR) metrics and further analyze the generation of semantically meaningful structures using a well-established semantic segmentation task. The proposed Align-CR method is superior to other baseline methods in both areas.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[622]
H. Boche, A. Fono and G. Kutyniok.
Limitations of Deep Learning for Inverse Problems on Digital Hardware.
IEEE Transactions on Information Theory 69.12 (Dec. 2023). DOI.
MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[621]
A. T. Stüber, S. Coors, B. Schachtner, T. Weber, D. Rügamer, A. Bender, A. Mittermeier, O. Öcal, M. Seidensticker, J. Ricke, B. Bischl and M. Ingrisch.
A comprehensive machine learning benchmark study for radiomics-based survival analysis of CT imaging data in patients with hepatic metastases of CRC.
Investigative Radiology 58.12 (Dec. 2023). DOI.
MCML Authors
Link to Theresa Stüber

Theresa Stüber

Clinical Data Science in Radiology

Link to Balthasar Schachtner

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Andreas Mittermeier

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[620]
D. Strieder and M. Drton.
Confidence in causal inference under structure uncertainty in linear causal models with equal variances.
Journal of Causal Inference 11.1 (Dec. 2023). DOI.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[619]
F. Brechtmann, T. Bechtler, S. Londhe, C. Mertes and J. Gagneur.
Evaluation of input data modality choices on functional gene embeddings.
NAR Genomics and Bioinformatics 5.4 (Dec. 2023). DOI.
Abstract

Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein–protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype–gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein–protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.

MCML Authors
Link to Julien Gagneur

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[618]
T. Kaufmann, P. Weng, V. Bengs and E. Hüllermeier.
A Survey of Reinforcement Learning from Human Feedback.
Preprint at arXiv (Dec. 2023). arXiv.
Abstract

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model’s capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

MCML Authors
Link to Timo Kaufmann

Timo Kaufmann

Artificial Intelligence & Machine Learning

Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[617]
Y. Sale, P. Hofman, L. Wimmer, E. Hüllermeier and T. Nagler.
Second-Order Uncertainty Quantification: Variance-Based Measures.
Preprint at arXiv (Dec. 2023). arXiv.
MCML Authors
Link to Paul Hofman

Paul Hofman

Artificial Intelligence & Machine Learning

Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science


[616]
C. A. Scholbeck, J. Moosbauer, G. Casalicchio, H. Gupta, B. Bischl and C. Heumann.
Position Paper: Bridging the Gap Between Machine Learning and Sensitivity Analysis.
Preprint at arXiv (Dec. 2023). arXiv.
MCML Authors
Link to Christian Scholbeck

Christian Scholbeck

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[615]
G. Zhang, J. Bi, J. Gu, Y. Chen and V. Tresp.
SPOT! Revisiting Video-Language Models for Event Understanding.
Preprint at arXiv (Dec. 2023). arXiv.
Abstract

Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only broad-level video captions. This raises a question: with such weak supervision, can video representation in video-language models gain the ability to distinguish even factual discrepancies in textual description and understand fine-grained events? To address this, we introduce SPOT Prober to benchmark existing video-language models’ capacity to distinguish event-level discrepancies as an indicator of models’ event understanding ability. Our approach involves extracting events as tuples (<Subject, Predicate, Object, Attribute, Timestamps>) from videos and generating false event tuples by manipulating tuple components systematically. We reevaluate the existing video-language models with these positive and negative captions and find they fail to distinguish most of the manipulated events. Based on our findings, we propose to plug in these manipulated event captions as hard negative samples and find them effective in enhancing models for event understanding.

MCML Authors
Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[614]
Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
FORECASTTKGQUESTIONS: A Benchmark for Temporal Question Answering and Forecasting over Temporal Knowledge Graphs.
22nd International Semantic Web Conference (ISWC 2023). Athens, Greece, Nov 06-11, 2023. DOI.
Abstract

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. Previous related works aim to develop QA systems that answer temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning this period can be fully used for inference. In real-world scenarios, however, it is common that given knowledge until the current instance, we wish the TKGQA systems to answer questions about the future. As humans constantly plan the future, building forecasting TKGQA systems is important. In this paper, we propose a novel task, forecasting TKGQA, together with a coupled large-scale TKGQA benchmark dataset, FORECASTTKGQUESTIONS. It includes three types of forecasting questions, i.e., entity prediction, yes-unknown, and fact reasoning questions. For every question, a timestamp is annotated and QA models only have access to TKG information prior to it for answer inference. We find that previous TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-unknown and fact reasoning questions. To this end, we propose FORECASTTKGQA, a TKGQA model that employs a TKG forecasting module for future inference. Experiments show that it performs well in forecasting TKGQA.

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Zongyue Li

Zongyue Li

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[613]
D. Rügamer, F. Pfisterer, B. Bischl and B. Grün.
Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods.
Advances in Statistical Analysis (Nov. 2023). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[612]
B. H. Lang, S. Nyholm and J. Blumenthal-Barby.
Responsibility Gaps and Black Box Healthcare Ai: Shared Responsibilization as a Solution.
Digital Society 2.52 (Nov. 2023). DOI.
MCML Authors
Link to Sven Nyholm

Sven Nyholm

Prof. Dr.

Ethics of Artificial Intelligence


[611]
L. Bothmann, L. Wimmer, O. Charrakh, T. Weber, H. Edelhoff, W. Peters, H. Nguyen, C. Benjamin and A. Menzel.
Automated wildlife image classification: An active learning tool for ecological applications.
Ecological Informatics 77 (Nov. 2023). DOI.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science


[610]
C. Wachinger, T. N. Wolf and S. Pölsterl.
Deep learning for the prediction of type 2 diabetes mellitus from neck-to-knee Dixon MRI in the UK biobank.
Heliyon 9.11 (Nov. 2023). DOI.
MCML Authors
Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology

Link to Tom Nuno Wolf

Tom Nuno Wolf

Artificial Intelligence in Radiology


[609]
S. Feuerriegel, R. DiResta, J. A. Goldstein, S. Kumar, P. Lorenz-Spreen, M. Tomz and N. Pröllochs.
Research can help to tackle AI-generated disinformation.
Nature Human Behaviour 7 (Nov. 2023). DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[608]
A. Köksal, R. Aksitov and C.-C. Chang.
Hallucination Augmented Recitations for Language Models.
Preprint at arXiv (Nov. 2023). arXiv.
MCML Authors
Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning


[607]
W. Lai, V. Hangya and A. Fraser.
Extending Multilingual Machine Translation through Imitation Learning.
Preprint at arXiv (Nov. 2023). arXiv.
Abstract

Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world’s languages are still being left behind. We aim to extend large-scale MNMT models to a new language, allowing for translation between the newly added and all of the already supported languages in a challenging scenario: using only a parallel corpus between the new language and English. Previous approaches, such as continued training on parallel data including the new language, suffer from catastrophic forgetting (i.e., performance on other languages is reduced). Our novel approach Imit-MNMT treats the task as an imitation learning process, which mimics the behavior of an expert, a technique widely used in the computer vision area, but not well explored in NLP. More specifically, we construct a pseudo multi-parallel corpus of the new and the original languages by pivoting through English, and imitate the output distribution of the original MNMT model. Extensive experiments show that our approach significantly improves the translation performance between the new and the original languages, without severe catastrophic forgetting. We also demonstrate that our approach is capable of solving copy and off-target problems, which are two common issues in current large-scale MNMT models.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[606]
A. Scagliotti and S. Farinelli.
Normalizing flows as approximations of optimal transport maps via linear-control neural ODEs.
Preprint at arXiv (Nov. 2023). arXiv.
MCML Authors
Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[605]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Unreading Race: Purging Protected Features from Chest X-ray Embeddings.
Under review. Preprint at arXiv (Nov. 2023). arXiv.
MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[604]
A. Maldonado, L. Zellner, S. Strickroth and T. Seidl.
Process Mining Techniques for Collusion Detection in Online Exams.
2nd International Workshop 'Education meets Process Mining' (EduPM 2023) co-located with the 5th International Conference on Process Mining (ICPM 2023). Rome, Italy, Oct 23-27, 2023. DOI.
MCML Authors
Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[603]
A. Maldonado, G. M. Tavares, R. Oyamada, P. Ceravolo and T. Seidl.
FEEED: Feature Extraction from Event Data.
Doctoral Consortium at the 5th International Conference on Process Mining (ICPM 2023). Rome, Italy, Oct 23-27, 2023. PDF.
Abstract

The analysis of event data is largely influenced by the effective characterization of descriptors. These descriptors serve as the building blocks of our understanding, encapsulating the behavior described within the event data. In light of these considerations, we introduce FEEED (Feature Extraction from Event Data), an extendable tool for event data feature extraction. FEEED represents a significant advancement in event data behavior analysis, offering a range of features to empower analysts and data scientists in their pursuit of insightful, actionable, and understandable event data analysis. What sets FEEED apart is its unique capacity to act as a bridge between the worlds of data mining and process mining. In doing so, it promises to enhance the accuracy, comprehensiveness, and utility of characterizing event data for a diverse range of applications.

MCML Authors
Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining

Link to Gabriel Marques Tavares

Gabriel Marques Tavares

Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[602]
C. Leiber, L. Miklautz, C. Plant and C. Böhm.
Application of Deep Clustering Algorithms.
32nd ACM International Conference on Information and Knowledge Management (CIKM 2023). Birmingham, UK, Oct 21-25, 2023. DOI.
MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[601]
Y. Xin, X. Zuo, D. Lu and S. Leutenegger.
SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo.
IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2023). Sydney, Australia, Oct 16-20, 2023. DOI.
MCML Authors
Link to Xingxing Zuo

Xingxing Zuo

Dr.

Machine Learning for Robotics

Link to Stefan Leutenegger

Stefan Leutenegger

Prof. Dr.

Machine Learning for Robotics


[600]
L. Miklautz, A. Shkabrii, C. Leiber, B. Tobias, B. Seidl, E. Weissensteiner, A. Rausch, C. Böhm and C. Plant.
Non-Redundant Image Clustering of Early Medieval Glass Beads.
10th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2023). Thessaloniki, Greece, Oct 09-13, 2023. DOI.
MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[599]
J. Hanselle, J. Fürnkranz and E. Hüllermeier.
Probabilistic Scoring Lists for Interpretable Machine Learning.
26th International Conference on Discovery Science (DS 2023). Porto, Portugal, Oct 09-11, 2023. DOI.
MCML Authors
Link to Jonas Hanselle

Jonas Hanselle

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[598]
J. Brandt, E. Schede, S. Sharma, V. Bengs, E. Hüllermeier and K. Tierney.
Contextual Preselection Methods in Pool-based Realtime Algorithm Configuration.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2023). Marburg, Germany, Oct 09-11, 2023. PDF.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[597]
J. Hanselle, J. Kornowicz, S. Heid, K. Thommes and E. Hüllermeier.
Comparing Humans and Algorithms in Feature Ranking: A Case-Study in the Medical Domain.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2023). Marburg, Germany, Oct 09-11, 2023. PDF.
MCML Authors
Link to Jonas Hanselle

Jonas Hanselle

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[596]
N. Stolt-Ansó, J. McGinnis, J. Pan, K. Hammernik and D. Rückert.
NISF: Neural implicit segmentation functions.
26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). Vancouver, Canada, Oct 08-12, 2023. DOI.
Abstract

Segmentation of anatomical shapes from medical images has taken an important role in the automation of clinical measurements. While typical deep-learning segmentation approaches are performed on discrete voxels, the underlying objects being analysed exist in a real-valued continuous space. Approaches that rely on convolutional neural networks (CNNs) are limited to grid-like inputs and not easily applicable to sparse or partial measurements. We propose a novel family of image segmentation models that tackle many of CNNs’ shortcomings: Neural Implicit Segmentation Functions (NISF). Our framework takes inspiration from the field of neural implicit functions where a network learns a mapping from a real-valued coordinate-space to a shape representation. NISFs have the ability to segment anatomical shapes in high-dimensional continuous spaces. Training is not limited to voxelized grids, and covers applications with sparse and partial data. Interpolation between observations is learnt naturally in the training procedure and requires no post-processing. Furthermore, NISFs allow the leveraging of learnt shape priors to make predictions for regions outside of the original image plane. We go on to show that the framework achieves high Dice scores on a (3D+t) short-axis cardiac segmentation task using the UK Biobank dataset. We also provide a qualitative analysis of our framework’s ability to perform segmentation and image interpolation on unseen regions of an image volume at arbitrary resolutions.

MCML Authors
Link to Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[595]
Y. Yeganeh, A. Farshad and N. Navab.
Anatomy-Aware Masking for Inpainting in Medical Imaging.
3rd Workshop on Shape in Medical Imaging (ShapeMI 2023) at the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). Vancouver, Canada, Oct 08-12, 2023. DOI.
MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[594]
L. Haliburton, S. Kheirinejad, A. Schmidt and S. Mayer.
Exploring Smart Standing Desks to Foster a Healthier Workplace.
ACM Conference on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT 2023). Cancun, Mexico, Oct 08-12, 2023. DOI.
MCML Authors
Link to Luke Haliburton

Luke Haliburton

Dr.

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[593]
M. Zaiss, H. N. Dang, V. Golkov, J. R. Rajput, D. Cremers, F. Knoll and A. Maier.
GPT4MR: Exploring GPT-4 as an MR Sequence and Reconstruction Programming Assistant.
39th Annual Meeting of the European Society for Magnetic Resonance in Medicine and Biology (ESMRMB 2023). Basel, Switzerland, Oct 04-07, 2023. URL.
Abstract

In this study, we explore the potential of generative pre-trained transformer (GPT) models as coding assistants for MRI sequence programming using the Pulseq framework. The programming of MRI sequences is traditionally a complex and time-consuming task, and the Pulseq standard has recently simplified this process. It allows researchers to define and generate complex pulse sequences used in MRI experiments. Leveraging GPT-4’s capabilities in natural language generation, we adapted it for MRI sequence programming, creating a specialized assistant named GPT4MR. Our tests involved generating various MRI sequences, revealing that GPT-4, guided by a tailored prompt, outperformed GPT-3.5, producing fewer errors and demonstrating improved reasoning. Despite limitations in handling complex sequences, GPT4MR corrected its own errors and successfully generated code with step-by-step instructions. The study showcases GPT4MR’s ability to accelerate MRI sequence development, even for novel ideas absent in its training set. While further research and improvement are needed to address complexity limitations, a well-designed prompt enhances performance. The findings propose GPT4MR as a valuable MRI sequence programming assistant, streamlining prototyping and development. The future prospect involves integrating a PyPulseq plugin into lightweight, open-source LLMs, potentially revolutionizing MRI sequence development and prototyping.

MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[592]
M. Bernhard, N. Strauß and M. Schubert.
MapFormer: Boosting Change Detection by Using Pre-change Information.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[591]
H. Chen, A. Frikha, D. Krompass, J. Gu and V. Tresp.
FRAug: Tackling Federated Learning with Non-IID Features via Representation Augmentation.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[590]
M. B. Colomer, P. L. Dovesi, T. Panagiotakopoulos, J. F. Carvalho, L. Härenstam-Nielsen, H. Azizpour, H. Kjellström, D. Cremers and M. Poggi.
To adapt or not to adapt? Real-time adaptation for semantic segmentation.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[589]
M. Gao, P. Roetzer, M. Eisenberger, Z. Lähner, M. Moeller, D. Cremers and F. Bernard.
ΣIGMA: Scale-Invariant Global Sparse Shape Matching.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[588]
H. Li, J. Gu, R. Koner, S. Sharifzadeh and V. Tresp.
Do DALL-E and Flamingo Understand Each Other?.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Hang Li

Hang Li

Database Systems & Data Mining

Link to Rajat Koner

Rajat Koner

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[587]
H. Li, J. Dong, B. Wen, M. Gao, T. Huang, Y.-H. Liu and D. Cremers.
DDIT: Semantic Scene Completion via Deformable Deep Implicit Templates.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Haoang Li

Haoang Li

Dr.

* Former member

Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[586]
Y. Xia, M. Gladkova, R. Wang, Q. Li, U. Stilla, J. F. Henriques and D. Cremers.
CASSPR: Cross Attention Single Scan Place Recognition.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Mariia Gladkova

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[585]
Y. Yeganeh, A. Farshad, P. Weinberger, S.-A. Ahmadi, E. Adeli and N. Navab.
Transformers Pay Attention to Convolutions Leveraging Emerging Properties of ViTs by Dual Attention-Image Network.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[584]
G. Zhang, J. Ren, J. Gu and V. Tresp.
Multi-event Video-Text Retrieval.
IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI. GitHub.
MCML Authors
Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[583]
A. Farshad, Y. Yeganeh, Y. Chi, C. Shen, B. Ommer and N. Navab.
SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis.
Workshops at the IEEE/CVF International Conference on Computer Vision (ICCV 2023). Paris, France, Oct 02-06, 2023. DOI.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[582]
A. Scagliotti and P. Colli Franzone.
A subgradient method with constant step-size for ℓ1-composite optimization.
Bollettino dell'Unione Matematica Italiana (Oct. 2023). DOI.
MCML Authors
Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[581]
K. Riedl.
Leveraging Memory Effects and Gradient Information in Consensus-Based Optimisation: On Global Convergence in Mean-Field Law.
European Journal of Applied Mathematics (Oct. 2023). DOI.
MCML Authors
Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis


[580]
L. Weissweiler, V. Hofmann, A. Köksal and H. Schütze.
Explaining pretrained language models' understanding of linguistic structures using construction grammar.
Frontiers in Artificial Intelligence 6 (Oct. 2023). DOI.
Abstract

Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasizing the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step toward assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models’ behavior in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs, as well as OPT, are able to recognize the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.

MCML Authors
Link to Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[579]
J. Pan, C. Zhou, M. Gladkova, Q. Khan and D. Cremers.
Robust Autonomous Vehicle Pursuit without Expert Steering Labels.
IEEE Robotics and Automation Letters 8.10 (Oct. 2023). DOI.
MCML Authors
Link to Mariia Gladkova

Mariia Gladkova

Computer Vision & Artificial Intelligence

Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[578]
T. Beker, H. Ansari, S. Montazeri, Q. Song and X. Zhu.
Deep Learning for Subtle Volcanic Deformation Detection With InSAR Data in Central Volcanic Zone.
IEEE Transactions on Geoscience and Remote Sensing 61 (Oct. 2023). DOI.
Abstract

Subtle volcanic deformations point to volcanic activities, and monitoring them helps predict eruptions. Today, it is possible to remotely detect volcanic deformation in mm/year scale thanks to advances in interferometric synthetic aperture radar (InSAR). This article proposes a framework based on a deep learning model to automatically discriminate subtle volcanic deformations from other deformation types in five-year-long InSAR stacks. Models are trained on a synthetic training set. To better understand and improve the models, explainable artificial intelligence (AI) analyses are performed. In initial models, Gradient-weighted Class Activation Mapping (Grad-CAM) linked new-found patterns of slope processes and salt lake deformations to false-positive detections. The models are then improved by fine-tuning (FT) with a hybrid synthetic-real data, and additional performance is extracted by low-pass spatial filtering (LSF) of the real test set. The t-distributed stochastic neighbor embedding (t-SNE) latent feature visualization confirmed the similarity and shortcomings of the FT set, highlighting the problem of elevation components in residual tropospheric noise. After fine-tuning, all the volcanic deformations are detected, including the smallest one, Lazufre, deforming 5 mm/year. The first time confirmed deformation of Cerro El Condor is observed, deforming 9.9–17.5 mm/year. Finally, sensitivity analysis uncovered the model’s minimal detectable deformation of 2 mm/year.

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[577]
S. Chen, Y. Shi, Z. Xiong and X. Zhu.
HTC-DC Net: Monocular Height Estimation From Single Remote Sensing Images.
IEEE Transactions on Geoscience and Remote Sensing 61 (Oct. 2023). DOI. GitHub.
Abstract

Three-dimensional geoinformation is of great significance for understanding the living environment; however, 3-D perception from remote sensing data, especially on a large scale, is restricted, mainly due to the high costs of 3-D sensors such as light detection and ranging (LiDAR). To tackle this problem, we propose a method for monocular height estimation from optical imagery, which is currently one of the richest sources of remote sensing data. As an ill-posed problem, monocular height estimation requires well-designed networks for enhanced representations to improve the performance. Moreover, the distribution of height values is long-tailed with the low-height pixels, e.g., the background (BG), as the head, and thus, trained networks are usually biased and tend to underestimate building heights. To solve the problems, instead of formalizing the problem as a regression task, we propose HTC-DC Net following the classification–regression paradigm, with the head-tail cut (HTC) and the distribution-based constraints (DCs) as the main contributions. HTC-DC Net is composed of the backbone network as the feature extractor, the HTC-AdaBins module, and the hybrid regression process. The HTC-AdaBins module serves as the classification phase to determine bins adaptive to each input image. It is equipped with a vision transformer (ViT) encoder to incorporate local context with holistic information and involves an HTC to address the long-tailed problem in monocular height estimation for balancing the performances of foreground (FG) and BG pixels. The hybrid regression process does the regression via the smoothing of bins from the classification phase, which is trained via DCs. The proposed network is tested on three datasets of different resolutions, namely ISPRS Vaihingen (0.09 m), Data Fusion Contest 19 (DFC19) (1.3 m), and Global Building Height (GBH) (3 m). The experimental results show the superiority of the proposed network over existing methods by large margins. Extensive ablation studies demonstrate the effectiveness of each design component.
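The classification–regression idea described in the abstract can be illustrated with a toy sketch (not code from the paper): predict a softmax over height bins, then regress the height as the probability-weighted average of bin centres. The bin centres and logits below are invented for illustration; in HTC-DC Net the bins are adaptive per image.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# hypothetical height bins (metres) and network logits for one pixel
bin_centres = [0.0, 5.0, 15.0, 40.0]
logits = [0.1, 2.0, 1.0, -1.0]

probs = softmax(logits)
# hybrid regression: smooth over bins instead of picking the argmax class
height = sum(p * c for p, c in zip(probs, bin_centres))
```

Because the regression smooths over all bins, the estimate can fall between bin centres, which is the point of the hybrid classification–regression paradigm.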

MCML Authors
Link to Sining Chen

Sining Chen

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[576]
F. Zhou, X. Sun, C. Sun, J. Dong and X. Zhu.
Adaptive Morphology Filter: A Lightweight Module for Deep Hyperspectral Image Classification.
IEEE Transactions on Geoscience and Remote Sensing 61 (Oct. 2023). DOI. GitHub.
Abstract

Deep neural network models significantly outperform classical algorithms in the hyperspectral image (HSI) classification task. These deep models improve generalization but incur significant computational demands. This article endeavors to alleviate the computational distress in a depthwise manner through the use of morphological operations. We propose the adaptive morphology filter (AMF) to effectively extract spatial features like the conventional depthwise convolution layer. Furthermore, we reparameterize AMF into its equivalent form, i.e., a traditional binary morphology filter, which drastically reduces the number of parameters in the inference phase. Finally, we stack multiple AMFs to achieve a large receptive field and construct a lightweight AMNet for classifying HSIs. It is noteworthy that we prove the deep stack of depthwise AMFs to be equivalent to structural element decomposition. We test our model on five benchmark datasets. Experiments show that our approach outperforms state-of-the-art methods with fewer parameters (≈10k).
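The depthwise morphological operation the abstract refers to can be sketched in a few lines; the following grey-scale erosion with a 3x3 structuring element is illustrative only and is not the paper's AMNet implementation (borders are simply left unchanged here).

```python
def erode3x3(img):
    # grey-scale erosion: each interior pixel becomes the minimum of its
    # 3x3 neighbourhood, acting like a depthwise min-filter per channel
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels copied as-is
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = min(img[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
    return out

img = [[1, 1, 1, 1],
       [1, 9, 9, 1],
       [1, 9, 9, 1],
       [1, 1, 1, 1]]
eroded = erode3x3(img)
```

Stacking several such filters enlarges the receptive field, mirroring the paper's observation that a deep stack of depthwise morphology filters is equivalent to structural element decomposition.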

MCML Authors
Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[575]
J. Külz, M. Mayer and M. Althoff.
Timor Python: A Toolbox for Industrial Modular Robotics.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023). Detroit, MI, USA, Oct 01-05, 2023. DOI.
Abstract

Modular Reconfigurable Robots (MRRs) represent an exciting path forward for industrial robotics, opening up new possibilities for robot design. Compared to monolithic manipulators, they promise greater flexibility, improved maintainability, and cost-efficiency. However, there is no tool or standardized way to model and simulate assemblies of modules in the same way it has been done for robotic manipulators for decades. We introduce the Toolbox for Industrial Modular Robotics (Timor), a Python toolbox to bridge this gap and integrate modular robotics into existing simulation and optimization pipelines. Our open-source library offers model generation and task-based configuration optimization for MRRs. It can easily be integrated with existing simulation tools - not least by offering URDF export of arbitrary modular robot assemblies. Moreover, our experimental study demonstrates the effectiveness of Timor as a tool for designing modular robots optimized for specific use cases.

MCML Authors
Link to Jonathan Külz

Jonathan Külz

Cyber Physical Systems

Link to Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[574]
Y. R. Shrestha, G. von Krogh and S. Feuerriegel.
Building open-source AI.
Nature Computational Science 3.11 (Oct. 2023). DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[573]
M. Fornasier, P. Richtárik, K. Riedl and L. Sun.
Consensus-Based Optimization with Truncated Noise.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis

Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis


[572]
J. Gauss, F. Scheipl and M. Herrmann.
DCSI–An improved measure of cluster separability based on separation and connectedness.
Preprint at arXiv (Oct. 2023). arXiv.
Abstract

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
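The two ingredients named in the abstract, between-class separation and within-class connectedness, can be toy-implemented as follows. This is an illustrative sketch of the general idea, not the exact DCSI formula from the paper.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def separation(xs, ys):
    # smallest distance between points of different classes
    return min(dist(a, b) for a in xs for b in ys)

def connectedness(pts):
    # largest nearest-neighbour gap within a class (small = well connected)
    return max(min(dist(a, b) for b in pts if b is not a) for a in pts)

def separability_index(xs, ys):
    # ratio >> 1 suggests the classes form separable density-based clusters
    sep = separation(xs, ys)
    conn = max(connectedness(xs), connectedness(ys))
    return sep / conn

cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
score = separability_index(cluster_a, cluster_b)
```

For the two well-separated toy clusters above the index is large; overlapping classes would drive it toward or below one, which is the failure mode the abstract discusses for density-based hard clustering.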

MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine


[571]
R. Hornung, M. Nalenz, L. Schneider, A. Bender, L. Bothmann, B. Bischl, T. Augustin and A.-L. Boulesteix.
Evaluating machine learning models in non-standard settings: An overview and new findings.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[570]
H. Löwe, C. A. Scholbeck, C. Heumann, B. Bischl and G. Casalicchio.
fmeffects: An R Package for Forward Marginal Effects.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Link to Christian Scholbeck

Christian Scholbeck

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[569]
Y. Ma, D. Frauen, V. Melnychuk and S. Feuerriegel.
Counterfactual Fairness for Predictions using Generative Adversarial Networks.
Preprint at arXiv (Oct. 2023). arXiv.
Abstract

Fairness in predictions is of direct importance in practice due to legal, ethical, and societal reasons. It is often achieved through counterfactual fairness, which ensures that the prediction for an individual is the same as that in a counterfactual world under a different sensitive attribute. However, achieving counterfactual fairness is challenging as counterfactuals are unobservable. In this paper, we develop a novel deep neural network called Generative Counterfactual Fairness Network (GCFN) for making predictions under counterfactual fairness. Specifically, we leverage a tailored generative adversarial network to directly learn the counterfactual distribution of the descendants of the sensitive attribute, which we then use to enforce fair predictions through a novel counterfactual mediator regularization. If the counterfactual distribution is learned sufficiently well, our method is mathematically guaranteed to ensure the notion of counterfactual fairness. Thereby, our GCFN addresses key shortcomings of existing baselines that are based on inferring latent variables, yet which (a) are potentially correlated with the sensitive attributes and thus lead to bias, and (b) have weak capability in constructing latent representations and thus low prediction performance. Across various experiments, our method achieves state-of-the-art performance. Using a real-world case study from recidivism prediction, we further demonstrate that our method makes meaningful predictions in practice.

MCML Authors
Link to Yuchen Ma

Yuchen Ma

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[568]
Y. Shen, R. Liao, Z. Han, Y. Ma and V. Tresp.
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[567]
F. Bongratz, A.-M. Rickmann and C. Wachinger.
Abdominal organ segmentation via deep diffeomorphic mesh deformations.
Scientific Reports 13.1 (Oct. 2023). DOI.
MCML Authors
Link to Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Radiology

Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology


[566]
J. Smids, H. Berkers, P. Le Blanc, S. Rispens and S. Nyholm.
Employers Have a Duty of Beneficence to Design for Meaningful Work: A General Argument and Logistics Warehouses as a Case Study.
The Journal of Ethics (Oct. 2023). DOI.
MCML Authors
Link to Sven Nyholm

Sven Nyholm

Prof. Dr.

Ethics of Artificial Intelligence


[565]
L. Bothmann, S. Dandl and M. Schomaker.
Causal Fair Machine Learning via Rank-Preserving Interventional Distributions.
1st Workshop on Fairness and Bias in AI (AEQUITAS 2023) co-located with the 26th European Conference on Artificial Intelligence (ECAI 2023). Kraków, Poland, Sep 30-Oct 04, 2023. PDF.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[564]
D. Winkel, N. Strauß, M. Schubert and T. Seidl.
Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning.
26th European Conference on Artificial Intelligence (ECAI 2023). Kraków, Poland, Sep 30-Oct 04, 2023. DOI.
MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[563]
J. Herbinger, S. Dandl, F. K. Ewald, S. Loibl and G. Casalicchio.
Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation.
3rd International Workshop on Explainable and Interpretable Machine Learning (XI-ML 2023) co-located with the 26th European Conference on Artificial Intelligence (ECAI 2023). Kraków, Poland, Sep 30-Oct 04, 2023. DOI.
MCML Authors
Link to Fiona Ewald

Fiona Ewald

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[562]
Y. Ma, Q. Khan and D. Cremers.
Multi Agent Navigation in Unconstrained Environments Using a Centralized Attention Based Graphical Neural Network Controller.
26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023). Bilbao, Spain, Sep 24-28, 2023. DOI.
MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[561]
J. Schmidt, Q. Khan and D. Cremers.
LiDAR View Synthesis for Robust Vehicle Navigation Without Expert Labels.
26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023). Bilbao, Spain, Sep 24-28, 2023. DOI.
MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[560]
L. Bothmann, S. Strickroth, G. Casalicchio, D. Rügamer, M. Lindauer, F. Scheipl and B. Bischl.
Developing Open Source Educational Resources for Machine Learning and Data Science.
3rd Teaching Machine Learning and Artificial Intelligence Workshop. Grenoble, France, Sep 19-23, 2023. URL.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[559]
B. Ma, E. Nie, H. Schmid and H. Schütze.
Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding.
19th Conference on Natural Language Processing (KONVENS 2023). Ingolstadt, Germany, Sep 18-22, 2023. URL.
Abstract

Multilingual pretrained language models (MPLMs) have demonstrated substantial performance improvements in zero-shot cross-lingual transfer across various natural language understanding tasks by finetuning MPLMs on task-specific labelled data of a source language (e.g. English) and evaluating on a wide range of target languages. Recent studies show that prompt-based finetuning surpasses regular finetuning in few-shot scenarios. However, the exploration of prompt-based learning in multilingual tasks remains limited. In this study, we propose the PROFIT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning. We conduct comprehensive experiments on diverse cross-lingual language understanding tasks (sentiment classification, paraphrase identification, and natural language inference) and empirically analyze the variation trends of prompt-based finetuning performance in cross-lingual transfer across different few-shot and full-data settings. Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding. Our findings indicate that prompt-based finetuning outperforms vanilla finetuning in full-data scenarios and exhibits greater advantages in few-shot scenarios, with different performance patterns dependent on task types. Additionally, we analyze underlying factors such as language similarity and pretraining data size that impact the cross-lingual performance of prompt-based finetuning. Overall, our work provides valuable insights into the cross-lingual prowess of prompt-based finetuning.
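The prompt-based setup the abstract studies can be sketched without any model: a cloze template turns classification into masked-word prediction, and a verbalizer maps label words back to classes. The template, label words, and scores below are hypothetical stand-ins for a masked-LM's output.

```python
# hypothetical cloze template and verbalizer for sentiment classification
template = "{sentence} Overall, it was [MASK]."
verbalizer = {"great": "positive", "terrible": "negative"}

def classify(mask_word_scores):
    # mask_word_scores: MLM scores for the [MASK] position, e.g. from an MPLM;
    # pick the label word the model finds most likely and map it to a class
    best_word = max(verbalizer,
                    key=lambda w: mask_word_scores.get(w, float("-inf")))
    return verbalizer[best_word]

prompt = template.format(sentence="The film was moving and beautifully shot.")
label = classify({"great": 4.2, "terrible": -1.3})
```

Prompt-based finetuning then updates the model on such cloze-formatted examples, reusing the pretraining (masked-LM) objective instead of a fresh classification head, which is why it tends to help in few-shot settings.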

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[558]
S. Urchs, V. Thurner, M. Aßenmacher, C. Heumann and S. Thiemichen.
How Prevalent is Gender Bias in ChatGPT? - Exploring German and English ChatGPT Responses.
1st Workshop on Biased Data in Conversational Agents (BDCA 2023) co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. arXiv.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[557]
I. T. Öztürk, R. Nedelchev, C. Heumann, E. Garces Arias, M. Roger, B. Bischl and M. Aßenmacher.
How Different Is Stereotypical Bias Across Languages?.
3rd Workshop on Bias and Fairness in AI (BIAS 2023) co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. arXiv.
MCML Authors
Link to Esteban Garces Arias

Esteban Garces Arias

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[556]
M. Aßenmacher, L. Rauch, J. Goschenhofer, A. Stephan, B. Bischl, B. Roth and B. Sick.
Towards Enhancing Deep Active Learning with Weak Supervision and Constrained Clustering.
7th International Workshop on Interactive Adaptive Learning (IAL 2023) at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. PDF.
Abstract

Three fields revolving around the question of how to cope with limited amounts of labeled data are Deep Active Learning (DAL), deep Constrained Clustering (CC), and Weakly Supervised Learning (WSL). DAL tackles the problem by adaptively posing the question of which data samples to annotate next in order to achieve the best incremental learning improvement, although it suffers from several limitations that hinder its deployment in practical settings. We point out how CC algorithms and WSL could be employed to overcome these limitations and increase the practical applicability of DAL research. Specifically, we discuss the opportunities to use the class discovery capabilities of CC and the possibility of further reducing human annotation efforts by utilizing WSL. We argue that the practical applicability of DAL algorithms will benefit from employing CC and WSL methods for the learning and labeling process. We inspect the overlaps between the three research areas and identify relevant and exciting research questions at the intersection of these areas.

MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[555]
S. Dandl, G. Casalicchio, B. Bischl and L. Bothmann.
Interpretable Regional Descriptors: Hyperbox-Based Local Explanations.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[554]
Z. Ding, J. Wu, Z. Li, Y. Ma and V. Tresp.
Improving Few-Shot Inductive Learning on Temporal Knowledge Graphs Using Confidence-Augmented Reinforcement Learning.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI. GitHub.
Abstract

Temporal knowledge graph completion (TKGC) aims to predict the missing links among the entities in a temporal knowledge graph (TKG). Most previous TKGC methods only consider predicting the missing links among the entities seen in the training set, while they are unable to achieve great performance in link prediction concerning newly-emerged unseen entities. Recently, a new task, i.e., TKG few-shot out-of-graph (OOG) link prediction, is proposed, where TKGC models are required to achieve great link prediction performance concerning newly-emerged entities that only have few-shot observed examples. In this work, we propose a TKGC method FITCARL that combines few-shot learning with reinforcement learning to solve this task. In FITCARL, an agent traverses through the whole TKG to search for the prediction answer. A policy network is designed to guide the search process based on the traversed path. To better address the data scarcity problem in the few-shot setting, we introduce a module that computes the confidence of each candidate action and integrate it into the policy for action selection. We also exploit the entity concept information with a novel concept regularizer to boost model performance. Experimental results show that FITCARL achieves state-of-the-art performance on TKG few-shot OOG link prediction.
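The confidence-augmented action selection described in the abstract can be illustrated with a toy sketch (not the paper's implementation): each candidate action's policy score is modulated by a confidence estimate before the softmax, so a high-scoring but low-confidence action can be outranked. All numbers below are invented.

```python
import math

def select_action(policy_scores, confidences):
    # modulate each policy score by its confidence, then softmax
    weighted = [s * c for s, c in zip(policy_scores, confidences)]
    m = max(weighted)
    exps = [math.exp(w - m) for w in weighted]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# three candidate edges for the agent; the second has the highest raw score
# but low confidence, so the third candidate wins after modulation
best, probs = select_action([1.0, 3.0, 2.0], [0.9, 0.2, 0.8])
```

Down-weighting uncertain candidates in this way is one plausible remedy for the data scarcity of the few-shot setting, where raw policy scores for rarely seen entities are unreliable.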

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Zongyue Li

Zongyue Li

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[553]
S. Gilhuber, J. Busch, D. Rotthues, C. M. M. Frey and T. Seidl.
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[552]
S. Gilhuber, R. Hvingelby, M. L. A. Fok and T. Seidl.
How to Overcome Confirmation Bias in Semi-Supervised Image Classification by Active Learning.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[551]
S. Haas and E. Hüllermeier.
Rectifying Bias in Ordinal Observational Data Using Unimodal Label Smoothing.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[550]
M. Klein, C. Leiber and C. Böhm.
k-SubMix: Common Subspace Clustering on Mixed-Type Data.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Mauritius Klein

Mauritius Klein

Dr.

* Former member

Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[549]
M. Muschalik, F. Fumagalli, B. Hammer and E. Hüllermeier.
iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[548]
L. Rauch, M. Aßenmacher, D. Huseljic, M. Wirth, B. Bischl and B. Sick.
ActiveGLAE: A Benchmark for Deep Active Learning with Transformers.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. DOI.
MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[547]
J. G. Wiese, L. Wimmer, T. Papamarkou, B. Bischl, S. Günnemann and D. Rügamer.
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023). Turin, Italy, Sep 18-22, 2023. Best paper award. DOI.
MCML Authors
Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[546]
F. Hoppe, C. M. Verdun, H. Laus, F. Krahmer and H. Rauhut.
Uncertainty Quantification For Learned ISTA.
IEEE Workshop on Machine Learning for Signal Processing (MLSP 2023). Rome, Italy, Sep 17-20, 2023. DOI.
MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Hannah Laus

Hannah Laus

Optimization & Data Analysis

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[545]
Ç. Yapar, F. Jaensch, L. Ron, G. Kutyniok and G. Caire.
Overview of the Urban Wireless Localization Competition.
IEEE Workshop on Machine Learning for Signal Processing (MLSP 2023). Rome, Italy, Sep 17-20, 2023. DOI.
MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[544]
A. Javanmardi, Y. Sale, P. Hofman and E. Hüllermeier.
Conformal Prediction with Partially Labeled Data.
12th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2023). Limassol, Cyprus, Sep 13-15, 2023. URL.
MCML Authors
Link to Alireza Javanmardi

Alireza Javanmardi

Artificial Intelligence & Machine Learning

Link to Paul Hofman

Paul Hofman

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[543]
S. F. Fischer, L. Harutyunyan, M. Feurer and B. Bischl.
OpenML-CTR23 - A curated tabular regression benchmarking suite.
International Conference on Automated Machine Learning (AutoML 2023) - Workshop Track. Berlin, Germany, Sep 12-15, 2023. URL.
MCML Authors
Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[542]
L. O. Purucker, L. Schneider, M. Anastacio, J. Beel, B. Bischl and H. Hoos.
Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML.
International Conference on Automated Machine Learning (AutoML 2023). Berlin, Germany, Sep 12-15, 2023. URL.
MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[541]
S. Segel, H. Graf, A. Tornede, B. Bischl and M. Lindauer.
Symbolic Explanations for Hyperparameter Optimization.
International Conference on Automated Machine Learning (AutoML 2023). Berlin, Germany, Sep 12-15, 2023. URL.
Abstract

Hyperparameter optimization (HPO) methods can determine well-performing hyperparameter configurations efficiently but often lack insights and transparency. We propose to apply symbolic regression to meta-data collected with Bayesian optimization (BO) during HPO. In contrast to prior approaches explaining the effects of hyperparameters on model performance, symbolic regression allows for obtaining explicit formulas quantifying the relation between hyperparameter values and model performance. Overall, our approach aims to make the HPO process more explainable and human-centered, addressing the needs of multiple user groups: First, providing insights into the HPO process can support data scientists and machine learning practitioners in their decisions when using and interacting with HPO tools. Second, obtaining explicit formulas and inspecting their properties could help researchers understand the HPO loss landscape better. In an experimental evaluation, we find that naively applying symbolic regression directly to meta-data collected during HPO is affected by the sampling bias introduced by BO. However, the true underlying loss landscape can be approximated by fitting the symbolic regression on the surrogate model trained during BO. By penalizing longer formulas, symbolic regression furthermore allows the user to decide how to balance the accuracy and explainability of the resulting formulas.
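
The core idea, fitting interpretable formulas to surrogate predictions while penalizing formula length, can be sketched with a toy template-based symbolic regression. The template set, the stand-in surrogate, and the penalty value are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Stand-in for surrogate-model predictions over a 1-D hyperparameter grid
# (the true relation here is chosen as y = x**2 + 0.5 for illustration).
x = np.linspace(0.01, 1.0, 50)
y = x**2 + 0.5

# Tiny "symbolic regression": candidate formula templates, each with a
# basis function and a notion of formula length (complexity).
templates = {
    "a*x + b":      (2, x),
    "a*x**2 + b":   (3, x**2),
    "a*log(x) + b": (3, np.log(x)),
}

def score(name, penalty=0.001):
    # Least-squares fit of the free constants a, b, plus a length penalty
    # that trades accuracy against explainability of the formula.
    length, basis = templates[name]
    A = np.column_stack([basis, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = np.mean((y - A @ coef) ** 2)
    return mse + penalty * length

best = min(templates, key=score)
```

Raising `penalty` shifts the selection toward shorter, less accurate formulas, which mirrors the accuracy/explainability trade-off the abstract describes.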

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[540]
A. Maronikolakis, P. O’Grady, H. Schütze and M. Lyra.
Improving Few-Shot Learning with Multilingual Transfer and Monte Carlo Training Set Selection.
CLASP Conference on Learning with Small Data (LSD 2023). Gothenburg, Sweden, Sep 11-12, 2023. URL.
Abstract

In industry settings, machine learning is an attractive tool for automating processes. Unfortunately, annotated, high-quality data is expensive to source. This problem is exacerbated in settings spanning multiple markets and languages. Thus, developing solutions for multilingual tasks with little available data is challenging. Few-shot learning is a compelling approach in multilingual and low-resource settings, since it requires only a few training examples to achieve high performance and is agnostic to language. Even though the technique can be applied to multilingual settings, optimizing performance is an open question. In our work we show that leveraging higher-resource, task-specific language data can boost overall performance, and we propose a method that selects training examples based on their average performance in a Monte Carlo simulation, resulting in a training set more conducive to learning. We demonstrate the effectiveness of our methods on fashion text review moderation, classifying reviews as related or unrelated to the given product. We show that our methodology boosts performance in multilingual (English, French, German) settings, increasing F1 score and significantly decreasing false positives.
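
Selecting training examples by their average performance across Monte Carlo trials can be sketched as below. The per-example utilities, trial count, and scoring function are illustrative stand-ins, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = np.arange(10)                    # candidate training examples
true_util = np.linspace(0.0, 1.0, 10)   # hidden per-example usefulness (toy)

def run_trial(subset):
    # Stand-in for "train on this subset, then evaluate":
    # noisy mean usefulness of the chosen examples.
    return true_util[subset].mean() + rng.normal(scale=0.05)

# Monte Carlo simulation: credit every example with the score of each
# random training set it appeared in, then average per example.
totals, counts = np.zeros(10), np.zeros(10)
for _ in range(500):
    subset = rng.choice(pool, size=4, replace=False)
    s = run_trial(subset)
    totals[subset] += s
    counts[subset] += 1

avg = totals / counts
selected = np.argsort(-avg)[:4]  # examples most conducive to learning
```

Averaging over many random subsets marginalizes out which other examples happened to be in the training set, isolating each example's own contribution.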

MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[539]
P. Koch, G. V. Nuñez, E. Garces Arias, C. Heumann, M. Schöffel, A. Häberlin and M. Aßenmacher.
A tailored Handwritten-Text-Recognition System for Medieval Latin.
1st Workshop on Ancient Language Processing (ALP 2023) co-located with the Conference on Recent Advances in Natural Language Processing (RANLP 2023). Varna, Bulgaria, Sep 08, 2023. URL.
MCML Authors
Link to Esteban Garces Arias

Esteban Garces Arias

Statistical Learning & Data Science

Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[538]
E. Nie, H. Schmid and H. Schütze.
Cross-Lingual Constituency Parsing for Middle High German: A Delexicalized Approach.
1st Workshop on Ancient Language Processing (ALP 2023) co-located with the Conference on Recent Advances in Natural Language Processing (RANLP 2023). Varna, Bulgaria, Sep 08, 2023. URL.
MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[537]
V. Hangya and A. Fraser.
LMU at HaSpeeDe3: Multi-Dataset Training for Cross-Domain Hate Speech Detection.
Final Workshop of the 8th evaluation campaign EVALITA 2023. Parma, Italy, Sep 07-08, 2023. PDF.
MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[536]
S. Nyholm.
Is Academic Enhancement Possible by Means of Generative AI-Based Digital Twins?.
American Journal of Bioethics 23.10 (Sep. 2023). DOI.
MCML Authors
Link to Sven Nyholm

Sven Nyholm

Prof. Dr.

Ethics of Artificial Intelligence


[535]
H. A. Gündüz, M. Binder, X.-Y. To, R. Mreches, B. Bischl, A. C. McHardy, P. C. Münch and M. Rezaei.
A self-supervised deep learning method for data-efficient training in genomics.
Communications Biology 6.928 (Sep. 2023). DOI.
MCML Authors
Link to Hüseyin Anil Gündüz

Hüseyin Anil Gündüz

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[534]
D. Bär, N. Pröllochs and S. Feuerriegel.
New Threats to Society from Free-Speech Social Media Platforms.
Communications of the ACM 66.10 (Sep. 2023). DOI.
MCML Authors
Link to Dominik Bär

Dominik Bär

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[533]
M. Toetzke, B. Probst and S. Feuerriegel.
Leveraging large language models to monitor climate technology innovation.
Environmental Research Letters 18.9 (Sep. 2023). DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[532]
B. X. W. Liew, F. M. Kovacs, D. Rügamer and A. Royuela.
Automatic variable selection algorithms in prognostic factor research in neck pain.
Journal of Clinical Medicine (Sep. 2023). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[531]
S. Hoffmann, F. Scheipl and A.-L. Boulesteix.
Reproduzierbare und replizierbare Forschung.
Moderne Verfahren der Angewandten Statistik (Sep. 2023). URL.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[530]
A. Bacho, H. Boche and G. Kutyniok.
Complexity Blowup for Solutions of the Laplace and the Diffusion Equation.
Preprint at arXiv (Sep. 2023). arXiv.
Abstract

In this paper, we investigate the computational complexity of solutions to the Laplace and the diffusion equation. We show that for a certain class of initial-boundary value problems of the Laplace and the diffusion equation, the solution operator is #P1/#P-complete in the sense that it maps polynomial-time computable functions to the set of #P1/#P-complete functions. Consequently, there exist polynomial-time (Turing) computable input data such that the solution is not polynomial-time computable, unless FP=#P or FP1=#P1. In this case, we cannot, in general, simulate the solution of the Laplace or the diffusion equation on a digital computer without a complexity blowup, i.e., the computation time for obtaining an approximation of the solution with up to a finite number of significant digits grows non-polynomially in the number of digits. This indicates that the computational complexity of the solution operator that models a physical phenomenon is intrinsically high, independent of the numerical algorithm used to approximate a solution.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[529]
F. Hoppe, F. Krahmer, C. M. Verdun, M. I. Menzel and H. Rauhut.
Uncertainty quantification for sparse Fourier recovery.
Preprint at arXiv (Sep. 2023). arXiv.
Abstract

One of the most prominent methods for uncertainty quantification in high-dimensional statistics is the desparsified LASSO, which relies on unconstrained ℓ1-minimization. The majority of initial works focused on real (sub-)Gaussian designs. However, in many applications, such as magnetic resonance imaging (MRI), the measurement process possesses a certain structure due to the nature of the problem. The measurement operator in MRI can be described by a subsampled Fourier matrix. The purpose of this work is to extend the uncertainty quantification process using the desparsified LASSO to design matrices originating from a bounded orthonormal system, which naturally generalizes the subsampled Fourier case and also allows for the treatment of the case where the sparsity basis is not the standard basis. In particular, we construct honest confidence intervals for every pixel of an MR image that is sparse in the standard basis provided the number of measurements satisfies n ≳ max{s log²(s) log(p), s log²(p)}, or that is sparse with respect to the Haar wavelet basis provided a slightly larger number of measurements.

MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[528]
Y. Shan, Y. Xia, Y. Chen and D. Cremers.
SCP: Scene Completion Pre-training for 3D Object Detection.
Preprint at arXiv (Sep. 2023). arXiv.
MCML Authors
Link to Yan Xia

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[527]
R. P. Prager, K. Dietrich, L. Schneider, L. Schäpermeier, B. Bischl, P. Kerschke, H. Trautmann and O. Mersmann.
Neural Networks as Black-Box Benchmark Functions Optimized for Exploratory Landscape Features.
17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms (FOGA 2023). Potsdam, Germany, Aug 30-Sep 01, 2023. DOI.
Abstract

Artificial benchmark functions are commonly used in optimization research because of their ability to rapidly evaluate potential solutions, making them a preferred substitute for real-world problems. However, these benchmark functions have faced criticism for their limited resemblance to real-world problems. In response, recent research has focused on automatically generating new benchmark functions for areas where established test suites are inadequate. These approaches have limitations, such as the difficulty of generating new benchmark functions that exhibit exploratory landscape analysis (ELA) features beyond those of existing benchmarks. The objective of this work is to develop a method for generating benchmark functions for single-objective continuous optimization with user-specified structural properties. Specifically, we aim to demonstrate a proof of concept for a method that uses an ELA feature vector to specify these properties in advance. To achieve this, we begin by generating a random sample of decision space variables and objective values. We then adjust the objective values using CMA-ES until the corresponding features of our new problem match the predefined ELA features within a specified threshold. By iteratively transforming the landscape in this way, we ensure that the resulting function exhibits the desired properties. To create the final function, we use the resulting point cloud as training data for a simple neural network that produces a function exhibiting the target ELA features. We demonstrate the effectiveness of this approach by replicating the existing functions of the well-known BBOB suite and creating new functions with ELA feature values that are not present in BBOB.
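
The iterative "adjust objective values until the features match a target" idea can be illustrated with a drastically simplified stand-in: two toy summary statistics in place of a full ELA feature vector, and a direct transformation in place of CMA-ES. All names and thresholds here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(100, 2))  # random decision-space sample
y = rng.normal(size=100)                   # random objective values

# Target "landscape features": here just mean and spread, as toy stand-ins
# for the ELA feature vector the paper matches via CMA-ES.
target = {"mean": 2.0, "std": 0.5}

def features(v):
    return {"mean": float(v.mean()), "std": float(v.std())}

# Iteratively transform the objective values until the features match the
# predefined targets within a threshold.
for _ in range(100):
    f = features(y)
    if abs(f["mean"] - target["mean"]) < 1e-9 and abs(f["std"] - target["std"]) < 1e-9:
        break
    y = (y - f["mean"]) / f["std"] * target["std"] + target["mean"]

# (X, y) is now a point cloud that could serve as training data for a
# small regression network exhibiting the requested features.
```

In the paper the feature vector is far richer and the transformation is found by CMA-ES rather than in closed form, but the loop structure (measure features, compare to target, transform, repeat) is the same.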

MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[526]
A. Scheppach, H. A. Gündüz, E. Dorigatti, P. C. Münch, A. C. McHardy, B. Bischl, M. Rezaei and M. Binder.
Neural Architecture Search for Genomic Sequence Data.
20th IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology (CIBCB 2023). Eindhoven, The Netherlands, Aug 29-31, 2023. DOI.
Abstract

Deep learning has enabled outstanding progress on bioinformatics datasets and a variety of tasks, such as protein structure prediction, identification of regulatory regions, genome annotation, and interpretation of the noncoding genome. The layout and configuration of neural networks used for these tasks have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Therefore, there is growing interest in automated neural architecture search (NAS) methods in bioinformatics. In this paper, we present a novel search space for NAS algorithms that operate on genome data, thus creating extensions for existing NAS algorithms for sequence data that we name Genome-DARTS, Genome-P-DARTS, Genome-BONAS, Genome-SH, and Genome-RS. Moreover, we introduce two novel NAS algorithms, CWP-DARTS and EDPDARTS, that build on and extend the idea of P-DARTS. We evaluate the presented methods and compare them to manually designed neural architectures on a widely used genome sequence machine learning task to show that NAS methods can be adapted well for bioinformatics sequence datasets. Our experiments show that architectures optimized by our NAS methods outperform manually developed architectures while having significantly fewer parameters.

MCML Authors
Link to Hüseyin Anil Gündüz

Hüseyin Anil Gündüz

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[525]
L. Rottkamp, N. Strauß and M. Schubert.
DEAR: Dynamic Electric Ambulance Redeployment.
18th International Symposium on Spatial and Temporal Databases (SSTD 2023). Calgary, Canada, Aug 23-25, 2023. DOI.
MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[524]
M. Windl, A. Scheidle, C. George and S. Mayer.
Investigating Security Indicators for Hyperlinking Within the Metaverse.
19th Symposium on Usable Privacy and Security (SOUPS 2023). Anaheim, CA, USA, Aug 06-08, 2023. URL.
MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[523]
A. Beer, A. Draganov, E. Hohma, P. Jahn, C. M. M. Frey and I. Assent.
Connecting the Dots — Density-Connectivity Distance unifies DBSCAN, k-Center and Spectral Clustering.
29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2023). Long Beach, CA, USA, Aug 06-10, 2023. DOI.
MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Philipp Jahn

Philipp Jahn

Database Systems & Data Mining

Link to Christian Frey

Christian Frey

Dr.

* Former member


[522]
M. Caprio, Y. Sale, E. Hüllermeier and I. Lee.
A Novel Bayes' Theorem for Upper Probabilities.
International Workshop on Epistemic Uncertainty in Artificial Intelligence (Epi UAI 2023). Pittsburgh, PA, USA, Aug 04, 2023. DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[521]
J. Rodemann, J. Goschenhofer, E. Dorigatti, T. Nagler and T. Augustin.
Approximately Bayes-optimal pseudo-label selection.
39th Conference on Uncertainty in Artificial Intelligence (UAI 2023). Pittsburgh, PA, USA, Aug 01-03, 2023. URL.
MCML Authors
Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science


[520]
Y. Sale, M. Caprio and E. Hüllermeier.
Is the Volume of a Credal Set a Good Measure for Epistemic Uncertainty?.
39th Conference on Uncertainty in Artificial Intelligence (UAI 2023). Pittsburgh, PA, USA, Aug 01-03, 2023. URL.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[519]
L. Wimmer, Y. Sale, P. Hofman, B. Bischl and E. Hüllermeier.
Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?.
39th Conference on Uncertainty in Artificial Intelligence (UAI 2023). Pittsburgh, PA, USA, Aug 01-03, 2023. URL.
MCML Authors
Link to Lisa Wimmer

Lisa Wimmer

Statistical Learning & Data Science

Link to Paul Hofman

Paul Hofman

Artificial Intelligence & Machine Learning

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[518]
S. Endt, M. Engel, E. Naldi, R. Assereto, M. Molendowska, L. Mueller, C. M. Verdun, C. M. Pirkl, M. Palombo, D. K. Jones and M. I. Menzel.
In vivo myelin water quantification using diffusion–relaxation correlation MRI: A comparison of 1D and 2D methods.
Applied Magnetic Resonance 54 (Aug. 2023). DOI.
MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member


[517]
F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Auxiliary Cross-Modal Representation Learning With Triplet Loss Functions for Online Handwriting Recognition.
IEEE Access 11 (Aug. 2023). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[516]
D. Wolffram, S. Abbott, M. an der Heiden, S. Funk, F. Günther, D. Hailer, S. Heyder, T. Hotz, J. van de Kassteele, H. Küchenhoff, S. Müller-Hansen, D. Syliqi, A. Ullrich, M. Weigert, M. Schienle and J. Bracher.
Collaborative nowcasting of COVID-19 hospitalization incidences in Germany.
PLOS Computational Biology 19.8 (Aug. 2023). DOI.
MCML Authors
Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)


[515]
S. Bamberger, R. Heckel and F. Krahmer.
Approximating Positive Homogeneous Functions with Scale Invariant Neural Networks.
Preprint at arXiv (Aug. 2023). arXiv.
MCML Authors
Link to Reinhard Heckel

Reinhard Heckel

Prof. Dr.

Machine Learning

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[514]
H.-H. Chou, J. Maly and D. Stöger.
How to induce regularization in linear models: A guide to reparametrizing gradient flow.
Preprint at arXiv (Aug. 2023). arXiv.
Abstract

In this work, we analyze the relation between reparametrizations of gradient flow and the induced implicit bias in linear models, which encompass various basic regression tasks. In particular, we aim at understanding the influence of the model parameters - reparametrization, loss, and link function - on the convergence behavior of gradient flow. Our results provide conditions under which the implicit bias can be well-described and convergence of the flow is guaranteed. We furthermore show how to use these insights for designing reparametrization functions that lead to specific implicit biases which are closely connected to ℓp- or trigonometric regularizers.
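
A minimal numerical illustration of this phenomenon uses the well-known w = u ⊙ u reparametrization with small initialization, which biases gradient descent toward sparse nonnegative solutions; the specific system, step sizes, and iteration counts below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Underdetermined linear system: w1 + 2*w2 = 2 has infinitely many solutions.
A = np.array([[1.0, 2.0]])
b = np.array([2.0])

# Plain gradient descent from zero converges to the minimum-l2 solution,
# approximately w = [0.4, 0.8] with l1 norm 1.2.
w = np.zeros(2)
for _ in range(2000):
    w -= 0.05 * A.T @ (A @ w - b)

# Reparametrized model w = u*u: same loss, but gradient descent from a
# small initialization implicitly prefers sparse nonnegative solutions,
# approximately w = [0, 1] with l1 norm close to 1.
u = np.full(2, 0.01)
for _ in range(20000):
    r = A @ (u * u) - b
    u -= 0.01 * 2.0 * u * (A.T @ r)
w_sparse = u * u
```

Both runs fit the data, yet the reparametrized flow lands near the ℓ1-minimal nonnegative solution, which is exactly the kind of reparametrization-induced implicit bias the abstract analyzes.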

MCML Authors
Link to Hung-Hsu Chou

Hung-Hsu Chou

Dr.

Optimization & Data Analysis

Link to Johannes Maly

Johannes Maly

Prof. Dr.

Associate

Mathematical Data Science and Artificial Intelligence


[513]
S. Henzgen and E. Hüllermeier.
Weighting by Tying: A New Approach to Weighted Rank Correlation.
Preprint at arXiv (Aug. 2023). arXiv.
Abstract

Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall’s tau and Spearman’s rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those on the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighting scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal’s gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighting rank positions in a flexible way.

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[512]
Y. Li, Y. Zhang, K. Kawaguchi, A. Khakzar, B. Bischl and M. Rezaei.
A Dual-Perspective Approach to Evaluating Feature Attribution Methods.
Preprint at arXiv (Aug. 2023). arXiv.
Abstract

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model’s behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
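
The perturbation-based faithfulness lens the abstract starts from can be sketched on a linear model, where exact attributions are available in closed form. This is the basic faithfulness check, not the paper's soundness and completeness metrics.

```python
import numpy as np

# A linear model admits exact feature attributions: attr_i = w_i * x_i.
w = np.array([3.0, 0.1, -2.0, 0.05])
x = np.ones(4)

def f(v):
    return float(w @ v)

attr = w * x

def output_drop(idx):
    # Perturbation test: zero out one feature and measure the change
    # in the model output.
    xp = x.copy()
    xp[idx] = 0.0
    return abs(f(x) - f(xp))

order = np.argsort(-np.abs(attr))        # features ranked by |attribution|
drops = [output_drop(i) for i in order]  # should be (weakly) decreasing
```

For a faithful attribution, perturbing higher-attributed features changes the output more; soundness and completeness refine this single view into two complementary quantitative criteria.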

MCML Authors
Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[511]
A. Volkmann, A. Stöcker, F. Scheipl and S. Greven.
Multivariate Functional Additive Mixed Models.
Statistical Modelling 23.4 (Aug. 2023). DOI.
Abstract

Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.

MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[510]
M. K. Belaid, R. Bornemann, M. Rabus, R. Krestel and E. Hüllermeier.
Compare-xAI: Toward Unifying Functional Testing Methods for Post-hoc XAI Algorithms into a Multi-dimensional Benchmark.
1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[509]
C. Molnar, T. Freiesleben, G. König, J. Herbinger, T. Reisinger, G. Casalicchio, M. N. Wright and B. Bischl.
Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process.
1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. DOI.
Abstract

Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity. However, the resulting model parameters usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (PD) plots and permutation feature importance (PFI) are often used as interpretation methods. However, PD and PFI lack a theory that relates them to the data generating process. We formalize PD and PFI as statistical estimators of ground truth estimands rooted in the data generating process. We show that PD and PFI estimates deviate from this ground truth due to statistical biases, model variance and Monte Carlo approximation errors. To account for model variance in PD and PFI estimation, we propose the learner-PD and the learner-PFI based on model refits, together with corrected variance and confidence interval estimators.
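
The two interpretation methods the abstract formalizes can be sketched on a known data-generating process, using a fixed stand-in "model" so the ground truth is transparent; the data, model, and grid below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data-generating process: only feature 0 matters.
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

def model(Z):
    # Stand-in for a fitted learner that recovered the true relation.
    return 2.0 * Z[:, 0]

def partial_dependence(j, grid):
    # PD: average prediction while feature j is forced to each grid value.
    values = []
    for v in grid:
        Z = X.copy()
        Z[:, j] = v
        values.append(model(Z).mean())
    return np.array(values)

def permutation_importance(j):
    # PFI: increase in mean squared error after permuting feature j.
    base = np.mean((model(X) - y) ** 2)
    Z = X.copy()
    Z[:, j] = rng.permutation(Z[:, j])
    return np.mean((model(Z) - y) ** 2) - base

pd0 = partial_dependence(0, [-1.0, 0.0, 1.0])  # recovers the slope of 2
pfi0, pfi1 = permutation_importance(0), permutation_importance(1)
```

Both quantities here are single-draw estimates; the paper's point is precisely that such estimates carry statistical bias, model variance, and Monte Carlo error relative to their ground-truth estimands, motivating the refit-based learner-PD/learner-PFI with corrected intervals.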

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[508]
M. Muschalik, F. Fumagalli, R. Jagtani, B. Hammer and E. Hüllermeier.
iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios.
1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. Best Paper Award. DOI.
MCML Authors
Link to Maximilian Muschalik

Maximilian Muschalik

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[507]
A. Stüber, S. Coors and M. Ingrisch.
Revitalize the Potential of Radiomics: Interpretation and Feature Stability in Medical Imaging Analyses through Groupwise Feature Importance.
Late-breaking Work, Demos and Doctoral Consortium (LB-D-DC 2023) at the 1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. PDF.
Abstract

Radiomics, involving analysis of calculated, quantitative features from medical images with machine learning tools, shares the instability challenge with other high-dimensional data analyses due to variations in the training set. This instability affects model interpretation and feature importance assessment. To enhance stability and interpretability, we introduce grouped feature importance, shedding light on tool limitations and advocating for more reliable radiomics-based analysis methods.

MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[506]
V. Bengs, E. Hüllermeier and W. Waegeman.
On Second-Order Scoring Rules for Epistemic Uncertainty Quantification.
40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[505]
V. Melnychuk, D. Frauen and S. Feuerriegel.
Normalizing Flows for Interventional Density Estimation.
40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL.
MCML Authors
Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[504]
T. Nagler.
Statistical Foundations of Prior-Data Fitted Networks.
40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL.
MCML Authors
Link to Thomas Nagler

Thomas Nagler

Prof. Dr.

Computational Statistics & Data Science


[503]
D. Rügamer.
A New PHO-rmula for Improved Performance of Semi-Structured Networks.
40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[502]
N. Stucki, J. C. Paetzold, S. Shit, B. Menze and U. Bauer.
Topologically faithful image segmentation via induced matching of persistence barcodes.
40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL. GitHub.
Abstract

Segmentation models predominantly optimize pixel-overlap-based loss, an objective that is actually inadequate for many segmentation tasks. In recent years, their limitations fueled a growing interest in topology-aware methods, which aim to recover the topology of the segmented structures. However, so far, existing methods only consider global topological properties, ignoring the need to preserve topological features spatially, which is crucial for accurate segmentation. We introduce the concept of induced matchings from persistent homology to achieve a spatially correct matching between persistence barcodes in a segmentation setting. Based on this concept, we define the Betti matching error as an interpretable, topologically and feature-wise accurate metric for image segmentations, which resolves the limitations of the Betti number error. Our Betti matching error is differentiable and efficient to use as a loss function. We demonstrate that it improves the topological performance of segmentation networks significantly across six diverse datasets while preserving the performance with respect to traditional scores.

MCML Authors
Link to Nico Stucki

Nico Stucki

Applied Topology and Geometry

Link to Ulrich Bauer

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry


[501]
C. Tomani, F. Waseda, Y. Shen and D. Cremers.
Beyond In-Domain Scenarios: Robust Density-Aware Calibration.
40th International Conference on Machine Learning (ICML 2023). Honolulu, Hawaii, Jul 23-29, 2023. URL.
MCML Authors
Link to Christian Tomani

Christian Tomani

Computer Vision & Artificial Intelligence

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[500]
J. Goschenhofer, B. Bischl and Z. Kira.
ConstraintMatch for Semi-constrained Clustering.
International Joint Conference on Neural Networks (IJCNN 2023). Gold Coast Convention and Exhibition Centre, Queensland, Australia, Jul 18-23, 2023. DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[499]
C. Kolb, B. Bischl, C. L. Müller and D. Rügamer.
Sparse Modality Regression.
37th International Workshop on Statistical Modelling (IWSM 2023). Dortmund, Germany, Jul 17-21, 2023. Best Paper Award. PDF.
MCML Authors
Link to Chris Kolb

Chris Kolb

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[498]
A. Giovagnoli, Y. Ma, M. Schubert and V. Tresp.
QNEAT: Natural Evolution of Variational Quantum Circuit Architecture.
Genetic and Evolutionary Computation Conference (GECCO 2023). Lisbon, Portugal, Jul 15-19, 2023. DOI.
Abstract

Quantum Machine Learning (QML) is a recent and rapidly evolving field in which the theoretical framework and logic of quantum mechanics are employed to solve machine learning tasks. A variety of techniques with different levels of quantum-classical hybridization have been presented. Here we focus on variational quantum circuits (VQC), which have emerged as the most promising candidates for the quantum counterpart of neural networks in the noisy intermediate-scale quantum (NISQ) era. Although showing promising results, VQCs can be hard to train because of several issues, e.g., barren plateaus, periodicity of the weights, or choice of the architecture. In this paper we focus on this last problem, and to address it we propose a gradient-free algorithm inspired by natural evolution to optimise both the weights and the architecture of the VQC. In particular, we present a version of the well-known neuroevolution of augmenting topologies (NEAT) algorithm adapted to the case of quantum variational circuits. We test the algorithm on benchmark problems from classical fields of machine learning, i.e., reinforcement learning and optimization.

MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[497]
L. Schneider, B. Bischl and J. Thomas.
Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models.
Genetic and Evolutionary Computation Conference (GECCO 2023). Lisbon, Portugal, Jul 15-19, 2023. DOI.
MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[496]
M. Wever, M. Özdogan and E. Hüllermeier.
Cooperative Co-Evolution for Ensembles of Nested Dichotomies for Multi-Class Classification.
Genetic and Evolutionary Computation Conference (GECCO 2023). Lisbon, Portugal, Jul 15-19, 2023. DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[495]
T. Fuchs, F. Krahmer and R. Kueng.
Greedy-type sparse recovery from heavy-tailed measurements.
International Conference on Sampling Theory and Applications (SampTA 2023). Yale University, New Haven, CT, USA, Jul 10-14, 2023. DOI.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[494]
F. Hoppe, F. Krahmer, C. M. Verdun, M. I. Menzel and H. Rauhut.
Sampling Strategies for Compressive Imaging Under Statistical Noise.
International Conference on Sampling Theory and Applications (SampTA 2023). Yale University, New Haven, CT, USA, Jul 10-14, 2023. DOI.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[493]
R. Joy, F. Krahmer, A. Lupoli and R. Ramakrishan.
Quantization of Bandlimited Functions Using Random Samples.
International Conference on Sampling Theory and Applications (SampTA 2023). Yale University, New Haven, CT, USA, Jul 10-14, 2023. DOI.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[492]
F. Krahmer, H. Lyu, R. Saab, A. Veselovska and R. Wang.
Quantization of Bandlimited Graph Signals.
International Conference on Sampling Theory and Applications (SampTA 2023). Yale University, New Haven, CT, USA, Jul 10-14, 2023. DOI.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Hanna Veselovska

Hanna Veselovska

Dr.

Optimization & Data Analysis


[491]
F. Krahmer and A. Veselovska.
Digital Halftoning via Mixed-Order Weighted Σ∆ Modulation.
International Conference on Sampling Theory and Applications (SampTA 2023). Yale University, New Haven, CT, USA, Jul 10-14, 2023. DOI.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Hanna Veselovska

Hanna Veselovska

Dr.

Optimization & Data Analysis


[490]
Y. Liu, A. Chronopoulou, H. Schütze and A. Fraser.
On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss.
20th International Conference on Spoken Language Translation (IWSLT 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[489]
A. Imani, P. Lin, A. H. Kargaran, S. Severini, M. J. Sabet, N. Kassner, C. Ma, H. Schmid, A. Martins, F. Yvon and H. Schütze.
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI. GitHub.
Abstract

The NLP community has mainly focused on scaling Large Language Models (LLMs) vertically, i.e., making them better for about 100 languages. We instead scale LLMs horizontally: we create, through continued pretraining, Glot500-m, an LLM that covers 511 predominantly low-resource languages. An important part of this effort is to collect and clean Glot500-c, a corpus that covers these 511 languages and allows us to train Glot500-m. We evaluate Glot500-m on five diverse tasks across these languages. We observe large improvements for both high-resource and low-resource languages compared to an XLM-R baseline. Our analysis shows that no single factor explains the quality of multilingual LLM representations. Rather, a combination of factors determines quality including corpus size, script, ‘help’ from related languages and the total capacity of the model. Our work addresses an important goal of NLP research: we should not limit NLP to a small fraction of the world’s languages and instead strive to support as many languages as possible to bring the benefits of NLP technology to all languages and cultures.

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Peiqin Lin

Peiqin Lin

Statistical NLP and Deep Learning

Link to Amir Hossein Kargaran

Amir Hossein Kargaran

Statistical NLP and Deep Learning

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Chunlan Ma

Chunlan Ma

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[488]
Y. Liu, H. Ye, L. Weissweiler, P. Wicke, R. Pei, R. Zangenfeind and H. Schütze.
A Crosslingual Investigation of Conceptualization in 1335 Languages.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Languages differ in how they divide up the world into concepts and words; e.g., in contrast to English, Swahili has a single concept for ‘belly’ and ‘womb’. We investigate these differences in conceptualization across 1,335 languages by aligning concepts in a parallel corpus. To this end, we propose Conceptualizer, a method that creates a bipartite directed alignment graph between source language concepts and sets of target language strings. In a detailed linguistic analysis across all languages for one concept (‘bird’) and an evaluation on gold standard data for 32 Swadesh concepts, we show that Conceptualizer has good alignment accuracy. We demonstrate the potential of research on conceptualization in NLP with two experiments. (1) We define crosslingual stability of a concept as the degree to which it has 1-1 correspondences across languages, and show that concreteness predicts stability. (2) We represent each language by its conceptualization pattern for 83 concepts, and define a similarity measure on these representations. The resulting measure for the conceptual similarity between two languages is complementary to standard genealogical, typological, and surface similarity measures. For four out of six language families, we can assign languages to their correct family based on conceptual similarity with accuracies between 54% and 87%.

MCML Authors
Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Leonie Weissweiler

Dr.

* Former member

Link to Philipp Wicke

Philipp Wicke

Dr.

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[487]
Y. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

We investigate response generation for multi-turn dialogue in generative chatbots. Existing generative models based on RNNs (Recurrent Neural Networks) usually employ the last hidden state to summarize the history, which makes models unable to capture the subtle variability observed in different dialogues and cannot distinguish the differences between dialogues that are similar in composition. In this paper, we propose Pseudo-Variational Gated Recurrent Unit (PVGRU). The key novelty of PVGRU is a recurrent summarizing variable that aggregates the accumulated distribution variations of subsequences. We train PVGRU without relying on posterior knowledge, thus avoiding the training-inference inconsistency problem. PVGRU can perceive subtle semantic variability through summarizing variables that are optimized by two objectives we employ for training: distribution consistency and reconstruction. In addition, we build a Pseudo-Variational Hierarchical Dialogue (PVHD) model based on PVGRU. Experimental results demonstrate that PVGRU can broadly improve the diversity and relevance of responses on two benchmark datasets.

MCML Authors
Link to Yongkang Liu

Yongkang Liu

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[486]
A. Modarressi, M. Fayyaz, E. Aghazadeh, Y. Yaghoobzadeh and M. T. Pilehvar.
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI. GitHub.
Abstract

An emerging solution for explaining Transformer-based models is to use vector-based analysis on how the representations are formed. However, providing a faithful vector-based explanation for a multi-layer model could be challenging in three aspects: (1) Incorporating all components into the analysis, (2) Aggregating the layer dynamics to determine the information flow and mixture throughout the entire model, and (3) Identifying the connection between the vector-based analysis and the model’s predictions. In this paper, we present DecompX to tackle these challenges. DecompX is based on the construction of decomposed token representations and their successive propagation throughout the model without mixing them in between layers. Additionally, our proposal provides multiple advantages over existing solutions for its inclusion of all encoder components (especially nonlinear feed-forward networks) and the classification head. The former allows acquiring precise vectors while the latter transforms the decomposition into meaningful prediction-based values, eliminating the need for norm- or summation-based vector aggregation. According to the standard faithfulness evaluations, DecompX consistently outperforms existing gradient-based and vector-based approaches on various datasets.

MCML Authors
Link to Ali Modarressi

Ali Modarressi

Statistical NLP and Deep Learning


[485]
M. Fromm, M. Berrendorf, E. Faerman and T. Seidl.
Cross-Domain Argument Quality Estimation.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI. GitHub.
Abstract

Argumentation is one of society’s foundational pillars, and, sparked by advances in NLP and the vast availability of text data, automated mining of arguments receives increasing attention. A decisive property of arguments is their strength or quality. While there are works on the automated estimation of argument strength, their scope is narrow: They focus on isolated datasets and neglect the interactions with related argument-mining tasks, such as argument identification and evidence detection. In this work, we close this gap by approaching argument quality estimation from multiple different angles: Grounded on rich results from thorough empirical evaluations, we assess the generalization capabilities of argument quality estimation across diverse domains and the interplay with related argument mining tasks. We find that generalization depends on a sufficient representation of different domains in the training part. In zero-shot transfer and multi-task experiments, we reveal that argument quality is among the more challenging tasks but can improve others.

MCML Authors
Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[484]
K. Hämmerl, B. Deiseroth, P. Schramowski, J. Libovický, C. Rothkopf, A. Fraser and K. Kersting.
Speaking Multiple Languages Affects the Moral Bias of Language Models.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer. However, PMLMs are trained on varying amounts of data for each language. In practice this means their performance is often much better on English than many other languages. We explore to what extent this also applies to moral norms. Do the models capture moral norms from English and impose them on other languages? Do the models exhibit random and thus potentially harmful beliefs in certain languages? Both these issues could negatively impact cross-lingual transfer and potentially lead to harmful outcomes. In this paper, we (1) apply the MORALDIRECTION framework to multilingual models, comparing results in German, Czech, Arabic, Chinese, and English, (2) analyse model behaviour on filtered parallel subtitles corpora, and (3) apply the models to a Moral Foundations Questionnaire, comparing with human responses from different countries. Our experiments demonstrate that, indeed, PMLMs encode differing moral biases, but these do not necessarily correspond to cultural differences or commonalities in human opinions. We release our code and models.

MCML Authors
Link to Katharina Hämmerl

Katharina Hämmerl

Data Analytics & Statistics

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[483]
K. Hämmerl, A. Fastowski, J. Libovický and A. Fraser.
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings, and typically display outlier dimensions. This seems to be true for both monolingual and multilingual models, although much less work has been done on the multilingual context. Why these outliers occur and how they affect the representations is still an active area of research. We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained multilingual language models. We focus on cross-lingual semantic similarity tasks, as these are natural tasks for evaluating multilingual representations. Specifically, we examine sentence representations. Sentence transformers which are fine-tuned on parallel resources (that are not always available) perform better on this task, and we show that their representations are more isotropic. However, we aim to improve multilingual representations in general. We investigate how much of the performance difference can be made up by only transforming the embedding space without fine-tuning, and visualise the resulting spaces. We test different operations: Removing individual outlier dimensions, cluster-based isotropy enhancement, and ZCA whitening. We publish our code for reproducibility.

MCML Authors
Link to Katharina Hämmerl

Katharina Hämmerl

Data Analytics & Statistics

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[482]
Z. Han, R. Liao, J. Gu, Y. Zhang, Z. Ding, Y. Gu, H. Köppl, H. Schütze and V. Tresp.
ECOLA: Enhancing Temporal Knowledge Embeddings with Contextualized Language Representations.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.

MCML Authors
Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Yao Zhang

Yao Zhang

Database Systems & Data Mining

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[481]
E. Nie, S. Liang, H. Schmid and H. Schütze.
Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Multilingual Pretrained Language Models (MPLMs) perform strongly in cross-lingual transfer. We propose Prompts Augmented by Retrieval Crosslingually (PARC) to improve zero-shot performance on low-resource languages (LRLs) by augmenting the context with prompts consisting of semantically similar sentences retrieved from a high-resource language (HRL). PARC improves zero-shot performance on three downstream tasks (sentiment classification, topic categorization, natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in unlabeled (+5.1%) and labeled settings (+16.3%). PARC also outperforms finetuning by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.

MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Sheng Liang

Sheng Liang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[480]
L. Weber and B. Plank.
ActiveAED: A Human in the Loop Improves Annotation Error Detection.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Manually annotated datasets are crucial for training and evaluating Natural Language Processing models. However, recent work has discovered that even widely-used benchmark datasets contain a substantial number of erroneous annotations. This problem has been addressed with Annotation Error Detection (AED) models, which can flag such errors for human re-annotation. However, even though many of these AED methods assume a final curation step in which a human annotator decides whether the annotation is erroneous, they have been developed as static models without any human-in-the-loop component. In this work, we propose ActiveAED, an AED method that can detect errors more accurately by repeatedly querying a human for error corrections in its prediction loop. We evaluate ActiveAED on eight datasets spanning five different tasks and find that it leads to improvements over the state of the art on seven of them, with gains of up to six percentage points in average precision.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[479]
P. Wicke.
LMs stand their Ground: Investigating the Effect of Embodiment in Figurative Language Interpretation by Language Models.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Figurative language is a challenge for language models since its interpretation is based on the use of words in a way that deviates from their conventional order and meaning. Yet, humans can easily understand and interpret metaphors, similes or idioms as they can be derived from embodied metaphors. Language is a proxy for embodiment and if a metaphor is conventional and lexicalised, it becomes easier for a system without a body to make sense of embodied concepts. Yet, the intricate relation between embodiment and features such as concreteness or age of acquisition has not been studied in the context of figurative language interpretation concerning language models. Hence, the presented study shows how larger language models perform better at interpreting metaphoric sentences when the action of the metaphorical sentence is more embodied. The analysis rules out multicollinearity with other features (e.g. word length or concreteness) and provides initial evidence that larger language models conceptualise embodied concepts to a degree that facilitates figurative language understanding.

MCML Authors
Link to Philipp Wicke

Philipp Wicke

Dr.

Statistical NLP and Deep Learning


[478]
P. Wicke, L. K. Senel, S. Zhang, L. Figueredo, A. Naceri, S. Haddadin and H. Schütze.
Towards Language-Based Modulation of Assistive Robots through Multimodal Models.
2nd Geriatronics Summit (Geriatronics Summit 2023). Garmisch-Partenkirchen, Germany, Jul 02-03, 2023. arXiv.
Abstract

In the field of Geriatronics, enabling effective and transparent communication between humans and robots is crucial for enhancing the acceptance and performance of assistive robots. Our early-stage research project investigates the potential of language-based modulation as a means to improve human-robot interaction. We propose to explore real-time modulation during task execution, leveraging language cues, visual references, and multimodal inputs. By developing transparent and interpretable methods, we aim to enable robots to adapt and respond to language commands, enhancing their usability and flexibility. Through the exchange of insights and knowledge at the workshop, we seek to gather valuable feedback to advance our research and contribute to the development of interactive robotic systems for Geriatronics and beyond.

MCML Authors
Link to Philipp Wicke

Philipp Wicke

Dr.

Statistical NLP and Deep Learning

Link to Shengqiang Zhang

Shengqiang Zhang

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[477]
G. Kutyniok.
An introduction to the mathematics of deep learning.
European Congress of Mathematics (Jul. 2023). DOI.
MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[476]
B. X. W. Liew, D. Rügamer, Q. Mei, Z. Altai, X. Zhu, X. Zhai and N. Cortes.
Smooth and accurate predictions of joint contact force timeseries in gait using overparameterised deep neural networks.
Frontiers in Bioengineering and Biotechnology 11 (Jul. 2023). DOI.
Abstract

Alterations in joint contact forces (JCFs) are thought to be important mechanisms for the onset and progression of many musculoskeletal and orthopaedic pain disorders. Computational approaches to JCF assessment represent the only non-invasive means of estimating in-vivo forces, but this cannot be undertaken in free-living environments. Here, we used deep neural networks to train models to predict JCFs, using only joint angles as predictors. Our neural network models were generally able to predict JCFs with errors within published minimal detectable change values. The errors ranged from the lowest value of 0.03 bodyweight (BW) (ankle medial-lateral JCF in walking) to a maximum of 0.65 BW (knee VT JCF in running). Interestingly, we also found that overparameterising neural networks by training for more epochs (>100) resulted in better and smoother waveform predictions. Our methods for predicting JCFs using only joint kinematics hold a lot of promise in allowing clinicians and coaches to continuously monitor tissue loading in free-living environments.

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[475]
M. Aßenmacher, N. Sauter and C. Heumann.
Classifying multilingual party manifestos: Domain transfer across country, time, and genre.
Preprint at arXiv (Jul. 2023). arXiv.
Abstract

Annotating costs of large corpora are still one of the main bottlenecks in empirical social science research. On the one hand, making use of the capabilities of domain transfer allows re-using annotated data sets and trained models. On the other hand, it is not clear how well domain transfer works and how reliable the results are for transfer across different dimensions. We explore the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos. First, we show the strong within-domain classification performance of fine-tuned transformer models. Second, we vary the genre of the test set across the aforementioned dimensions to test for the fine-tuned models’ robustness and transferability. For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians while for the other three dimensions, custom splits of the Manifesto database are used. While BERT achieves the best scores in the initial experiments across modalities, DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country. The results of the additional analysis show that (Distil)BERT can be applied to future data with similar performance. Moreover, we observe (partly) notable differences between the political manifestos of different countries of origin, even if these countries share a language or a cultural background.

MCML Authors
Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[474]
J. Baan, N. Daheim, E. Ilia, D. Ulmer, H.-S. Li, R. Fernández, B. Plank, R. Sennrich, C. Zerva and W. Aziz.
Uncertainty in Natural Language Generation: From Theory to Applications.
Preprint at arXiv (Jul. 2023). arXiv.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[473]
A. Bacho, H. Boche and G. Kutyniok.
Reliable AI: Does the Next Generation Require Quantum Computing?.
Preprint at arXiv (Jul. 2023). arXiv.
Abstract

In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[472]
J. Gu, Z. Han, S. Chen, A. Beirami, B. He, G. Zhang, R. Liao, Y. Qin, V. Tresp and P. Torr.
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
Preprint at arXiv (Jul. 2023). arXiv.
Abstract

Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering makes it possible to perform predictions based solely on prompts without updating model parameters, and eases the application of large pre-trained models to real-world tasks. In past years, prompt engineering has been well-studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, there is currently a lack of a systematic overview of prompt engineering on pre-trained vision-language models. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation models (e.g. Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are also discussed. The challenges, future directions, and research opportunities are summarized to foster future research on this topic.

MCML Authors
Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Gengyuan Zhang

Gengyuan Zhang

Database Systems & Data Mining

Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[471]
F. Krahmer and A. Veselovska.
Enhanced Digital Halftoning via Weighted Sigma-Delta Modulation.
SIAM Journal on Imaging Sciences 16.3 (Jul. 2023). DOI.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis

Link to Hanna Veselovska

Hanna Veselovska

Dr.

Optimization & Data Analysis


[470]
C. Kolb, C. L. Müller, B. Bischl and D. Rügamer.
Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization.
Under Review (Jul. 2023). arXiv.
MCML Authors
Link to Chris Kolb

Chris Kolb

Statistical Learning & Data Science

Link to Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[469]
C. Reinkemeyer, Y. Khazaei, M. Weigert, M. Hannes, R. Le Gleut, M. Plank, S. Winter, I. Norena, T. Meier, L. Xu, R. Rubio-Acero, S. Wiegrebe, T. G. Le Thi, C. Fuchs, K. Radon, I. Paunovic, C. Janke, A. Wieser, H. Küchenhoff, M. Hoelscher, N. Castelletti and the KoCo-Impf/ORCHESTRA Working Group.
The Prospective COVID-19 Post-Immunization Serological Cohort in Munich (KoCo-Impf): Risk Factors and Determinants of Immune Response in Healthcare Workers.
Viruses 15.7 (Jul. 2023). DOI.
MCML Authors
Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)


[468]
I. van Mechelen, A.-L. Boulesteix, R. Dangl, N. Dean, C. Hennig, F. Leisch, D. Steinley and M. J. Warrens.
A white paper on good research practices in benchmarking: The case of cluster analysis.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13.6 (Jul. 2023). DOI.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[467]
A. Farshad.
Representation learning for semantic scene understanding.
2nd International Conference on Hybrid Human-Artificial Intelligence (HHAI 2023). Munich, Germany, Jun 26-30, 2023. DOI.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[466]
M. Eisenberger, A. Toker, L. Leal-Taixé and D. Cremers.
G-MSM: Unsupervised Multi-Shape Matching with Graph-based Affinity Priors.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[465]
L. Härenstam-Nielsen, N. Zeller and D. Cremers.
Semidefinite Relaxations for Robust Multiview Triangulation.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[464]
D. Kotovenko, P. Ma, T. Milbich and B. Ommer.
Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
Abstract

Learning compact image embeddings that yield semantic similarities between images and that generalize to unseen test classes is at the core of deep metric learning (DML). Finding a mapping from a rich, localized image feature map onto a compact embedding vector is challenging: Although similarity emerges between tuples of images, DML approaches marginalize out information in an individual image before considering another image to which similarity is to be computed. Instead, we propose during training to condition the embedding of an image on the image we want to compare it to. Rather than embedding by a simple pooling as in standard DML, we use cross-attention so that one image can identify relevant features in the other image. Consequently, the attention mechanism establishes a hierarchy of conditional embeddings that gradually incorporates information about the tuple to steer the representation of an individual image. The cross-attention layers bridge the gap between the original unconditional embedding and the final similarity and allow backpropagation to update encodings more directly than through a lossy pooling layer. At test time we use the resulting improved unconditional embeddings, thus requiring no additional parameters or computational overhead. Experiments on established DML benchmarks show that our cross-attention conditional embedding during training improves the underlying standard DML pipeline significantly so that it outperforms the state-of-the-art.

MCML Authors
Link to Pingchuan Ma

Pingchuan Ma

Machine Vision & Learning

Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[463]
Y. Mansour and R. Heckel.
Zero-Shot Noise2Noise: Efficient Image Denoising without any Data.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Reinhard Heckel

Reinhard Heckel

Prof. Dr.

Machine Learning


[462]
D. Muhle, L. Koestler, K. M. Jatavallabhula and D. Cremers.
Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Dominik Muhle

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[461]
J. Seidenschwarz, G. Braso, I. Elezi and L. Leal-Taixé.
Simple Cues Lead to a Strong Multi-Object Tracker.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member


[460]
S. Weber, N. Demmel, T. Chon Chan and D. Cremers.
Power Bundle Adjustment for Large-Scale 3D Reconstruction.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Simon Weber

Simon Weber

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[459]
F. Wimbauer, N. Yang, C. Rupprecht and D. Cremers.
Behind the Scenes: Density Fields for Single View Reconstruction.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023). Vancouver, Canada, Jun 18-23, 2023. DOI.
MCML Authors
Link to Felix Wimbauer

Felix Wimbauer

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[458]
D. Bär, N. Pröllochs and S. Feuerriegel.
Finding Qs: Profiling QAnon Supporters on Parler.
17th International AAAI Conference on Web and Social Media (ICWSM 2023). Limassol, Cyprus, Jun 05-08, 2023. DOI.
MCML Authors
Link to Dominik Bär

Dominik Bär

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[457]
Ç. Yapar, F. Jaensch, R. Levie, G. Kutyniok and G. Caire.
The First Pathloss Radio Map Prediction Challenge.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). Rhodes Island, Greece, Jun 04-10, 2023. DOI.
MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[456]
X. Zuo, N. Yang, N. Merrill, B. Xu and S. Leutenegger.
Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion.
IEEE Robotics and Automation Letters 8.6 (Jun. 2023). DOI.
MCML Authors
Link to Xingxing Zuo

Xingxing Zuo

Dr.

Machine Learning for Robotics

Link to Stefan Leutenegger

Stefan Leutenegger

Prof. Dr.

Machine Learning for Robotics


[455]
M. Rezaei, A. Vahidi, T. Elze, B. Bischl and M. Eslami.
Self-supervised Learning and Self-labeling Framework for Glaucoma Detection.
Investigative Ophthalmology and Visual Science 64.8 (Jun. 2023). URL.
MCML Authors
Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[454]
T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever and E. Hüllermeier.
Towards Green Automated Machine Learning: Status Quo and Future Directions.
Journal of Artificial Intelligence Research 77 (Jun. 2023). DOI.
MCML Authors
Link to Jonas Hanselle

Jonas Hanselle

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[453]
M. Trappmann, G.-C. Haas, S. Malich, F. Keusch, S. Bähr, F. Kreuter and S. Schwarz.
Augmenting survey data with digital trace data: Is there a threat to panel retention?.
Journal of Survey Statistics and Methodology 11.3 (Jun. 2023). DOI.
MCML Authors
Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[452]
J. Herbinger, B. Bischl and G. Casalicchio.
Decomposing Global Feature Effects Based on Feature Interactions.
Preprint at arXiv (Jun. 2023). arXiv.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[451]
K. Riedl, T. Klock, C. Geldhauser and M. Fornasier.
Gradient is All You Need?.
Preprint at arXiv (Jun. 2023). arXiv.
MCML Authors
Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis

Link to Carina Geldhauser

Carina Geldhauser

Dr.

* Former member

Link to Massimo Fornasier

Massimo Fornasier

Prof. Dr.

Applied Numerical Analysis


[450]
V. Steinborn, A. Maronikolakis and H. Schütze.
Politeness Stereotypes and Attack Vectors: Gender Stereotypes in Japanese and Korean Language Models.
Preprint at arXiv (Jun. 2023). arXiv.
Abstract

In efforts to keep up with the rapid progress and use of large language models, gender bias research is becoming more prevalent in NLP. Non-English bias research, however, is still in its infancy with most work focusing on English. In our work, we study how grammatical gender bias relating to politeness levels manifests in Japanese and Korean language models. Linguistic studies in these languages have identified a connection between gender bias and politeness levels; however, it is not yet known whether language models reproduce these biases. We analyze relative prediction probabilities of the male and female grammatical genders using templates and find that informal polite speech is most indicative of the female grammatical gender, while rude and formal speech is most indicative of the male grammatical gender. Further, we find politeness levels to be an attack vector for allocational gender bias in cyberbullying detection models. Cyberbullies can evade detection through simple techniques abusing politeness levels. We introduce an attack dataset to (i) identify representational gender bias across politeness levels, (ii) demonstrate how gender biases can be abused to bypass cyberbullying detection models and (iii) show that allocational biases can be mitigated via training on our proposed dataset. Through our findings we highlight the importance of bias research moving beyond its current English-centrism.

MCML Authors
Link to Victor Steinborn

Victor Steinborn

Statistical NLP and Deep Learning

Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[449]
R. Hornung, F. Ludwigs, J. Hagenberg and A.-L. Boulesteix.
Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study.
Wiley Interdisciplinary Reviews: Computational Statistics 16 (Jun. 2023). DOI.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[448]
J. W. Grootjen, H. Weingärtner and S. Mayer.
Highlighting the Challenges of Blinks in Eye Tracking for Interactive Systems.
8th International Workshop on Pervasive Eye Tracking and Mobile Eye-Based Interaction (PETMEI 2023) at the ACM Symposium on Eye Tracking Research and Applications (ETRA 2023). Tübingen, Germany, May 30-Jun 02, 2023. DOI.
MCML Authors
Link to Jesse Grootjen

Jesse Grootjen

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[447]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis.
27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2023). Osaka, Japan, May 25-28, 2023. DOI.
Abstract

While recent advances in large-scale foundational models show promising results, their application to the medical domain has not yet been explored in detail. In this paper, we progress into the realms of large-scale modeling in medical synthesis by proposing Cheff - a foundational cascaded latent diffusion model, which generates highly-realistic chest radiographs providing state-of-the-art quality on a 1-megapixel scale. We further propose MaCheX, which is a unified interface for public chest datasets and forms the largest open collection of chest X-rays to date. With Cheff conditioned on radiological reports, we further guide the synthesis process over text prompts and unveil the research area of report-to-chest-X-ray generation.

MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[446]
D. Winkel, N. Strauß, M. Schubert, Y. Ma and T. Seidl.
Constrained Portfolio Management using Action Space Decomposition for Reinforcement Learning.
27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2023). Osaka, Japan, May 25-28, 2023. DOI.
Abstract

Financial portfolio managers typically face multi-period optimization tasks such as short-selling or investing at least a particular portion of the portfolio in a specific industry sector. A common approach to tackle these problems is to use constrained Markov decision process (CMDP) methods, which may suffer from sample inefficiency, hyperparameter tuning, and lack of guarantees for constraint violations. In this paper, we propose Action Space Decomposition Based Optimization (ADBO) for optimizing a more straightforward surrogate task that allows actions to be mapped back to the original task. We examine our method on two real-world data portfolio construction tasks. The results show that our new approach consistently outperforms state-of-the-art benchmark approaches for general CMDPs.

MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[445]
V. Blaschke, H. Schütze and B. Plank.
A Survey of Corpora for Germanic Low-Resource Languages and Dialects.
24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023). Tórshavn, Faroe Islands, May 22-24, 2023. URL.
MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[444]
A. K. Wickert, C. Damke, L. Baumgärtner, E. Hüllermeier and M. Mezini.
UnGoML: Automated Classification of unsafe Usages in Go.
IEEE/ACM 20th International Conference on Mining Software Repositories (MSR 2023). Melbourne, Australia, May 15-16, 2023. FOSS (Free, Open Source Software) Impact Paper Award. DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[443]
T. Ladner and M. Althoff.
Automatic Abstraction Refinement in Neural Network Verification Using Sensitivity Analysis.
26th ACM International Conference on Hybrid Systems: Computation and Control (HSCC 2023). San Antonio, TX, USA, May 09-12, 2023. DOI.
Abstract

The formal verification of neural networks is essential for their application in safety-critical environments. However, the set-based verification of neural networks using linear approximations often obtains overly conservative results, while nonlinear approximations quickly become computationally infeasible in deep neural networks. We address this issue for the first time by automatically balancing between precision and computation time without splitting the propagated set. Our work introduces a novel automatic abstraction refinement approach using sensitivity analysis to iteratively reduce the abstraction error at the neuron level until either the specifications are met or a maximum number of iterations is reached. Our evaluation shows that we can tightly over-approximate the output sets of deep neural networks and that our approach is up to a thousand times faster than a naive approach. We further demonstrate the applicability of our approach in closed-loop settings.

MCML Authors
Link to Tobias Ladner

Tobias Ladner

Cyber Physical Systems

Link to Matthias Althoff

Matthias Althoff

Prof. Dr.

Cyber Physical Systems


[442]
V. Ehm, D. Cremers and F. Bernard.
Non-Separable Multi-Dimensional Network Flows for Visual Computing.
Poster at the 44th Annual Conference of the European Association for Computer Graphics (EG 2023). Saarbrücken, Germany, May 08-12, 2023. DOI.
MCML Authors
Link to Viktoria Ehm

Viktoria Ehm

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[441]
V. Blaschke, H. Schütze and B. Plank.
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages.
10th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[440]
X. Wang, L. Weissweiler, H. Schütze and B. Plank.
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives.
17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

Link to Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[439]
A. Chronopoulou, D. Stojanovski and A. Fraser.
Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation.
6th Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023) at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[438]
A. Chronopoulou, M. Peters, A. Fraser and J. Dodge.
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models.
Findings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
Abstract

Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance to new domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity. We find that using clustering leads to the most competitive results on novel domains.

MCML Authors
Link to Alexandra Chronopoulou

Alexandra Chronopoulou

Dr.

* Former member

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[437]
D. Frauen and S. Feuerriegel.
Estimating individual treatment effects under unobserved confounding using binary instruments.
11th International Conference on Learning Representations (ICLR 2023). Kigali, Rwanda, May 01-05, 2023. URL.
Abstract

Estimating conditional average treatment effects (CATEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables and thus introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where the treatment assignment is used as binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating CATEs using binary IVs and thus yield an unbiased CATE estimator. Different from previous work for binary IVs, our framework estimates the CATE directly via a pseudo outcome regression. (1) We provide a theoretical analysis where we show that our framework yields multiple robust convergence rates: our CATE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for CATE estimation, in the sense that it achieves a faster rate of convergence if the CATE is smoother than the individual outcome surfaces. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for CATE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, our MRIV is the first multiply robust machine learning framework tailored to estimating CATEs in the binary IV setting.

MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[436]
R. Paolino, A. Bojchevski, S. Günnemann, G. Kutyniok and R. Levie.
Unveiling the Sampling Density in Non-Uniform Geometric Graphs.
11th International Conference on Learning Representations (ICLR 2023). Kigali, Rwanda, May 01-05, 2023. URL.
Abstract

A powerful framework for studying graphs is to consider them as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius. Currently, the literature mostly focuses on uniform sampling and constant neighborhood radius. However, real-world graphs are likely to be better represented by a model in which the sampling density and the neighborhood radius can both vary over the latent space. For instance, in a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius. In this work, we first perform a rigorous mathematical analysis of this (more general) class of models, including derivations of the resulting graph shift operators. The key insight is that graph shift operators should be corrected in order to avoid potential distortions introduced by the non-uniform sampling. Then, we develop methods to estimate the unknown sampling density in a self-supervised fashion. Finally, we present exemplary applications in which the learnt density is used to 1) correct the graph shift operator and improve performance on a variety of tasks, 2) improve pooling, and 3) extract knowledge from networks. Our experimental findings support our theory and provide strong evidence for our model.

MCML Authors
Link to Raffaele Paolino

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[435]
T. Pielok, B. Bischl and D. Rügamer.
Approximate Bayesian Inference with Stein Functional Variational Gradient Descent.
11th International Conference on Learning Representations (ICLR 2023). Kigali, Rwanda, May 01-05, 2023. URL.
MCML Authors
Link to Tobias Pielok

Tobias Pielok

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[434]
H. Huang, J. Qiu and K. Riedl.
On the global convergence of particle swarm optimization methods.
Applied Mathematics and Optimization 88.2 (May. 2023). DOI.
MCML Authors
Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis


[433]
K. Rath, D. Rügamer, B. Bischl, U. von Toussaint and C. Albert.
Dependent state space Student-t processes for imputation and data augmentation in plasma diagnostics.
Contributions to Plasma Physics 63.5-6 (May. 2023). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[432]
Z. Liu, Y. Ma, M. Schubert, Y. Ouyang, W. Rong and Z. Xiong.
Multimodal Contrastive Transformer for Explainable Recommendation.
IEEE Transactions on Computational Social Systems (May. 2023). DOI.
MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[431]
N. Banholzer, T. Mellan, H. J. T. Unwin, S. Feuerriegel, S. Mishra and S. Bhatt.
A comparison of short-term probabilistic forecasts for the incidence of COVID-19 using mechanistic and statistical time series models.
Preprint at arXiv (May. 2023). arXiv.
Abstract

Short-term forecasts of infectious disease spread are a critical component in risk evaluation and public health decision making. While different models for short-term forecasting have been developed, open questions about their relative performance remain. Here, we compare short-term probabilistic forecasts of popular mechanistic models based on the renewal equation with forecasts of statistical time series models. Our empirical comparison is based on data of the daily incidence of COVID-19 across six large US states over the first pandemic year. We find that, on average, probabilistic forecasts from statistical time series models are overall at least as accurate as forecasts from mechanistic models. Moreover, statistical time series models better capture volatility. Our findings suggest that domain knowledge, which is integrated into mechanistic models by making assumptions about disease dynamics, does not improve short-term forecasts of disease incidence. We note, however, that forecasting is often only one of many objectives and thus mechanistic models remain important, for example, to model the impact of vaccines or the emergence of new variants.

MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[430]
H.-H. Chou, H. Rauhut and R. Ward.
Robust implicit regularization via weight normalization.
Preprint at arXiv (May. 2023). arXiv.
MCML Authors
Link to Hung-Hsu Chou

Hung-Hsu Chou

Dr.

Optimization & Data Analysis

Link to Holger Rauhut

Holger Rauhut

Prof. Dr.

Mathematical Data Science and Artificial Intelligence


[429]
H. N. Dang, V. Golkov, T. Wimmer, D. Cremers, A. Maier and M. Zaiss.
Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolution.
Preprint at arXiv (May. 2023). arXiv.
MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[428]
T. Hannan, R. Koner, M. Bernhard, S. Shit, B. Menze, V. Tresp, M. Schubert and T. Seidl.
GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation.
Preprint at arXiv (May. 2023). arXiv.
MCML Authors
Link to Tanveer Hannan

Tanveer Hannan

Database Systems & Data Mining

Link to Rajat Koner

Rajat Koner

Database Systems & Data Mining

Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[427]
Y. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response.
Preprint at arXiv (May. 2023). arXiv.
Abstract

LLMs (large language models) such as ChatGPT have shown remarkable language understanding and generation capabilities. Although reference-free evaluators based on LLMs show better human alignment than traditional reference-based evaluators, there are many challenges in using reference-free evaluators based on LLMs. Reference-free evaluators are more suitable for open-ended examples that admit responses with different semantics. But not all examples are open-ended. For closed-ended examples with a unique correct semantic response, reference-free evaluators will still judge a response to be of high quality even when it is inconsistent with the facts and the semantics of the reference. In order to comprehensively evaluate the reliability of evaluators based on LLMs, we construct two adversarial meta-evaluation dialogue generation datasets, KdConv-ADV and DSTC7-ADV, based on KdConv and DSTC7-AVSD, respectively. Compared to previous meta-evaluation benchmarks, KdConv-ADV and DSTC7-ADV are much more challenging since they require evaluators to reasonably evaluate closed-ended examples with the help of external knowledge or even their own knowledge. Empirical results show that the ability of LLMs to identify unreasonable responses is insufficient. There are risks in using reference-free evaluators based on LLMs to evaluate the quality of dialogue responses.

MCML Authors
Link to Yongkang Liu

Yongkang Liu

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[426]
A. Modarressi, A. Imani, M. Fayyaz and H. Schütze.
RET-LLM: Towards a General Read-Write Memory for Large Language Models.
Preprint at arXiv (May. 2023). arXiv.
MCML Authors
Link to Ali Modarressi

Ali Modarressi

Statistical NLP and Deep Learning

Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[425]
H. Ye, Y. Liu and H. Schütze.
A study of conceptual language similarity: comparison and evaluation.
Preprint at arXiv (May. 2023). arXiv.
Abstract

An interesting line of research in natural language processing (NLP) aims to incorporate linguistic typology to bridge linguistic diversity and assist the research of low-resource languages. While most works construct linguistic similarity measures based on lexical or typological features, such as word order and verbal inflection, recent work has introduced a novel approach to defining language similarity based on how they represent basic concepts, which is complementary to existing similarity measures. In this work, we study the conceptual similarity in detail and evaluate it extensively on a binary classification task.

MCML Authors
Link to Haotian Ye

Haotian Ye

Statistical NLP and Deep Learning

Link to Yihong Liu

Yihong Liu

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[424]
D. Bär, F. Calderon, M. Lawlor, S. Licklederer, M. Totzauer and S. Feuerriegel.
Analyzing Social Media Activities at Bellingcat.
15th ACM Web Science Conference 2023 (WebSci 2023). Austin, TX, USA, Apr 30-May 01, 2023. DOI.
Abstract

Open-source journalism emerged as a new phenomenon in the media ecosystem, which uses crowdsourcing to fact-check and generate investigative reports for world events using open sources (e.g., social media). A particularly prominent example is Bellingcat. Bellingcat is known for its investigations on the illegal use of chemical weapons during the Syrian war, the Russian responsibility for downing flight MH17, the identification of the perpetrators in the attempted murder of Alexei Navalny, and war crimes in the Russo-Ukraine war. Social media is crucial for this, both to disseminate findings and to crowdsource fact-checks. In this work, we characterize the social media activities at Bellingcat on Twitter. For this, we built a comprehensive dataset of all N=24,682 tweets posted by Bellingcat on Twitter since its inception in July 2014. Our analysis is three-fold: (1) We analyze how Bellingcat uses Twitter to disseminate information and collect information from its follower base. Here, we find a steady increase in both posts and replies over time, particularly during the Russo-Ukrainian war, which is in line with the growing importance of Bellingcat for the traditional media ecosystem. (2) We identify characteristics of posts that are successful in eliciting user engagement. User engagement is particularly large for posts embedding additional media items and with a more negative sentiment. (3) We examine how the follower base has responded to the Russian invasion of Ukraine. Here, we find that the sentiment has become more polarized and negative. We attribute this to a ~13-fold increase in bots interacting with the Bellingcat account. Overall, our findings provide recommendations for how open-source journalism such as Bellingcat can successfully operate on social media.

MCML Authors
Link to Dominik Bär

Dominik Bär

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[423]
L. G. M. Bauer, C. Leiber, C. Böhm and C. Plant.
Extension of the Dip-test Repertoire - Efficient and Differentiable p-value Calculation for Clustering.
SIAM International Conference on Data Mining (SDM 2023). Minneapolis, MN, USA, Apr 27-29, 2023. DOI.
Abstract

Over the last decade, the Dip-test of unimodality has gained increasing interest in the data mining community as it is a parameter-free statistical test that reliably rates the modality in one-dimensional samples. It returns a so-called Dip-value and a corresponding probability for the sample’s unimodality (Dip-p-value). These two values share a sigmoidal relationship. However, the specific transformation is dependent on the sample size. Many Dip-based clustering algorithms use bootstrapped look-up tables translating Dip- to Dip-p-values for a certain limited number of sample sizes. We propose a specifically designed sigmoid function as a substitute for these state-of-the-art look-up tables. This accelerates computation and provides an approximation of the Dip- to Dip-p-value transformation for every single sample size. Further, it is differentiable and can therefore easily be integrated in learning schemes using gradient descent. We showcase this by exploiting our function in a novel subspace clustering algorithm called Dip’n’Sub. We highlight in extensive experiments the various benefits of our proposal.
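The idea of a sample-size-aware sigmoid can be sketched as follows. The shape below and the constants `a` and `b` are placeholders of our own, not the coefficients fitted in the paper; the Dip statistic shrinks roughly like 1/sqrt(n), which motivates the rescaling before the sigmoid.

```python
import math

def dip_to_pvalue(dip, n, a=17.0, b=5.5):
    """Illustrative sigmoid mapping a Dip statistic and a sample size
    n to an approximate unimodality p-value.  The rescaling by
    sqrt(n) absorbs the sample-size dependence; a and b are
    placeholder constants, not the published fit."""
    z = dip * math.sqrt(n)
    return 1.0 / (1.0 + math.exp(a * z - b))
```

Because the expression is a smooth function of the Dip-value, its gradient is available in closed form, which is what makes it usable inside gradient-descent-based clustering objectives.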

MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[422]
E. Dorigatti, B. Schubert, B. Bischl and D. Rügamer.
Frequentist Uncertainty Quantification in Semi-Structured Neural Networks.
26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023). Valencia, Spain, Apr 25-27, 2023. URL.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[421]
G. Keropyan, D. Strieder and M. Drton.
Rank-Based Causal Discovery for Post-Nonlinear Models.
26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023). Valencia, Spain, Apr 25-27, 2023. URL.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[420]
C. Luther, G. König and M. Grosse-Wentrup.
Efficient SAGE Estimation via Causal Structure Learning.
26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023). Valencia, Spain, Apr 25-27, 2023. URL.
Abstract

The Shapley Additive Global Importance (SAGE) value is a theoretically appealing interpretability method that fairly attributes global importance to a model’s features. However, its exact calculation requires the computation of the feature’s surplus performance contributions over an exponential number of feature sets. This is computationally expensive, particularly because estimating the surplus contributions requires sampling from conditional distributions. Thus, SAGE approximation algorithms only take a fraction of the feature sets into account. We propose $d$-SAGE, a method that accelerates SAGE approximation. $d$-SAGE is motivated by the observation that conditional independencies (CIs) between a feature and the model target imply zero surplus contributions, such that their computation can be skipped. To identify CIs, we leverage causal structure learning (CSL) to infer a graph that encodes (conditional) independencies in the data as $d$-separations. This is computationally more efficient because the expense of the one-time graph inference and the $d$-separation queries is negligible compared to the expense of surplus contribution evaluations. Empirically we demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
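The skipping idea can be sketched with a toy permutation-sampling estimator. The function names and the `contrib_fn` callback are our illustrative stand-ins (in the actual method the surplus contribution requires expensive model evaluations, and the set of skippable features comes from $d$-separation queries on a learned causal graph):

```python
import random

def sage_with_skips(contrib_fn, features, independent, n_perms=200, seed=0):
    """Toy permutation-sampling SAGE estimator.  Features declared
    conditionally independent of the target have provably zero
    surplus contribution, so contrib_fn is never called for them;
    contrib_fn(feature, coalition) stands in for the expensive
    surplus-performance evaluation."""
    rng = random.Random(seed)
    values = {f: 0.0 for f in features}
    for _ in range(n_perms):
        perm = features[:]
        rng.shuffle(perm)
        coalition = set()
        for f in perm:
            if f not in independent:   # CI implies zero surplus: skip
                values[f] += contrib_fn(f, frozenset(coalition))
            coalition.add(f)
    return {f: v / n_perms for f, v in values.items()}
```

Every skipped evaluation saves the sampling from conditional distributions that makes surplus estimation costly, which is where the speed-up comes from.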

MCML Authors
Link to Moritz Grosse-Wentrup

Moritz Grosse-Wentrup

Prof. Dr.

* Former member


[419]
N. Pröllochs and S. Feuerriegel.
Mechanisms of True and False Rumor Sharing in Social Media: Collective Intelligence or Herd Behavior?.
Conference on Human Factors in Computing Systems (CHI 2023). Hamburg, Germany, Apr 23-28, 2023. DOI.
MCML Authors
Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[418]
M. Rusu and S. Mayer.
Deep Learning Super-Resolution Network Facilitating Fiducial Tangibles on Capacitive Touchscreens.
Conference on Human Factors in Computing Systems (CHI 2023). Hamburg, Germany, Apr 23-28, 2023. DOI.
Abstract

Over the last few years, we have seen many approaches using tangibles to address the limited expressiveness of touchscreens. Mainstream tangible detection uses fiducial markers embedded in the tangibles. However, the coarse sensor size of capacitive touchscreens makes tangibles bulky, limiting their usefulness. We propose a novel deep-learning super-resolution network to facilitate fiducial tangibles on capacitive touchscreens better. In detail, our network super-resolves the markers enabling off-the-shelf detection algorithms to track tangibles reliably. Our network generalizes to unseen marker sets, such as AprilTag, ArUco, and ARToolKit. Therefore, we are not limited to a fixed number of distinguishable objects and do not require data collection and network training for new fiducial markers. With extensive evaluation, including real-world users and five showcases, we demonstrate the applicability of our open-source approach on commodity mobile devices and further highlight the potential of tangibles on capacitive touchscreens.

MCML Authors
Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[417]
M. Windl, A. Schmidt and S. S. Feger.
Investigating Tangible Privacy-Preserving Mechanisms for Future Smart Homes.
Conference on Human Factors in Computing Systems (CHI 2023). Hamburg, Germany, Apr 23-28, 2023. DOI.
MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[416]
M. Windl, V. Winterhalter, A. Schmidt and S. Mayer.
Understanding and Mitigating Technology-Facilitated Privacy Violations in the Physical World.
Conference on Human Factors in Computing Systems (CHI 2023). Hamburg, Germany, Apr 23-28, 2023. DOI.
MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[415]
M. Feurer, K. Eggensperger, E. Bergman, F. Pfisterer, B. Bischl and F. Hutter.
Mind the Gap: Measuring Generalization Performance Across Multiple Objectives.
21st International Symposium on Intelligent Data Analysis (IDA 2023). Louvain-la-Neuve, Belgium, Apr 12-14, 2023. DOI.
MCML Authors
Link to Matthias Feurer

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[414]
D. Schubert, P. Gupta and M. Wever.
Meta-learning for Automated Selection of Anomaly Detectors for Semi-supervised Datasets.
21st International Symposium on Intelligent Data Analysis (IDA 2023). Louvain-la-Neuve, Belgium, Apr 12-14, 2023. DOI.
MCML Authors

[413]
D. Schalk, B. Bischl and D. Rügamer.
Accelerated Componentwise Gradient Boosting Using Efficient Data Representation and Momentum-Based Optimization.
Journal of Computational and Graphical Statistics 32.2 (Apr. 2023). DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[412]
M. K. Belaid, D. E. Mekki, M. Rabus and E. Hüllermeier.
Optimizing Data Shapley Interaction Calculation from $O(2^n)$ to $O(t n^2)$ for KNN models.
Preprint at arXiv (Apr. 2023). arXiv.
Abstract

With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial process in the field of artificial intelligence. The Shapley values have been recognized as an effective method for data valuation, enabling efficient training set summarization, acquisition, and outlier removal. In this paper, we introduce ‘STI-KNN’, an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in $O(t n^2)$ time, which is a significant improvement over the $O(2^n)$ time complexity of baseline methods. By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[411]
S. Dandl, A. Hofheinz, M. Binder, B. Bischl and G. Casalicchio.
counterfactuals: An R Package for Counterfactual Explanation Methods.
Preprint at arXiv (Apr. 2023). arXiv.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[410]
J. Maly and R. Saab.
A simple approach for quantizing neural networks.
Preprint at arXiv (Apr. 2023). arXiv.
Abstract

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach does not require any hyper-parameter tuning and, in contrast to previous methods, allows a plain analysis. We provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
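The memoryless scalar quantization stage mentioned above can be sketched in a few lines. Note the paper's deterministic pre-processing step is deliberately omitted here, so this shows only the quantizer itself, under our own choice of a uniform symmetric grid:

```python
def quantize_layer(weights, bits=4):
    """Memoryless scalar quantization of one layer's weights: each
    weight is mapped independently to the nearest point of a uniform
    grid over [-m, m], where m is the largest weight magnitude."""
    m = max(abs(w) for w in weights) or 1.0
    levels = 2 ** bits - 1                 # number of grid intervals
    step = 2 * m / levels
    return [round((w + m) / step) * step - m for w in weights]
```

Because each weight is rounded independently ("memoryless"), the per-weight error is at most half a grid step; the paper's contribution lies in the pre-processing that keeps these errors from accumulating across a layer's outputs.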

MCML Authors
Link to Johannes Maly

Johannes Maly

Prof. Dr.

Associate

Mathematical Data Science and Artificial Intelligence


[409]
T. Wimmer, V. Golkov, H. N. Dang, M. Zaiss, A. Maier and D. Cremers.
Scale-Equivariant Deep Learning for 3D Data.
Preprint at arXiv (Apr. 2023). arXiv.
MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[408]
Y. Yeganeh, A. Farshad, G. Guevercin, A. Abu-zer, R. Xiao, Y. Tang, E. Adeli and N. Navab.
SCOPE: Structural Continuity Preservation for Medical Image Segmentation.
Preprint at arXiv (Apr. 2023). arXiv.
Abstract

Although the preservation of shape continuity and physiological anatomy is a natural assumption in the segmentation of medical images, it is often neglected by deep learning methods that mostly aim for the statistical modeling of input data as pixels rather than interconnected structures. In biological structures, however, organs are not separate entities; for example, in reality, a severed vessel is an indication of an underlying problem, but traditional segmentation models are not designed to strictly enforce the continuity of anatomy, potentially leading to inaccurate medical diagnoses. To address this issue, we propose a graph-based approach that enforces the continuity and connectivity of anatomical topology in medical images. Our method encodes the continuity of shapes as a graph constraint, ensuring that the network’s predictions maintain this continuity. We evaluate our method on two public benchmarks on retinal vessel segmentation, showing significant improvements in connectivity metrics compared to traditional methods while getting better or on-par performance on segmentation metrics.

MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[407]
Y. Yeganeh, A. Farshad, P. Weinberger, S.-A. Ahmadi, E. Adeli and N. Navab.
DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation.
Preprint at arXiv (Apr. 2023). arXiv.
Abstract

Although purely transformer-based architectures showed promising performance in many computer vision tasks, many hybrid models consisting of CNN and transformer blocks are introduced to fit more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to CNNs in medical imaging segmentation, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose simple architectures based on purely convolutional layers, and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network (e.g., DINO) one can outperform complex transformer-based networks with much less computation costs. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model (as multiple channels) in the other branch. The results of our experiments on two publicly available medical imaging datasets show that the proposed pipeline outperforms U-Net and the state-of-the-art medical image segmentation models.

MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[406]
M. Drton, H. Shi and D. Strieder.
Discussion of “A note on universal inference” by Timmy Tse and Anthony Davison.
Stat 12.1 (Apr. 2023). DOI.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[405]
T. Tornede, A. Tornede, L. Fehring, L. Gehring, H. Graf, J. Hanselle, F. Mohr and M. Wever.
PyExperimenter: Easily distribute experiments and track results.
The Journal of Open Source Software 8.86 (Apr. 2023). DOI.
MCML Authors
Link to Jonas Hanselle

Jonas Hanselle

Artificial Intelligence & Machine Learning


[404]
M. Herrmann, F. Pfisterer and F. Scheipl.
A geometric framework for outlier detection in high-dimensional data.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery e1491 (Apr. 2023). DOI.
Abstract

Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework which exploits the metric structure of a data set. Our approach rests on the manifold assumption, that is, that the observed, nominally high-dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high dimensional data. We also suggest a novel, mathematically precise and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high-dimensional data, but the framework we propose is completely general and we include image and graph data applications. Our results show that the outlier structure of high-dimensional and non-tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors.

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[403]
L. Bothmann and K. Peters.
Fairness von KI – ein Brückenschlag zwischen Philosophie und Maschinellem Lernen.
Grenzen Künstlicher Intelligenz. Munich, Germany, Mar 29-31, 2023.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[402]
L. Weissweiler, T. He, N. Otani, D. R. Mortensen, L. Levin and H. Schütze.
Construction Grammar Provides Unique Insight into Neural Language Models.
Georgetown University Round Table on Linguistics (GURT 2023). Washington D.C., USA, Mar 09-12, 2023. URL.
Abstract

Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pre-trained language models (PLMs) with respect to the structure and meaning of constructions. In this position paper, we make suggestions for the continuation and augmentation of this line of research. We look at probing methodology that was not designed with CxG in mind, as well as probing methodology that was designed for specific constructions. We analyse selected previous work in detail, and provide our view of the most important challenges and research questions that this promising new field faces.

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[401]
J. Moosbauer, G. Casalicchio, M. Lindauer and B. Bischl.
Improving Accuracy of Interpretability Measures in Hyperparameter Optimization via Bayesian Algorithm Execution.
Workshop on Configuration and Selection of Algorithms (COSEAL 2023). Paris, France, Mar 06-08, 2023. arXiv.
Abstract

Despite all the benefits of automated hyperparameter optimization (HPO), most modern HPO algorithms are black-boxes themselves. This makes it difficult to understand the decision process which leads to the selected configuration, reduces trust in HPO, and thus hinders its broad adoption. Here, we study the combination of HPO with interpretable machine learning (IML) methods such as partial dependence plots. These techniques are more and more used to explain the marginal effect of hyperparameters on the black-box cost function or to quantify the importance of hyperparameters. However, if such methods are naively applied to the experimental data of the HPO process in a post-hoc manner, the underlying sampling bias of the optimizer can distort interpretations. We propose a modified HPO method which efficiently balances the search for the global optimum w.r.t. predictive performance and the reliable estimation of IML explanations of an underlying black-box function by coupling Bayesian optimization and Bayesian Algorithm Execution. On benchmark cases of both synthetic objectives and HPO of a neural network, we demonstrate that our method returns more reliable explanations of the underlying black-box without a loss of optimization performance.
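The post-hoc interpretation step criticized above can be sketched as a plain partial-dependence estimate over the HPO archive. The names `surrogate` and `configs` are our illustrative stand-ins for the fitted cost model and the archive of evaluated configurations:

```python
def partial_dependence(surrogate, configs, hyperparam, grid):
    """Post-hoc partial-dependence curve of one hyperparameter,
    estimated from a surrogate fitted on HPO archive data.  For each
    grid value, the hyperparameter is overridden in every archived
    configuration and the surrogate predictions are averaged."""
    curve = []
    for v in grid:
        preds = [surrogate({**c, hyperparam: v}) for c in configs]
        curve.append(sum(preds) / len(preds))
    return curve
```

If `configs` comes from a plain optimizer run, it over-samples good regions of the search space, which is exactly the sampling bias that the proposed coupling of Bayesian optimization with Bayesian Algorithm Execution is designed to correct.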

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[400]
T. Ullmann, A. Beer, M. Hünemörder, T. Seidl and A.-L. Boulesteix.
Over-optimistic evaluation and reporting of novel cluster algorithms: An illustrative study.
Advances in Data Analysis and Classification 17 (Mar. 2023). DOI.
Abstract

When researchers publish new cluster algorithms, they usually demonstrate the strengths of their novel approaches by comparing the algorithms’ performance with existing competitors. However, such studies are likely to be optimistically biased towards the new algorithms, as the authors have a vested interest in presenting their method as favorably as possible in order to increase their chances of getting published. Therefore, the superior performance of newly introduced cluster algorithms is over-optimistic and might not be confirmed in independent benchmark studies performed by neutral and unbiased authors. This problem is known among many researchers, but so far, the different mechanisms leading to over-optimism in cluster algorithm evaluation have never been systematically studied and discussed. Researchers are thus often not aware of the full extent of the problem. We present an illustrative study to illuminate the mechanisms by which authors—consciously or unconsciously—paint their cluster algorithm’s performance in an over-optimistic light. Using the recently published cluster algorithm Rock as an example, we demonstrate how optimization of the used datasets or data characteristics, of the algorithm’s parameters and of the choice of the competing cluster algorithms leads to Rock’s performance appearing better than it actually is. Our study is thus a cautionary tale that illustrates how easy it can be for researchers to claim apparent ‘superiority’ of a new cluster algorithm. This illuminates the vital importance of strategies for avoiding the problems of over-optimism (such as, e.g., neutral benchmark studies), which we also discuss in the article.

MCML Authors
Theresa Ullmann

Theresa Ullmann

Dr.

Biometry in Molecular Medicine

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[399]
Q. Khan, I. Sülö, M. Öcal and D. Cremers.
Learning vision based autonomous lateral vehicle control without supervision.
Applied Intelligence 53 (Mar. 2023). DOI. GitHub.
Abstract

Supervised deep learning methods using image data as input have shown promising results in the context of vehicle control. However, these supervised methods have two main disadvantages: 1) They require a copious amount of labeled training data, which is difficult and expensive to collect. 2) Such models do not perform well when situations that are not in the distribution of the training set are encountered. This includes deviations from the designated driving behavior. We therefore provide a framework to mitigate these problems from merely an unlabeled sequence of images. Visual Odometry is first used to determine the vehicle trajectory. Model Predictive Control (MPC) then uses this trajectory to implicitly infer the steering labels. Meanwhile, synthesized images at deviated trajectories are included in the training distribution for enhanced robustness of the neural network model. Experimental results demonstrate that the performance of our network is on par with methods requiring additional data collection or supervision.

MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[398]
P. Heid.
A short note on an adaptive damped Newton method for strongly monotone and Lipschitz continuous operator equations.
Archiv der Mathematik (Mar. 2023). URL.
MCML Authors
Link to Pascal Heid

Pascal Heid

Dr.

Applied Numerical Analysis


[397]
C. Nießl, S. Hoffmann, T. Ullmann and A.-L. Boulesteix.
Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment.
Biometrical Journal (Mar. 2023). DOI.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[396]
A. Scagliotti.
Optimal control of ensembles of dynamical systems.
ESAIM - Control, Optimisation and Calculus of Variations 29.22 (Mar. 2023). DOI.
Abstract

In this paper we consider the problem of the optimal control of an ensemble of affine-control systems. After proving the well-posedness of the minimization problem under examination, we establish a $\Gamma$-convergence result that allows us to substitute the original (and usually infinite) ensemble with a sequence of finite increasing-in-size sub-ensembles. The solutions of the optimal control problems involving these sub-ensembles provide approximations in the $L^2$-strong topology of the minimizers of the original problem. Using again a $\Gamma$-convergence argument, we manage to derive a Maximum Principle for ensemble optimal control problems with end-point cost. Moreover, in the case of finite sub-ensembles, we can address the minimization of the related cost through numerical schemes. In particular, we propose an algorithm that consists of a subspace projection of the gradient field induced on the space of admissible controls by the approximating cost functional. In addition, we consider an iterative method based on the Pontryagin Maximum Principle. Finally, we test the algorithms on an ensemble of linear systems in $\mathbb{R}^2$.

MCML Authors
Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[395]
S. Klenk, L. Koestler, D. Scaramuzza and D. Cremers.
E-NeRF: Neural Radiance Fields from a Moving Event Camera.
IEEE Robotics and Automation Letters 8.3 (Mar. 2023). DOI.
MCML Authors
Link to Simon Klenk

Simon Klenk

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[394]
D. S. Fischer, A. C. Schaar and F. J. Theis.
Modeling intercellular communication in tissues using spatial graphs of cells.
Nature Biotechnology 41 (Mar. 2023). DOI.
MCML Authors
Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[393]
L. Heumos, A. C. Schaar, C. Lance, A. Litinetskaya, F. Drost, L. Zappia, M. D. Lücken, D. C. Strobl, J. Henao, F. Curion, S.-c. Best Practices Consortium, H. B. Schiller and F. J. Theis.
Best practices for single-cell analysis across modalities.
Nature Reviews Genetics 24 (Mar. 2023). DOI.
MCML Authors
Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[392]
J. Kostin, F. Krahmer and D. Stöger.
How robust is randomized blind deconvolution via nuclear norm minimization against adversarial noise?
Preprint at arXiv (Mar. 2023). arXiv.
MCML Authors
Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[391]
B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A.-L. Boulesteix, D. Deng and M. Lindauer.
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13.2 (Mar. 2023). DOI.
Abstract

Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Tobias Pielok

Tobias Pielok

Statistical Learning & Data Science

Theresa Ullmann

Dr.

Biometry in Molecular Medicine

Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[390]
J. Brandt, E. Schede, B. Haddenhorst, V. Bengs, E. Hüllermeier and K. Tierney.
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration.
37th Conference on Artificial Intelligence (AAAI 2023). Washington, DC, USA, Feb 07-14, 2023. DOI.
Abstract

We study the algorithm configuration (AC) problem, in which one seeks to find an optimal parameter configuration of a given target algorithm in an automated way. Recently, there has been significant progress in designing AC approaches that satisfy strong theoretical guarantees. However, a significant gap still remains between the practical performance of these approaches and state-of-the-art heuristic methods. To this end, we introduce AC-Band, a general approach for the AC problem based on multi-armed bandits that provides theoretical guarantees while exhibiting strong practical performance. We show that AC-Band requires significantly less computation time than other AC approaches providing theoretical guarantees while still yielding high-quality configurations.

MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[389]
D. Frauen, T. Hatt, V. Melnychuk and S. Feuerriegel.
Estimating Average Causal Effects from Patient Trajectories.
37th Conference on Artificial Intelligence (AAAI 2023). Washington, DC, USA, Feb 07-14, 2023. DOI.
Abstract

In medical practice, treatments are selected based on the expected causal effects on patient outcomes. Here, the gold standard for estimating causal effects are randomized controlled trials; however, such trials are costly and sometimes even unethical. Instead, medical practice is increasingly interested in estimating causal effects among patient (sub)groups from electronic health records, that is, observational data. In this paper, we aim at estimating the average causal effect (ACE) from observational data (patient trajectories) that are collected over time. For this, we propose DeepACE: an end-to-end deep learning model. DeepACE leverages the iterative G-computation formula to adjust for the bias induced by time-varying confounders. Moreover, we develop a novel sequential targeting procedure which ensures that DeepACE has favorable theoretical properties, i.e., is doubly robust and asymptotically efficient. To the best of our knowledge, this is the first work that proposes an end-to-end deep learning model tailored for estimating time-varying ACEs. We compare DeepACE in an extensive number of experiments, confirming that it achieves state-of-the-art performance. We further provide a case study for patients suffering from low back pain to demonstrate that DeepACE generates important and meaningful findings for clinical practice. Our work enables practitioners to develop effective treatment recommendations based on population effects.

MCML Authors
Link to Dennis Frauen

Dennis Frauen

Artificial Intelligence in Management

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[388]
R. Koner, T. Hannan, S. Shit, S. Sharifzadeh, M. Schubert, T. Seidl and V. Tresp.
InstanceFormer: An Online Video Instance Segmentation Framework.
37th Conference on Artificial Intelligence (AAAI 2023). Washington, DC, USA, Feb 07-14, 2023. DOI. GitHub.
Abstract

Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full Spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transformer-based efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches for challenging and long datasets such as YouTube-VIS-2021 and OVIS.

MCML Authors
Link to Rajat Koner

Rajat Koner

Database Systems & Data Mining

Link to Tanveer Hannan

Tanveer Hannan

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[387]
G. König, T. Freiesleben and M. Grosse-Wentrup.
Improvement-focused causal recourse (ICR).
37th Conference on Artificial Intelligence (AAAI 2023). Washington, DC, USA, Feb 07-14, 2023. DOI.
Abstract

Algorithmic recourse recommendations, such as Karimi et al.’s (2021) causal recourse (CR), inform stakeholders of how to act to revert unfavorable decisions. However, there are actions that lead to acceptance (i.e., revert the model’s decision) but do not lead to improvement (i.e., may not revert the underlying real-world state). To recommend such actions is to recommend fooling the predictor. We introduce a novel method, Improvement-Focused Causal Recourse (ICR), which involves a conceptual shift: Firstly, we require ICR recommendations to guide toward improvement. Secondly, we do not tailor the recommendations to be accepted by a specific predictor. Instead, we leverage causal knowledge to design decision systems that predict accurately pre- and post-recourse. As a result, improvement guarantees translate into acceptance guarantees. We demonstrate that given correct causal knowledge ICR, in contrast to existing approaches, guides toward both acceptance and improvement.

MCML Authors
Link to Moritz Grosse-Wentrup

Moritz Grosse-Wentrup

Prof. Dr.

* Former member


[386]
D. Rügamer, C. Kolb and N. Klein.
Semi-Structured Distributional Regression.
American Statistician (Feb. 2023). DOI.
Abstract

Combining additive models and neural networks broadens the scope of statistical regression and, at the same time, extends deep learning-based approaches with interpretable structured additive predictors. Existing approaches uniting the two modeling paradigms are, however, limited to very specific combinations and, more importantly, involve an identifiability issue. As a consequence, interpretability and stable estimation are typically lost. We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture. To overcome the inherent identifiability issues between different model parts, we construct an orthogonalization cell that projects the deep neural network into the orthogonal complement of the statistical model predictor. This enables proper estimation of structured model parts and thereby interpretability. We demonstrate the framework’s efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications.

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Chris Kolb

Chris Kolb

Statistical Learning & Data Science


[385]
S. Schallmoser, T. Zueger, M. Kraus, M. Saar-Tsechansky, C. Stettler and S. Feuerriegel.
Machine Learning for Predicting Micro- and Macrovascular Complications in Individuals With Prediabetes or Diabetes: Retrospective Cohort Study.
Journal of Medical Internet Research 25 (Feb. 2023). DOI.
MCML Authors
Link to Simon Schallmoser

Simon Schallmoser

Artificial Intelligence in Management

Link to Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management


[384]
J. Brandt, M. Wever, D. Iliadis, V. Bengs and E. Hüllermeier.
Iterative Deepening Hyperband.
Preprint at arXiv (Feb. 2023). arXiv.
Abstract

Hyperparameter optimization (HPO) is concerned with the automated search for the most appropriate hyperparameter configuration (HPC) of a parameterized machine learning algorithm. A state-of-the-art HPO method is Hyperband, which, however, has its own parameters that influence its performance. One of these parameters, the maximal budget, is especially problematic: If chosen too small, the budget needs to be increased in hindsight and, as Hyperband is not incremental by design, the entire algorithm must be re-run. This is not only costly but also comes with a loss of valuable knowledge already accumulated. In this paper, we propose incremental variants of Hyperband that eliminate these drawbacks, and show that these variants satisfy theoretical guarantees qualitatively similar to those for the original Hyperband with the ‘right’ budget. Moreover, we demonstrate their practical utility in experiments with benchmark data sets.

MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[383]
D. Rügamer, P. Baumann, T. Kneib and T. Hothorn.
Probabilistic Time Series Forecasts with Autoregressive Transformation Models.
Statistics and Computing 33.2 (Feb. 2023). URL.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[382]
D. Schalk, V. Hoffmann, B. Bischl and U. Mansmann.
dsBinVal: Conducting distributed ROC analysis using DataSHIELD.
The Journal of Open Source Software 8.82 (Feb. 2023). DOI.
Abstract

Our R (R Core Team, 2021) package dsBinVal implements the methodology explained by Schalk et al. (2022). It extends the ROC-GLM (Pepe, 2000) to distributed data by using techniques of differential privacy (Dwork et al., 2006) and the idea of sharing highly aggregated values only. The package also exports functionality to calculate distributed calibration curves and assess the calibration. Using the package allows us to evaluate a prognostic model based on a binary outcome using the DataSHIELD (Gaye et al., 2014) framework. The main functionality therefore makes it possible to 1) compute the receiver operating characteristic (ROC) curve using the ROC-GLM, from which 2) the area under the curve (AUC) and confidence intervals (CI) are derived to conduct hypothesis testing according to DeLong et al. (1988). Furthermore, 3) the calibration can be assessed distributively via calibration curves and the Brier score. Visualizing the approximated ROC curve, the AUC with confidence intervals, and the calibration curves using ggplot2 is also supported. Examples can be found in the README file of the repository.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[381]
L. Sang, B. Häfner, X. Zuo and D. Cremers.
High-Quality RGB-D Reconstruction via Multi-View Uncalibrated Photometric Stereo and Gradient-SDF.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023). Waikoloa, Hawaii, Jan 03-07, 2023. DOI.
MCML Authors
Link to Björn Häfner

Björn Häfner

Computer Vision & Artificial Intelligence

Link to Xingxing Zuo

Xingxing Zuo

Dr.

Machine Learning for Robotics

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[380]
C. Molnar, G. König, B. Bischl and G. Casalicchio.
Model-agnostic feature importance and effects with dependent features: a conditional subgroup approach.
Data Mining and Knowledge Discovery (Jan. 2023). DOI.
Abstract

The interpretation of feature importance in machine learning models is challenging when features are dependent. Permutation feature importance (PFI) ignores such dependencies, which can cause misleading interpretations due to extrapolation. A possible remedy is more advanced conditional PFI approaches that enable the assessment of feature importance conditional on all other features. Due to this shift in perspective and in order to enable correct interpretations, it is beneficial if the conditioning is transparent and comprehensible. In this paper, we propose a new sampling mechanism for the conditional distribution based on permutations in conditional subgroups. As these subgroups are constructed using tree-based methods such as transformation trees, the conditioning becomes inherently interpretable. This not only provides a simple and effective estimator of conditional PFI, but also local PFI estimates within the subgroups. In addition, we apply the conditional subgroups approach to partial dependence plots, a popular method for describing feature effects that can also suffer from extrapolation when features are dependent and interactions are present in the model. In simulations and a real-world application, we demonstrate the advantages of the conditional subgroup approach over existing methods: it allows the computation of conditional PFI that is more faithful to the data than existing proposals and enables a fine-grained interpretation of feature effects and importance within the conditional subgroups.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[379]
S. Dandl.
Causality concepts in machine learning: heterogeneous treatment effect estimation with machine learning and model interpretation with counterfactual and semi-factual explanations.
Dissertation 2023. DOI.
Abstract

This thesis explores the growing intersection of machine learning and causality through seven articles, offering new insights into how these fields can enhance one another. It addresses key topics, including adapting machine learning algorithms for heterogeneous treatment effect estimation, where combining causal and model-based forest elements improves performance across diverse datasets. Additionally, the thesis introduces advanced interpretability tools, proposing methods to generate multiple counterfactual and semi-factual explanations that aid in fairness assessments and address interpretability challenges. A modular R package developed in this work provides accessible tools for researchers to apply and compare counterfactual explanation methods, further bridging machine learning and causal inference for practical applications. (Shortened).

MCML Authors

[378]
E. Dorigatti.
Cancer immunotherapy design and analysis through discrete optimization, positive-unlabeled learning, and semi-structured regression models.
Dissertation 2023. DOI.
Abstract

This thesis advances precision medicine by leveraging artificial intelligence to improve cancer immunotherapy development and tackle key challenges in clinical trials, where high failure rates often stem from insufficient understanding of patient and disease-specific factors. Through novel computational frameworks for cancer vaccine design, methods for handling imbalanced biological data, and hybrid modeling techniques that combine clinical data with imaging, this work demonstrates AI’s potential to personalize and accelerate therapeutic development. These contributions collectively pave the way for more effective, targeted treatments, potentially reducing the time and cost to bring new therapies to market. (Shortened).

MCML Authors

[377]
C. M. M. Frey.
Learning from complex networks.
Dissertation 2023. DOI.
Abstract

This thesis addresses key challenges in modern graph-based applications by proposing advanced techniques in spectral clustering, graph neural networks, and probabilistic graph structures. It introduces a robust, accelerated spectral clustering model for homogeneous graphs and a transformer-inspired Graph Shell Attention model to counter over-smoothing in graph neural networks. Furthermore, it tackles optimization in uncertain networks, presents a new approach to a vehicle routing problem with flexible delivery locations, and provides a novel method for classifying social media trends, illustrating the vital role of AI in understanding complex graph structures. (Shortened).

MCML Authors
Link to Christian Frey

Christian Frey

Dr.

* Former member


[376]
C. Fritz.
Statistical approaches to dynamic networks in society.
Dissertation 2023. DOI.
Abstract

This dissertation focuses on dynamic networks in the Social Sciences, examining methods and applications in network modeling. Part two provides an overview of modeling frameworks for dynamic networks, including applications in studying COVID-19 infections using social connectivity as covariates. In part three, the dissertation introduces a Signed Exponential Random Graph Model (SERGM) for signed networks and a bipartite variant of the Temporal Exponential Random Graph Model (TERGM) to study co-inventorship in patents. Part four concludes with models for event networks, including a Relational Event Model for Spurious Events (REMSE) to manage false-discovery rates in event data. (Shortened).

MCML Authors

[375]
J. Goschenhofer.
Reducing the effort for data annotation: contributions to weakly supervised deep learning.
Dissertation 2023. DOI.
Abstract

This thesis addresses methods for training machine learning models with limited labeled data, focusing on semi-supervised, positive unlabeled, constrained clustering, and transfer learning. It explores deep semi-supervised learning, particularly in time series and medical imaging contexts, and investigates positive unlabeled learning methods that utilize predictive uncertainty for self-training. The thesis also introduces weakly supervised learning for constrained clustering, combining it with semi-supervised approaches, and applies transfer learning to tasks with varying granularity in medical domains. (Shortened).

MCML Authors

[374]
J. Herbinger.
On grouping and partitioning approaches in interpretable machine learning.
Dissertation 2023. DOI.
Abstract

This thesis addresses the challenges of interpreting machine learning models, particularly focusing on the limitations of global explanation methods. It identifies two key issues: the human-incomprehensibility of high-dimensional outputs and the misleading interpretations caused by aggregation bias. The thesis proposes solutions to these problems, such as grouping features for simpler interpretations and using recursive partitioning algorithms to provide regional explanations, ensuring more accurate and understandable insights into model behavior. (Shortened.)

MCML Authors

[373]
A. Khakzar.
Rethinking Feature Attribution for Neural Network Explanation.
Dissertation 2023. DOI.
Abstract

Feature attribution is arguably the predominant approach for illuminating black-box neural networks. This dissertation rethinks feature attribution by leveraging critical neural pathways, identifying input features with predictive information, and evaluating feature attribution using the neural network model. The dissertation also rethinks feature attribution for the explanation of medical imaging models.

MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member


[372]
G. König.
If interpretability is the answer, what is the question?: a causal perspective.
Dissertation 2023. DOI.
Abstract

This thesis addresses fundamental challenges in the field of interpretable machine learning (IML), particularly the lack of a clear definition of ‘interpretability’, the potential misinterpretation of existing methods, and the computational difficulties of conditional-sampling-based techniques. By disentangling the different goals of interpretability, we provide clearer guidelines for deriving target estimands, with specific examples such as recourse and scientific inference. Additionally, we propose formal interpretation rules for feature importance, highlight common pitfalls in IML, and introduce efficient methods for estimating conditional-sampling techniques by leveraging the data’s dependence structure, with a strong emphasis on causal inference to improve clarity and computational efficiency. (Shortened.)

MCML Authors

[371]
A. Mittermeier.
Robust evaluation of contrast-enhanced imaging for perfusion quantification.
Dissertation 2023. DOI.
Abstract

This thesis advances the quantification and prediction of hemodynamic parameters in dynamic contrast-enhanced (DCE) imaging through two innovative approaches. The Bayesian Tofts model (BTM) improves the reliability and uncertainty estimation of perfusion parameters, demonstrating its potential for enhanced treatment response assessment in cancer care. Additionally, the development of a deep learning model offers a promising alternative by directly predicting clinical endpoints from raw DCE-CT data, eliminating the need for traditional tracer-kinetic modeling and paving the way for more efficient and accurate clinical applications in stroke and other conditions. (Shortened.)

MCML Authors
Link to Andreas Mittermeier

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology


[370]
J. Moosbauer.
Towards explainable automated machine learning.
Dissertation 2023. DOI.
Abstract

This thesis explores the intersection of Automated Machine Learning (AutoML) and explainable AI, addressing the need for transparency at multiple levels: the model, the learning algorithm, and the AutoML system itself. The work develops methods for enhancing model explainability through multi-objective hyperparameter optimization (HPO) and introduces new techniques to understand the effects of hyperparameters and optimizers within AutoML systems. These contributions advance the field by providing more interpretable and reliable tools for AutoML, ultimately increasing the accessibility and trustworthiness of machine learning models and their deployment. (Shortened.)

MCML Authors

[369]
F. Ott.
Representation learning for domain adaptation and cross-modal retrieval: in the context of online handwriting recognition and visual self-localization.
Dissertation 2023. DOI.
Abstract

This thesis focuses on domain adaptation and cross-modal retrieval to address the challenges posed by domain shifts in machine learning applications. Specifically, it explores techniques for online handwriting recognition and visual self-localization. For handwriting recognition, the study uses deep metric learning and optimal transport to reduce domain shifts between different writing styles and writing modalities, while for visual self-localization, it enhances pose prediction through auxiliary tasks and representation learning fusion techniques to improve accuracy across sensor modalities. (Shortened.)

MCML Authors

[368]
F. Pfisterer.
Democratizing machine learning: contributions in AutoML and fairness.
Dissertation 2023. DOI.
Abstract

This thesis focuses on democratizing access to machine learning (ML) by improving automated machine learning (AutoML) systems and making ML tools more accessible to non-experts. Key contributions include methods to accelerate hyperparameter optimization by learning from previous experiments, the integration of fairness considerations in AutoML, and the development of software packages such as mlr3pipelines for creating machine learning pipelines and mlr3fairness for auditing and debiasing models. The thesis also includes tools for estimating and mitigating model fairness, such as the mcboost package for multi-calibration, addressing both the technical and ethical challenges of widespread ML deployment. (Shortened.)

MCML Authors

[367]
D. Schalk.
Modern approaches for component-wise boosting: Automation, efficiency, and distributed computing with application to the medical domain.
Dissertation 2023. DOI.
Abstract

This thesis focuses on enhancing component-wise boosting (CWB) by improving its efficiency and usability, particularly in high-dimensional feature spaces and distributed data settings. Key contributions include the optimization of the CWB algorithm through Nesterov’s momentum for faster fitting and reduced memory usage, as well as the development of the Autocompboost framework to integrate CWB with AutoML, emphasizing model interpretability. Additionally, the thesis introduces methods for evaluating binary classification models on distributed data using ROC analysis, and presents several R packages (compboost, dsCWB, Autocompboost, dsBinVal) that implement these advances. (Shortened.)

MCML Authors

[366]
T. Ullmann.
Evaluation of clustering results and novel cluster algorithms: a metascientific perspective.
Dissertation 2023. DOI.
Abstract

This dissertation addresses the reliability of clustering results and the evaluation of new clustering algorithms, particularly in light of the replication crisis in scientific research. The first contribution presents a framework for validating clustering results using validation data, ensuring the replicability and generalizability of findings. The second contribution quantifies over-optimistic bias in microbiome research by analyzing the effects of multiple analysis strategies on unsupervised tasks, while the third contribution highlights the over-optimism in evaluating new clustering algorithms, using the example of the ‘Rock’ algorithm, and advocates for more rigorous and neutral benchmarking methods. (Shortened.)

MCML Authors
Theresa Ullmann

Dr.

Biometry in Molecular Medicine


[365]
C. M. Verdun.
Scalability in Ill-posed Machine Learning Problems: Bridging Least Squares Methods with (Non-)convex Algorithms.
Dissertation 2023. DOI.
Abstract

We introduce novel algorithms to address some challenges in machine learning, including ill-conditioned low-rank matrix retrieval, constrained least squares, and high-dimensional regression with unknown noise. By bridging least squares with modern (non-)convex optimization, our methods achieve scalability, data efficiency, and robustness. We provide theoretical guarantees with minimal assumptions and numerically validate their performance.

MCML Authors
Link to Claudio Mayrink Verdun

Claudio Mayrink Verdun

Dr.

* Former member


[364]
V. Bengs and E. Hüllermeier.
Multi-armed bandits with censored consumption of resources.
Machine Learning 112.1 (Jan. 2023). DOI.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[363]
T. Ullmann, S. Peschel, P. Finger, C. L. Müller and A.-L. Boulesteix.
Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering.
PLOS Computational Biology 19.1 (Jan. 2023). DOI.
Abstract

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. 
In all four research tasks, there are notable over-optimism effects: the results on the validation data are worse than on the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
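The selection mechanism behind such over-optimism can be illustrated with a small simulation (a hypothetical toy model, not the study's actual analysis pipeline): each "method combination" has a true quality plus split-specific noise, the apparent winner is chosen on the discovery half, and its validation score falls short on average.

```python
import random

random.seed(0)
n_methods, n_splits = 10, 50
gaps = []
for _ in range(n_splits):
    # each "method combination" has a true quality plus independent
    # noise on the discovery and validation halves of the data
    true_quality = [random.gauss(0, 1) for _ in range(n_methods)]
    discovery = [q + random.gauss(0, 1) for q in true_quality]
    validation = [q + random.gauss(0, 1) for q in true_quality]
    best = max(range(n_methods), key=discovery.__getitem__)  # winner picked on discovery
    gaps.append(discovery[best] - validation[best])  # optimism of the reported result
mean_gap = sum(gaps) / n_splits
print(f"mean optimism gap over {n_splits} splits: {mean_gap:.2f}")
```

Because the winner is selected partly for favorable noise, the discovery score systematically overstates the validation score, which is exactly the effect the study quantifies on real microbiome data.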

MCML Authors
Link to Stefanie Peschel

Stefanie Peschel

Biomedical Statistics and Data Science

Link to Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[362]
P. Gupta, J. P. Drees and E. Hüllermeier.
Automated Side-Channel Attacks using Black-Box Neural Architecture Search.
Preprint at Cryptology ePrint Archive (Jan. 2023). URL.
Abstract

The usage of convolutional neural networks (CNNs) to break cryptographic systems through hardware side-channels has enabled fast and adaptable attacks on devices like smart cards and TPMs. Current literature proposes fixed CNN architectures designed by domain experts to break such systems, which is time-consuming and unsuitable for attacking a new system. Recently, an approach using neural architecture search (NAS), which is able to acquire a suitable architecture automatically, has been explored. These works use the secret key information in the attack dataset for optimization and only explore two different search strategies using one-dimensional CNNs. We propose a NAS approach that relies only on using the profiling dataset for optimization, making it fully black-box. Using a large-scale experimental parameter study, we explore which choices for NAS, such as 1-D or 2-D CNNs and search strategy, produce the best results on 10 state-of-the-art datasets for Hamming weight and identity leakage models. We show that applying the random search strategy on 1-D inputs results in a high success rate and retrieves the correct secret key using a single attack trace on two of the datasets. This combination matches the attack efficiency of fixed CNN architectures, outperforming them in 4 out of 10 datasets. Our experiments also point toward the need for repeated attack evaluations of machine learning-based solutions in order to avoid biased performance estimates.
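The random-search strategy the paper finds effective can be sketched in a few lines (the search space and scoring stub below are hypothetical illustrations, not the paper's actual configuration): sample architectures uniformly from the space, score each using the profiling data only, and keep the best.

```python
import random

random.seed(1)

# hypothetical 1-D CNN search space -- illustrative only, not the paper's exact space
SEARCH_SPACE = {
    "n_conv_layers": [1, 2, 3, 4],
    "kernel_size": [3, 7, 11, 25],
    "n_filters": [4, 8, 16, 32],
    "pooling": ["avg", "max"],
}

def sample_architecture():
    # random search: draw each architectural choice uniformly and independently
    return {name: random.choice(options) for name, options in SEARCH_SPACE.items()}

def profiling_score(arch):
    # stand-in for training `arch` and evaluating it on the profiling
    # dataset only; the attack set's secret key is never consulted
    return random.random()

candidates = [sample_architecture() for _ in range(20)]
best = max(candidates, key=profiling_score)
print(best)
```

Restricting the objective to the profiling dataset is what makes the search fully black-box in the paper's sense: no secret-key information from the attack traces leaks into architecture selection.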

MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[361]
P. T. da Silva, Y. Zhang, E. Theodorakis, L. D. Martens, V. A. Yépez, V. Pelechano and J. Gagneur.
Cellular energy regulates mRNA translation and degradation in a codon-specific manner.
Preprint at bioRxiv (2023). DOI.
MCML Authors
Link to Pedro Tomaz da Silva

Pedro Tomaz da Silva

Computational Molecular Medicine

Link to Julien Gagneur

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[360]
A. Karollus, J. Hingerl, D. Gankin, M. Grosshauser, K. Klemon and J. Gagneur.
Species-aware DNA language models capture regulatory elements and their evolution.
Preprint at bioRxiv (2023). DOI.
MCML Authors
Link to Alexander Karollus

Alexander Karollus

Computational Molecular Medicine

Link to Johannes Hingerl

Johannes Hingerl

Computational Molecular Medicine

Link to Julien Gagneur

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[359]
I. Ziegler, B. Ma, B. Bischl, E. Dorigatti and B. Schubert.
Proteasomal cleavage prediction: state-of-the-art and future directions.
Preprint at bioRxiv (2023). DOI. GitHub.
Abstract

Epitope vaccines are a promising approach for precision treatment of pathogens, cancer, autoimmune diseases, and allergies. Effectively designing such vaccines requires accurate proteasomal cleavage prediction to ensure that the epitopes included in the vaccine trigger an immune response. The performance of proteasomal cleavage predictors has been steadily improving over the past decades owing to increasing data availability and methodological advances. In this review, we summarize the current proteasomal cleavage prediction landscape and, in light of recent progress in the field of deep learning, develop and compare a wide range of recent architectures and techniques, including long short-term memory (LSTM), transformers, and convolutional neural networks (CNN), as well as four different denoising techniques. All open-source cleavage predictors re-trained on our dataset performed within two AUC percentage points of one another. Our comprehensive deep learning architecture benchmark improved performance by 1.7 AUC percentage points, while closed-source predictors performed considerably worse. We found that a wide range of architectures and training regimes all result in very similar performance, suggesting that the specific modeling approach employed has a limited impact on predictive performance compared to the specifics of the dataset employed. We speculate that the noise and implicit nature of data acquisition techniques used for training proteasomal cleavage prediction models and the complexity of biological processes of the antigen processing pathway are the major limiting factors. While biological complexity can be tackled by more data and, to a lesser extent, better models, noise and randomness inherently limit the maximum achievable predictive performance.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


2022


[358]
J. Goschenhofer, P. Ragupathy, C. Heumann, B. Bischl and M. Aßenmacher.
CC-Top: Constrained Clustering for Dynamic Topic Discovery.
1st Workshop on Ever Evolving NLP (EvoNLP 2022). Abu Dhabi, United Arab Emirates, Dec 07, 2022. URL.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Matthias Aßenmacher

Matthias Aßenmacher

Dr.

Statistical Learning & Data Science


[357]
S. Legler, T. Janjic, M. H. Shaker and E. Hüllermeier.
Machine learning for estimating parameters of a convective-scale model: A comparison of neural networks and random forests.
32nd Workshop of Computational Intelligence of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA). Berlin, Germany, Dec 01-02, 2022. PDF.
MCML Authors
Link to Mohammad Hossein Shaker Ardakani

Mohammad Hossein Shaker Ardakani

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[356]
R. Foygel Barber, M. Drton, N. Sturma and L. Weihs.
Half-trek criterion for identifiability of latent variable models.
Annals of Statistics 50.6 (Dec. 2022). DOI.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[355]
M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, M. Galkin, S. Sharifzadeh, A. Fischer, V. Tresp and J. Lehmann.
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework.
IEEE Transactions on Pattern Analysis and Machine Intelligence 44.12 (Dec. 2022). DOI. GitHub.
Abstract

The heterogeneity in recently published knowledge graph embedding models’ implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, as well as provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model’s performance and is not only determined by its architecture. We provide evidence that several architectures can obtain results competitive to the state of the art when configured carefully.
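For a flavor of the interaction models benchmarked, here is a minimal sketch of the TransE scoring function, one of the 21 models re-implemented in PyKEEN (toy two-dimensional embeddings for illustration; the actual models are trained, high-dimensional, and configurable):

```python
import math

def transe_score(h, r, t):
    # TransE models a relation as a translation: a triple (h, r, t) is
    # plausible when h + r lands close to t in embedding space,
    # so the score is the negative L2 distance of h + r from t
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# toy 2-d embeddings: the relation vector r carries h exactly onto t_good
h, r = [0.0, 1.0], [1.0, 0.0]
t_good, t_bad = [1.0, 1.0], [3.0, -2.0]
print(transe_score(h, r, t_good) > transe_score(h, r, t_bad))  # True
```

As the paper stresses, the interaction function is only one ingredient: the training approach, loss function, and explicit inverse-relation modeling jointly determine final performance.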

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[354]
C. Fritz, G. De Nicola, F. Günther, D. Rügamer, M. Rave, M. Schneble, A. Bender, M. Weigert, R. Brinks, A. Hoyer, U. Berger, H. Küchenhoff and G. Kauermann.
Challenges in Interpreting Epidemiological Surveillance Data – Experiences from Germany.
Journal of Computational and Graphical Statistics 32.3 (Dec. 2022). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[353]
H. Boche, A. Fono and G. Kutyniok.
Non-Computability of the Pseudoinverse on Digital Computers.
Preprint at arXiv (Dec. 2022). arXiv.
Abstract

The pseudoinverse of a matrix, a generalized notion of the inverse, is of fundamental importance in linear algebra. However, there does not exist a closed-form representation of the pseudoinverse, which can be straightforwardly computed. Therefore, an algorithmic computation is necessary. An algorithmic computation can only be evaluated by also considering the underlying hardware, typically digital hardware, which is responsible for performing the actual computations step by step. In this paper, we analyze if and to what degree the pseudoinverse actually can be computed on digital hardware platforms modeled as Turing machines. For this, we utilize the notion of an effective algorithm which describes a provably correct computation: upon an input of any error parameter, the algorithm provides an approximation within the given error bound with respect to the unknown solution. We prove that an effective algorithm for computing the pseudoinverse of any matrix cannot exist on a Turing machine, although provably correct algorithms do exist for specific classes of matrices. Even more, our results introduce a lower bound on the accuracy that can be obtained algorithmically when computing the pseudoinverse on Turing machines.

MCML Authors
Link to Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence


[352]
M. Brunner, P. Heid, M. Innerberger, A. Miraci, D. Praetorius and J. Streitberger.
Adaptive FEM with quasi-optimal overall cost for nonsymmetric linear elliptic PDEs.
Preprint at arXiv (Dec. 2022). arXiv.
MCML Authors
Link to Pascal Heid

Pascal Heid

Dr.

Applied Numerical Analysis


[351]
M. Herold, A. Veselovska, J. Jehle and F. Krahmer.
Non-intrusive surrogate modelling using sparse random features with applications in crashworthiness analysis.
Preprint at arXiv (Dec. 2022). arXiv.
Abstract

Efficient surrogate modelling is a key requirement for uncertainty quantification in data-driven scenarios. In this work, a novel approach of using Sparse Random Features for surrogate modelling in combination with self-supervised dimensionality reduction is described. The method is compared to other methods on synthetic and real data obtained from crashworthiness analyses. The results show the superiority of the approach described here over state-of-the-art surrogate modelling techniques such as Polynomial Chaos Expansions and Neural Networks.

MCML Authors
Link to Hanna Veselovska

Hanna Veselovska

Dr.

Optimization & Data Analysis

Link to Felix Krahmer

Felix Krahmer

Prof. Dr.

Optimization & Data Analysis


[350]
H. Huang, J. Qiu and K. Riedl.
Consensus-Based Optimization for Saddle Point Problems.
Preprint at arXiv (Dec. 2022). arXiv.
MCML Authors
Link to Konstantin Riedl

Konstantin Riedl

Applied Numerical Analysis


[349]
W. Durani, D. Mautz, C. Plant and C. Böhm.
DBHD: Density-based clustering for highly varying density.
22nd IEEE International Conference on Data Mining (ICDM 2022). Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI.
MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[348]
S. Gilhuber, P. Jahn, Y. Ma and T. Seidl.
VERIPS: Verified Pseudo-label Selection for Deep Active Learning.
22nd IEEE International Conference on Data Mining (ICDM 2022). Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI. GitHub.
MCML Authors
Link to Philipp Jahn

Philipp Jahn

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[347]
M. Rezaei, E. Dorigatti, D. Rügamer and B. Bischl.
Learning Statistical Representation with Joint Deep Embedded Clustering.
IEEE International Conference on Data Mining Workshops (ICDMW 2022). Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI.
MCML Authors
Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[346]
N. Strauß, M. Berrendorf, T. Haider and M. Schubert.
A Comparison of Ambulance Redeployment Systems on Real-World Data.
IEEE International Conference on Data Mining Workshops (ICDMW 2022). Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI. GitHub.
Abstract

Modern Emergency Medical Services (EMS) benefit from real-time sensor information in various ways as they provide up-to-date location information and help assess current local emergency risks. A critical part of EMS is dynamic ambulance redeployment, i.e., the task of assigning idle ambulances to base stations throughout a community. Although there has been a considerable effort on methods to optimize emergency response systems, a comparison of proposed methods is generally difficult as reported results are mostly based on artificial and proprietary test beds. In this paper, we present a benchmark simulation environment for dynamic ambulance redeployment based on real emergency data from the city of San Francisco. Our proposed simulation environment is highly scalable and is compatible with modern reinforcement learning frameworks. We provide a comparative study of several state-of-the-art methods for various metrics. Results indicate that even simple baseline algorithms can perform considerably well in close-to-realistic settings.

MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[345]
J. Ullerich, M. Windl, A. Bulling and S. Mayer.
ThumbPitch: Enriching Thumb Interaction on Mobile Touchscreens using Deep Learning.
33rd Australian Conference on Human-Computer Interaction (OZCHI 2022). Canberra, NSW, Australia, Nov 29-Dec 02, 2022. DOI.
MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[344]
J. Li, M. Zhao, Y. Xie, A. Maronikolakis, P. Pu and H. Schütze.
This joke is [MASK]: Recognizing Humor and Offense with Prompting.
1st Transfer Learning for Natural Language Processing Workshop (TL4NLP) at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL.
Abstract

Humor is a magnetic component in everyday human interactions and communications. Computationally modeling humor enables NLP systems to entertain and engage with users. We investigate the effectiveness of prompting, a new transfer learning paradigm for NLP, for humor recognition. We show that prompting performs similarly to finetuning when numerous annotations are available, but gives stellar performance in low-resource humor recognition. The relationship between humor and offense is also inspected by applying influence functions to prompting; we show that models could rely on offense to determine humor during transfer.

MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[343]
V. Bengs, E. Hüllermeier and W. Waegeman.
Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[342]
A. Blattmann, R. Rombach, K. Oktay and B. Ommer.
Retrieval-Augmented Diffusion Models.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Link to Björn Ommer

Björn Ommer

Prof. Dr.

Machine Vision & Learning


[341]
J. Brandt, V. Bengs, B. Haddenhorst and E. Hüllermeier.
Finding optimal arms in non-stochastic combinatorial bandits with semi-bandit feedback and finite budget.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
Abstract

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. Unlike existing works, we study this problem in a non-stochastic setting with subset-dependent feedback, i.e., the semi-bandit feedback received could be generated by an oblivious adversary and also might depend on the chosen set of arms. In addition, we consider a general feedback scenario covering both the numerical-based as well as preference-based case and introduce a sound theoretical framework for this setting guaranteeing sensible notions of optimal arms, which a learner seeks to find. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies from aggressive to conservative. Theoretical questions about the sufficient and necessary budget of the algorithm to find the best arm are answered and complemented by deriving lower bounds for any learning algorithm for this problem scenario.

MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[340]
L. Hetzel, S. Boehm, N. Kilbertus, S. Günnemann, M. Lotfollahi and F. J. Theis.
Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Link to Leon Hetzel

Leon Hetzel

Mathematical Modelling of Biological Systems

Link to Niki Kilbertus

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning

Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[339]
H. H.-H. Hsu, Y. Shen, C. Tomani and D. Cremers.
What Makes Graph Neural Networks Miscalibrated?
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Yuesong Shen

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Christian Tomani

Christian Tomani

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[338]
Y. Scholten, J. Schuchardt, S. Geisler, A. Bojchevski and S. Günnemann.
Randomized Message-Interception Smoothing: Gray-box Certificates for Graph Neural Networks.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[337]
Y. Shen and D. Cremers.
Deep Combinatorial Aggregation.
36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. PDF.
MCML Authors
Yuesong Shen

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[336]
N. Hurmer, X.-Y. To, M. Binder, H. A. Gündüz, P. C. Münch, R. Mreches, A. C. McHardy, B. Bischl and M. Rezaei.
Transformer Model for Genome Sequence Analysis.
Workshop on Learning Meaningful Representations of Life (LMRL 2022) at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Hüseyin Anil Gündüz

Hüseyin Anil Gündüz

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[335]
I. Ziegler, B. Ma, E. Nie, B. Bischl, D. Rügamer, B. Schubert and E. Dorigatti.
What cleaves? Is proteasomal cleavage prediction reaching a ceiling?
Workshop on Learning Meaningful Representations of Life (LMRL 2022) at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL.
MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[334]
H. H.-H. Hsu, Y. Shen and D. Cremers.
A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs.
Workshop on New Frontiers in Graph Learning at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). New Orleans, LA, USA, Nov 28-Dec 09, 2022. URL.
MCML Authors
Yuesong Shen

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[333]
A. Farshad, Y. Yeganeh, H. Dhamo, F. Tombari and N. Navab.
DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation.
33rd British Machine Vision Conference (BMVC 2022). London, UK, Nov 21-24, 2022. URL.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[332]
M. Windl, A. Hiesinger, R. Welsch, A. Schmidt and S. S. Feger.
SaferHome: Interactive Physical and Digital Smart Home Dashboards for Communicating Privacy Assessments to Owners and Bystanders.
ACM Interactive Surfaces and Spaces Conference (ISS 2022). Wellington, New Zealand, Nov 20-23, 2022. DOI.
Abstract

Private homes are increasingly becoming smart spaces. While smart homes promise comfort, they expose our most intimate spaces to security and privacy risks. Unfortunately, most users today are not equipped with the right tools to assess the vulnerabilities or privacy practices of smart devices. Further, users might lose track of the devices installed in their homes or are unaware of devices placed by a partner or host. We developed SaferHome, an interactive digital-physical privacy framework, to provide smart home users with security and privacy assessments and a sense of device location. SaferHome includes a digital list view and physical and digital dashboards that map real floor plans. We evaluated SaferHome with eight households in the wild. We find that users adopted various strategies to integrate the dashboards into their understanding and interpretation of smart home privacy. We present implications for the design of future smart home privacy frameworks that are impacted by technical affinity, device types, device ownership, and tangibility of assessments.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[331]
A. Campagner, J. Lienen, E. Hüllermeier and D. Ciucci.
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning.
International Joint Conference on Rough Sets (IJCRS 2022). Suzhou, China, Nov 11-14, 2022. DOI.
MCML Authors
Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[330]
H. S. Saadi, V. Hangya, T. Eder and A. Fraser.
Comparative Analysis of Cross-lingual Contextualized Word Embeddings.
2nd Workshop on Multi-lingual Representation Learning (MRL 2022) at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Contextualized word embeddings have emerged as the most important tool for performing NLP tasks in a large variety of languages. In order to improve the cross-lingual representation and transfer learning quality, contextualized embedding alignment techniques, such as mapping and model fine-tuning, are employed. Existing techniques however are time-, data- and computational resource-intensive. In this paper we analyze these techniques by utilizing three tasks: bilingual lexicon induction (BLI), word retrieval and cross-lingual natural language inference (XNLI) for a high resource (German-English) and a low resource (Bengali-English) language pair. In contrast to previous works which focus only on a few popular models, we compare five multilingual and seven monolingual language models and investigate the effect of various aspects on their performance, such as vocabulary size, number of languages used for training and number of parameters. Additionally, we propose a parameter-, data- and runtime-efficient technique which can be trained with 10% of the data, less than 10% of the time and has less than 5% of the trainable parameters compared to model fine-tuning. We show that our proposed method is competitive with resource-heavy models, even outperforming them in some cases, despite relying on fewer resources.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[329]
J. Baan, W. Aziz, B. Plank and R. Fernandez.
Stop Measuring Calibration When Humans Disagree.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - including class frequency, ranking and entropy.
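The standard notion of calibration that the paper starts from can be made concrete with expected calibration error (ECE); below is a minimal bin-based sketch (the paper's proposed instance-level measures for disagreement-heavy data differ from this majority-label estimate):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # bin predictions by confidence; ECE is the sample-weighted mean gap
    # between average confidence and empirical accuracy within each bin
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# a perfectly calibrated toy classifier: 80% confident, correct 80% of the time
conf = [0.8] * 10
corr = [True] * 8 + [False] * 2
print(round(expected_calibration_error(conf, corr), 3))  # 0.0
```

The paper's point is that "correct" here is defined against the human majority class, which becomes theoretically problematic once humans inherently disagree about the label.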

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[328]
E. Bassignana, M. Müller-Eberstein, M. Zhang and B. Plank.
Evidence > Intuition: Transferability Estimation for Encoder Selection.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

With the increase in availability of large pre-trained language models (LMs) in Natural Language Processing (NLP), it becomes critical to assess their fit for a specific target task a priori—as fine-tuning the entire space of available LMs is computationally prohibitive and unsustainable. However, encoder transferability estimation has received little to no attention in NLP. In this paper, we propose to generate quantitative evidence to predict which LM, out of a pool of models, will perform best on a target task without having to fine-tune all candidates. We provide a comprehensive study on LM ranking for 10 NLP tasks spanning the two fundamental problem types of classification and structured prediction. We adopt the state-of-the-art Logarithm of Maximum Evidence (LogME) measure from Computer Vision (CV) and find that it positively correlates with final LM performance in 94% of the setups. In the first study of its kind, we further compare transferability measures with the de facto standard of human practitioner ranking, finding that evidence from quantitative metrics is more robust than pure intuition and can help identify unexpected LM candidates.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[327]
V. Hangya, H. S. Saadi and A. Fraser.
Improving Low-Resource Languages in Pre-Trained Multilingual Language Models.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Pre-trained multilingual language models are the foundation of many NLP approaches, including cross-lingual transfer solutions. However, languages with small available monolingual corpora are often not well-supported by these models, leading to poor performance. We propose an unsupervised approach to improve the cross-lingual representations of low-resource languages by bootstrapping word translation pairs from monolingual corpora and using them to improve language alignment in pre-trained language models. We perform experiments on nine languages, using contextual word retrieval and zero-shot named entity recognition to measure both intrinsic cross-lingual word representation quality and downstream task performance, showing improvements on both tasks. Our results show that it is possible to improve pre-trained multilingual language models by relying only on non-parallel resources.

MCML Authors
Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[326]
A. Imani, S. Severini, M. J. Sabet, F. Yvon and H. Schütze.
Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source languages to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we create a graph with words as nodes and alignment links as edges by aligning words for all language pairs. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state-of-the-art for unsupervised POS tagging of low-resource languages.
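The projection step can be illustrated with a much simpler majority-vote propagation over the alignment graph. The paper instead uses a Graph Neural Network with transformer layers; node naming and function names below are hypothetical:

```python
from collections import Counter, defaultdict

def propagate_pos_tags(edges, seed_tags, n_rounds=3):
    """Majority-vote label propagation over a word-alignment graph.

    edges: iterable of (node_u, node_v) alignment links (undirected);
    seed_tags: {node: tag} for tokens in the high-resource source languages.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    tags = dict(seed_tags)
    for _ in range(n_rounds):
        new_tags = dict(tags)
        for node in adj:
            if node in seed_tags:
                continue                      # keep gold source labels fixed
            votes = Counter(tags[n] for n in adj[node] if n in tags)
            if votes:
                new_tags[node] = votes.most_common(1)[0][0]
        tags = new_tags
    return tags
```

After a few rounds, tags reach target-language tokens that are only indirectly aligned to a labeled source token, which is the intuition behind propagating through the multilingual graph rather than projecting from a single source.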

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[325]
M. Müller-Eberstein, R. van der Goot and B. Plank.
Spectral Probing.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Linguistic information is encoded at varying timescales (subwords, phrases, etc.) and communicative levels, such as syntax and semantics. Contextualized embeddings have analogously been found to capture these phenomena at distinctive layers and frequencies. Leveraging these findings, we develop a fully learnable frequency filter to identify spectral profiles for any given task. It enables vastly more granular analyses than prior handcrafted filters, and improves on efficiency. After demonstrating the informativeness of spectral probing over manual filters in a monolingual setting, we investigate its multilingual characteristics across seven diverse NLP tasks in six languages. Our analyses identify distinctive spectral profiles which quantify cross-task similarity in a linguistically intuitive manner, while remaining consistent across languages—highlighting their potential as robust, lightweight task descriptors.
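The idea of filtering contextualized embeddings in the frequency domain can be sketched with a fixed band-pass mask over sequence frequencies. The paper learns the per-frequency filter end-to-end; the hand-set band here is only illustrative:

```python
import numpy as np

def band_filter(embeddings, low, high):
    """Keep only sequence frequencies in [low, high) of contextual embeddings.

    embeddings: (seq_len, dim) array of token embeddings from one layer.
    Low frequencies capture slowly varying (e.g. semantic/topical) signal,
    high frequencies capture fast token-level variation.
    """
    spec = np.fft.rfft(embeddings, axis=0)           # frequency domain over positions
    freqs = np.arange(spec.shape[0])
    mask = ((freqs >= low) & (freqs < high)).astype(float)
    return np.fft.irfft(spec * mask[:, None], n=embeddings.shape[0], axis=0)
```

A learnable version would replace `mask` with a trainable vector optimized jointly with the probe, yielding the task-specific spectral profiles the paper analyzes.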

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[324]
B. Plank.
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Human variation in labeling is often considered noise. Annotation projects for machine learning (ML) aim at minimizing human label variation, on the assumption that this maximizes data quality and, in turn, machine learning metrics. However, this conventional practice assumes that there exists a ground truth, and neglects that there exists genuine human variation in labeling due to disagreement, subjectivity in annotation or multiple plausible answers. In this position paper, we argue that this big open problem of human label variation persists and critically needs more attention to move our field forward. This is because human label variation impacts all stages of the ML pipeline: data, modeling and evaluation. However, few works consider all of these dimensions jointly; and existing research is fragmented. We reconcile different previously proposed notions of human label variation, provide a repository of publicly-available datasets with un-aggregated labels, depict approaches proposed so far, identify gaps and suggest ways forward. As datasets are becoming increasingly available, we hope that this synthesized view on the ‘problem’ will lead to an open discussion on possible strategies to devise fundamentally new directions.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[323]
L. Weissweiler, V. Hofmann, A. Köksal and H. Schütze.
The better your Syntax, the better your Semantics? Probing Pretrained Language Models for the English Comparative Correlative.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models’ behaviour in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[322]
E. Bassignana and B. Plank.
CrossRE: A Cross-Domain Dataset for Relation Extraction.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Relation Extraction (RE) has attracted increasing attention, but current RE evaluation is limited to in-domain evaluation setups. Little is known on how well a RE system fares in challenging, but realistic out-of-distribution evaluation setups. To address this gap, we propose CrossRE, a new, freely-available cross-domain benchmark for RE, which comprises six distinct text domains and includes multi-label annotations. An additional innovation is that we release meta-data collected during annotation, to include explanations and flags of difficult instances. We provide an empirical evaluation with a state-of-the-art model for relation classification. As the meta-data enables us to shed new light on the state-of-the-art model, we provide a comprehensive analysis on the impact of difficult cases and find correlations between model and human annotations. Overall, our empirical investigation highlights the difficulty of cross-domain RE. We release our dataset, to spur more research in this direction.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[321]
W. Lai, A. Chronopoulou and A. Fraser.
m4 Adapter: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on data from a domain and language pair seen at training time. However, when an MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically. We consider a very challenging scenario: adapting the MNMT model both to a new domain and to a new language pair at the same time. In this paper, we propose m4Adapter (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which combines domain and language knowledge using meta-learning with adapters. We present results showing that our approach is a parameter-efficient solution which effectively adapts a model to both a new language pair and a new domain, while outperforming other adapter methods. An ablation study also shows that our approach more effectively transfers domain knowledge across different languages and language information across different domains.

MCML Authors
Link to Alexandra Chronopoulou

Alexandra Chronopoulou

Dr.

* Former member

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics


[320]
D. Ulmer, E. Bassignana, M. Müller-Eberstein, D. Varab, M. Zhang, R. van der Goot, C. Hardmeier and B. Plank.
Experimental Standards for Deep Learning in Natural Language Processing Research.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
Abstract

The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, compared to more established disciplines, a lack of common experimental standards remains an open challenge to the field at large. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in NLP into a single, widely-applicable methodology. Following these best practices is crucial to strengthen experimental evidence, improve reproducibility and enable scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.

MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics


[319]
A. Lohrer, J. J. Binder and P. Kröger.
Group Anomaly Detection for Spatio-Temporal Collective Behaviour Scenarios in Smart Cities.
15th International Workshop on Computational Transportation Science (IWCTS 2022) at the 30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2022). Seattle, WA, USA, Nov 01-04, 2022. DOI.
MCML Authors
Andreas Lohrer

Andreas Lohrer

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[318]
M. Bernhard and M. Schubert.
Robust Object Detection in Remote Sensing Imagery with Noisy and Sparse Geo-Annotations.
30th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2022). Seattle, WA, USA, Nov 01-04, 2022. DOI. GitHub.
Abstract

Recently, the availability of remote sensing imagery from aerial vehicles and satellites has steadily improved. For an automated interpretation of such data, deep-learning-based object detectors achieve state-of-the-art performance. However, established object detectors require complete, precise, and correct bounding box annotations for training. In order to create the necessary training annotations for object detectors, imagery can be georeferenced and combined with data from other sources, such as points of interest localized by GPS sensors. Unfortunately, this combination often leads to poor object localization and missing annotations. Therefore, training object detectors with such data often results in insufficient detection performance. In this paper, we present a novel approach for training object detectors with extremely noisy and incomplete annotations. Our method is based on a teacher-student learning framework and a correction module accounting for imprecise and missing annotations. Thus, our method is easy to use and can be combined with arbitrary object detectors. We demonstrate that our approach improves standard detectors by 37.1% $AP_{50}$ on a noisy real-world remote-sensing dataset. Furthermore, our method achieves great performance gains on two datasets with synthetic noise.

MCML Authors
Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[317]
E. Pretzsch, V. Heinemann, S. Stintzing, A. Bender, S. Chen, J. W. Holch, F. O. Hofmann, H. Ren, F. Bösch, H. Küchenhoff, J. Werner and W. K. Angele.
EMT-Related Genes Have No Prognostic Relevance in Metastatic Colorectal Cancer as Opposed to Stage II/III: Analysis of the Randomised, Phase III Trial FIRE-3 (AIO KRK 0306; FIRE-3).
Cancers 14.22 (Nov. 2022). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Shuo Chen

Shuo Chen

Database Systems & Data Mining

Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)


[316]
S. Shit, R. Koner, B. Wittmann, J. Paetzold, I. Ezhov, H. Li, J. Pan, S. Sharifzadeh, G. Kaissis, V. Tresp and B. Menze.
Relationformer: A Unified Framework for Image-to-Graph Generation.
17th European Conference on Computer Vision (ECCV 2022). Tel Aviv, Israel, Oct 23-27, 2022. DOI. GitHub.
Abstract

A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely [rln]-token. Together with [obj]-tokens, [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-token, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets that demonstrate our approach’s effectiveness and generalizability.

MCML Authors
Link to Rajat Koner

Rajat Koner

Database Systems & Data Mining

Link to Georgios Kaissis

Georgios Kaissis

Dr.

Privacy-Preserving and Trustworthy AI

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[315]
C. Tomani, D. Cremers and F. Buettner.
Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration.
17th European Conference on Computer Vision (ECCV 2022). Tel Aviv, Israel, Oct 23-27, 2022. DOI.
MCML Authors
Link to Christian Tomani

Christian Tomani

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[314]
C. Zelenka, A. Lohrer, M. Bayer and P. Kröger.
AI4EO Hyperview: A SpectralNet3D and RNNPlus Approach for Sustainable Soil Parameter Estimation on Hyperspectral Image Data.
IEEE International Conference on Image Processing (ICIP 2022). Bordeaux, France, Oct 16-19, 2022. DOI.
MCML Authors
Andreas Lohrer

Andreas Lohrer

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[313]
F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift.
30th ACM International Conference on Multimedia (MM 2022). Lisbon, Portugal, Oct 10-14, 2022. DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[312]
J. Moosbauer, M. Binder, L. Schneider, F. Pfisterer, M. Becker, M. Lang, L. Kotthoff and B. Bischl.
Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers.
IEEE Transactions on Evolutionary Computation 26.6 (Oct. 2022). DOI.
Abstract

Automated hyperparameter optimization (HPO) has gained great popularity and is an important component of most automated machine learning frameworks. However, the process of designing HPO algorithms is still an unsystematic and manual process: new algorithms are often built on top of prior work, where limitations are identified and improvements are proposed. Even though this approach is guided by expert knowledge, it is still somewhat arbitrary. The process rarely allows for gaining a holistic understanding of which algorithmic components drive performance and carries the risk of overlooking good algorithmic design choices. We present a principled approach to automated benchmark-driven algorithm design applied to multifidelity HPO (MF-HPO). First, we formalize a rich space of MF-HPO candidates that includes, but is not limited to, common existing HPO algorithms and then present a configurable framework covering this space. To find the best candidate automatically and systematically, we follow a programming-by-optimization approach and search over the space of algorithm candidates via Bayesian optimization. We challenge whether the found design choices are necessary or could be replaced by more naive and simpler ones by performing an ablation analysis. We observe that using a relatively simple configuration (in some ways, simpler than established methods) performs very well as long as some critical configuration parameters are set to the right value.
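The core loop of searching a formalized space of algorithm design choices can be sketched with plain random search. The paper uses Bayesian optimization over the candidate space instead, and the configuration space and names below are illustrative only:

```python
import random

def search_algorithm_designs(space, benchmark, n_trials=50, seed=0):
    """Random search over a space of algorithm design choices.

    space: {component_name: [options]} describing the candidate algorithms;
    benchmark: callable mapping a sampled configuration to a score
    (higher is better), e.g. mean performance over benchmark problems.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in space.items()}  # sample one candidate
        score = benchmark(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Replacing the uniform sampling with a surrogate-guided proposal (Bayesian optimization) turns this into the programming-by-optimization approach the paper follows, and ablating individual `space` entries corresponds to its ablation analysis.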

MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[311]
E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
Journal of Artificial Intelligence Research 75 (Oct. 2022). DOI.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[310]
K. Rath, D. Rügamer, B. Bischl, U. von Toussaint, C. Rea, A. Maris, R. Granetz and C. Albert.
Data augmentation for disruption prediction via robust surrogate models.
Journal of Plasma Physics 88.5 (Oct. 2022). DOI.
Abstract

The goal of this work is to generate large, statistically representative data sets for training machine learning models for disruption prediction, given data from only a few existing discharges. Such a comprehensive training database is important to achieve satisfying and reliable prediction results in artificial neural network classifiers. Here, we aim for a robust augmentation of the training database for multivariate time series data using Student $t$ process regression. We apply Student $t$ process regression in a state space formulation via Bayesian filtering to tackle challenges imposed by outliers and noise in the training data set and to reduce the computational complexity. Thus, the method can also be used if the time resolution is high. We use an uncorrelated model for each dimension and impose correlations afterwards via colouring transformations. We demonstrate the efficacy of our approach on plasma diagnostics data of three different disruption classes from the DIII-D tokamak. To evaluate if the distribution of the generated data is similar to the training data, we additionally perform statistical analyses using methods from time series analysis, descriptive statistics and classic machine learning clustering algorithms.
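The colouring step, which imposes correlations on samples generated independently per dimension, can be sketched as multiplication by a Cholesky factor of the target covariance. This is a simplified stand-in for the paper's pipeline; the function name is illustrative:

```python
import numpy as np

def colour(samples, target_cov):
    """Impose a target covariance structure on per-dimension samples.

    samples: (n, d) draws generated independently per dimension with unit
    variance; returns samples whose empirical covariance approaches target_cov.
    """
    L = np.linalg.cholesky(target_cov)      # target_cov = L @ L.T
    return samples @ L.T                    # colouring transformation
```

If the per-dimension samples have identity covariance, the transformed samples have covariance L I Lᵀ = target_cov, so correlations between plasma diagnostics can be restored after fitting each dimension separately.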

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[309]
I. Obadic, R. Roscher, D. A. B. Oliveira and X. Zhu.
Exploring Self-Attention for Crop-type Classification Explainability.
Preprint at arXiv (Oct. 2022). arXiv.
Abstract

Automated crop-type classification using Sentinel-2 satellite time series is essential to support agriculture monitoring. Recently, deep learning models based on transformer encoders became a promising approach for crop-type classification. Using explainable machine learning to reveal the inner workings of these models is an important step towards improving stakeholders’ trust and efficient agriculture monitoring. In this paper, we introduce a novel explainability framework that aims to shed light on the essential crop disambiguation patterns learned by a state-of-the-art transformer encoder model. More specifically, we process the attention weights of a trained transformer encoder to reveal the critical dates for crop disambiguation and use domain knowledge to uncover the phenological events that support the model performance. We also present a sensitivity analysis approach to better understand the capability of attention to reveal crop-specific phenological events. We report compelling results showing that attention patterns strongly relate to key dates, and consequently, to the critical phenological events for crop-type classification. These findings might be relevant for improving stakeholder trust and optimizing agriculture monitoring processes. Additionally, our sensitivity analysis demonstrates the limitation of attention weights for identifying the important events in the crop phenology, as we empirically show that the unveiled phenological events depend on the other crops in the data considered during training.

MCML Authors
Link to Ivica Obadic

Ivica Obadic

Data Science in Earth Observation

Link to Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation


[308]
M. Windl and S. Mayer.
The Skewed Privacy Concerns of Bystanders in Smart Environments.
ACM International Conference on Mobile Human-Computer Interaction (MobileHCI 2022). Vancouver, Canada, Sep 28-Oct 01, 2022. DOI.
MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Sven Mayer

Sven Mayer

Prof. Dr.

Associate

Human-Computer Interaction and Artificial Intelligence


[307]
S. Gilhuber, M. Berrendorf, Y. Ma and T. Seidl.
Accelerating Diversity Sampling for Deep Active Learning By Low-Dimensional Representations.
6th International Workshop on Interactive Adaptive Learning (IAL 2022) co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. PDF. GitHub.
Abstract

Selecting diverse instances for annotation is one of the key factors of successful active learning strategies. To this end, existing methods often operate on high-dimensional latent representations. In this work, we propose to use the low-dimensional vector of predicted probabilities instead, which can be seamlessly integrated into existing methods. We empirically demonstrate that this considerably decreases the query time, i.e., time to select an instance for annotation, while at the same time improving results. Low query times are relevant for active learning researchers, who use a (fast) oracle for simulated annotation and thus are often constrained by query time. It is also practically relevant when dealing with complex annotation tasks for which only a small pool of skilled domain experts is available for annotation with a limited time budget.
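A minimal sketch of diversity sampling directly on the predicted probability vectors, here using greedy k-center selection. The concrete selection strategy and names are illustrative; the paper's contribution is plugging the low-dimensional probability representation into existing diversity-based methods:

```python
import numpy as np

def kcenter_greedy(probs, k):
    """Greedy k-center selection on low-dimensional probability vectors.

    probs: (n, C) predicted class probabilities of the unlabeled pool;
    returns indices of k mutually distant instances to annotate next.
    """
    probs = np.asarray(probs, float)
    # deterministic start: the instance with the most peaked prediction
    selected = [int(np.argmax(probs.std(axis=1)))]
    dists = np.linalg.norm(probs - probs[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))                  # farthest from current set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(probs - probs[nxt], axis=1))
    return selected
```

Because each distance computation runs over C-dimensional probability vectors instead of high-dimensional latent features, the per-query cost drops substantially, which is the speedup the paper reports.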

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[306]
D. Deng, F. Karl, F. Hutter, B. Bischl and M. Lindauer.
Efficient Automated Deep Learning for Time Series Forecasting.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. DOI.
Abstract

Recent years have witnessed tremendously improved efficiency of Automated Machine Learning (AutoML), especially Automated Deep Learning (AutoDL) systems, but recent work focuses on tabular, image, or NLP tasks. So far, little attention has been paid to general AutoDL frameworks for time series forecasting, despite the enormous success in applying different novel architectures to such tasks. In this paper, we propose an efficient approach for the joint optimization of neural architecture and hyperparameters of the entire data processing pipeline for time series forecasting. In contrast to common NAS search spaces, we designed a novel neural architecture search space covering various state-of-the-art architectures, allowing for an efficient macro-search over different DL approaches. To efficiently search in such a large configuration space, we use Bayesian optimization with multi-fidelity optimization. We empirically study several different budget types enabling efficient multi-fidelity optimization on different forecasting datasets. Furthermore, we compare our resulting system against several established baselines and show that it significantly outperforms all of them across several datasets.

MCML Authors
Link to Florian Karl

Florian Karl

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[305]
C. M. M. Frey, Y. Ma and M. Schubert.
SEA: Graph Shell Attention in Graph Neural Networks.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. DOI.
Abstract

A common problem in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes’ representations of the input graph align and become indiscernible. The latest models employing attention mechanisms with Graph Transformer Layers (GTLs) are still restricted to the layer-wise computational workflow of a GNN and thus do not prevent such effects. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes’ representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node’s representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results while drastically reducing the number of parameters compared to state-of-the-art models.

MCML Authors
Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[304]
D. Rügamer, A. Bender, S. Wiegrebe, D. Racek, B. Bischl, C. L. Müller and C. Stachl.
Factorized Structured Regression for Large-Scale Varying Coefficient Models.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science


[303]
N. Strauß, D. Winkel, M. Berrendorf and M. Schubert.
Reinforcement Learning for Multi-Agent Stochastic Resource Collection.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. DOI.
Abstract

Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC yields challenges considering temporal abstractions and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.

MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[302]
D. Winkel, N. Strauß, M. Schubert and T. Seidl.
Risk-Aware Reinforcement Learning for Multi-Period Portfolio Selection.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. DOI.
MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[301]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Implicit Embeddings via GAN Inversion for High Resolution Chest Radiographs.
1st Workshop on Medical Applications with Disentanglements (MAD 2022) at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI.
MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[300]
A. Farshad, Y. Yeganeh, P. Gehlbach and N. Navab.
Y-Net: A Spatiospectral Dual-Encoder Network for Medical Image Segmentation.
25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[299]
Y. Yeganeh, A. Farshad, J. Boschmann, R. Gaus, M. Frantzen and N. Navab.
FedAP: Adaptive Personalization in Federated Learning for Non-IID Data.
3rd Workshop on Distributed, Collaborative, and Federated Learning, and Affordable AI and Healthcare for Resource Diverse Global Health (DeCaF FAIR 2022) at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI.
Abstract

Federated learning (FL) is a distributed learning method that offers medical institutes the prospect of collaboration in a global model while preserving the privacy of their patients. Although most medical centers conduct similar medical imaging tasks, their differences, such as specializations, number of patients, and devices, lead to distinctive data distributions. Data heterogeneity poses a challenge for FL and the personalization of the local models. In this work, we investigate an adaptive hierarchical clustering method for FL to produce intermediate semi-global models, so clients with similar data distribution have the chance of forming a more specialized model. Our method forms several clusters consisting of clients with the most similar data distributions; then, each cluster continues to train separately. Inside the cluster, we use meta-learning to improve the personalization of the participants’ models. We compare the clustering approach with classical FedAvg and centralized training by evaluating our proposed methods on the HAM10k dataset for skin lesion classification with extremely heterogeneous data distributions. Our experiments demonstrate a significant gain in classification accuracy under heterogeneous distributions compared to standard FL methods. Moreover, we show that the models converge faster if applied in clusters and outperform centralized training while using only a small subset of data.
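The core idea can be sketched in a few lines (a simplified greedy grouping by label-distribution distance plus plain FedAvg; `cluster_clients` and `fedavg` are illustrative names, not the paper's code):

```python
def l1(p, q):
    """L1 distance between two discrete label distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster_clients(dists, threshold):
    """Greedily group clients whose label distributions are close,
    a simplified stand-in for the paper's hierarchical clustering."""
    clusters = []
    for i, d in enumerate(dists):
        for c in clusters:
            if l1(dists[c[0]], d) <= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def fedavg(weights, sizes):
    """FedAvg: average client parameters weighted by dataset size."""
    total = sum(sizes)
    return [sum(w[j] * n for w, n in zip(weights, sizes)) / total
            for j in range(len(weights[0]))]
```

Each resulting cluster would then run its own FedAvg rounds, so clients with similar data share a semi-global model instead of one global model.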

MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[298]
A. Farshad, A. Makarevich, V. Belagiannis and N. Navab.
MetaMedSeg: Volumetric Meta-learning for Few-Shot Organ Segmentation.
4th Workshop on Domain Adaptation and Representation Transfer (DART 2022) at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[297]
P. Engstler, M. Keicher, D. Schinz, K. Mach, A. S. Gersing, S. C. Foreman, S. S. Goller, J. Weissinger, J. Rischewski, A.-S. Dietrich, B. Wiestler, J. S. Kirschke, A. Khakzar and N. Navab.
Interpretable Vertebral Fracture Diagnosis.
Workshop on Interpretability of Machine Intelligence in Medical Image Computing (iMIMIC 2022) at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI.
MCML Authors
Link to Matthias Keicher

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[296]
E. Hohma, C. M. M. Frey, A. Beer and T. Seidl.
SCAR - Spectral Clustering Accelerated and Robustified.
48th International Conference on Very Large Databases (VLDB 2022). Sydney, Australia (and hybrid), Sep 05-09, 2022. DOI. GitHub.
MCML Authors
Link to Christian Frey

Christian Frey

Dr.

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[295]
R. Sonabend, A. Bender and S. Vollmer.
Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures.
Bioinformatics 38.17 (Sep. 2022). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[294]
G. Brasó, O. Cetintas and L. Leal-Taixé.
Multi-Object Tracking and Segmentation Via Neural Message Passing.
International Journal of Computer Vision 130.12 (Sep. 2022). DOI.
MCML Authors
Link to Guillem Brasó

Guillem Brasó

* Former member

Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member


[293]
C. Fritz, M. Mehrl, P. W. Thurner and G. Kauermann.
All that Glitters is not Gold: Relational Events Models with Spurious Events.
Network Science 11.2 (Sep. 2022). DOI.
Abstract

As relational event models are an increasingly popular model for studying relational structures, the reliability of large-scale event data collection becomes more and more important. Automated or human-coded events often suffer from non-negligible false-discovery rates in event identification. Moreover, most sensor data are primarily based on actors’ spatial proximity within predefined time windows; hence, an observed event could reflect either a social relationship or random co-location. Both examples imply spurious events that may bias estimates and inference. We propose the Relational Event Model for Spurious Events (REMSE), an extension to existing approaches for interaction data. The model provides a flexible solution for modeling data while controlling for spurious events. Estimation of our model is carried out in an empirical Bayesian approach via data augmentation. Based on a simulation study, we investigate the properties of the estimation procedure. To demonstrate its usefulness in two distinct applications, we apply this model to combat events from the Syrian civil war and to student co-location data. Results from the simulation and the applications identify the REMSE as a suitable approach to modeling relational event data in the presence of spurious events.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[292]
E. Dorigatti, B. Bischl and B. Schubert.
Improved proteasomal cleavage prediction with positive-unlabeled learning.
Preprint at arXiv (Sep. 2022). arXiv.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[291]
E. Dorigatti, J. Schweisthal, B. Bischl and M. Rezaei.
Robust and Efficient Imbalanced Positive-Unlabeled Learning with Self-supervision.
Preprint at arXiv (Sep. 2022). arXiv.
MCML Authors
Link to Jonas Schweisthal

Jonas Schweisthal

Artificial Intelligence in Management

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[290]
S.-F. Zheng, J. E. Nam, E. Dorigatti, B. Bischl, S. Azizi and M. Rezaei.
Joint Debiased Representation and Image Clustering Learning with Self-Supervision.
Preprint at arXiv (Sep. 2022). arXiv.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[289]
W. Ghada, E. Casellas, J. Herbinger, A. Garcia-Benadí, L. Bothmann, N. Estrella, J. Bech and A. Menzel.
Stratiform and Convective Rain Classification Using Machine Learning Models and Micro Rain Radar.
Remote Sensing 14.18 (Sep. 2022). DOI.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science


[288]
F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Representation Learning for Tablet and Paper Domain Adaptation in favor of Online Handwriting Recognition.
7th International Workshop on Multimodal pattern recognition of social signals in human computer interaction (MPRSS 2022) at the 26th International Conference on Pattern Recognition (ICPR 2022). Montreal, Canada, Aug 21-25, 2022. arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[287]
C. Leiber, L. G. M. Bauer, M. Neumayr, C. Plant and C. Böhm.
The DipEncoder: Enforcing Multimodality in Autoencoders.
28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2022). Washington, DC, USA, Aug 14-18, 2022. DOI.
Abstract

Hartigan’s Dip-test of unimodality has gained increasing interest in unsupervised learning over the past few years. It is free from complex parameterization and does not require a distribution assumed a priori. A useful property is that the resulting Dip-values can be derived to find a projection axis that identifies multimodal structures in the data set. In this paper, we show how to apply the gradient not only with respect to the projection axis but also with respect to the data to improve the cluster structure. By tightly coupling the Dip-test with an autoencoder, we obtain an embedding that clearly separates all clusters in the data set. This method, called DipEncoder, is the basis of a novel deep clustering algorithm. Extensive experiments show that the DipEncoder is highly competitive to state-of-the-art methods.

MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[286]
M. van Smeden, G. Heinze, B. Van Calster, F. W. Asselbergs, P. E. Vardas, N. Bruining, P. de Jaegere, J. H. Moore, S. Denaxas, A.-L. Boulesteix and K. G. M. Moons.
Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease.
European Heart Journal 43.31 (Aug. 2022). DOI.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[285]
M. Lotfollahi, M. Naghipourfar, M. D. Luecken, M. Khajavi, M. Büttner, M. Wagenstetter, Ž. Avsec, A. Gayoso, N. Yosef, M. Interlandi, S. Rybakov, A. V. Misharin and F. J. Theis.
Mapping single-cell data to reference atlases by transfer learning.
Nature Biotechnology 40 (Aug. 2022). DOI.
MCML Authors
Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[284]
Z. Ding, R. Qi, Z. Li, B. He, J. Wu, Y. Ma, Z. Meng, Z. Han and V. Tresp.
Forecasting Question Answering over Temporal Knowledge Graphs.
Preprint at arXiv (Aug. 2022). arXiv.
Abstract

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases. The only existing TKGQA dataset, i.e., CronQuestions, consists of temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning the same period can be fully used for answer inference, allowing TKGQA models to use even future knowledge to answer questions based on past facts. In real-world scenarios, however, it is also common that, given the knowledge up to now, we want TKGQA systems to answer questions about the future. As humans constantly seek plans for the future, building TKGQA systems for answering such forecasting questions is important. Nevertheless, this has remained unexplored in previous research. In this paper, we propose a novel task: forecasting question answering over temporal knowledge graphs. We also propose a large-scale TKGQA benchmark dataset, i.e., ForecastTKGQuestions, for this task. It includes three types of questions, i.e., entity prediction, yes-no, and fact reasoning questions. For every forecasting question in our dataset, QA models can only have access to the TKG information before the timestamp annotated in the given question for answer inference. We find that the state-of-the-art TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-no questions and fact reasoning questions. To this end, we propose ForecastTKGQA, a TKGQA model that employs a TKG forecasting module for future inference, to answer all three types of questions. Experimental results show that ForecastTKGQA outperforms recent TKGQA methods on the entity prediction questions, and it also shows great effectiveness in answering the other two types of questions.
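The dataset's core constraint, that a model may only see facts earlier than the question's timestamp, amounts to a simple filter over the quadruples of the TKG (a minimal sketch; the function name is illustrative):

```python
def facts_before(tkg, t_q):
    """Restrict a temporal knowledge graph to the facts a forecasting
    model is allowed to use: `tkg` is a list of
    (subject, relation, object, timestamp) quadruples, and only facts
    strictly earlier than the question timestamp `t_q` survive.
    """
    return [(s, r, o, t) for (s, r, o, t) in tkg if t < t_q]
```

Everything at or after `t_q` is hidden, which is what separates forecasting questions from the fixed-period setting of CronQuestions.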

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Zongyue Li

Zongyue Li

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[283]
F. Ott, N. L. Raichur, D. Rügamer, T. Feigl, H. Neumann, B. Bischl and C. Mutschler.
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression.
Preprint at arXiv (Aug. 2022). arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[282]
L. Schneider, L. Schäpermeier, R. P. Prager, B. Bischl, H. Trautmann and P. Kerschke.
HPO X ELA: Investigating Hyperparameter Optimization Landscapes by Means of Exploratory Landscape Analysis.
Preprint at arXiv (Aug. 2022). arXiv.
MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[281]
M. Schneble and G. Kauermann.
Estimation of Latent Network Flows in Bike-Sharing Systems.
Statistical Modelling 22.2 (Aug. 2022). DOI.
Abstract

Estimation of latent network flows is a common problem in statistical network analysis. The typical setting is that we know the margins of the network, that is, in- and outdegrees, but the flows are unobserved. In this article, we develop a mixed regression model to estimate network flows in a bike-sharing network if only the hourly differences of in- and outdegrees at bike stations are known. We also include exogenous covariates such as weather conditions. Two different parameterizations of the model are considered to estimate (a) the whole network flow and (b) the network margins only. The estimation of the model parameters is proposed via an iterative penalized maximum likelihood approach. This is exemplified by modelling network flows in the Vienna bike-sharing system. To evaluate our modelling approach, we conduct our analyses under different distributional assumptions while appropriately accounting for the provider’s interventions to keep the estimation error low. Furthermore, a simulation study is conducted to show the performance of the model. For practical purposes, it is crucial to predict when and at which station there is a lack or an excess of bikes. For this application, our model proves well suited, providing quite accurate predictions.
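The observed quantity in this setting, the hourly difference of in- and outdegrees per station, can be computed from raw trip records as follows (a toy sketch; `hourly_margins` and the record layout are assumptions, not the paper's data format):

```python
def hourly_margins(trips, n_stations):
    """Hourly net differences of in- minus out-degree per station.

    `trips` holds (hour, origin_station, destination_station) records.
    The individual flows stay latent; only these per-station
    differences are observed, as in the bike-sharing setting.
    """
    margins = {}
    for hour, origin, dest in trips:
        diff = margins.setdefault(hour, [0] * n_stations)
        diff[origin] -= 1  # departure raises the out-degree
        diff[dest] += 1    # arrival raises the in-degree
    return margins
```

A positive entry means a surplus of arriving bikes in that hour, a negative entry a deficit; the model's task is to recover the station-to-station flows behind these margins.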

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[280]
C. Fritz, G. De Nicola, M. Rave, M. Weigert, Y. Khazaei, U. Berger, H. Küchenhoff and G. Kauermann.
Statistical modelling of COVID-19 data: Putting generalized additive models to work.
Statistical Modelling 24.4 (Aug. 2022). DOI.
MCML Authors
Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[279]
F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder and B. Bischl.
YAHPO Gym - Design Criteria and a new Multifidelity Benchmark for Hyperparameter Optimization.
1st International Conference on Automated Machine Learning (AutoML 2022) co-located with the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 25-27, 2022. URL. GitHub.
Abstract

When developing and analyzing new hyperparameter optimization (HPO) methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we list desirable properties and requirements for such benchmarks and propose a new set of challenging and relevant multifidelity HPO benchmark problems motivated by these requirements. For this, we revisit the concept of surrogate-based benchmarks and empirically compare them to more widely-used tabular benchmarks, showing that the latter may induce bias in performance estimation and ranking of HPO methods. We present a new surrogate-based benchmark suite for multifidelity HPO methods consisting of 9 benchmark collections that constitute over 700 multifidelity HPO problems in total. All our benchmarks also allow for querying of multiple optimization targets, enabling the benchmarking of multi-objective HPO. We examine and compare our benchmark suite with respect to the defined requirements and show that our benchmarks provide viable additions to existing suites.
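The difference between a tabular and a surrogate-based benchmark can be made concrete with a toy surrogate (a 1-nearest-neighbour stand-in; YAHPO Gym's actual surrogates are learned regression models, and `make_surrogate` is an illustrative name):

```python
def make_surrogate(table):
    """Build a toy 1-nearest-neighbour surrogate from tabular HPO results.

    `table` maps (hyperparameter_value, fidelity) points to an observed
    error. Unlike a tabular benchmark, the surrogate answers queries at
    configurations that were never evaluated, which avoids restricting
    optimizers to the original grid.
    """
    points = list(table)

    def predict(x, fidelity):
        # Return the observed error of the closest evaluated point.
        key = min(points,
                  key=lambda p: (p[0] - x) ** 2 + (p[1] - fidelity) ** 2)
        return table[key]

    return predict
```

An optimizer benchmarked against `predict` can propose arbitrary continuous configurations, whereas a tabular benchmark silently snaps every query to the grid, which is one source of the bias the paper discusses.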

MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[278]
L. Schneider, F. Pfisterer, P. Kent, J. Branke, B. Bischl and J. Thomas.
Tackling neural architecture search with quality diversity optimization.
1st International Conference on Automated Machine Learning (AutoML 2022) co-located with the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 25-27, 2022. URL.
Abstract

Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage alongside the validation error. Although considerable progress has been made in the field of multi-objective NAS, we argue that there is some discrepancy between the actual optimization problem of practical interest and the optimization problem that multi-objective NAS tries to solve. We resolve this discrepancy by formulating the multi-objective NAS problem as a quality diversity optimization (QDO) problem and introduce three quality diversity NAS optimizers (two of them belonging to the group of multifidelity optimizers), which search for high-performing yet diverse architectures that are optimal for application-specific niches, e.g., hardware constraints. By comparing these optimizers to their multi-objective counterparts, we demonstrate that quality diversity NAS in general outperforms multi-objective NAS with respect to quality of solutions and efficiency. We further show how applications and future NAS research can thrive on QDO.
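The central mechanism of quality diversity optimization is an archive that keeps the best solution per niche instead of a single global optimum; a MAP-Elites-style update can be sketched as follows (a generic sketch of the QDO idea, not the paper's specific optimizers):

```python
def update_archive(archive, candidate, niche_of, fitness):
    """MAP-Elites-style archive update: keep the best candidate per
    niche (e.g., per hardware-constraint bucket of model size) rather
    than one global best architecture."""
    niche = niche_of(candidate)
    best = archive.get(niche)
    if best is None or fitness(candidate) > fitness(best):
        archive[niche] = candidate
    return archive
```

After the search, the archive directly yields one strong architecture per application-specific niche, which is the practical deliverable multi-objective Pareto fronts only provide indirectly.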

MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[277]
E. Schede, J. Brandt, A. Tornede, M. Wever, V. Bengs, E. Hüllermeier and K. Tierney.
A Survey of Methods for Automated Algorithm Configuration.
31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (IJCAI-ECAI 2022). Vienna, Austria, Jul 23-29, 2022. Extended Abstract. DOI.
MCML Authors
Link to Viktor Bengs

Viktor Bengs

Dr.

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[276]
M. Ali, M. Berrendorf, M. Galkin, V. Thost, T. Ma, V. Tresp and J. Lehmann.
Improving Inductive Link Prediction Using Hyper-Relational Facts (Extended Abstract).
Best paper track at the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (IJCAI-ECAI 2022). Vienna, Austria, Jul 23-29, 2022. DOI.
Abstract

For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing effort has been put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relational KGs (e.g., Wikidata), have not yet been properly studied. In this work, we classify different inductive settings and study the benefits of employing hyper-relational KGs on a wide range of semi- and fully inductive link prediction tasks powered by recent advancements in graph neural networks. Our experiments on a novel set of benchmarks show that qualifiers over typed edges can lead to absolute performance improvements of 6% (for the Hits@10 metric) compared to triple-only baselines.
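A hyper-relational fact is a base triple enriched with qualifier key-value pairs; the data structure and its projection to the triple-only baseline can be sketched as (the class and example values are illustrative, not from Wikidata):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    """A hyper-relational fact: a base (head, relation, tail) triple
    plus an ordered tuple of qualifier (key, value) pairs, in the
    style of Wikidata statements discussed in the paper."""
    head: str
    relation: str
    tail: str
    qualifiers: tuple = ()

def triple_only(stmt):
    """Project a hyper-relational statement to its triple baseline,
    discarding the qualifiers a triple-based KG model cannot use."""
    return (stmt.head, stmt.relation, stmt.tail)
```

The paper's point is precisely that models consuming the full `Statement` (qualifiers included) outperform models that only ever see `triple_only(stmt)`.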

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[275]
A. Klaß, S. M. Lorenz, M. W. Lauer-Schmaltz, D. Rügamer, B. Bischl, C. Mutschler and F. Ott.
Uncertainty-aware Evaluation of Time-Series Classification for Online Handwriting Recognition with Domain Shift.
Workshop on Spatio-Temporal Reasoning and Learning (STRL 2022) at the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (IJCAI-ECAI 2022). Vienna, Austria, Jul 23-29, 2022. URL.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[274]
A. Khakzar, Y. Li, Y. Zhang, M. Sanisoglu, S. T. Kim, M. Rezaei, B. Bischl and N. Navab.
Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models.
2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH 2022) at the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 17-23, 2022. arXiv.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[273]
L. Hang, Q. Khan, V. Tresp and D. Cremers.
Biologically Inspired Neural Path Finding.
15th International Conference on Brain Informatics (BI 2022). Padova, Italy, Jul 15, 2022. DOI.
MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[272]
A. Maronikolakis, P. Baader and H. Schütze.
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes.
4th Workshop on Gender Bias in Natural Language Processing (GeBNLP 2022). Seattle, WA, USA, Jul 15, 2022. DOI.
MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[271]
S. Yuan, A. Maronikolakis and H. Schütze.
Separating Hate Speech and Offensive Language Classes via Adversarial Debiasing.
6th Workshop on Online Abuse and Harms (WOAH 2022). Seattle, WA, USA, Jul 14, 2022. DOI.
MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[270]
S. Dandl, F. Pfisterer and B. Bischl.
Multi-Objective Counterfactual Fairness.
Genetic and Evolutionary Computation Conference (GECCO 2022). Boston, MA, USA, Jul 09-13, 2022. DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[269]
L. Schneider, F. Pfisterer, J. Thomas and B. Bischl.
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models.
Genetic and Evolutionary Computation Conference (GECCO 2022). Boston, MA, USA, Jul 09-13, 2022. DOI.
MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[268]
M. Mittermeier, M. Weigert, D. Rügamer, H. Küchenhoff and R. Ludwig.
A deep learning based classification of atmospheric circulation types over Europe: projection of future changes in a CMIP6 large ensemble.
Environmental Research Letters 17.8 (Jul. 2022). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)


[267]
Y. Yeganeh, A. Farshad and N. Navab.
Shape-Aware Masking for Inpainting in Medical Imaging.
Preprint at arXiv (Jul. 2022). arXiv.
MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[266]
Z. Liu, Y. Ma, M. Schubert, Y. Ouyang and Z. Xiong.
Multi-Modal Contrastive Pre-training for Recommendation.
ACM International Conference on Multimedia Retrieval (ICMR 2022). Newark, NJ, USA, Jun 27-30, 2022. DOI.
MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[265]
S. Severini, A. Imani, P. Dufter and H. Schütze.
Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages.
13th International Conference on Language Resources and Evaluation (LREC 2022). Marseille, France, Jun 21-23, 2022. URL.
Abstract

Parallel corpora are ideal for extracting a multilingual named entity (MNE) resource, i.e., a dataset of names translated into multiple languages. Prior work on extracting MNE datasets from parallel corpora required resources such as large monolingual corpora or word aligners that are unavailable or perform poorly for underresourced languages. We present CLC-BN, a new method for creating an MNE resource, and apply it to the Parallel Bible Corpus, a corpus of more than 1000 languages. CLC-BN learns a neural transliteration model from parallel-corpus statistics, without requiring any other bilingual resources, word aligners, or seed data. Experimental results show that CLC-BN clearly outperforms prior work. We release an MNE resource for 1340 languages and demonstrate its effectiveness in two downstream tasks: knowledge graph augmentation and bilingual lexicon induction.

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[264]
S. Severini, V. Hangya, M. J. Sabet, A. Fraser and H. Schütze.
Don't Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings.
15th Workshop on Building and Using Comparable Corpora (BUCC 2022) at the 13th International Conference on Language Resources and Evaluation (LREC 2022). Marseille, France, Jun 21-23, 2022. URL.
Abstract

Bilingual Word Embeddings (BWEs) are one of the cornerstones of cross-lingual transfer of NLP models. They can be built using only monolingual corpora without supervision leading to numerous works focusing on unsupervised BWEs. However, most of the current approaches to build unsupervised BWEs do not compare their results with methods based on easy-to-access cross-lingual signals. In this paper, we argue that such signals should always be considered when developing unsupervised BWE methods. The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold. We experiment on thirteen non-Latin languages (and English) and show that such cheap signals work well and that they outperform using more complex unsupervised methods on distant language pairs such as Chinese, Japanese, Kannada, Tamil, and Thai. In addition, they are even competitive with the use of high-quality lexicons in supervised approaches. Our results show that these training signals should not be neglected when building BWEs, even for distant languages.
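The two cheap signals the abstract names, identical words shared by both vocabularies and pairs whose romanized forms are within an edit-distance threshold, can be sketched as follows (a simplified illustration; `seed_lexicon` and the `romanize` hook are assumptions, and real transliteration is more involved):

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[-1] + 1,        # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def seed_lexicon(vocab_src, vocab_tgt, romanize=None, max_dist=1):
    """Cheap bilingual training signals: identical words in both
    vocabularies, plus pairs whose (romanized) forms are within an
    edit-distance threshold. `romanize` is a hypothetical
    transliteration function; identity if None."""
    rom = romanize or (lambda w: w)
    pairs = {(w, w) for w in vocab_src & vocab_tgt}
    for s in vocab_src:
        for t in vocab_tgt:
            if (s, t) not in pairs and edit_distance(rom(s), rom(t)) <= max_dist:
                pairs.add((s, t))
    return pairs
```

Such a seed lexicon can then initialize or supervise the BWE mapping instead of relying on a fully unsupervised method.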

MCML Authors
Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[263]
A. Khakzar, P. Khorsandi, R. Nobahari and N. Navab.
Do Explanations Explain? Model Knows Best.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA, Jun 19-24, 2022. DOI.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[262]
D. Muhle, L. Koestler, N. Demmel, F. Bernard and D. Cremers.
The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). New Orleans, LA, USA, Jun 19-24, 2022. DOI.
MCML Authors
Link to Dominik Muhle

Dominik Muhle

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[261]
V. Steinborn, P. Dufter, H. Jabbar and H. Schütze.
An Information-Theoretic Approach and Dataset for Probing Gender Stereotypes in Multilingual Masked Language Models.
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). Seattle, WA, USA, Jun 10-15, 2022. DOI.
Abstract

Bias research in NLP is a rapidly growing and developing field. Similar to CrowS-Pairs (Nangia et al., 2020), we assess gender bias in masked-language models (MLMs) by studying pairs of sentences with gender-swapped person references. Most bias research focuses on and often is specific to English. Using a novel methodology for creating sentence pairs that is applicable across languages, we create, based on CrowS-Pairs, a multilingual dataset for English, Finnish, German, Indonesian and Thai. Additionally, we propose SJSD, a new bias measure based on Jensen–Shannon divergence, which we argue retains more information from the model output probabilities than other previously proposed bias measures for MLMs. Using multilingual MLMs, we find that SJSD diagnoses the same systematic biased behavior for non-English that previous studies have found for monolingual English pre-trained MLMs. SJSD outperforms the CrowS-Pairs measure, which struggles to find such biases for smaller non-English datasets.
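The divergence underlying the proposed measure can be computed directly from two output distributions (a minimal sketch of Jensen–Shannon divergence in bits; SJSD itself aggregates such values over sentence pairs in a way not reproduced here):

```python
from math import log2

def jensen_shannon(p, q):
    """Jensen–Shannon divergence (base 2) between two discrete
    distributions, e.g. an MLM's output probabilities for the two
    sentences of a gender-swapped pair."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback–Leibler divergence, skipping zero-probability terms.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike raw probability differences, this value is symmetric and bounded in [0, 1] bits, which is one reason divergence-based measures retain more of the models' output information.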

MCML Authors
Link to Victor Steinborn

Victor Steinborn

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[260]
M. Zhao, F. Mi, Y. Wang, M. Li, X. Jiang, Q. Liu and H. Schütze.
LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework.
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). Seattle, WA, USA, Jun 10-15, 2022. DOI.
Abstract

Vast efforts have been devoted to creating high-performance few-shot learners, i.e., large-scale pretrained language models (PLMs) that perform well with little downstream task training data. Training PLMs has incurred significant cost, but utilizing the few-shot learners is still challenging due to their enormous size. This work focuses on a crucial question: How to make effective use of these few-shot learners? We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers. The rationale is that crowdsourcing workers are in fact few-shot learners: They are shown a few illustrative examples to learn about a task and then start annotating. LMTurk employs few-shot learners built upon PLMs as workers. We show that the resulting annotations can be utilized to train models that solve the task well and are small enough to be deployable in practical scenarios. Active learning is integrated into LMTurk to reduce the number of queries made to PLMs, minimizing the computational cost of running PLM inference passes. Altogether, LMTurk is an important step towards making effective use of current PLMs.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[259]
F. Müller, Q. Khan and D. Cremers.
Lateral Ego-Vehicle Control Without Supervision Using Point Clouds.
3rd International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI 2022). Paris, France, Jun 01-03, 2022. DOI.
MCML Authors
Link to Qadeer Khan

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[258]
M. Schneble and G. Kauermann.
Intensity Estimation on Geometric Networks with Penalized Splines.
Annals of Applied Statistics 16.2 (Jun. 2022). DOI.
Abstract

In the past decades, the growing amount of network data has led to many novel statistical models. In this paper we consider so-called geometric networks. Typical examples are road networks or other infrastructure networks. Nevertheless, the neurons or the blood vessels in a human body can also be interpreted as a geometric network embedded in a three-dimensional space. A network-specific metric, rather than the Euclidean metric, is usually used in all these applications, making the analyses of network data challenging. We consider network-based point processes, and our task is to estimate the intensity (or density) of the process, which allows us to detect high- and low-intensity regions of the underlying stochastic processes. Available routines that tackle this problem are commonly based on kernel smoothing methods. This paper uses penalized spline smoothing and extends this toward smooth intensity estimation on geometric networks. Furthermore, our approach easily allows incorporating covariates, enabling us to respect the network geometry in a regression model framework. Several data examples and a simulation study show that penalized spline-based intensity estimation on geometric networks is a numerically stable and efficient tool. Furthermore, it also allows estimating linear and smooth covariate effects, distinguishing our approach from already existing methodologies.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[257]
Q. Au, J. Herbinger, C. Stachl, B. Bischl and G. Casalicchio.
Grouped Feature Importance and Combined Features Effect Plot.
Data Mining and Knowledge Discovery 36 (Jun. 2022). DOI.
Abstract

Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effect of feature groups. To address this research gap, we provide a comprehensive overview of how existing model-agnostic techniques can be defined for feature groups to assess the grouped feature importance, focusing on permutation-based, refitting, and Shapley-based methods. We also introduce an importance-based sequential procedure that identifies a stable and well-performing combination of features in the grouped feature space. Furthermore, we introduce the combined features effect plot, which is a technique to visualize the effect of a group of features based on a sparse, interpretable linear combination of features. We used simulation studies and real data examples to analyze, compare, and discuss these methods.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[256]
J. Moosbauer, G. Casalicchio, M. Lindauer and B. Bischl.
Enhancing Explainability of Hyperparameter Optimization via Bayesian Algorithm Execution.
Preprint at arXiv (Jun. 2022). arXiv.
Abstract

Despite all the benefits of automated hyperparameter optimization (HPO), most modern HPO algorithms are black-boxes themselves. This makes it difficult to understand the decision process which leads to the selected configuration, reduces trust in HPO, and thus hinders its broad adoption. Here, we study the combination of HPO with interpretable machine learning (IML) methods such as partial dependence plots. These techniques are more and more used to explain the marginal effect of hyperparameters on the black-box cost function or to quantify the importance of hyperparameters. However, if such methods are naively applied to the experimental data of the HPO process in a post-hoc manner, the underlying sampling bias of the optimizer can distort interpretations. We propose a modified HPO method which efficiently balances the search for the global optimum w.r.t. predictive performance emph{and} the reliable estimation of IML explanations of an underlying black-box function by coupling Bayesian optimization and Bayesian Algorithm Execution. On benchmark cases of both synthetic objectives and HPO of a neural network, we demonstrate that our method returns more reliable explanations of the underlying black-box without a loss of optimization performance.

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[255]
L. Weissweiler, V. Hofmann, M. J. Sabet and H. Schütze.
CaMEL: Case Marker Extraction without Labels.
60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). Dublin, Ireland, May 22-27, 2022. DOI.
Abstract

We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.

MCML Authors
Link to Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[254]
G. Fu, Z. Meng, Z. Han, Z. Ding, Y. Ma, M. Schubert, V. Tresp and R. Wattenhofer.
TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion.
6th ACL Workshop on Structured Prediction for NLP (SPNLP 2022) at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). Dublin, Ireland, May 22-27, 2022. DOI.
Abstract

Temporal knowledge graphs store the dynamics of entities and relations during a time period. However, typical temporal knowledge graphs often suffer from incomplete dynamics with missing facts in real-world scenarios. Hence, modeling temporal knowledge graphs to complete the missing facts is important. In this paper, we tackle the temporal knowledge graph completion task by proposing TempCaps, which is a Capsule network-based embedding model for Temporal knowledge graph completion. TempCaps models temporal knowledge graphs by introducing a novel dynamic routing aggregator inspired by Capsule Networks. Specifically, TempCaps builds entity embeddings by dynamically routing retrieved temporal relation and neighbor information. Experimental results demonstrate that TempCaps reaches state-of-the-art performance for temporal knowledge graph completion. Additional analysis also shows that TempCaps is efficient.

MCML Authors
Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[253]
P. Kopper, S. Wiegrebe, B. Bischl, A. Bender and D. Rügamer.
DeepPAMM: Deep Piecewise Exponential Additive Mixed Models for Complex Hazard Structures in Survival Analysis.
26th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2022). Chengdu, China, May 16-19, 2022. DOI.
Abstract

Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes and is prevalent in many domains, particularly biomedical applications. Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions, concealed by truncation and censoring processes. The piecewise exponential additive mixed model (PAMM) is a model class addressing many of these challenges, yet PAMMs are not applicable in high-dimensional feature settings or in the case of unstructured or multimodal data. We unify existing approaches by proposing DeepPAMM, a versatile deep learning framework that is well-founded from a statistical point of view, yet with enough flexibility for modeling complex hazard structures. We illustrate that DeepPAMM is competitive with other machine learning approaches with respect to predictive performance while maintaining interpretability through benchmark experiments and an extended case study.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[252]
L. Bothmann, K. Peters and B. Bischl.
What Is Fairness? Implications For FairML.
Preprint at arXiv (May. 2022). arXiv.
MCML Authors
Link to Ludwig Bothmann

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[251]
D. Rügamer.
Additive Higher-Order Factorization Machines.
Preprint at arXiv (May. 2022). arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[250]
C. Tomani and D. Cremers.
Challenger: Training with Attribution Maps.
Preprint at arXiv (May. 2022). arXiv.
MCML Authors
Link to Christian Tomani

Christian Tomani

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[249]
A. Bauer, M. Weigert and H. Jalal.
APCtools: Descriptive and Model-based Age-Period-Cohort Analysis.
The Journal of Open Source Software 7.73 (May. 2022). DOI.
Abstract

Age-Period-Cohort (APC) analysis aims to determine relevant drivers for long-term developments and is used in many fields of science (Yang & Land, 2013). The R package APCtools offers modern visualization techniques and general routines to facilitate the interpretability of the interdependent temporal structures and to simplify the workflow of an APC analysis. Separation of the temporal effects is performed utilizing a semiparametric regression approach. We briefly discuss the challenges of APC analysis, give an overview of existing statistical software packages and outline the main functionalities of the package.

MCML Authors

[248]
T. Ullmann, C. Hennig and A.-L. Boulesteix.
Validation of cluster analysis results on validation data: A systematic framework.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12.3 (May. 2022). DOI.
Abstract

Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To assess the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic, structured review of the existing literature about this topic. For this purpose, we outline a formal framework that covers most existing approaches for validating clustering results on validation data. In particular, we review classical validation techniques such as internal and external validation, stability analysis, and visual validation, and show how they can be interpreted in terms of our framework. We define and formalize different types of validation of clustering results on a validation dataset, and give examples of how clustering studies from the applied literature that used a validation dataset can be seen as instances of our framework.

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[247]
M. Windl, S. S. Feger, L. Zijlstra, A. Schmidt and P. W. Wozniak.
‘It Is Not Always Discovery Time’: Four Pragmatic Approaches in Designing AI Systems.
Conference on Human Factors in Computing Systems (CHI 2022). New Orleans, LA, USA, Apr 30-May 05, 2022. DOI.
Abstract

While systems that use Artificial Intelligence (AI) are increasingly becoming part of everyday technology use, we do not fully understand how AI changes design processes. A structured understanding of how designers work with AI is needed to improve the design process and educate future designers. To that end, we conducted interviews with designers who participated in projects which used AI. While past work focused on AI systems created by experienced designers, we focus on the perspectives of a diverse sample of interaction designers. Our results show that the design process of an interactive system is affected when AI is integrated and that design teams adapt their processes to accommodate AI. Based on our data, we contribute four approaches adopted by interaction designers working with AI: a priori, post-hoc, model-centric, and competence-centric. Our work contributes a pragmatic account of how design processes for AI systems are enacted.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[246]
M. Windl, N. Henze, A. Schmidt and S. S. Feger.
Automating Contextual Privacy Policies: Design and Evaluation of a Production Tool for Digital Consumer Privacy Awareness.
Conference on Human Factors in Computing Systems (CHI 2022). New Orleans, LA, USA, Apr 30-May 05, 2022. DOI.
Abstract

Users avoid engaging with privacy policies because they are lengthy and complex, making it challenging to retrieve relevant information. In response, research proposed contextual privacy policies (CPPs) that embed relevant privacy information directly into their affiliated contexts. To date, CPPs are limited to concept showcases. This work evolves CPPs into a production tool that automatically extracts and displays concise policy information. We first evaluated the technical functionality on the US’s 500 most visited websites with 59 participants. Based on our results, we further revised the tool to deploy it in the wild with 11 participants over ten days. We found that our tool is effective at embedding CPP information on websites. Moreover, we found that the tool’s usage led to more reflective privacy behavior, making CPPs powerful in helping users understand the consequences of their online activities. We contribute design implications around CPP presentation to inform future systems design.

MCML Authors
Link to Maximiliane Windl

Maximiliane Windl

Human-Centered Ubiquitous Media

Link to Albrecht Schmidt

Albrecht Schmidt

Prof. Dr.

Human-Centered Ubiquitous Media


[245]
D. Alivanistos, M. Berrendorf, M. Cochez and M. Galkin.
Query Embedding on Hyper-Relational Knowledge Graphs.
10th International Conference on Learning Representations (ICLR 2022). Virtual, Apr 25-29, 2022. URL. GitHub.
Abstract

Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs (KGs). It subsumes both one-hop link prediction as well as other more complex types of logical queries. Existing algorithms operate only on classical, triple-based graphs, whereas modern KGs often employ a hyper-relational modeling paradigm. In this paradigm, typed edges may have several key-value pairs known as qualifiers that provide fine-grained context for facts. In queries, this context modifies the meaning of relations, and usually reduces the answer set. Hyper-relational queries are often observed in real-world KG applications, and existing approaches for approximate query answering cannot make use of qualifier pairs. In this work, we bridge this gap and extend the multi-hop reasoning problem to hyper-relational KGs allowing to tackle this new type of complex queries. Building upon recent advancements in Graph Neural Networks and query embedding techniques, we study how to embed and answer hyper-relational conjunctive queries. Besides that, we propose a method to answer such queries and demonstrate in our experiments that qualifiers improve query answering on a diverse set of query patterns.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member


[244]
M. Galkin, M. Berrendorf and C. T. Hoyt.
An Open Challenge for Inductive Link Prediction on Knowledge Graphs.
Workshop on Graph Learning Benchmarks (GLB 2022) at the International World Wide Web Conference (WWW 2022). Virtual, Apr 22-29, 2022. arXiv. GitHub.
Abstract

An emerging trend in representation learning over knowledge graphs (KGs) moves beyond transductive link prediction tasks over a fixed set of known entities in favor of inductive tasks that imply training on one graph and performing inference over a new graph with unseen entities. In inductive setups, node features are often not available and training shallow entity embedding matrices is meaningless as they cannot be used at inference time with unseen entities. Despite the growing interest, there are not enough benchmarks for evaluating inductive representation learning methods. In this work, we introduce ILPC 2022, a novel open challenge on KG inductive link prediction. To this end, we constructed two new datasets based on Wikidata with various sizes of training and inference graphs that are much larger than existing inductive benchmarks. We also provide two strong baselines leveraging recently proposed inductive methods. We hope this challenge helps to streamline community efforts in the inductive graph representation learning area. ILPC 2022 follows best practices on evaluation fairness and reproducibility.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member


[243]
C. T. Hoyt, M. Berrendorf, M. Galkin, V. Tresp and B. M. Gyori.
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs.
Workshop on Graph Learning Benchmarks (GLB 2022) at the International World Wide Web Conference (WWW 2022). Virtual, Apr 22-29, 2022. arXiv.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[242]
C. Brunner, A. Duensing, C. Schröder, M. Mittermair, V. Golkov, M. Pollanka, D. Cremers and R. Kienberger.
Deep Learning in Attosecond Metrology.
Optics Express 30.9 (Apr. 2022). Editor's Pick. DOI.
MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[241]
J. Herbinger, B. Bischl and G. Casalicchio.
REPID: Regional Effect Plots with implicit Interaction Detection.
25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022). Virtual, Mar 28-30, 2022. URL.
Abstract

Machine learning models can automatically learn complex relationships, such as non-linear and interaction effects. Interpretable machine learning methods such as partial dependence plots visualize marginal feature effects but may lead to misleading interpretations when feature interactions are present. Hence, employing additional methods that can detect and measure the strength of interactions is paramount to better understand the inner workings of machine learning models. We demonstrate several drawbacks of existing global interaction detection approaches, characterize them theoretically, and evaluate them empirically. Furthermore, we introduce regional effect plots with implicit interaction detection, a novel framework to detect interactions between a feature of interest and other features. The framework also quantifies the strength of interactions and provides interpretable and distinct regions in which feature effects can be interpreted more reliably, as they are less confounded by interactions. We prove the theoretical eligibility of our method and show its applicability on various simulation and real-world examples.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[240]
F. Pargent, F. Pfisterer, J. Thomas and B. Bischl.
Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features.
Computational Statistics 37 (Mar. 2022). DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[239]
D. Strieder and M. Drton.
On the choice of the splitting ratio for the split likelihood ratio test.
Electronic Journal of Statistics 16.2 (Mar. 2022). DOI.
MCML Authors
Link to Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics


[238]
K. E. Riehm, E. Badillo Goicoechea, F. M. Wang, E. Kim, L. R. Aldridge, C. P. Lupton-Smith, R. Presskreischer, T.-H. Chang, S. LaRocca, F. Kreuter and E. A. Stuart.
Association of Non-Pharmaceutical Interventions to Reduce the Spread of SARS-CoV-2 With Anxiety and Depressive Symptoms: A Multi-National Study of 43 Countries.
International Journal of Public Health 67 (Mar. 2022). DOI.
MCML Authors
Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[237]
M. Keicher, K. Mullakaeva, T. Czempiel, K. Mach, A. Khakzar and N. Navab.
Few-shot Structured Radiology Report Generation Using Natural Language Prompts.
Preprint at arXiv (Mar. 2022). arXiv.
MCML Authors
Link to Matthias Keicher

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[236]
C. Fritz, E. Dorigatti and D. Rügamer.
Combining Graph Neural Networks and Spatio-temporal Disease Models to Predict COVID-19 Cases in Germany.
Scientific Reports 12.3930 (Mar. 2022). DOI.
Abstract

During 2020, the infection rate of COVID-19 has been investigated by many scholars from different research fields. In this context, reliable and interpretable forecasts of disease incidents are a vital tool for policymakers to manage healthcare resources. Several experts have called for the necessity to account for human mobility to explain the spread of COVID-19. Existing approaches often apply standard models of the respective research field, frequently restricting modeling possibilities. For instance, most statistical or epidemiological models cannot directly incorporate unstructured data sources, including relational data that may encode human mobility. In contrast, machine learning approaches may yield better predictions by exploiting these data structures yet lack intuitive interpretability as they are often categorized as black-box models. We propose a combination of both research directions and present a multimodal learning framework that amalgamates statistical regression and machine learning models for predicting local COVID-19 cases in Germany. Results and implications: The novel approach introduced enables the use of a richer collection of data types, including mobility flows and colocation probabilities, and yields the lowest mean squared error scores throughout the observational period in the reported benchmark study. The results corroborate that during most of the observational period more dispersed meeting patterns and a lower percentage of people staying put are associated with higher infection rates. Moreover, the analysis underpins the necessity of including mobility data and showcases the flexibility and interpretability of the proposed approach.

MCML Authors

[235]
C. Nießl, M. Herrmann, C. Wiedemann, G. Casalicchio and A.-L. Boulesteix.
Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12.2 (Mar. 2022). DOI.
Abstract

In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness for this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices lead to biased interpretations of benchmark results and to over-optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[234]
Y. Liu, Y. Ma, M. Hildebrandt, M. Joblin and V. Tresp.
TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs.
36th Conference on Artificial Intelligence (AAAI 2022). Virtual, Feb 22-Mar 01, 2022. DOI.
Abstract

Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting – event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.

MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[233]
S. Sharifzadeh, S. M. Baharlou, M. Schmitt, H. Schütze and V. Tresp.
Improving Scene Graph Classification by Exploiting Knowledge from Texts.
36th Conference on Artificial Intelligence (AAAI 2022). Virtual, Feb 22-Mar 01, 2022. DOI.
Abstract

Training scene graph classification models requires a large amount of annotated image data. Meanwhile, scene graphs represent relational knowledge that can be modeled with symbolic data from texts or knowledge graphs. While image annotation demands extensive labor, collecting textual descriptions of natural scenes requires less effort. In this work, we investigate whether textual scene descriptions can substitute for annotated image data. To this end, we employ a scene graph classification framework that is trained not only from annotated images but also from symbolic data. In our architecture, the symbolic entities are first mapped to their correspondent image-grounded representations and then fed into the relational reasoning pipeline. Even though a structured form of knowledge, such as the form in knowledge graphs, is not always available, we can generate it from unstructured texts using a transformer-based language model. We show that by fine-tuning the classification pipeline with the extracted knowledge from texts, we can achieve ~8x more accurate results in scene graph classification, ~3x in object classification, and ~1.5x in predicate classification, compared to the supervised baselines with only 1% of the annotated images.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[232]
A. Scagliotti and P. Colli Franzone.
Accelerated subgradient methods.
Preprint at arXiv (Feb. 2022). arXiv.
MCML Authors
Link to Alessandro Scagliotti

Alessandro Scagliotti

Applied Numerical Analysis


[231]
G. De Nicola, B. Sischka and G. Kauermann.
Mixture Models and Networks: The Stochastic Block Model.
Statistical Modelling 22.1-2 (Feb. 2022). DOI.
Abstract

Mixture models are probabilistic models aimed at uncovering and representing latent subgroups within a population. In the realm of network data analysis, the latent subgroups of nodes are typically identified by their connectivity behaviour, with nodes behaving similarly belonging to the same community. In this context, mixture modelling is pursued through stochastic blockmodelling. We consider stochastic blockmodels and some of their variants and extensions from a mixture modelling perspective. We also explore some of the main classes of estimation methods available and propose an alternative approach based on the reformulation of the blockmodel as a graphon. In addition to the discussion of inferential properties and estimating procedures, we focus on the application of the models to several real-world network datasets, showcasing the advantages and pitfalls of different approaches.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[230]
F. Ott, D. Rügamer, L. Heublein, B. Bischl and C. Mutschler.
Joint Classification and Trajectory Regression of Online Handwriting Using a Multi-Task Learning Approach.
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2022). Waikoloa, Hawaii, Jan 04-08, 2022. DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[229]
J. Goldsmith and F. Scheipl.
tf: S3 classes and methods for tidy functional data. R package.
2022. GitHub.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[228]
J. Goldsmith and F. Scheipl.
tidyfun: Clean, wholesome, tidy fun with functional data in R. R package.
2022. GitHub.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[227]
R. Valliant, J. A. Dever, F. Kreuter and M. R. Valliant.
Package ‘PracTools’.
2022. URL.
MCML Authors
Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab


[226]
A. Bauer.
Flexible approaches in functional data and age-period-cohort analysis with application on complex geoscience data.
Dissertation 2022. DOI.
Abstract

This dissertation develops new approaches for robustly estimating functional data structures and analyzing age-period-cohort (APC) effects, with applications in seismology and tourism science. The first part introduces a method that separates amplitude and phase variation in functional data, adapting a likelihood-based registration approach for generalized and incomplete data, demonstrated on seismic data. The second part presents generalized functional additive models (GFAMs) for analyzing associations between functional data and scalar covariates, along with practical guidelines and an R package. The final part addresses APC analysis, proposing new visualization techniques and a semiparametric estimation approach to disentangle temporal dimensions, with applications to tourism data, and is supported by the APCtools R package. (Shortened.)

MCML Authors

[225]
M. Fromm.
Machine learning driven argument mining.
Dissertation 2022. DOI.
Abstract

This thesis addresses the challenges of argumentation in the digital age by applying machine learning methods to automatically identify, retrieve, and evaluate arguments from diverse and often contradictory online sources. The first focus is on argument identification, specifically in heterogeneous text sources and peer reviews, where the relationship between the topic and arguments is crucial, and knowledge transfer across domains is limited. The second focus is on argument retrieval, where machine learning is used to select relevant documents, ensuring comprehensive and non-redundant argument coverage. Finally, the thesis explores the strength or quality of arguments, integrating this concept with other argument mining tasks and evaluating its impact across different text domains and contexts. (Shortened.)

MCML Authors
Link to Michael Fromm

Michael Fromm

Dr.

* Former member


[224]
M. Herrmann.
Towards more reliable machine learning: conceptual insights and practical approaches for unsupervised manifold learning and supervised benchmark studies.
Dissertation 2022. DOI.
Abstract

This thesis focuses on improving the reliability and trustworthiness of machine learning, particularly in unsupervised learning methods like manifold learning. It investigates the challenges of evaluating manifold learning techniques and proposes improvements for embedding evaluation, outlier detection, and cluster analysis, using methods like UMAP and DBSCAN. Additionally, the thesis contributes to supervised learning by presenting a benchmark study on survival prediction in multi-omics cancer data and exploring the effects of design and analysis choices on benchmark results. (Shortened.)

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine


[223]
D. Kazempour.
Advances in correlation clustering.
Dissertation 2022. DOI.
Abstract

This thesis addresses key challenges in correlation clustering, particularly in high-dimensional datasets, by developing novel methods to evaluate and improve clustering algorithms. The first contribution focuses on defining and deriving internal evaluation criteria for correlation clustering, proposing a new cost function to assess cluster quality based on commonalities among existing algorithms. The second part introduces two innovative strategies for detecting regions of interest (ROIs) in Hough space, improving the robustness of the Hough transform algorithm, and extending it to handle quadratic and periodic correlated clusters. Finally, the thesis explores unifying local and global correlation clustering views and enhancing the resilience of these methods to outliers. (Shortened.)

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member


[222]
O. Shchur.
Modeling Continuous-time Event Data with Neural Temporal Point Processes.
Dissertation 2022. URL.
Abstract

Temporal point processes (TPPs) provide a natural framework for modeling continuous-time event data such as earthquake catalogs in seismology or spike trains in neuroscience. Unlike conventional TPP models, neural TPPs are able to capture complex patterns present in real-world event data. The two main themes of this thesis are design of flexible, tractable and efficient neural TPP models, and their applications to real-world problems.

MCML Authors
Link to Oleksandr Shchur

Oleksandr Shchur

Dr.

* Former member


[221]
W. Simson.
Physics-Informed Deep Learning for Advanced Medical Ultrasound.
Dissertation 2022. DOI.
Abstract

Freehand ultrasound imaging is an important medical imaging modality due to its ease of applicability and wide application spectrum. Still, modern ultrasound imaging is a largely passive imaging modality, and does not dynamically adapt to the physics in the medium of interest. This dissertation presents the application of physics-informed deep learning for ultrasound imaging applied to sound speed estimation.

MCML Authors
Link to Walter Simson

Walter Simson

Dr.

* Former member


[220]
D. Zügner.
Adversarial Robustness of Graph Neural Networks.
Dissertation 2022. URL.
Abstract

In this thesis we look at graph neural networks (GNNs) from a perspective of adversarial robustness. We generalize the notion of adversarial attacks – small perturbations to the input data deliberately crafted to mislead a machine learning model – from traditional vector data such as images to graphs. We further propose robustness certification procedures for perturbations of the node attributes as well as the graph structure.

MCML Authors
Link to Daniel Zügner

Daniel Zügner

Dr.

* Former member


[219]
F. Ott, D. Rügamer, L. Heublein, T. Hamann, J. Barth, B. Bischl and C. Mutschler.
Benchmarking online sequence-to-sequence and character-based handwriting recognition from IMU-enhanced pens.
International Journal on Document Analysis and Recognition 25.4 (2022). DOI.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[218]
C. Fritz and G. Kauermann.
On the Interplay of Regional Mobility, Social Connectedness, and the Spread of COVID-19 in Germany.
Journal of the Royal Statistical Society. Series A (Statistics in Society) 185.1 (Jan. 2022). DOI.
Abstract

Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While research has shown that non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the relative regional mobility behaviour to assess the effect of human movement on the spread of COVID-19. In particular, we explore the impact of human mobility and social connectivity derived from Facebook activities on the weekly rate of new infections in Germany between 3 March and 22 June 2020. Our results confirm that reduced social activity lowers the infection rate, accounting for regional and temporal patterns. The extent of social distancing, quantified by the percentage of people staying put within a federal administrative district, has an overall negative effect on the incidence of infections. Additionally, our results show spatial infection patterns based on geographical as well as social distances.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[217]
A. Python, A. Bender, M. Blangiardo, J. B. Illian, Y. Lin, B. Liu, T. C. D. Lucas, S. Tan, Y. Wen, D. Svanidze and J. Yin.
A downscaling approach to compare COVID-19 count data from databases aggregated at different spatial scales.
Journal of the Royal Statistical Society. Series A (Statistics in Society) 185.1 (Jan. 2022). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[216]
V. Nguyen, M. H. Shaker and E. Hüllermeier.
How to measure uncertainty in uncertainty sampling for active learning.
Machine Learning 111.1 (2022). DOI.
MCML Authors
Link to Mohammad Hossein Shaker Ardakani

Mohammad Hossein Shaker Ardakani

Artificial Intelligence & Machine Learning

Link to Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning


[215]
M. Lange, V. Bergen, M. Klein, M. Setty, B. Reuter, M. Bakhti, H. Lickert, M. Ansari, J. Schniering, H. B. Schiller, D. Pe’er and F. J. Theis.
CellRank for directed single-cell fate mapping.
Nature Methods 19.2 (Jan. 2022). DOI.
MCML Authors
Link to Marius Lange

Marius Lange

Dr.

* Former member

Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[214]
E. Dorigatti, J. Goschenhofer, B. Schubert, M. Rezaei and B. Bischl.
Positive-Unlabeled Learning with Uncertainty-aware Pseudo-label Selection.
Preprint at arXiv (Jan. 2022). arXiv.
MCML Authors
Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


2021


[213]
A. Beer, L. Stephan and T. Seidl.
LUCKe - Connecting Clustering and Correlation Clustering.
IEEE International Conference on Data Mining Workshops (ICDMW 2021). Auckland, New Zealand, Dec 07-10, 2021. DOI.
Abstract

LUCKe allows any purely distance-based ‘classic’ clustering algorithm to reliably find linear correlation clusters. An elaborated distance matrix based on the points’ local PCA extracts all necessary information from high-dimensional data to declare points of the same arbitrary-dimensional linear correlation cluster as ‘similar’. For that, the points’ eigensystems are combined with only the relevant information about their position in space. LUCKe allows transferring known benefits from the large field of basic clustering to correlation clustering. Its applicability is shown in extensive experiments with simple representatives of diverse basic clustering approaches.
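A minimal sketch of the underlying idea — comparing points via the eigensystems of their local PCA — might look as follows (illustrative only, not the paper's distance matrix; the neighbourhood size `k` and the use of only the leading eigenvector are simplifying assumptions):

```python
import numpy as np

def local_eigensystem(X, i, k=5):
    """PCA of the k nearest neighbours of point i (toy version)."""
    d = np.linalg.norm(X - X[i], axis=1)
    nb = X[np.argsort(d)[:k]]
    cov = np.cov(nb.T)
    w, V = np.linalg.eigh(cov)
    return w[::-1], V[:, ::-1]  # eigenvalues/-vectors in descending order

def correlation_distance(X, i, j, k=5):
    """Small if i and j lie on similarly oriented local linear structures,
    regardless of how far apart they are in Euclidean space."""
    _, Vi = local_eigensystem(X, i, k)
    _, Vj = local_eigensystem(X, j, k)
    # Cosine of the angle between the leading local directions
    # (abs() removes the sign ambiguity of eigenvectors).
    c = abs(Vi[:, 0] @ Vj[:, 0])
    return 1.0 - min(c, 1.0)

# Points on the line y = 2x share the same local orientation:
t = np.linspace(0.0, 1.0, 10)
line = np.c_[t, 2 * t]
print(correlation_distance(line, 0, 9))  # ≈ 0.0
```

Feeding such a matrix of pairwise distances into any distance-based clustering algorithm is what lets ‘classic’ methods recover linear correlation clusters.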

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[212]
A. Lohrer, J. Deller, M. Hünemörder and P. Kröger.
OAB - An Open Anomaly Benchmark Framework for Unsupervised and Semisupervised Anomaly Detection on Image and Tabular Data Sets.
IEEE International Conference on Data Mining Workshops (ICDMW 2021). Auckland, New Zealand, Dec 07-10, 2021. DOI.
Abstract

We introduce OAB, an Open Anomaly Benchmark Framework for unsupervised and semisupervised anomaly detection on image and tabular data sets, ensuring simple reproducibility for existing benchmark results as well as reliable comparability and low-effort extensibility when new anomaly detection algorithms or new data sets are added. While making established methods of the most popular benchmarks easily accessible, OAB generalizes the task of un- and semisupervised anomaly benchmarking and offers, in addition to commonly used benchmark data sets, semantically meaningful real-world anomaly data sets as well as a broad range of traditional and state-of-the-art anomaly detection algorithms. The benefit of OAB for the research community has been demonstrated by reproducing and extending existing benchmarks to new algorithms with very low effort, allowing researchers to focus on the actual algorithm research.

MCML Authors
Andreas Lohrer

Andreas Lohrer

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[211]
J. Moosbauer, J. Herbinger, G. Casalicchio, M. Lindauer and B. Bischl.
Explaining Hyperparameter Optimization via Partial Dependence Plots.
35th Conference on Neural Information Processing Systems (NeurIPS 2021). Virtual, Dec 06-14, 2021. URL. GitHub.
Abstract

Automated hyperparameter optimization (HPO) can support practitioners in obtaining peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of explainability makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable machine learning (IML) to gain insights from the experimental data obtained during HPO with Bayesian optimization (BO). BO tends to focus on promising regions with potential high-performance configurations and thus induces a sampling bias. Hence, many IML techniques, such as the partial dependence plot (PDP), carry the risk of generating biased interpretations. By leveraging the posterior uncertainty of the BO surrogate model, we introduce a variant of the PDP with estimated confidence bands. We propose to partition the hyperparameter space to obtain more confident and reliable PDPs in relevant sub-regions. In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions.
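A rough sketch of the core idea — a partial dependence estimate with an uncertainty band taken from the surrogate's posterior — assuming a scikit-learn Gaussian process stands in for the BO surrogate (the synthetic data, kernel choice, and simple averaging of pointwise standard deviations are illustrative, not the paper's method):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Pretend HPO archive: 2 hyperparameters, observed validation error.
X = rng.uniform(0, 1, size=(40, 2))
y = (X[:, 0] - 0.3) ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.01, 40)

# BO-style surrogate fitted on the archive.
gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-4).fit(X, y)

def pdp_with_bands(gp, X, feature, grid):
    """Partial dependence of the surrogate mean over `feature`,
    plus an uncertainty band from the surrogate's posterior std."""
    means, stds = [], []
    for v in grid:
        Xg = X.copy()
        Xg[:, feature] = v           # fix the feature, marginalize the rest
        m, s = gp.predict(Xg, return_std=True)
        means.append(m.mean())
        stds.append(s.mean())        # naive aggregation of pointwise std
    return np.array(means), np.array(stds)

grid = np.linspace(0, 1, 11)
mean, band = pdp_with_bands(gp, X, feature=0, grid=grid)
# The PDP minimum should sit near the true optimum x0 ≈ 0.3,
# and `band` is wide wherever BO sampled that region sparsely.
print(grid[np.argmin(mean)])
```

The paper's sub-region partitioning then amounts to computing such bands only within parts of the hyperparameter space where the surrogate is confident.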

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[210]
M. Weber, J. Xie, M. D. Collins, Y. Zhu, P. Voigtlaender, H. Adam, B. Green, A. Geiger, B. Leibe, D. Cremers, A. Osep, L. Leal-Taixé and L.-C. Chen.
STEP: Segmenting and Tracking Every Pixel.
Track on Datasets and Benchmarks at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Virtual, Dec 06-14, 2021. PDF.
MCML Authors
Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member


[209]
Y. Zhang, A. Khakzar, Y. Li, A. Farshad, S. T. Kim and N. Navab.
Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information.
Track on Datasets and Benchmarks at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Virtual, Dec 06-14, 2021. URL.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[208]
T. Weber, M. Ingrisch, M. Fabritius, B. Bischl and D. Rügamer.
Survival-oriented embeddings for improving accessibility to complex data structures.
Workshop on Bridging the Gap: from Machine Learning Research to Clinical Practice at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Virtual, Dec 06-14, 2021. arXiv.
MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[207]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation.
Workshop on Deep Generative Models and Downstream Applications at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Virtual, Dec 06-14, 2021. PDF.
MCML Authors
Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[206]
M. Mittermeier, M. Weigert and D. Rügamer.
Identifying the atmospheric drivers of drought and heat using a smoothed deep learning approach.
Workshop on Tackling Climate Change with Machine Learning at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Virtual, Dec 06-14, 2021. PDF.
Abstract

Europe was hit by several disastrous heat and drought events in recent summers. Besides thermodynamic influences, such hot and dry extremes are driven by certain atmospheric situations including anticyclonic conditions. Effects of climate change on atmospheric circulations are complex and many open research questions remain in this context, e.g., on future trends of anticyclonic conditions. Based on the combination of a catalog of labeled circulation patterns and spatial atmospheric variables, we propose a smoothed convolutional neural network classifier for six types of anticyclonic circulations that are associated with drought and heat. Our work can help to identify important drivers of hot and dry extremes in climate simulations, which allows unveiling the impact of climate change on these drivers. We address various challenges inherent to circulation pattern classification that are also present in other climate patterns, e.g., subjective labels and ambiguous transition periods.

MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[205]
S. Kevork and G. Kauermann.
Iterative Estimation of Mixed Exponential Random Graph Models with Nodal Random Effects.
Network Science 9.4 (Dec. 2021). DOI.
Abstract

The presence of unobserved node-specific heterogeneity in exponential random graph models (ERGM) is a general concern, both with respect to model validity as well as estimation instability. We, therefore, include node-specific random effects in the ERGM that account for unobserved heterogeneity in the network. This leads to a mixed model with parametric as well as random coefficients, labelled as mixed ERGM. Estimation is carried out by iterating between approximate pseudolikelihood estimation for the random effects and maximum likelihood estimation for the remaining parameters in the model. This approach provides a stable algorithm, which allows fitting nodal heterogeneity effects even for large-scale networks. We also propose model selection based on the Akaike Information Criterion to check for node-specific heterogeneity.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[204]
M. Bernhard and M. Schubert.
Correcting Imprecise Object Locations for Training Object Detectors in Remote Sensing Applications.
Remote Sensing 13 (Dec. 2021). URL.
Abstract

Object detection on aerial and satellite imagery is an important tool for image analysis in remote sensing and has many areas of application. As modern object detectors require accurate annotations for training, manual and labor-intensive labeling is necessary. In situations where GPS coordinates for the objects of interest are already available, there is potential to avoid the cumbersome annotation process. Unfortunately, GPS coordinates are often not well-aligned with georectified imagery. These spatial errors can be seen as noise regarding the object locations, which may critically harm the training of object detectors and, ultimately, limit their practical applicability. To overcome this issue, we propose a co-correction technique that allows us to robustly train a neural network with noisy object locations and to transform them toward the true locations. When applied as a preprocessing step on noisy annotations, our method greatly improves the performance of existing object detectors. Our method is applicable in scenarios where the images are only annotated with points roughly indicating object locations, instead of entire bounding boxes providing precise information on the object locations and extents. We test our method on three datasets and achieve a substantial improvement (e.g., 29.6% mAP on the COWC dataset) over existing methods for noise-robust object detection.

MCML Authors
Link to Maximilian Bernhard

Maximilian Bernhard

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[203]
Y. Elazar, N. Kassner, S. Ravfogel, A. Ravichander, E. Hovy, H. Schütze and Y. Goldberg.
Measuring and Improving Consistency in Pretrained Language Models.
Transactions of the Association for Computational Linguistics 9 (Dec. 2021). DOI.
Abstract

Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel, we show that the consistency of all PLMs we experiment with is poor—though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[202]
A. Farshad, S. Musatian, H. Dhamo and N. Navab.
MIGS: Meta Image Generation from Scene Graphs.
32nd British Machine Vision Conference (BMVC 2021). Virtual, Nov 22-25, 2021. URL.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[201]
N. Kees, M. Fromm, E. Faerman and T. Seidl.
Active Learning for Argument Strength Estimation.
2nd Workshop on Insights from Negative Results (Insights 2021) at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). Punta Cana, Dominican Republic, Nov 07-11, 2021. DOI.
Abstract

High-quality arguments are an essential part of decision-making. Automatically predicting the quality of an argument is a complex task that recently got much attention in argument mining. However, the annotation effort for this task is exceptionally high. Therefore, we test uncertainty-based active learning (AL) methods on two popular argument-strength data sets to estimate whether sample-efficient learning can be enabled. Our extensive empirical evaluation shows that uncertainty-based acquisition functions cannot surpass the accuracy reached with the random acquisition on these data sets.

MCML Authors
Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[200]
A. Imani, M. J. Sabet, L. K. Senel, P. Philipp, F. Yvon and H. Schütze.
Graph Algorithms for Multiparallel Word Alignment.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). Punta Cana, Dominican Republic, Nov 07-11, 2021. DOI.
Abstract

With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; however, they have again become a focus of research more recently. Alignments are useful for typological research, transferring formatting like markup to translated texts, and can be used in the decoding of machine translation systems. At the same time, massively multilingual processing is becoming an important NLP scenario, and pretrained language and machine translation models that are truly multilingual are proposed. However, most alignment algorithms rely on bitexts only and do not leverage the fact that many parallel corpora are multiparallel. In this work, we exploit the multiparallelity of corpora by representing an initial set of bilingual alignments as a graph and then predicting additional edges in the graph. We present two graph algorithms for edge prediction: one inspired by recommender systems and one based on network link prediction. Our experimental results show absolute improvements in F1 of up to 28% over the baseline bilingual word aligner in different datasets.

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Lütfi Kerem Şenel

Lütfi Kerem Şenel

Statistical NLP and Deep Learning

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[199]
N. Kassner, O. Tafjord, H. Schütze and P. Clark.
BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). Punta Cana, Dominican Republic, Nov 07-11, 2021. DOI.
Abstract

Although pretrained language models (PTLMs) contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after specialized training. As a result, it can be hard to identify what the model actually “believes” about the world, making it susceptible to inconsistent behavior and simple errors. Our goal is to reduce these problems. Our approach is to embed a PTLM in a broader system that also includes an evolving, symbolic memory of beliefs – a BeliefBank – that records but then may modify the raw PTLM answers. We describe two mechanisms to improve belief consistency in the overall system. First, a reasoning component – a weighted MaxSAT solver – revises beliefs that significantly clash with others. Second, a feedback component issues future queries to the PTLM using known beliefs as context. We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system, improving both the accuracy and consistency of its answers over time. This is significant as it is a first step towards PTLM-based architectures with a systematic notion of belief, enabling them to construct a more coherent picture of the world, and improve over time without model retraining.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[198]
C. Fritz, M. Mehrl, P. W. Thurner and G. Kauermann.
The Role of Governmental Weapons Procurements in Forecasting Monthly Fatalities in Intrastate Conflicts: A Semiparametric Hierarchical Hurdle Model.
International Interactions 48.4 (Nov. 2021). DOI.
Abstract

Accurate and interpretable forecasting models predicting spatially and temporally fine-grained changes in the numbers of intrastate conflict casualties are of crucial importance for policymakers and international non-governmental organizations (NGOs). Using a count data approach, we propose a hierarchical hurdle regression model to address the corresponding prediction challenge at the monthly PRIO-grid level. More precisely, we model the intensity of local armed conflict at a specific point in time as a three-stage process. Stages one and two of our approach estimate whether we will observe any casualties at the country- and grid-cell-level, respectively, while stage three applies a regression model for truncated data to predict the number of such fatalities conditional upon the previous two stages. Within this modeling framework, we focus on the role of governmental arms imports as a processual factor allowing governments to intensify or deter from fighting. We further argue that a grid cell’s geographic remoteness is bound to moderate the effects of these military buildups. Out-of-sample predictions corroborate the effectiveness of our parsimonious and theory-driven model, which enables full transparency combined with accuracy in the forecasting process.
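The three-stage structure can be sketched as a composition of fitted stage models (a hypothetical illustration with made-up coefficients, not estimates from the paper): stages one and two gate the prediction with country- and grid-cell-level hurdle probabilities, and stage three contributes the conditional mean of a zero-truncated count model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_fatalities(x_country, x_cell, x_count):
    """Expected fatalities in a grid cell-month under a three-stage
    hurdle model. All coefficients below are invented for illustration."""
    # Stage 1: probability of any fatalities at the country level.
    p_country = sigmoid(-1.0 + 0.8 * x_country)
    # Stage 2: probability of any fatalities in this grid cell, given stage 1.
    p_cell = sigmoid(-2.0 + 1.2 * x_cell)
    # Stage 3: zero-truncated Poisson mean, given that fatalities occur.
    lam = math.exp(0.5 + 0.3 * x_count)
    mean_positive = lam / (1.0 - math.exp(-lam))  # E[Y | Y > 0]
    return p_country * p_cell * mean_positive

# Higher covariate values (e.g., recent arms imports) raise every stage:
print(predict_fatalities(0.0, 0.0, 0.0) < predict_fatalities(1.0, 1.0, 1.0))
```

Composing the stages multiplicatively is what makes the forecast transparent: each factor can be inspected separately, e.g., to see whether a covariate acts on the occurrence or the intensity of fighting.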

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[197]
M. Herrmann and F. Scheipl.
A Geometric Perspective on Functional Outlier Detection.
Stats 4.4 (Nov. 2021). DOI.
Abstract

We consider functional outlier detection from a geometric perspective, specifically: for functional datasets drawn from a functional manifold, which is defined by the data’s modes of variation in shape, translation, and phase. Based on this manifold, we developed a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed taxonomies. Our theoretical and experimental analyses demonstrated several important advantages of this perspective: it considerably improves theoretical understanding and allows describing and analyzing complex functional outlier scenarios consistently and in full generality, by differentiating between structurally anomalous outlier data that are off-manifold and distributionally outlying data that are on-manifold, but at its margins. This improves the practical feasibility of functional outlier detection: we show that simple manifold-learning methods can be used to reliably infer and visualize the geometric structure of functional datasets. We also show that standard outlier-detection methods requiring tabular data inputs can be applied to functional data very successfully by simply using their vector-valued representations learned from manifold learning methods as the input features. Our experiments on synthetic and real datasets demonstrated that this approach leads to outlier detection performances at least on par with existing functional-data-specific methods in a large variety of settings, without the highly specialized, complex methodology and narrow domain of application these methods often entail.

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[196]
M. Ali, M. Berrendorf, M. Galkin, V. Thost, T. Ma, V. Tresp and J. Lehmann.
Improving Inductive Link Prediction Using Hyper-Relational Facts.
20th International Semantic Web Conference (ISWC 2021). Virtual, Oct 24-28, 2021. DOI. GitHub.
Abstract

For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relational KGs (e.g., Wikidata), have not yet been properly studied. In this work, we classify different inductive settings and study the benefits of employing hyper-relational KGs on a wide range of semi- and fully inductive link prediction tasks powered by recent advancements in graph neural networks. Our experiments on a novel set of benchmarks show that qualifiers over typed edges can lead to performance improvements of 6% of absolute gains (for the Hits@10 metric) compared to triple-only baselines.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[195]
G. Braso, N. Kister and L. Leal-Taixé.
The Center of Attention: Center-Keypoint Grouping Attention for Multi-Person Pose Estimation.
IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual, Oct 11-17, 2021. DOI.
MCML Authors
Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member


[194]
S. Garg, H. Dhamo, A. Farshad, S. Musatian, N. Navab and F. Tombari.
Unconditional Scene Graph Generation.
IEEE/CVF International Conference on Computer Vision (ICCV 2021). Virtual, Oct 11-17, 2021. DOI.
MCML Authors
Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[193]
A. Khakzar, S. Musatian, J. Buchberger, I. V. Quiroz, N. Pinger, S. Baselizadeh, S. T. Kim and N. Navab.
Towards Semantic Interpretation of Thoracic Disease and COVID-19 Diagnosis Models.
24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). Strasbourg, France, Sep 27-Oct 01, 2021. DOI.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[192]
A. Khakzar, Y. Zhang, W. Mansour, Y. Cai, Y. Li, Y. Zhang, S. T. Kim and N. Navab.
Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features.
24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). Strasbourg, France, Sep 27-Oct 01, 2021. DOI.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Yawei Li

Yawei Li

Statistical Learning & Data Science

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[191]
D. Kazempour, A. Beer, M. Oelker, P. Kröger and T. Seidl.
Compound Segmentation via Clustering on Mol2Vec-based Embeddings.
17th IEEE eScience Conference (eScience 2021). Virtual, Sep 20-23, 2021. DOI.
Abstract

During different steps in the process of discovering drug candidates for diseases, it can be supportive to identify groups of molecules that share similar properties, i.e. common overall structural similarity. The existing methods for computing (dis)similarities between chemical structures rely on a priori domain knowledge. Here we investigate the clustering of compounds based on embeddings generated with the recently published Mol2Vec technique, which enables an entirely unsupervised vector representation of compounds. A research question we address in this work is: do existing well-known clustering algorithms such as k-means or hierarchical clustering methods yield meaningful clusters on the Mol2Vec embeddings? Further, we investigate how far subspace clustering can be utilized to compress the data by reducing the dimensionality of the compounds vector representation. Our first experiments on a set of COVID-19 drug candidates reveal that well-established methods yield meaningful clusters. Preliminary results from subspace clusterings indicate that a compression of the vector representations seems viable.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[190]
S. Obermeier, A. Beer, F. Wahl and T. Seidl.
Cluster Flow — an Advanced Concept for Ensemble-Enabling, Interactive Clustering.
19th Symposium of Database Systems for Business, Technology and Web (BTW 2021). Dresden, Germany, Sep 13-17, 2021. DOI.
Abstract

Even though most clustering algorithms serve knowledge discovery in fields other than computer science, most of them still require users to be familiar with programming or data mining to some extent. As that often prevents efficient research, we developed an easy-to-use, highly explainable clustering method accompanied by an interactive tool for clustering. It is based on intuitively understandable kNN graphs and the subsequent application of adaptable filters, which can be combined ensemble-like and applied iteratively to prune unnecessary or misleading edges. For a first overview of the data, fully automatic predefined filter cascades deliver robust results. A selection of simple filters and combination methods that can be chosen interactively yield very good results on benchmark datasets compared to various algorithms.

MCML Authors
Link to Sandra Gilhuber

Sandra Gilhuber

Database Systems & Data Mining

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[189]
S. Coors, D. Schalk, B. Bischl and D. Rügamer.
Automatic Componentwise Boosting: An Interpretable AutoML System.
Automating Data Science Workshop (ADS 2021) at the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2021). Virtual, Sep 13-17, 2021. arXiv.
Abstract

In practice, machine learning (ML) workflows require various different steps, from data preprocessing, missing value imputation, model selection, to model tuning as well as model evaluation. Many of these steps rely on human ML experts. AutoML - the field of automating these ML pipelines - tries to help practitioners to apply ML off-the-shelf without any expert knowledge. Most modern AutoML systems like auto-sklearn, H2O AutoML or TPOT aim for high predictive performance, thereby generating ensembles that consist almost exclusively of black-box models. This, in turn, makes the interpretation for the layperson more intricate and adds another layer of opacity for users. We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions, allows for a straightforward calculation of feature importance, and gives insights into the required model complexity to fit the given task. We introduce the general framework and outline its implementation autocompboost. To demonstrate the framework's efficacy, we compare autocompboost to other existing systems based on the OpenML AutoML-Benchmark. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets while being more user-friendly and transparent.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[188]
R. Sonabend, F. J. Király, A. Bender, B. Bischl and M. Lang.
mlr3proba: An R Package for Machine Learning in Survival Analysis.
Bioinformatics 37.17 (Sep. 2021). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[187]
A. Lohrer, A. Beer, M. Hünemörder, J. Lauterbach, T. Seidl and P. Kröger.
AnyCORE - An Anytime Algorithm for Cluster Outlier REmoval.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2021). München, Germany, Sep 01-03, 2021. PDF.
Abstract

We introduce AnyCORE (Anytime Cluster Outlier REmoval), an algorithm that enables users to detect and remove outliers at any time. The algorithm is based on the idea of MORe++, an approach for outlier detection and removal that iteratively scores and removes 1d-cluster-outliers in n-dimensional data sets. In contrast to MORe++, AnyCORE provides continuous responses for its users and converges independently of cluster centers. This allows AnyCORE to perform outlier detection in combination with an arbitrary clustering method that is most suitable for a given data set. We conducted our AnyCORE experiments on synthetic and real-world data sets by benchmarking its variant with k-Means as the underlying clustering method versus the traditional batch algorithm version of MORe++. In extensive experiments we show that AnyCORE is able to compete with the related batch algorithm version.

MCML Authors
Andreas Lohrer

Andreas Lohrer

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[186]
D. Kazempour, J. Winter, P. Kröger and T. Seidl.
On Methods and Measures for the Inspection of Arbitrarily Oriented Subspace Clusters.
Datenbank-Spektrum 21 (Sep. 2021). DOI.
Abstract

When using arbitrarily oriented subspace clustering algorithms one obtains a partitioning of a given data set and for each partition its individual subspace. Since clustering is an unsupervised machine learning task, we may not have “ground truth” labels at our disposal or do not wish to rely on them. What is needed in such cases are internal measures which permit a label-less analysis of the obtained subspace clustering. In this work, we propose methods for revising clusters obtained from arbitrarily oriented correlation clustering algorithms. Initial experiments reveal improvements in the clustering results compared to the original clustering outcome. Our proposed approach is simple and can be applied as a post-processing step on arbitrarily oriented correlation clusterings.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[185]
T. Seidl, M. Fromm and S. Obermeier.
Proceedings of the LWDA 2021 Workshops: FGWM, KDML, FGWI-BIA, and FGIR.
LWDA 2021 - Lernen, Wissen, Daten, Analysen 2021 (Sep. 2021). URL.
Abstract

LWDA 2021 is a joint conference of six special interest groups of the German Computer Science Society (GI), addressing research in the areas of knowledge discovery and machine learning, information retrieval, database systems, and knowledge management. The German acronym LWDA stands for ‘Lernen, Wissen, Daten, Analysen’ (Learning, Knowledge, Data, Analytics). Following the tradition of the last years, LWDA 2021 provides a joint forum for experienced and young researchers to bring insights into recent trends, technologies, and applications and to promote interaction among the special interest groups.

MCML Authors
Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Sandra Gilhuber

Sandra Gilhuber

Database Systems & Data Mining


[184]
C. Fritz, P. W. Thurner and G. Kauermann.
Separable and Semiparametric Network-based Counting Processes applied to the International Combat Aircraft Trades.
Network Science 9.3 (Sep. 2021). DOI.
Abstract

We propose a novel tie-oriented model for longitudinal event network data. The generating mechanism is assumed to be a multivariate Poisson process that governs the onset and repetition of yearly observed events with two separate intensity functions. We apply the model to a network obtained from the yearly dyadic number of international deliveries of combat aircraft trades between 1950 and 2017. Based on the trade gravity approach, we identify economic and political factors impeding or promoting the number of transfers. Extensive dynamics as well as country heterogeneities require the specification of semiparametric time-varying effects as well as random effects. Our findings reveal strong heterogeneous as well as time-varying effects of endogenous and exogenous covariates on the onset and repetition of aircraft trade events.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[183]
F. Soleymani, M. Eslami, T. Elze, B. Bischl and M. Rezaei.
Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images.
Preprint at arXiv (Sep. 2021). arXiv.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Mina Rezaei

Mina Rezaei

Dr.

Statistical Learning & Data Science


[182]
L. Miklautz, L. G. M. Bauer, D. Mautz, S. Tschiatschek, C. Böhm and C. Plant.
Details (Don't) Matter: Isolating Cluster Information in Deep Embedded Spaces.
30th International Joint Conference on Artificial Intelligence (IJCAI 2021). Montreal, Canada, Aug 19-26, 2021. DOI.
Abstract

Deep clustering techniques combine representation learning with clustering objectives to improve their performance. Among existing deep clustering techniques, autoencoder-based methods are the most prevalent ones. While they achieve promising clustering results, they suffer from an inherent conflict between preserving details, as expressed by the reconstruction loss, and finding similar groups by ignoring details, as expressed by the clustering loss. This conflict leads to brittle training procedures, dependence on trade-off hyperparameters and less interpretable results. We propose our framework, ACe/DeC, that is compatible with Autoencoder Centroid based Deep Clustering methods and automatically learns a latent representation consisting of two separate spaces. The clustering space captures all cluster-specific information and the shared space explains general variation in the data. This separation resolves the above mentioned conflict and allows our method to learn both detailed reconstructions and cluster specific abstractions. We evaluate our framework with extensive experiments to show several benefits: (1) cluster performance – on various data sets we outperform relevant baselines; (2) no hyperparameter tuning – this improved performance is achieved without introducing new clustering specific hyperparameters; (3) interpretability – isolating the cluster specific information in a separate space is advantageous for data exploration and interpreting the clustering results; and (4) dimensionality of the embedded space – we automatically learn a low dimensional space for clustering. Our ACe/DeC framework isolates cluster information, increases stability and interpretability, while improving cluster performance.

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[181]
C. Leiber, L. G. M. Bauer, B. Schelling, C. Böhm and C. Plant.
Dip-based Deep Embedded Clustering with k-Estimation.
27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2021). Singapore, Aug 14-18, 2021. DOI.
Abstract

The combination of clustering with Deep Learning has gained much attention in recent years. Unsupervised neural networks like autoencoders can autonomously learn the essential structures in a data set. This idea can be combined with clustering objectives to learn relevant features automatically. Unfortunately, they are often based on a k-means framework, from which they inherit various assumptions, like spherical-shaped clusters. Another assumption, also found in approaches outside the k-means-family, is knowing the number of clusters a-priori. In this paper, we present the novel clustering algorithm DipDECK, which can estimate the number of clusters while simultaneously improving a Deep Learning-based clustering objective. Additionally, we can cluster complex data sets without assuming only spherically shaped clusters. Our algorithm works by heavily overestimating the number of clusters in the embedded space of an autoencoder and, based on Hartigan’s Dip-test - a statistical test for unimodality - analyses the resulting micro-clusters to determine which to merge. We show in extensive experiments the various benefits of our method: (1) we achieve competitive results while learning the clustering-friendly representation and number of clusters simultaneously; (2) our method is robust regarding parameters, stable in performance, and allows for more flexibility in the cluster shape; (3) we outperform relevant competitors in the estimation of the number of clusters.

MCML Authors
Link to Collin Leiber

Collin Leiber

Database Systems & Data Mining

Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[180]
A. Imani, M. J. Sabet, P. Dufter, M. Cysouw and H. Schütze.
ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus.
Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021). Bangkok, Thailand, Aug 01-06, 2021. DOI.
Abstract

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential both from an academic and commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models or creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool that allows users to browse a word-aligned parallel corpus, covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[179]
M. P. Fabritius, M. Seidensticker, J. Rueckel, C. Heinze, M. Pech, K. J. Paprottka, P. M. Paprottka, J. Topalis, A. Bender, J. Ricke, A. Mittermeier and M. Ingrisch.
Bi-Centric Independent Validation of Outcome Prediction after Radioembolization of Primary and Secondary Liver Cancer.
Journal of Clinical Medicine 10.16 (Aug. 2021). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Andreas Mittermeier

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[178]
A. Bauer, F. Scheipl and H. Küchenhoff.
Registration for Incomplete Non-Gaussian Functional Data.
Preprint at arXiv (Aug. 2021). arXiv.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)


[177]
H. Seibold, A. Charlton, A.-L. Boulesteix and S. Hoffmann.
Statisticians roll up your sleeves! There’s a crisis to be solved.
Significance 18.4 (Aug. 2021). DOI.
Abstract

Statisticians play a key role in almost all scientific research. As such, they may be key to solving the reproducibility crisis. Heidi Seibold, Alethea Charlton, Anne-Laure Boulesteix and Sabine Hoffmann urge statisticians to take an active role in promoting more credible science.

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[176]
F. Pfisterer, C. Kern, S. Dandl, M. Sun, M. P. Kim and B. Bischl.
mcboost: Multi-Calibration Boosting for R.
The Journal of Open Source Software 6.64 (Aug. 2021). DOI.
MCML Authors
Link to Christoph Kern

Christoph Kern

Prof. Dr.

Associate

Social Data Science and AI Lab

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[175]
Y. Wang, Y. Shen and D. Cremers.
Explicit pairwise factorized graph neural network for semi-supervised node classification.
Conference on Uncertainty in Artificial Intelligence (UAI 2021). Virtual, Jul 27-29, 2021. PDF.
MCML Authors
Yuesong Shen

Yuesong Shen

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[174]
M. Biloš and S. Günnemann.
Scalable Normalizing Flows for Permutation Invariant Densities.
38th International Conference on Machine Learning (ICML 2021). Virtual, Jul 18-24, 2021. URL.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[173]
T. Frerix, D. Kochkov, J. Smith, D. Cremers, M. Brenner and S. Hoyer.
Variational Data Assimilation with a Learned Inverse Observation Operator.
38th International Conference on Machine Learning (ICML 2021). Virtual, Jul 18-24, 2021. URL.
MCML Authors
Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[172]
G. König, T. Freiesleben and M. Grosse-Wentrup.
A causal perspective on meaningful and robust algorithmic recourse.
Workshop on Algorithmic Recourse at the 38th International Conference on Machine Learning (ICML 2021). Virtual, Jul 18-24, 2021. URL.
Abstract

Algorithmic recourse explanations inform stakeholders on how to act to revert unfavorable predictions. However, in general ML models do not predict well in interventional distributions. Thus, an action that changes the prediction in the desired way may not lead to an improvement of the underlying target. Such recourse is neither meaningful nor robust to model refits. Extending the work of Karimi et al. (2021), we propose meaningful algorithmic recourse (MAR) that only recommends actions that improve both prediction and target. We justify this selection constraint by highlighting the differences between model audit and meaningful, actionable recourse explanations. Additionally, we introduce a relaxation of MAR called effective algorithmic recourse (EAR), which, under certain assumptions, yields meaningful recourse by only allowing interventions on causes of the target.

MCML Authors
Link to Moritz Grosse-Wentrup

Moritz Grosse-Wentrup

Prof. Dr.

* Former member


[171]
P. Gijsbers, F. Pfisterer, J. van Rijn, B. Bischl and J. Vanschoren.
Meta-Learning for Symbolic Hyperparameter Defaults.
Genetic and Evolutionary Computation Conference (GECCO 2021). Lille, France, Jul 10-14, 2021. DOI.
Abstract

Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but still data-dependent, configuration of the ML algorithm, compared to standard hyperparameter optimization approaches. In the past, symbolic and static default values have usually been obtained as hand-crafted heuristics. We propose an approach of learning such symbolic configurations as formulas of dataset properties from a large set of prior evaluations on multiple datasets by optimizing over a grammar of expressions using an evolutionary algorithm. We evaluate our method on surrogate empirical performance models as well as on real data across 6 ML algorithms on more than 100 datasets and demonstrate that our method indeed finds viable symbolic defaults.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[170]
F. Pfisterer, J. van Rijn, P. Probst, A. Müller and B. Bischl.
Learning Multiple Defaults for Machine Learning Algorithms.
Genetic and Evolutionary Computation Conference (GECCO 2021). Lille, France, Jul 10-14, 2021. DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[169]
A. Python, A. Bender, A. K. Nandi, P. A. Hancock, R. Arambepola, J. Brandsch and T. C. D. Lucas.
Predicting non-state terrorism worldwide.
Science Advances 7.31 (Jul. 2021). DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)


[168]
M. Aygun, A. Ošep, M. Weber, M. Maximov, C. Stachniss, J. Behley and L. Leal-Taixé.
4D Panoptic LiDAR Segmentation.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Virtual, Jun 19-25, 2021. DOI. GitHub.
Abstract

Temporal semantic scene understanding is critical for self-driving cars or robots operating in dynamic environments. In this paper, we propose 4D panoptic LiDAR segmentation to assign a semantic class and a temporally-consistent instance ID to a sequence of 3D points. To this end, we present an approach and a point-centric evaluation metric. Our approach determines a semantic class for every point while modeling object instances as probability distributions in the 4D spatio-temporal domain. We process multiple point clouds in parallel and resolve point-to-instance associations, effectively alleviating the need for explicit temporal data association. Inspired by recent advances in benchmarking of multi-object tracking, we propose to adopt a new evaluation metric that separates the semantic and point-to-instance association aspects of the task. With this work, we aim at paving the road for future developments of temporal LiDAR panoptic perception.

MCML Authors
Link to Laura Leal-Taixé

Laura Leal-Taixé

Prof. Dr.

* Former member


[167]
M. Eisenberger, D. Novotny, G. Kerchenbaum, P. Labatut, N. Neverova, D. Cremers and A. Vedaldi.
NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Virtual, Jun 19-25, 2021. DOI. GitHub.
MCML Authors
Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[166]
M. Gao, Z. Lähner, J. Thunberg, D. Cremers and F. Bernard.
Isometric Multi-Shape Matching.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Virtual, Jun 19-25, 2021. DOI. GitHub.
MCML Authors
Link to Maolin Gao

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[165]
A. Khakzar, S. Baselizadeh, S. Khanduja, C. Rupprecht, S. T. Kim and N. Navab.
Neural Response Interpretation through the Lens of Critical Pathways.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Virtual, Jun 19-25, 2021. DOI.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[164]
C. Tomani, S. Gruber, M. E. Erdem, D. Cremers and F. Buettner.
Post-hoc Uncertainty Calibration for Domain Drift Scenarios.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021). Virtual, Jun 19-25, 2021. DOI.
MCML Authors
Link to Christian Tomani

Christian Tomani

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[163]
N. Strauß, L. Rottkamp, S. Schmoll and M. Schubert.
Efficient Parking Search using Shared Fleet Data.
22nd IEEE International Conference on Mobile Data Management (MDM 2021). Virtual, Jun 15-18, 2021. DOI.
Abstract

Finding an available on-street parking spot is a relevant problem of day-to-day life. In recent years, several cities began providing real-time parking occupancy data. Finding a free parking spot in such a smart environment can be modeled and solved as a Markov decision process (MDP). The solver has to consider uncertainty as available parking spots might not remain available until arrival due to other vehicles claiming spots in the meantime. Knowing the parking intention of every vehicle in the environment would eliminate this uncertainty but is currently not realistic. In contrast, acquiring data from a subset of vehicles appears feasible and could at least reduce uncertainty. In this paper, we examine how sharing data within a vehicle fleet might lower parking search times. We use this data to better estimate the availability of parking spots at arrival. Since optimal solutions for large scenarios are computationally infeasible, we base our methods on approximations shown to perform well in single-agent settings. Our evaluation features a simulation of a part of Melbourne and indicates that fleet data can significantly reduce the time spent searching for a free parking bay.

MCML Authors
Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[162]
P. Dufter, N. Kassner and H. Schütze.
Static Embeddings as Efficient Knowledge Bases?.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021). Virtual, Jun 06-11, 2021. DOI.
Abstract

Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structural knowledge base (KB) queries, masked sentences such as ‘Paris is the capital of [MASK]’ are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically diverse languages, we study knowledge contained in static embeddings. We show that, when restricting the output space to a candidate set, simple nearest neighbor matching using static embeddings performs better than PLMs. E.g., static embeddings perform 1.6% points better than BERT while just using 0.3% of energy for training. One important factor in their good comparative performance is that static embeddings are standardly learned for a large vocabulary. In contrast, BERT exploits its more sophisticated, but expensive ability to compose meaningful representations from a much smaller subword vocabulary.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[161]
M. Binder, F. Pfisterer, M. Lang, L. Schneider, L. Kotthoff and B. Bischl.
mlr3pipelines - Flexible Machine Learning Pipelines in R.
Journal of Machine Learning Research 22.184 (Jun. 2021). URL.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[160]
D. S. Fischer, A. C. Schaar and F. J. Theis.
Learning cell communication from spatial graphs of cells.
Preprint at bioRxiv (Jun. 2021). DOI.
MCML Authors
Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[159]
P. Müller, V. Golkov, V. Tomassini and D. Cremers.
Rotation-Equivariant Deep Learning for Diffusion MRI (short version).
International Society for Magnetic Resonance in Medicine Annual Meeting (ISMRM 2021). Virtual, May 15-20, 2021. Long version at arXiv. arXiv.
MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[158]
J. Schuchardt, A. Bojchevski, J. Klicpera and S. Günnemann.
Collective Robustness Certificates - Exploiting Interdependence in Graph Neural Networks.
9th International Conference on Learning Representations (ICLR 2021). Virtual, May 03-07, 2021. URL.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[157]
M. Lotfollahi, A. K. Susmelj, C. De Donno, Y. Ji, I. L. Ibarra, F. A. Wolf, N. Yakubova, F. J. Theis and D. Lopez-Paz.
Compositional perturbation autoencoder for single-cell response modeling.
Preprint at bioRxiv (May 2021). DOI.
MCML Authors
Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[156]
N. Kassner, P. Dufter and H. Schütze.
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models.
16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021). Virtual, Apr 19-23, 2021. DOI.
Abstract

Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structural knowledge base queries, masked sentences such as “Paris is the capital of [MASK]” are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowledge base? Most prior work only considers English. Extending research to multiple languages is important for diversity and accessibility. (ii) Is mBERT’s performance as knowledge base language-independent or does it vary from language to language? (iii) A multilingual model is trained on more text, e.g., mBERT is trained on 104 Wikipedias. Can mBERT leverage this for better performance? We find that using mBERT as a knowledge base yields varying performance across languages and pooling predictions across languages improves performance. Conversely, mBERT exhibits a language bias; e.g., when queried in Italian, it tends to predict Italy as the country of origin.
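
The finding that pooling predictions across languages improves performance can be illustrated with a minimal aggregation sketch. The per-language scores below are hypothetical, chosen only to mirror the language bias described in the abstract; this is not the paper's method or data:

```python
def pool_predictions(per_language_scores):
    """Sum candidate scores over all language-specific probes and take the argmax."""
    totals = {}
    for scores in per_language_scores:
        for cand, s in scores.items():
            totals[cand] = totals.get(cand, 0.0) + s
    return max(totals, key=totals.get)

# Hypothetical probe scores for "Paris is the capital of [MASK]" in three languages;
# the Italian probe shows the kind of language bias the abstract reports.
scores = [
    {"France": 0.7, "Italy": 0.2},  # English
    {"France": 0.4, "Italy": 0.5},  # Italian
    {"France": 0.8, "Italy": 0.1},  # German
]
assert pool_predictions(scores) == "France"
```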

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[155]
Y. Ma and V. Tresp.
Causal Inference under Networked Interference and Intervention Policy Enhancement.
24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021). Virtual, Apr 13-15, 2021. URL.
MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[154]
M. Berrendorf, E. Faerman and V. Tresp.
Active Learning for Entity Alignment.
43rd European Conference on Information Retrieval (ECIR 2021). Virtual, Mar 28-Apr 01, 2021. DOI. GitHub.
Abstract

In this work, we propose a novel framework for labeling entity alignments in knowledge graph datasets. Different strategies to select informative instances for the human labeler build the core of our framework. We illustrate how the labeling of entity alignments differs from assigning class labels to single instances and how these differences affect labeling efficiency. Based on these considerations, we propose and evaluate different active and passive learning strategies. One of our main findings is that passive learning approaches, which can be efficiently precomputed and more easily deployed, achieve performance comparable to the active learning strategies.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[153]
M. Berrendorf, L. Wacker and E. Faerman.
A Critical Assessment of State-of-the-Art in Entity Alignment.
43rd European Conference on Information Retrieval (ECIR 2021). Virtual, Mar 28-Apr 01, 2021. DOI. GitHub.
Abstract

In this work, we perform an extensive investigation of two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs. To this end, we first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable. Furthermore, we suspect that it is common practice in the community to perform hyperparameter optimization directly on the test set, reducing the informative value of reported performance. Thus, we select a representative sample of benchmarking datasets and describe their properties. We also examine different initializations for entity representations, since they are a decisive factor for model performance. Furthermore, we use a shared train/validation/test split for an appropriate evaluation setting to evaluate all methods on all datasets. In our evaluation, we make several interesting findings. While we observe that most of the time SotA approaches perform better than baselines, they have difficulties when the dataset contains noise, which is the case in most real-life applications. Moreover, in our ablation study, we find that features of the SotA methods other than those previously assumed are often crucial for good performance.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member


[152]
M. Fromm, M. Berrendorf, S. Obermeier, T. Seidl and E. Faerman.
Diversity Aware Relevance Learning for Argument Search.
43rd European Conference on Information Retrieval (ECIR 2021). Virtual, Mar 28-Apr 01, 2021. DOI. GitHub.
Abstract

In this work, we focus on the problem of retrieving relevant arguments for a query claim covering diverse aspects. State-of-the-art methods rely on explicit mappings between claims and premises, and thus are unable to utilize large available collections of premises without laborious and costly manual annotation. Their diversity approach relies on removing duplicates via clustering which does not directly ensure that the selected premises cover all aspects. This work introduces a new multi-step approach for the argument retrieval problem. Rather than relying on ground-truth assignments, our approach employs a machine learning model to capture semantic relationships between arguments. Beyond that, it aims to cover diverse facets of the query, instead of trying to identify duplicates explicitly. Our empirical evaluation demonstrates that our approach leads to a significant improvement in the argument retrieval task even though it requires less data.

MCML Authors
Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Sandra Gilhuber

Sandra Gilhuber

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member


[151]
A. Beer, E. Allerborn, V. Hartmann and T. Seidl.
KISS - A fast kNN-based Importance Score for Subspaces.
24th International Conference on Extending Database Technology (EDBT 2021). Nicosia, Cyprus, Mar 23-26, 2021. PDF.
Abstract

In high-dimensional datasets some dimensions or attributes can be more important than others. Whereas most algorithms neglect one or more dimensions for all points of a dataset or at least for all points of a certain cluster together, our method KISS (kNN-based Importance Score of Subspaces) detects the most important dimensions for each point individually. It is fully unsupervised and does not depend on distorted multidimensional distance measures. Instead, the k nearest neighbors (kNN) in one-dimensional projections of the data points are used to calculate the score for every dimension’s importance. Experiments across a variety of settings show that those scores reflect well the structure of the data. KISS can be used for subspace clustering. What sets it apart from other methods for this task is its runtime, which is linear in the number of dimensions and O(n log(n)) in the number of points, as opposed to quadratic or even exponential runtimes for previous algorithms.
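
The core idea of scoring each dimension per point via kNN distances in one-dimensional projections can be sketched as follows. This is only one plausible reading of the abstract (small 1-D kNN distance interpreted as a dense, important dimension); the published score and its O(n log(n)) computation differ in detail:

```python
def kiss_scores(points, k=2):
    """Per-point, per-dimension importance from kNN distances in 1-D projections.

    Schematic sketch only: a small mean distance to the k nearest neighbours in a
    dimension's 1-D projection is read as that dimension being important for the point.
    """
    n, d = len(points), len(points[0])
    scores = [[0.0] * d for _ in range(n)]
    for j in range(d):                        # one 1-D projection per dimension
        for i in range(n):
            dists = sorted(abs(points[i][j] - points[m][j]) for m in range(n) if m != i)
            mean_knn = sum(dists[:k]) / k     # mean distance to the k nearest 1-D neighbours
            scores[i][j] = 1.0 / (1.0 + mean_knn)
    return scores

# Points packed tightly in dimension 0 but spread out in dimension 1:
pts = [(0.0, 0.0), (0.1, 5.0), (0.2, 10.0), (0.3, 15.0)]
scores = kiss_scores(pts, k=1)
assert all(s[0] > s[1] for s in scores)  # dimension 0 scores higher for every point
```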

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[150]
P. Kopper, S. Pölsterl, C. Wachinger, B. Bischl, A. Bender and D. Rügamer.
Semi-Structured Deep Piecewise Exponential Models.
AAAI Spring Symposium Series on Survival Prediction: Algorithms, Challenges and Applications (AAAI-SPACA 2021). Palo Alto, California, USA, Mar 21-24, 2021. PDF.
Abstract

We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects into the model, we embed the model class in a neural network and thereby enable the simultaneous estimation of both inherently interpretable structured regression inputs as well as deep neural network components which can potentially process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer’s disease progression based on tabular and 3D point cloud data and applying it to synthetic data.

MCML Authors
Link to Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[149]
M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, S. Sharifzadeh, V. Tresp and J. Lehmann.
PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings.
Journal of Machine Learning Research 22.82 (Mar. 2021). PDF.
Abstract

Recently, knowledge graph embeddings (KGEs) have received significant attention, and several software libraries have been developed for training and evaluation. While each of them addresses specific needs, we report on a community effort to re-design and re-implement PyKEEN, one of the early KGE libraries. PyKEEN 1.0 enables users to compose knowledge graph embedding models based on a wide range of interaction models, training approaches, and loss functions, and permits the explicit modeling of inverse relations. It allows users to measure each component’s influence individually on the model’s performance. In addition, an automatic memory optimization has been realized in order to optimally exploit the provided hardware. Through the integration of Optuna, extensive hyper-parameter optimization (HPO) functionalities are provided.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[148]
M. Fromm, E. Faerman, M. Berrendorf, S. Bhargava, R. Qi, Y. Zhang, L. Dennert, S. Selle, Y. Mao and T. Seidl.
Argument Mining Driven Analysis of Peer-Reviews.
35th Conference on Artificial Intelligence (AAAI 2021). Virtual, Feb 02-09, 2021. DOI. GitHub.
Abstract

Peer reviewing is a central process in modern research and essential for ensuring high quality and reliability of published work. At the same time, it is a time-consuming process and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in this area. How to cope with this problem is an open question and it is vividly discussed across all major conferences. In this work, we propose an Argument Mining based approach for the assistance of editors, meta-reviewers, and reviewers. We demonstrate that the decision process in the field of scientific publications is driven by arguments and automatic argument identification is helpful in various use-cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains making the transfer of pre-trained models difficult. Therefore, we provide the community with a new peer-review dataset from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the most relevant parts from reviews, which are paramount for the publication decision. The process remains interpretable since the extracted arguments can be highlighted in a review without detaching them from their context.

MCML Authors
Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Yao Zhang

Yao Zhang

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[147]
S. Sharifzadeh, S. M. Baharlou and V. Tresp.
Classification by Attention: Scene Graph Classification with Prior Knowledge.
35th Conference on Artificial Intelligence (AAAI 2021). Virtual, Feb 02-09, 2021. DOI.
Abstract

A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach by introducing schema representations and implementing the classification as an attention layer between image-based representations and the schemata. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations, as a top-down mechanism, leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.

MCML Authors
Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[146]
S. Klau, S. Hoffmann, C. Patel, J. P. A. Ioannidis and A.-L. Boulesteix.
Examining the robustness of observational associations to model, measurement and sampling uncertainty with the vibration of effects framework.
International Journal of Epidemiology 50.1 (Feb. 2021). DOI.
Abstract

Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[145]
J. Goschenhofer, R. Hvingelby, D. Rügamer, J. Thomas, M. Wagner and B. Bischl.
Deep Semi-Supervised Learning for Time Series Classification.
Preprint at arXiv (Feb. 2021). arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[144]
G. König, C. Molnar, B. Bischl and M. Grosse-Wentrup.
Relative Feature Importance.
25th International Conference on Pattern Recognition (ICPR 2020). Virtual - Milano, Italy, Jan 10-15, 2021. DOI.
Abstract

Interpretable Machine Learning (IML) methods are used to gain insight into the relevance of a feature of interest for the performance of a model. Commonly used IML methods differ in whether they consider features of interest in isolation, e.g., Permutation Feature Importance (PFI), or in relation to all remaining feature variables, e.g., Conditional Feature Importance (CFI). As such, the perturbation mechanisms inherent to PFI and CFI represent extreme reference points. We introduce Relative Feature Importance (RFI), a generalization of PFI and CFI that allows for a more nuanced feature importance computation beyond the PFI versus CFI dichotomy. With RFI, the importance of a feature relative to any other subset of features can be assessed, including variables that were not available at training time. We derive general interpretation rules for RFI based on a detailed theoretical analysis of the implications of relative feature relevance, and demonstrate the method’s usefulness on simulated examples.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Moritz Grosse-Wentrup

Moritz Grosse-Wentrup

Prof. Dr.

* Former member


[143]
S. Schmoll and M. Schubert.
Semi-Markov Reinforcement Learning for Stochastic Resource Collection.
29th International Joint Conference on Artificial Intelligence (IJCAI 2020). Yokohama, Japan (postponed due to the COVID-19 pandemic), Jan 07-15, 2021. DOI.
Abstract

We show that the task of collecting stochastic, spatially distributed resources (Stochastic Resource Collection, SRC) may be considered as a Semi-Markov-Decision-Process. Our Deep-Q-Network (DQN) based approach uses a novel scalable and transferable artificial neural network architecture. The concrete use-case of the SRC is an officer (single agent) trying to maximize the amount of fined parking violations in his area. We evaluate our approach on an environment based on the real-world parking data of the city of Melbourne. In small, hence simple, settings with short distances between resources and few simultaneous violations, our approach is comparable to previous work. When the size of the network grows (and hence the number of resources), our solution significantly outperforms preceding methods. Moreover, applying a trained agent to a non-overlapping new area outperforms existing approaches.

MCML Authors
Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[142]
M. Becker, S. Gruber, J. Richter, J. Moosbauer and B. Bischl.
mlr3hyperband: Hyperband for 'mlr3'.
2021. URL. GitHub.
MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[141]
M. Becker, M. Lang, J. Richter, B. Bischl and D. Schalk.
mlr3tuning: Tuning for 'mlr3'.
2021. URL. GitHub.
MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[140]
M. Becker, J. Richter, M. Lang, B. Bischl and M. Binder.
bbotk: Black-Box Optimization Toolkit.
2021. URL. GitHub.
MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[139]
M. Binder.
mlrintermbo: Model-Based Optimization for 'mlr3' through 'mlrMBO'.
2021. URL. GitHub.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[138]
M. Lang.
mlr3measures: Performance Measures for 'mlr3'.
2021. URL.
MCML Authors

[137]
M. Lang, B. Bischl, J. Richter, X. Sun and M. Binder.
paradox: Define and Work with Parameter Spaces for Complex Algorithms.
2021. URL. GitHub.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[136]
D. Rügamer, F. Pfisterer and P. Baumann.
deepregression: Fitting Semi-Structured Deep Distributional Regression in R.
2021. GitHub.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[135]
P. Schratz and M. Becker.
mlr3spatiotempcv: Spatiotemporal Resampling Methods for 'mlr3'.
2021. URL.
MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science


[134]
A. Beer.
On the edges of clustering: creating synergies with related problems.
Dissertation 2021. DOI.
Abstract

This thesis explores the connections between clustering and related tasks like subspace clustering, correlation clustering, outlier detection, and data ordering. It introduces novel methods such as the KISS score for subspace clustering, LUCK for correlation clustering, and the ABC algorithm for outlier detection. Additionally, it develops the Circle Index for optimizing data ordering to improve clustering performance. (Shortened.)

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member


[133]
M. Berrendorf.
Machine learning for managing structured and semi-structured data.
Dissertation 2021. DOI.
Abstract

As data availability grows across sectors, machine learning, especially graph neural networks, plays a crucial role in extracting insights by automating complex analysis, including relational learning. Knowledge graphs help store entity facts, though they often require automated methods like Link Prediction and Entity Alignment to fill in missing information due to the sheer volume. This thesis advances knowledge graph completion by improving Entity Alignment through active learning, refining Link Prediction with metadata, and introducing a new evaluation metric, as well as a software library to aid researchers. (Shortened.)

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member


[132]
B. Busam.
High Performance Visual Pose Computation.
Dissertation 2021. URL.
Abstract

An outside-in system uses binocular stereo and a probabilistic sparse point cloud matcher to track objects with micrometre precision in real-time. Miniaturizing the system results in a markerless inside-out stereo method with improved rotational accuracy. Reducing the constraints, we reformulate marker-free monocular pose estimation as an action decision process where the next best pose is determined using a render-and-compare strategy. This allows instance agnostic pose estimation that generalizes to unseen objects. The methods are applied on a set of medical and industrial applications.

MCML Authors
Link to Benjamin Busam

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[131]
E. Faerman.
Representation learning on relational data.
Dissertation 2021. DOI.
Abstract

This thesis introduces methods that leverage relational information to address various problems in machine learning, such as node classification, graph matching, and argument mining. It explores unsupervised and semi-supervised approaches for node classification, graph alignment for geographical maps and knowledge graphs, and proposes a novel method for identifying and searching arguments in peer reviews. Additionally, it presents a subspace clustering method that uses relationships to improve clustering performance on large datasets. (Shortened.)

MCML Authors
Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member


[130]
V. Golkov.
Deep learning and variational analysis for high-dimensional and geometric biomedical data.
Dissertation 2021. URL.
Abstract

In this thesis, we use deep learning and variational analysis to solve various problems from biology and medicine related to advanced data structures. We predict the structure of proteins from their evolutionary statistics, and the function of proteins and small molecules from their structure. We also present image processing methods for diffusion MRI that reduce the scan duration by a factor of twelve and improve the image quality.

MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence


[129]
I. Gerostathopoulos, F. Plášil, C. Prehofer, J. Thomas and B. Bischl.
Automated Online Experiment-Driven Adaptation--Mechanics and Cost Aspects.
IEEE Access 9 (2021). DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[128]
H. Seibold, S. Czerny, S. Decke, R. Dieterle, T. Eder, S. Fohr, N. Hahn, R. Hartmann, C. Heindl, P. Kopper, D. Lepke, V. Loidl, M. M. Mandl, S. Musiol, J. Peter, A. Piehler, E. Rojas, S. Schmid, H. Schmidt, M. Schmoll, L. Schneider, X.-Y. To, V. Tran, A. Völker, M. Wagner, J. Wagner, M. Waize, H. Wecker, R. Yang, S. Zellner and M. Nalenz.
A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses.
PLOS ONE 16.6 (2021). DOI.
Abstract

Computational reproducibility is a cornerstone for sound and credible research. Especially in complex statistical analyses—such as the analysis of longitudinal data—reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce analyses of longitudinal data of 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, for another two only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main takeaway is that reproducing papers is difficult if no code is supplied and leads to a high burden for those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.

MCML Authors
Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Viet Tran

Viet Tran

Biomedical Statistics and Data Science


[127]
M. Weigert, A. Bauer, J. Gernert, M. Karl, A. Nalmpatian, H. Küchenhoff and J. Schmude.
Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances.
Tourism Economics 28.5 (Jan. 2021). DOI.
Abstract

This study investigates how age, period, and birth cohorts are related to altering travel distances. We analyze a repeated cross-sectional survey of German pleasure travels for the period 1971–2018 using a holistic age–period–cohort (APC) analysis framework. Changes in travel distances are attributed to the life cycle (age effect), macro-level developments (period effect), and generational membership (cohort effect). We introduce ridgeline matrices and partial APC plots as innovative visualization techniques facilitating the intuitive interpretation of complex temporal structures. Generalized additive models are used to circumvent the identification problem by fitting a bivariate tensor product spline between age and period. The results indicate that participation in short-haul trips is mainly associated with age, while participation in long-distance travel predominantly changed over the period. Generational membership shows less association with destination choice concerning travel distance. The presented APC approach is promising to address further questions of interest in tourism research.

MCML Authors
Link to Helmut Küchenhoff

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)


2020


[126]
M. Berrendorf, E. Faerman, L. Vermue and V. Tresp.
Interpretable and Fair Comparison of Link Prediction or Entity Alignment Methods with Adjusted Mean Rank.
IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2020). Virtual, Dec 14-17, 2020. DOI.
Abstract

In this work, we take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment. In the current experimental setting, multiple different scores are employed to assess different aspects of model performance. We analyze the informativeness of these evaluation measures and identify several shortcomings. In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets. Moreover, we demonstrate that the size of the test set alone has an impact on the performance of the same model under commonly used metrics for the Entity Alignment task. We show that this leads to various problems in the interpretation of results, which may support misleading conclusions. Therefore, we propose adjustments to the evaluation and demonstrate empirically how this supports a fair, comparable, and interpretable assessment of model performance.
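
The adjustment idea can be sketched as dividing the observed mean rank by its expectation under a random model. This is a schematic version of the adjusted mean rank, not necessarily the paper's exact formula: with n_i ranking candidates for test case i, a random model places the true entity at rank (n_i + 1) / 2 in expectation, so the ratio is near 1 at chance level and near 0 for perfect ranking, regardless of dataset size:

```python
def adjusted_mean_rank(ranks, num_candidates):
    """Observed rank sum divided by its expectation under random scoring.

    Schematic sketch: values near 1.0 mean chance-level performance,
    values near 0 mean perfect ranking, comparable across dataset sizes.
    """
    expected = sum((n + 1) / 2 for n in num_candidates)
    return sum(ranks) / expected

# A model ranking the true entity 3rd of 100 candidates and 5th of 1000:
amr = adjusted_mean_rank([3, 5], [100, 1000])
assert amr < 0.1  # far better than the chance level of 1.0
```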

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[125]
E. Faerman, F. Borutta, J. Busch and M. Schubert.
Ada-LLD: Adaptive Node Similarity Using Multi-Scale Local Label Distributions.
IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2020). Virtual, Dec 14-17, 2020. DOI.
MCML Authors
Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[124]
C. Böhm and C. Plant.
Massively Parallel Graph Drawing and Representation Learning.
IEEE International Conference on Big Data (IEEE BigData 2020). Virtual, Dec 10-13, 2020. DOI.
Abstract

To fully exploit the performance potential of modern multi-core processors, machine learning and data mining algorithms for big data must be parallelized in multiple ways. Today’s CPUs consist of multiple cores, each following an independent thread of control, and each equipped with multiple arithmetic units which can perform the same operation on a vector of multiple data objects. Graph embedding, i.e. converting the vertices of a graph into numerical vectors, is a data mining task of high importance and is useful for graph drawing (low-dimensional vectors) and graph representation learning (high-dimensional vectors). In this paper, we propose MulticoreGEMPE (Graph Embedding by Minimizing the Predictive Entropy), an information-theoretic method which can generate low and high-dimensional vectors. MulticoreGEMPE applies MIMD (Multiple Instructions Multiple Data, using OpenMP) and SIMD (Single Instructions Multiple Data, using AVX-512) parallelism. We propose general ideas applicable in other graph-based algorithms like vectorized hashing and vectorized reduction. Our experimental evaluation demonstrates the superiority of our approach.

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[123]
C. Böhm and C. Plant.
Massively Parallel Random Number Generation.
IEEE International Conference on Big Data (IEEE BigData 2020). Virtual, Dec 10-13, 2020. DOI.
Abstract

Random numbers are of high importance for many applications, e.g. simulation, optimization, and data mining. Unlike in information security, in these applications the demands on the quality of the random numbers are only moderate, while the most important issue is runtime efficiency. In this paper, we propose new SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instructions, Multiple Data) parallel methods for Linear Congruential Generators (LCG), the most widespread class of fast pseudo-random number generators. In particular, we propose algorithms for the well-known 48-bit LCG used in the Java class Random and in the method drand48() of C/C++ for processors using AVX (Advanced Vector eXtensions) and OpenMP. Our focus is on consistency with the original methods, which facilitates debugging and enables the user to exactly reproduce previous non-parallel experiments in a SIMD and MIMD environment. Our experimental evaluation demonstrates the superiority of our algorithms.
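The consistency requirement described in the abstract can be illustrated with a minimal sketch. It uses the standard drand48/java.util.Random LCG constants and the well-known logarithmic-time jump-ahead trick, so that a lane starting k steps ahead reproduces exactly the sequential stream; this is an illustration of the general technique, not the authors' AVX/OpenMP implementation.

```python
M = 1 << 48        # modulus of the 48-bit LCG
A = 0x5DEECE66D    # multiplier used by drand48 and java.util.Random
C = 0xB            # additive constant

def step(state):
    """One LCG transition: state' = (A*state + C) mod 2^48."""
    return (A * state + C) % M

def jump(k):
    """Coefficients (acc_a, acc_c) collapsing k LCG steps into a single
    affine step state' = (acc_a*state + acc_c) mod 2^48, computed in
    O(log k) by repeated squaring of the affine map."""
    acc_a, acc_c = 1, 0
    cur_a, cur_c = A, C
    while k:
        if k & 1:
            acc_a = acc_a * cur_a % M
            acc_c = (acc_c * cur_a + cur_c) % M
        cur_c = (cur_a + 1) * cur_c % M
        cur_a = cur_a * cur_a % M
        k >>= 1
    return acc_a, acc_c

# A parallel lane that jumps 1000 steps ahead matches 1000 sequential steps.
s = 42
seq = s
for _ in range(1000):
    seq = step(seq)
a1000, c1000 = jump(1000)
par = (a1000 * s + c1000) % M
```

With such jump-ahead coefficients, each SIMD lane or thread can be seeded at a distinct offset of the same sequence, which is what makes bit-exact reproduction of non-parallel runs possible.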

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[122]
Y. Ma and V. Tresp.
A Variational Quantum Circuit Model for Knowledge Graph Embeddings.
1st Workshop on Quantum Tensor Networks in Machine Learning (QTNML 2020) at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. PDF.
MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[121]
S. Geisler, D. Zügner and S. Günnemann.
Reliable Graph Neural Networks via Robust Aggregation.
34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. PDF.
MCML Authors
Link to Daniel Zügner

Daniel Zügner

Dr.

* Former member

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[120]
O. Shchur, N. Gao, M. Biloš and S. Günnemann.
Fast and Flexible Temporal Point Processes with Triangular Maps.
34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. PDF.
MCML Authors
Link to Oleksandr Shchur

Oleksandr Shchur

Dr.

* Former member

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[119]
J. Busch, E. Faerman, M. Schubert and T. Seidl.
Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering.
Workshop on Self-Supervised Learning - Theory and Practice (SSL 2020) at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. arXiv. GitHub.
Abstract

Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings: First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Secondly, the trained models are transductive and thus cannot be used to cluster out-of-sample data unseen during training. Instead of learning self-expression coefficients directly, we propose a novel metric learning approach to learn a subspace affinity function using a siamese neural network architecture. Consequently, our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets. In addition, we can formally show that our model is still able to exactly recover subspace clusters given an independence assumption. The siamese architecture in combination with a novel geometric classifier further makes our model inductive, allowing it to cluster out-of-sample data. Additionally, non-linear clusters can be detected by simply adding an auto-encoder module to the architecture. The whole model can then be trained end-to-end in a self-supervised manner. This work in progress reports promising preliminary results on the MNIST dataset. In the spirit of reproducible research, we make all code publicly available. In future work we plan to investigate several extensions of our model and to expand the experimental evaluation.

MCML Authors
Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[118]
M. Berrendorf and E. Faerman.
mberr/ea-active-learning: Zenodo. Version 1.0.1.
2020. DOI.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member


[117]
M. Berrendorf, L. Wacker and E. Faerman.
mberr/ea-sota-comparison: Zenodo. Version v1.1.1.
2020. DOI.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member


[116]
E. Asgari, M. J. Sabet, P. Dufter, C. Ringlstetter and H. Schütze.
Subword Sampling for Low Resource Word Alignment.
Preprint at arXiv (Dec. 2020). arXiv.
Abstract

Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most existing word alignment methods are designed for the high-resource setting of machine translation, where millions of parallel sentences are available. This amount shrinks to a few thousand sentences for low-resource languages, at which point the established IBM models fail. In this paper, we propose subword sampling-based alignment of text units. The hypothesis of this method is that aggregating different granularities of text for certain language pairs can help word-level alignment. For languages for which gold-standard alignments exist, we propose an iterative Bayesian optimization framework to select possible subwords from the space of possible subword representations of the source and target sentences. We show that the subword sampling method consistently outperforms word-level alignment on six language pairs: English-German, English-French, English-Romanian, English-Persian, English-Hindi, and English-Inuktitut. In addition, we show that the hyperparameters learned for certain language pairs can be applied to other languages without supervision and consistently improve the alignment results. We observe that using 5K parallel sentences together with our proposed subword sampling approach, we obtain F1 scores similar to those of existing word-level fast-align/eflomal alignment methods trained on hundreds of thousands of parallel sentences.

MCML Authors
Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[115]
M. Herrmann and F. Scheipl.
Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction.
Preprint at arXiv (Dec. 2020). arXiv.
Abstract

In recent years, manifold methods have moved into focus as tools for dimension reduction. Assuming that the high-dimensional data actually lie on or close to a low-dimensional nonlinear manifold, these methods have shown convincing results in several settings. This manifold assumption is often reasonable for functional data, i.e., data representing continuously observed functions, as well. However, the performance of manifold methods recently proposed for tabular or image data has not yet been systematically assessed in the case of functional data. Moreover, it is unclear how to evaluate the quality of learned embeddings that do not yield invertible mappings, since the reconstruction error cannot be used as a performance measure for such representations. In this work, we describe and investigate the specific challenges for nonlinear dimension reduction posed by the functional data setting. The contributions of the paper are three-fold: First, we define a theoretical framework which allows us to systematically assess specific challenges that arise in the functional data context, transfer several nonlinear dimension reduction methods for tabular and image data to functional data, and show that manifold methods can be used successfully in this setting. Second, we subject performance assessment and tuning strategies to a thorough and systematic evaluation based on several different functional data settings and point out some previously undescribed weaknesses and pitfalls which can jeopardize reliable judgment of embedding quality. Third, we propose a nuanced approach to make trustworthy decisions for or against competing nonconforming embeddings more objectively.

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[114]
N. Kassner, B. Krojer and H. Schütze.
Are Pretrained Language Models Symbolic Reasoners over Knowledge?.
24th Conference on Computational Natural Language Learning (CoNLL 2020). Virtual, Nov 19-20, 2020. DOI.
Abstract

How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs seem to learn to apply some symbolic reasoning rules correctly but struggle with others, including two-hop reasoning. Further analysis suggests that even the application of learned reasoning rules is flawed. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[113]
D. Kazempour, A. Beer, P. Kröger and T. Seidl.
I fold you so! An internal evaluation measure for arbitrary oriented subspace clustering through piecewise-linear approximations of manifolds.
IEEE International Conference on Data Mining Workshops (ICDMW 2020). Sorrento, Italy, Nov 17-20, 2020. DOI.
Abstract

In this work we propose SRE, the first internal evaluation measure for arbitrarily oriented subspace clustering results. For this purpose we present a new perspective on the subspace clustering task: the goal we formalize is to compute a clustering which represents the original dataset by minimizing the reconstruction loss from the obtained subspaces, while at the same time minimizing the dimensionality as well as the number of clusters. A fundamental feature of our approach is that it is model-agnostic, i.e., it is independent of the characteristics of any specific subspace clustering method. It is scale-invariant and mathematically founded. The experiments show that the SRE score better assesses the quality of an arbitrarily oriented subspace clustering than commonly used external evaluation measures.
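The formalized objective — reconstruction loss from the obtained subspaces, penalized by subspace dimensionality and cluster count — can be sketched in a few lines. This is a toy illustration of that kind of trade-off, not the actual SRE definition; the penalty weights `alpha` and `beta` are hypothetical.

```python
import numpy as np

def reconstruction_error(cluster, dim):
    """Squared error left over after projecting a cluster onto its top-`dim`
    principal directions (the best-fitting `dim`-dimensional subspace)."""
    X = cluster - cluster.mean(axis=0)
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    return float((s[dim:] ** 2).sum())   # energy outside the subspace

def internal_score(clusters, dims, alpha=0.1, beta=0.1):
    """Toy internal measure: reconstruction loss plus penalties on the
    subspace dimensionalities and on the number of clusters (lower is better)."""
    loss = sum(reconstruction_error(c, d) for c, d in zip(clusters, dims))
    return loss + alpha * sum(dims) + beta * len(clusters)

# Points lying exactly on a line in 2D: a 1-dimensional subspace reconstructs
# them with zero loss, so the 1D model scores better than the 0D one.
line = np.array([[t, 2.0 * t] for t in np.linspace(0.0, 1.0, 20)])
good = internal_score([line], dims=[1])
bad = internal_score([line], dims=[0])
```

The penalty terms prevent the trivial optimum of one full-dimensional subspace per point, which is the role the dimensionality and cluster-count terms play in the formalization above.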

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[112]
D. Kazempour, P. Kröger and T. Seidl.
Towards an Internal Evaluation Measure for Arbitrarily Oriented Subspace Clustering.
IEEE International Conference on Data Mining Workshops (ICDMW 2020). Sorrento, Italy, Nov 17-20, 2020. DOI.
Abstract

In the setting of unsupervised machine learning, especially in clustering tasks, the evaluation of either novel algorithms or a clustering of novel data is challenging. While in the literature the evaluation of new methods is mostly performed on labelled data, there are cases where no labels are at our disposal. In other cases we may not want to trust the “ground truth” labels. In general, a spectrum of so-called internal evaluation measures exists in the literature, each mostly specialized towards a specific clustering model. The model of arbitrarily oriented subspace clusters is a more recent one. To the best of our knowledge, no internal evaluation measures tailored to assessing this particular type of clustering currently exist. In this work we present the first internal quality measures for arbitrarily oriented subspace clusterings, namely the normalized projected energy (NPE) and the subspace compactness score (SCS). The results from the experiments show that especially NPE is capable of assessing clusterings by considering archetypical properties of arbitrarily oriented subspace clustering.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[111]
D. Kazempour, L. M. Yan, P. Kröger and T. Seidl.
You see a set of wagons - I see one train: Towards a unified view of local and global arbitrarily oriented subspace clusters.
IEEE International Conference on Data Mining Workshops (ICDMW 2020). Sorrento, Italy, Nov 17-20, 2020. DOI.
Abstract

Having data with a high number of features raises the need to detect clusters which exhibit high similarity within subspaces of features. These subspaces can be arbitrarily oriented, which gave rise to arbitrarily-oriented subspace clustering (AOSC) algorithms. Among the diversity of such algorithms, some are specialized at detecting clusters which are global, spanning the entire dataset regardless of any distances, while others are tailored to detecting local clusters. Each algorithm provides only one of these views (local or global). While from an algebraic point of view neither of the two representations can claim to be the true one, it is vital that domain scientists are presented both views, enabling them to inspect and decide which of the representations is closest to the domain-specific reality. We propose in this work a framework which is capable of detecting locally dense arbitrarily oriented subspace clusters which are embedded within a global one. We are also the first to introduce definitions of locally and globally arbitrarily oriented subspace clusters. Our experiments illustrate that this approach has no significant impact on cluster quality or runtime performance, and enables scientists to be no longer limited exclusively to either the local or the global view.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[110]
N. Kassner and H. Schütze.
BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Virtual, Nov 16-20, 2020. DOI.
Abstract

Khandelwal et al. (2020) use a k-nearest-neighbor (kNN) component to improve language model performance. We show that this idea is beneficial for open-domain question answering (QA). To improve the recall of facts encountered during training, we combine BERT (Devlin et al., 2019) with a traditional information retrieval (IR) step and a kNN search over a large datastore of an embedded text collection. Our contributions are as follows: i) BERT-kNN outperforms BERT on cloze-style QA by large margins without any further training. ii) We show that BERT often identifies the correct response category (e.g., US city), but only kNN recovers the factually correct answer (e.g., “Miami”). iii) Compared to BERT, BERT-kNN excels for rare facts. iv) BERT-kNN can easily handle facts not covered by BERT’s training set, e.g., recent events.
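The retrieval-and-interpolation mechanism along the lines of Khandelwal et al.'s kNN-LM can be sketched with toy data. The two-dimensional "embeddings", the mixing weight `lam`, and all helper names below are illustrative assumptions, not the paper's actual datastore or parameters.

```python
from collections import Counter
import math

def knn_distribution(query, datastore, k=3):
    """Distribution over answer tokens from the k nearest datastore entries.
    `datastore` holds (embedding, token) pairs; distance is Euclidean."""
    nearest = sorted(datastore, key=lambda e: math.dist(query, e[0]))[:k]
    counts = Counter(tok for _, tok in nearest)
    return {tok: n / k for tok, n in counts.items()}

def interpolate(p_model, p_knn, lam=0.5):
    """Final answer distribution: lam * p_kNN + (1 - lam) * p_model."""
    toks = set(p_model) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
            for t in toks}

# Toy example: the model prefers a wrong token within the right category;
# the datastore neighbors recover the factually correct answer.
datastore = [((0.0, 0.0), 'Miami'), ((0.1, 0.0), 'Miami'),
             ((0.0, 0.1), 'Boston'), ((5.0, 5.0), 'Paris')]
p_model = {'Chicago': 0.6, 'Miami': 0.4}
p_knn = knn_distribution((0.0, 0.0), datastore, k=3)
p_final = interpolate(p_model, p_knn, lam=0.5)
```

Because the retrieved distribution is non-parametric, new facts can be supported by simply adding entries to the datastore, with no further training of the model.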

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[109]
A. Agrawal, F. Pfisterer, B. Bischl, F. Buet-Golfouse, S. Sood, J. Chen, S. Shah and S. Vollmer.
Debiasing classifiers: is reality at variance with expectation?.
Preprint at arXiv (Nov. 2020). arXiv.
Abstract

We present an empirical study of debiasing methods for classifiers, showing that debiasers often fail in practice to generalize out-of-sample, and can in fact make fairness worse rather than better. A rigorous evaluation of the debiasing treatment effect requires extensive cross-validation beyond what is usually done. We demonstrate that this phenomenon can be explained as a consequence of bias-variance trade-off, with an increase in variance necessitated by imposing a fairness constraint. Follow-up experiments validate the theoretical prediction that the estimation variance depends strongly on the base rates of the protected class. Considering fairness–performance trade-offs justifies the counterintuitive notion that partial debiasing can actually yield better results in practice on out-of-sample data.

MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[108]
V. Melnychuk, E. Faerman, I. Manakov and T. Seidl.
Matching the Clinical Reality: Accurate OCT-Based Diagnosis From Few Labels.
CIKM 2020 Workshops (CIKMW 2020) co-located with the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020). Galway, Ireland, Oct 19-23, 2020. PDF. GitHub.
Abstract

Unlabeled data is often abundant in the clinic, making machine learning methods based on semi-supervised learning a good match for this setting. Despite this, they currently receive relatively little attention in the medical image analysis literature. Instead, most practitioners and researchers focus on supervised or transfer learning approaches. The recently proposed MixMatch and FixMatch algorithms have demonstrated promising results in extracting useful representations while requiring very few labels. Motivated by these recent successes, we apply MixMatch and FixMatch in an ophthalmological diagnostic setting and investigate how they fare against standard transfer learning. We find that both algorithms outperform the transfer learning baseline on all fractions of labelled data. Furthermore, our experiments show that Mean Teacher, which is a component of both algorithms, is not needed for our classification problem, as disabling it leaves the outcome unchanged.

MCML Authors
Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[107]
Y. Ma, Z. Han and V. Tresp.
Learning with Temporal Knowledge Graphs.
CIKM 2020 Workshops (CIKMW 2020) co-located with the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020). Galway, Ireland, Oct 19-23, 2020. Invited talk. PDF.
MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[106]
A. Maldonado, J. Sontheim, F. Richter and T. Seidl.
Performance Skyline: Inferring Process Performance Models from Interval Events.
1st International Workshop on Streaming Analytics for Process Mining (SA4PM 2020) in conjunction with the 2nd International Conference on Process Mining (ICPM 2020). Virtual, Oct 04-09, 2020. DOI.
MCML Authors
Link to Andrea Maldonado

Andrea Maldonado

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[105]
T. Seidl.
Keynote: Data Mining on Process Data.
2nd International Conference on Process Mining (ICPM 2020). Virtual, Oct 04-09, 2020. DOI.
Abstract

Data Mining and Process Mining – is one just a variant of the other, or do worlds separate the two areas from each other? The notions sound similar, but the contents sometimes look different, so researchers in the respective fields may get confused in their mutual perception, be it as authors or as reviewers. The talk recalls commonalities like model-based supervised and unsupervised learning approaches, and it also sheds light on peculiarities in process data and process mining tasks as seen from a data mining perspective. When considering trace data from event log files as time series, as sequences, or as activity sets, quite different data mining techniques apply and may be extended and improved. A particular example is rare pattern mining, which fills a gap between frequent pattern mining and outlier detection. The task aims at identifying patterns that occur with low frequency but more often than single outliers. Structural deficiencies may cause malfunctions or other undesired behavior which get discarded as outliers in event logs, since they are observed only infrequently. Rare pattern mining may identify these situations, and recent approaches include clustering or ordering non-conformant traces. The talk concludes with some remarks on how to sell process mining papers to the data mining community, and vice versa, in order to improve mutual acceptance and to increase synergies between the fields.

MCML Authors
Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[104]
S. Denner, A. Khakzar, M. Sajid, M. Saleh, Z. Spiclin, S. T. Kim and N. Navab.
Spatio-temporal learning from longitudinal data for multiple sclerosis lesion segmentation.
Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (BrainLes 2020) at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). Virtual, Oct 04-08, 2020. DOI.
MCML Authors
Link to Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former member

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[103]
Y. Yeganeh, A. Farshad, N. Navab and S. Albarqouni.
Inverse Distance Aggregation for Federated Learning with Non-IID Data.
Workshop on Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning (DART DCL 2020) at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). Virtual, Oct 04-08, 2020. DOI.
MCML Authors
Link to Yousef Yeganeh

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Azade Farshad

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[102]
P. F. M. Baumann, T. Hothorn and D. Rügamer.
Deep Conditional Transformation Models.
Preprint at arXiv (Oct. 2020). arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group


[101]
M. Berrendorf, L. Wacker and E. Faerman.
A Critical Assessment of State-of-the-Art in Entity Alignment.
Preprint at arXiv (Oct. 2020). arXiv.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member


[100]
G. Fabbro, V. Golkov, T. Kemp and D. Cremers.
Speech Synthesis and Control Using Differentiable DSP.
Preprint at arXiv (Oct. 2020). arXiv.
MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[99]
D. Rügamer, F. Pfisterer and B. Bischl.
Neural Mixture Distributional Regression.
Preprint at arXiv (Oct. 2020). arXiv.
MCML Authors
Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[98]
A.-L. Boulesteix, A. Charlton, S. Hoffmann and H. Seibold.
A replication crisis in methodological research?.
Significance 17.5 (Oct. 2020). DOI.
Abstract

Statisticians have been keen to critique statistical aspects of the “replication crisis” in other scientific disciplines. But new statistical tools are often published and promoted without any thought to replicability. This needs to change, argue Anne-Laure Boulesteix, Sabine Hoffmann, Alethea Charlton and Heidi Seibold.

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[97]
A. Beer, D. Seeholzer, N. S. Schüler and T. Seidl.
Angle-Based Clustering.
13th International Conference on Similarity Search and Applications (SISAP 2020). Virtual, Sep 30-Oct 02, 2020. DOI.
Abstract

The amount of data increases steadily, and yet most clustering algorithms perform complex computations for every single data point. Furthermore, the Euclidean distance used by most clustering algorithms is often not the best choice for datasets with arbitrarily shaped clusters or those with high dimensionality. Based on ABOD, we introduce ABC, the first angle-based clustering method. The algorithm first identifies a small part of the data as border points of clusters based on the angles between their neighbors. Those few border points can, with some adjustments, be clustered with well-known clustering algorithms like hierarchical clustering with single linkage or DBSCAN. Residual points can quickly and easily be assigned to the cluster of their nearest border point, so the overall runtime is heavily reduced while the results improve or remain similar.
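The ABOD-style border criterion the abstract builds on can be illustrated with a minimal sketch: a point whose neighbors all lie in roughly one direction (small variance of pairwise angles) is likely a border point, while an interior point sees neighbors all around it. This is an illustration of the criterion only, not the authors' ABC implementation, and it omits ABOD's distance weighting.

```python
import math

def angle_variance(point, neighbors):
    """Variance of pairwise angles between the vectors from `point` to its
    2D neighbors. Small variance suggests a border point."""
    vecs = [(n[0] - point[0], n[1] - point[1]) for n in neighbors]
    angles = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            a, b = vecs[i], vecs[j]
            cos = (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
            angles.append(math.acos(max(-1.0, min(1.0, cos))))
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)

# Collinear points: the leftmost point sees every neighbor in one direction
# (all angles 0), while the middle point sees neighbors on both sides.
pts = [(float(i), 0.0) for i in range(5)]
border = angle_variance(pts[0], pts[1:])
interior = angle_variance(pts[2], pts[:2] + pts[3:])
```

Clustering only the low-variance points and attaching the rest to their nearest border point is what gives the described runtime reduction, since the expensive step runs on a small subset of the data.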

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[96]
A. Bender, D. Rügamer, F. Scheipl and B. Bischl.
A General Machine Learning Framework for Survival Analysis.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2020). Virtual, Sep 14-18, 2020. DOI.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to David Rügamer

David Rügamer

Prof. Dr.

Data Science Group

Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[95]
C. Molnar, G. Casalicchio and B. Bischl.
Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges.
Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases (Workshops ECML-PKDD 2020). Virtual, Sep 14-18, 2020. DOI.
MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[94]
A. Beer, D. Kazempour, J. Busch, A. Tekles and T. Seidl.
Grace - Limiting the Number of Grid Cells for Clustering High-Dimensional Data.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2020). Bonn, Germany, Sep 09-11, 2020. PDF.
Abstract

Using grid-based clustering algorithms on high-dimensional data has the advantage of being able to summarize data points into cells, but usually produces an exponential number of grid cells. In this paper we introduce Grace (using a Grid which is adaptive for clustering), a clustering algorithm which limits the number of cells produced depending on the number of points in the dataset. A non-equidistant grid is constructed based on the distribution of points in one-dimensional projections of the data. A density threshold is automatically deduced from the data and used to detect dense cells, which are later combined into clusters. The adaptive grid structure makes an efficient but still accurate clustering of multidimensional data possible. Experiments with synthetic as well as real-world data sets of various sizes and dimensionalities confirm these properties.
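The core idea — a non-equidistant grid built from one-dimensional projections, followed by dense-cell detection — can be sketched as follows. This is a simplified illustration under assumed parameters (quantile cut points per dimension, a fixed `min_pts` density threshold), not the Grace algorithm itself, which deduces its threshold automatically.

```python
from bisect import bisect_right
from collections import Counter

def quantile_grid(values, n_cells):
    """Non-equidistant 1D cut points taken from the empirical distribution,
    so each dimension contributes at most n_cells intervals."""
    s = sorted(values)
    cuts = [s[(i * len(s)) // n_cells] for i in range(1, n_cells)]
    return sorted(set(cuts))

def cell_of(point, grids):
    """Map a point to its grid cell id (one interval index per dimension)."""
    return tuple(bisect_right(g, x) for x, g in zip(point, grids))

def dense_cells(points, n_cells=4, min_pts=2):
    """Cells containing at least min_pts points; only occupied cells are
    ever materialized, avoiding the exponential blow-up of a full grid."""
    grids = [quantile_grid(dim, n_cells) for dim in zip(*points)]
    counts = Counter(cell_of(p, grids) for p in points)
    return {cell for cell, n in counts.items() if n >= min_pts}

# Two compact 2D groups plus a stray point: only the groups' cells are dense.
pts = [(0.0, 0.0), (0.1, 0.1), (0.05, 0.0),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
       (2.5, 9.0)]
dense = dense_cells(pts, n_cells=4, min_pts=2)
```

Counting only occupied cells keeps the memory footprint proportional to the number of points rather than to the number of possible cells, which is the property the abstract emphasizes.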

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[93]
S. Dandl, C. Molnar, M. Binder and B. Bischl.
Multi-Objective Counterfactual Explanations.
16th International Conference on Parallel Problem Solving from Nature (PPSN 2020). Leiden, Netherlands, Sep 05-09, 2020. DOI.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[92]
D. Zügner and S. Günnemann.
Certifiable Robustness of Graph Convolutional Networks under Structure Perturbation.
26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2020). San Diego, California, USA, Aug 23-27, 2020. DOI.
MCML Authors
Link to Daniel Zügner

Daniel Zügner

Dr.

* Former member

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[91]
M. Herrmann, P. Probst, R. Hornung, V. Jurinovic and A.-L. Boulesteix.
Large-scale benchmark study of survival prediction methods using multi-omics data.
Briefings in Bioinformatics (Aug. 2020). DOI.
Abstract

Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database ‘The Cancer Genome Atlas’ (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan–Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups—especially clinical variables—from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited.

MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[90]
R. Sonabend, F. J. Király, A. Bender, B. Bischl and M. Lang.
mlr3proba: Machine Learning Survival Analysis in R.
Preprint at arXiv (Aug. 2020). arXiv.
MCML Authors
Link to Andreas Bender

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[89]
C. Fritz, M. Lebacher and G. Kauermann.
Tempus volat, hora fugit: A survey of tie-oriented dynamic network models in discrete and continuous time.
Statistica Neerlandica 74.3 (Aug. 2020). DOI.
Abstract

Given the growing number of available tools for modeling dynamic networks, the choice of a suitable model becomes central. The goal of this survey is to provide an overview of tie-oriented dynamic network models. The survey is focused on introducing binary network models with their corresponding assumptions, advantages, and shortfalls. The models are divided according to generating processes, operating in discrete and continuous time. First, we introduce the temporal exponential random graph model (TERGM) and the separable TERGM (STERGM), both being time-discrete models. These models are then contrasted with continuous process models, focusing on the relational event model (REM). We additionally show how the REM can handle time-clustered observations, that is, continuous-time data observed at discrete time points. Besides the discussion of theoretical properties and fitting procedures, we specifically focus on the application of the models on two networks that represent international arms transfers and email exchange, respectively. The data allow us to demonstrate the applicability and interpretation of the network models.

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business


[88]
M. Binder, F. Pfisterer and B. Bischl.
Collecting empirical data about hyperparameters for data driven AutoML.
7th Workshop on Automated Machine Learning (AutoML 2020) co-located with ICML 2020. Virtual, Jul 18, 2020. PDF.
Abstract

All optimization needs some kind of prior over the functions it is optimizing over. We used a large computing cluster to collect empirical data about the behavior of ML performance, by randomly sampling hyperparameter values and performing cross-validation. We also collected information about cross-validation error by performing some evaluations multiple times, and information about progression of performance with respect to training data size by performing some evaluations on data subsets. We describe how we collected the data, make some preliminary analyses of the surrogate models that can be built with them, and give an outlook on interesting analyses this should enable.
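The data-collection loop the abstract describes (randomly sampling hyperparameter values and recording repeated noisy evaluations per configuration) might look roughly like this sketch. The `toy_cv_error` surrogate is purely hypothetical and stands in for an actual train/cross-validate step on a cluster.

```python
import random

def collect_hp_data(objective, space, n_samples=50, n_repeats=3, seed=0):
    """Randomly sample configurations from box ranges and record several
    noisy evaluations per configuration (mimicking repeated CV runs)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        evals = [objective(config, rng) for _ in range(n_repeats)]
        rows.append({"config": config,
                     "evals": evals,
                     "mean": sum(evals) / n_repeats})
    return rows

# Hypothetical stand-in for "fit a learner, cross-validate it":
# a quadratic error surface with a small noise term.
def toy_cv_error(config, rng):
    return (config["C"] - 1.0) ** 2 + 0.05 * rng.random()

rows = collect_hp_data(toy_cv_error, {"C": (0.0, 3.0)}, n_samples=30)
best = min(rows, key=lambda r: r["mean"])
```

Tables of such rows are exactly what surrogate models for AutoML are then fit on.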

MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[87]
C. Molnar, G. König, J. Herbinger, T. Freiesleben, S. Dandl, C. A. Scholbeck, G. Casalicchio, M. Grosse-Wentrup and B. Bischl.
General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models.
Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (XXAI 2020) at the 37th International Conference on Machine Learning (ICML 2020). Virtual, Jul 12-18, 2020. DOI.
MCML Authors
Link to Christian Scholbeck

Christian Scholbeck

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Moritz Grosse-Wentrup

Moritz Grosse-Wentrup

Prof. Dr.

* Former member

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[86]
M. Binder, J. Moosbauer, J. Thomas and B. Bischl.
Multi-Objective Hyperparameter Tuning and Feature Selection Using Filter Ensembles.
Genetic and Evolutionary Computation Conference (GECCO 2020). Cancun, Mexico, Jul 08-12, 2020. DOI.
Abstract

Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield better model interpretability and lower cost of data acquisition, data handling and model inference. While sparsity may have a beneficial or detrimental effect on predictive performance, a small drop in performance may be acceptable in return for a substantial gain in sparseness. We therefore treat feature selection as a multi-objective optimization task. We perform hyperparameter tuning and feature selection simultaneously because the choice of features of a model may influence what hyperparameters perform well. We present, benchmark, and compare two different approaches for multi-objective joint hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization. The second is an evolutionary NSGA-II-based wrapper approach to feature selection which incorporates specialized sampling, mutation and recombination operators. Both methods make use of parameterized filter ensembles. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs computational overhead compared to the NSGA-II, so the preferred choice depends on the cost of evaluating a model on given data.
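The multi-objective framing can be made concrete with the non-dominated filter at the core of such methods: among candidate configurations scored by (validation error, number of selected features), keep only the Pareto-optimal trade-offs. A minimal sketch, not the paper's model-based optimization or NSGA-II implementation:

```python
def pareto_front(candidates):
    """Filter (error, n_features) pairs down to the non-dominated set:
    a candidate is dominated if some other candidate is no worse in both
    objectives and differs, i.e. is strictly better in at least one
    (both objectives are minimized)."""
    front = []
    for c in candidates:
        dominated = any(o[0] <= c[0] and o[1] <= c[1] and o != c
                        for o in candidates)
        if not dominated:
            front.append(c)
    return front

# (error, number of selected features) for five hypothetical models
configs = [(0.10, 20), (0.12, 5), (0.15, 3), (0.11, 25), (0.12, 7)]
front = pareto_front(configs)  # keeps only the non-dominated trade-offs
```

The returned front is what a user would inspect to trade a small drop in performance for a substantial gain in sparseness.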

MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[85]
A. Beer, V. Hartmann and T. Seidl.
Orderings of Data - more than a Tripping Hazard.
32nd International Conference on Scientific and Statistical Database Management (SSDBM 2020). Vienna, Austria, Jul 07-09, 2020. DOI.
Abstract

As data processing techniques get more and more sophisticated every day, many of us researchers often get lost in the details and subtleties of the algorithms we are developing and far too easily seem to forget to also look at the very first step of every algorithm: the input of the data. Since there are plenty of library functions for this task, we indeed do not have to think about this part of the pipeline anymore. But maybe we should. All data is stored and loaded into a program in some order. In this vision paper we study how ignoring this order can (1) lead to performance issues and (2) make research results unreproducible. We furthermore examine desirable properties of a data ordering and why current approaches are often not suited to tackle the two mentioned problems.
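A minimal, self-contained illustration of the reproducibility point: floating-point addition is not associative, so a single-scan computation over the same multiset of values can return different results depending on the order in which the data are loaded.

```python
def running_sum(values):
    """Sum values in arrival order, as a streaming algorithm would."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three values, two load orders:
a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed into the large partial sum
b = [1e16, -1e16, 1.0]   # the large terms cancel before 1.0 arrives
# running_sum(a) == 0.0, but running_sum(b) == 1.0
```

Any algorithm whose internal state depends on insertion order (streaming clustering, hash-based grouping, incremental statistics) is exposed to the same effect.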

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[84]
N. Kassner and H. Schütze.
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly.
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Virtual, Jul 05-10, 2020. DOI.
Abstract

Building on Petroni et al. 2019, we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated (“Birds cannot [MASK]”) and non-negated (“Birds can [MASK]”) cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add “misprimes” to cloze questions (“Talk? Birds can [MASK]”). We find that PLMs are easily distracted by misprimes. These results suggest that PLMs still have a long way to go to adequately learn human-like factual knowledge.

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[83]
N. Ellenbach, A.-L. Boulesteix, B. Bischl, K. Unger and R. Hornung.
Improved outcome prediction across data sources through robust parameter tuning.
Journal of Classification (Jul. 2020). DOI.
Abstract

In many application areas, prediction rules trained based on high-dimensional data are subsequently applied to make predictions for observations from other sources, but they do not always perform well in this setting. This is because data sets from different sources can feature (slightly) differing distributions, even if they come from similar populations. In the context of high-dimensional data and beyond, most prediction methods involve one or several tuning parameters. Their values are commonly chosen by maximizing the cross-validated prediction performance on the training data. This procedure, however, implicitly presumes that the data to which the prediction rule will ultimately be applied follow the same distribution as the training data. If this is not the case, less complex prediction rules that slightly underfit the training data may be preferable. Indeed, a tuning parameter does not only control the degree of adjustment of a prediction rule to the training data, but also, more generally, the degree of adjustment to the distribution of the training data. On the basis of this idea, in this paper we compare various approaches including new procedures for choosing tuning parameter values that lead to better generalizing prediction rules than those obtained based on cross-validation. Most of these approaches use an external validation data set. In our extensive comparison study based on a large collection of 15 transcriptomic data sets, tuning on external data and robust tuning with a tuned robustness parameter are the two approaches leading to better generalizing prediction rules.

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[82]
M. Lotfollahi, M. Naghipourfar, M. D. Luecken, M. Khajavi, M. Büttner, Ž. Avsec, A. V. Misharin and F. J. Theis.
Query to reference single-cell integration with transfer learning.
Preprint at bioRxiv (Jul. 2020). DOI.
MCML Authors
Link to Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems


[81]
S. Friedl, S. Schmoll, F. Borutta and M. Schubert.
SMART-Env.
21st IEEE International Conference on Mobile Data Management (MDM 2020). Versailles, France, Jun 30-Jul 03, 2020. DOI.
Abstract

In this work, we present SMART-Env (Spatial Multi-Agent Resource search Training Environment), a spatio-temporal multi-agent environment for evaluating and training different kinds of agents on resource search tasks. We explain how to simulate arbitrary spawning distributions on real-world street graphs, compare agents’ behavior and evaluate their performance over time. Finally, we demonstrate SMART-Env in a taxi dispatching scenario with three different kinds of agents.

MCML Authors
Sabrina Friedl

Sabrina Friedl

* Former member

Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[80]
F. Wimbauer, N. Yang, L. von Stumberg, N. Zeller and D. Cremers.
MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020). Virtual, Jun 14-19, 2020. DOI.
MCML Authors
Link to Felix Wimbauer

Felix Wimbauer

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[79]
M. Ali, C. T. Hoyt, L. Vermue, M. Galkin and M. Berrendorf.
pykeen/benchmarking. Version v1.0.
2020. DOI.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member


[78]
A. Beyer, G. Kauermann and H. Schütze.
Embedding Space Correlation as a Measure of Domain Similarity.
12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 13-15, 2020. URL.
Abstract

Prior work has determined domain similarity using text-based features of a corpus. However, when using pre-trained word embeddings, the underlying text corpus might not be accessible anymore. Therefore, we propose the CCA measure, a new measure of domain similarity based directly on the dimension-wise correlations between corresponding embedding spaces. Our results suggest that an inherent notion of domain can be captured this way, as we are able to reproduce our findings for different domain comparisons for English, German, Spanish and Czech as well as in cross-lingual comparisons. We further find a threshold at which the CCA measure indicates that two corpora come from the same domain in a monolingual setting by applying permutation tests. By evaluating the usability of the CCA measure in a domain adaptation application, we also show that it can be used to determine which corpora are more similar to each other in a cross-domain sentiment detection task.
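The core idea of a dimension-wise correlation measure between corresponding embedding spaces can be sketched as follows. This is an illustrative simplification (mean absolute per-dimension Pearson correlation over embeddings of a shared vocabulary), not the paper's exact CCA computation.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def dimensionwise_similarity(E1, E2):
    """Mean absolute per-dimension correlation between two embedding
    matrices whose rows correspond to the same shared words."""
    dims = len(E1[0])
    cors = [pearson([row[d] for row in E1], [row[d] for row in E2])
            for d in range(dims)]
    return sum(abs(c) for c in cors) / dims
```

Crucially, only the embedding matrices are needed, not the underlying corpora.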

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning


[77]
J. Jungmaier, N. Kassner and B. Roth.
Dirichlet-Smoothed Word Embeddings for Low-Resource Settings.
12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 13-15, 2020. URL.
MCML Authors

[76]
F. Borutta, D. Kazempour, F. Marty, P. Kröger and T. Seidl.
Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform.
24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2020). Singapore, May 11-14, 2020. DOI.
Abstract

When facing high-dimensional data streams, clustering algorithms quickly reach the boundaries of their usefulness as most of these methods are not designed to deal with the curse of dimensionality. Due to inherent sparsity in high-dimensional data, distances between objects tend to become meaningless since the distances between any two objects measured in the full dimensional space tend to become the same for all pairs of objects. In this work, we present a novel oriented subspace clustering algorithm that is able to deal with such issues and detects arbitrarily oriented subspace clusters in high-dimensional data streams. Data streams generally pose the challenge that the data cannot be stored in their entirety, and hence there is a general demand for suitable data handling strategies for clustering algorithms such that the data can be processed within a single scan. We therefore propose the CASHSTREAM algorithm that unites state-of-the-art stream processing techniques and additionally relies on the Hough transform to detect arbitrarily oriented subspace clusters. Our experiments compare CASHSTREAM to its static counterpart and show that the amount of consumed memory is significantly decreased while there is no loss in terms of runtime.

MCML Authors
Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[75]
S. Klau, M.-L. Martin-Magniette, A.-L. Boulesteix and S. Hoffmann.
Sampling uncertainty versus method uncertainty: a general framework with applications to omics biomarker selection.
Biometrical Journal 62.3 (May. 2020). DOI.
Abstract

Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.
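The resampling framework can be sketched schematically: sampling uncertainty is estimated from the variability of one method's selections across bootstrap samples, and method uncertainty from the disagreement of two methods on the same sample. The selector functions below are hypothetical placeholders for, e.g., two variable-selection procedures, and Jaccard overlap serves as a simple agreement measure.

```python
import random

def jaccard(a, b):
    """Jaccard overlap of two selections of variable indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def uncertainty_comparison(data, method_a, method_b, n_resamples=20, seed=0):
    """Sampling uncertainty: overlap of method_a's selections across
    consecutive bootstrap samples. Method uncertainty: overlap between
    the two methods' selections on the same bootstrap sample.
    Each method maps a sample to a set of selected variable indices."""
    rng = random.Random(seed)
    sel_a, sel_b = [], []
    for _ in range(n_resamples):
        sample = [data[rng.randrange(len(data))] for _ in range(len(data))]
        sel_a.append(method_a(sample))
        sel_b.append(method_b(sample))
    sampling = sum(jaccard(sel_a[i], sel_a[i + 1])
                   for i in range(n_resamples - 1)) / (n_resamples - 1)
    method = sum(jaccard(sel_a[i], sel_b[i])
                 for i in range(n_resamples)) / n_resamples
    return sampling, method
```

Low values of the first quantity indicate unstable results across samples; low values of the second indicate strong dependence on the analysis strategy.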

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


[74]
J. Klicpera, J. Groß and S. Günnemann.
Directional Message Passing for Molecular Graphs.
8th International Conference on Learning Representations (ICLR 2020). Virtual, Apr 26-May 01, 2020. URL.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[73]
O. Shchur, M. Biloš and S. Günnemann.
Intensity-Free Learning of Temporal Point Processes (selected for spotlight presentation).
8th International Conference on Learning Representations (ICLR 2020). Virtual, Apr 26-May 01, 2020. URL.
MCML Authors
Link to Oleksandr Shchur

Oleksandr Shchur

Dr.

* Former member

Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[72]
M. Berrendorf, E. Faerman and V. Tresp.
Active Learning for Entity Alignment.
5th International Workshop on Deep Learning for Graphs (DL4G@WWW2020) at the ACM Web Conference 2020 (WWW 2020). Taipeh, Taiwan, Apr 21, 2020. arXiv.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[71]
M. Berrendorf, E. Faerman, L. Vermue and V. Tresp.
Interpretable and Fair Comparison of Link Prediction or Entity Alignment Methods with Adjusted Mean Rank (Extended Abstract).
5th International Workshop on Deep Learning for Graphs (DL4G@WWW2020) at the ACM Web Conference 2020 (WWW 2020). Taipeh, Taiwan, Apr 21, 2020. Full paper at WI-AT 2020. DOI.
MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining


[70]
M. C. Altinigneli, L. Miklautz, C. Böhm and C. Plant.
Hierarchical Quick Shift Guided Recurrent Clustering.
36th IEEE International Conference on Data Engineering (ICDE 2020). Dallas, TX, USA, Apr 20-24, 2020. DOI.
Abstract

We propose a novel density-based mode-seeking Hierarchical Quick Shift clustering algorithm with an optional Recurrent Neural Network (RNN) to jointly learn the cluster assignments for every sample and the underlying dynamics of the mode-seeking clustering process. As a mode-seeking clustering algorithm, Hierarchical Quick Shift constrains data samples to stay on similar trajectories. All data samples converging to the same local mode are assigned to a common cluster. The RNN enables us to learn quasi-temporal structures during the mode-seeking clustering process. It supports variable density clusters with arbitrary shapes without requiring the expected number of clusters a priori. We evaluate our method in extensive experiments to show the advantages over other density-based clustering algorithms.
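The mode-seeking step underlying Quick Shift (without the hierarchical extension or the RNN described above) can be sketched as follows, assuming a simple neighbor-count density estimate; this is an illustration, not the authors' algorithm.

```python
def quick_shift(points, radius):
    """Minimal mode-seeking step: density = number of points within
    `radius`; each point links to its nearest strictly denser point
    within `radius`; following links up to a local mode labels each
    cluster by the index of its mode."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n = len(points)
    density = [sum(1 for q in points if dist(p, q) <= radius)
               for p in points]
    parent = list(range(n))
    for i, p in enumerate(points):
        best, best_d = i, float("inf")
        for j, q in enumerate(points):
            d = dist(p, q)
            if density[j] > density[i] and d <= radius and d < best_d:
                best, best_d = j, d
        parent[i] = best

    def mode(i):  # follow links until a point is its own parent
        while parent[i] != i:
            i = parent[i]
        return i

    return [mode(i) for i in range(n)]
```

Because each link strictly increases density, the trajectories cannot cycle, and all points converging to the same local mode share a cluster, as in the abstract.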

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[69]
M. Berrendorf, E. Faerman, V. Melnychuk, V. Tresp and T. Seidl.
Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned.
42nd European Conference on Information Retrieval (ECIR 2020). Virtual, Apr 14-17, 2020. DOI. GitHub.
Abstract

In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper and after a thorough audit of the code provided by authors, we concluded, that their implementation is different from the architecture described in the paper. In addition, several tricks are required to make the model work and some of them are not very intuitive.We provide an extensive ablation study to quantify the effects these tricks and changes of architecture have on final performance. Furthermore, we examine current evaluation approaches and systematize available benchmark datasets.We believe that people interested in KG matching might profit from our work, as well as novices entering the field.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[68]
L. Miklautz, D. Mautz, M. C. Altinigneli, C. Böhm and C. Plant.
Deep embedded non-redundant clustering.
34th Conference on Artificial Intelligence (AAAI 2020). New York City, New York, USA, Feb 07-12, 2020. DOI.
Abstract

Complex data types like images can be clustered in multiple valid ways. Non-redundant clustering aims at extracting those meaningful groupings by discouraging redundancy between clusterings. Unfortunately, clustering images in pixel space directly has been shown to work unsatisfactorily. This has increased interest in combining the high representational power of deep learning with clustering, termed deep clustering. Algorithms of this type combine the non-linear embedding of an autoencoder with a clustering objective and optimize both simultaneously. None of these algorithms try to find multiple non-redundant clusterings. In this paper, we propose the novel Embedded Non-Redundant Clustering algorithm (ENRC). It is the first algorithm that combines neural-network-based representation learning with non-redundant clustering. ENRC can find multiple highly non-redundant clusterings of different dimensionalities within a data set. This is achieved by (softly) assigning each dimension of the embedded space to the different clusterings. For instance, in image data sets it can group the objects by color, material and shape, without the need for explicit feature engineering. We show the viability of ENRC in extensive experiments and empirically demonstrate the advantage of combining non-linear representation learning with non-redundant clustering.

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[67]
M. Becker, P. Schratz, M. Lang and B. Bischl.
mlr3fselect: Feature Selection for 'mlr3'.
2020. URL.
MCML Authors
Link to Marc Becker

Marc Becker

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[66]
M. Binder, F. Pfisterer, L. Schneider, B. Bischl, M. Lang and S. Dandl.
mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'.
2020. URL. GitHub.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Lennart Schneider

Lennart Schneider

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[65]
M. Herrmann.
fda-ndr: Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction. R package.
2020. GitHub.
MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine


[64]
M. Herrmann.
manifun: Collection of functions to work with embeddings and functional data. R package.
2020. GitHub.
MCML Authors
Link to Moritz Herrmann

Moritz Herrmann

Dr.

Biometry in Molecular Medicine


[63]
M. Lang.
mlr3db: Data Base Backend for 'mlr3'.
2020. URL. GitHub.
MCML Authors

[62]
M. Lang.
mlr3oml: Connector Between 'mlr3' and 'OpenML'.
2020. URL. GitHub.
MCML Authors

[61]
M. Lang, Q. Au, S. Coors and P. Schratz.
mlr3learners: Recommended Learners for 'mlr3'.
2020. URL. GitHub.
MCML Authors

[60]
M. Lang, P. Schratz and R. Sonabend.
mlr3viz: Visualizations for 'mlr3'.
2020. URL. GitHub.
MCML Authors

[59]
D. Pulatov and M. Lang.
mlr3cluster: Cluster Extension for 'mlr3'.
2020. URL. GitHub.
MCML Authors

[58]
F. Scheipl, J. Goldsmith and J. Wrobel.
tidyfun: Tools for Tidy Functional Data. R package.
2020. URL. GitHub.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[57]
P. Schratz, M. Lang, B. Bischl and M. Binder.
mlr3filters: Filter Based Feature Selection for 'mlr3'.
2020. URL. GitHub.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Martin Binder

Martin Binder

Statistical Learning & Data Science


[56]
R. Sonabend, F. Kiraly and M. Lang.
mlr3proba: Probabilistic Supervised Learning for 'mlr3'. R package version 0.2.6.
2020. DOI. URL.
MCML Authors

[55]
J. Wrobel, A. Bauer, J. Goldsmith, E. McDonnel and F. Scheipl.
registr: Curve Registration for Exponential Family Functional Data. R package.
2020. GitHub.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[54]
F. Borutta.
Unsupervised learning on social data.
Dissertation 2020. DOI.
Abstract

This thesis addresses several challenges in social data analytics, focusing on methods for clustering, learning from network data, and analyzing dynamic social data. It introduces novel algorithms for correlation clustering on streaming data, hierarchical clustering for social maps, and user identification based on spatio-temporal mobility patterns. Additionally, the thesis presents various node embedding techniques for learning representations from network topology and proposes a graph neural network model for matching nodes across overlapping graphs. (Shortened.)

MCML Authors
Link to Felix Borutta

Felix Borutta

Dr.

* Former member


[53]
Y. Ma.
Learning with relational knowledge in the context of cognition, quantum computing, and causality.
Dissertation 2020. DOI.
Abstract

This dissertation explores the use of knowledge graphs, including semantic and episodic graphs, for representing static and evolving human knowledge, and proposes methods for improving knowledge inference. It introduces two quantum machine learning algorithms aimed at speeding up knowledge graph inference, demonstrating significant speedups over classical methods. Additionally, the work addresses causal inference in relational data, specifically in social networks, and proposes causal estimators using graph neural networks to estimate superimposed effects and optimize treatment assignments for network welfare. (Shortened.)

MCML Authors
Link to Yunpu Ma

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning


[52]
D. Davletshina, V. Melnychuk, V. Tran, H. Singla, M. Berrendorf, E. Faerman, M. Fromm and M. Schubert.
Unsupervised Anomaly Detection for X-Ray Images.
Preprint at arXiv (Jan. 2020). arXiv.
MCML Authors
Link to Valentyn Melnychuk

Valentyn Melnychuk

Artificial Intelligence in Management

Link to Viet Tran

Viet Tran

Biomedical Statistics and Data Science

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


2019


[51]
M. Biloš, B. Charpentier and S. Günnemann.
Uncertainty on Asynchronous Time Event Prediction (Poster).
33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, Canada, Dec 08-14, 2019. PDF.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[50]
A. Bojchevski and S. Günnemann.
Certifiable Robustness to Graph Perturbations.
33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, Canada, Dec 08-14, 2019. PDF.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[49]
J. Klicpera, S. Weißenberger and S. Günnemann.
Diffusion Improves Graph Learning.
33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, Canada, Dec 08-14, 2019. PDF.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[48]
E. Faerman, O. Voggenreiter, F. Borutta, T. Emrich, M. Berrendorf and M. Schubert.
Graph Alignment Networks with Node Matching Scores.
Workshop on Graph Representation Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, Canada, Dec 08-14, 2019. PDF.
Abstract

In this work we address the problem of graph node alignment using the example of Map Fusion (MF). Given two partly overlapping road networks, the goal is to match nodes that represent the same locations in both networks. For this task we propose a new model based on Graph Neural Networks (GNN). Existing GNN approaches, which have recently been successfully applied to various tasks on graph-based data, show poor performance for the MF task. We hypothesize that this is mainly caused by graph regions from the non-overlapping areas, as information from those areas negatively affects the learned node representations. Therefore, our model has an additional inductive bias and learns to ignore effects of nodes that do not have a match in the other graph. Our new model can easily be extended to other graph alignment problems, e.g., for calculating graph similarities or for the alignment of entities in knowledge graphs.

MCML Authors
Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[47]
M. Binder, J. Moosbauer, J. Thomas and B. Bischl.
Multi-Objective Hyperparameter Tuning and Feature Selection using Filter Ensembles.
Preprint at arXiv (Dec. 2019). arXiv.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[46]
M. Lang, M. Binder, J. Richter, P. Schratz, F. Pfisterer, S. Coors, Q. Au, G. Casalicchio, L. Kotthoff and B. Bischl.
mlr3: A modern object-oriented machine learning framework in R.
The Journal of Open Source Software 4.44 (Dec. 2019). DOI.
MCML Authors
Link to Martin Binder

Martin Binder

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[45]
D. Mautz, C. Plant and C. Böhm.
Deep Embedded Cluster Tree.
19th IEEE International Conference on Data Mining (ICDM 2019). Beijing, China, Nov 08-11, 2019. DOI.
Abstract

The idea of combining the high representational power of deep learning techniques with clustering methods has gained much interest in recent years. Optimizing representation and clustering simultaneously has been shown to have an advantage over optimizing them separately. However, so far all proposed methods have been using a flat clustering strategy, with the true number of clusters known a priori. In this paper, we propose the Deep Embedded Cluster Tree (DeepECT), the first divisive hierarchical embedded clustering method. The cluster tree does not need to know the true number of clusters during optimization. Instead, the level of detail to be analyzed can be chosen afterward and for each sub-tree separately. An optional data-augmentation-based extension allows DeepECT to ignore prior-known invariances of the dataset, such as affine transformations in image data. We evaluate and show the advantages of DeepECT in extensive experiments.

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[44]
E. Faerman, M. Rogalla, N. Strauß, A. Krüger, B. Blümel, M. Berrendorf, M. Fromm and M. Schubert.
Spatial Interpolation with Message Passing Framework.
IEEE International Conference on Data Mining Workshops (ICDMW 2019). Beijing, China, Nov 08-11, 2019. DOI.
Abstract

Spatial interpolation is the task to predict a measurement for any location in a given geographical region. To train a prediction model, we assume to have point-wise measurements for various locations in the region. In addition, it is often beneficial to consider historic measurements for these locations when training an interpolation model. Typical use cases are the interpolation of weather, pollution or traffic information. In this paper, we introduce a new type of model with strong relational inductive bias based on Message Passing Networks. In addition, we extend our new model to take geomorphological characteristics into account to improve the prediction quality. We provide an extensive evaluation based on a large real-world weather dataset and compare our new approach with classical statistical interpolation techniques and Neural Networks without inductive bias.
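A classical statistical baseline of the kind this paper compares against is inverse distance weighting (IDW), which predicts the measurement at an unobserved location as a distance-weighted average of station measurements. A minimal sketch:

```python
def idw_interpolate(stations, query, power=2.0):
    """Inverse distance weighting: predict the value at `query` as a
    weighted average of station values, with weights 1 / distance^power.
    `stations` is a list of ((x, y), value) pairs."""
    num, den = 0.0, 0.0
    for (x, y), value in stations:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 == 0.0:
            return value  # query coincides with a station
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den
```

Unlike the message passing model in the paper, IDW uses only distances and ignores relational structure and geomorphological features, which is precisely the gap such learned models aim to close.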

MCML Authors
Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Niklas Strauß

Niklas Strauß

Database Systems & Data Mining

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[43]
M. Fromm, M. Berrendorf, E. Faerman, Y. Chen, B. Schüss and M. Schubert.
XD-STOD: Cross-Domain Superresolution for Tiny Object Detection.
IEEE International Conference on Data Mining Workshops (ICDMW 2019). Beijing, China, Nov 08-11, 2019. DOI.
Abstract

Monitoring the restoration of natural habitats after human intervention is an important task in the field of remote sensing. Currently, this requires extensive field studies entailing considerable costs. Unmanned Aerial Vehicles (UAVs, a.k.a. drones) have the potential to reduce these costs, but generate immense amounts of data which have to be evaluated automatically with special techniques. Especially the automated detection of tree seedlings poses a big challenge, as their size and shape vary greatly across images. In addition, there is a tradeoff between different flying altitudes. Given the same camera equipment, a lower flying altitude yields higher-resolution images, which makes achieving high detection rates easier. However, the imagery will only cover a limited area. On the other hand, flying at higher altitudes allows for covering larger areas, but makes seedling detection more challenging due to the coarser images. In this paper we investigate the usability of super-resolution (SR) networks for the case that we can collect a large amount of coarse imagery at higher flying altitudes, but only a small amount of high-resolution images from lower flying altitudes. We use a collection of high-resolution images taken by a drone at 5m altitude. After training the SR models on these data, we evaluate their applicability to low-quality images taken at 30m altitude (in-domain). In addition, we investigate and compare whether approaches trained on highly diverse large data sets can be transferred to these data (cross-domain). We also evaluate the usability of the SR results based on their influence on the detection rate of different object detectors. We find that the features acquired from training on standard SR data sets are transferable to the drone footage. Furthermore, we demonstrate that the detection rate of common object detectors can be improved by SR techniques in both settings, in-domain and cross-domain.

MCML Authors
Link to Michael Fromm

Michael Fromm

Dr.

* Former member

Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[42]
F. Lüer, D. Mautz and C. Böhm.
Anomaly Detection in Time Series using Generative Adversarial Networks.
IEEE International Conference on Data Mining Workshops (ICDMW 2019). Beijing, China, Nov 08-11, 2019. DOI.
Abstract

Generative Adversarial Networks (GANs) have been applied to an increasing number of tasks, especially those related to image data. A comparatively recent advance was their application to the domain of anomaly detection in images and, even more recently, to spatiotemporal data. In this work, a recurrent GAN (RGAN) is applied to cardiovascular data from the MIT-BIH dataset to learn the natural variety of normal sinus rhythms in a healthy individual. The generator is used to reconstruct samples using differently parameterized levels of similarity and thresholds. We find that solely using the generator already allows a surprisingly good anomaly detection performance. Furthermore, we discuss adding the discriminator, which might significantly improve the performance. Future work also includes using only the discriminator, minimizing the time required for inference, which is important for streaming data.

MCML Authors
Link to Christian Böhm

Christian Böhm

Prof. Dr.

* Former member


[41]
F. Borutta, S. Schmoll and S. Friedl.
Optimizing the Spatio-Temporal Resource Search Problem with Reinforcement Learning.
27th International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2019). Chicago, IL, USA, Nov 05-08, 2019. DOI.
Abstract

Collecting spatio-temporal resources is an important goal in many real-world use cases such as finding customers for taxicabs. In this paper, we tackle the resource search problem posed by the GIS Cup 2019 where the objective is to minimize the average search time of taxicabs looking for customers. The main challenge is that the taxicabs may not communicate with each other and the only observation they have is the current time and position. Inspired by radial transit route structures in urban environments, our approach relies on round trips that are used as action space for a downstream reinforcement learning procedure. Our source code is publicly available at https://github.com/Fe18/TripBanditAgent.

MCML Authors
Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Sabrina Friedl

Sabrina Friedl

* Former member


[40]
F. Pfisterer, L. Beggel, X. Sun, F. Scheipl and B. Bischl.
Benchmarking time series classification -- Functional data vs machine learning approaches.
Preprint at arXiv (Nov. 2019). arXiv.
Abstract

Time series classification problems have drawn increasing attention in the machine learning and statistical community. Closely related is the field of functional data analysis (FDA): it refers to the range of problems that deal with the analysis of data that is continuously indexed over some domain. While often employing different methods, both fields strive to answer similar questions, a common example being classification or regression problems with functional covariates. We study methods from functional data analysis, such as functional generalized additive models, as well as functionality to concatenate (functional-) feature extraction or basis representations with traditional machine learning algorithms like support vector machines or classification trees. In order to assess the methods and implementations, we run a benchmark on a wide variety of representative (time series) data sets, with in-depth analysis of empirical results, and strive to provide a reference ranking for which method(s) to use for non-expert practitioners. Additionally, we provide a software framework in R for functional data analysis for supervised learning, including machine learning and more linear approaches from statistics. This allows convenient access, and in connection with the machine-learning toolbox mlr, those methods can now also be tuned and benchmarked.

MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[39]
F. Pfisterer, J. Thomas and B. Bischl.
Towards Human Centered AutoML.
Preprint at arXiv (Nov. 2019). arXiv.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[38]
G. König and M. Grosse-Wentrup.
A Causal Perspective on Challenges for AI in Precision Medicine.
2nd International Congress on Precision Medicine (PMBC 2019). Munich, Germany, Oct 14-15, 2019.
MCML Authors
Link to Moritz Grosse-Wentrup

Moritz Grosse-Wentrup

Prof. Dr.

* Former member


[37]
F. Borutta, J. Busch, E. Faerman, A. Klink and M. Schubert.
Structural Graph Representations based on Multiscale Local Network Topologies.
IEEE/WIC/ACM International Conference on Web Intelligence (WI 2019). Thessaloniki, Greece, Oct 14-17, 2019. DOI.
Abstract

In many applications, it is required to analyze a graph merely based on its topology. In these cases, nodes can only be distinguished based on their structural neighborhoods, and it is common that nodes having the same functionality or role yield similar neighborhood structures. In this work, we investigate two problems: (1) how to create structural node embeddings which describe a node's role and (2) how important the nodes' roles are for characterizing entire graphs. To describe the role of a node, we explore the structure within the local neighborhood (or multiple local neighborhoods of various extents) of the node in the vertex domain, compute the visiting probability distribution of nodes in the local neighborhoods and summarize each distribution into a single number by computing its entropy. Furthermore, we argue that the roles of nodes are important for characterizing the entire graph. Therefore, we propose to aggregate the role representations to describe whole graphs for graph classification tasks. Our experiments show that our new role descriptors outperform state-of-the-art structural node representations that are usually more expensive to compute. Additionally, we achieve promising results compared to advanced state-of-the-art approaches for graph classification on various benchmark datasets, often outperforming them.
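The entropy-based role descriptor described in the abstract can be illustrated in a few lines. This is a minimal reading of the idea — the visiting probability distribution of a short random walk, summarized by its entropy — using a single assumed neighborhood extent rather than the multiscale aggregation of the paper:

```python
import numpy as np

def role_descriptor(adj, node, steps=3):
    # Entropy of the visiting probability distribution of a short random
    # walk started at `node` (single neighborhood extent; the paper
    # combines multiple extents).
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    p = np.zeros(n)
    p[node] = 1.0
    visit = np.zeros(n)
    for _ in range(steps):
        p = p @ P
        visit += p
    visit /= visit.sum()                      # visiting probability distribution
    nz = visit[visit > 0]
    return float(-(nz * np.log(nz)).sum())    # summarize by entropy

# Star graph: the hub (node 0) and the leaves play different roles,
# while all leaves are structurally identical.
A = np.zeros((5, 5))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
print(role_descriptor(A, 0), role_descriptor(A, 1))
```

Structurally identical nodes (the leaves) receive identical descriptors while the hub's differs; aggregating such per-node values is one way to obtain a graph-level representation for classification.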

MCML Authors
Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Evgeny Faerman

Evgeny Faerman

Dr.

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[36]
A. Beer, J. Lauterbach and T. Seidl.
MORe++: k-Means Based Outlier Removal on High-Dimensional Data.
12th International Conference on Similarity Search and Applications (SISAP 2019). Newark, NJ, USA, Oct 02-04, 2019. DOI.
Abstract

MORe++ is a k-Means-based outlier removal method working on high-dimensional data. It is simple, efficient and scalable. The core idea is to find local outliers by examining the points of different k-Means clusters separately. In this way, one-dimensional projections of the data become meaningful and allow finding one-dimensional outliers easily which would otherwise be hidden by points of other clusters. MORe++ does not need any input parameters other than the number of clusters k used for k-Means, and delivers an intuitively accessible degree of outlierness. In extensive experiments it performed well compared to k-means-- and ORC.
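The core idea — running k-Means and then looking for one-dimensional outliers within each cluster separately — can be sketched as follows. The per-dimension z-score criterion and the threshold `z=3.0` are illustrative assumptions, not the paper's actual outlierness score:

```python
import numpy as np

def kmeans(X, k=2, iters=50):
    # Plain Lloyd's k-Means with a deterministic init: seeds spread
    # along the first coordinate (illustrative, not MORe++'s setup).
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    return labels

def local_outliers(X, labels, z=3.0):
    # Flag points that are 1D outliers within their own cluster: for each
    # cluster and each dimension, mark points more than z standard
    # deviations from the cluster mean (assumed simple criterion).
    mask = np.zeros(len(X), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        C = X[idx]
        dev = np.abs(C - C.mean(0)) / (C.std(0) + 1e-12)
        mask[idx] |= (dev > z).any(axis=1)
    return mask

# Two well-separated blobs plus one point that is only an outlier
# relative to its own cluster's 1D projections.
rng = np.random.default_rng(1)
a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2))
b = rng.normal(loc=(10.0, 0.0), scale=0.5, size=(100, 2))
X = np.vstack([a, b, [[0.0, 6.0]]])
labels = kmeans(X, 2)
flags = local_outliers(X, labels)
print(int(flags.sum()))
```

Examining each cluster on its own is what makes the per-dimension check meaningful: globally, the injected point's coordinates are unremarkable, but within its cluster its second coordinate stands out.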

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[35]
M. Berrendorf, F. Borutta and P. Kröger.
k-Distance Approximation for Memory-Efficient RkNN Retrieval.
12th International Conference on Similarity Search and Applications (SISAP 2019). Newark, NJ, USA, Oct 02-04, 2019. DOI.
Abstract

For a given query object, Reverse k-Nearest Neighbor (RkNN) queries retrieve those objects that have the query object among their k-nearest neighbors. However, computing the k-nearest neighbor sets for all points in a database is expensive in terms of computational costs. Therefore, specific index structures have been invented to apply pruning heuristics which aim at reducing the search space. At present, the state-of-the-art index structure for enabling fast RkNN query processing in general metric spaces is the MRkNNCoP-Tree, which uses linear functions to approximate lower and upper bounds on the k-distances to prune the search space. Storing those linear functions results in additional storage costs in O(n), which might be infeasible in situations where storage space is limited, e.g., on mobile devices. In this work, we present a novel index based on the MRkNNCoP-Tree as well as recent developments in the field of neural indexing. By learning a single neural network model that approximates the k-nearest neighbor distance bounds for all points in a database, the storage complexity of the proposed index structure is reduced to O(1) while the index is still able to guarantee exact query results. As shown in our experimental evaluations on synthetic and real-world data sets, our approach can significantly reduce the required storage space at the cost of some growth in the size of the refinement sets when relying on exact query processing.

MCML Authors
Link to Max Berrendorf

Max Berrendorf

Dr.

* Former member

Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[34]
F. Borutta, P. Kröger and T. Hubauer.
A Generic Summary Structure for Arbitrarily Oriented Subspace Clustering in Data Streams.
12th International Conference on Similarity Search and Applications (SISAP 2019). Newark, NJ, USA, Oct 02-04, 2019. DOI.
Abstract

Nowadays, as lots of data is gathered in large volumes and with high velocity, the development of algorithms capable of handling complex data streams in (near) real-time is a major challenge. In this work, we present the algorithm CORRSTREAM, which tackles the problem of detecting arbitrarily oriented subspace clusters in high-dimensional data streams. The proposed method follows a two-phase approach: the continuous online phase aggregates data points within a proper microcluster structure that stores all necessary information to define a microcluster's subspace and is generic enough to cope with a variety of offline procedures. Given several such microclusters, the offline phase is able to build a final clustering model which reveals the arbitrarily oriented subspaces in which the data tend to cluster. In our experimental evaluation, we show that CORRSTREAM not only has an acceptable throughput but also outperforms its static counterpart algorithms by orders of magnitude in terms of runtime. At the same time, the loss of accuracy is quite small.

MCML Authors
Link to Felix Borutta

Felix Borutta

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member


[33]
D. Kazempour, M. Hünemörder and T. Seidl.
On coMADs and Principal Component Analysis.
12th International Conference on Similarity Search and Applications (SISAP 2019). Newark, NJ, USA, Oct 02-04, 2019. DOI.
Abstract

Principal Component Analysis (PCA) is a popular method for linear dimensionality reduction. It is often used to discover hidden correlations or to facilitate the interpretation and visualization of data. However, it is susceptible to outliers: strong outliers can skew the principal components and as a consequence lead to a higher reconstruction loss. While there exist several sophisticated approaches to make PCA more robust, we present an approach which is intriguingly simple: we replace the covariance matrix by a so-called coMAD matrix. First experiments show that PCA based on the coMAD matrix is more robust towards outliers.
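The proposed modification is small enough to sketch directly: compute a coMAD matrix instead of the covariance matrix, then eigendecompose it as usual. The coMAD entry used here, med((a − med(a))(b − med(b))), is the natural median-based analogue of covariance and is an assumption on our part, not a quote from the paper:

```python
import numpy as np

def comad_matrix(X):
    # coMAD matrix: each entry is the median of the product of
    # median-centered features (assumed median analogue of covariance).
    centered = X - np.median(X, axis=0)
    d = X.shape[1]
    M = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            M[i, j] = np.median(centered[:, i] * centered[:, j])
    return M

def comad_pca(X, n_components):
    # Standard PCA pipeline with the covariance matrix swapped out.
    eigvals, eigvecs = np.linalg.eigh(comad_matrix(X))  # ascending order
    order = np.argsort(eigvals)[::-1]                   # sort descending
    return eigvecs[:, order[:n_components]]

# Strongly correlated 2D data plus one extreme outlier that would
# skew the leading component of classical covariance-based PCA.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])
X = np.vstack([X, [[100.0, -100.0]]])
W = comad_pca(X, 1)
print(W[:, 0])
```

Because medians are insensitive to a single extreme value, the leading coMAD component still points along the true correlation direction (1, 2)/√5 despite the outlier.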

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[32]
D. Kazempour and T. Seidl.
On coMADs and Principal Component Analysis.
12th International Conference on Similarity Search and Applications (SISAP 2019). Newark, NJ, USA, Oct 02-04, 2019. DOI.
MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[31]
L. Della Libera, V. Golkov, Y. Zhu, A. Mielke and D. Cremers.
Deep Learning for 2D and 3D Rotatable Data: An Overview of Methods.
Preprint at arXiv (Oct. 2019). arXiv.
MCML Authors
Link to Vladimir Golkov

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[30]
A. Beer, N. S. Schüler and T. Seidl.
A Generator for Subspace Clusters.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2019). Berlin, Germany, Sep 30-Oct 02, 2019. PDF.
Abstract

We introduce a generator for data containing subspace clusters which is accurately tunable and adjustable to the needs of developers. It is available online and allows specifying a plethora of characteristics the data should contain, while it is simultaneously able to generate meaningful data containing subspace clusters from a minimum of input data.

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[29]
D. Kazempour, A. Beer, O. Schrüfer and T. Seidl.
Clustering Trend Data Time-Series through Segmentation of FFT-decomposed Signal Constituents.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2019). Berlin, Germany, Sep 30-Oct 02, 2019. PDF.
MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[28]
D. Kazempour, L. M. Yan and T. Seidl.
From Covariance to Comode in context of Principal Component Analysis.
Conference on Lernen. Wissen. Daten. Analysen (LWDA 2019). Berlin, Germany, Sep 30-Oct 02, 2019. PDF.
Abstract

When it comes to the task of dimensionality reduction, Principal Component Analysis (PCA) is among the most well-known methods. Despite its popularity, PCA is prone to outliers, which can be traced back to the fact that this method relies on a covariance matrix. Even with the variety of sophisticated methods to enhance the robustness of PCA, we provide in this work-in-progress an approach which is intriguingly simple: the covariance matrix is replaced by a so-called comode matrix. Through this minor modification, the experiments show that the reconstruction loss is significantly reduced. In this work we introduce the comode and its relation to the MeanShift algorithm, including its bandwidth parameter, compare it in an experiment against the classic covariance matrix, and evaluate the impact of the bandwidth hyperparameter on the reconstruction error.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[27]
L. Beggel, M. Pfeiffer and B. Bischl.
Robust Anomaly Detection in Images Using Adversarial Autoencoders.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2019). Wuerzburg, Germany, Sep 16-20, 2019. DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[26]
J. Goschenhofer, F. M. J. Pfister, K. A. Yuksel, B. Bischl, U. Fietzek and J. Thomas.
Wearable-based Parkinson's Disease Severity Monitoring using Deep Learning.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2019). Wuerzburg, Germany, Sep 16-20, 2019. DOI.
MCML Authors
Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[25]
C. Molnar, G. Casalicchio and B. Bischl.
Quantifying Model Complexity via Functional Decomposition for Better Post-hoc Interpretability.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2019). Wuerzburg, Germany, Sep 16-20, 2019. DOI.
Abstract

Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially w.r.t. feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity.

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[24]
C. A. Scholbeck, C. Molnar, C. Heumann, B. Bischl and G. Casalicchio.
Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model Agnostic Interpretations.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2019). Wuerzburg, Germany, Sep 16-20, 2019. DOI.
MCML Authors
Link to Christian Scholbeck

Christian Scholbeck

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[23]
F. Pfisterer, S. Coors, J. Thomas and B. Bischl.
Multi-Objective Automatic Machine Learning with AutoxgboostMC.
Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases (Workshops ECML-PKDD 2019). Wuerzburg, Germany, Sep 16-20, 2019. arXiv.
Abstract

AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimizes a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or tasks that can be solved better by algorithms, while leaving decisions that require human intelligence to the end-user. In recent years, the importance of other criteria, such as fairness and interpretability, has become more and more apparent. Current AutoML frameworks either do not allow optimizing such secondary criteria or only do so by limiting the system's choice of models and preprocessing steps. We propose to directly optimize additional criteria defined by the user to guide the search towards an optimal machine learning pipeline. In order to demonstrate the need and usefulness of our approach, we provide a simple multi-criteria AutoML system and showcase an exemplary application.

MCML Authors

[22]
J. Held, A. Beer and T. Seidl.
Chain-detection Between Clusters.
Datenbank-Spektrum 19 (Sep. 2019). DOI.
Abstract

Chains connecting two or more different clusters are a well-known problem of clustering algorithms like DBSCAN or Single Linkage Clustering. Since even a small number of points resulting from, e.g., noise can form such a chain and build a bridge between different clusters, the results of the clustering algorithm can become distorted: several disparate clusters get merged into one. This single-link effect is well known, but to the best of our knowledge there are no satisfying solutions yet which extract those chains. We present a new algorithm detecting not only straight chains between clusters, but also bent and noisy ones. Users are able to choose between eliminating one-dimensional and higher-dimensional chains connecting clusters to receive the underlying cluster structure. Also, the desired straightness can be set by the user. As this paper is an extension of 'Chain-detection for DBSCAN', we apply our technique not only in combination with DBSCAN but also with single-link hierarchical clustering. On a real-world dataset containing traffic accidents in Great Britain we were able to detect chains emerging from streets between cities and villages, which had led to clusters composed of diverse villages. Additionally, we analyzed the robustness regarding the variance of chains in synthetic experiments.

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[21]
S. Schmoll, S. Friedl and M. Schubert.
Scaling the Dynamic Resource Routing Problem.
16th International Symposium on Spatial and Temporal Databases (SSTD 2019). Vienna, Austria, Aug 19-21, 2019. DOI.
Abstract

Routing to a resource (e.g., a parking spot or charging station) is a probabilistic search problem due to the uncertainty as to whether the resource will be available at the time of arrival or not. In recent years, more and more real-time information about the current state of resources has become available to facilitate this task. Therefore, we consider the case of a driver receiving online updates about the current situation. In this setting, the problem can be described as a fully observable Markov Decision Process (MDP) which can be used to compute an optimal policy minimizing the expected search time. However, current approaches do not scale beyond a dozen resources in a query. In this paper, we suggest adapting common approximate solutions for solving MDPs. We propose new re-planning and hindsight planning algorithms that redefine the state space and rely on novel cost estimations to find close-to-optimal results. Unlike exact solutions for computing MDPs, our approximate planners can scale up to hundreds of resources without prohibitive computational costs. We demonstrate the result quality and the scalability of our approaches on two settings describing the search for parking spots and charging stations in an urban environment.

MCML Authors
Sabrina Friedl

Sabrina Friedl

* Former member

Link to Matthias Schubert

Matthias Schubert

Prof. Dr.

Database Systems & Data Mining


[20]
A. Beer, D. Kazempour, M. Baur and T. Seidl.
Human Learning in Data Science (Poster Extended Abstract).
21st International Conference on Human-Computer Interaction (HCII 2019). Orlando, Florida, USA, Jul 26-31, 2019. DOI.
Abstract

As machine learning becomes a more and more important area in Data Science, bringing with it a rise of abstractness and complexity, the desire for explainability rises, too. With our work we aim to gain explainability, focusing on correlation clustering, and try to pursue the original goal of different Data Science tasks: extracting knowledge from data. As well-known tools like Fold-It or GeoTime show, gamification is a very powerful approach, and not only for solving tasks which prove more difficult for machines than for humans: we can also gain knowledge from how players proceed when trying to solve those difficult tasks. That is why we developed Straighten it up!, a game in which users try to find the best linear correlations in high-dimensional datasets. Finding arbitrarily oriented subspaces in high-dimensional data is an exponentially complex task due to the number of potential subspaces with regard to the number of dimensions. Nevertheless, linearly correlated points form a simple pattern that is easy for the human eye to track. Straighten it up! gives users an overview of two-dimensional projections of a self-chosen dataset. Users decide which subspace they want to examine first, and can draw in arbitrarily many lines fitting the data. An offset inside of which points are assigned to the corresponding line can easily be chosen for every line independently, and users can switch between different projections at any time. We developed a scoring system not only as an incentive, but first of all for further examination, based on the density of each cluster, its minimum spanning tree, the size of the offset, and the coverage. By tracking every step of a user we are able to detect common mechanisms and examine differences to state-of-the-art correlation and subspace clustering algorithms, resulting in more comprehensibility.

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[19]
D. Kazempour, A. Beer and T. Seidl.
Data on RAILs: On interactive generation of artificial linear correlated data (Poster Extended Abstract).
21st International Conference on Human-Computer Interaction (HCII 2019). Orlando, Florida, USA, Jul 26-31, 2019. DOI.
Abstract

Artificially generated data sets are present in the experimental sections of many data mining and machine learning publications. One of the reasons to use synthetic data is that scientists can express their understanding of a 'ground truth', having labels and thus an expectation of what an algorithm should be able to detect. This also permits a degree of control to create data sets which either emphasize the strengths of a method or reveal its weaknesses and thus potential targets for improvement. In order to develop methods which detect linearly correlated clusters, generating such artificial clusters is indispensable. This is mostly done by command-line based scripts, which may be tedious since they demand that users 'visualize' in their minds what the correlated clusters should look like and how they are positioned within the data space. We present in this work RAIL, a generator for Reproducible Artificial Interactive Linear correlated data. With RAIL, users can add multiple planes into a data space and arbitrarily change the orientation and position of those planes in an interactive fashion. This is achieved by manipulating the parameters describing each of the planes, giving users immediate feedback in real-time. With this approach scientists no longer need to imagine their data but can interactively explore and design their own artificial data sets containing linearly correlated clusters. Another convenient feature in this context is that the data is only generated once the users decide that their design phase is completed. If researchers want to share data, a small file is exchanged containing the parameters which describe the clusters, such as their Hessian normal form or the number of points per cluster, instead of sharing several large CSV files.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[18]
A. Beer, D. Kazempour, L. Stephan and T. Seidl.
LUCK - Linear Correlation Clustering Using Cluster Algorithms and a kNN based Distance Function (short paper).
31st International Conference on Scientific and Statistical Database Management (SSDBM 2019). Santa Cruz, CA, USA, Jul 23-25, 2019. DOI.
Abstract

LUCK allows to use any distance-based clustering algorithm to find linear correlated data. For that a novel distance function is introduced, which takes the distribution of the kNN of points into account and corresponds to the probability of two points being part of the same linear correlation. In this work in progress we tested the distance measure with DBSCAN and k-Means comparing it to the well-known linear correlation clustering algorithms ORCLUS, 4C, COPAC, LMCLUS, and CASH, receiving good results for difficult synthetic data sets containing crossing or non-continuous correlations.

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[17]
A. Beer and T. Seidl.
Graph Ordering and Clustering - A Circular Approach.
31st International Conference on Scientific and Statistical Database Management (SSDBM 2019). Santa Cruz, CA, USA, Jul 23-25, 2019. DOI.
Abstract

As the ordering of data, particularly of graphs, can heavily influence the result of diverse Data Mining tasks performed on it, we introduce the Circle-Index, the first internal quality measure for orderings of graphs. It is based on a circular arrangement of nodes but, in contrast to similar arrangements from the field of, e.g., visual analytics, takes the edge lengths in this arrangement into account. The minimization of the Circle-Index leads to an arrangement which not only offers a simple way to cluster the data using a constrained MinCut in only linear time, but is also visually convincing. We developed the clustering algorithm CirClu, which implements this minimization and MinCut, and compared it with several established clustering algorithms, achieving very good results. Simultaneously, we compared the Circle-Index with several internal quality measures for clusterings. We observed a strong coherence between the Circle-Index and the matching of the achieved clusterings to the respective ground truths in diverse real-world datasets.
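One plausible reading of the Circle-Index — the paper's exact normalization may differ — is the total chord length of all edges when the nodes are placed evenly on a unit circle in the given order. An ordering that keeps densely connected groups contiguous then scores lower than one that interleaves them:

```python
import numpy as np

def circle_index(order, edges):
    # Place nodes evenly on a unit circle in the given order and sum
    # the Euclidean (chord) lengths of all edges; lower is better.
    # Illustrative reading of the Circle-Index, not the paper's formula.
    n = len(order)
    pos = {v: i for i, v in enumerate(order)}
    angles = 2 * np.pi * np.arange(n) / n
    xy = np.column_stack([np.cos(angles), np.sin(angles)])
    return sum(np.linalg.norm(xy[pos[u]] - xy[pos[v]]) for u, v in edges)

# Two triangle cliques joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = circle_index([0, 1, 2, 3, 4, 5], edges)  # cliques kept contiguous
bad = circle_index([0, 3, 1, 4, 2, 5], edges)   # cliques interleaved
print(good, bad)
```

Minimizing this quantity pulls connected nodes next to each other on the circle, which is why a single cut of the resulting arrangement can recover the clusters.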

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[16]
D. Kazempour, K. Emmerig, P. Kröger and T. Seidl.
Detecting Global Periodic Correlated Clusters in Event Series based on Parameter Space Transform.
31st International Conference on Scientific and Statistical Database Management (SSDBM 2019). Santa Cruz, CA, USA, Jul 23-25, 2019. DOI.
Abstract

Periodicities are omnipresent: in nature in the cycles of predator and prey populations, in reoccurring patterns of our power consumption over the day, or in the presence of flu diseases over the year. Given the importance of periodicities, we ask: is there a way to detect periodic correlated clusters hidden in event series? As a work in progress, we propose a method for detecting sinusoidal periodic correlated clusters in event series that relies on parameter space transformation. Our contribution is the first non-linear correlation clustering algorithm for detecting periodic correlated clusters. Furthermore, our method provides an explicit model giving domain experts information on parameters such as amplitude, frequency, phase shift, and vertical shift of the detected clusters. Beyond that, we approach the issue of determining an adequate frequency and phase shift of the detected correlations given a frequency and phase-shift boundary.
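The explicit model the abstract refers to, with amplitude A, frequency f, phase shift φ, and vertical shift c, presumably takes the standard sinusoidal form (recalled here for orientation, not quoted from the paper):

```latex
y(t) = A \sin(2\pi f t + \varphi) + c
```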

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[15]
D. Kazempour and T. Seidl.
On systematic hyperparameter analysis through the example of subspace clustering.
31st International Conference on Scientific and Statistical Database Management (SSDBM 2019). Santa Cruz, CA, USA, Jul 23-25, 2019. DOI.
Abstract

In publications describing a clustering method, the chosen hyperparameters are, by our current observation, in many cases determined empirically. In this work in progress, we discuss and propose one approach for systematically exploring hyperparameters and analyzing their effects on the data set at hand. In the context of hyperparameter analysis, we further introduce a modified definition of the term resilience, which here refers to a subset of data points that persists in the same cluster across different hyperparameter settings. To analyze relations among different hyperparameters, we further introduce the concept of dynamic intersection computing.
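The resilience notion can be sketched as follows (an illustrative simplification, not the authors' definition): points whose entire sequence of cluster assignments agrees across all hyperparameter settings were never separated, so they form a resilient subset.

```python
def resilient_groups(labelings):
    """Find subsets of points that stay in one common cluster under
    every hyperparameter setting. `labelings` is a list of label
    lists, one per setting; label -1 marks noise."""
    n = len(labelings[0])
    groups = {}
    for i in range(n):
        # points sharing the whole label sequence were never separated
        key = tuple(labels[i] for labels in labelings)
        if -1 in key:  # noise under any setting breaks resilience
            continue
        groups.setdefault(key, set()).add(i)
    return [g for g in groups.values() if len(g) > 1]

# Two DBSCAN-like runs: points 0 and 1 stay together in both runs,
# point 4 is noise in the first run.
runs = [[0, 0, 1, 1, -1], [2, 2, 2, 3, 3]]
print(resilient_groups(runs))
```

Intersecting such groups across growing sets of settings is one way to read the "dynamic intersection computing" mentioned above.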

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[14]
A. Bojchevski and S. Günnemann.
Adversarial Attacks on Node Embeddings via Graph Poisoning.
36th International Conference on Machine Learning (ICML 2019). Long Beach, CA, USA, Jun 09-15, 2019. URL.
MCML Authors
Link to Stephan Günnemann

Stephan Günnemann

Prof. Dr.

Data Analytics & Machine Learning


[13]
Q. Au, D. Schalk, G. Casalicchio, R. Schoedel, C. Stachl and B. Bischl.
Component-Wise Boosting of Targets for Multi-Output Prediction.
Preprint at arXiv (Apr. 2019). arXiv.
Abstract

Multi-output prediction deals with the prediction of several targets of possibly diverse types. One way to address this problem is the so-called problem transformation method. This method is often used in multi-label learning, but can also be used for multi-output prediction due to its generality and simplicity. In this paper, we introduce an algorithm that uses the problem transformation method for multi-output prediction while simultaneously learning the dependencies between target variables in a sparse and interpretable manner. In a first step, predictions are obtained for each target individually. Target dependencies are then learned via a component-wise boosting approach. We compare our new method with similar approaches in a benchmark using multi-label, multivariate regression, and mixed-type datasets.
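The two-step scheme can be sketched in plain Python (an illustrative toy version, not the authors' implementation): step one yields per-target base predictions; step two boosts each target's residuals, where each round adds a shrunken simple linear model on the single other-target prediction that reduces the squared error most — this component-wise selection is what keeps the learned dependencies sparse.

```python
def lin_fit(x, y):
    """Least-squares intercept and slope of y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return my - b * mx, b

def boost_dependencies(base_preds, targets, rounds=50, lr=0.1):
    """Boost each target's residual component-wise, using the other
    targets' base predictions as the candidate components."""
    preds = {t: list(p) for t, p in base_preds.items()}
    for t, y in targets.items():
        for _ in range(rounds):
            resid = [yi - pi for yi, pi in zip(y, preds[t])]
            best = None
            for s in base_preds:  # candidate components
                if s == t:
                    continue
                a, b = lin_fit(base_preds[s], resid)
                fit = [a + b * x for x in base_preds[s]]
                sse = sum((r - f) ** 2 for r, f in zip(resid, fit))
                if best is None or sse < best[0]:
                    best = (sse, fit)
            preds[t] = [p + lr * f for p, f in zip(preds[t], best[1])]
    return preds

# Toy data: target "t2" depends linearly on "t1"; its own base model
# only predicts a constant, so boosting should recover the dependency.
y1 = [1.0, 2.0, 3.0, 4.0]
targets = {"t1": y1, "t2": [2 * v for v in y1]}
base = {"t1": y1, "t2": [5.0] * 4}
out = boost_dependencies(base, targets)
```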

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[12]
A. Beer, D. Kazempour and T. Seidl.
Rock - Let the points roam to their clusters themselves.
22nd International Conference on Extending Database Technology (EDBT 2019). Lisbon, Portugal, Mar 26-29, 2019. PDF.
Abstract

In this work we present Rock, a method where the points roam to their clusters using k-NN. Rock is a draft of an algorithm capable of detecting non-convex clusters of arbitrary dimension while delivering representatives for each cluster, similar to, e.g., Mean Shift or k-Means. Applying Rock, points roam to the mean of their k-NN while k is incremented in every step. In this way, outlying points and noise move to their nearest cluster, while the clusters themselves contract first to their skeletons and then to one representative point each. Our empirical results on synthetic and real data demonstrate that Rock is able to detect clusters on datasets where either mode-seeking or density-based approaches do not succeed.
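The roaming step can be sketched in a few lines (an illustrative toy version, not the authors' implementation): every point repeatedly moves to the mean of its k nearest neighbours while k grows, so each cluster contracts onto a single representative.

```python
def rock(points, steps, k0=2):
    """Each step, every point moves to the mean of its k nearest
    neighbours (itself included); k grows by one per step."""
    pts = [list(p) for p in points]
    k = k0
    for _ in range(steps):
        moved = []
        for p in pts:
            # indices of the k nearest points (the point itself is first)
            nn = sorted(range(len(pts)),
                        key=lambda j: sum((a - b) ** 2
                                          for a, b in zip(p, pts[j])))[:k]
            moved.append([sum(pts[j][d] for j in nn) / k
                          for d in range(len(p))])
        pts = moved
        k += 1
    return pts

# Two well-separated 2D blobs of five points each: after four steps
# (k = 2..5) every point sits on its blob's representative.
blob1 = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
blob2 = [[100, 100], [101, 100], [100, 101], [101, 101], [100.5, 100.5]]
reps = rock(blob1 + blob2, steps=4)
```

The O(n²) neighbour search here is only for clarity; a spatial index would be the natural choice on real data.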

MCML Authors
Link to Anna Beer

Anna Beer

Dr.

* Former member

Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[11]
D. Kazempour, L. Krombholz, P. Kröger and T. Seidl.
A Galaxy of Correlations - Detecting Linear Correlated Clusters through k-Tuples Sampling using Parameter Space Transform.
22nd International Conference on Extending Database Technology (EDBT 2019). Lisbon, Portugal, Mar 26-29, 2019. PDF.
Abstract

In different research domains, experiments aim at detecting (hyper)linear correlations among multiple features within a given data set. Among the existing methods for this purpose, one is highly robust against noise and detects linear correlated clusters regardless of any locality assumption. This method is based on parameter space transformation. The currently available parameter-transform-based algorithms detect the clusters by explicitly scanning for intersections of functions in parameter space. This approach comes with drawbacks: it is difficult to analyze aspects going beyond the sole intersection of functions, such as the area around the intersections, and it is computationally expensive. The work-in-progress method we provide here overcomes these drawbacks by sampling d-dimensional tuples in data space, generating a (hyper)plane, and representing this plane as a single point in parameter space. With this approach, we no longer scan for intersection points of functions in parameter space but for dense regions of such parameter vectors. In future work, well-established clustering algorithms can thus be applied in parameter space to detect, e.g., dense regions, modes, or hierarchies of linear correlations.
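The sampling idea can be sketched for the two-dimensional case (an illustrative simplification of the approach): each sampled pair of points defines a line, which is represented as a single (slope, intercept) point in parameter space; dense regions among these parameter vectors then indicate linear correlations in data space.

```python
import random

def parameter_space_points(data, samples=200, seed=1):
    """Turn sampled point pairs into (slope, intercept) parameter
    vectors; dense regions among these vectors indicate linear
    correlations in data space."""
    rng = random.Random(seed)
    params = []
    for _ in range(samples):
        (x1, y1), (x2, y2) = rng.sample(data, 2)
        if x1 == x2:
            continue  # this simple sketch ignores vertical lines
        slope = (y2 - y1) / (x2 - x1)
        params.append((slope, y1 - slope * x1))
    return params

# Points lying exactly on y = 2x + 1: every sampled pair maps to the
# same parameter vector (2, 1), i.e. one maximally dense region.
line = [(float(x), 2.0 * x + 1.0) for x in range(10)]
vecs = parameter_space_points(line)
```

Any density-based method (e.g., DBSCAN) could then be run on the returned vectors, which is exactly the follow-up the abstract envisions.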

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[10]
D. Kazempour and T. Seidl.
Insights into a running clockwork: On interactive process-aware clustering.
22nd International Conference on Extending Database Technology (EDBT 2019). Lisbon, Portugal, Mar 26-29, 2019. PDF.
Abstract

In recent years, the demand for algorithms that provide not only their results but also a certain degree of explainability has increased. In this paper we envision a class of clustering algorithms where users can interact not only with the input or output but also intervene in the clustering process itself, which we coin with the term process-aware clustering. We further aspire to sketch the challenges emerging with this type of algorithm, such as the need for adequate measures that evaluate the progression through the computation process of a clustering method. Beyond explainability of how the results are generated, we propose methods aimed at systematically analyzing the hyperparameter space of an algorithm, determining suitable hyperparameters in an ordered fashion rather than applying a trial-and-error scheme.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[9]
D. Kazempour, M. Kazakov, P. Kröger and T. Seidl.
DICE: Density-based Interactive Clustering and Exploration.
18th Symposium of Database Systems for Business, Technology and Web (BTW 2019). Rostock, Germany, Mar 04-08, 2019. DOI.
Abstract

Clustering algorithms mostly follow the pipeline of providing input data and hyperparameter values; the algorithms are then executed, and the output files are generated or visualized. In our work we provide an early prototype of an interactive density-based clustering tool named DICE, in which users can change the hyperparameter settings and immediately observe the resulting clusters. Users can further browse through each of the detected clusters and obtain statistics as well as a convex hull profile for each cluster. DICE also keeps track of the chosen settings, enabling users to review which hyperparameter values have been chosen previously. DICE can be used not only in the scientific context of analyzing data, but also in didactic settings where students can learn in an exploratory fashion how a density-based clustering algorithm such as DBSCAN behaves.

MCML Authors
Link to Daniyal Kazempour

Daniyal Kazempour

Dr.

* Former member

Link to Peer Kröger

Peer Kröger

Prof. Dr.

* Former member

Link to Thomas Seidl

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining


[8]
P. Probst, A.-L. Boulesteix and B. Bischl.
Tunability: Importance of Hyperparameters of Machine Learning Algorithms.
Journal of Machine Learning Research 20 (Mar. 2019). PDF.
MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

Link to Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science


[7]
C. Happ, F. Scheipl, A. A. Gabriel and S. Greven.
A general framework for multivariate functional principal component analysis of amplitude and phase variation.
Stat 8.2 (Feb. 2019). DOI.
Abstract

Functional data typically contain amplitude and phase variation. In many data situations, phase variation is treated as a nuisance effect and is removed during preprocessing, although it may contain valuable information. In this note, we focus on joint principal component analysis (PCA) of amplitude and phase variation. As the space of warping functions has a complex geometric structure, one key element of the analysis is transforming the warping functions to L². We present different transformation approaches and show how they fit into a general class of transformations. This allows us to compare their strengths and limitations. In the context of PCA, our results offer arguments in favour of the centred log-ratio transformation. We further embed two existing approaches from the literature for joint PCA of amplitude and phase variation into the framework of multivariate functional PCA, where we study the properties of the estimators based on an appropriate metric. The approach is illustrated through an application from seismology.
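The centred log-ratio transformation the note argues for can be written, in its standard form for a positive function f on a domain T (recalled here from the compositional-data literature, not quoted from the paper):

```latex
\operatorname{clr}(f)(t) = \log f(t) - \frac{1}{|T|} \int_T \log f(s)\, ds
```

The subtraction of the mean log value maps f into a linear space of functions integrating to zero, on which standard (multivariate functional) PCA applies.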

MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[6]
M. Binder, S. Dandl and J. Moosbauer.
mosmafs: Multi-Objective Simultaneous Model and Feature Selection. R package.
2019. GitHub.
MCML Authors

[5]
J. Goldsmith, F. Scheipl, L. Huang, J. Wrobel, C. Di, J. Gellar, J. Harezlak, M. W. McLean, B. Swihart, L. Xiao, C. Crainiceanu and P. T. Reiss.
refund: Regression with Functional Data.
2019. URL.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis


[4]
G. Casalicchio.
On benchmark experiments and visualization methods for the evaluation and interpretation of machine learning models.
Dissertation 2019. DOI.
Abstract

This cumulative dissertation consists of five articles divided into three parts. The first part extends the mlr package in R to implement and benchmark multilabel classification methods. The second part focuses on simplifying benchmark experiments with OpenML.org, introducing the OpenML R package and the OpenML100 benchmarking suite for standardized dataset and result management. The third part addresses model evaluation and interpretability, proposing the residual-based predictiveness (RBP) curve to improve upon the predictiveness curve and introducing new visualization tools, including the Shapley feature importance (SFIMP) measure for model interpretation. (Shortened.)

MCML Authors
Link to Giuseppe Casalicchio

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science


[3]
J. Thomas.
Gradient boosting in automatic machine learning: feature selection and hyperparameter optimization.
Dissertation 2019. DOI.
Abstract

This thesis focuses on automating model selection in AutoML, specifically through gradient boosting techniques like gradient tree and component-wise boosting. It addresses challenges in hyperparameter optimization using Bayesian methods, introduces a new feature selection technique, and proposes an AutoML approach that simplifies the process while maintaining accuracy. Four R packages were developed: mlrMBO for Bayesian optimization, autoxgboost for AutoML, compboost for component-wise boosting, and gamboostLSS for generalized additive models. (Shortened.)

MCML Authors

[2]
P. Probst, M. Wright and A.-L. Boulesteix.
Hyperparameters and Tuning Strategies for Random Forest.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9.3 (Jan. 2019). DOI.
Abstract

The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after presenting a brief overview of tuning strategies, we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.
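As a minimal illustration of the tuning task (a generic random-search stand-in, far simpler than the model-based optimization the paper applies), one can draw hyperparameter settings from a search space and keep the best-scoring one; the space below mirrors the RF hyperparameters listed in the abstract, with hypothetical placeholder names and values:

```python
import random

def random_search(objective, space, n_iter=30, seed=0):
    """Draw random settings from `space` and keep the best score."""
    rng = random.Random(seed)
    best_score, best_cfg = None, None
    for _ in range(n_iter):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if best_score is None or score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

# Hypothetical RF-style search space (names and values are placeholders,
# not recommendations from the paper):
space = {"num_trees": [100, 250, 500],
         "mtry_frac": [0.2, 0.33, 0.5],
         "min_node_size": [1, 5, 10],
         "replace": [True, False]}
```

In practice, `objective` would run a cross-validated random forest for the given configuration; MBO replaces the blind draws with a surrogate model that proposes promising settings, which is what makes it sample-efficient.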

MCML Authors
Link to Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine


2018


[1]
J. Minkwitz, F. Scheipl, E. Binder, C. Sander, U. Hegerl and H. Himmerich.
Generalised functional additive models for brain arousal state dynamics (Poster).
20th International Pharmaco-EEG Society for Preclinical and Clinical Electrophysiological Brain Research Meeting (IPEG 2018). Zurich, Switzerland, Nov 21-25, 2018. DOI.
MCML Authors
Link to Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis