10.07.2025

MCML Researchers With 24 Papers at ICML 2025
42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, 13.07.2025–19.07.2025
We are happy to announce that MCML researchers are represented with 24 papers at ICML 2025. Congrats to our researchers!
Main Track (19 papers)
Weakly Supervised Anomaly Detection via Dual-Tailed Kernel.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL
Abstract
Detecting anomalies with limited supervision is challenging due to the scarcity of labeled anomalies, which often fail to capture the diversity of abnormal behaviors. We propose Weakly Supervised Anomaly Detection via Dual-Tailed Kernel (WSAD-DT), a novel framework that learns robust latent representations to distinctly separate anomalies from normal samples under weak supervision. WSAD-DT introduces two centroids—one for normal samples and one for anomalies—and leverages a dual-tailed kernel scheme: a light-tailed kernel to compactly model in-class points and a heavy-tailed kernel to maintain a wider margin against out-of-class instances. To preserve intra-class diversity, WSAD-DT incorporates kernel-based regularization, encouraging richer representations within each class. Furthermore, we devise an ensemble strategy that partitions unlabeled data into diverse subsets, while sharing the limited labeled anomalies among these partitions to maximize their impact. Empirically, WSAD-DT achieves state-of-the-art performance on several challenging anomaly detection benchmarks, outperforming leading ensemble-based methods such as XGBOD.
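To make the dual-tailed scheme concrete, here is a minimal sketch of the scoring idea, assuming a Gaussian kernel as the light-tailed choice and a Cauchy kernel as the heavy-tailed one; the function and variable names are hypothetical and not taken from the paper's code.

```python
import numpy as np

def gaussian_kernel(d2, gamma=1.0):
    # Light-tailed: similarity decays fast, pulling in-class points tightly to their centroid.
    return np.exp(-gamma * d2)

def cauchy_kernel(d2, gamma=1.0):
    # Heavy-tailed: similarity decays slowly, keeping pressure on far out-of-class points.
    return 1.0 / (1.0 + gamma * d2)

def dual_tailed_loss(z, y, c_normal, c_anom):
    """z: (n, d) latent codes; y: (n,) labels (0 normal, 1 anomaly); centroids (d,)."""
    d_norm = ((z - c_normal) ** 2).sum(axis=1)   # squared distance to the normal centroid
    d_anom = ((z - c_anom) ** 2).sum(axis=1)     # squared distance to the anomaly centroid
    in_class = np.where(y == 0, d_norm, d_anom)
    out_class = np.where(y == 0, d_anom, d_norm)
    # Maximize light-tailed similarity to the own centroid; minimize heavy-tailed
    # similarity to the opposite centroid, i.e., maintain a wide margin.
    return (-np.log(gaussian_kernel(in_class) + 1e-12)
            + cauchy_kernel(out_class)).mean()
```

In practice the centroids would be learned jointly with the encoder producing z; here they are simply passed in.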
MCML Authors
A New Approach to Backtracking Counterfactual Explanations: A Unified Causal Framework for Efficient Model Interpretability.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv
Abstract
Counterfactual explanations enhance interpretability by identifying alternative inputs that produce different outputs, offering localized insights into model decisions. However, traditional methods often neglect causal relationships, leading to unrealistic examples. While newer approaches integrate causality, they are computationally expensive. To address these challenges, we propose an efficient method called BRACE based on backtracking counterfactuals that incorporates causal reasoning to generate actionable explanations. We first examine the limitations of existing methods and then introduce our novel approach and its features. We also explore the relationship between our method and previous techniques, demonstrating that it generalizes them in specific scenarios. Finally, experiments show that our method provides deeper insights into model outputs.
MCML Authors
Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL
Abstract
Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. However, traditional methods based on pairwise trajectory comparisons face notable challenges, including the difficulty in comparing trajectories with subtle differences and the limitation of conveying only ordinal information, limiting direct inference of preference strength. In this paper, we introduce a novel distinguishability query, allowing humans to express preference strength by comparing two pairs of trajectories. Labelers first indicate which pair is easier to compare, then provide preference feedback only on the easier pair. Our proposed query type directly captures preference strength and is expected to reduce the cognitive load on the labeler. We further connect this query to cardinal utility and difference relations and develop an efficient query selection scheme to achieve a better trade-off between query informativeness and ease of answering. Experimental results demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness in RLHF benchmarks, particularly in classical control settings where preference strength is critical for expected utility maximization.
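The two-step query protocol can be illustrated with a simulated labeler. The sketch below assumes a noisy latent-utility model of the human; all names are hypothetical.

```python
import numpy as np

def distinguishability_query(pair_a, pair_b, utility, noise=0.1):
    """Simulate one distinguishability query.

    pair_a, pair_b: tuples (traj_i, traj_j) of candidate trajectories.
    utility: maps a trajectory to a scalar (the labeler's latent utility).
    """
    gap_a = abs(utility(pair_a[0]) - utility(pair_a[1]))
    gap_b = abs(utility(pair_b[0]) - utility(pair_b[1]))
    # Step 1: the pair with the larger utility gap is assumed easier to compare.
    easier = pair_a if gap_a >= gap_b else pair_b
    # Step 2: preference feedback is given only on the easier pair
    # (a noisy comparison of latent utilities).
    u0 = utility(easier[0]) + np.random.normal(0, noise)
    u1 = utility(easier[1]) + np.random.normal(0, noise)
    preferred = easier[0] if u0 > u1 else easier[1]
    return easier, preferred
```

Because the labeler reveals which pair has the larger gap, each answer carries cardinal information beyond a single ordinal comparison.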
MCML Authors
Artificial Intelligence and Machine Learning
The Value of Prediction in Identifying the Worst-Off.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. Spotlight Presentation. Outstanding Paper Award. To be published. Preprint available. arXiv
Abstract
Machine learning is increasingly used in government programs to identify and support the most vulnerable individuals, prioritizing assistance for those at greatest risk over optimizing aggregate outcomes. This paper examines the welfare impacts of prediction in equity-driven contexts, and how they compare to other policy levers, such as expanding bureaucratic capacity. Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems.
MCML Authors
Social Data Science and AI Lab
Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv
Abstract
We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, encouraged by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated in an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of using a small random initialization.
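For readers unfamiliar with the tubal setting, the sketch below implements the t-product via an FFT along the third mode (the standard construction) and runs plain gradient descent from a small random initialization; the problem sizes, step size, and initialization scale are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def t_prod(A, B):
    # Tubal (t-)product: FFT along the third mode, facewise matrix products, inverse FFT.
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Af, Bf), axis=2))

rng = np.random.default_rng(0)
n, r_true, p, n3 = 20, 2, 20, 8
T = t_prod(rng.normal(size=(n, r_true, n3)), rng.normal(size=(r_true, n, n3)))

alpha, lr = 1e-3, 1e-3                      # small random initialization, modest step size
X = alpha * rng.normal(size=(n, p, n3))
Y = alpha * rng.normal(size=(p, n, n3))

for _ in range(5000):                       # gradient descent on 0.5 * ||X * Y - T||_F^2
    Xf, Yf, Tf = (np.fft.fft(Z, axis=2) for Z in (X, Y, T))
    Rf = np.einsum('ijk,jlk->ilk', Xf, Yf) - Tf          # residual, per Fourier slice
    X = X - lr * np.real(np.fft.ifft(np.einsum('ilk,jlk->ijk', Rf, Yf.conj()), axis=2))
    Y = Y - lr * np.real(np.fft.ifft(np.einsum('ijk,ilk->jlk', Xf.conj(), Rf), axis=2))

# Effective tubal rank: per frontal slice in the Fourier domain, the singular values
# should concentrate on ~r_true directions if the implicit low-rank bias holds.
sv = np.linalg.svd(np.fft.fft(t_prod(X, Y), axis=2).transpose(2, 0, 1), compute_uv=False)
print((sv > 1e-2 * sv.max()).sum(axis=1))
```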
MCML Authors
Applied Numerical Analysis
Adaptive Alignment: Designing AI for a Changing World - Frauke Kreuter.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. Invited Talk. URL
Abstract
As artificial intelligence systems become deeply embedded in our institutions, economies, and personal lives, the challenge of alignment—ensuring AI acts in accordance with human values and societal norms—has become both urgent and complex. But what exactly should these systems be aligned to—and how do we know we’re getting it right? To address this, we turn to a long-standing body of work: how societies have historically measured public preferences and moral norms—and what often goes wrong in the process. The talk will introduce underutilized datasets—from decades of survey archives to international value studies—that could serve as empirical benchmarks for aligning AI systems with lived human norms. In addition to highlighting valuable data sources, we will examine how lessons from social science can inform the design of human feedback loops in AI. These insights help avoid common pitfalls in capturing human intentions and preferences—such as measurement error, framing effects, and unrepresentative sampling—that have plagued opinion research for decades. We’ll close by addressing the fluid and evolving nature of societal norms, emphasizing the need for alignment strategies that are adaptive to cultural and temporal change. Achieving this kind of adaptability requires not just better data, but durable collaborations between social scientists and machine learning researchers—so that updates to human values can be continuously reflected in system design. The goal is to provoke a deeper, interdisciplinary conversation about what it truly means to align AI with human values—and how to do so responsibly, reliably, and at scale.
MCML Authors
Joint Localization and Activation Editing for Low-Resource Fine-Tuning.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv
Abstract
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, are commonly used to adapt LLMs. However, the effectiveness of standard PEFT methods is limited in low-resource scenarios with only a few hundred examples. Recent advances in interpretability research have inspired the emergence of activation editing techniques, which modify the activations of specific model components. These methods, due to their extremely small parameter counts, show promise for small datasets. However, their performance is highly dependent on identifying the correct modules to edit and often lacks stability across different datasets. In this paper, we propose Joint Localization and Activation Editing (JoLA), a method that jointly learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves: the vectors applied as additive offsets or multiplicative scalings to the head output. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods.
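A stripped-down version of the intervention might look as follows; this sketch uses plain sigmoid gates in place of the learned gating one would use in practice, and the module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class HeadIntervention(nn.Module):
    """Gated additive/multiplicative edit of attention-head outputs.

    Per-head gates decide *which* heads to edit and *how* (additive,
    multiplicative, or both); the edit parameters are learned vectors.
    At initialization the module is the identity.
    """
    def __init__(self, n_heads, head_dim):
        super().__init__()
        self.offset = nn.Parameter(torch.zeros(n_heads, head_dim))  # additive vectors
        self.scale = nn.Parameter(torch.zeros(n_heads, head_dim))   # multiplicative deltas
        self.gate_add = nn.Parameter(torch.zeros(n_heads))          # per-head gate logits
        self.gate_mul = nn.Parameter(torch.zeros(n_heads))

    def forward(self, head_out):  # head_out: (batch, seq, n_heads, head_dim)
        g_a = torch.sigmoid(self.gate_add)[None, None, :, None]
        g_m = torch.sigmoid(self.gate_mul)[None, None, :, None]
        scaled = head_out * (1 + g_m * self.scale)   # multiplicative edit where gated on
        return scaled + g_a * self.offset            # additive edit where gated on
```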
MCML Authors
NoLiMa: Long-Context Evaluation Beyond Literal Matching.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL
Abstract
Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a ‘needle’ (relevant information) from a ‘haystack’ (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning. However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%. Our analysis suggests these declines stem from the increased difficulty the attention mechanism faces in longer contexts when literal matches are absent, making it harder to retrieve relevant information.
MCML Authors
Position: The Future of Bayesian Prediction Is Prior-Fitted.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. Preprint. arXiv
Abstract
Training neural networks on randomly generated artificial datasets yields Bayesian models that capture the prior defined by the dataset-generating distribution. Prior-data Fitted Networks (PFNs) are a class of methods designed to leverage this insight. In an era of rapidly increasing computational resources for pre-training and a near stagnation in the generation of new real-world data in many applications, PFNs are poised to play a more important role across a wide range of applications. They enable the efficient allocation of pre-training compute to low-data scenarios. Originally applied to small Bayesian modeling tasks, the field of PFNs has significantly expanded to address more complex domains and larger datasets. This position paper argues that PFNs and other amortized inference approaches represent the future of Bayesian inference, leveraging amortized learning to tackle data-scarce problems. We thus believe they are a fruitful area of research. In this position paper, we explore their potential and directions to address their current limitations.
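The PFN training principle (fit a network to datasets sampled from a prior, then predict in context) fits in a few lines. In this sketch a tiny permutation-invariant encoder stands in for the transformer used by actual PFNs, the linear-regression prior is purely illustrative, and the MSE objective is a point-prediction surrogate for the full predictive distribution that real PFNs output.

```python
import torch
import torch.nn as nn

def sample_task(n_ctx=50, n_query=10, d=4):
    # Draw one synthetic dataset from the prior (random linear functions plus noise).
    w = torch.randn(d)
    X = torch.randn(n_ctx + n_query, d)
    y = X @ w + 0.1 * torch.randn(n_ctx + n_query)
    return X[:n_ctx], y[:n_ctx], X[n_ctx:], y[n_ctx:]

# A tiny set-based amortizer standing in for the transformer of real PFNs.
enc = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64))
head = nn.Sequential(nn.Linear(64 + 4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(enc.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(2000):
    Xc, yc, Xq, yq = sample_task()
    ctx = enc(torch.cat([Xc, yc[:, None]], dim=1)).mean(0)        # permutation-invariant summary
    pred = head(torch.cat([Xq, ctx.expand(len(Xq), -1)], dim=1))  # predict queries from context
    loss = ((pred.squeeze(-1) - yq) ** 2).mean()                  # amortized predictive fit
    opt.zero_grad(); loss.backward(); opt.step()
```

All "training data" is generated on the fly from the prior; at deployment the fitted network makes Bayesian predictions on a new dataset in a single forward pass.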
MCML Authors
Statistics, Data Science and Machine Learning
Time to Spike? Understanding the Representational Power of Spiking Neural Networks in Discrete Time.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL
Abstract
Recent years have seen significant progress in developing spiking neural networks (SNNs) as a potential solution to the energy challenges posed by conventional artificial neural networks (ANNs). However, our theoretical understanding of SNNs remains relatively limited compared to the ever-growing body of literature on ANNs. In this paper, we study a discrete-time model of SNNs based on leaky integrate-and-fire (LIF) neurons, referred to as discrete-time LIF-SNNs, a widely used framework that still lacks solid theoretical foundations. We demonstrate that discrete-time LIF-SNNs with static inputs and outputs realize piecewise constant functions defined on polyhedral regions, and more importantly, we quantify the network size required to approximate continuous functions. Moreover, we investigate the impact of latency (number of time steps) and depth (number of layers) on the complexity of the input space partitioning induced by discrete-time LIF-SNNs. Our analysis highlights the importance of latency and contrasts these networks with ANNs employing piecewise linear activation functions. Finally, we present numerical experiments to support our theoretical findings.
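The discrete-time LIF dynamics studied here follow the standard update: leak, integrate, fire, reset. A minimal sketch, with illustrative parameter values and a subtraction-based soft reset as one common convention:

```python
import numpy as np

def lif_layer(spikes_in, W, beta=0.9, threshold=1.0, T=16):
    """Discrete-time leaky integrate-and-fire layer.

    spikes_in: (T, n_in) binary input spike trains; W: (n_out, n_in) weights.
    At each step the membrane potential leaks by factor beta, integrates the
    weighted input spikes, and emits a spike on crossing the threshold.
    """
    n_out = W.shape[0]
    v = np.zeros(n_out)                        # membrane potentials
    spikes_out = np.zeros((T, n_out))
    for t in range(T):
        v = beta * v + W @ spikes_in[t]        # leak + integrate
        fired = v >= threshold
        spikes_out[t] = fired
        v = np.where(fired, v - threshold, v)  # soft reset by subtraction
    return spikes_out
```

With static inputs repeated over T steps, stacking such layers yields exactly the piecewise constant input-output maps whose complexity the paper quantifies in terms of latency T and depth.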
MCML Authors
Mathematical Foundations of Artificial Intelligence
Mathematical Foundations of Artificial Intelligence
Revisiting Unbiased Implicit Variational Inference.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL
Abstract
Recent years have witnessed growing interest in semi-implicit variational inference (SIVI) methods due to their ability to rapidly generate samples from highly complicated distributions. However, since the likelihood of these samples is non-trivial to estimate in high dimensions, current research focuses on finding effective SIVI training routines. While unbiased implicit variational inference (UIVI) has largely been dismissed as imprecise and computationally prohibitive because of its inner MCMC loop, we revisit this method and identify key shortcomings. In particular, we show that UIVI’s MCMC loop can be effectively replaced via importance sampling and that the optimal proposal distribution can be learned stably by minimizing an expected forward Kullback–Leibler divergence without bias. Our refined approach demonstrates performance on par with or superior to state-of-the-art methods on established SIVI benchmarks.
MCML Authors
Statistical Learning and Data Science
Statistical Learning and Data Science
Statistics, Data Science and Machine Learning
Can Transformers Learn Full Bayesian Inference in Context?
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv URL
Abstract
Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context – without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.
MCML Authors
Statistics, Data Science and Machine Learning
Adjustment for Confounding using Pre-Trained Representations.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL
Abstract
There is growing interest in extending average treatment effect (ATE) estimation to incorporate non-tabular data, such as images and text, which may act as sources of confounding. Neglecting these effects risks biased results and flawed scientific conclusions. However, incorporating non-tabular data necessitates sophisticated feature extractors, often in combination with ideas of transfer learning. In this work, we investigate how latent features from pre-trained neural networks can be leveraged to adjust for sources of confounding. We formalize conditions under which these latent features enable valid adjustment and statistical inference in ATE estimation, illustrating our results with the example of double machine learning. In this context, we also discuss critical challenges inherent to latent feature learning and the downstream parameter estimation that builds on it. As our results are agnostic to the considered data modality, they represent an important first step towards a theoretical foundation for the use of latent representations from foundation models in ATE estimation.
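As one concrete instantiation, latent features can be plugged into a standard partialling-out double machine learning estimator. The sketch below assumes frozen embeddings Z from a pre-trained network and generic nuisance learners; it illustrates the general recipe rather than the authors' exact estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def dml_ate(Z, a, y):
    """Double ML for the ATE with pre-trained representations as confounder proxies.

    Z: (n, d) frozen embeddings of the non-tabular confounder (e.g., image/text);
    a: (n,) binary treatment; y: (n,) outcome. Partialling-out estimator.
    """
    # Cross-fitted nuisance estimates E[y|Z] and E[a|Z] from the latent features.
    y_hat = cross_val_predict(RandomForestRegressor(), Z, y, cv=5)
    a_hat = cross_val_predict(RandomForestRegressor(), Z, a, cv=5)
    ry, ra = y - y_hat, a - a_hat
    # Final stage: residual-on-residual regression yields the effect estimate.
    return (ra * ry).sum() / (ra * ra).sum()
```

The paper's contribution is precisely the conditions under which plugging such learned Z into this pipeline yields valid adjustment and inference.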
MCML Authors
Statistics, Data Science and Machine Learning
Statistics, Data Science and Machine Learning
Computational Statistics & Data Science
Learning Representations of Instruments for Partial Identification of Treatment Effects.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv
Abstract
Reliable estimation of treatment effects from observational data is important in many disciplines such as medicine. However, estimation is challenging when unconfoundedness as a standard assumption in the causal inference literature is violated. In this work, we leverage arbitrary (potentially high-dimensional) instruments to estimate bounds on the conditional average treatment effect (CATE). Our contributions are three-fold: (1) We propose a novel approach for partial identification through a mapping of instruments to a discrete representation space so that we yield valid bounds on the CATE. This is crucial for reliable decision-making in real-world applications. (2) We derive a two-step procedure that learns tight bounds using a tailored neural partitioning of the latent instrument space. As a result, we avoid instability issues due to numerical approximations or adversarial training. Furthermore, our procedure aims to reduce the estimation variance in finite-sample settings to yield more reliable estimates. (3) We show theoretically that our procedure obtains valid bounds while reducing estimation variance. We further perform extensive experiments to demonstrate the effectiveness across various settings. Overall, our procedure offers a novel path for practitioners to make use of potentially high-dimensional instruments (e.g., as in Mendelian randomization).
MCML Authors
Artificial Intelligence in Management
Artificial Intelligence in Management
Artificial Intelligence in Management
Artificial Intelligence in Management
Artificial Intelligence in Management
X-Hacking: The Threat of Misguided AutoML.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. URL
Abstract
Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how easily an automated machine learning pipeline can be adapted to exploit model multiplicity at scale: searching a set of ‘defensible’ models with similar predictive performance to find a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for features susceptible to it, versus random sampling. We show that the vulnerability of a dataset to X-hacking can be determined by the information redundancy among its features. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI.
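The attack surface is easy to demonstrate. The sketch below searches model multiplicity for a desired attribution, substituting random search for Bayesian optimisation and impurity-based feature importances for SHAP values; it is meant to illustrate the threat, not to reproduce the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def x_hack(X, y, target_feature, n_models=50, tol=0.01, seed=0):
    """Search near-equally accurate models for the largest attribution
    on target_feature: the manipulation the paper warns about."""
    rng = np.random.default_rng(seed)
    best, best_attr, base_acc = None, -np.inf, None
    for _ in range(n_models):
        model = GradientBoostingClassifier(
            max_depth=int(rng.integers(2, 6)),
            n_estimators=int(rng.integers(50, 300)),
            subsample=float(rng.uniform(0.5, 1.0)),
            random_state=int(rng.integers(10**6)),
        )
        acc = cross_val_score(model, X, y, cv=3).mean()
        if base_acc is None:
            base_acc = acc
        if acc < base_acc - tol:
            continue  # keep only 'defensible' models with similar performance
        model.fit(X, y)
        attr = model.feature_importances_[target_feature]
        if attr > best_attr:
            best, best_attr = model, attr
    return best, best_attr
```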
MCML Authors
Learning with Exact Invariances in Polynomial Time.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv
Abstract
We study the statistical-computational trade-offs for learning with exact invariances (or symmetries) using kernel regression. Traditional methods, such as data augmentation, group averaging, canonicalization, and frame-averaging, either fail to provide a polynomial-time solution or are not applicable in the kernel setting. However, with oracle access to the geometric properties of the input space, we propose a polynomial-time algorithm that learns a classifier with exact invariances. Moreover, our approach achieves the same excess population risk (or generalization error) as the original kernel regression problem. To the best of our knowledge, this is the first polynomial-time algorithm to achieve exact (not approximate) invariances in this context. Our proof leverages tools from differential geometry, spectral theory, and optimization. A key result in our development is a new reformulation of the problem of learning under invariances as optimizing an infinite number of linearly constrained convex quadratic programs, which may be of independent interest.
MCML Authors
WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. Preprint. arXiv
Abstract
Keeping large language models factually up-to-date is crucial for deployment, yet costly retraining remains a challenge. Knowledge editing offers a promising alternative, but methods are only tested on small-scale or synthetic edit benchmarks. In this work, we aim to bridge research into lifelong knowledge editing to real-world edits at practically relevant scale. We first introduce WikiBigEdit, a large-scale benchmark of real-world Wikidata edits, built to automatically extend over time for future-proof benchmarking. In its first instance, it includes over 500K question-answer pairs for knowledge editing alongside a comprehensive evaluation pipeline. Finally, we use WikiBigEdit to study the ability of existing knowledge editing techniques to incorporate large volumes of real-world facts, and contrast their capabilities with generic modification techniques such as retrieval augmentation and continual finetuning to acquire a complete picture of the practical extent of current lifelong knowledge editing.
MCML Authors
Does Data Scaling Lead to Visual Compositional Generalization?
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL GitHub
Abstract
Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will improve out-of-distribution performance, including compositional generalization. We test this premise through controlled experiments that systematically vary data scale, concept diversity, and combination coverage. We find that compositional generalization is driven by data diversity, not mere data scale. Increased combinatorial coverage forces models to discover a linearly factored representational structure, where concepts decompose into additive components. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations. Evaluating pretrained models (DINO, CLIP), we find above-random yet imperfect performance, suggesting partial presence of this structure. Our work motivates stronger emphasis on constructing diverse datasets for compositional generalization, and considering the importance of representational structure that enables efficient compositional learning.
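The linearly factored structure can be probed directly: regress combination embeddings on concept indicators and measure how much variance an additive model explains. A minimal sketch, with the R²-style score as an illustrative metric:

```python
import numpy as np

def additivity_score(emb, concepts):
    """Test whether embeddings of concept combinations decompose additively.

    emb: (n, d) embeddings of inputs; concepts: (n, k) multi-hot codes of which
    concept values (e.g., shape, color) each input combines. Fits the best
    additive model emb ~ concepts @ C and reports variance explained.
    """
    C, *_ = np.linalg.lstsq(concepts, emb, rcond=None)  # per-concept additive components
    resid = emb - concepts @ C
    return 1 - (resid ** 2).sum() / ((emb - emb.mean(0)) ** 2).sum()
```

A score near 1 indicates the linearly factored representation the paper identifies as the driver of compositional generalization; the above-random but imperfect scores for pretrained models suggest the structure is only partially present.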
MCML Authors
Algorithmic Machine Learning & Explainable AI
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL GitHub
Abstract
While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially arithmetic. One fundamental limitation is the nature of the Cross Entropy loss, which assumes a nominal scale and thus cannot convey proximity between generated number tokens. In response, we here present a regression-like loss that operates purely at the token level. Our proposed Number Token Loss (NTL) comes in two flavors and minimizes either the norm or the Wasserstein distance between the numerical values of the real and predicted number tokens. NTL can easily be added to any language model, extending the Cross Entropy objective during training without runtime overhead. We evaluate the proposed scheme on various mathematical datasets and find that it consistently improves performance in math-related tasks. In a direct comparison on a regression task, we find that NTL can match the performance of a regression head, despite operating at the token level. Finally, we scale NTL up to 3B parameter models and observe improved performance, demonstrating its potential for seamless integration into LLMs. We hope that this work can inspire LLM developers to improve their pretraining objectives.
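One flavor of such a loss can be sketched as follows: compare the expected numeric value under the model's distribution over number tokens with the true value. The tensor layout and the choice of an L1 penalty here are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, targets, token_values, number_mask):
    """Regression-like loss on number tokens.

    logits: (batch, seq, vocab); targets: (batch, seq) token ids;
    token_values: (vocab,) numeric value per token, NaN for non-number tokens;
    number_mask: (batch, seq) True where the target token is a number.
    """
    probs = F.softmax(logits, dim=-1)
    is_num = ~torch.isnan(token_values)               # which vocab entries are numbers
    vals = torch.nan_to_num(token_values, nan=0.0)
    p_num = probs * is_num                            # probability mass on number tokens only
    expected = (p_num * vals).sum(-1) / p_num.sum(-1).clamp_min(1e-8)
    true_vals = torch.nan_to_num(token_values[targets], nan=0.0)
    err = (expected - true_vals).abs()                # distance between numeric values
    err = torch.where(number_mask, err, torch.zeros_like(err))
    return err.sum() / number_mask.sum().clamp_min(1) # mean over number positions
```

Because the loss is differentiable in the logits, it can simply be added to the usual Cross Entropy term during training.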
MCML Authors
Ethics in Systems Design and Machine Learning
Workshops (5 papers)
Symmetries in Weight Space Learning: To Retain or Remove?
HiLD @ICML 2025 - Workshop on High-dimensional Learning Dynamics at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL
Abstract
Weight space learning, an emerging paradigm that seeks to understand neural networks through their space of parameters (weights), has shown promise in a variety of applications, including but not limited to predicting model behavior and addressing privacy concerns. However, weight spaces often exhibit inherent symmetries that impact both theory and practice, such as the scale and rotational invariances found in the Low-Rank Adaptation (LoRA) method, which is the state-of-the-art fine-tuning algorithm for Large Language Models (LLMs). In this work, we investigate a general weight space learning problem under symmetries, focusing on a fundamental question: What is the appropriate formulation for this problem in the presence of symmetries (such as those in LoRA), and should redundant representations that encode the same end-to-end function be removed? We address this question by fully characterizing a new space of symmetric weights, demonstrating that the relevance of redundancy depends on the function being predicted. Specifically, we show that end-to-end symmetries (such as those in LoRA) should not always be removed, as doing so may compromise the universality of the weight space learning problem. To our knowledge, this is the first time this phenomenon has been formally identified and presented, yielding insights into a broad class of weight space learning problems.
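The LoRA symmetry mentioned above is easy to verify numerically: any invertible matrix acting on the inner (rank) dimension changes the weights but not the end-to-end function. A minimal demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 4
A, B = rng.normal(size=(d, r)), rng.normal(size=(r, k))

# An invertible r x r matrix on the inner dimension leaves the end-to-end
# LoRA update Delta W = A @ B unchanged: (A R)(R^{-1} B) = A B.
R = rng.normal(size=(r, r)) + 4 * np.eye(r)   # well-conditioned, hence invertible
A2, B2 = A @ R, np.linalg.solve(R, B)
print(np.allclose(A2 @ B2, A @ B))            # True: same function, different weights
```

Whether a weight space learner should collapse (A, B) and (A2, B2) into one representative is exactly the retain-or-remove question the paper formalizes.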
MCML Authors
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions.
WM @ICML 2025 - Workshop on Building Physically Plausible World Models at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published.
Abstract
Diffusion-based world models have demonstrated strong capabilities in synthesizing realistic long-horizon trajectories for offline reinforcement learning (RL). However, many existing methods do not directly generate actions alongside states and rewards, limiting their compatibility with standard value-based offline RL algorithms that rely on one-step temporal difference (TD) learning. While prior work has explored joint modeling of states, rewards, and actions to address this issue, such formulations often lead to increased training complexity and reduced performance in practice. Therefore, in this paper, we propose a diffusion-based world model that generates state-reward trajectories conditioned on the current state, action, and return-to-go value, and efficiently infers missing actions via an inverse dynamics model (IDM). This modular design produces complete synthetic transitions suitable for one-step TD-based offline RL, enabling effective and computationally efficient training. Empirically, we show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories, consistently outperforming prior diffusion-based baselines across multiple tasks in the D4RL benchmark.
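The inverse dynamics component is a standard supervised model from consecutive states to the connecting action. A generic sketch (architecture and sizes are illustrative):

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Infers the action connecting consecutive states, a ~ IDM(s_t, s_{t+1}).

    Used to complete diffusion-generated state-reward trajectories into full
    (s, a, r, s') transitions for one-step TD-based offline RL.
    """
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

# Training regresses ground-truth actions from the offline dataset, e.g.:
# loss = ((idm(s, s_next) - a) ** 2).mean()
```

Keeping action inference in this separate module is the design choice that avoids the training complexity of jointly diffusing states, rewards, and actions.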
MCML Authors
Spatial Artificial Intelligence
Align-then-Unlearn: Embedding Alignment for LLM Unlearning.
MUGen @ICML 2025 - Workshop on Machine Unlearning for Generative AI at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL
MCML Authors
Interpretable and Reliable Machine Learning
The Price of Robustness: Stable Classifiers Need Overparameterization.
HiLD @ICML 2025 - Workshop on High-dimensional Learning Dynamics at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. URL
Abstract
In this work, we show that class stability, the expected distance of an input to the decision boundary, captures what classical capacity measures, such as weight norms, fail to explain. We prove a generalization bound that improves inversely with the class stability, interpreted as a quantifiable notion of robustness. As a corollary, we derive a law of robustness for classification: any interpolating model with too few parameters must be unstable, so high stability requires significant overparameterization. Crucially, our results extend beyond smoothness assumptions and apply to discontinuous classifiers. Preliminary experiments support our theory: empirical stability increases with model size, while norm-based measures remain uninformative.
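Since the results cover discontinuous classifiers, class stability can be estimated purely from black-box predictions. The sketch below uses random directions plus bisection to find the nearest label flip; the sampling scheme and tolerances are assumptions for illustration, not the paper's protocol.

```python
import numpy as np

def class_stability(clf, X, n_dirs=64, max_r=5.0, tol=1e-3):
    """Monte-Carlo estimate of class stability: mean distance to the decision boundary.

    For each point, probes random directions and bisects for the smallest radius
    at which the predicted class flips; only clf.predict is queried, so no
    gradients or smoothness of the classifier are needed.
    """
    dists = []
    for x in X:
        label = clf.predict(x[None])[0]
        best = max_r
        for _ in range(n_dirs):
            u = np.random.randn(*x.shape)
            u /= np.linalg.norm(u)
            lo, hi = 0.0, best
            if clf.predict((x + hi * u)[None])[0] == label:
                continue  # no flip within the current best radius along this direction
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if clf.predict((x + mid * u)[None])[0] == label:
                    lo = mid
                else:
                    hi = mid
            best = hi
        dists.append(best)
    return float(np.mean(dists))
```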
MCML Authors
Mathematical Foundations of Artificial Intelligence
Mathematical Foundations of Artificial Intelligence
Mathematical Foundations of Artificial Intelligence
Mathematical Foundations of Artificial Intelligence
Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery.
ML4Astro @ICML 2025 - Machine Learning for Astrophysics at the 42nd International Conference on Machine Learning (ICML 2025). Vancouver, Canada, Jul 13-19, 2025. To be published. Preprint available. arXiv
Abstract
We present a multi-agent system for the automation of scientific research tasks, cmbagent. The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human in the loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD-level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are available, and the system is deployed on Hugging Face and will be available on the cloud.