MCML researchers with 13 papers at ICLR 2024

A3 | Computational Models
→ Group Volker Tresp

S. Chen, Z. Han, B. He, Z. Ding, W. Yu, P. Torr, V. Tresp and J. Gu.
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?.
Workshop on Secure and Trustworthy Large Language Models (SeT LLM 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates the performance reproduction and fair comparison. Besides, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs, such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source LLMs and MLLMs. (2) Llama2 and Qwen-VL-Chat are more robust compared to other open-source models. (3) The transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods.

MCML Authors

Shuo Chen

Database Systems & Data Mining

Zifeng Ding

Database Systems & Data Mining

A3 | Computational Models
→ Group Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

A3 | Computational Models
→ Group Niki Kilbertus

S. d'Ascoli, S. Becker, P. Schwaller, A. Mathis and N. Kilbertus.
ODEFormer: Symbolic Regression of Dynamical Systems with Transformers.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL. GitHub.

Abstract

We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory. We perform extensive evaluations on two datasets: (i) the existing ‘Strogatz’ dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference.

MCML Authors

Sören Becker

Ethics in Systems Design and Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

B1 | Computer Vision
→ Group Zeynep Akata

L. Eyring, D. Klein, T. Palla, N. Kilbertus, Z. Akata and F. J. Theis.
Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

In optimal transport (OT), a Monge map is known as a mapping that transports a source distribution to a target distribution in the most cost-efficient way. Recently, multiple neural estimators for Monge maps have been developed and applied in diverse unpaired domain translation tasks, e.g. in single-cell biology and computer vision. However, the classic OT framework enforces mass conservation, which makes it prone to outliers and limits its applicability in real-world scenarios. The latter can be particularly harmful in OT domain translation tasks, where the relative position of a sample within a distribution is explicitly taken into account. While unbalanced OT tackles this challenge in the discrete setting, its integration into neural Monge map estimators has received limited attention. We propose a theoretically grounded method to incorporate unbalancedness into any Monge map estimator. We improve existing estimators to model cell trajectories over time and to predict cellular responses to perturbations. Moreover, our approach seamlessly integrates with the OT flow matching (OT-FM) framework. While we show that OT-FM performs competitively in image translation, we further improve performance by incorporating unbalancedness (UOT-FM), which better preserves relevant features. We hence establish UOT-FM as a principled method for unpaired image translation.

MCML Authors

Luca Eyring

Interpretable and Reliable Machine Learning

Niki Kilbertus

Prof. Dr.

Ethics in Systems Design and Machine Learning

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning

B1 | Computer Vision

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

C2 | Biology

D. Frauen, F. Imrie, A. Curth, V. Melnychuk, S. Feuerriegel and M. van der Schaar.
A Neural Framework for Generalized Causal Sensitivity Analysis.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Unobserved confounding is common in many applications, making causal inference from observational data challenging. As a remedy, causal sensitivity analysis is an important tool to draw causal conclusions under unobserved confounding with mathematical guarantees. In this paper, we propose NeuralCSA, a neural framework for generalized causal sensitivity analysis. Unlike previous work, our framework is compatible with (i) a large class of sensitivity models, including the marginal sensitivity model, -sensitivity models, and Rosenbaum’s sensitivity model; (ii) different treatment types (i.e., binary and continuous); and (iii) different causal queries, including (conditional) average treatment effects and simultaneous effects on multiple outcomes. This generality is achieved by learning a latent distribution shift that corresponds to a treatment intervention using two conditional normalizing flows. We provide theoretical guarantees that NeuralCSA is able to infer valid bounds on the causal query of interest and also demonstrate this empirically using both simulated and real-world data.

MCML Authors

Dennis Frauen

Artificial Intelligence in Management

Valentyn Melnychuk

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

K. Hess, V. Melnychuk, D. Frauen and S. Feuerriegel.
Bayesian Neural Controlled Differential Equations for Treatment Effect Estimation.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Treatment effect estimation in continuous time is crucial for personalized medicine. However, existing methods for this task are limited to point estimates of the potential outcomes, whereas uncertainty estimates have been ignored. Needless to say, uncertainty quantification is crucial for reliable decision-making in medical applications. To fill this gap, we propose a novel Bayesian neural controlled differential equation (BNCDE) for treatment effect estimation in continuous time. In our BNCDE, the time dimension is modeled through a coupled system of neural controlled differential equations and neural stochastic differential equations, where the neural stochastic differential equations allow for tractable variational Bayesian inference. Thereby, for an assigned sequence of treatments, our BNCDE provides meaningful posterior predictive distributions of the potential outcomes. To the best of our knowledge, ours is the first tailored neural method to provide uncertainty estimates of treatment effects in continuous time. As such, our method is of direct practical value for promoting reliable decision-making in medicine.

MCML Authors

Valentyn Melnychuk

Artificial Intelligence in Management

Dennis Frauen

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

A1 | Statistical Foundations & Explainability

R. Kohli, M. Feurer, B. Bischl, K. Eggensperger and F. Hutter.
Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning.
Workshop on Data-centric Machine Learning Research (DMLR 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Data in tabular form makes up a large part of real-world ML applications, and thus, there has been a strong interest in developing novel deep learning (DL) architectures for supervised learning on tabular data in recent years. As a result, there is a debate as to whether DL methods are superior to the ubiquitous ensembles of boosted decision trees. Typically, the advantage of one model class over the other is claimed based on an empirical evaluation, where different variations of both model classes are compared on a set of benchmark datasets that supposedly resemble relevant real-world tabular data. While the landscape of state-of- the-art models for tabular data changed, one factor has remained largely constant over the years: The datasets. Here, we examine 30 recent publications and 187 different datasets they use, in terms of age, study size and relevance. We found that the average study used less than 10 datasets and that half of the datasets are older than 20 years. Our insights raise questions about the conclusions drawn from previous studies and urge the research community to develop and publish additional recent, challenging and relevant datasets and ML tasks for supervised learning on tabular data.

MCML Authors

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

C. Koke and D. Cremers.
HoloNets: Spectral Convolutions do extend to Directed Graphs.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Within the graph learning community, conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs: Only there could the existence of a well-defined graph Fourier transform be guaranteed, so that information may be translated between spatial- and spectral domains. Here we show this traditional reliance on the graph Fourier transform to be superfluous and – making use of certain advanced tools from complex analysis and spectral theory – extend spectral convolutions to directed graphs. We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based. In order to thoroughly test the developed theory, we conduct experiments in real world settings, showcasing that directed spectral convolutional networks provide new state of the art results for heterophilic node classification on many datasets and – as opposed to baselines – may be rendered stable to resolution-scale varying topological perturbations.

MCML Authors

Christian Koke

Computer Vision & Artificial Intelligence

C3 | Physics and Geo Sciences
→ Group Xiaoxiang Zhu

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

B1 | Computer Vision

C. Liu, C. Albrecht, Y. Wang and X. Zhu.
CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation.
2nd Workshop Machine Learning for Remote Sensing (ML4RS 2024) at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. PDF.

Abstract

We study the potential of noisy labels y to pretrain semantic segmentation models in a multi-modal learning framework for geospatial applications. Specifically, we propose a novel Cross-modal Sample Selection method (CromSS) that utilizes the class distributions P^{(d)}(x,c) over pixels x and classes c modelled by multiple sensors/modalities d of a given geospatial scene. Consistency of predictions across sensors d is jointly informed by the entropy of P^{(d)}(x,c). Noisy label sampling we determine by the confidence of each sensor d in the noisy class label, P^{(d)}(x,c=y(x)). To verify the performance of our approach, we conduct experiments with Sentinel-1 (radar) and Sentinel-2 (optical) satellite imagery from the globally-sampled SSL4EO-S12 dataset. We pair those scenes with 9-class noisy labels sourced from the Google Dynamic World project for pretraining. Transfer learning evaluations (downstream task) on the DFC2020 dataset confirm the effectiveness of the proposed method for remote sensing image segmentation.

MCML Authors

Chenying Liu

Data Science in Earth Observation

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation

C3 | Physics and Geo Sciences

V. Melnychuk, D. Frauen and S. Feuerriegel.
Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic refutation framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATE is non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose a neural refutation framework which performs partial identification of CATE or, equivalently, aims at estimating lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our refutation framework is of direct relevance in practice where the validity of CATE estimation is of importance.

MCML Authors

Valentyn Melnychuk

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Dennis Frauen

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

M. Schröder, D. Frauen and S. Feuerriegel.
Causal Fairness under Unobserved Confounding: A Neural Sensitivity Framework.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Fairness of machine learning predictions is widely required in practice for legal, ethical, and societal reasons. Existing work typically focuses on settings without unobserved confounding, even though unobserved confounding can lead to severe violations of causal fairness and, thus, unfair predictions. In this work, we analyze the sensitivity of causal fairness to unobserved confounding. Our contributions are three-fold. First, we derive bounds for causal fairness metrics under different sources of unobserved confounding. This enables practitioners to examine the sensitivity of their machine learning models to unobserved confounding in fairness-critical applications. Second, we propose a novel neural framework for learning fair predictions, which allows us to offer worst-case guarantees of the extent to which causal fairness can be violated due to unobserved confounding. Third, we demonstrate the effectiveness of our framework in a series of experiments, including a real-world case study about predicting prison sentences. To the best of our knowledge, ours is the first work to study causal fairness under unobserved confounding. To this end, our work is of direct practical value as a refutation strategy to ensure the fairness of predictions in high-stakes applications.

MCML Authors

Maresa Schröder

Artificial Intelligence in Management

Dennis Frauen

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

S. Solonets, D. Sinitsyn, L. Von Stumberg, N. Araslanov and D. Cremers.
An Analytical Solution to Gauss-Newton Loss for Direct Image Alignment.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL.

Abstract

Direct image alignment is a widely used technique for relative 6DoF pose estimation between two images, but its accuracy strongly depends on pose initialization. Therefore, recent end-to-end frameworks increase the convergence basin of the learned feature descriptors with special training objectives, such as the Gauss-Newton loss. However, the training data may exhibit bias toward a specific type of motion and pose initialization, thus limiting the generalization of these methods. In this work, we derive a closed-form solution to the expected optimum of the Gauss-Newton loss. The solution is agnostic to the underlying feature representation and allows us to dynamically adjust the basin of convergence according to our assumptions about the uncertainty in the current estimates. These properties allow for effective control over the convergence in the alignment process. Despite using self-supervised feature embeddings, our solution achieves compelling accuracy w.r.t. the state-of-the-art direct image alignment methods trained end-to-end with pose supervision, and demonstrates improved robustness to pose initialization. Our analytical solution exposes some inherent limitations of end-to-end learning with the Gauss-Newton loss, and establishes an intriguing connection between direct image alignment and feature-matching approaches.

MCML Authors

Sergei Solonets

Computer Vision & Artificial Intelligence

Daniil Sinitsyn

Computer Vision & Artificial Intelligence

Nikita Araslanov

Dr.

Computer Vision & Artificial Intelligence

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

B1 | Computer Vision

A. Vahidi, S. Schosser, L. Wimmer, Y. Li, B. Bischl, E. Hüllermeier and M. Rezaei.
Probabilistic Self-supervised Learning via Scoring Rules Minimization.
12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL. GitHub.

Abstract

In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN’s convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method’s optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.

MCML Authors

Lisa Wimmer

Statistical Learning & Data Science

Yawei Li

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning