05.12.2023

MCML researchers with 16 papers at NeurIPS 2023

37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, 10.12.2023–16.12.2023

We are happy to announce that MCML researchers are represented with 16 papers at NeurIPS 2023:

S. Chen, J. Gu, Z. Han, Y. Ma, P. Torr and V. Tresp.
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL. GitHub.

Abstract

Various adaptation methods, such as LoRA, prompts, and adapters, have been proposed to enhance the performance of pre-trained vision-language models in specific domains. As test samples in real-world applications usually differ from adaptation data, the robustness of these adaptation methods against distribution shifts are essential. In this study, we assess the robustness of 11 widely-used adaptation methods across 4 vision-language datasets under multimodal corruptions. Concretely, we introduce 7 benchmark datasets, including 96 visual and 87 textual corruptions, to investigate the robustness of different adaptation methods, the impact of available adaptation examples, and the influence of trainable parameter size during adaptation. Our analysis reveals that: 1) Adaptation methods are more sensitive to text corruptions than visual corruptions. 2) Full fine-tuning does not consistently provide the highest robustness; instead, adapters can achieve better robustness with comparable clean performance. 3) Contrary to expectations, our findings indicate that increasing the number of adaptation data and parameters does not guarantee enhanced robustness; instead, it results in even lower robustness. We hope this study could benefit future research in the development of robust multimodal adaptation methods.

MCML Authors

Shuo Chen

Database Systems & Data Mining

A3 | Computational Models
→ Group Volker Tresp

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

A3 | Computational Models
→ Group Eyke Hüllermeier

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

A3 | Computational Models

D. Frauen, V. Melnychuk and S. Feuerriegel.
Sharp Bounds for Generalized Causal Sensitivity Analysis.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

Causal inference from observational data is crucial for many disciplines such as medicine and economics. However, sharp bounds for causal effects under relaxations of the unconfoundedness assumption (causal sensitivity analysis) are subject to ongoing research. So far, works with sharp bounds are restricted to fairly simple settings (e.g., a single binary treatment). In this paper, we propose a unified framework for causal sensitivity analysis under unobserved confounding in various settings. For this, we propose a flexible generalization of the marginal sensitivity model (MSM) and then derive sharp bounds for a large class of causal effects. This includes (conditional) average treatment effects, effects for mediation analysis and path analysis, and distributional effects. Furthermore, our sensitivity model is applicable to discrete, continuous, and time-varying treatments. It allows us to interpret the partial identification problem under unobserved confounding as a distribution shift in the latent confounders while evaluating the causal effect of interest. In the special case of a single binary treatment, our bounds for (conditional) average treatment effects coincide with recent optimality results for causal sensitivity analysis. Finally, we propose a scalable algorithm to estimate our sharp bounds from observational data.

MCML Authors

Dennis Frauen

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Valentyn Melnychuk

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

C4 | Computational Social Sciences

F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier and B. Hammer.
SHAP-IQ: Unified Approximation of any-order Shapley Interactions.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

Predominately in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axiom. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.

MCML Authors

Maximilian Muschalik

Artificial Intelligence & Machine Learning

A3 | Computational Models
→ Group Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

A3 | Computational Models

M. Ghahremani and C. Wachinger.
RegBN: Batch Normalization of Multimodal Data with Regularization.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL. GitHub.

Abstract

Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in integrating multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces RegBN, a novel approach for multimodal Batch Normalization with REGularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks.

MCML Authors

Morteza Ghahremani

Dr.

Artificial Intelligence in Radiology

C1 | Medicine
→ Group Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Radiology

C1 | Medicine

C. Kümmerle and J. Maly.
Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogenous low-dimensional structures from linear observations. Focussing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods.

MCML Authors

Johannes Maly

Prof. Dr.

Mathematical Data Science and Artificial Intelligence

A2 | Mathematical Foundations

R. Liao, X. Jia, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
Workshop Temporal Graph Learning (TGL 2023) at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

The rapid advancements in large language models (LLMs) have ignited interest in the realm of the temporal knowledge graph (TKG) domain, where conventional carefully designed embedding-based and rule-based models dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex graph data structure and sequential natural expressions LLMs can handle, and between the enormous data volume of TKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval augmented generation framework named GenTKG combining a temporal logical rule-based retrieval strategy and lightweight few-shot parameter-efficient instruction tuning to solve the above challenges. Extensive experiments have shown that GenTKG is a simple but effective, efficient, and generalizable approach that outperforms conventional methods on temporal relational forecasting with extremely limited computation. Our work opens a new frontier for the temporal knowledge graph domain.

MCML Authors

Ruotong Liao

Database Systems & Data Mining

A3 | Computational Models
→ Group Volker Tresp

Yunpu Ma

Dr.

Artificial Intelligence & Machine Learning

A3 | Computational Models
→ Group Eyke Hüllermeier

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

A3 | Computational Models

X. Li, E. Nie and S. Liang.
From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL.
Workshop Instruction Tuning and Instruction Following at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

The remarkable ability of Large Language Models (LLMs) to understand and follow instructions has sometimes been limited by their in-context learning (ICL) performance in low-resource languages. To address this, we introduce a novel approach that leverages cross-lingual retrieval-augmented in-context learning (CREA-ICL). By extracting semantically similar prompts from high-resource languages, we aim to bolster the zero-shot performance of multilingual pretrained language models (MPLMs) across diverse tasks. Though our approach yields steady improvements in classification tasks, it faces challenges in generation tasks, with Bangla serving as a key case study. Our evaluation offers insights into the performance dynamics of retrieval-augmented in-context learning across both classification and generation domains.

MCML Authors

Ercong Nie

Statistical NLP and Deep Learning

B2 | Natural Language Processing
→ Group Hinrich Schütze

Sheng Liang

Statistical NLP and Deep Learning

B2 | Natural Language Processing
→ Group Hinrich Schütze

S. Maskey, R. Paolino, A. Bacho and G. Kutyniok.
A Fractional Graph Laplacian Approach to Oversmoothing.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL. GitHub.

Abstract

Graph neural networks (GNNs) have shown state-of-the-art performances in various applications. However, GNNs often struggle to capture long-range dependencies in graphs due to oversmoothing. In this paper, we generalize the concept of oversmoothing from undirected to directed graphs. To this aim, we extend the notion of Dirichlet energy by considering a directed symmetrically normalized Laplacian. As vanilla graph convolutional networks are prone to oversmooth, we adopt a neural graph ODE framework. Specifically, we propose fractional graph Laplacian neural ODEs, which describe non-local dynamics. We prove that our approach allows propagating information between distant nodes while maintaining a low probability of long-distance jumps. Moreover, we show that our method is more flexible with respect to the convergence of the graph’s Dirichlet energy, thereby mitigating oversmoothing. We conduct extensive experiments on synthetic and real-world graphs, both directed and undirected, demonstrating our method’s versatility across diverse graph homophily levels.

MCML Authors

Raffaele Paolino

Mathematical Foundations of Artificial Intelligence

A2 | Mathematical Foundations
→ Group Gitta Kutyniok

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

A2 | Mathematical Foundations

V. Melnychuk, D. Frauen and S. Feuerriegel.
Partial Counterfactual Identification of Continuous Outcomes with a Curvature Sensitivity Model.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

Counterfactual inference aims to answer retrospective ‘what if’ questions and thus belongs to the most fine-grained type of inference in Pearl’s causality ladder. Existing methods for counterfactual inference with continuous outcomes aim at point identification and thus make strong and unnatural assumptions about the underlying structural causal model. In this paper, we relax these assumptions and aim at partial counterfactual identification of continuous outcomes, i.e., when the counterfactual query resides in an ignorance interval with informative bounds. We prove that, in general, the ignorance interval of the counterfactual queries has non-informative bounds, already when functions of structural causal models are continuously differentiable. As a remedy, we propose a novel sensitivity model called Curvature Sensitivity Model. This allows us to obtain informative bounds by bounding the curvature of level sets of the functions. We further show that existing point counterfactual identification methods are special cases of our Curvature Sensitivity Model when the bound of the curvature is set to zero. We then propose an implementation of our Curvature Sensitivity Model in the form of a novel deep generative model, which we call Augmented Pseudo-Invertible Decoder. Our implementation employs (i) residual normalizing flows with (ii) variational augmentations. We empirically demonstrate the effectiveness of our Augmented Pseudo-Invertible Decoder. To the best of our knowledge, ours is the first partial identification model for Markovian structural causal models with continuous outcomes.

MCML Authors

Valentyn Melnychuk

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Dennis Frauen

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

C4 | Computational Social Sciences

S. Šćepanović, I. Obadic, S. Joglekar, L. Giustarini, C. Nattero, D. Quercia and X. Zhu.
MedSat: A Public Health Dataset for England Featuring Medical Prescriptions and Satellite Imagery.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

As extreme weather events become more frequent, understanding their impact on human health becomes increasingly crucial. However, the utilization of Earth Observation to effectively analyze the environmental context in relation to health remains limited. This limitation is primarily due to the lack of fine-grained spatial and temporal data in public and population health studies, hindering a comprehensive understanding of health outcomes. Additionally, obtaining appropriate environmental indices across different geographical levels and timeframes poses a challenge. For the years 2019 (pre-COVID) and 2020 (COVID), we collected spatio-temporal indicators for all Lower Layer Super Output Areas in England. These indicators included: i) 111 sociodemographic features linked to health in existing literature, ii) 43 environmental point features (e.g., greenery and air pollution levels), iii) 4 seasonal composite satellite images each with 11 bands, and iv) prescription prevalence associated with five medical conditions (depression, anxiety, diabetes, hypertension, and asthma), opioids and total prescriptions. We combined these indicators into a single MEDSAT dataset, the availability of which presents an opportunity for the machine learning community to develop new techniques specific to public health. These techniques would address challenges such as handling large and complex data volumes, performing effective feature engineering on environmental and sociodemographic factors, capturing spatial and temporal dependencies in the models, addressing imbalanced data distributions, developing novel computer vision methods for health modeling based on satellite imagery, ensuring model explainability, and achieving generalization beyond the specific geographical region.

MCML Authors

Ivica Obadic

Data Science in Earth Observation

C3 | Physics and Geo Sciences
→ Group Xiaoxiang Zhu

Xiaoxiang Zhu

Prof. Dr.

Data Science in Earth Observation

C3 | Physics and Geo Sciences

J. Schweisthal, D. Frauen, V. Melnychuk and S. Feuerriegel.
Reliable Off-Policy Learning for Dosage Combinations.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

Decision-making in personalized medicine such as cancer therapy or critical care must often make choices for dosage combinations, i.e., multiple continuous treatments. Existing work for this task has modeled the effect of multiple treatments independently, while estimating the joint effect has received little attention but comes with non-trivial challenges. In this paper, we propose a novel method for reliable off-policy learning for dosage combinations. Our method proceeds along three steps: (1) We develop a tailored neural network that estimates the individualized dose-response function while accounting for the joint effect of multiple dependent dosages. (2) We estimate the generalized propensity score using conditional normalizing flows in order to detect regions with limited overlap in the shared covariate-treatment space. (3) We present a gradient-based learning algorithm to find the optimal, individualized dosage combinations. Here, we ensure reliable estimation of the policy value by avoiding regions with limited overlap. We finally perform an extensive evaluation of our method to show its effectiveness. To the best of our knowledge, ours is the first work to provide a method for reliable off-policy learning for optimal dosage combinations.

MCML Authors

Jonas Schweisthal

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Dennis Frauen

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Valentyn Melnychuk

Artificial Intelligence in Management

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Stefan Feuerriegel

Prof. Dr.

Artificial Intelligence in Management

C4 | Computational Social Sciences

M. Singh, A. Fono and G. Kutyniok.
Are Spiking Neural Networks more expressive than Artificial Neural Networks?.
1st Workshop on Unifying Representations in Neural Models (UniReps 2023) at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

This article studies the expressive power of spiking neural networks with firing-time-based information encoding, highlighting their potential for future energy-efficient AI applications when deployed on neuromorphic hardware. The computational power of a network of spiking neurons has already been studied via their capability of approximating any continuous function. By using the Spike Response Model as a mathematical model of a spiking neuron and assuming a linear response function, we delve deeper into this analysis and prove that spiking neural networks generate continuous piecewise linear mappings. We also show that they can emulate any multi-layer (ReLU) neural network with similar complexity. Furthermore, we prove that the maximum number of linear regions generated by a spiking neuron scales exponentially with respect to the input dimension, a characteristic that distinguishes it significantly from an artificial (ReLU) neuron. Our results further extend the understanding of the approximation properties of spiking neural networks and open up new avenues where spiking neural networks can be deployed instead of artificial neural networks without any performance loss.

MCML Authors

Gitta Kutyniok

Prof. Dr.

Mathematical Foundations of Artificial Intelligence

A2 | Mathematical Foundations

N. Sturma, C. Squires, M. Drton and C. Uhler.
Unpaired Multi-Domain Causal Representation Learning.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our results into a practical method to recover the shared latent causal graph.

MCML Authors

Nils Sturma

Mathematical Statistics

A1 | Statistical Foundations & Explainability
→ Group Mathias Drton

Mathias Drton

Prof. Dr.

Mathematical Statistics

A1 | Statistical Foundations & Explainability

G. Zhai, E. P. Örnek, S.-C. Wu, Y. Di, F. Tombari, N. Navab and B. Busam.
CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs.
37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

Controllable scene synthesis aims to create interactive environments for numerous industrial use cases. Scene graphs provide a highly suitable interface to facilitate these applications by abstracting the scene context in a compact manner. Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense. Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes via latent diffusion, capturing global scene-object and local inter-object relationships in the scene graph while preserving shape diversity. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model. Due to the lack of a scene graph dataset offering high-quality object-level meshes with relations, we also construct SG-FRONT, enriching the off-the-shelf indoor dataset 3D-FRONT with additional scene graph labels. Extensive experiments are conducted on SG-FRONT, where CommonScenes shows clear advantages over other methods regarding generation consistency, quality, and diversity. Codes and the dataset are available on the website.

MCML Authors

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

C1 | Medicine
→ Group Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

C1 | Medicine

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality

C1 | Medicine
→ Group Nassir Navab

Y. Zhang, Y. Li, H. Brown, M. Rezaei, B. Bischl, P. Torr, A. Khakzar and K. Kawaguchi.
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments.
Workshop XAI in Action: Past, Present, and Future Applications (XAIA 2023) at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.

MCML Authors

Yawei Li

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Mina Rezaei

Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Ashkan Khakzar

Dr.

* Former member

C1 | Medicine
→ Group Daniel Rückert

S. Zhang, P. Wicke, L. K. Senel, L. Figueredo, A. Naceri, S. Haddadin, B. Plank and H. Schütze.
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation.
6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.

Abstract

The convergence of embodied agents and large language models (LLMs) has brought significant advancements to embodied instruction following.Particularly, the strong reasoning capabilities of LLMs make it possible for robots to perform long-horizon tasks without expensive annotated demonstrations.However, public benchmarks for testing the long-horizon reasoning capabilities of language-conditioned robots in various scenarios are still missing. To fill this gap, this work focuses on the tabletopmanipulation task and releases a simulation benchmark,textit{LoHoRavens}, which covers various long-horizonreasoning aspects spanning color, size, space, arithmeticsand reference.Furthermore, there is a key modality bridging problem forlong-horizon manipulation tasks with LLMs: how toincorporate the observation feedback during robot executionfor the LLM’s closed-loop planning, which is however less studied by prior work. We investigate two methods of bridging the modality gap: caption generation and learnable interface for incorporating explicit and implicit observation feedback to the LLM, respectively.These methods serve as the two baselines for our proposed benchmark. Experiments show that both methods struggle to solve most tasks, indicating long-horizon manipulation tasks are still challenging for current popular models.We expect the proposed public benchmark and baselines can help the community develop better models for long-horizon tabletop manipulation tasks.

MCML Authors

Shengqiang Zhang

Statistical NLP and Deep Learning

B2 | Natural Language Processing
→ Group Hinrich Schütze

Philipp Wicke

Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing
→ Group Hinrich Schütze

Lütfi Kerem Şenel

Statistical NLP and Deep Learning

B2 | Natural Language Processing
→ Group Hinrich Schütze

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

NeurIPS 2023

05.12.2023

06.11.2024

MCML researchers with 20 papers at EMNLP 2024

Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, 12.11.2024 - 16.11.2024

01.10.2024

MCML researchers with 16 papers at MICCAI 2024

27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, 06.10.2024 - 10.10.2024

26.09.2024

MCML researchers with 18 papers at ECCV 2024

18th European Conference on Computer Vision (ECCV 2024). Milano, Italy, 29.09.2024 - 04.10.2024

10.09.2024

MCML at ECML-PKDD 2024

We are happy to announce that MCML researchers are represented at ECML-PKDD 2024.

20.08.2024

MCML researchers with two papers at KDD 2024

30th ACM SIGKDD International Conference on Knowledge Discovery and Data (KDD 2024). Barcelona, Spain, 25.08.2024 - 29.08.2024

MCML researchers with 16 papers at NeurIPS 2023

37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, 10.12.2023–16.12.2023

Related