Research Group Volker Tresp

Database Systems and Data Mining AI Lab

Volker Tresp

is Guest Professor for Machine Learning at the Chair of Database Systems & Data Mining at LMU Munich.

His team has a long-standing tradition in machine learning for relational structured domains. They particularly focus on (temporal) knowledge graphs and are currently investigating synergies with large language models. Driven by the interest in cognitive AI, they are increasingly exploring multimodal foundation models. The ultimate goal is to achieve a better understanding of human-level intelligence.

Team members @MCML

PostDocs

Yunpu Ma

Dr.

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

PhD Students

Shuo Chen

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Haokun Chen

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Thomas Decker

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Zifeng Ding

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Rajat Koner

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Ruotong Liao

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Tong Liu

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Yize Sun

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Gengyuan Zhang

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Yao Zhang

A3 | Computational Models
→ Group Volker Tresp

Database Systems and Data Mining AI Lab

Publications @MCML

2025

[122]

S. Chen, J. Liu, Z. Han, Y. Xia, D. Cremers, P. Torr, V. Tresp and J. Gu.
True Multimodal In-Context Learning Needs Attention to the Visual Context.
COLM 2025 - Conference on Language Modeling. Montreal, Canada, Oct 07-09, 2025. To be published. Preprint available. arXiv GitHub

Abstract

Multimodal Large Language Models (MLLMs), built on powerful language backbones, have enabled Multimodal In-Context Learning (MICL)-adapting to new tasks from a few multimodal demonstrations consisting of images, questions, and answers. Despite showing noticeable improvement on standard vision-language datasets, current MLLMs struggle to leverage visual information in the demonstrations. Specifically, they tend to neglect visual cues and over-rely on textual patterns, leading to mere text imitation rather than genuine multimodal adaptation. This behavior makes MICL still unimodal and largely restricts its practical utility. More importantly, this limitation is often concealed by the improved performance on tasks that do not require understanding the visual context. As a result, how to effectively enhance MICL ability and reliably evaluate the MICL performance remains underexplored. To address these issues, we first introduce Dynamic Attention Reallocation (DARA), an efficient fine-tuning strategy that encourages models to attend to the visual context by rebalancing attention across visual and textual tokens. In addition, we present TrueMICL, an MICL-dedicated dataset with both support and test sets that explicitly requires the integration of multimodal information-particularly visual content-for correct task completion. Extensive experiments demonstrate the effectiveness of our holistic solution, showcasing substantial improvements in the true multimodal in-context learning capabilities.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Yan Xia

Dr.

* Former Member

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[121]

Z. Ding, Y. Li, Y. He, A. Norelli, J. Wu, V. Tresp, Y. Ma and M. Bronstein.
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models.
TGL @KDD 2025 - Temporal Graph Learning Workshop at the 31st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2025). Toronto, ON, Canada, Aug 03-07, 2025. To be published. Preprint available. arXiv

Abstract

Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2) Meanwhile, more powerful models are needed to identify and select the most critical temporal information within the extended context provided by longer histories. To address these problems, we propose a CTDG representation learning model named DyGMamba, originating from the popular Mamba state space model (SSM). DyGMamba first leverages a node-level SSM to encode the sequence of historical node interactions. Another time-level SSM is then employed to exploit the temporal patterns hidden in the historical graph, where its output is used to dynamically select the critical information from the interaction history. We validate DyGMamba experimentally on the dynamic link prediction task. The results show that our model achieves state-of-the-art in most cases. DyGMamba also maintains high efficiency in terms of computational resources, making it possible to capture long temporal dependencies with a limited computation budget.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[120]

J. Bi, Y. Wang, H. Chen, X. Xiao, A. Hecker, V. Tresp and Y. Ma.
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

Multimodal Large Language Models (MLLMs) have significantly advanced visual tasks by integrating visual representations into large language models (LLMs). The textual modality, inherited from LLMs, equips MLLMs with abilities like instruction following and in-context learning. In contrast, the visual modality enhances performance in downstream tasks by leveraging rich semantic content, spatial information, and grounding capabilities. These intrinsic modalities work synergistically across various visual tasks. Our research initially reveals a persistent imbalance between these modalities, with text often dominating output generation during visual instruction tuning. This imbalance occurs when using both full fine-tuning and parameter-efficient fine-tuning (PEFT) methods. We then found that re-balancing these modalities can significantly reduce the number of trainable parameters required, inspiring a direction for further optimizing visual instruction tuning. We introduce Modality Linear Representation-Steering (MoReS) to achieve the goal. MoReS effectively re-balances the intrinsic modalities throughout the model, where the key idea is to steer visual representations through linear transformations in the visual subspace across each model layer. To validate our solution, we composed LLaVA Steering, a suite of models integrated with the proposed MoReS method. Evaluation results show that the composed LLaVA Steering models require, on average, 500 times fewer trainable parameters than LoRA needs while still achieving comparable performance across three visual benchmarks and eight visual question-answering tasks. Last, we present the LLaVA Steering Factory, an in-house developed platform that enables researchers to quickly customize various MLLMs with component-based architecture for seamlessly integrating state-of-the-art models, and evaluate their intrinsic modality imbalance.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[119]

T. Liu, Z. Lai, J. Wang, G. Zhang, S. Chen, P. Torr, V. Demberg, V. Tresp and J. Gu.
Multimodal Pragmatic Jailbreak on Text-to-image Models.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL GitHub

Abstract

Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate the image with visual text, where the image and the text, although considered to be safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate the current diffusion-based text-to-image (T2I) models under such jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from such type of jailbreak, with rates of unsafe generation ranging from around 10% to 70% where DALLE 3 demonstrates almost the highest unsafety. In real-world scenarios, various filters such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and found that, while these filters may be effective for single modality detection, they fail to work against our jailbreak. We also investigate the underlying reason for such jailbreaks, from the perspective of text rendering capability and training data. Our work provides a foundation for further development towards more secure and reliable T2I models.

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[118]

T. Liu, X. Yu, W. Zhou, J. Gu and V. Tresp.
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL

Abstract

Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach in aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model, and focus on training it to correct misranked preference pairs. However, recent work~citep{chen2024preference} empirically finds that DPO training textit{rarely improves these misranked preference pairs}, despite its gradient emphasizing on these cases. We introduce FocalPO, a DPO variant that instead textit{down-weighs} misranked preference pairs and prioritizes enhancing the model’s understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor to dynamically scale DPO loss. Our experiment demonstrates that FocalPO surpasses DPO and its variants on popular benchmarks like Alpaca Eval 2.0 using Mistral-Base-7B and Llama-3-Instruct-8B. Additionally, we empirically reveals how FocalPO affects training on correct and incorrect sample groups, further underscoring its effectiveness.

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Database Systems and Data Mining AI Lab

[117]

E. Nie, B. Shao, Z. Ding, M. Wang, H. Schmid and H. Schütze.
BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025. URL GitHub

Abstract

Large language models (LLMs) possess extensive parametric knowledge, but this knowledge is difficult to update with new information because retraining is very expensive and infeasible for closed-source models. Knowledge editing (KE) has emerged as a viable solution for updating the knowledge of LLMs without compromising their overall performance. On-the-fly KE methods, inspired by in-context learning (ICL), have shown great promise and allow LLMs to be treated as black boxes. In the past, KE was primarily employed in English contexts, whereas the potential for cross-lingual KE in current English-centric LLMs has not been fully explored. To foster more research in this direction, we introduce the BMIKE-53 benchmark for evaluating cross-lingual KE on 53 diverse languages across three KE task types. We also propose a gradient-free KE method called Multilingual In-context Knowledge Editing (MIKE) and evaluate it on BMIKE-53. Our evaluation focuses on cross-lingual knowledge transfer in terms of reliability, generality, locality, and portability, offering valuable insights and a framework for future research in cross-lingual KE.

MCML Authors

Ercong Nie

Computational Linguistics

Zifeng Ding

B2 | Natural Language Processing
→ Group Hinrich Schütze

Database Systems and Data Mining AI Lab

Mingyang Wang

Computational Linguistics

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[116]

H. Chen, H. Li, Y. Zhang, G. Zhang, J. Bi, P. Torr, J. Gu, D. Krompass and V. Tresp.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models.
CVPR 2025 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. To be published. URL

Abstract

One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional FL. Despite these promising prospects, existing methods face challenges due to client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDM) have shown remarkable advancements in synthesizing high-quality images through pretraining on large-scale datasets, thereby presenting a potential solution to overcome these issues. However, directly applying pretrained LDM to heterogeneous OSFL results in significant distribution shifts in synthetic data, leading to performance degradation in classification models trained on such data. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in LDM’s pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both instance-level and concept-level. Hereby, FedBiP synthesizes images following the client’s local data distribution without compromising the privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. Our method is validated through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Hang Li

* Former Member

Yao Zhang

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[115]

G. Zhang, M. L. A. Fok, J. Ma, Y. Xia, D. Cremers, P. Torr, V. Tresp and J. Gu.
Localizing Events in Videos with Multimodal Queries.
CVPR 2025 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025. To be published. URL

Abstract

Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current research is that semantic queries are typically in natural language that depicts the semantics of the target event. This setting overlooks the potential for multimodal semantic queries composed of images and texts. To address this gap, we introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries, along with a new evaluation dataset ICQ-Highlight. Our new benchmark aims to evaluate how well models can localize an event given a multimodal semantic query that consists of a reference image, which depicts the event, and a refinement text to adjust the images’ semantics. To systematically benchmark model performance, we include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains. We propose 3 adaptation methods that tailor existing models to our new setting and evaluate 10 SOTA models, ranging from specialized to large-scale foundation models. We believe this benchmark is an initial step toward investigating multimodal queries in video event localization.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Yan Xia

Dr.

* Former Member

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[114]

T. Liu, Z. Lai, J. Wang, G. Zhang, S. Chen, P. Torr, V. Demberg, V. Tresp and J. Gu.
Multimodal Pragmatic Jailbreak on Text-to-image Models.
ReGenAI @CVPR 2025 - 2nd Workshop on Responsible Generative AI at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025). Nashville, TN, USA, Jun 11-15, 2025. Best Paper Award. URL GitHub

Abstract

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[113]

X. Ma, C. Lin, Y. Zhang, V. Tresp and Y. Ma.
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation.
Preprint (Jun. 2025). arXiv

Abstract

Leveraging multiple Large Language Models(LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network(ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative ’team’ focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase-Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase-Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Database Systems and Data Mining AI Lab

[112]

Z. S. Taghavi, A. Modarressi, Y. Ma and H. Schütze.
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge.
Preprint (Jun. 2025). arXiv GitHub

Abstract

Retrieval systems are central to many NLP pipelines, but often rely on surface-level cues such as keyword overlap and lexical semantic similarity. To evaluate retrieval beyond these shallow signals, recent benchmarks introduce reasoning-heavy queries; however, they primarily shift the burden to query-side processing techniques – like prompting or multi-hop retrieval – that can help resolve complexity. In contrast, we present ImpliRet, a benchmark that shifts the reasoning challenge to document-side processing: The queries are simple, but relevance depends on facts stated implicitly in documents through temporal (e.g., resolving ’two days ago’), arithmetic, and world knowledge relationships. We evaluate a range of sparse and dense retrievers, all of which struggle in this setting: the best nDCG@10 is only 15.07%. We also test whether long-context models can overcome this limitation. But even with a short context of only ten documents, including the positive document, GPT-4.1 scores only 35.06%, showing that document-side reasoning remains a challenge.

MCML Authors

Zeinab Sadat Taghavi

Computational Linguistics

Ali Modarressi

B2 | Natural Language Processing
→ Group Hinrich Schütze

Computational Linguistics

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[111]

A. Wang, D. Shu, Y. Wang, Y. Ma and M. Du.
Improving LLM Reasoning through Interpretable Role-Playing Steering.
Preprint (Jun. 2025). arXiv

Abstract

Role-playing has emerged as an effective technique for enhancing the reasoning capabilities of large language models (LLMs). However, existing methods primarily rely on prompt engineering, which often lacks stability and interpretability. In this paper, we introduce Sparse Autoencoder Role-Playing Steering (SRPS), a novel framework that identifies and manipulates internal model features associated with role-playing behavior. Our approach extracts latent representations from role-play prompts, selects the most relevant features based on activation patterns, and constructs a steering vector that can be injected into the model’s residual stream with controllable intensity. Our method enables fine-grained control over role-specific behavior and offers insights into how role information influences internal model activations. Extensive experiments across various reasoning benchmarks and model sizes demonstrate consistent performance gains. Notably, in the zero-shot chain-of-thought (CoT) setting, the accuracy of Llama3.1-8B on CSQA improves from 31.86% to 39.80%, while Gemma2-9B on SVAMP increases from 37.50% to 45.10%. These results highlight the potential of SRPS to enhance reasoning ability in LLMs, providing better interpretability and stability compared to traditional prompt-based role-playing.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[110]

M. Wang, S. Chen, K. Kersting, V. Tresp and Y. Ma.
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding.
Preprint (Jun. 2025). arXiv

Abstract

Recent advances in Video Large Language Models (VLLMs) have significantly enhanced their ability to understand video content. Nonetheless, processing long videos remains challenging due to high computational demands and the redundancy present in the visual data. In this work, we propose METok, a training-free, Multi-stage Event-based Token compression framework designed to accelerate VLLMs’ inference while preserving accuracy. METok progressively eliminates redundant visual tokens across three critical stages: (1) event-aware compression during vision encoding, (2) hierarchical token pruning in the prefilling stage based on semantic alignment and event importance, and (3) a decoding-stage KV Cache optimization that further reduces memory consumption. Our experiments on diverse video benchmarks demonstrate that METok achieves an optimal trade-off between efficiency and accuracy by dynamically selecting informative visual tokens. For instance, equipping LongVA-7B with METok realizes an 80.6% FLOPs reduction and 93.5% KV Cache memory savings, all while maintaining comparable or even superior accuracy.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[109]

Y. Wang, J. Bi, Y. Ma and S. Pirk.
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM.
Preprint (Jun. 2025). arXiv

Abstract

Multimodal Large Language Model (MLLM) often suffer from hallucinations. They over-rely on partial cues and generate incorrect responses. Recently, methods like Visual Contrastive Decoding (VCD) and Instruction Contrastive Decoding (ICD) have been proposed to mitigate hallucinations by contrasting predictions from perturbed or negatively prefixed inputs against original outputs. In this work, we uncover that methods like VCD and ICD fundamentally influence internal attention dynamics of the model. This observation suggests that their effectiveness may not stem merely from surface-level modifications to logits but from deeper shifts in attention distribution. Inspired by this insight, we propose an attention-steerable contrastive decoding framework that directly intervenes in attention mechanisms of the model to offer a more principled approach to mitigating hallucinations. Our experiments across multiple MLLM architectures and diverse decoding methods demonstrate that our approach significantly reduces hallucinations and improves the performance on benchmarks such as POPE, CHAIR, and MMHal-Bench, while simultaneously enhancing performance on standard VQA benchmarks.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[108]

G. Zhang, T. Hannan, H. Kleiner, B. Aydemir, X. Xie, J. Lan, T. Seidl, V. Tresp and J. Gu.
AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction.
Preprint (Jun. 2025). arXiv

Abstract

An ideal vision-language agent serves as a bridge between the human users and their surrounding physical world in real-world applications like autonomous driving and embodied agents, and proactively provides accurate and timely responses given user intents. An intriguing challenge arises when agents interact with the world as a dynamic data stream and ad-hoc queries from users: supporting knowledge for queries, namely evidence, usually appears asynchronously with the arrival time of queries, and agents need to ground their responses in historical data, present observations, and even future streams. We frame this challenge as Query-Evidence Asynchrony, where user queries and their supporting evidence typically arrive asynchronously in the streaming setting. This setting requires not only strong reasoning capabilities but also the ability to retain past observations and respond to queries with temporal awareness. In this paper, we introduce a diagnostic benchmark that evaluates Multimodal Large Language Models (MLLMs) on their ability to handle interaction with streaming data. Further, we present AViLA, Asynchronous Video-Language Agent for streaming data interaction that can handle ad-hoc queries and give time-aware responses. For this purpose, AViLA consists of three key modules: comprehensive memory retention, evidence identification, and evidence-grounded trigger, that are designed to maintain a general-purpose memory and respond readily and timely to queries. Our experiments show that existing models often fail to respond at appropriate times, while AViLA significantly improves both accuracy and temporal awareness. Our code and dataset will be publicly available.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Tanveer Hannan

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[107]

Y. Zhang, H. Gao, H. Chen, W. Li, Y. Ma and V. Tresp.
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models.
Preprint (Jun. 2025). arXiv

Abstract

Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy LLM on clients, reducing client-side storage by 95%, and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[106]

Y. Zhang, C. Lin, S. Tang, H. Chen, S. Zhou, Y. Ma and V. Tresp.
SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence.
Preprint (Jun. 2025). arXiv GitHub

Abstract

The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated system multi-agent generation.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[105]

A. Koebler, T. Decker, I. Thon, V. Tresp and F. Buettner.
Incremental Uncertainty-aware Performance Monitoring with Active Labeling Intervention.
AISTATS 2025 - 28th International Conference on Artificial Intelligence and Statistics. Mai Khao, Thailand, May 03-05, 2025. To be published. URL

Abstract

We study the problem of monitoring machine learning models under gradual distribution shifts, where circumstances change slowly over time, often leading to unnoticed yet significant declines in accuracy. To address this, we propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates performance changes by modeling gradual shifts using optimal transport. In addition, IUPM quantifies the uncertainty in the performance prediction and introduces an active labeling procedure to restore a reliable estimate under a limited labeling budget. Our experiments show that IUPM outperforms existing performance estimation baselines in various gradual shift scenarios and that its uncertainty awareness guides label acquisition more effectively compared to other strategies.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[104]

J. Bi, D. Yan, Y. Wang, W. Huang, H. Chen, G. Wan, M. Ye, X. Xiao, H. Schütze, V. Tresp and Y. Ma.
CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process.
Preprint (May. 2025). arXiv

Abstract

Recent Large Reasoning Models significantly improve the reasoning ability of Large Language Models by learning to reason, exhibiting the promising performance in solving complex tasks. LRMs solve tasks that require complex reasoning by explicitly generating reasoning trajectories together with answers. Nevertheless, judging the quality of such an output answer is not easy because only considering the correctness of the answer is not enough and the soundness of the reasoning trajectory part matters as well. Logically, if the soundness of the reasoning part is poor, even if the answer is correct, the confidence of the derived answer should be low. Existing methods did consider jointly assessing the overall output answer by taking into account the reasoning part, however, their capability is still not satisfactory as the causal relationship of the reasoning to the concluded answer cannot properly reflected. In this paper, inspired by classical mechanics, we present a novel approach towards establishing a CoT-Kinetics energy equation. Specifically, our CoT-Kinetics energy equation formulates the token state transformation process, which is regulated by LRM internal transformer layers, as like a particle kinetics dynamics governed in a mechanical field. Our CoT-Kinetics energy assigns a scalar score to evaluate specifically the soundness of the reasoning phase, telling how confident the derived answer could be given the evaluated reasoning. As such, the LRM’s overall output quality can be accurately measured, rather than a coarse judgment (e.g., correct or incorrect) anymore.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[103]

H. Chen, Y. Zhang, Y. Bi, Y. Zhang, T. Liu, J. Bi, J. Lan, J. Gu, C. Grosser, D. Krompass, N. Navab and V. Tresp.
Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs.
Preprint (May. 2025). arXiv

Abstract

In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of LLMs. In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. By using various auditing algorithms, we evaluate the effectiveness and robustness of different unlearning strategies. To explore alternatives beyond prompt-based auditing, we propose a novel technique that leverages intermediate activation perturbations, addressing the limitations of auditing methods that rely solely on model inputs and outputs.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Yuan Bi

Computer Aided Medical Procedures & Augmented Reality

Yao Zhang

Database Systems and Data Mining AI Lab

Tong Liu

Database Systems and Data Mining AI Lab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[102]

J. Lan, Y. Fu, U. Schlegel, G. Zhang, T. Hannan, H. Chen and T. Seidl.
My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals.
Preprint (May. 2025). arXiv

Abstract

Social bias is a critical issue in large vision-language models (VLMs), where fairness- and ethics-related problems harm certain groups of people in society. It is unknown to what extent VLMs yield social bias in generative responses. In this study, we focus on evaluating and mitigating social bias on both the model’s response and probability distribution. To do so, we first evaluate four state-of-the-art VLMs on PAIRS and SocialCounterfactuals datasets with the multiple-choice selection task. Surprisingly, we find that models suffer from generating gender-biased or race-biased responses. We also observe that models are prone to stating their responses are fair, but indeed having mis-calibrated confidence levels towards particular social groups. While investigating why VLMs are unfair in this study, we observe that VLMs’ hidden layers exhibit substantial fluctuations in fairness levels. Meanwhile, residuals in each layer show mixed effects on fairness, with some contributing positively while some lead to increased bias. Based on these findings, we propose a post-hoc method for the inference stage to mitigate social bias, which is training-free and model-agnostic. We achieve this by ablating bias-associated residuals while amplifying fairness-associated residuals on model hidden layers during inference. We demonstrate that our post-hoc method outperforms the competing training strategies, helping VLMs have fairer responses and more reliable confidence levels.

MCML Authors

Udo Schlegel

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Tanveer Hannan

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Database Systems and Data Mining AI Lab

[101]

M. Wang, L. Lange, H. Adel, Y. Ma, J. Strötgen and H. Schütze.
Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes.
Preprint (May. 2025). arXiv

Abstract

Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps. However, language mixing, i.e., reasoning steps containing tokens from languages other than the prompt, has been observed in their outputs and shown to affect performance, though its impact remains debated. We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages, 7 task difficulty levels, and 18 subject areas, and show how all three factors influence language mixing. Moreover, we demonstrate that the choice of reasoning language significantly affects performance: forcing models to reason in Latin or Han scripts via constrained decoding notably improves accuracy. Finally, we show that the script composition of reasoning traces closely aligns with that of the model’s internal representations, indicating that language mixing reflects latent processing preferences in RLMs. Our findings provide actionable insights for optimizing multilingual reasoning and open new directions for controlling reasoning languages to build more interpretable and adaptable RLMs.

MCML Authors

Mingyang Wang

Computational Linguistics

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Computational Linguistics

[100]

Z. Li, S. Yan, Y. Ma, Y. Li, X. Lyu and M. Schubert.
Beyond Single-Step: Multi-Frame Action-Conditiones Video Generation for Reinforcement Learning Environments.
World Models @ICLR 2025 - Workshop on World Models: Understanding, Modelling and Scaling at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

World models achieved great success in learning the dynamics from both low-dimensional and high-dimensional states. Yet, there is no existing work to address multi-step generation task with high dimensional data. In this paper, we propose the first action-conditioned multi-frame video generation model, advancing world
model development by generating future states from actions. As opposed to recent single-step or autoregressive approaches, our model directly generates multiple future frames conditioned on past observations and action sequences. Our framework extends its capabilities to action-conditioned video generation by introducing an action encoder. This addition enables the spatiotemporal variational autoencoder and diffusion transformer in Open-Sora to effectively incorporate action information, ensuring precise and coherent video generation. We evaluated performance on Atari environments (Breakout, Pong, DemonAttack) using MSE, PSNR, and LPIPS. Results show that conditioning solely on future actions and embedding-based encoding improve generation accuracy and perceptual quality while capturing complex temporal dependencies like inertia. Our work paves the way for action-conditioned multi-step generative world models in dynamic environment.

MCML Authors

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[99]

T. Decker, V. Tresp and F. Buettner.
Why Uncertainty Calibration Matters for Reliable Perturbation-based Explanations.
XAI4Science @ICLR 2025 - Workshop XAI4Science: From Understanding Model Behavior to Discovering New Scientific Knowledge at the 13th International Conference on Learning Representations (ICLR 2025). Singapore, Apr 24-28, 2025. URL

Abstract

Perturbation-based explanations are widely utilized to enhance the transparency of modern machine-learning models. However, their reliability is often compromised by the unknown model behavior under the specific perturbations used. This paper investigates the relationship between uncertainty calibration - the alignment of model confidence with actual accuracy - and perturbation-based explanations. We show that models frequently produce unreliable probability estimates when subjected to explainability-specific perturbations and theoretically prove that this directly undermines explanation quality. To address this, we introduce ReCalX, a novel approach to recalibrate models for improved perturbation-based explanations while preserving their original predictions. Experiments on popular computer vision models demonstrate that our calibration strategy produces explanations that are more aligned with human perception and actual object locations.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[98]

Z. Ding.
Inductive representation learning and natural language question answering on temporal knowledge graphs.
Dissertation 2025. DOI

Abstract

Real-world applications such as recommendersystems, socialnetworks, andprotein-protein interactions often involve relational data. In recent years, there has been increasing interest in machine learning on such data, particularly in the context of knowledge graphs (KGs). KGs are structured relational data that store multi-relational information as directed graphs, where each node corresponds to an entity and each labeled edge represents a factual relationship between entities, e.g., (Oxford, located in, the United Kingdom). Traditional KGs assume time-invariant relationships. However, real-world relationships are dynamically evolving over time. For example, the chancellor of Germany in 2020 was Angela Merkel, but in 2022 it became Olaf Scholz. This necessitates the use of temporal knowledge graphs (TKGs), where temporal facts are introduced by coupling stationary facts with additional time identifiers, e.g., (Angela Merkel, is chancellor of, Germany, 2020). TKGs are more expressive than KGs as they model the temporal evolution of knowledge. Consequently, recent research has paid more attention to machine learning on TKGs. In this thesis, we focus on two machine learning problems: inductive knowledge representation learning and natural language question answering (QA) on TKGs. (Shortened)

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

[97]

G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
3DV 2025 - 12th International Conference on 3D Vision. Singapore, Mar 25-28, 2025. To be published. Preprint available. arXiv

Abstract

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.

MCML Authors

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Ruotong Liao

Database Systems and Data Mining AI Lab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Federico Tombari

PD Dr.

Computer Aided Medical Procedures & Augmented Reality

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality

[96]

R. Amoroso, G. Zhang, R. Koner, L. Baraldi, R. Cucchiara and V. Tresp.
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI

Abstract

Video Question Answering (Video QA) is a challenging video understanding task that requires models to comprehend entire videos, identify the most relevant information based on contextual cues from a given question, and reason accurately to provide answers. Recent advancements in Multimodal Large Language Models (MLLMs) have transformed video QA by leveraging their exceptional commonsense reasoning capabilities. This progress is largely driven by the effective alignment between visual data and the language space of MLLMs. However, for video QA, an additional space-time alignment poses a considerable challenge for extracting question-relevant information across frames. In this work, we investigate diverse temporal modeling techniques to integrate with MLLMs, aiming to achieve question-guided temporal modeling that leverages pre-trained visual and textual alignment in MLLMs. We propose T-Former, a novel temporal modeling method that creates a question-guided temporal bridge between frame-wise visual perception and the reasoning capabilities of LLMs. Our evaluation across multiple video QA benchmarks demonstrates that T-Former competes favorably with existing temporal modeling approaches and aligns with recent advancements in video QA.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Rajat Koner

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[95]

S. Chen, Z. Han, B. He, J. Liu, M. Buckley, Y. Qin, P. Torr, V. Tresp and J. Gu.
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI URL

Abstract

Large Language Models (LLMs) with in-context learning (ICL) ability can quickly adapt to a specific context given a few demonstrations (demos). Recently, Multimodal Large Language Models (MLLMs) built upon LLMs have also shown multimodal ICL ability, i.e., responding to queries given a few multimodal demos, including images, queries, and answers. While ICL has been extensively studied on LLMs, its research on MLLMs remains limited. One essential question is whether these MLLMs can truly conduct multimodal ICL, or if only the textual modality is necessary. We investigate this question by examining two primary factors that influence ICL: 1) Demo content, i.e., understanding the influences of demo content in different modalities. 2) Demo selection strategy, i.e., how to select better multimodal demos for improved performance. Experiments revealed that multimodal ICL is predominantly driven by the textual content whereas the visual information in the demos has little influence. Interestingly, visual content is still necessary and useful for selecting demos to increase performance. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demos. Extensive experiments are conducted to support our findings and verify the improvement brought by our method.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[94]

Y. Zhang, H. Chen, A. Frikha, Y. Yang, D. Krompass, G. Zhang, J. Gu and V. Tresp.
CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. DOI

Abstract

Visual Question Answering (VQA) systems witnessed a significant advance in recent years due to the development of large-scale Vision-Language Pre-trained Models (VLPMs). As the application scenario and user demand change over time, an advanced VQA system is expected to be capable of continuously expanding its knowledge and capabilities over time, not only to handle new tasks (i.e., new question types or visual scenes) but also to answer questions in new specialized domains without forgetting previously acquired knowledge and skills. Existing works studying CL on VQA tasks primarily consider answer- and question-type incremental learning or scene- and function-incremental learning, whereas how VQA systems perform when they encounter new domains and increasing user demands has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, through which we conduct extensive experiments on 4 VLPMs, 5 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon of the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches can help mitigate forgetting in VLPMs, and how to design CL approaches suitable for VLPMs in this challenging continual learning environment. To facilitate future work on developing an advanced All-in-One VQA system, we will release our datasets and code.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Haokun Chen

Database Systems and Data Mining AI Lab

Ahmed Frikha

Dr.

* Former Member

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[93]

H. Chen, D. Krompass, J. Gu and V. Tresp.
FedPop: Federated Population-based Hyperparameter Tuning.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their ’training-after-tuning’ framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both the client and server sides. Compared with prior tuning methods, FedPop employs an online ’tuning-while-training’ framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets, including full-sized Non-IID ImageNet-1K, demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP-tuning methods in FL.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[92]

Y. Zhang, Z. Ma, Y. Ma, Z. Han, Y. Wu and V. Tresp.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. DOI

Abstract

LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[91]

J. Bi, Y. Wang, D. Yan, X. Xiao, A. Hecker, V. Tresp and Y. Ma.
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection.
Preprint (Feb. 2025). arXiv

Abstract

Visual instruction tuning refines pre-trained Multimodal Large Language Models (MLLMs) to enhance their real-world task performance. However, the rapid expansion of visual instruction datasets introduces significant data redundancy, leading to excessive computational costs. Existing data selection methods predominantly rely on proxy models or loss-based metrics, both of which impose substantial computational overheads due to the necessity of model inference and backpropagation. To address this challenge, we propose PRISM, a novel training-free approach for efficient multimodal data selection. Unlike existing methods, PRISM eliminates the reliance on proxy models, warm-up pretraining, and gradient-based optimization. Instead, it leverages Pearson correlation analysis to quantify the intrinsic visual encoding properties of MLLMs, computing a task-specific correlation score to identify high-value instances. This not only enbles data-efficient selection,but maintains the original performance. Empirical evaluations across multiple MLLMs demonstrate that PRISM reduces the overall time required for visual instruction tuning and data selection to just 30% of conventional methods, while surpassing fully fine-tuned models across eight multimodal and three language understanding benchmarks, achieving a 101.7% relative improvement in final performance.

MCML Authors

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[90]

G. Zhang, M. Ding, T. Liu, Y. Zhang and V. Tresp.
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs.
Preprint (Feb. 2025). arXiv

Abstract

Multimodal large language models (MLLMs) have demonstrated strong performance in understanding videos holistically, yet their ability to process streaming videos-videos are treated as a sequence of visual events-remains underexplored. Intuitively, leveraging past events as memory can enrich contextual and temporal understanding of the current event. In this paper, we show that leveraging memories as contexts helps MLLMs better understand video events. However, because such memories rely on predictions of preceding events, they may contain misinformation, leading to confabulation and degraded performance. To address this, we propose a confabulation-aware memory modification method that mitigates confabulated memory for memory-enhanced event understanding.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Tong Liu

Database Systems and Data Mining AI Lab

Yao Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

2024

[89]

A. Koebler, T. Decker, I. Thon, V. Tresp and F. Buettner.
Incremental Uncertainty-aware Performance Monitoring with Labeling Intervention.
BDU @NeurIPS 2024 - Workshop Bayesian Decision-making and Uncertainty: from probabilistic and spatiotemporal modeling to sequential experiment design at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. URL

Abstract

We study the problem of monitoring machine learning models under temporal distribution shifts, where circumstances change gradually over time, often leading to unnoticed yet significant declines in accuracy. We propose Incremental Uncertainty-aware Performance Monitoring (IUPM), a novel label-free method that estimates model performance by modeling time-dependent shifts using optimal transport. IUPM also quantifies uncertainty in performance estimates and introduces an active labeling strategy to reduce this uncertainty. We further showcase the benefits of IUPM on different datasets and simulated temporal shifts over existing baselines.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[88]

T. Hannan, R. Koner, M. Bernhard, S. Shit, B. Menze, V. Tresp, M. Schubert and T. Seidl.
GRAtt-VIS: Gated Residual Attention for Video Instance Segmentation.
ICPR 2024 - 27th International Conference on Pattern Recognition. Kolkata, India, Dec 01-05, 2024. DOI GitHub

Abstract

Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce textbf{GRAtt-VIS}, textbf{G}ated textbf{R}esidual textbf{Att}ention for textbf{V}ideo textbf{I}nstance textbf{S}egmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts the unrepresentative instance queries in the self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as textbf{GRAtt} block, which can easily be integrated into the existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods.

MCML Authors

Tanveer Hannan

Database Systems and Data Mining AI Lab

Rajat Koner

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Maximilian Bernhard

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Database Systems and Data Mining AI Lab

[87]

Y. Liu, Y. Zhang, Q. Li, T. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy.
EMNLP 2024 - Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Full-parameter fine-tuning has become the go-to choice for adapting language models (LMs) to downstream tasks due to its excellent performance. As LMs grow in size, fine-tuning the full parameters of LMs requires a prohibitively large amount of GPU memory. Existing approaches utilize zeroth-order optimizer to conserve GPU memory, which can potentially compromise the performance of LMs as non-zero order optimizers tend to converge more readily on most downstream tasks. In this paper, we propose a novel optimizer-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. HiFT can significantly reduce the amount of gradients and optimizer state parameters residing in GPU memory at the same time, thereby reducing GPU memory usage. Our results demonstrate that: (1) HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full parameter fine-tuning. (2) HiFT supports various optimizers including AdamW, AdaGrad, SGD, etc. (3) HiFT can save more than 60% GPU memory compared with standard full-parameter fine-tuning for 7B model. (4) HiFT enables full-parameter fine-tuning of a 7B model on single 48G A6000 with a precision of 32 using the AdamW optimizer, without using any memory saving techniques.

MCML Authors

Yongkang Liu

Dr.

* Former Member

Tong Liu

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[86]

Z. Ding, J. Wu, J. Wu, Y. Xia and V. Tresp.
Temporal Fact Reasoning over Hyper-Relational Knowledge Graphs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. Meanwhile, as discussed in recent works that focus on temporal KGs (TKGs), world knowledge is ever-evolving, making it important to reason over temporal facts in KGs. Previous mainstream benchmark HKGs do not explicitly specify temporal information for each HKG fact. Therefore, almost all existing HKG reasoning approaches do not devise any module specifically for temporal reasoning. To better study temporal fact reasoning over HKGs, we propose a new type of data structure named hyper-relational TKG (HTKG). Every fact in an HTKG is coupled with a timestamp explicitly indicating its time validity. We develop two new benchmark HTKG datasets, i.e., Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models hyper-relational temporal facts. To support future research on this topic, we open-source our datasets and model.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Yan Xia

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[85]

R. Liao, M. Erler, H. Wang, G. Zhai, G. Zhang, Y. Ma and V. Tresp.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI GitHub

Abstract

In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[84]

H. Zhang, J. Liu, Z. Han, S. Chen, B. He, V. Tresp, Z. Xu and J. Gu.
Visual Question Decomposition on Multimodal Large Language Models.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI

Abstract

Question decomposition has emerged as an effective strategy for prompting Large Language Models (LLMs) to answer complex questions. However, while existing methods primarily focus on unimodal language models, the question decomposition capability of Multimodal Large Language Models (MLLMs) has yet to be explored. To this end, this paper explores visual question decomposition on MLLMs. Specifically, we introduce a systematic evaluation framework including a dataset and several evaluation criteria to assess the quality of the decomposed sub-questions, revealing that existing MLLMs struggle to produce high-quality sub-questions. To address this limitation, we propose a specific finetuning dataset, DecoVQA+, for enhancing the model’s question decomposition capability. Aiming at enabling models to perform appropriate selective decomposition, we propose an efficient finetuning pipeline. The finetuning pipeline consists of our proposed dataset and a training objective for selective decomposition. Finetuned MLLMs demonstrate significant improvements in the quality of sub-questions and the policy of selective question decomposition. Additionally, the models also achieve higher accuracy with selective decomposition on VQA benchmark datasets.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[83]

Y. Sun, Z. Wu, Y. Ma and V. Tresp.
Quantum Architecture Search with Unsupervised Representation Learning.
Preprint (Oct. 2024). arXiv

Abstract

Unsupervised representation learning presents new opportunities for advancing Quantum Architecture Search (QAS) on Noisy Intermediate-Scale Quantum (NISQ) devices. QAS is designed to optimize quantum circuits for Variational Quantum Algorithms (VQAs). Most QAS algorithms tightly couple the search space and search algorithm, typically requiring the evaluation of numerous quantum circuits, resulting in high computational costs and limiting scalability to larger quantum circuits. Predictor-based QAS algorithms mitigate this issue by estimating circuit performance based on structure or embedding. However, these methods often demand time-intensive labeling to optimize gate parameters across many circuits, which is crucial for training accurate predictors. Inspired by the classical neural architecture search algorithm Arch2vec, we investigate the potential of unsupervised representation learning for QAS without relying on predictors. Our framework decouples unsupervised architecture representation learning from the search process, enabling the learned representations to be applied across various downstream tasks. Additionally, it integrates an improved quantum circuit graph encoding scheme, addressing the limitations of existing representations and enhancing search efficiency. This predictor-free approach removes the need for large labeled datasets. During the search, we employ REINFORCE and Bayesian Optimization to explore the latent representation space and compare their performance against baseline methods. Our results demonstrate that the framework efficiently identifies high-performing quantum circuits with fewer search iterations.

MCML Authors

Yize Sun

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[82]

G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI

Abstract

MCML Authors

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Ruotong Liao

Database Systems and Data Mining AI Lab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Federico Tombari

PD Dr.

Computer Aided Medical Procedures & Augmented Reality

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality

[81]

S. Gilhuber, A. Beer, Y. Ma and T. Seidl.
FALCUN: A Simple and Efficient Deep Active Learning Strategy.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI

Abstract

We propose FALCUN, a novel deep batch active learning method that is label- and time-efficient. Our proposed acquisition uses a natural, self-adjusting balance of uncertainty and diversity: It slowly transitions from emphasizing uncertain instances at the decision boundary to emphasizing batch diversity. In contrast, established deep active learning methods often have a fixed weighting of uncertainty and diversity, limiting their effectiveness over diverse data sets exhibiting different characteristics. Moreover, to increase diversity, most methods demand intensive search through a deep neural network’s high-dimensional latent embedding space. This leads to high acquisition times when experts are idle while waiting for the next batch for annotation. We overcome this structural problem by exclusively operating on the low-dimensional probability space, yielding much faster acquisition times without sacrificing label efficiency. In extensive experiments, we show FALCUN’s suitability for diverse use cases, including medical images and tabular data. Compared to state-of-the-art methods like BADGE, CLUE, and AlfaMix, FALCUN consistently excels in quality and speed: while FALCUN is among the fastest methods, it has the highest average label efficiency.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

B2 | Natural Language Processing
→ Group Hinrich Schütze

Database Systems and Data Mining AI Lab

[80]

Y. Liu, E. Nie, S. Feng, Z. Hua, Z. Ding, D. Wang, Y. Zhang and H. Schütze.
A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation.
ECML-PKDD 2024 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Vilnius, Lithuania, Sep 09-13, 2024. DOI GitHub

Abstract

Current state-of-the-art dialogue systems heavily rely on extensive training datasets. However, challenges arise in domains where domain-specific training datasets are insufficient or entirely absent. To tackle this challenge, we propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMDG. The AMDG framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training. We posit that domain corpora are a blend of domain-agnostic and domain-specific features, with certain representation patterns shared among diverse domains. Domain-agnostic training aims to enable models to learn these common expressive patterns. To construct domain-agnostic dialogue corpora, we employ a de-domaining data processing technique used to remove domain-specific features. By mitigating the effects of domain-specific features, the model trained on the de-domained corpora can effectively learn common expression patterns in different domains. Subsequently, we adapt the learned domain-agnostic features to the target domain through domain adaptation training. We conduct experiments on Chinese dialogue datasets from five different domains and show that AMDG achieves superior performance compared to both direct training on the target domain corpus and collective training on all five domain corpora. Our work underscores AMDG as a viable alternative solution for low-resource multi-domain dialogue generation.

MCML Authors

Yongkang Liu

Dr.

* Former Member

Ercong Nie

B2 | Natural Language Processing
→ Group Hinrich Schütze

Computational Linguistics

Zifeng Ding

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

[79]

T. Decker, A. Koebler, M. Lebacher, I. Thon, V. Tresp and F. Buettner.
Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance.
KDD 2024 - 30th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Barcelona, Spain, Aug 25-29, 2024. DOI

Abstract

Monitoring and maintaining machine learning models are among the most critical challenges in translating recent advances in the field into real-world applications. However, current monitoring methods lack the capability of provide actionable insights answering the question of why the performance of a particular model really degraded. In this work, we propose a novel approach to explain the behavior of a black-box model under feature shifts by attributing an estimated performance change to interpretable input characteristics. We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation (XPE). We analyze the underlying assumptions and demonstrate the superiority of our approach over several baselines on different data sets across various data modalities such as images, audio, and tabular data. We also indicate how the generated results can lead to valuable insights, enabling explanatory model monitoring by revealing potential root causes for model deterioration and guiding toward actionable countermeasures.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[78]

T. Liu, I. Škrjanec and V. Demberg.
Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the 'right reasons'?
ACL 2024 - 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand, Aug 11-16, 2024. DOI

Abstract

A wide body of evidence shows that human language processing difficulty is predicted by the information-theoretic measure surprisal, a word’s negative log probability in context. However, it is still unclear how to best estimate these probabilities needed for predicting human processing difficulty – while a long-standing belief held that models with lower perplexity would provide more accurate estimates of word predictability, and therefore lead to better reading time predictions, recent work has shown that for very large models, psycholinguistic predictive power decreases. One reason could be that language models might be more confident of their predictions than humans, because they have had exposure to several magnitudes more data. In this paper, we test what effect temperature-scaling of large language model (LLM) predictions has on surprisal estimates and their predictive power of reading times of English texts. Firstly, we show that calibration of large language models typically improves with model size, i.e. poorer calibration cannot account for poorer fit to reading times. Secondly, we find that temperature-scaling probabilities lead to a systematically better fit to reading times (up to 89% improvement in delta log likelihood), across several reading time corpora. Finally, we show that this improvement in fit is chiefly driven by words that are composed of multiple subword tokens.

MCML Authors

Tong Liu

Database Systems and Data Mining AI Lab

[77]

T. Decker, A. R. Bhattarai, J. Gu, V. Tresp and F. Buettner.
Provably Better Explanations with Optimized Aggregation of Feature Attributions.
ICML 2024 - 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. URL

Abstract

Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by combining multiple explanations across distinct methods or their variations. For this purpose, we propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria such as robustness or faithfulness to the model behavior. Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual methods and existing baselines.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[76]

Y. Sun, J. Liu, Z. Wu, Z. Ding, Y. Ma, T. Seidl and V. Tresp.
SA-DQAS: Self-attention Enhanced Differentiable Quantum Architecture Search.
ICML 2024 - Workshop Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators at the 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. PDF

Abstract

We introduce SA-DQAS in this paper, a novel framework that enhances the gradient-based Differentiable Quantum Architecture Search (DQAS) with a self-attention mechanism, aimed at optimizing circuit design for Quantum Machine Learning (QML) challenges. Analogous to a sequence of words in a sentence, a quantum circuit can be viewed as a sequence of placeholders containing quantum gates. Unlike DQAS, each placeholder is independent, while the self-attention mechanism in SA-DQAS helps to capture relation and dependency information among each operation candidate placed on placeholders in a circuit. To evaluate and verify, we conduct experiments on job-shop scheduling problems (JSSP), Max-cut problems, and quantum fidelity. Incorporating self-attention improves the stability and performance of the resulting quantum circuits and refines their structural design with higher noise resilience and fidelity. Our research demonstrates the first successful integration of self-attention with DQAS.

MCML Authors

Yize Sun

Database Systems and Data Mining AI Lab

Zifeng Ding

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[75]

H. Li, C. Shen, P. Torr, V. Tresp and J. Gu.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI GitHub

Abstract

Diffusion-based models have gained significant popularity for text-to-image generation due to their exceptional image-generation capabilities. A risk with these models is the potential generation of inappropriate content, such as biased or harmful images. However, the underlying reasons for generating such undesired content from the perspective of the diffusion model’s internal representation remain unclear. Previous work interprets vectors in an interpretable latent space of diffusion models as semantic concepts. However, existing approaches cannot discover directions for arbitrary concepts, such as those related to inappropriate concepts. In this work, we propose a novel self-supervised approach to find interpretable latent directions for a given concept. With the discovered vectors, we further propose a simple approach to mitigate inappropriate generation. Extensive experiments have been conducted to verify the effectiveness of our mitigation approach, namely, for fair generation, safe generation, and responsible text-enhancing generation.

MCML Authors

Hang Li

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[74]

Y. Xia, L. Shi, Z. Ding, J. F. Henriques and D. Cremers.
Text2Loc: 3D Point Cloud Localization from Natural Language.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI GitHub

Abstract

We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, relational dynamics among each textual hint are captured in a hierarchical transformer with max-pooling (HTM), whereas a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions, which completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves the localization accuracy by up to 2 × over the state-of-the-art on the KITTI360Pose dataset.

MCML Authors

Yan Xia

Dr.

* Former Member

Zifeng Ding

Database Systems and Data Mining AI Lab

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

[73]

Z. Ding, H. Cai, J. Wu, Y. Ma, R. Liao, B. Xiong and V. Tresp.
zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. URL

Abstract

Modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a heated topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasting (TKGF) benchmarks, they face a strong challenge in modeling the unseen zero-shot relations that have no prior graph context. In this paper, we try to mitigate this problem as follows. We first input the text descriptions of KG relations into large language models (LLMs) for generating relation representations, and then introduce them into embedding-based TKGF methods. LLM-empowered representations can capture the semantic information in the relation descriptions. This makes the relations, whether seen or unseen, with similar semantic meanings stay close in the embedding space, enabling TKGF models to recognize zero-shot relations even without any observed graph context. Experimental results show that our approach helps TKGF models to achieve much better performance in forecasting the facts with previously unseen relations, while still maintaining their ability in link forecasting regarding seen relations.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[72]

R. Liao, X. Jia, Y. Li, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
NAACL 2024 - Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024. URL GitHub

Abstract

The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional embedding-based and rule-based methods dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex temporal graph data structure and sequential natural expressions LLMs can handle, and between the enormous data sizes of tKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval-augmented generation framework named GenTKG combining a temporal logical rule-based retrieval strategy and few-shot parameter-efficient instruction tuning to solve the above challenges, respectively. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting with low computation resources using extremely limited training data as few as 16 samples. GenTKG also highlights remarkable cross-domain generalizability with outperforming performance on unseen datasets without re-training, and in-domain generalizability regardless of time split in the same dataset. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[71]

Y. Han, Z. Ding, Y. Liu, B. He and V. Tresp.
Critical Path Identification in Supply Chain Knowledge Graphs with Large Language Models.
ESWC 2024 - Extended Semantic Web Conference. Hersonissos, Crete, Greece, May 26-30, 2024. DOI

Abstract

In the ever-evolving landscape of global commerce, supply chain management (SCM) has gained increasing significance. An important task in SCM is to find critical supply chain paths for a target company because these paths often represent potential bottlenecks in supply networks and thus could be crucial to risk management. The mainstream solution to this task requires supply chain managers to manually review supply chain data to uncover critical paths, resulting in considerable human labor costs. To better study SCM, recent efforts have been made to construct supply chain knowledge graphs (KGs) that connect supply chain-related data from different sources, facilitating the identification of critical paths through KG reasoning. In this paper, we develop an automated approach for critical path identification (CPI) based on supply chain KGs. We encode supply chain KGs into text and use large language models (LLMs) for CPI. LLMs can not only analyze the topological KG information but also leverage their world knowledge for better path identification. We experiment with two popular LLMs, i.e., GPT-3.5 and GPT-4, and find that they are able to do CPI and meanwhile generate reasonable explanations.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[70]

S. Chen, Z. Han, B. He, M. Buckley, P. Torr, V. Tresp and J. Gu.
Understanding and Improving In-Context Learning on Vision-language Models.
ME-FoMo @ICLR 2024 - Workshop on Mathematical and Empirical Understanding of Foundation Models at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL

Abstract

Recently, in-context learning (ICL) on large language models (LLMs) has received great attention, and this technique can also be applied to vision-language models (VLMs) built upon LLMs. These VLMs can respond to queries by conditioning responses on a series of multimodal demonstrations, which comprise images, queries, and answers. Though ICL has been extensively studied on LLMs, its research on VLMs remains limited. The inclusion of additional visual information in the demonstrations motivates the following research questions: which of the two modalities in the demonstration is more significant? How can we select effective multimodal demonstrations to enhance ICL performance? This study investigates the significance of both visual and language information. Our findings indicate that ICL in VLMs is predominantly driven by the textual information in the demonstrations whereas the visual information in the demonstrations barely affects the ICL performance. Subsequently, we provide an understanding of the findings by analyzing the model information flow and comparing model inner states given different ICL settings. Motivated by our analysis, we propose a simple yet effective approach, termed Mixed Modality In-Context Example Selection (MMICES), which considers both visual and language modalities when selecting demonstrations and shows better ICL performance. Extensive experiments are conducted to support our findings, understanding, and improvement of the ICL performance of VLMs.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[69]

S. Chen, Z. Han, B. He, Z. Ding, W. Yu, P. Torr, V. Tresp and J. Gu.
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
SeT LLM @ICLR 2024 - Workshop on Secure and Trustworthy Large Language Models at the 12th International Conference on Learning Representations (ICLR 2024). Vienna, Austria, May 07-11, 2024. URL

Abstract

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates the performance reproduction and fair comparison. Besides, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs, such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source LLMs and MLLMs. (2) Llama2 and Qwen-VL-Chat are more robust compared to other open-source models. (3) The transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Zifeng Ding

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[68]

H. Chen, Y. Zhang, D. Krompass, J. Gu and V. Tresp.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI

Abstract

Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Aadapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Yao Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

[67]

M. Bernhard, R. Amoroso, Y. Kindermann, M. Schubert, L. Baraldi, R. Cucchiara and V. Tresp.
What's Outside the Intersection? Fine-grained Error Analysis for Semantic Segmentation Beyond IoU.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI GitHub

Abstract

Semantic segmentation represents a fundamental task in computer vision with various application areas such as autonomous driving, medical imaging, or remote sensing. For evaluating and comparing semantic segmentation models, the mean intersection over union (mIoU) is currently the gold standard. However, while mIoU serves as a valuable benchmark, it does not offer insights into the types of errors incurred by a model. Moreover, different types of errors may have different impacts on downstream applications. To address this issue, we propose an intuitive method for the systematic categorization of errors, thereby enabling a fine-grained analysis of semantic segmentation models. Since we assign each erroneous pixel to precisely one error type, our method seamlessly extends the popular IoU-based evaluation by shedding more light on the false positive and false negative predictions. Our approach is model- and dataset-agnostic, as it does not rely on additional information besides the predicted and ground-truth segmentation masks. In our experiments, we demonstrate that our method accurately assesses model strengths and weaknesses on a quantitative basis, thus reducing the dependence on time-consuming qualitative model inspection. We analyze a variety of state-of-the-art semantic segmentation models, revealing systematic differences across various architectural paradigms. Exploiting the gained insights, we showcase that combining two models with complementary strengths in a straightforward way is sufficient to consistently improve mIoU, even for models setting the current state of the art on ADE20K.

MCML Authors

Maximilian Bernhard

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[66]

U. Sahin, H. Li, Q. Khan, D. Cremers and V. Tresp.
Enhancing Multimodal Compositional Reasoning of Visual Language Models With Generative Negative Mining.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI GitHub

Abstract

Contemporary large-scale visual language models (VLMs) exhibit strong representation capacities, making them ubiquitous for enhancing image and text understanding tasks. They are often trained in a contrastive manner on a large and diverse corpus of images and corresponding text captions scraped from the internet. Despite this, VLMs often struggle with compositional reasoning tasks which require a fine-grained understanding of the complex interactions of objects and their attributes. This failure can be attributed to two main factors: 1) Contrastive approaches have traditionally focused on mining negative examples from existing datasets. However, the mined negative examples might not be difficult for the model to discriminate from the positive. An alternative to mining would be negative sample generation 2) But existing generative approaches primarily focus on generating hard negative texts associated with a given image. Mining in the other direction, i.e., generating negative image samples associated with a given text has been ignored. To overcome both these limitations, we propose a framework that not only mines in both directions but also generates challenging negative samples in both modalities, i.e., images and texts. Leveraging these generative hard negative samples, we significantly enhance VLMs’ performance in tasks involving multimodal compositional reasoning.

MCML Authors

Hang Li

* Former Member

Qadeer Khan

Computer Vision & Artificial Intelligence

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[65]

G. Zhang, Y. Zhang, K. Zhang and V. Tresp.
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI GitHub

Abstract

Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings. One example is that humans can reason where and when an image is taken based on their knowledge. This makes us wonder if, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can achieve and even surpass human capability in reasoning times and location. To address this question, we propose a two-stage Recognition & Reasoning probing task applied to discriminative and generative VLMs to uncover whether VLMs can recognize times and location-relevant features and further reason about it. To facilitate the studies, we introduce WikiTiLo, a well-curated image dataset compromising images with rich socio-cultural cues. In extensive evaluation experiments, we find that although VLMs can effectively retain times and location-relevant features in visual encoders, they still fail to make perfect reasoning with context-conditioned visual features.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

2023

[64]

S. Chen, J. Gu, Z. Han, Y. Ma, P. Torr and V. Tresp.
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL GitHub

Abstract

Various adaptation methods, such as LoRA, prompts, and adapters, have been proposed to enhance the performance of pre-trained vision-language models in specific domains. As test samples in real-world applications usually differ from adaptation data, the robustness of these adaptation methods against distribution shifts are essential. In this study, we assess the robustness of 11 widely-used adaptation methods across 4 vision-language datasets under multimodal corruptions. Concretely, we introduce 7 benchmark datasets, including 96 visual and 87 textual corruptions, to investigate the robustness of different adaptation methods, the impact of available adaptation examples, and the influence of trainable parameter size during adaptation. Our analysis reveals that: 1) Adaptation methods are more sensitive to text corruptions than visual corruptions. 2) Full fine-tuning does not consistently provide the highest robustness; instead, adapters can achieve better robustness with comparable clean performance. 3) Contrary to expectations, our findings indicate that increasing the number of adaptation data and parameters does not guarantee enhanced robustness; instead, it results in even lower robustness. We hope this study could benefit future research in the development of robust multimodal adaptation methods.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[63]

R. Liao, X. Jia, Y. Ma and V. Tresp.
GenTKG: Generative Forecasting on Temporal Knowledge Graph.
TGL @NeurIPS 2023 - Workshop Temporal Graph Learning at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

The rapid advancements in large language models (LLMs) have ignited interest in the realm of the temporal knowledge graph (TKG) domain, where conventional carefully designed embedding-based and rule-based models dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex graph data structure and sequential natural expressions LLMs can handle, and between the enormous data volume of TKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval augmented generation framework named GenTKG combining a temporal logical rule-based retrieval strategy and lightweight few-shot parameter-efficient instruction tuning to solve the above challenges. Extensive experiments have shown that GenTKG is a simple but effective, efficient, and generalizable approach that outperforms conventional methods on temporal relational forecasting with extremely limited computation. Our work opens a new frontier for the temporal knowledge graph domain.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[62]

A. Koebler, T. Decker, M. Lebacher, I. Thon, V. Tresp and F. Buettner.
Towards Explanatory Model Monitoring.
XAIA @NeurIPS 2023 - Workshop XAI in Action: Past, Present, and Future Applications at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL

Abstract

Monitoring machine learning systems and efficiently recovering their reliability after performance degradation are two of the most critical issues in real-world applications. However, current monitoring strategies lack the capability to provide actionable insights answering the question of why the performance of a particular model really degraded. To address this, we propose Explanatory Performance Estimation (XPE) as a novel method that facilitates more informed model monitoring and maintenance by attributing an estimated performance change to interpretable input features. We demonstrate the superiority of our approach compared to natural baselines on different data sets. We also discuss how the generated results lead to valuable insights that can reveal potential root causes for model deterioration and guide toward actionable countermeasures.

MCML Authors

Thomas Decker

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[61]

G. Zhang, J. Bi, J. Gu, Y. Chen and V. Tresp.
SPOT! Revisiting Video-Language Models for Event Understanding.
Preprint (Dec. 2023). arXiv

Abstract

Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only broad-level video captions. This raises a question: with such weak supervision, can video representation in video-language models gain the ability to distinguish even factual discrepancies in textual description and understand fine-grained events? To address this, we introduce SPOT Prober, to benchmark existing video-language models’s capacities of distinguishing event-level discrepancies as an indicator of models’ event understanding ability. Our approach involves extracting events as tuples (<Subject, Predicate, Object, Attribute, Timestamps>) from videos and generating false event tuples by manipulating tuple components systematically. We reevaluate the existing video-language models with these positive and negative captions and find they fail to distinguish most of the manipulated events. Based on our findings, we propose to plug in these manipulated event captions as hard negative samples and find them effective in enhancing models for event understanding.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[60]

Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
FORECASTTKGQUESTIONS: A Benchmark for Temporal Question Answering and Forecasting over Temporal Knowledge Graphs.
ISWC 2023 - 22nd International Semantic Web Conference. Athens, Greeke, Nov 06-11, 2023. DOI

Abstract

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. Previous related works aim to develop QA systems that answer temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning this period can be fully used for inference. In real-world scenarios, however, it is common that given knowledge until the current instance, we wish the TKGQA systems to answer the questions asking about future. As humans constantly plan the future, building forecasting TKGQA systems is important. In this paper, we propose a novel task: forecasting TKGQA, and propose a coupled large-scale TKGQA benchmark dataset, i.e., FORECASTTKGQUESTIONS. It includes three types of forecasting questions, i.e., entity prediction, yes-unknown, and fact reasoning questions. For every question, a timestamp is annotated and QA models only have access to TKG information prior to it for answer inference. We find that previous TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-unknown and fact reasoning questions. To this end, we propose FORECASTTKGQA, a TKGQA model that employs a TKG forecasting module for future inference. Experiments show that it performs well in forecasting TKGQA.

MCML Authors

Zifeng Ding

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[59]

H. Chen, A. Frikha, D. Krompass, J. Gu and V. Tresp.
FRAug: Tackling Federated Learning with Non-IID Features via Representation Augmentation.
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI

Abstract

Federated Learning (FL) is a decentralized machine learning paradigm, in which multiple clients collaboratively train neural networks without centralizing their local data, and hence preserve data privacy. However, real-world FL applications usually encounter challenges arising from distribution shifts across the local datasets of individual clients. These shifts may drift the global model aggregation or result in convergence to deflected local optimum. While existing efforts have addressed distribution shifts in the label space, an equally important challenge remains relatively unexplored. This challenge involves situations where the local data of different clients indicate identical label distributions but exhibit divergent feature distributions. This issue can significantly impact the global model performance in the FL framework. In this work, we propose Federated Representation Augmentation (FRAug) to resolve this practical and challenging problem. FRAug optimizes a shared embedding generator to capture client consensus. Its output synthetic embeddings are transformed into client-specific by a locally optimized RTNet to augment the training space of each client. Our empirical evaluation on three public benchmarks and a real-world medical dataset demonstrates the effectiveness of the proposed method, which substantially outperforms the current state-of-the-art FL methods for feature distribution shifts, including PartialFed and FedBN.

MCML Authors

Haokun Chen

Database Systems and Data Mining AI Lab

Ahmed Frikha

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[58]

H. Li, J. Gu, R. Koner, S. Sharifzadeh and V. Tresp.
Do DALL-E and Flamingo Understand Each Other?
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI GitHub

Abstract

The field of multimodal research focusing on the comprehension and creation of both images and text has witnessed significant strides. This progress is exemplified by the emergence of sophisticated models dedicated to image captioning at scale, such as the notable Flamingo model and text-to-image generative models, with DALL-E serving as a prominent example. An interesting question worth exploring in this domain is whether Flamingo and DALL-E understand each other. To study this question, we propose a reconstruction task where Flamingo generates a description for a given image and DALL-E uses this description as input to synthesize a new image. We argue that these models understand each other if the generated image is similar to the given image. Specifically, we study the relationship between the quality of the image reconstruction and that of the text generation. We find that an optimal description of an image is one that gives rise to a generated image similar to the original one. The finding motivates us to propose a unified framework to finetune the text-to-image and image-to-text models. Concretely, the reconstruction part forms a regularization loss to guide the tuning of the models. Extensive experiments on multiple datasets with different image captioning and image generation models validate our findings and demonstrate the effectiveness of our proposed unified framework. As DALL-E and Flamingo are not publicly available, we use Stable Diffusion and BLIP in the remaining work.

MCML Authors

Hang Li

* Former Member

Rajat Koner

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[57]

G. Zhang, J. Ren, J. Gu and V. Tresp.
Multi-event Video-Text Retrieval.
ICCV 2023 - IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI GitHub

Abstract

Video-Text Retrieval (VTR) is a crucial multi-modal task in an era of massive video-text data on the Internet. A plethora of work characterized by using a two-stream Vision-Language model architecture that learns a joint representation of video-text pairs has become a prominent approach for the VTR task. However, these models operate under the assumption of bijective video-text correspondences and neglect a more practical scenario where video content usually encompasses multiple events, while texts like user queries or webpage metadata tend to be specific and correspond to single events. This establishes a gap between the previous training objective and real-world applications, leading to the potential performance degradation of earlier models during inference. In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, addressing scenarios in which each video contains multiple different events, as a niche scenario of the conventional Video-Text Retrieval Task. We present a simple model, Me-Retriever, which incorporates key event video representation and a new MeVTR loss for the MeVTR task. Comprehensive experiments show that this straightforward framework outperforms other models in the Video-to-Text and Text-to-Video tasks, effectively establishing a robust baseline for the MeVTR task. We believe this work serves as a strong foundation for future studies.

MCML Authors

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[56]

Y. Shen, R. Liao, Z. Han, Y. Ma and V. Tresp.
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models.
Preprint (Oct. 2023). arXiv

Abstract

While multi-modal models have successfully integrated information from image, video, and audio modalities, integrating graph modality into large language models (LLMs) remains unexplored. This discrepancy largely stems from the inherent divergence between structured graph data and unstructured text data. Incorporating graph knowledge provides a reliable source of information, enabling potential solutions to address issues in text generation, e.g., hallucination, and lack of domain knowledge. To evaluate the integration of graph knowledge into language models, a dedicated dataset is needed. However, there is currently no benchmark dataset specifically designed for multimodal graph-language models. To address this gap, we propose GraphextQA, a question answering dataset with paired subgraphs, retrieved from Wikidata, to facilitate the evaluation and future development of graph-language models. Additionally, we introduce a baseline model called CrossGNN, which conditions answer generation on the paired graphs by cross-attending question-aware graph features at decoding. The proposed dataset is designed to evaluate graph-language models’ ability to understand graphs and make use of it for answer generation. We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[55]

Z. Ding, J. Wu, Z. Li, Y. Ma and V. Tresp.
Improving Few-Shot Inductive Learning on Temporal Knowledge Graphs Using Confidence-Augmented Reinforcement Learning.
ECML-PKDD 2023 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Turin, Italy, Sep 18-22, 2023. DOI GitHub

Abstract

Temporal knowledge graph completion (TKGC) aims to predict the missing links among the entities in a temporal knowledge graph (TKG). Most previous TKGC methods only consider predicting the missing links among the entities seen in the training set, while they are unable to achieve great performance in link prediction concerning newly-emerged unseen entities. Recently, a new task, i.e., TKG few-shot out-of-graph (OOG) link prediction, is proposed, where TKGC models are required to achieve great link prediction performance concerning newly-emerged entities that only have few-shot observed examples. In this work, we propose a TKGC method FITCARL that combines few-shot learning with reinforcement learning to solve this task. In FITCARL, an agent traverses through the whole TKG to search for the prediction answer. A policy network is designed to guide the search process based on the traversed path. To better address the data scarcity problem in the few-shot setting, we introduce a module that computes the confidence of each candidate action and integrate it into the policy for action selection. We also exploit the entity concept information with a novel concept regularizer to boost model performance. Experimental results show that FITCARL achieves stat-of-the-art performance on TKG few-shot OOG link prediction.

MCML Authors

Zifeng Ding

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[54]

Z. Liu, Y. Ma, H. Li, M. Hildebrandt, Y. Ouyang and Z. Xiong.
Debiased Contrastive Loss for Collaborative Filtering.
KSEM 2023 - 16th International Conference Knowledge Science, Engineering and Management. Guangzhou, China, Aug 16-18, 2023. DOI

Abstract

Collaborative filtering (CF) is the most fundamental technique in recommender systems, which reveals user preference by implicit feedback. Generally, binary cross-entropy or bayesian personalized ranking are usually employed as the loss function to optimize model parameters. Recently, the sampled softmax loss has been proposed to enhance the sampling efficiency, which adopts an in-batch sample strategy. However, it suffers from the sample bias issue, which unavoidably introduces false negative instances, resulting inaccurate representations of users’ genuine interests. To address this problem, we propose a debiased contrastive loss, incorporating a bias correction probability to alleviate the sample bias. We integrate the proposed method into several matrix factorizations (MF) and graph neural network-based (GNN) recommendation models. Besides, we theoretically analyze the effectiveness of our methods in automatically mining the hard negative instances. Experimental results on three public benchmarks demonstrate that the proposed debiased contrastive loss can augment several existing MF and GNN-based CF models and outperform popular learning objectives in the recommendation. Additionally, we demonstrate that our method substantially enhances training efficiency.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[53]

A. Giovagnoli, Y. Ma, M. Schubert and V. Tresp.
QNEAT: Natural Evolution of Variational Quantum Circuit Architecture.
GECCO 2023 - Genetic and Evolutionary Computation Conference. Lisbon, Portugal, Jul 15-19, 2023. DOI

Abstract

Quantum Machine Learning (QML) is a recent and rapidly evolving field where the theoretical framework and logic of quantum mechanics is employed to solve machine learning tasks. A variety of techniques that have a different level of quantum-classical hybridization has been presented. Here we focus on variational quantum circuits (VQC), which emerged as the most promising candidates for the quantum counterpart of neural networks in the noisy intermediate-scale quantum (NISQ) era. Although showing promising results, VQCs can be hard to train because of different issues e.g. barren plateau, periodicity of the weights or choice of the architecture. In this paper we focus on this last problem and in order to address it we propose a gradient free algorithm inspired by natural evolution to optimise both the weights and the architecture of the VQC. In particular, we present a version of the well known neuroevolution of augmenting topologies (NEAT) algorithm adapted to the case of quantum variational circuits. We test the algorithm with different benchmark problems of classical fields of machine learning i.e. reinforcement learning and optimization.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[52]

M. Fromm, M. Berrendorf, E. Faerman and T. Seidl.
Cross-Domain Argument Quality Estimation.
ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI GitHub

Abstract

Argumentation is one of society’s foundational pillars, and, sparked by advances in NLP, and the vast availability of text data, automated mining of arguments receives increasing attention. A decisive property of arguments is their strength or quality. While there are works on the automated estimation of argument strength, their scope is narrow:They focus on isolated datasets and neglect the interactions with related argument-mining tasks, such as argument identification and evidence detection. In this work, we close this gap by approaching argument quality estimation from multiple different angles:Grounded on rich results from thorough empirical evaluations, we assess the generalization capabilities of argument quality estimation across diverse domains and the interplay with related argument mining tasks. We find that generalization depends on a sufficient representation of different domains in the training part. In zero-shot transfer and multi-task experiments, we reveal that argument quality is among the more challenging tasks but can improve others.

MCML Authors

Michael Fromm

Dr.

* Former Member

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[51]

Z. Han, R. Liao, J. Gu, Y. Zhang, Z. Ding, Y. Gu, H. Köppl, H. Schütze and V. Tresp.
ECOLA: Enhancing Temporal Knowledge Embeddings with Contextualized Language Representations.
ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.

MCML Authors

Ruotong Liao

Database Systems and Data Mining AI Lab

Yao Zhang

Database Systems and Data Mining AI Lab

Zifeng Ding

Database Systems and Data Mining AI Lab

Hinrich Schütze

Prof. Dr.

Computational Linguistics

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[50]

J. Gu, Z. Han, S. Chen, A. Beirami, B. He, G. Zhang, R. Liao, Y. Qin, V. Tresp and P. Torr.
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
Preprint (Jul. 2023). arXiv

Abstract

Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering enables the ability to perform predictions based solely on prompts without updating model parameters, and the easier application of large pre-trained models in real-world tasks. In past years, Prompt engineering has been well-studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, there is currently a lack of a systematic overview of prompt engineering on pre-trained vision-language models. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation models (e.g. Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are also discussed. The challenges, future directions, and research opportunities are summarized to foster future research on this topic.

MCML Authors

Shuo Chen

Database Systems and Data Mining AI Lab

Gengyuan Zhang

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[49]

D. Winkel, N. Strauß, M. Schubert, Y. Ma and T. Seidl.
Constrained Portfolio Management using Action Space Decomposition for Reinforcement Learning.
PAKDD 2023 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka, Japan, May 25-28, 2023. DOI

Abstract

Financial portfolio managers typically face multi-period optimization tasks such as short-selling or investing at least a particular portion of the portfolio in a specific industry sector. A common approach to tackle these problems is to use constrained Markov decision process (CMDP) methods, which may suffer from sample inefficiency, hyperparameter tuning, and lack of guarantees for constraint violations. In this paper, we propose Action Space Decomposition Based Optimization (ADBO) for optimizing a more straightforward surrogate task that allows actions to be mapped back to the original task. We examine our method on two real-world data portfolio construction tasks. The results show that our new approach consistently outperforms state-of-the-art benchmark approaches for general CMDPs.

MCML Authors

David Winkel

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[48]

Z. Liu, Y. Ma, M. Schubert, Y. Ouyang, W. Rong and Z. Xiong.
Multimodal Contrastive Transformer for Explainable Recommendation.
IEEE Transactions on Computational Social Systems (May. 2023). DOI

Abstract

Explanations play an essential role in helping users evaluate results from recommender systems. Various natural language generation methods have been proposed to generate explanations for the recommendation. However, they usually suffer from two problems. First, since user-provided review text contains noisy data, the generated explanations may be irrelevant to the recommended items. Second, as lacking some supervision signals, most of the generated sentences are similar, which cannot meet the diversity and personalized needs of users. To tackle these problems, we propose a multimodal contrastive transformer (MMCT) model for an explainable recommendation, which incorporates multimodal information into the learning process, including sentiment features, item features, item images, and refined user reviews. Meanwhile, we propose a dynamic fusion mechanism during the decoding stage, which generates supervision signals to guide the explanation generation. Additionally, we develop a contrastive objective to generate diverse explainable texts. Comprehensive experiments on two real-world datasets show that the proposed model outperforms comparable explainable recommendation baselines in terms of explanation performance and recommendation performance. Efficiency analysis and robustness analysis verify the advantages of the proposed model. While ablation analysis establishes the relative contributions of the respective components and various modalities, the case study shows the working of our model from an intuitive sense.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[47]

R. Koner, T. Hannan, S. Shit, S. Sharifzadeh, M. Schubert, T. Seidl and V. Tresp.
InstanceFormer: An Online Video Instance Segmentation Framework.
AAAI 2023 - 37th Conference on Artificial Intelligence. Washington, DC, USA, Feb 07-14, 2023. DOI GitHub

Abstract

Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full Spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transformer-based efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches for challenging and long datasets such as YouTube-VIS-2021 and OVIS.

MCML Authors

Rajat Koner

Database Systems and Data Mining AI Lab

Tanveer Hannan

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

2022

[46]

M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, M. Galkin, S. Sharifzadeh, A. Fischer, V. Tresp and J. Lehmann.
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models under a Unified Framework.
IEEE Transactions on Pattern Analysis and Machine Intelligence 44.12 (Dec. 2022). DOI GitHub

Abstract

The heterogeneity in recently published knowledge graph embedding models’ implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, as well as provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model’s performance and is not only determined by its architecture. We provide evidence that several architectures can obtain results competitive to the state of the art when configured carefully.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[45]

S. Gilhuber, P. Jahn, Y. Ma and T. Seidl.
VERIPS: Verified Pseudo-label Selection for Deep Active Learning.
ICDM 2022 - 22nd IEEE International Conference on Data Mining. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI GitHub

Abstract

Active learning has the power to significantly reduce the amount of labeled data needed to build strong classifiers. Existing active pseudo-labeling methods show high potential in integrating pseudo-labels within the active learning loop but heavily depend on the prediction accuracy of the model. In this work, we propose VERIPS, an algorithm that significantly outperforms existing pseudo-labeling techniques for active learning. At its core, VERIPS uses a pseudo-label verification mechanism that consists of a second network only trained on data approved by the oracle and helps to discard questionable pseudo-labels. In particular, the verifier model eliminates all pseudo-labels for which it disagrees with the actual task model. VERIPS overcomes the problems of poorly performing initial models, e.g., due to imbalanced or too small initial pools, where previous methods select too many incorrect pseudo-labels and recovering takes long or is not possible. Moreover, VERIPS is particularly insensitive to parameter choices that existing approaches suffer from.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Philipp Jahn

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

[44]

N. Strauß, M. Berrendorf, T. Haider and M. Schubert.
A Comparison of Ambulance Redeployment Systems on Real-World Data.
ICDMW 2022 - IEEE International Conference on Data Mining Workshops. Orlando, FL, USA, Nov 30-Dec 02, 2022. DOI GitHub

Abstract

Modern Emergency Medical Services (EMS) benefit from real-time sensor information in various ways as they provide up-to-date location information and help assess current local emergency risks. A critical part of EMS is dynamic ambulance redeployment, i.e., the task of assigning idle ambulances to base stations throughout a community. Although there has been a considerable effort on methods to optimize emergency response systems, a comparison of proposed methods is generally difficult as reported results are mostly based on artificial and proprietary test beds. In this paper, we present a benchmark simulation environment for dynamic ambulance redeployment based on real emergency data from the city of San Francisco. Our proposed simulation environment is highly scalable and is compatible with modern reinforcement learning frameworks. We provide a comparative study of several state-of-the-art methods for various metrics. Results indicate that even simple baseline algorithms can perform considerably well in close-to-realistic settings.

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

Max Berrendorf

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[43]

A. Frikha.
Deep knowledge transfer for generalization across tasks and domains under data scarcity: on intersections of anomaly detection, few-shot learning, continual learning, domain generalization and data-free learning.
Dissertation 2022. DOI

Abstract

This thesis addresses challenges in deep learning when key assumptions, such as abundant data or i.i.d. conditions, are violated. It introduces methods for anomaly detection with scarce data, enabling models to learn sequential tasks with minimal forgetting. For domain generalization, it proposes a feature-discovery algorithm that enhances generalization to unseen domains and a data-free approach to create robust models by synthesizing cross-domain knowledge from pre-trained models. These contributions advance deep learning for complex real-world scenarios.(Shortened).

MCML Authors

Ahmed Frikha

Dr.

A1 | Statistical Foundations & Explainability
→ Group Bernd Bischl

* Former Member

[42]

E. Pretzsch, V. Heinemann, S. Stintzing, A. Bender, S. Chen, J. W. Holch, F. O. Hofmann, H. Ren, F. Böschand, H. Küchenhoff, J. Werner and M. K. Angele.
EMT-Related Genes Have No Prognostic Relevance in Metastatic Colorectal Cancer as Opposed to Stage II/III: Analysis of the Randomised, Phase III Trial FIRE-3 (AIO KRK 0306; FIRE-3).
Cancers 14.22 (Nov. 2022). DOI

Abstract

Despite huge advances in local and systemic therapies, the 5-year relative survival rate for patients with metastatic CRC is still low. To avoid over- or undertreatment, proper risk stratification with regard to treatment strategy is highly needed. As EMT (epithelial-mesenchymal transition) is a major step in metastatic spread, this study analysed the prognostic effect of EMT-related genes in stage IV colorectal cancer patients using the study cohort of the FIRE-3 trial, an open-label multi-centre randomised controlled phase III trial of stage IV colorectal cancer patients. Overall, the prognostic relevance of EMT-related genes seems stage-dependent. EMT-related genes have no prognostic relevance in stage IV CRC as opposed to stage II/III.

MCML Authors

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Shuo Chen

C4 | Computational Social Sciences

Database Systems and Data Mining AI Lab

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

[41]

S. Shit, R. Koner, B. Wittmann, J. C. Paetzold, I. Ezhov, H. Li, J. Pan, S. Sharifzadeh, G. Kaissis, V. Tresp and B. Menze.
Relationformer: A Unified Framework for Image-to-Graph Generation.
ECCV 2022 - 17th European Conference on Computer Vision. Tel Aviv, Israel, Oct 23-27, 2022. DOI GitHub

Abstract

A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely [rln]-token. Together with [obj]-tokens, [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-token, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets that demonstrate our approach’s effectiveness and generalizability.

MCML Authors

Rajat Koner

Database Systems and Data Mining AI Lab

Georgios Kaissis

Dr.

* Former Principal Investigator

Volker Tresp

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

[40]

C. M. M. Frey, Y. Ma and M. Schubert.
SEA: Graph Shell Attention in Graph Neural Networks.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

A common problem in Graph Neural Networks (GNNs) is known as over-smoothing. By increasing the number of iterations within the message-passing of GNNs, the nodes’ representations of the input graph align and become indiscernible. The latest models employing attention mechanisms with Graph Transformer Layers (GTLs) are still restricted to the layer-wise computational workflow of a GNN that are not beyond preventing such effects. In our work, we relax the GNN architecture by means of implementing a routing heuristic. Specifically, the nodes’ representations are routed to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from the central node. We call this procedure Graph textbf{S}htextbf{e}ll textbf{A}ttention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Intuitively, by increasing the number of experts, the models gain in expressiveness such that a node’s representation is solely based on nodes that are located within the receptive field of an expert. We evaluate our architecture on various benchmark datasets showing competitive results while drastically reducing the number of parameters compared to state-of-the-art models.

MCML Authors

Christian Frey

Dr.

* Former Member

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Spatial Artificial Intelligence

[39]

N. Strauß, D. Winkel, M. Berrendorf and M. Schubert.
Reinforcement Learning for Multi-Agent Stochastic Resource Collection.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022. DOI

Abstract

Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC yields challenges considering temporal abstractions and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.

MCML Authors

Niklas Strauß

Dr.

Spatial Artificial Intelligence

David Winkel

Database Systems and Data Mining AI Lab

Max Berrendorf

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[38]

S. Gilhuber, M. Berrendorf, Y. Ma and T. Seidl.
Accelerating Diversity Sampling for Deep Active Learning By Low-Dimensional Representations.
IAL @ECML-PKDD 2022 - 6th International Workshop on Interactive Adaptive Learning at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-23, 2022. PDF GitHub

Abstract

Selecting diverse instances for annotation is one of the key factors of successful active learning strategies. To this end, existing methods often operate on high-dimensional latent representations. In this work, we propose to use the low-dimensional vector of predicted probabilities instead, which can be seamlessly integrated into existing methods. We empirically demonstrate that this considerably decreases the query time, i.e., time to select an instance for annotation, while at the same time improving results. Low query times are relevant for active learning researchers, which use a (fast) oracle for simulated annotation and thus are often constrained by query time. It is also practically relevant when dealing with complex annotation tasks for which only a small pool of skilled domain experts is available for annotation with a limited time budget.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Max Berrendorf

Dr.

* Former Member

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[37]

Z. Ding, Z. Li, R. Qi, J. Wu, B. He, Y. Ma, Z. Meng, S. Chen, R. Liao, Z. Han and V. Tresp.
Forecasting Question Answering over Temporal Knowledge Graphs.
Preprint (Aug. 2022). arXiv

Abstract

Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases. The only existing TKGQA dataset, i.e., CronQuestions, consists of temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning the same period can be fully used for answer inference, allowing the TKGQA models to use even the future knowledge to answer the questions based on the past facts. In real-world scenarios, however, it is also common that given the knowledge until now, we wish the TKGQA systems to answer the questions asking about the future. As humans constantly seek plans for the future, building TKGQA systems for answering such forecasting questions is important. Nevertheless, this has still been unexplored in previous research. In this paper, we propose a novel task: forecasting question answering over temporal knowledge graphs. We also propose a large-scale TKGQA benchmark dataset, i.e., ForecastTKGQuestions, for this task. It includes three types of questions, i.e., entity prediction, yes-no, and fact reasoning questions. For every forecasting question in our dataset, QA models can only have access to the TKG information before the timestamp annotated in the given question for answer inference. We find that the state-of-the-art TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-no questions and fact reasoning questions. To this end, we propose ForecastTKGQA, a TKGQA model that employs a TKG forecasting module for future inference, to answer all three types of questions. Experimental results show that ForecastTKGQA outperforms recent TKGQA methods on the entity prediction questions, and it also shows great effectiveness in answering the other two types of questions.

MCML Authors

Zifeng Ding

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Zongyue Li

Spatial Artificial Intelligence

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Shuo Chen

Database Systems and Data Mining AI Lab

Ruotong Liao

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[36]

M. Ali, M. Berrendorf, M. Galkin, V. Thost, T. Ma, V. Tresp and J. Lehmann.
Improving Inductive Link Prediction Using Hyper-Relational Facts (Extended Abstract).
IJCAI-ECAI 2022 - Best paper track at the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence. Vienna, Austria, Jul 23-29, 2022. DOI

Abstract

For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relational KGs (e.g., Wikidata), have not yet been properly studied. In this work, we classify different inductive settings and study the benefits of employing hyper-relational KGs on a wide range of semi- and fully inductive link prediction tasks powered by recent advancements in graph neural networks. Our experiments on a novel set of benchmarks show that qualifiers over typed edges can lead to performance improvements of 6% of absolute gains (for the Hits@10 metric) compared to triple-only baselines.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[35]

H. Li, Q. Khan, V. Tresp and D. Cremers.
Biologically Inspired Neural Path Finding.
BI 2022 - 15th International Conference on Brain Informatics. Padova, Italy, Jul 15-15, 2022. DOI GitHub

Abstract

The human brain can be considered to be a graphical structure comprising of tens of billions of biological neurons connected by synapses. It has the remarkable ability to automatically re-route information flow through alternate paths, in case some neurons are damaged. Moreover, the brain is capable of retaining information and applying it to similar but completely unseen scenarios. In this paper, we take inspiration from these attributes of the brain to develop a computational framework to find the optimal low cost path between a source node and a destination node in a generalized graph. We show that our framework is capable of handling unseen graphs at test time. Moreover, it can find alternate optimal paths, when nodes are arbitrarily added or removed during inference, while maintaining a fixed prediction time.

MCML Authors

Hang Li

* Former Member

Qadeer Khan

Computer Vision & Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence

[34]

Z. Liu, Y. Ma, M. Hildebrandt, Y. Ouyang and Z. Xiong.
CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations.
Knowledge and Information Systems 64 (Jul. 2022). DOI

Abstract

Sequential recommendations play a crucial role in many real-world applications. Due to the sequential nature, reinforcement learning has been employed to iteratively produce recommendations based on an observed stream of user behavior. In this setting, a recommendation agent interacts with the environments (users) by sequentially recommending items (actions) to maximize users’ overall long-term cumulative rewards. However, most reinforcement learning-based recommendation models only focus on extrinsic rewards based on user feedback, leading to sub-optimal policies if user-item interactions are sparse and fail to obtain the dynamic rewards based on the users’ preferences. As a remedy, we propose a dynamic intrinsic reward signal integrated with a contrastive discriminator-augmented reinforcement learning framework. Concretely, our framework contains two modules: (1) a contrastive learning module is employed to learn the representation of item sequences; (2) an intrinsic reward learning function to imitate the user’s internal dynamics. Furthermore, we combine static extrinsic reward and dynamic intrinsic reward to train a sequential recommender system based on double Q-learning. We integrate our framework with five representative sequential recommendation models. Specifically, our framework augments these recommendation models with two output layers: the supervised layer that applies cross-entropy loss to perform ranking and the other for reinforcement learning. Experimental results on two real-world datasets demonstrate that the proposed framework outperforms several sequential recommendation baselines and exploration with intrinsic reward baselines.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[33]

Z. Liu, Y. Ma, M. Schubert, Y. Ouyang and Z. Xiong.
Multi-Modal Contrastive Pre-training for Recommendation.
ICMR 2022 - ACM International Conference on Multimedia Retrieval. Newark, NJ, USA, Jun 27-30, 2022. DOI

Abstract

Personalized recommendation plays a central role in various online applications. To provide quality recommendation service, it is of crucial importance to consider multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on the relationship of co-interaction. For users, we propose intra-modal aggregation and inter-modal aggregation to fuse review texts and the structural information of the user graph. For items, we consider three modalities: description text, images, and item graph. Moreover, the description text and image complement each other for the same item. One of them can be used as promising supervision for the other. Therefore, to capture this signal and better exploit the potential correlation of intra-modalities, we propose a self-supervised contrastive inter-modal alignment task to make the textual and visual modalities as similar as possible. Then, we apply inter-modal aggregation to obtain the multi-modal representation of items. Next, we employ a binary cross-entropy loss function to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations using an existing recommendation model. We have performed extensive experiments on three real-world datasets. Experimental results verify the rationality and effectiveness of the proposed method.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[32]

G. Fu, Z. Meng, Z. Han, Z. Ding, Y. Ma, M. Schubert, V. Tresp and R. Wattenhofer.
TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion.
SPNLP @ACL 2022 - 6th ACL Workshop on Structured Prediction for NLP at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). Dublin, Ireland, May 22-27, 2022. DOI

Abstract

Temporal knowledge graphs store the dynamics of entities and relations during a time period. However, typical temporal knowledge graphs often suffer from incomplete dynamics with missing facts in real-world scenarios. Hence, modeling temporal knowledge graphs to complete the missing facts is important. In this paper, we tackle the temporal knowledge graph completion task by proposing TempCaps, which is a Capsule network-based embedding model for Temporal knowledge graph completion. TempCaps models temporal knowledge graphs by introducing a novel dynamic routing aggregator inspired by Capsule Networks. Specifically, TempCaps builds entity embeddings by dynamically routing retrieved temporal relation and neighbor information. Experimental results demonstrate that TempCaps reaches state-of-the-art performance for temporal knowledge graph completion. Additional analysis also shows that TempCaps is efficient.

MCML Authors

Zifeng Ding

Database Systems and Data Mining AI Lab

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[31]

D. Alivanistos, M. Berrendorf, M. Cochez and M. Galkin.
Query Embedding on Hyper-Relational Knowledge Graphs.
ICLR 2022 - 10th International Conference on Learning Representations. Virtual, Apr 25-29, 2022. URL GitHub

Abstract

Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs (KGs). It subsumes both one-hop link prediction as well as other more complex types of logical queries. Existing algorithms operate only on classical, triple-based graphs, whereas modern KGs often employ a hyper-relational modeling paradigm. In this paradigm, typed edges may have several key-value pairs known as qualifiers that provide fine-grained context for facts. In queries, this context modifies the meaning of relations, and usually reduces the answer set. Hyper-relational queries are often observed in real-world KG applications, and existing approaches for approximate query answering cannot make use of qualifier pairs. In this work, we bridge this gap and extend the multi-hop reasoning problem to hyper-relational KGs allowing to tackle this new type of complex queries. Building upon recent advancements in Graph Neural Networks and query embedding techniques, we study how to embed and answer hyper-relational conjunctive queries. Besides that, we propose a method to answer such queries and demonstrate in our experiments that qualifiers improve query answering on a diverse set of query patterns.

MCML Authors

Max Berrendorf

Dr.

* Former Member

[30]

M. Galkin, M. Berrendorf and C. T. Hoyt.
An Open Challenge for Inductive Link Prediction on Knowledge Graphs.
GLB @WWW 2022 - Workshop on Graph Learning Benchmarks at the International World Wide Web Conference (WWW 2022). Virtual, Apr 22-29, 2022. arXiv GitHub

Abstract

An emerging trend in representation learning over knowledge graphs (KGs) moves beyond transductive link prediction tasks over a fixed set of known entities in favor of inductive tasks that imply training on one graph and performing inference over a new graph with unseen entities. In inductive setups, node features are often not available and training shallow entity embedding matrices is meaningless as they cannot be used at inference time with unseen entities. Despite the growing interest, there are not enough benchmarks for evaluating inductive representation learning methods. In this work, we introduce ILPC 2022, a novel open challenge on KG inductive link prediction. To this end, we constructed two new datasets based on Wikidata with various sizes of training and inference graphs that are much larger than existing inductive benchmarks. We also provide two strong baselines leveraging recently proposed inductive methods. We hope this challenge helps to streamline community efforts in the inductive graph representation learning area. ILPC 2022 follows best practices on evaluation fairness and reproducibility.

MCML Authors

Max Berrendorf

Dr.

* Former Member

[29]

C. T. Hoyt, M. Berrendorf, M. Gaklin, V. Tresp and B. M. Gyori.
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs.
GLB @WWW 2022 - Workshop on Graph Learning Benchmarks at the International World Wide Web Conference (WWW 2022). Virtual, Apr 22-29, 2022. arXiv

Abstract

The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the usage of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics to address lack of interpretability and comparability of existing metrics to datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics upon which we investigate two avenues for improvements to existing metrics via alternative aggregation functions and concepts from probability theory. We finally propose several new rank-based metrics that are more easily interpreted and compared accompanied by a demonstration of their usage in a benchmarking of knowledge graph embedding models.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[28]

Y. Liu, Y. Ma, M. Hildebrandt, M. Joblin and V. Tresp.
TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs.
AAAI 2022 - 36th Conference on Artificial Intelligence. Virtual, Feb 22-Mar 01, 2022. DOI

Abstract

Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting – event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[27]

S. Sharifzadeh, S. M. Baharlou, M. Schmitt, H. Schütze and V. Tresp.
Improving Scene Graph Classification by Exploiting Knowledge from Texts.
AAAI 2022 - 36th Conference on Artificial Intelligence. Virtual, Feb 22-Mar 01, 2022. DOI

Abstract

Training scene graph classification models requires a large amount of annotated image data. Meanwhile, scene graphs represent relational knowledge that can be modeled with symbolic data from texts or knowledge graphs. While image annotation demands extensive labor, collecting textual descriptions of natural scenes requires less effort. In this work, we investigate whether textual scene descriptions can substitute for annotated image data. To this end, we employ a scene graph classification framework that is trained not only from annotated images but also from symbolic data. In our architecture, the symbolic entities are first mapped to their correspondent image-grounded representations and then fed into the relational reasoning pipeline. Even though a structured form of knowledge, such as the form in knowledge graphs, is not always available, we can generate it from unstructured texts using a transformer-based language model. We show that by fine-tuning the classification pipeline with the extracted knowledge from texts, we can achieve ~8x more accurate results in scene graph classification, ~3x in object classification, and ~1.5x in predicate classification, compared to the supervised baselines with only 1% of the annotated images.

MCML Authors

Hinrich Schütze

Prof. Dr.

Computational Linguistics

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[26]

M. Berrendorf.
Machine learning for managing structured and semi-structured data.
Dissertation 2022. DOI

Abstract

As data availability grows across sectors, machine learning, especially graph neural networks, plays a crucial role in extracting insights by automating complex analysis, including relational learning. Knowledge graphs help store entity facts, though they often require automated methods like Link Prediction and Entity Alignment to fill in missing information due to the sheer volume. This thesis advances knowledge graph completion by improving Entity Alignment through active learning, refining Link Prediction with metadata, and introducing a new evaluation metric, as well as a software library to aid researchers. (Shortened).

MCML Authors

Max Berrendorf

Dr.

* Former Member

2021

[25]

M. Ali, M. Berrendorf, M. Galkin, V. Thost, T. Ma, V. Tresp and J. Lehmann.
Improving Inductive Link Prediction Using Hyper-Relational Facts.
ISWC 2021 - 20th International Semantic Web Conference. Virtual, Oct 24-28, 2021. Best Paper Award. DOI GitHub

Abstract

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[24]

Y. Ma and V. Tresp.
Causal Inference under Networked Interference and Intervention Policy Enhancement.
AISTATS 2021 - 24th International Conference on Artificial Intelligence and Statistics. Virtual, Apr 13-15, 2021. URL

Abstract

Estimating individual treatment effects from data of randomized experiments is a critical task in causal inference. The Stable Unit Treatment Value Assumption (SUTVA) is usually made in causal inference. However, interference can introduce bias when the assigned treatment on one unit affects the potential outcomes of the neighboring units. This interference phenomenon is known as spillover effect in economics or peer effect in social science. Usually, in randomized experiments or observational studies with interconnected units, one can only observe treatment responses under interference. Hence, the issue of how to estimate the superimposed causal effect and recover the individual treatment effect in the presence of interference becomes a challenging task in causal inference. In this work, we study causal effect estimation under general network interference using Graph Neural Networks, which are powerful tools for capturing node and link dependencies in graphs. After deriving causal effect estimators, we further study intervention policy improvement on the graph under capacity constraint. We give policy regret bounds under network interference and treatment capacity constraint. Furthermore, a heuristic graph structure-dependent error bound for Graph Neural Network-based causal estimators is provided.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[23]

M. Berrendorf, E. Faerman and V. Tresp.
Active Learning for Entity Alignment.
ECIR 2021 - 43rd European Conference on Information Retrieval. Virtual, Mar 28-Apr 01, 2021. DOI GitHub

Abstract

In this work, we propose a novel framework for labeling entity alignments in knowledge graph datasets. Different strategies to select informative instances for the human labeler build the core of our framework. We illustrate how the labeling of entity alignments is different from assigning class labels to single instances and how these differences affect the labeling efficiency. Based on these considerations, we propose and evaluate different active and passive learning strategies. One of our main findings is that passive learning approaches, which can be efficiently precomputed, and deployed more easily, achieve performance comparable to the active learning strategies.

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[22]

M. Berrendorf, L. Wacker and E. Faerman.
A Critical Assessment of State-of-the-Art in Entity Alignment.
ECIR 2021 - 43rd European Conference on Information Retrieval. Virtual, Mar 28-Apr 01, 2021. DOI GitHub

Abstract

In this work, we perform an extensive investigation of two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs. Therefore, we first carefully examine the benchmarking process and identify several shortcomings, making the results reported in the original works not always comparable. Furthermore, we suspect that it is a common practice in the community to make the hyperparameter optimization directly on a test set, reducing the informative value of reported performance. Thus, we select a representative sample of benchmarking datasets and describe their properties. We also examine different initializations for entity representations since they are a decisive factor for model performance. Furthermore, we use a shared train/validation/test split for an appropriate evaluation setting to evaluate all methods on all datasets. In our evaluation, we make several interesting findings. While we observe that most of the time SotA approaches perform better than baselines, they have difficulties when the dataset contains noise, which is the case in most real-life applications. Moreover, in our ablation study, we find out that often different features of SotA method are crucial for good performance than previously assumed.

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

[21]

M. Fromm, M. Berrendorf, S. Obermeier, T. Seidl and E. Faerman.
Diversity Aware Relevance Learning for Argument Search.
ECIR 2021 - 43rd European Conference on Information Retrieval. Virtual, Mar 28-Apr 01, 2021. DOI GitHub

Abstract

In this work, we focus on the problem of retrieving relevant arguments for a query claim covering diverse aspects. State-of-the-art methods rely on explicit mappings between claims and premises, and thus are unable to utilize large available collections of premises without laborious and costly manual annotation. Their diversity approach relies on removing duplicates via clustering which does not directly ensure that the selected premises cover all aspects. This work introduces a new multi-step approach for the argument retrieval problem. Rather than relying on ground-truth assignments, our approach employs a machine learning model to capture semantic relationships between arguments. Beyond that, it aims to cover diverse facets of the query, instead of trying to identify duplicates explicitly. Our empirical evaluation demonstrates that our approach leads to a significant improvement in the argument retrieval task even though it requires less data.

MCML Authors

Michael Fromm

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Database Systems and Data Mining AI Lab

Evgeny Faerman

Dr.

* Former Member

[20]

M. Ali, M. Berrendorf, C. T. Hoyt, L. Vermue, S. Sharifzadeh, V. Tresp and J. Lehmann.
PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings.
Journal of Machine Learning Research 22.82 (Mar. 2021). PDF

Abstract

Recently, knowledge graph embeddings (KGEs) have received significant attention, and several software libraries have been developed for training and evaluation. While each of them addresses specific needs, we report on a community effort to a re-design and re-implementation of PyKEEN, one of the early KGE libraries. PyKEEN 1.0 enables users to compose knowledge graph embedding models based on a wide range of interaction models, training approaches, loss functions, and permits the explicit modeling of inverse relations. It allows users to measure each component’s influence individually on the model’s performance. Besides, an automatic memory optimization has been realized in order to optimally exploit the provided hardware. Through the integration of Optuna, extensive hyper-parameter optimization (HPO) functionalities are provided.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[19]

M. Fromm, E. Faerman, M. Berrendorf, S. Bhargava, R. Qi, Y. Zhang, L. Dennert, S. Selle, Y. Mao and T. Seidl.
Argument Mining Driven Analysis of Peer-Reviews.
AAAI 2021 - 35th Conference on Artificial Intelligence. Virtual, Feb 02-09, 2021. DOI GitHub

Abstract

Peer reviewing is a central process in modern research and essential for ensuring high quality and reliability of published work. At the same time, it is a time-consuming process and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in this area. How to cope with this problem is an open question and it is vividly discussed across all major conferences. In this work, we propose an Argument Mining based approach for the assistance of editors, meta-reviewers, and reviewers. We demonstrate that the decision process in the field of scientific publications is driven by arguments and automatic argument identification is helpful in various use-cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains making the transfer of pre-trained models difficult. Therefore, we provide the community with a new peer-review dataset from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the most relevant parts from reviews, which are paramount for the publication decision. The process remains interpretable since the extracted arguments can be highlighted in a review without detaching them from their context.

MCML Authors

Michael Fromm

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Yao Zhang

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[18]

S. Sharifzadeh, S. M. Baharlou and V. Tresp.
Classification by Attention: Scene Graph Classification with Prior Knowledge.
AAAI 2021 - 35th Conference on Artificial Intelligence. Virtual, Feb 02-09, 2021. DOI

Abstract

A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach by introducing schema representations and implementing the classification as an attention layer between image-based representations and the schemata. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations, as a top-down mechanism, leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.

MCML Authors

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

2020

[17]

M. Berrendorf, E. Faerman, L. Vermue and V. Tresp.
Interpretable and Fair Comparison of Link Prediction or Entity Alignment Methods with Adjusted Mean Rank.
WI-IAT 2020 - IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. Virtual, Dec 14-17, 2020. DOI

Abstract

In this work, we take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment. In the current experimental setting, multiple different scores are employed to assess different aspects of model performance. We analyze the informativeness of these evaluation measures and identify several shortcomings. In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets. Moreover, we demonstrate that varying size of the test size automatically has impact on the performance of the same model based on commonly used metrics for the Entity Alignment task. We show that this leads to various problems in the interpretation of results, which may support misleading conclusions. Therefore, we propose adjustments to the evaluation and demonstrate empirically how this supports a fair, comparable, and interpretable assessment of model performance.

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[16]

S. Obermeier, M. Berrendorf and P. Kröger.
Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation.
WI-IAT 2020 - IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. Virtual, Dec 14-17, 2020. DOI

Abstract

The reverse k-nearest neighbor (RkNN) query is an established query type with various applications reaching from identifying highly influential objects over incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit that the k-distances in real-world datasets often follow the power-law distribution, and bound them with linear lines in log-log space. In this work, we investigate this assumption and uncover that it is violated in regions of changing density, which we show are typical for real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. Thereby, we enable harnessing the power of the abundance of available Machine Learning models and profiting from their advancement. We propose a flexible approach which allows steering the performance-memory consumption trade-off, and in particular to find good solutions with a fixed memory budget crucial in the context of edge computing. Moreover, we show how to obtain and improve guaranteed bounds essential to exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce the index memory consumption, and strongly reduce the candidate set size. We publish our code at https://github.com/sobermeier/nonlinear-kdist, and a detailed technical report at https://arxiv.org/abs/2011.01773.

MCML Authors

Sandra Gilhuber (née Obermeier)

Database Systems and Data Mining AI Lab

Max Berrendorf

Dr.

* Former Member

Peer Kröger

Prof. Dr.

* Former Principal Investigator

[15]

Y. Ma and V. Tresp.
A Variational Quantum Circuit Model for Knowledge Graph Embeddings.
QTNML @NeurIPS 2020 - 1st Workshop on Quantum Tensor Networks in Machine Learning at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual, Dec 06-12, 2020. PDF

Abstract

Can quantum computing resources facilitate representation learning? In this work, we propose the first quantum Ansatz for statistical relational learning on knowledge graphs using parametric quantum circuits. We propose a variational quantum circuit for modeling knowledge graphs by introducing quantum representations of entities. In particular, latent representations of entities are encoded as coefficients of quantum states, while predicates are characterized by parametric gates acting on the quantum states. We show that quantum representations can be trained efficiently meanwhile preserving the quantum advantages. Simulations on classical machines with different datasets show that our proposed quantum circuit Ansatz and quantum representations can achieve comparable results to the state-of-the-art classical models, e.g., RESCAL, DISTMULT. Furthermore, after optimizing the models, the complexity of inductive inference on the knowledge graphs can be reduced with respect to the number of entities.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[14]

M. Berrendorf and E. Faerman.
mberr/ea-active-learning: Zenodo. Version 1.0.1.
2020. DOI

Abstract

Code for paper ‘Active Learning for Entity Alignment’ (https://arxiv.org/abs/2001.08943)

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

[13]

M. Berrendorf, L. Wacker and E. Faerman.
mberr/ea-sota-comparison: Zenodo. Version v1.1.1.
2020. DOI

Abstract

Code for paper ‘A Critical Assessment of State-of-the-Art in Entity Alignment.

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

[12]

Y. Zhang, Y. Lu and T. Seidl.
KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection.
iiWAS 2020 - 22nd International Conference on Information Integration and Web-based Applications and Services. Chiang Mai, Thailand, Nov 30-Dec 02, 2020. DOI

Abstract

Density-based clustering algorithms are commonly adopted when arbitrarily shaped clusters exist. Usually, they do not need to know the number of clusters in prior, which is a big advantage. Conventional density-based approaches such as DBSCAN, utilize two parameters to define density. Recently, novel density-based clustering algorithms are proposed to reduce the problem complexity to the use of a single parameter k by utilizing the concepts of k Nearest Neighbor (kNN) and Reverse k Nearest Neighbor (RkNN) to define density. However, those kNN-based approaches are either ineffective or inefficient. In this paper, we present a new clustering algorithm KNNAC, which only requires computing the densities for a chosen subset of points due to the use of active core detection. We empirically show that, compared to other nearest neighbor based clustering approaches (e.g., RECORD, IS-DBSCAN, etc.), KNNAC can provide competitive performance while taking a fraction of the runtime.

MCML Authors

Yao Zhang

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

Database Systems and Data Mining AI Lab

[11]

Y. Ma, Z. Han and V. Tresp.
Learning with Temporal Knowledge Graphs.
CIKMW @CIKM 2020 - Workshop at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020). Galway, Ireland, Oct 19-23, 2020. Invited talk. PDF

Abstract

Temporal knowledge graphs, also known as episodic or time-dependent knowledge graphs, are large-scale event databases that describe temporally evolving multi-relational data. An episodic knowledge graph can be regarded as a sequence of semantic knowledge graphs incorporated with timestamps. In this talk, we review recently developed learning-based algorithms for temporal knowledge graphs completion and forecasting.

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[10]

Y. Ma.
Learning with relational knowledge in the context of cognition, quantum computing, and causality.
Dissertation 2020. DOI

Abstract

This dissertation explores the use of knowledge graphs, including semantic and episodic graphs, for representing static and evolving human knowledge, and proposes methods for improving knowledge inference. It introduces two quantum machine learning algorithms aimed at speeding up knowledge graph inference, demonstrating significant speedups over classical methods. Additionally, the work addresses causal inference in relational data, specifically in social networks, and proposes causal estimators using graph neural networks to estimate superimposed effects and optimize treatment assignments for network welfare. (Shortened.)

MCML Authors

Yunpu Ma

Dr.

Database Systems and Data Mining AI Lab

[9]

M. Ali, C. T. Hoyt, L. Vermue, M. Galkin and M. Berrendorf.
pykeen/benchmarking. Version v1.0.
2020. DOI

Abstract

pykeen/benchmarking: Accompanying arXiv announcement (v1.0). Zenodo. Mehdi Ali, Charles Tapley Hoyt, Laurent Vermue, Michael Galkin, & Max Berrendorf. (2020).

MCML Authors

Max Berrendorf

Dr.

* Former Member

[8]

M. Berrendorf, E. Faerman and V. Tresp.
Active Learning for Entity Alignment.
DL4G @WWW 2020 - 5th International Workshop on Deep Learning for Graphs at the International World Wide Web Conference (WWW 2020). Taipeh, Taiwan, Apr 21, 2020. arXiv

Abstract

In this work, we propose a novel framework for the labeling of entity alignments in knowledge graph datasets. Different strategies to select informative instances for the human labeler build the core of our framework. We illustrate how the labeling of entity alignments is different from assigning class labels to single instances and how these differences affect the labeling efficiency. Based on these considerations we propose and evaluate different active and passive learning strategies. One of our main findings is that passive learning approaches, which can be efficiently precomputed and deployed more easily, achieve performance comparable to the active learning strategies.

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[7]

M. Berrendorf, E. Faerman, L. Vermue and V. Tresp.
Interpretable and Fair Comparison of Link Prediction or Entity Alignment Methods with Adjusted Mean Rank (Extended Abstract).
DL4G @WWW 2020 - 5th International Workshop on Deep Learning for Graphs at the International World Wide Web Conference (WWW 2020). Taipeh, Taiwan, Apr 21, 2020. Full paper at WI-AT 2020. DOI

Abstract

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

[6]

M. Berrendorf, E. Faerman, V. Melnychuk, V. Tresp and T. Seidl.
Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned.
ECIR 2020 - 42nd European Conference on Information Retrieval. Virtual, Apr 14-17, 2020. DOI GitHub

Abstract

In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper and after a thorough audit of the code provided by authors, we concluded, that their implementation is different from the architecture described in the paper. In addition, several tricks are required to make the model work and some of them are not very intuitive.We provide an extensive ablation study to quantify the effects these tricks and changes of architecture have on final performance. Furthermore, we examine current evaluation approaches and systematize available benchmark datasets.We believe that people interested in KG matching might profit from our work, as well as novices entering the field.

MCML Authors

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Valentyn Melnychuk

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Artificial Intelligence in Management

Volker Tresp

Prof. Dr.

Database Systems and Data Mining AI Lab

Thomas Seidl

Prof. Dr.

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Database Systems and Data Mining AI Lab

[5]

D. Davletshina, V. Melnychuk, V. Tran, H. Singla, M. Berrendorf, E. Faerman, M. Fromm and M. Schubert.
Unsupervised Anomaly Detection for X-Ray Images.
Preprint (Jan. 2020). arXiv GitHub

Abstract

Obtaining labels for medical (image) data requires scarce and expensive experts. Moreover, due to ambiguous symptoms, single images rarely suffice to correctly diagnose a medical condition. Instead, it often requires to take additional background information such as the patient’s medical history or test results into account. Hence, instead of focusing on uninterpretable black-box systems delivering an uncertain final diagnosis in an end-to-end-fashion, we investigate how unsupervised methods trained on images without anomalies can be used to assist doctors in evaluating X-ray images of hands. Our method increases the efficiency of making a diagnosis and reduces the risk of missing important regions. Therefore, we adopt state-of-the-art approaches for unsupervised learning to detect anomalies and show how the outputs of these methods can be explained. To reduce the effect of noise, which often can be mistaken for an anomaly, we introduce a powerful preprocessing pipeline. We provide an extensive evaluation of different approaches and demonstrate empirically that even without labels it is possible to achieve satisfying results on a real-world dataset of X-ray images of hands. We also evaluate the importance of preprocessing and one of our main findings is that without it, most of our approaches perform not better than random.

MCML Authors

Valentyn Melnychuk

Artificial Intelligence in Management

Viet Tran

C2 | Biology
→ Group Christian Müller

Biomedical Statistics and Data Science

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Michael Fromm

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Spatial Artificial Intelligence

2019

[4]

E. Faerman, O. Voggenreiter, F. Borutta, T. Emrich, M. Berrendorf and M. Schubert.
Graph Alignment Networks with Node Matching Scores.
NeurIPS 2019 - Workshop on Graph Representation Learning at the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 08-14, 2019. PDF

Abstract

In this work we address the problem of graph node alignment at the example of Map Fusion (MF). Given two partly overlapping road networks, the goal is to match nodes that represent the same locations in both networks. For this task we propose a new model based on Graph Neural Networks (GNN). Existing GNN approaches, which have recently been successfully applied on various tasks for graph based data, show poor performance for the MF task. We hypothesize that this is mainly caused by graph regions from the non-overlapping areas, as information from those areas negatively affect the learned node representations. Therefore, our model has an additional inductive bias and learns to ignore effects of nodes that do not have a matching in the other graph. Our new model can easily be extended to other graph alignment problems, e.g., for calculating graph similarities, or for the alignment of entities in knowledge graphs, as well.

MCML Authors

Evgeny Faerman

Dr.

* Former Member

Felix Borutta

Dr.

* Former Member

Max Berrendorf

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

A3 | Computational Models
→ Group Matthias Schubert

Spatial Artificial Intelligence

[3]

E. Faerman, M. Rogalla, N. Strauß, A. Krüger, B. Blümel, M. Berrendorf, M. Fromm and M. Schubert.
Spatial Interpolation with Message Passing Framework.
ICDMW 2019 - IEEE International Conference on Data Mining Workshops. Beijing, China, Nov 08-11, 2019. DOI

Abstract

Spatial interpolation is the task to predict a measurement for any location in a given geographical region. To train a prediction model, we assume to have point-wise measurements for various locations in the region. In addition, it is often beneficial to consider historic measurements for these locations when training an interpolation model. Typical use cases are the interpolation of weather, pollution or traffic information. In this paper, we introduce a new type of model with strong relational inductive bias based on Message Passing Networks. In addition, we extend our new model to take geomorphological characteristics into account to improve the prediciton quality. We provide an extensive evaluation based on a large real-world weather dataset and compare our new approach with classical statistical interpolation techniques and Neural Networks without inductive bias.

MCML Authors

Evgeny Faerman

Dr.

* Former Member

Niklas Strauß

Dr.

A3 | Computational Models
→ Group Matthias Schubert

Spatial Artificial Intelligence

Max Berrendorf

Dr.

* Former Member

Michael Fromm

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[2]

M. Fromm, M. Berrendorf, E. Faerman, Y. Chen, B. Schüss and M. Schubert.
XD-STOD: Cross-Domain Superresolution for Tiny Object Detection.
ICDMW 2019 - IEEE International Conference on Data Mining Workshops. Beijing, China, Nov 08-11, 2019. DOI

Abstract

Monitoring the restoration of natural habitats after human intervention is an important task in the field of remote sensing. Currently, this requires extensive field studies entailing considerable costs. Unmanned Aerial vehicles (UAVs, a.k.a. drones) have the potential to reduce these costs, but generate immense amounts of data which have to be evaluated automatically with special techniques. Especially the automated detection of tree seedlings poses a big challenge, as their size and shape vary greatly across images. In addition, there is a tradeoff between different flying altitudes. Given the same camera equipment, a lower flying altitude achieves higher resolution images and thus, achieving high detection rates is easier. However, the imagery will only cover a limited area. On the other hand, flying at larger altitudes, allows for covering larger areas, but makes seedling detection more challenging due to the coarser images. In this paper we investigate the usability of super resolution (SR) networks for the case that we can collect a large amount of coarse imagery on higher flying altitudes, but only a small amount of high resolution images from lower flying altitudes. We use a collection of high-resolution images taken by a drone at 5m altitude. After training the SR models on these data, we evaluate their applicability to low quality images taken at 30m altitude (in-domain). In addition, we investigate and compare whether approaches trained on a highly diverse large data sets can be transferred to these data (cross-domain). We also evaluate the usability of the SR results based on their influence on the detection rate of different object detectors. We found that the features acquired from training on standard SR data sets are transferable to the drone footage. Furthermore, we demonstrate that the detection rate of common object detectors can be improved by SR techniques using both settings, in-domain and cross-domain.

MCML Authors

Michael Fromm

Dr.

* Former Member

Max Berrendorf

Dr.

A3 | Computational Models
→ Group Matthias Schubert

* Former Member

Evgeny Faerman

Dr.

* Former Member

Matthias Schubert

Prof. Dr.

Spatial Artificial Intelligence

[1]

M. Berrendorf, F. Borutta and P. Kröger.
k-Distance Approximation for Memory-Efficient RkNN Retrieval.
SISAP 2019 - 12th International Conference on Similarity Search and Applications. Newark, New York, USA, Oct 02-04, 2019. DOI

Abstract

For a given query object, Reverse k-Nearest Neighbor queries retrieve those objects that have the query object among their k-nearest neighbors. However, computing the k-nearest neighbor sets for all points in a database is expensive in terms of computational costs. Therefore, specific index structures have been invented to apply pruning heuristics which aim at reducing the search space. At time, the state-of-the-art index structure for enabling fast RkNN query processing in general metric spaces is the MRkNNCoP-Tree which uses linear functions to approximate lower and upper bounds on the k-distances to prune the search space. Storing those linear functions results in additional storage costs in O(n) which might be infeasible in situation where storage space is limited, e.g., on mobile devices. In this work, we present a novel index based on the MRkNNCoP-Tree as well as recent developments in the field of neural indexing. By learning a single neural network model that approximates the k-nearest neighbor distance bounds for all points in a database, the storage complexity of the proposed index structure is reduced to O(1) while the index is still able to guarantee exact query results. As shown in our experimental evaluations on synthetic and real-world data sets, our approach can significantly reduce the required storage space in trade-off to some growth in terms of refinement sets when relying on exact query processing.

MCML Authors

Max Berrendorf

Dr.

* Former Member

Felix Borutta

Dr.

* Former Member

Peer Kröger

Prof. Dr.