05.07.2023

MCML Researchers With Twelve Papers at ACL 2023

61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, 09.07.2023–14.07.2023

We are happy to announce that MCML researchers are represented with twelve papers at ACL 2023. Congrats to our researchers!

Main Track (4 papers)

A. Imani, P. Lin, A. H. Kargaran, S. Severini, M. J. Sabet, N. Kassner, C. Ma, H. Schmid, A. Martins, F. Yvon and H. Schütze.
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages.
ACL 2023 - 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI GitHub

Abstract

The NLP community has mainly focused on scaling Large Language Models (LLMs) vertically, i.e., making them better for about 100 languages. We instead scale LLMs horizontally: we create, through continued pretraining, Glot500-m, an LLM that covers 511 predominantly low-resource languages. An important part of this effort is to collect and clean Glot500-c, a corpus that covers these 511 languages and allows us to train Glot500-m. We evaluate Glot500-m on five diverse tasks across these languages. We observe large improvements for both high-resource and low-resource languages compared to an XLM-R baseline. Our analysis shows that no single factor explains the quality of multilingual LLM representations. Rather, a combination of factors determines quality including corpus size, script, ‘help’ from related languages and the total capacity of the model. Our work addresses an important goal of NLP research: we should notlimit NLP to a small fraction of the world’s languages and instead strive to support as many languages as possible to bring the benefits of NLP technology to all languages and cultures.

MCML Authors

Ayyoob Imani

→ Group Hinrich Schütze
Computational Linguistics

Peiqin Lin

→ Group Hinrich Schütze
Computational Linguistics

Amir Hossein Kargaran

→ Group Hinrich Schütze
Computational Linguistics

Nora Kassner

* Former Member

→ Group Hinrich Schütze
Computational Linguistics

Chunlan Ma

→ Group Hinrich Schütze
Computational Linguistics

Hinrich Schütze

Prof. Dr.

Principal Investigator

Computational Linguistics

Y. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism.
ACL 2023 - 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

We investigate response generation for multi-turn dialogue in generative chatbots. Existing generative modelsbased on RNNs (Recurrent Neural Networks) usually employ the last hidden state to summarize the history, which makesmodels unable to capture the subtle variability observed in different dialogues and cannot distinguish the differencesbetween dialogues that are similar in composition. In this paper, we propose Pseudo-Variational Gated Recurrent Unit (PVGRU). The key novelty of PVGRU is a recurrent summarizing variable thataggregates the accumulated distribution variations of subsequences. We train PVGRU without relying on posterior knowledge, thus avoiding the training-inference inconsistency problem. PVGRU can perceive subtle semantic variability through summarizing variables that are optimized by two objectives we employ for training: distribution consistency and reconstruction. In addition, we build a Pseudo-Variational Hierarchical Dialogue(PVHD) model based on PVGRU. Experimental results demonstrate that PVGRU can broadly improve the diversity andrelevance of responses on two benchmark datasets.

MCML Authors

Yongkang Liu

Dr.

* Former Member

→ Group Hinrich Schütze
Computational Linguistics

Hinrich Schütze

Prof. Dr.

Principal Investigator

Computational Linguistics

Y. Liu, H. Ye, L. Weissweiler, P. Wicke, R. Pei, R. Zangenfeind and H. Schütze.
A Crosslingual Investigation of Conceptualization in 1335 Languages.
ACL 2023 - 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Languages differ in how they divide up the world into concepts and words; e.g., in contrast to English, Swahili has a single concept for ‘belly’ and ‘womb’. We investigate these differences in conceptualization across 1,335 languages by aligning concepts in a parallel corpus. To this end, we propose Conceptualizer, a method that creates a bipartite directed alignment graph between source language concepts and sets of target language strings. In a detailed linguistic analysis across all languages for one concept (‘bird’) and an evaluation on gold standard data for 32 Swadesh concepts, we show that Conceptualizer has good alignment accuracy. We demonstrate the potential of research on conceptualization in NLP with two experiments. (1) We define crosslingual stability of a concept as the degree to which it has 1-1 correspondences across languages, and show that concreteness predicts stability. (2) We represent each language by its conceptualization pattern for 83 concepts, and define a similarity measure on these representations. The resulting measure for the conceptual similarity between two languages is complementary to standard genealogical, typological, and surface similarity measures. For four out of six language families, we can assign languages to their correct family based on conceptual similarity with accuracies between 54% and 87%.

MCML Authors

Yihong Liu

→ Group Hinrich Schütze
Computational Linguistics

Haotian Ye

→ Group Hinrich Schütze
Computational Linguistics

Leonie Weissweiler

* Former Member

→ Group Hinrich Schütze
Computational Linguistics

Philipp Wicke

Dr.

→ Group Hinrich Schütze
Computational Linguistics

Hinrich Schütze

Prof. Dr.

Principal Investigator

Computational Linguistics

A. Modarressi, M. Fayyaz, E. Aghazadeh, Y. Yaghoobzadeh and M. T. Pilehvar.
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition.
ACL 2023 - 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI GitHub

Abstract

An emerging solution for explaining Transformer-based models is to use vector-based analysis on how the representations are formed. However, providing a faithful vector-based explanation for a multi-layer model could be challenging in three aspects: (1) Incorporating all components into the analysis, (2) Aggregating the layer dynamics to determine the information flow and mixture throughout the entire model, and (3) Identifying the connection between the vector-based analysis and the model’s predictions. In this paper, we present DecompX to tackle these challenges. DecompX is based on the construction of decomposed token representations and their successive propagation throughout the model without mixing them in between layers. Additionally, our proposal provides multiple advantages over existing solutions for its inclusion of all encoder components (especially nonlinear feed-forward networks) and the classification head. The former allows acquiring precise vectors while the latter transforms the decomposition into meaningful prediction-based values, eliminating the need for norm- or summation-based vector aggregation. According to the standard faithfulness evaluations, DecompX consistently outperforms existing gradient-based and vector-based approaches on various datasets.

MCML Authors

Ali Modarressi

→ Group Hinrich Schütze
Computational Linguistics

Findings Track (8 papers)

M. Fromm, M. Berrendorf, E. Faerman and T. Seidl.
Cross-Domain Argument Quality Estimation.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI GitHub

Abstract

Argumentation is one of society’s foundational pillars, and, sparked by advances in NLP, and the vast availability of text data, automated mining of arguments receives increasing attention. A decisive property of arguments is their strength or quality. While there are works on the automated estimation of argument strength, their scope is narrow:They focus on isolated datasets and neglect the interactions with related argument-mining tasks, such as argument identification and evidence detection. In this work, we close this gap by approaching argument quality estimation from multiple different angles:Grounded on rich results from thorough empirical evaluations, we assess the generalization capabilities of argument quality estimation across diverse domains and the interplay with related argument mining tasks. We find that generalization depends on a sufficient representation of different domains in the training part. In zero-shot transfer and multi-task experiments, we reveal that argument quality is among the more challenging tasks but can improve others.

MCML Authors

Max Berrendorf

Dr.

* Former Member

→ Group Volker Tresp
Database Systems, Data Mining and AI

Evgeny Faerman

Dr.

* Former Member

→ Group Matthias Schubert
Spatial Artificial Intelligence

Thomas Seidl

Prof. Dr.

Director

Database Systems, Data Mining and AI

K. Hämmerl, B. Deiseroth, P. Schramowski, J. Libovický, C. Rothkopf, A. Fraser and K. Kersting.
Speaking Multiple Languages Affects the Moral Bias of Language Models.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer. However, PMLMs are trained on varying amounts of data for each language. In practice this means their performance is often much better on English than many other languages. We explore to what extent this also applies to moral norms. Do the models capture moral norms from English and impose them on other languages? Do the models exhibit random and thus potentially harmful beliefs in certain languages? Both these issues could negatively impact cross-lingual transfer and potentially lead to harmful outcomes. In this paper, we (1) apply the MORALDIRECTION framework to multilingual models, comparing results in German, Czech, Arabic, Chinese, and English, (2) analyse model behaviour on filtered parallel subtitles corpora, and (3) apply the models to a Moral Foundations Questionnaire, comparing with human responses from different countries. Our experiments demonstrate that, indeed, PMLMs encode differing moral biases, but these do not necessarily correspond to cultural differences or commonalities in human opinions. We release our code and models.

MCML Authors

Katharina Hämmerl

→ Group Alexander Fraser
Data Analytics & Statistics

Alexander Fraser

Prof. Dr.

Principal Investigator

Data Analytics & Statistics

K. Hämmerl, A. Fastowski, J. Libovický and A. Fraser.
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings, and typically display outlier dimensions. This seems to be true for both monolingual and multilingual models, although much less work has been done on the multilingual context. Why these outliers occur and how they affect the representations is still an active area of research. We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained multilingual language models. We focus on cross-lingual semantic similarity tasks, as these are natural tasks for evaluating multilingual representations. Specifically, we examine sentence representations. Sentence transformers which are fine-tuned on parallel resources (that are not always available) perform better on this task, and we show that their representations are more isotropic. However, we aim to improve multilingual representations in general. We investigate how much of the performance difference can be made up by only transforming the embedding space without fine-tuning, and visualise the resulting spaces. We test different operations: Removing individual outlier dimensions, cluster-based isotropy enhancement, and ZCA whitening. We publish our code for reproducibility.

MCML Authors

Katharina Hämmerl

→ Group Alexander Fraser
Data Analytics & Statistics

Alexander Fraser

Prof. Dr.

Principal Investigator

Data Analytics & Statistics

Z. Han, R. Liao, J. Gu, Y. Zhang, Z. Ding, Y. Gu, H. Köppl, H. Schütze and V. Tresp.
ECOLA: Enhancing Temporal Knowledge Embeddings with Contextualized Language Representations.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.

MCML Authors

Ruotong Liao

→ Group Volker Tresp
Database Systems, Data Mining and AI

Yao Zhang

→ Group Volker Tresp
Database Systems, Data Mining and AI

Zifeng Ding

→ Group Volker Tresp
Database Systems, Data Mining and AI

Hinrich Schütze

Prof. Dr.

Principal Investigator

Computational Linguistics

Volker Tresp

Prof. Dr.

Principal Investigator

Database Systems, Data Mining and AI

E. Nie, S. Liang, H. Schmid and H. Schütze.
Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Multilingual Pretrained Language Models (MPLMs) perform strongly in cross-lingual transfer. We propose Prompts Augmented by Retrieval Crosslingually (PARC) to improve zero-shot performance on low-resource languages (LRLs) by augmenting the context with prompts consisting of semantically similar sentences retrieved from a high-resource language (HRL). PARC improves zero-shot performance on three downstream tasks (sentiment classification, topic categorization, natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in unlabeled (+5.1%) and labeled settings (+16.3%). PARC also outperforms finetuning by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.

MCML Authors

Ercong Nie

→ Group Hinrich Schütze
Computational Linguistics

Sheng Liang

* Former Member

→ Group Hinrich Schütze
Computational Linguistics

Hinrich Schütze

Prof. Dr.

Principal Investigator

Computational Linguistics

D. Saggau, M. Rezaei, B. Bischl and I. Chalkidis.
Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (Ks of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longfomer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further on, we complement the baseline method -siamese neural network- with additional convex neural networks based on functional Bregman divergence aiming to enhance the quality of the output document representations. We show that overall the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long document topic classification tasks from the legal and biomedical domains.

MCML Authors

Mina Rezaei

Dr.

→ Group Bernd Bischl
Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Director

Statistical Learning and Data Science

L. Weber and B. Plank.
ActiveAED: A Human in the Loop Improves Annotation Error Detection.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Manually annotated datasets are crucial for training and evaluating Natural Language Processing models. However, recent work has discovered that even widely-used benchmark datasets contain a substantial number of erroneous annotations. This problem has been addressed with Annotation Error Detection (AED) models, which can flag such errors for human re-annotation. However, even though many of these AED methods assume a final curation step in which a human annotator decides whether the annotation is erroneous, they have been developed as static models without any human-in-the-loop component. In this work, we propose ActiveAED, an AED method that can detect errors more accurately by repeatedly querying a human for error corrections in its prediction loop. We evaluate ActiveAED on eight datasets spanning five different tasks and find that it leads to improvements over the state of the art on seven of them, with gains of up to six percentage points in average precision.

MCML Authors

Barbara Plank

Prof. Dr.

Principal Investigator

AI and Computational Linguistics

P. Wicke.
LMs stand their Ground: Investigating the Effect of Embodiment in Figurative Language Interpretation by Language Models.
Findings @ACL 2023 - Findings of the 61th Annual Meeting of the Association for Computational Linguistics. Toronto, Canada, Jul 09-14, 2023. DOI

Abstract

Figurative language is a challenge for language models since its interpretation is based on the use of words in a way that deviates from their conventional order and meaning. Yet, humans can easily understand and interpret metaphors, similes or idioms as they can be derived from embodied metaphors. Language is a proxy for embodiment and if a metaphor is conventional and lexicalised, it becomes easier for a system without a body to make sense of embodied concepts. Yet, the intricate relation between embodiment and features such as concreteness or age of acquisition has not been studied in the context of figurative language interpretation concerning language models. Hence, the presented study shows how larger language models perform better at interpreting metaphoric sentences when the action of the metaphorical sentence is more embodied. The analysis rules out multicollinearity with other features (e.g. word length or concreteness) and provides initial evidence that larger language models conceptualise embodied concepts to a degree that facilitates figurative language understanding.

MCML Authors

Philipp Wicke

Dr.

→ Group Hinrich Schütze
Computational Linguistics

ACL 2023

Subscribe to RSS News feed

26.09.2025

Björn Ommer Featured in WELT

MCML PI Björn Ommer told WELT that AI can never be entirely neutral and that human judgment remains essential.

25.09.2025

Björn Schuller Featured in Macwelt Article

MCML PI Björn Schuller discusses in Macwelt how Apple Watch monitors health, detects subtle changes, and supports early intervention.

24.09.2025

MCML PI Björn Ommer Featured on ZDF NANO Talk

MCML PIs Björn Ommer & Alena Buyx discuss AI’s essence on ZDF NANO Talk, covering tech, ethics, and societal impact.

23.09.2025

Benjamin Lange Explores Opportunities and Risks of AI Agents

Benjamin Lange highlights both opportunities and ethical risks of AI agents and calls for clear rules to ensure they benefit society.

22.09.2025

Predicting Health With AI - With Researcher Simon Schallmoser

Simon Schallmoser uses AI to predict health risks, detect low blood sugar in drivers, and advance personalized, safer healthcare.

MCML Researchers With Twelve Papers at ACL 2023

61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, 09.07.2023–14.07.2023

Main Track (4 papers)

Findings Track (8 papers)

Related