MCML - Research Group Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Hinrich Schütze holds the Chair of Statistical NLP and Deep Learning at LMU Munich.

His primary focus is linguistically informed neural NLP: his team brings a deep understanding of language to its research and believes that learning is key to successful NLP, just as the language capabilities of humans are based on learning. Its research areas are representation learning, multilinguality, machine learning for low-resource scenarios, cognitively motivated deep learning, linguistically informed deep learning (especially for morphology), digital humanities, and the intersection of NLP and robotics.

Team members @MCML

Ahmad Dawar Hakimi, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Ayyoob Imani, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Amir Hossein Kargaran, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Molly Kennedy, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Sheng Liang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Peiqin Lin, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Yongkang Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Chunlan Ma, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Antonis Maronikolakis, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Ali Modarressi, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Ercong Nie, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Lütfi Kerem Şenel, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Victor Steinborn, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Leonor Veloso, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Mingyang Wang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Philipp Wicke, Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Axel Wisiorek, Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Haotian Ye, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Shengqiang Zhang, Statistical NLP and Deep Learning, B2 | Natural Language Processing

Publications @MCML

[70]
Y. Liu, Y. Zhang, Q. Li, T. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Full-parameter fine-tuning has become the go-to choice for adapting language models (LMs) to downstream tasks due to its excellent performance. As LMs grow in size, fine-tuning the full parameters of LMs requires a prohibitively large amount of GPU memory. Existing approaches utilize zeroth-order optimizers to conserve GPU memory, which can potentially compromise the performance of LMs, as non-zeroth-order optimizers tend to converge more readily on most downstream tasks. In this paper, we propose a novel optimizer-independent end-to-end hierarchical fine-tuning strategy, HiFT, which only updates a subset of parameters at each training step. HiFT significantly reduces the amount of gradients and optimizer state parameters residing in GPU memory at any given time, thereby reducing GPU memory usage. Our results demonstrate that: (1) HiFT achieves comparable performance to parameter-efficient fine-tuning and standard full-parameter fine-tuning. (2) HiFT supports various optimizers including AdamW, AdaGrad and SGD. (3) HiFT can save more than 60% GPU memory compared with standard full-parameter fine-tuning for a 7B model. (4) HiFT enables full-parameter fine-tuning of a 7B model on a single 48GB A6000 at 32-bit precision using the AdamW optimizer, without using any memory-saving techniques.
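
The group-wise update schedule at the core of this idea can be sketched as follows. This is a simplified illustration, not the authors' implementation; the real method additionally keeps gradients and optimizer states of inactive groups out of GPU memory, which this toy loop does not show.

```python
import torch
from torch import nn

def layer_groups(model: nn.Module):
    """Partition trainable parameters into per-module groups (here: top-level children)."""
    return [list(m.parameters()) for m in model.children()
            if any(p.requires_grad for p in m.parameters())]

def hierarchical_step(model, groups, optimizers, batch, loss_fn, step):
    active = step % len(groups)                     # the single group updated at this step
    for i, group in enumerate(groups):              # freeze every other group
        for p in group:
            p.requires_grad_(i == active)
    opt = optimizers[active]
    opt.zero_grad(set_to_none=True)                 # gradients are only kept for the active group
    x, y = batch
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
groups = layer_groups(model)
optimizers = [torch.optim.AdamW(g, lr=1e-3) for g in groups]  # one optimizer state per group
loss_fn = nn.CrossEntropyLoss()
for step in range(6):
    batch = (torch.randn(8, 16), torch.randint(0, 2, (8,)))
    hierarchical_step(model, groups, optimizers, batch, loss_fn, step)
```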

MCML Authors
Yongkang Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Tong Liu, Database Systems & Data Mining, A3 | Computational Models
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[69]
A. Köksal, T. Schick, A. Korhonen and H. Schütze.
LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-sourced datasets with alignment issues, and generating noisy examples via LLMs. We introduce the LongForm-C dataset, which is created by reverse instructions. We generate instructions via LLMs for human-written corpus examples using reverse instructions. First we select a diverse set of human-written documents from corpora such as C4 and Wikipedia; then we generate instructions for these documents via LLMs. This approach provides a cheaper and cleaner instruction-tuning dataset with natural output and one suitable for long text generation. Our models outperform 10x larger language models without instruction tuning on tasks such as story/recipe generation and long-form question answering. Moreover, LongForm models outperform prior instruction-tuned models such as FLAN-T5 and Alpaca by a large margin, and improve language understanding capabilities further.
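
The reverse-instructions recipe can be sketched in a few lines. The prompt wording and the llm_generate stand-in below are illustrative assumptions, not the released LongForm pipeline.

```python
def llm_generate(prompt: str) -> str:
    """Stand-in for any instruction-following LLM call (API or local model)."""
    return "Give me a short recipe for sourdough bread."

REVERSE_PROMPT = (
    "Below is a text written by a person.\n\n{document}\n\n"
    "Write a single instruction or question to which the text above would be a good answer."
)

def reverse_instruction_pair(document: str) -> dict:
    instruction = llm_generate(REVERSE_PROMPT.format(document=document)).strip()
    # The human-written document becomes the output; the generated instruction becomes the input.
    return {"instruction": instruction, "output": document}

corpus_docs = ["Mix flour, water and sourdough starter, rest overnight, shape and bake at 230°C."]
dataset = [reverse_instruction_pair(doc) for doc in corpus_docs]
```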

MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[68]
A. Modarressi, A. Köksal and H. Schütze.
Consistent Document-Level Relation Extraction via Counterfactuals.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Many datasets have been developed to train and evaluate document-level relation extraction (RE) models. Most of these are constructed using real-world data. It has been shown that RE models trained on real-world data suffer from factual biases. To evaluate and address this issue, we present CovEReD, a counterfactual data generation approach for document-level relation extraction datasets using entity replacement. We first demonstrate that models trained on factual data exhibit inconsistent behavior: while they accurately extract triples from factual data, they fail to extract the same triples after counterfactual modification. This inconsistency suggests that models trained on factual data rely on spurious signals such as specific entities and external knowledge – rather than on the input context – to extract triples. We show that by generating document-level counterfactual data with CovEReD and training models on them, consistency is maintained with minimal impact on RE performance. We release our CovEReD pipeline as well as Re-DocRED-CF, a dataset of counterfactual RE documents, to assist in evaluating and addressing inconsistency in document-level RE.
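
A minimal illustration of the consistency check described above (our own sketch; the entity map and the toy extractor are hypothetical, not part of the released pipeline):

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

def counterfactual_document(doc: str, entity_map: dict) -> str:
    """Swap each original entity for a type-compatible replacement entity."""
    for original, replacement in entity_map.items():
        doc = doc.replace(original, replacement)
    return doc

def consistency(extract: Callable[[str], List[Triple]], doc: str, entity_map: dict) -> float:
    factual = set(extract(doc))
    counterfactual = set(extract(counterfactual_document(doc, entity_map)))
    # Map the factual triples into the counterfactual entity namespace before comparing.
    expected = {(entity_map.get(h, h), r, entity_map.get(t, t)) for h, r, t in factual}
    return len(expected & counterfactual) / max(len(expected), 1)

# A model that only memorized one fact extracts nothing from the counterfactual document,
# which is exactly the kind of inconsistency the paper measures.
def toy_extract(doc: str) -> List[Triple]:
    return [("Marie Curie", "born_in", "Warsaw")] if "Marie Curie" in doc and "Warsaw" in doc else []

doc = "Marie Curie was born in Warsaw."
print(consistency(toy_extract, doc, {"Marie Curie": "Ada Lovelace", "Warsaw": "London"}))  # 0.0
```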

MCML Authors
Ali Modarressi, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[67]
M. Wang, L. Lange, H. Adel, J. Strötgen and H. Schütze.
Better Call SAUL: Fluent and Consistent Language Model Editing with Generation Regularization.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

To ensure large language models contain up-to-date knowledge, they need to be updated regularly. However, model editing is challenging as it might also affect knowledge that is unrelated to the new data. State-of-the-art methods identify parameters associated with specific knowledge and then modify them via direct weight updates. However, these locate-and-edit methods suffer from heavy computational overhead and lack theoretical validation. In contrast, directly fine-tuning the model on requested edits affects the model's behavior on unrelated knowledge, and significantly damages the model's generation fluency and consistency. To address these challenges, we propose SAUL, a streamlined model editing method that uses sentence concatenation with augmented random facts for generation regularization. Evaluations on three model editing benchmarks show that SAUL is a practical and reliable solution for model editing outperforming state-of-the-art methods while maintaining generation quality and reducing computational overhead.

MCML Authors
Mingyang Wang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[66]
O. Xhelili, Y. Liu and H. Schütze.
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Multilingual pre-trained models (mPLMs) have shown impressive performance on cross-lingual transfer tasks. However, the transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language, even though the two languages may be related or share parts of their vocabularies. Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method aiming to improve the cross-lingual alignment between languages using diverse scripts. We select two areal language groups, Mediterranean-Amharic-Farsi and South+East Asian Languages, wherein the languages are mutually influenced but use different scripts. We apply our method to these language groups and conduct extensive experiments on a spectrum of downstream tasks. The results show that after PPA, models consistently outperform the original model (up to 50% for some tasks) in English-centric transfer. In addition, when we use languages other than English as sources in transfer, our method obtains even larger improvements.
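
A minimal sketch of such a post-training alignment objective, assuming pooled sentence embeddings for each original/transliterated pair; this is an illustration of the general recipe, not the paper's code.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(emb_orig, emb_translit, temperature=0.05):
    """emb_orig / emb_translit: (batch, dim) pooled encoder outputs of paired texts."""
    a = F.normalize(emb_orig, dim=-1)
    b = F.normalize(emb_translit, dim=-1)
    logits = a @ b.T / temperature                  # pairwise similarities within the batch
    targets = torch.arange(a.size(0))               # the i-th original matches the i-th transliteration
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy usage with random tensors standing in for mPLM sentence embeddings of a sentence
# and of its romanized (transliterated) counterpart.
loss = contrastive_alignment_loss(torch.randn(4, 768), torch.randn(4, 768))
```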

MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[65]
A. Yüksel, A. Köksal, L. K. Senel, A. Korhonen and H. Schütze.
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU, to evaluate LLMs' understanding of the Turkish language. TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula. These questions are written by curriculum experts, suitable for the high-school curricula in Turkey, covering subjects ranging from natural sciences and math questions to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, MT5), closed-source (GPT 4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. We provide an extensive evaluation, including zero-shot and few-shot evaluation of LLMs, chain-of-thought reasoning, and question difficulty analysis along with model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to provide insights for future LLMs for the Turkish language.

MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Lütfi Kerem Şenel, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[64]
R. Zhao, A. Köksal, Y. Liu, L. Weissweiler, A. Korhonen and H. Schütze.
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv. GitHub.
Abstract

Traditional benchmarking in NLP typically involves using static held-out test sets. However, this approach often results in an overestimation of performance and lacks the ability to offer comprehensive, interpretable, and dynamic assessments of NLP models. Recently, works like DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) have addressed these limitations through behavioral testing of NLP models with test types generated by a multistep human-annotated pipeline. Unfortunately, manually creating a variety of test types requires much human labor, often at prohibitive cost. In this work, we propose SYNTHEVAL, a hybrid behavioral testing framework that leverages large language models (LLMs) to generate a wide range of test types for a comprehensive evaluation of NLP models. SYNTHEVAL first generates sentences via LLMs using controlled generation, and then identifies challenging examples by comparing the predictions made by LLMs with task-specific NLP models. In the last stage, human experts investigate the challenging examples, manually design templates, and identify the types of failures the task-specific models consistently exhibit. We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that our framework is effective in identifying weaknesses of strong models on these tasks.
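
The filtering step in the middle of this pipeline can be sketched as follows; the interfaces for the LLM and the task model are assumptions, not the released framework.

```python
from typing import Callable, List

def challenging_examples(sentences: List[str],
                         llm_label: Callable[[str], str],
                         task_model_label: Callable[[str], str]) -> List[dict]:
    hard = []
    for s in sentences:
        llm_pred, model_pred = llm_label(s), task_model_label(s)
        if llm_pred != model_pred:                  # disagreement flags a candidate weakness
            hard.append({"text": s, "llm": llm_pred, "task_model": model_pred})
    return hard                                     # human experts then inspect these and write templates

# Toy usage for a sentiment task with canned predictors.
sents = ["The plot was painfully slow, yet I could not stop smiling."]
print(challenging_examples(sents, lambda s: "positive", lambda s: "negative"))
```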

MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[63]
Y. Liu, F. Shi, D. Wang, Y. Zhang and H. Schütze.
ChatZero: Zero-Shot Cross-Lingual Dialogue Generation via Pseudo-Target Language.
27th European Conference on Artificial Intelligence (ECAI 2024). Santiago de Compostela, Spain, Oct 19-24, 2024. DOI.
Abstract

Although large language models (LLMs) show amazing capabilities, many of the exciting applications discovered for LLMs fall short in low-resource languages. Moreover, most existing methods depend on large-scale dialogue corpora, so building dialogue generation systems in a zero-shot scenario remains a considerable challenge. To address this challenge, we propose ChatZero, a novel end-to-end zero-shot dialogue generation model based on a cross-lingual code-switching method. First, we construct a code-switching language and a pseudo-target language with placeholders. Then, for cross-lingual semantic transfer, we employ unsupervised contrastive learning to minimize the semantic gap between the source language, the code-switching language, and the pseudo-target language, which are mutually positive examples in the high-dimensional semantic space. Experiments on the multilingual DailyDialog and DSTC7-AVSD datasets demonstrate that ChatZero achieves more than 90% of the original supervised performance in the zero-shot case and state-of-the-art performance compared with other baselines.
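
The code-switching and placeholder construction can be illustrated with a toy sketch; the bilingual dictionary, the switching probability, and the placeholder token below are our own simplifications, not the paper's implementation.

```python
import random

PLACEHOLDER = "[MASK]"

def code_switch(tokens, bilingual_dict, p=0.5, seed=0):
    """Return a code-switched sentence and a pseudo-target sentence with placeholders."""
    rng = random.Random(seed)
    switched, pseudo_target = [], []
    for tok in tokens:
        translation = bilingual_dict.get(tok.lower())
        if translation and rng.random() < p:
            switched.append(translation)            # mix target-language words into the source
        else:
            switched.append(tok)
        pseudo_target.append(translation if translation else PLACEHOLDER)
    return switched, pseudo_target

en_de = {"how": "wie", "are": "geht", "you": "dir"}
print(code_switch(["How", "are", "you", "today", "?"], en_de))
```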

MCML Authors
Yongkang Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[62]
Y. Liu, E. Nie, S. Feng, Z. Hua, Z. Ding, D. Wang, Y. Zhang and H. Schütze.
A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024). Vilnius, Lithuania, Sep 09-13, 2024. DOI. GitHub.
MCML Authors
Yongkang Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Ercong Nie, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Zifeng Ding, Database Systems & Data Mining, A3 | Computational Models
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[61]
A. Yüksel, A. Köksal, L. K. Senel, A. Korhonen and H. Schütze.
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish.
1st Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. Invited talk. arXiv. GitHub.
Abstract

Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU, to evaluate LLMs' understanding of the Turkish language. TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula. These questions are written by curriculum experts, suitable for the high-school curricula in Turkey, covering subjects ranging from natural sciences and math questions to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, MT5), closed-source (GPT 4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. We provide an extensive evaluation, including zero-shot and few-shot evaluation of LLMs, chain-of-thought reasoning, and question difficulty analysis along with model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to provide insights for future LLMs for the Turkish language.

MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Lütfi Kerem Şenel, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[60]
V. Blaschke, C. Purschke, H. Schütze and B. Plank.
What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
Abstract

Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations’ needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German – a group of varieties that is heterogeneous in terms of prestige and standardization. We survey speakers of these varieties (N=327) and present their opinions on hypothetical language technologies for their dialects. Although attitudes vary among subgroups of our respondents, we find that respondents are especially in favour of potential NLP tools that work with dialectal input (especially audio input) such as virtual assistants, and less so for applications that produce dialectal output such as machine translation or spellcheckers.

MCML Authors
Verena Blaschke, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Barbara Plank, Prof. Dr., Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[59]
A. H. Kargaran, F. Yvon and H. Schütze.
MaskLID: Code-Switching Language Identification through Iterative Masking.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL. GitHub.
MCML Authors
Amir Hossein Kargaran, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[58]
Y. Liu, C. Ma, H. Ye and H. Schütze.
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Chunlan Ma, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Haotian Ye, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[57]
M. Aßenmacher, A. Stephan, L. Weissweiler, E. Çano, I. Ziegler, M. Härttrich, B. Bischl, B. Roth, C. Heumann and H. Schütze.
Collaborative Development of Modular Open Source Educational Resources for Natural Language Processing.
6th Workshop on Teaching NLP (TeachingNLP 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
Abstract

In this work, we present a collaboratively and continuously developed open-source educational resource (OSER) for teaching natural language processing at two different universities. We shed light on the principles we followed for the initial design of the course and the rationale for ongoing developments, followed by a reflection on the inter-university collaboration for designing and maintaining teaching material. When reflecting on the latter, we explicitly emphasize the considerations that need to be made when facing heterogeneous groups and when having to accommodate multiple examination regulations within one single course framework. Relying on the fundamental principles of OSER developments as defined by Bothmann et al. (2023) proved to be an important guideline during this process. The final part pertains to open-sourcing our teaching material, coping with the increasing speed of developments in the field, and integrating the course digitally, also addressing conflicting priorities and challenges we are currently facing.

MCML Authors
Matthias Aßenmacher, Dr., Statistical Learning & Data Science, A1 | Statistical Foundations & Explainability
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Bernd Bischl, Prof. Dr., Statistical Learning & Data Science, A1 | Statistical Foundations & Explainability
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[56]
S. Yuan, E. Nie, M. Färber, H. Schmid and H. Schütze.
GNNAVI: Navigating the Information Flow in Large Language Models by Graph Neural Network.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
Abstract

Large Language Models (LLMs) exhibit strong In-Context Learning (ICL) capabilities when prompts with demonstrations are applied to them. However, fine-tuning still remains crucial to further enhance their adaptability. Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality. We address this issue by introducing GNNavi, a prompt-based parameter-efficient fine-tuning (PEFT) approach. GNNavi leverages insights into ICL's information flow dynamics, which indicate that label words act in prompts as anchors for information propagation. GNNavi employs a Graph Neural Network (GNN) layer to precisely guide the aggregation and distribution of information flow during the processing of prompts by hardwiring the desired information flow into the GNN. Our experiments on text classification tasks with GPT-2 and Llama2 show that GNNavi surpasses standard prompt-based fine-tuning methods in few-shot settings by updating just 0.2% to 0.5% of parameters. We compare GNNavi with prevalent PEFT approaches, such as prefix tuning, LoRA and Adapter, in terms of performance and efficiency. Our analysis reveals that GNNavi enhances information flow and ensures a clear aggregation process.
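
The idea of hardwiring information flow with a graph layer can be pictured with a toy sketch; the adjacency pattern, layer design, and dimensions below are our own illustrative assumptions, not the GNNavi implementation.

```python
import torch
from torch import nn

class HardwiredGraphLayer(nn.Module):
    def __init__(self, hidden_dim, adjacency):
        super().__init__()
        # Row-normalize the fixed adjacency so each position averages over its sources.
        self.register_buffer("adj", adjacency / adjacency.sum(dim=-1, keepdim=True))
        self.proj = nn.Linear(hidden_dim, hidden_dim)      # the only trainable parameters

    def forward(self, hidden_states):                      # (batch, seq_len, hidden_dim)
        aggregated = self.adj @ hidden_states               # message passing along fixed edges
        return hidden_states + self.proj(aggregated)        # residual update

# Example: 6 positions, where positions 2 and 5 act as label-word anchors that
# collect information from their demonstration tokens (0-1 and 3-4, respectively).
adj = torch.eye(6)
adj[2, 0] = adj[2, 1] = 1.0
adj[5, 3] = adj[5, 4] = 1.0
layer = HardwiredGraphLayer(hidden_dim=768, adjacency=adj)
out = layer(torch.randn(2, 6, 768))
```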

MCML Authors
Ercong Nie, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[55]
H. Ye, Y. Liu, C. Ma and H. Schütze.
MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer.
5th Workshop on Insights from Negative Results in NLP at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Transformer-based pre-trained language models (PLMs) have achieved remarkable performance in various natural language processing (NLP) tasks. However, pre-training such models can take considerable resources that are almost only available to high-resource languages. On the contrary, static word embeddings are easier to train in terms of computing resources and the amount of data required. In this paper, we introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer), a novel and challenging task that is especially relevant to low-resource languages for which static word embeddings are available. To tackle the task, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language. In this way, we can train the PLM on source-language training data and perform zero-shot transfer to the target language by simply swapping the embedding layer. However, through extensive experiments on two classification datasets, we show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines. In this paper, we attempt to explain this negative result and provide several thoughts on possible improvement.
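
The relative-representation step that creates the common space can be sketched as follows; this is an illustration under our own assumptions (random toy data, three anchor pairs), not the authors' code.

```python
import numpy as np

def relative_representation(emb: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Re-express each embedding as its cosine similarities to the anchor embeddings."""
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    anc_n = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return emb_n @ anc_n.T                          # shape: (vocab_size, num_anchors)

# Toy usage: source-PLM embeddings (768d) and target static embeddings (300d) become
# comparable once both are expressed relative to translation-pair anchors.
rng = np.random.default_rng(0)
src_emb, src_anchors = rng.normal(size=(10, 768)), rng.normal(size=(3, 768))
tgt_emb, tgt_anchors = rng.normal(size=(8, 300)), rng.normal(size=(3, 300))
src_rel = relative_representation(src_emb, src_anchors)   # (10, 3)
tgt_rel = relative_representation(tgt_emb, tgt_anchors)   # (8, 3), same space as src_rel
```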

MCML Authors
Haotian Ye, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Chunlan Ma, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[54]
M. Wang, H. Adel, L. Lange, J. Strötgen and H. Schütze.
Rehearsal-Free Modular and Compositional Continual Learning for Language Models.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
MCML Authors
Mingyang Wang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[53]
Y. Liu, P. Lin, M. Wang and H. Schütze.
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining.
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. URL.
Abstract

Instead of pretraining multilingual language models from scratch, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining. However, this method usually randomly initializes the embeddings of new subwords and introduces substantially more embedding parameters to the model, thus weakening the efficiency. To address these issues, we propose a novel framework: One For All (OFA), which wisely initializes the embeddings of unseen subwords and thus can adapt a PLM to multiple languages efficiently and effectively. OFA takes advantage of external well-aligned multilingual static word vectors and injects the alignment knowledge into the subword embeddings. In addition, OFA applies matrix factorization and replaces the cumbersome embeddings with two lower-dimensional matrices, which largely reduces the number of parameters. We show OFA accelerates the convergence of continued pretraining, which is environmentally friendly as much fewer carbon footprints are generated. Through extensive experiments, we demonstrate OFA can achieve competitive or better performance than default continued pretraining baselines on a wide range of crosslingual downstream tasks. We make our code and models publicly available.
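
One way to picture the similarity-based initialization is the following simplified sketch; the exact weighting scheme and toy dimensions are assumptions of ours, not the released OFA code, and the matrix-factorization step is omitted.

```python
import numpy as np

def init_new_embeddings(plm_emb_old, static_old, static_new, top_k=5, temperature=0.1):
    """Initialize unseen subword embeddings from PLM embeddings of similar known subwords."""
    static_old = static_old / np.linalg.norm(static_old, axis=1, keepdims=True)
    static_new = static_new / np.linalg.norm(static_new, axis=1, keepdims=True)
    sims = static_new @ static_old.T                          # (new_vocab, old_vocab)
    new_emb = np.zeros((static_new.shape[0], plm_emb_old.shape[1]))
    for i, row in enumerate(sims):
        top = np.argpartition(-row, top_k)[:top_k]            # nearest known subwords
        w = np.exp(row[top] / temperature)
        w /= w.sum()
        new_emb[i] = w @ plm_emb_old[top]                     # weighted average of their PLM embeddings
    return new_emb

rng = np.random.default_rng(0)
plm_emb_old = rng.normal(size=(100, 768))    # existing PLM subword embeddings
static_old = rng.normal(size=(100, 300))     # aligned static vectors for the existing subwords
static_new = rng.normal(size=(20, 300))      # aligned static vectors for the new subwords
new_emb = init_new_embeddings(plm_emb_old, static_old, static_new)   # (20, 768)
```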

MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Peiqin Lin, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Mingyang Wang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[52]
V. Blaschke, B. Kovačić, S. Peng, H. Schütze and B. Plank.
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack in 'within-language breadth': most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap, we present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in UD, covering multiple text genres (wiki, fiction, grammar examples, social, non-fiction). We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies. Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries. We provide baseline parsing and POS tagging results, which are lower than results obtained on German and vary substantially between different graph-based parsers. To support further research on Bavarian syntax, we make our dataset, language-specific guidelines and code publicly available.

MCML Authors
Verena Blaschke, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Siyao Peng, Dr., Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Barbara Plank, Prof. Dr., Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[51]
A. H. Kargaran, F. Yvon and H. Schütze.
GlotScript: A Resource and Tool for Low Resource Writing System Identification.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL. GitHub.
Abstract

We present GlotScript, an open resource and tool for low resource writing system identification. GlotScript-R is a resource that provides the attested writing systems for more than 7,000 languages. It is compiled by aggregating information from existing writing system resources. GlotScript-T is a writing system identification tool that covers all 161 Unicode 15.0 scripts. For an input text, it returns its script distribution where scripts are identified by ISO 15924 codes. We also present two use cases for GlotScript. First, we demonstrate that GlotScript can help cleaning multilingual corpora such as mC4 and OSCAR. Second, we analyze the tokenization of a number of language models such as GPT-4 using GlotScript and provide insights on the coverage of low resource scripts and languages by each language model. We hope that GlotScript will become a useful resource for work on low resource languages in the NLP community.
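
The script-distribution idea can be sketched with Unicode script properties. The toy sketch below covers only a handful of scripts, unlike the released tool, and relies on the third-party regex package for \p{Script=...} matching.

```python
import regex
from collections import Counter

# A few scripts and their ISO 15924 codes; the real tool covers all Unicode 15.0 scripts.
SCRIPTS = {"Latin": "Latn", "Cyrillic": "Cyrl", "Arabic": "Arab",
           "Han": "Hani", "Devanagari": "Deva", "Greek": "Grek"}

def script_distribution(text: str) -> dict:
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue                                  # ignore punctuation, digits, whitespace
        for name, iso in SCRIPTS.items():
            if regex.match(rf"\p{{Script={name}}}", ch):
                counts[iso] += 1
                break
    total = sum(counts.values()) or 1
    return {iso: n / total for iso, n in counts.items()}

print(script_distribution("Hello мир!"))              # e.g. {'Latn': 0.625, 'Cyrl': 0.375}
```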

MCML Authors
Amir Hossein Kargaran, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[50]
A. Köksal, S. Severini and H. Schütze.
SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different domains and languages when gold data are not available. This addresses the important scenario of missing gold data alignments for low-resource languages.
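
One plausible instantiation of the MT-plus-minimal-pairs idea is sketched below; the deletion-based minimal pair and the toy dictionary "MT system" are our own approximation, and the actual SilverAlign procedure may differ in detail.

```python
from typing import Callable, List

def silver_alignment(sentence: List[str],
                     index: int,
                     translate: Callable[[str], str]) -> List[str]:
    """Target tokens that vanish when sentence[index] is removed are its silver alignments."""
    full = translate(" ".join(sentence)).split()
    without = translate(" ".join(sentence[:index] + sentence[index + 1:])).split()
    remaining = list(without)
    aligned = []
    for tok in full:                                  # multiset difference keeps duplicates honest
        if tok in remaining:
            remaining.remove(tok)
        else:
            aligned.append(tok)
    return aligned

# Toy usage with a dictionary standing in for an MT system.
toy_mt = {"the cat sleeps": "die Katze schläft", "the sleeps": "die schläft"}
print(silver_alignment(["the", "cat", "sleeps"], 1, lambda s: toy_mt[s]))   # ['Katze']
```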

MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[49]
L. Weissweiler, N. Böbel, K. Guiller, S. Herrera, W. Scivetti, A. Lorenzi, N. Melnik, A. Bhatia, H. Schütze, L. Levin, A. Zeldes, J. Nivre, W. Croft and N. Schneider.
UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements -- for example, interrogative sentences with special markers and/or word orders -- are not labeled holistically. We argue for (i) augmenting UD annotations with a 'UCxn' annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.

MCML Authors
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Nora Schneider, Ethics in Systems Design and Machine Learning, A3 | Computational Models


[48]
S. Zhou, L. Weissweiler, T. He, H. Schütze, D. R. Mortensen and L. Levin.
Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias. We then create further challenging sub-tasks in an effort to explain this failure. From a Computational Linguistics perspective, we identify a group of constructions with three classes of adjectives which cannot be distinguished by surface features. This enables us to probe for LLM's understanding of these constructions in various ways, and we find that they fail in a variety of ways to distinguish between them, suggesting that they don't adequately represent their meaning or capture the lexical properties of phrasal heads.

MCML Authors
Shijia Zhou, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[47]
Y. Liu, C. Ma, H. Ye and H. Schütze.
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data.
Preprint at arXiv (May 2024). arXiv. GitHub.
MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Chunlan Ma, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Haotian Ye, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[46]
A. Modarressi, A. Köksal, A. Imani, M. Fayyaz and H. Schütze.
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory.
Preprint at arXiv (Apr. 2024). arXiv.
MCML Authors
Ali Modarressi, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Ayyoob Imani, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[45]
A. Maronikolakis, A. Köksal and H. Schütze.
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks.
4th Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI 2024). St. Julian's, Malta, Mar 21, 2024. URL.
MCML Authors
Antonis Maronikolakis, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[44]
P. Lin, C. Hu, Z. Zhang, A. Martins and H. Schütze.
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julian's, Malta, Mar 17-22, 2024. URL.
Abstract

Recent multilingual pretrained language models (mPLMs) have been shown to encode strong language-specific signals, which are not explicitly provided during pretraining. It remains an open question whether it is feasible to employ mPLMs to measure language similarity, and subsequently use the similarity results to select source languages for boosting cross-lingual transfer. To investigate this, we propose mPLM-Sim, a language similarity measure that induces the similarities across languages from mPLMs using multi-parallel corpora. Our study shows that mPLM-Sim exhibits moderately high correlations with linguistic similarity measures, such as lexicostatistics, genealogical language family, and geographical sprachbund. We also conduct a case study on languages with low correlation and observe that mPLM-Sim yields more accurate similarity results. Additionally, we find that similarity results vary across different mPLMs and different layers within an mPLM. We further investigate whether mPLM-Sim is effective for zero-shot cross-lingual transfer by conducting experiments on both low-level syntactic tasks and high-level semantic tasks. The experimental results demonstrate that mPLM-Sim is capable of selecting better source languages than linguistic measures, resulting in a 1%-2% improvement in zero-shot cross-lingual transfer performance.
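
A simplified version of such a similarity measure, assuming pooled sentence embeddings from one mPLM layer over a multi-parallel corpus, is sketched below; this is illustrative only, not the authors' exact procedure.

```python
import numpy as np

def language_similarity(emb_lang_a: np.ndarray, emb_lang_b: np.ndarray) -> float:
    """emb_lang_*: (num_parallel_sentences, hidden_dim) sentence embeddings from one mPLM layer."""
    a = emb_lang_a / np.linalg.norm(emb_lang_a, axis=1, keepdims=True)
    b = emb_lang_b / np.linalg.norm(emb_lang_b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))      # mean cosine over aligned sentence pairs

# Toy usage; in practice the embeddings come from mean-pooling a chosen mPLM layer over a
# multi-parallel corpus, and candidate source languages are ranked by this score.
rng = np.random.default_rng(0)
sim = language_similarity(rng.normal(size=(32, 768)), rng.normal(size=(32, 768)))
```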

MCML Authors
Peiqin Lin, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[43]
B. Ma, E. Nie, S. Yuan, H. Schmid, M. Färber, F. Kreuter and H. Schütze.
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julian's, Malta, Mar 17-22, 2024. URL.
Abstract

Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based method for token-level sequence labeling tasks. The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer, especially for languages that are typologically different from the source language English. Our method also attains state-of-the-art performance when employed with the mT5 model. Besides, our exploratory study in multilingual large language models shows that ToPro performs much better than the current in-context learning method. Overall, the performance improvements show that ToPro could potentially serve as a novel and simple benchmarking method for sequence labeling tasks.
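
The decomposition step can be sketched directly; the template wording below is an assumption for illustration, not the paper's exact prompt.

```python
TEMPLATE = 'Sentence: {sentence}\nWhat is the label of the word "{token}"? Answer:'

def decompose(sentence_tokens):
    """Rewrite one token-labeling example as one prompt per token."""
    sentence = " ".join(sentence_tokens)
    return [TEMPLATE.format(sentence=sentence, token=tok) for tok in sentence_tokens]

prompts = decompose(["Angela", "Merkel", "visited", "Paris"])
for p in prompts:
    print(p)        # one prompt per token; each prompt is classified or scored independently
```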

MCML Authors
Bolei Ma, Social Data Science and AI Lab, C4 | Computational Social Sciences
Ercong Nie, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Frauke Kreuter, Prof. Dr., Social Data Science and AI Lab, C4 | Computational Social Sciences
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[42]
L. K. Şenel, B. Ebing, K. Baghirova, H. Schütze and G. Glavaš.
Kardeş-NLU: Transfer to Low-Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julian's, Malta, Mar 17-22, 2024. URL.
Abstract

Cross-lingual transfer (XLT) driven by massively multilingual language models (mmLMs) has been shown largely ineffective for low-resource (LR) target languages with little (or no) representation in mmLM’s pretraining, especially if they are linguistically distant from the high-resource (HR) source language. Much of the recent focus in XLT research has been dedicated to LR language families, i.e., families without any HR languages (e.g., families of African languages or indigenous languages of the Americas). In this work, in contrast, we investigate a configuration that is arguably of practical relevance for more of the world’s languages: XLT to LR languages that do have a close HR relative. To explore the extent to which a HR language can facilitate transfer to its LR relatives, we (1) introduce Kardeş-NLU, an evaluation benchmark with language understanding datasets in five LR Turkic languages: Azerbaijani, Kazakh, Kyrgyz, Uzbek, and Uyghur; and (2) investigate (a) intermediate training and (b) fine-tuning strategies that leverage Turkish in XLT to these target languages. Our experimental results show that both - integrating Turkish in intermediate training and in downstream fine-tuning - yield substantial improvements in XLT to LR Turkic languages. Finally, we benchmark cutting-edge instruction-tuned large language models on Kardeş-NLU, showing that their performance is highly task- and language-dependent.

MCML Authors
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[41]
S. Zhang, P. Wicke, L. K. Senel, L. Figueredo, A. Naceri, S. Haddadin, B. Plank and H. Schütze.
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation.
6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Shengqiang Zhang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Philipp Wicke, Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Lütfi Kerem Şenel, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Barbara Plank, Prof. Dr., Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[40]
V. Hangya, S. Severini, R. Ralev, A. Fraser and H. Schütze.
Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Viktor Hangya, Dr. (former member), B2 | Natural Language Processing
Alexander Fraser, Prof. Dr., Data Analytics & Statistics, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[39]
L. Weissweiler, V. Hofmann, A. Kantharuban, A. Cai, R. Dutt, A. Hengle, A. Kabra, A. Kulkarni, A. Vijayakumar, H. Yu, H. Schütze, K. Oflazer and D. Mortensen.
Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Valentin Hofmann, Dr. (former member), B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[38]
A. H. Kargaran, A. Imani, F. Yvon and H. Schütze.
GlotLID: Language Identification for Low-Resource Languages.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
MCML Authors
Amir Hossein Kargaran, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Ayyoob Imani, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[37]
A. Köksal, T. Schick and H. Schütze.
MEAL: Stable and Active Learning for Few-Shot Prompting.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[36]
A. Köksal, O. Yalcin, A. Akbiyik, M. Kilavuz, A. Korhonen and H. Schütze.
Language-Agnostic Bias Detection in Language Models with Bias Probing.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI. GitHub.
MCML Authors
Abdullatif Köksal, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[35]
Y. Liu, H. Ye, L. Weissweiler, R. Pei and H. Schütze.
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Haotian Ye, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[34]
E. Nie, H. Schmid and H. Schütze.
Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Ercong Nie, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[33]
E. Nie, H. Schmid and H. Schütze.
Cross-Lingual Constituency Parsing for Middle High German: A Delexicalized Approach.
1st Workshop on Ancient Language Processing (ALP 2023) co-located with the Conference on Recent Advances in Natural Language Processing (RANLP 2023). Varna, Bulgaria, Sep 08, 2023. URL.
MCML Authors
Ercong Nie, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[32]
Y. Liu, A. Chronopoulou, H. Schütze and A. Fraser.
On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss.
20th International Conference on Spoken Language Translation (IWSLT 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Alexander Fraser, Prof. Dr., Data Analytics & Statistics, B2 | Natural Language Processing


[31]
A. Imani, P. Lin, A. H. Kargaran, S. Severini, M. J. Sabet, N. Kassner, C. Ma, H. Schmid, A. F. T. Martins, F. Yvon and H. Schütze.
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI. GitHub.
MCML Authors
Ayyoob Imani, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Peiqin Lin, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Amir Hossein Kargaran, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Masoud Jalili Sabet, Dr. (former member), B2 | Natural Language Processing
Nora Kassner, Dr. (former member), B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[30]
Y. Liu, H. Ye, L. Weissweiler, P. Wicke, R. Pei, R. Zangenfeind and H. Schütze.
A Crosslingual Investigation of Conceptualization in 1335 Languages.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
MCML Authors
Yihong Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Haotian Ye, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Leonie Weissweiler, Dr. (former member), B2 | Natural Language Processing
Philipp Wicke, Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[29]
Y. Liu, S. Feng, D. Wang, Y. Zhang and H. Schütze.
PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
MCML Authors
Yongkang Liu, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Hinrich Schütze, Prof. Dr., Statistical NLP and Deep Learning, B2 | Natural Language Processing


[28]
E. Nie, S. Liang, H. Schmid and H. Schütze.
Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages.
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
MCML Authors
Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Sheng Liang

Sheng Liang

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[27]
Z. Han, R. Liao, J. Gu, Y. Zhang, Z. Ding, Y. Gu, H. Köppl, H. Schütze and V. Tresp.
ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
Abstract

Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot be applied to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluation. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.

MCML Authors
Link to Ruotong Liao

Ruotong Liao

Database Systems & Data Mining

A3 | Computational Models

Link to Yao Zhang

Yao Zhang

Database Systems & Data Mining

A3 | Computational Models

Link to Zifeng Ding

Zifeng Ding

Database Systems & Data Mining

A3 | Computational Models

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

A3 | Computational Models


[26]
P. Wicke, L. K. Senel, S. Zhang, L. Figueredo, A. Naceri, S. Haddadin and H. Schütze.
Towards Language-Based Modulation of Assistive Robots through Multimodal Models.
2nd Geriatronics Summit (Geriatronics Summit 2023). Garmisch-Partenkirchen, Germany, Jul 02-03, 2023. arXiv.
Abstract

In the field of Geriatronics, enabling effective and transparent communication between humans and robots is crucial for enhancing the acceptance and performance of assistive robots. Our early-stage research project investigates the potential of language-based modulation as a means to improve human-robot interaction. We propose to explore real-time modulation during task execution, leveraging language cues, visual references, and multimodal inputs. By developing transparent and interpretable methods, we aim to enable robots to adapt and respond to language commands, enhancing their usability and flexibility. Through the exchange of insights and knowledge at the workshop, we seek to gather valuable feedback to advance our research and contribute to the development of interactive robotic systems for Geriatronics and beyond.

MCML Authors
Link to Philipp Wicke

Philipp Wicke

Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Shengqiang Zhang

Shengqiang Zhang

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[25]
V. Blaschke, H. Schütze and B. Plank.
A Survey of Corpora for Germanic Low-Resource Languages and Dialects.
24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023). Tórshavn, Faroe Islands, May 22-24, 2023. URL.
MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[24]
V. Blaschke, H. Schütze and B. Plank.
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages.
10th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[23]
X. Wang, L. Weissweiler, H. Schütze and B. Plank.
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives.
17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[22]
A. Modarressi, A. Imani, M. Fayyaz and H. Schütze.
RET-LLM: Towards a General Read-Write Memory for Large Language Models.
Preprint at arXiv (May. 2023). arXiv.
MCML Authors
Link to Ali Modarressi

Ali Modarressi

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[21]
L. Weissweiler, T. He, N. Otani, D. R. Mortensen, L. Levin and H. Schütze.
Construction Grammar Provides Unique Insight into Neural Language Models.
Georgetown University Round Table on Linguistics (GURT 2023). Washington D.C., USA, Mar 09-12, 2023. URL.
Abstract

Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pre-trained language models (PLMs) with respect to the structure and meaning of constructions. In this position paper, we make suggestions for the continuation and augmentation of this line of research. We look at probing methodology that was not designed with CxG in mind, as well as probing methodology that was designed for specific constructions. We analyse selected previous work in detail, and provide our view of the most important challenges and research questions that this promising new field faces.

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[20]
A. Imani, S. Severini, M. J. Sabet, F. Yvon and H. Schütze.
Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
Abstract

Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source languages to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we create a graph with words as nodes and alignment links as edges by aligning words for all language pairs. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state-of-the-art for unsupervised POS tagging of low-resource languages.
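
To make the projection idea concrete, here is a minimal Python sketch of propagating POS tags over an alignment graph by majority vote. The tokens, tags, and edges are invented for illustration, and the paper's actual model replaces the majority vote with a Graph Neural Network augmented with transformer layers.

```python
# Minimal sketch of projecting POS tags over an alignment graph by majority vote.
# The paper uses a Graph Neural Network with transformer layers; this toy version
# only illustrates the graph construction and a single propagation step.
from collections import Counter, defaultdict

# Hypothetical input: word-level alignment edges between source-language tokens
# (with known POS tags) and target-language tokens (unlabeled).
alignment_edges = [
    ("en:dog", "de:Hund"), ("fr:chien", "de:Hund"),
    ("en:runs", "de:läuft"), ("fr:court", "de:läuft"),
]
source_tags = {"en:dog": "NOUN", "fr:chien": "NOUN",
               "en:runs": "VERB", "fr:court": "VERB"}

# Collect, for every unlabeled target node, the tags of its labeled neighbours.
neighbour_tags = defaultdict(list)
for src, tgt in alignment_edges:
    if src in source_tags:
        neighbour_tags[tgt].append(source_tags[src])

# Propagate: assign the majority tag among aligned source words.
projected = {tgt: Counter(tags).most_common(1)[0][0]
             for tgt, tags in neighbour_tags.items()}
print(projected)  # {'de:Hund': 'NOUN', 'de:läuft': 'VERB'}
```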

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[19]
L. Weissweiler, V. Hofmann, A. Köksal and H. Schütze.
The better your Syntax, the better your Semantics? Probing Pretrained Language Models for the English Comparative Correlative.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
Abstract

Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models’ behaviour in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.
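
As a rough illustration of what such a syntactic probe looks like, the sketch below encodes a few sentences with a generic BERT checkpoint and fits a linear classifier to detect the comparative correlative. The sentences, labels, and model choice are illustrative and do not reproduce the paper's experimental setup.

```python
# Minimal sketch of a syntactic probe: encode sentences with a PLM and train a
# linear classifier to detect whether they instantiate the comparative
# correlative ("the X-er ..., the Y-er ..."). Sentences and labels are toy data.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

sentences = [
    "The longer you wait, the worse it gets.",   # CC
    "The more you read, the more you learn.",    # CC
    "I waited a long time at the station.",      # not CC
    "Reading books is a good way to learn.",     # not CC
]
labels = [1, 1, 0, 0]

with torch.no_grad():
    enc = tok(sentences, padding=True, return_tensors="pt")
    cls = model(**enc).last_hidden_state[:, 0]   # [CLS] vectors

probe = LogisticRegression().fit(cls.numpy(), labels)
print(probe.predict(cls.numpy()))
```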

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

B2 | Natural Language Processing

Link to Valentin Hofmann

Valentin Hofmann

Dr.

* Former member

B2 | Natural Language Processing

Link to Abdullatif Köksal

Abdullatif Köksal

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[18]
A. Maronikolakis, P. Baader and H. Schütze.
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes.
4th Workshop on Gender Bias in Natural Language Processing (GeBNLP 2022). Seattle, WA, USA, Jul 15, 2022. DOI.
MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[17]
S. Yuan, A. Maronikolakis and H. Schütze.
Separating Hate Speech and Offensive Language Classes via Adversarial Debiasing.
6th Workshop on Online Abuse and Harms (WOAH 2022). Seattle, WA, USA, Jul 14, 2022. DOI.
MCML Authors
Link to Antonis Maronikolakis

Antonis Maronikolakis

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[16]
S. Severini, A. Imani, P. Dufter and H. Schütze.
Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages.
13th International Conference on Language Resources and Evaluation (LREC 2022). Marseille, France, Jun 21-23, 2022. URL.
Abstract

Parallel corpora are ideal for extracting a multilingual named entity (MNE) resource, i.e., a dataset of names translated into multiple languages. Prior work on extracting MNE datasets from parallel corpora required resources such as large monolingual corpora or word aligners that are unavailable or perform poorly for underresourced languages. We present CLC-BN, a new method for creating an MNE resource, and apply it to the Parallel Bible Corpus, a corpus of more than 1000 languages. CLC-BN learns a neural transliteration model from parallel-corpus statistics, without requiring any other bilingual resources, word aligners, or seed data. Experimental results show that CLC-BN clearly outperforms prior work. We release an MNE resource for 1340 languages and demonstrate its effectiveness in two downstream tasks: knowledge graph augmentation and bilingual lexicon induction.

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[15]
V. Steinborn, P. Dufter, H. Jabbar and H. Schütze.
An Information-Theoretic Approach and Dataset for Probing Gender Stereotypes in Multilingual Masked Language Models.
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). Seattle, WA, USA, Jun 10-15, 2022. DOI.
Abstract

Bias research in NLP is a rapidly growing and developing field. Similar to CrowS-Pairs (Nangia et al., 2020), we assess gender bias in masked-language models (MLMs) by studying pairs of sentences with gender-swapped person references. Most bias research focuses on and often is specific to English. Using a novel methodology for creating sentence pairs that is applicable across languages, we create, based on CrowS-Pairs, a multilingual dataset for English, Finnish, German, Indonesian and Thai. Additionally, we propose SJSD, a new bias measure based on Jensen–Shannon divergence, which we argue retains more information from the model output probabilities than other previously proposed bias measures for MLMs. Using multilingual MLMs, we find that SJSD diagnoses the same systematic biased behavior for non-English that previous studies have found for monolingual English pre-trained MLMs. SJSD outperforms the CrowS-Pairs measure, which struggles to find such biases for smaller non-English datasets.
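
The core quantity behind such a measure is the Jensen–Shannon divergence between the output distributions a masked LM assigns to a gender-swapped sentence pair. The sketch below computes plain JSD on toy distributions; it is not the paper's exact SJSD formulation.

```python
# Minimal sketch of a Jensen–Shannon-divergence-based comparison of two
# masked-LM output distributions (e.g. for "He is a [MASK]" vs. "She is a [MASK]").
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen–Shannon divergence (base 2) between two probability vectors."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical vocabulary distributions predicted for the masked position
# in the two gender-swapped sentences.
p_he = [0.50, 0.30, 0.15, 0.05]
p_she = [0.20, 0.25, 0.15, 0.40]
print(js_divergence(p_he, p_she))  # 0 = identical predictions, 1 = maximally different
```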

MCML Authors
Link to Victor Steinborn

Victor Steinborn

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[14]
M. Zhao, F. Mi, Y. Wang, M. Li, X. Jiang, Q. Liu and H. Schütze.
LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework.
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). Seattle, WA, USA, Jun 10-15, 2022. DOI.
Abstract

Vast efforts have been devoted to creating high-performance few-shot learners, i.e., large-scale pretrained language models (PLMs) that perform well with little downstream task training data. Training PLMs has incurred significant cost, but utilizing the few-shot learners is still challenging due to their enormous size. This work focuses on a crucial question: How to make effective use of these few-shot learners? We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers. The rationale is that crowdsourcing workers are in fact few-shot learners: They are shown a few illustrative examples to learn about a task and then start annotating. LMTurk employs few-shot learners built upon PLMs as workers. We show that the resulting annotations can be utilized to train models that solve the task well and are small enough to be deployable in practical scenarios. Active learning is integrated into LMTurk to reduce the amount of queries made to PLMs, minimizing the computational cost of running PLM inference passes. Altogether, LMTurk is an important step towards making effective use of current PLMs.
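
A minimal sketch of the workflow: a (here stubbed) few-shot "worker" annotates unlabeled texts, and a small, cheaply deployable model is trained on those annotations. The annotator function, texts, and labels are invented for illustration; in the paper the worker is a prompted pretrained LM.

```python
# Minimal sketch of the LMTurk idea: a large few-shot model annotates data,
# a small model is trained on those annotations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def few_shot_worker(text: str) -> int:
    """Hypothetical PLM-based annotator: 1 = positive, 0 = negative."""
    return int("good" in text or "great" in text)

unlabeled = ["a great movie", "a good plot", "a dull story", "boring and flat"]
pseudo_labels = [few_shot_worker(t) for t in unlabeled]

# Train a small, cheaply deployable model on the PLM-produced annotations.
small_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
small_model.fit(unlabeled, pseudo_labels)
print(small_model.predict(["what a great film", "flat and boring"]))
```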

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[13]
L. Weissweiler, V. Hofmann, M. J. Sabet and H. Schütze.
CaMEL: Case Marker Extraction without Labels.
60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). Dublin, Ireland, May 22-27, 2022. DOI.
Abstract

We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.

MCML Authors
Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

B2 | Natural Language Processing

Link to Valentin Hofmann

Valentin Hofmann

Dr.

* Former member

B2 | Natural Language Processing

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[12]
S. Sharifzadeh, S. M. Baharlou, M. Schmitt, H. Schütze and V. Tresp.
Improving Scene Graph Classification by Exploiting Knowledge from Texts.
36th Conference on Artificial Intelligence (AAAI 2022). Virtual, Feb 22-Mar 01, 2022. DOI.
MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Volker Tresp

Volker Tresp

Prof. Dr.

Database Systems & Data Mining

A3 | Computational Models


[11]
Y. Elazar, N. Kassner, S. Ravfogel, A. Ravichander, E. Hovy, H. Schütze and Y. Goldberg.
Measuring and Improving Consistency in Pretrained Language Models.
Transactions of the Association for Computational Linguistics 9 (Dec. 2021). DOI.
Abstract

Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel, we show that the consistency of all PLMs we experiment with is poor— though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.
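
A minimal sketch of the consistency check, assuming the Hugging Face fill-mask pipeline and an off-the-shelf BERT checkpoint rather than the paper's exact setup: query the model with several paraphrases of the same relation and test whether the top-1 predictions agree.

```python
# Minimal sketch of a ParaRel-style consistency check over paraphrased cloze queries.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

paraphrases = [
    "Albert Einstein was born in [MASK].",
    "Albert Einstein is originally from [MASK].",
    "The birthplace of Albert Einstein is [MASK].",
]
predictions = [fill(p)[0]["token_str"].strip() for p in paraphrases]
consistent = len(set(predictions)) == 1
print(predictions, "consistent:", consistent)
```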

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[10]
A. Imani, M. J. Sabet, L. K. Senel, P. Philipp, F. Yvon and H. Schütze.
Graph Algorithms for Multiparallel Word Alignment.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). Punta Cana, Dominican Republic, Nov 07-11, 2021. DOI.
Abstract

With the advent of end-to-end deep learning approaches in machine translation, interest in word alignments initially decreased; however, they have again become a focus of research more recently. Alignments are useful for typological research, transferring formatting like markup to translated texts, and can be used in the decoding of machine translation systems. At the same time, massively multilingual processing is becoming an important NLP scenario, and pretrained language and machine translation models that are truly multilingual are proposed. However, most alignment algorithms rely on bitexts only and do not leverage the fact that many parallel corpora are multiparallel. In this work, we exploit the multiparallelity of corpora by representing an initial set of bilingual alignments as a graph and then predicting additional edges in the graph. We present two graph algorithms for edge prediction: one inspired by recommender systems and one based on network link prediction. Our experimental results show absolute improvements in F1 of up to 28% over the baseline bilingual word aligner in different datasets.
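
As a toy illustration of the edge-prediction view, the sketch below builds a small alignment graph with networkx and scores a missing cross-lingual edge with a common-neighbour count. The paper's recommender-system and link-prediction methods are more elaborate, and the node names are invented.

```python
# Minimal sketch of predicting an extra alignment edge in a multiparallel graph
# with a simple common-neighbour link-prediction score.
import networkx as nx

G = nx.Graph()
# Nodes are (language, token) pairs; edges are initial bilingual alignments.
G.add_edges_from([
    (("en", "house"), ("de", "Haus")),
    (("en", "house"), ("fr", "maison")),
    (("es", "casa"), ("de", "Haus")),
    (("es", "casa"), ("fr", "maison")),
])

# Score the missing en-es edge by how many aligned neighbours the two words share.
u, v = ("en", "house"), ("es", "casa")
score = len(list(nx.common_neighbors(G, u, v)))
print(f"predicted alignment score for {u} -> {v}: {score}")  # 2 shared neighbours
```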

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

B2 | Natural Language Processing

Link to Lütfi Kerem Şenel

Lütfi Kerem Şenel

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[9]
N. Kassner, O. Tafjord, H. Schütze and P. Clark.
BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). Punta Cana, Dominican Republic, Nov 07-11, 2021. DOI.
Abstract

Although pretrained language models (PTLMs) contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after specialized training. As a result, it can be hard to identify what the model actually “believes” about the world, making it susceptible to inconsistent behavior and simple errors. Our goal is to reduce these problems. Our approach is to embed a PTLM in a broader system that also includes an evolving, symbolic memory of beliefs – a BeliefBank – that records but then may modify the raw PTLM answers. We describe two mechanisms to improve belief consistency in the overall system. First, a reasoning component – a weighted MaxSAT solver – revises beliefs that significantly clash with others. Second, a feedback component issues future queries to the PTLM using known beliefs as context. We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system, improving both the accuracy and consistency of its answers over time. This is significant as it is a first step towards PTLM-based architectures with a systematic notion of belief, enabling them to construct a more coherent picture of the world, and improve over time without model retraining.
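
A toy stand-in for the consistency-restoring step: instead of a weighted MaxSAT solver, the sketch brute-forces the belief assignment that best balances model confidence against a few hand-written constraints. The beliefs, weights, and constraints are invented for illustration.

```python
# Toy stand-in for BeliefBank's reasoning component: pick the belief assignment
# that maximizes confidence-weighted agreement while respecting constraints.
from itertools import product

beliefs = ["is_bird", "can_fly", "is_penguin"]
confidence = {"is_bird": 0.9, "can_fly": 0.6, "is_penguin": 0.8}
implications = [("is_penguin", "is_bird")]   # if A then B
exclusions = [("is_penguin", "can_fly")]     # A and B cannot both hold

def score(assign):
    # Reward keeping high-confidence beliefs, reject inconsistent assignments.
    for a, b in implications:
        if assign[a] and not assign[b]:
            return float("-inf")
    for a, b in exclusions:
        if assign[a] and assign[b]:
            return float("-inf")
    return sum(confidence[k] if v else 1 - confidence[k] for k, v in assign.items())

best = max((dict(zip(beliefs, vals)) for vals in product([True, False], repeat=3)),
           key=score)
print(best)  # drops "can_fly" because the penguin constraints outweigh its confidence
```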

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[8]
A. Imani, M. J. Sabet, P. Dufter, M. Cysouw and H. Schütze.
ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus.
Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021). Bangkok, Thailand, Aug 01-06, 2021. DOI.
Abstract

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential both from an academic and commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models or creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool for browsing a word-aligned parallel corpus, covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.

MCML Authors
Link to Ayyoob Imani

Ayyoob Imani

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Masoud Jalili Sabet

Masoud Jalili Sabet

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[7]
P. Dufter, N. Kassner and H. Schütze.
Static Embeddings as Efficient Knowledge Bases?.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021). Virtual, Jun 06-11, 2021. DOI.
Abstract

Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structural knowledge base (KB) queries, masked sentences such as 'Paris is the capital of [MASK]' are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically diverse languages, we study knowledge contained in static embeddings. We show that, when restricting the output space to a candidate set, simple nearest neighbor matching using static embeddings performs better than PLMs. E.g., static embeddings perform 1.6% points better than BERT while just using 0.3% of energy for training. One important factor in their good comparative performance is that static embeddings are standardly learned for a large vocabulary. In contrast, BERT exploits its more sophisticated, but expensive ability to compose meaningful representations from a much smaller subword vocabulary.
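
A minimal sketch of the candidate-restricted nearest-neighbour matching described above, using random vectors as stand-ins for trained static embeddings (so the printed answer is arbitrary; with real embeddings it becomes meaningful).

```python
# Minimal sketch of answering a relational query with static embeddings by
# nearest-neighbour matching over a restricted candidate set.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["France", "Germany", "Italy", "Paris", "Berlin", "Rome"]
emb = {w: rng.normal(size=50) for w in vocab}  # stand-in word embeddings

def answer(query_word, candidates):
    """Return the candidate whose embedding is closest (cosine) to the query."""
    q = emb[query_word]
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(candidates, key=lambda c: cos(q, emb[c]))

# "Paris is the capital of [MASK]" with the output space restricted to countries.
print(answer("Paris", candidates=["France", "Germany", "Italy"]))
```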

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[6]
N. Kassner, P. Dufter and H. Schütze.
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models.
16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021). Virtual, Apr 19-23, 2021. DOI.
Abstract

Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structural knowledge base queries, masked sentences such as “Paris is the capital of [MASK]” are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowledge base? Most prior work only considers English. Extending research to multiple languages is important for diversity and accessibility. (ii) Is mBERT’s performance as knowledge base language-independent or does it vary from language to language? (iii) A multilingual model is trained on more text, e.g., mBERT is trained on 104 Wikipedias. Can mBERT leverage this for better performance? We find that using mBERT as a knowledge base yields varying performance across languages and pooling predictions across languages improves performance. Conversely, mBERT exhibits a language bias; e.g., when queried in Italian, it tends to predict Italy as the country of origin.

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[5]
E. Asgari, M. J. Sabet, P. Dufter, C. Ringlstetter and H. Schütze.
Subword Sampling for Low Resource Word Alignment.
Preprint at arXiv (Dec. 2020). arXiv.
Abstract

Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most of the existing word alignment methods are designed for a high resource setting in machine translation where millions of parallel sentences are available. This amount reduces to a few thousand sentences when dealing with low-resource languages, which is too little for the established IBM models. In this paper, we propose subword sampling-based alignment of text units. This method's hypothesis is that the aggregation of different granularities of text for certain language pairs can help word-level alignment. For certain languages for which gold-standard alignments exist, we propose an iterative Bayesian optimization framework to optimize selecting possible subwords from the space of possible subword representations of the source and target sentences. We show that the subword sampling method consistently outperforms word-level alignment on six language pairs: English-German, English-French, English-Romanian, English-Persian, English-Hindi, and English-Inuktitut. In addition, we show that the hyperparameters learned for certain language pairs can be applied to other languages without supervision and consistently improve the alignment results. We observe that using 5K parallel sentences together with our proposed subword sampling approach, we obtain similar F1 scores to the use of 100K's of parallel sentences in existing word-level fast-align/eflomal alignment methods.
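
A toy illustration of the sampling step only: draw several random subword segmentations of each word, over which alignments could then be aggregated. The segmentation scheme and split probability are invented; the paper additionally optimizes the sampling with Bayesian optimization, which is not shown here.

```python
# Toy illustration of sampling multiple subword segmentations of a sentence.
import random

def sample_segmentation(word, rng, p_split=0.3):
    """Randomly insert split points into a word to get one subword segmentation."""
    pieces, current = [], word[0]
    for ch in word[1:]:
        if rng.random() < p_split:
            pieces.append(current)
            current = ch
        else:
            current += ch
    pieces.append(current)
    return pieces

rng = random.Random(0)
for _ in range(3):
    print([sample_segmentation(w, rng) for w in "ein kleines Haus".split()])
```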

MCML Authors
Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[4]
N. Kassner, B. Krojer and H. Schütze.
Are Pretrained Language Models Symbolic Reasoners over Knowledge?.
24th Conference on Computational Natural Language Learning (CoNLL 2020). Virtual, Nov 19-20, 2020. DOI.
Abstract

How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs seem to learn to apply some symbolic reasoning rules correctly but struggle with others, including two-hop reasoning. Further analysis suggests that even the application of learned reasoning rules is flawed. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[3]
N. Kassner and H. Schütze.
BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Virtual, Nov 16-20, 2020. DOI.
Abstract

Khandelwal et al. (2020) use a k-nearest-neighbor (kNN) component to improve language model performance. We show that this idea is beneficial for open-domain question answering (QA). To improve the recall of facts encountered during training, we combine BERT (Devlin et al., 2019) with a traditional information retrieval step (IR) and a kNN search over a large datastore of an embedded text collection. Our contributions are as follows: i) BERT-kNN outperforms BERT on cloze-style QA by large margins without any further training. ii) We show that BERT often identifies the correct response category (e.g., US city), but only kNN recovers the factually correct answer (e.g., “Miami”). iii) Compared to BERT, BERT-kNN excels for rare facts. iv) BERT-kNN can easily handle facts not covered by BERT’s training set, e.g., recent events.
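
A minimal sketch of the kNN component, using random vectors as stand-ins for the embedded datastore and query: retrieve the nearest datastore entries and interpolate their answer distribution with the LM's own distribution. The interpolation weight and answer inventory are illustrative assumptions.

```python
# Minimal sketch of a kNN-over-datastore component combined with an LM distribution.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
datastore_vecs = rng.normal(size=(1000, 128))        # embedded context windows (stand-in)
datastore_answers = rng.integers(0, 5, size=1000)    # answer id stored with each entry

knn = NearestNeighbors(n_neighbors=10).fit(datastore_vecs)
query_vec = rng.normal(size=(1, 128))                # embedded question/context (stand-in)
_, idx = knn.kneighbors(query_vec)

# Turn the retrieved neighbours into a distribution over the 5 candidate answers.
knn_dist = np.bincount(datastore_answers[idx[0]], minlength=5) / idx.shape[1]
lm_dist = np.full(5, 0.2)                            # stand-in LM answer distribution
combined = 0.5 * knn_dist + 0.5 * lm_dist            # simple interpolation
print(combined)
```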

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[2]
N. Kassner and H. Schütze.
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly.
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Virtual, Jul 05-10, 2020. DOI.
Abstract

Building on Petroni et al. 2019, we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated (“Birds cannot [MASK]”) and non-negated (“Birds can [MASK]”) cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add “misprimes” to cloze questions (“Talk? Birds can [MASK]”). We find that PLMs are easily distracted by misprimes. These results suggest that PLMs still have a long way to go to adequately learn human-like factual knowledge.
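
A minimal sketch of the negation probe, assuming the Hugging Face fill-mask pipeline and a generic BERT checkpoint rather than the paper's exact models: compare the top predictions for a sentence and its negated counterpart.

```python
# Minimal sketch of the negation probe: compare a masked LM's top predictions
# for a cloze question and its negated counterpart.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

positive = "Birds can [MASK]."
negated = "Birds cannot [MASK]."

top_pos = {r["token_str"].strip() for r in fill(positive, top_k=5)}
top_neg = {r["token_str"].strip() for r in fill(negated, top_k=5)}
# A model that handled negation well should show little overlap here.
print("overlap:", top_pos & top_neg)
```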

MCML Authors
Link to Nora Kassner

Nora Kassner

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[1]
A. Beyer, G. Kauermann and H. Schütze.
Embedding Space Correlation as a Measure of Domain Similarity.
12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 13-15, 2020. URL.
Abstract

Prior work has determined domain similarity using text-based features of a corpus. However, when using pre-trained word embeddings, the underlying text corpus might not be accessible anymore. Therefore, we propose the CCA measure, a new measure of domain similarity based directly on the dimension-wise correlations between corresponding embedding spaces. Our results suggest that an inherent notion of domain can be captured this way, as we are able to reproduce our findings for different domain comparisons for English, German, Spanish and Czech as well as in cross-lingual comparisons. We further find a threshold at which the CCA measure indicates that two corpora come from the same domain in a monolingual setting by applying permutation tests. By evaluating the usability of the CCA measure in a domain adaptation application, we also show that it can be used to determine which corpora are more similar to each other in a cross-domain sentiment detection task.
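
A rough sketch of the underlying idea: correlate corresponding dimensions of two embedding spaces over a shared vocabulary and average. The synthetic matrices below stand in for embeddings trained on two corpora; the paper's actual CCA measure and permutation-test thresholding are more involved.

```python
# Minimal sketch of a dimension-wise correlation measure between two embedding spaces.
import numpy as np

rng = np.random.default_rng(0)
shared_vocab_size, dim = 500, 100
emb_domain_a = rng.normal(size=(shared_vocab_size, dim))
# A second space that is a noisy copy of the first, i.e. a "similar" domain.
emb_domain_b = emb_domain_a + 0.5 * rng.normal(size=(shared_vocab_size, dim))

# Correlate corresponding dimensions across the shared vocabulary and average.
corrs = [np.corrcoef(emb_domain_a[:, d], emb_domain_b[:, d])[0, 1]
         for d in range(dim)]
print("mean dimension-wise correlation:", float(np.mean(corrs)))
```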

MCML Authors
Link to Göran Kauermann

Göran Kauermann

Prof. Dr.

Applied Statistics in Social Sciences, Economics and Business

A1 | Statistical Foundations & Explainability

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing