Research Group Barbara Plank

Prof. Dr. Barbara Plank
Artificial Intelligence and Computational Linguistics
B2 | Natural Language Processing

Barbara Plank heads the Chair for AI and Computational Linguistics at LMU Munich.

Her lab carries out research in Natural Language Processing, an interdisciplinary subdiscipline of Artificial Intelligence at the interface of computer science, linguistics and cognitive science. In broad terms, the aim is human-facing NLP: to make NLP models more robust and inclusive, so that they deal better with shifts in data caused by language variation, are fairer, and embrace human label variation.

Team members @MCML

Verena Blaschke, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Silvia Casola (Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Beiduo Chen, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Felicia Körner, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Robert Litschko, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Yang Janet Liu, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Philipp Mondorf, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Siyao Peng (Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Andreas Säuberli, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Soh-Eun Shim, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Shijia Zhou, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing

Publications @MCML

[35]
Y. Zhang, Y. Li, X. Wang, Q. Shen, B. Plank, B. Bischl, M. Rezaei and K. Kawaguchi.
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models.
Workshop on Machine Learning and Compression at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada, Dec 10-15, 2024. To be published. Preprint at arXiv.
Abstract

Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all self-attention and feed-forward network (FFN) layers within blocks as individual pruning candidates. FinerCut prunes layers whose removal causes minimal alteration to the model's output -- contributing to a new, lean, interpretable, and task-agnostic pruning method. Tested across 9 benchmarks, our approach retains 90% performance of Llama3-8B with 25% layers removed, and 95% performance of Llama3-70B with 30% layers removed, all without fine-tuning or post-pruning reconstruction. Strikingly, we observe intriguing results with FinerCut: 42% (34 out of 80) of the self-attention layers in Llama3-70B can be removed while preserving 99% of its performance -- without additional fine-tuning after removal. Moreover, FinerCut provides a tool to inspect the types and locations of pruned layers, allowing one to observe interesting pruning behaviors. For instance, we observe a preference for pruning self-attention layers, often at deeper consecutive decoder layers. We hope our insights inspire future efficient LLM architecture designs.

MCML Authors
Yawei Li, Statistical Learning & Data Science, A1 | Statistical Foundations & Explainability
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Bernd Bischl (Prof. Dr.), Statistical Learning & Data Science, A1 | Statistical Foundations & Explainability
Mina Rezaei (Dr.), Statistical Learning & Data Science, Education Coordination, A1 | Statistical Foundations & Explainability


[34]
P. Mondorf and B. Plank.
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv.
Abstract

Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie. The objective is to logically deduce each character's identity based on their statements. The challenge arises from the truth-telling or lying behavior, which influences the logical implications of each statement. Solving these puzzles requires not only direct deductions from individual statements, but the ability to assess the truthfulness of statements by reasoning through various hypothetical scenarios. As such, knights and knaves puzzles serve as compelling examples of suppositional reasoning. In this paper, we introduce TruthQuest, a benchmark for suppositional reasoning based on the principles of knights and knaves puzzles. Our benchmark presents problems of varying complexity, considering both the number of characters and the types of logical statements involved. Evaluations on TruthQuest show that large language models like Llama 3 and Mixtral-8x7B exhibit significant difficulties solving these tasks. A detailed error analysis of the models' output reveals that lower-performing models exhibit a diverse range of reasoning errors, frequently failing to grasp the concept of truth and lies. In comparison, more proficient models primarily struggle with accurately inferring the logical implications of potentially false statements.

MCML Authors
Philipp Mondorf, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[33]
B. Chen, X. Wang, S. Peng, R. Litschko, A. Korhonen and B. Plank.
'Seeing the Big through the Small': Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv.
Abstract

Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI), earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent human judgment distribution (HJD) or using expert linguists to provide detailed explanations for their chosen labels. While the former method provides denser HJD information, obtaining it is resource-intensive. In contrast, the latter offers richer textual information but it is challenging to scale up to many human judges. Moreover, large language models (LLMs) are increasingly used as evaluators ('LLM judges') but with mixed results, and few works aim to study HJDs. This study proposes to exploit LLMs to approximate HJDs using a small number of expert labels and explanations. Our experiments show that a few explanations significantly improve LLMs' ability to approximate HJDs with and without explicit labels, thereby providing a solution to scale up annotations for HJD. However, fine-tuning smaller soft-label aware models with the LLM-generated model judgment distributions (MJDs) presents partially inconsistent results: while similar in distance, their resulting fine-tuned models and visualized distributions differ substantially. We show the importance of complementing instance-level distance measures with a global-level shape metric and visualization to more effectively evaluate MJDs against human judgment distributions.

MCML Authors
Beiduo Chen, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Siyao Peng (Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Robert Litschko, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[32]
B. Ma, X. Wang, T. Hu, A.-C. Haensch, M. A. Hedderich, B. Plank and F. Kreuter.
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv.
Abstract

Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may capture and convey. These cognitive-behavioral traits typically include Attitudes, Opinions, and Values (AOVs). However, measuring AOVs embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of clarity on how different studies are related to each other and how they can be interpreted. This paper aims to bridge this gap by providing a comprehensive overview of recent works on the evaluation of AOVs in LLMs. Moreover, we survey related approaches in different stages of the evaluation pipeline in these works. By doing so, we address the potential and challenges with respect to understanding the model, human-AI alignment, and downstream application in social sciences. Finally, we provide practical insights into evaluation methods, model enhancement, and interdisciplinary collaboration, thereby contributing to the evolving landscape of evaluating AOVs in LLMs.

MCML Authors
Bolei Ma, Social Data Science and AI Lab, C4 | Computational Social Sciences
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Michael Hedderich (Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Frauke Kreuter (Prof. Dr.), Social Data Science and AI Lab, C4 | Computational Social Sciences


[31]
A. Sedova, R. Litschko, D. Frassinelli, B. Roth and B. Plank.
To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv.
Abstract

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.

MCML Authors
Robert Litschko, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[30]
P. Mondorf and B. Plank.
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models--A Survey.
Conference on Language Modeling (COLM 2024). Philadelphia, PA, USA, Oct 07-09, 2024. To be published. Preprint at arXiv.
MCML Authors
Philipp Mondorf, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[29]
X. Wang, C. Hu, B. Ma, P. Röttger and B. Plank.
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think.
Conference on Language Modeling (COLM 2024). Philadelphia, PA, USA, Oct 07-09, 2024. To be published. Preprint at arXiv.
MCML Authors
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Bolei Ma, Social Data Science and AI Lab, C4 | Computational Social Sciences
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[28]
V. Blaschke, C. Purschke, H. Schütze and B. Plank.
What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
Abstract

Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations’ needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German – a group of varieties that is heterogeneous in terms of prestige and standardization. We survey speakers of these varieties (N=327) and present their opinions on hypothetical language technologies for their dialects. Although attitudes vary among subgroups of our respondents, we find that respondents are especially in favour of potential NLP tools that work with dialectal input (especially audio input) such as virtual assistants, and less so for applications that produce dialectal output such as machine translation or spellcheckers.

MCML Authors
Verena Blaschke, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Hinrich Schütze (Prof. Dr.), Statistical NLP and Deep Learning, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[27]
P. Mondorf and B. Plank.
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
Abstract

Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that logically follow, given the truth value of the information provided. Recent progress in the domain of large language models (LLMs) has showcased their capability in executing deductive reasoning tasks. Nonetheless, a significant portion of research primarily assesses the accuracy of LLMs in solving such tasks, often overlooking a deeper analysis of their reasoning behavior. In this study, we draw upon principles from cognitive psychology to examine inferential strategies employed by LLMs, through a detailed evaluation of their responses to propositional logic problems. Our findings indicate that LLMs display reasoning patterns akin to those observed in humans, including strategies like supposition following or chain construction. Moreover, our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning, with more advanced models tending to adopt strategies more frequently than less sophisticated ones. Importantly, we assert that a model's accuracy, that is the correctness of its final conclusion, does not necessarily reflect the validity of its reasoning process. This distinction underscores the necessity for more nuanced evaluation procedures in the field.

MCML Authors
Philipp Mondorf, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[26]
L. Weber-Genzel, S. Peng, M.-C. de Marneffe and B. Plank.
VariErr NLI: Separating Annotation Error from Human Label Variation.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
MCML Authors
Leon Weber-Genzel (Dr.), former member, B2 | Natural Language Processing
Siyao Peng (Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[25]
S. Xu, S. T.Y.S.S., O. Ichim, B. Plank and M. Grabmair.
Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification.
62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[24]
X. Wang, B. Ma, C. Hu, L. Weber-Genzel, P. Röttger, F. Kreuter, D. Hovy and B. Plank.
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
MCML Authors
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Bolei Ma, Social Data Science and AI Lab, C4 | Computational Social Sciences
Leon Weber-Genzel (Dr.), former member, B2 | Natural Language Processing
Frauke Kreuter (Prof. Dr.), Social Data Science and AI Lab, C4 | Computational Social Sciences
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[23]
S. Eckman, B. Plank and F. Kreuter.
Position: Insights from Survey Methodology can Improve Training Data.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Frauke Kreuter (Prof. Dr.), Social Data Science and AI Lab, C4 | Computational Social Sciences


[22]
V. Blaschke, B. Kovačić, S. Peng, H. Schütze and B. Plank.
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italy, May 20-25, 2024. URL.
Abstract

Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack in 'within-language breadth': most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap, we present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in UD, covering multiple text genres (wiki, fiction, grammar examples, social, non-fiction). We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies. Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries. We provide baseline parsing and POS tagging results, which are lower than results obtained on German and vary substantially between different graph-based parsers. To support further research on Bavarian syntax, we make our dataset, language-specific guidelines and code publicly available.

MCML Authors
Verena Blaschke, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Siyao Peng (Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Hinrich Schütze (Prof. Dr.), Statistical NLP and Deep Learning, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[21]
C. Gruber, K. Hechinger, M. Aßenmacher, G. Kauermann and B. Plank.
More Labels or Cases? Assessing Label Variation in Natural Language Inference.
3rd Workshop on Understanding Implicit and Underspecified Language (UnImplicit 2024). Malta, Mar 21, 2024. URL.
MCML Authors
Matthias Aßenmacher (Dr.), Statistical Learning & Data Science, A1 | Statistical Foundations & Explainability
Göran Kauermann (Prof. Dr.), Applied Statistics in Social Sciences, Economics and Business, A1 | Statistical Foundations & Explainability
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[20]
E. Artemova, V. Blaschke and B. Plank.
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages. We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data. Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets. Our perturbation rules cover 18 distinct language phenomena, enabling us to explore the impact of each perturbation on slot and intent performance. Using these new datasets, we conduct an experimental evaluation across six different transformers. Here, we demonstrate that when applied to colloquial varieties, ToD systems maintain their intent recognition performance, losing 6% (4.62 percentage points) in accuracy on average. However, they exhibit a significant drop in slot detection, with a decrease of 31% (21 percentage points) in slot F1 score. Our findings are further supported by a transfer experiment from Standard American English to synthetic Urban African American Vernacular English.

MCML Authors
Verena Blaschke, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[19]
M. Zhang, R. van der Goot, M.-Y. Kan and B. Plank.
NNOSE: Nearest Neighbor Occupational Skill Extraction.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity of occupational skill datasets: combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, Nearest Neighbor Occupational Skill Extraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This improves skill extraction without additional fine-tuning. Crucially, we observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30% span-F1 in cross-dataset settings.

MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[18]
M. Zhang, R. van der Goot and B. Plank.
Entity Linking in the Job Market Domain.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

In Natural Language Processing, entity linking (EL) has centered around Wikipedia, yet it remains underexplored for the job market domain. Disambiguating skill mentions can help us gain insight into current labor market demands. In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014). Previous efforts linked coarse-grained (full) sentences to a corresponding ESCO skill. In this work, we link more fine-grained span-level mentions of skills. We tune two high-performing neural EL models, a bi-encoder (Wu et al., 2020) and an autoregressive model (Cao et al., 2021), on a synthetically generated mention–skill pair dataset and evaluate them on a human-annotated skill-linking benchmark. Our findings reveal that both models are capable of linking implicit mentions of skills to their correct taxonomy counterparts. Empirically, BLINK outperforms GENRE in strict evaluation, but GENRE performs better in loose evaluation (accuracy@k).

MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[17]
S. Zhang, P. Wicke, L. K. Senel, L. Figueredo, A. Naceri, S. Haddadin, B. Plank and H. Schütze.
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation.
6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL.
MCML Authors
Shengqiang Zhang, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Philipp Wicke (Dr.), Statistical NLP and Deep Learning, B2 | Natural Language Processing
Lütfi Kerem Şenel, Statistical NLP and Deep Learning, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Hinrich Schütze (Prof. Dr.), Statistical NLP and Deep Learning, B2 | Natural Language Processing


[16]
M. Giulianelli, J. Baan, W. Aziz, R. Fernández and B. Plank.
What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[15]
R. Litschko, M. Müller-Eberstein, R. van der Goot, L. Weber-Genzel and B. Plank.
Establishing Trustworthiness: Rethinking Tasks and Model Evaluation.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Robert Litschko, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Leon Weber-Genzel (Dr.), former member, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[14]
X. Wang and B. Plank.
ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Xinpeng Wang, Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[13]
S. Xu, S. T.Y.S.S., O. Ichim, I. Risini, B. Plank and M. Grabmair.
From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[12]
M. Müller-Eberstein, R. van der Goot, B. Plank and I. Titov.
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore, Dec 06-10, 2023. DOI.
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[11]
L. Weber and B. Plank.
ActiveAED: A Human in the Loop Improves Annotation Error Detection.
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada, Jul 09-14, 2023. DOI.
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[10]
J. Baan, N. Daheim, E. Ilia, D. Ulmer, H.-S. Li, R. Fernández, B. Plank, R. Sennrich, C. Zerva and W. Aziz.
Uncertainty in Natural Language Generation: From Theory to Applications.
Preprint at arXiv (Jul. 2023).
MCML Authors
Barbara Plank (Prof. Dr.), Artificial Intelligence and Computational Linguistics, B2 | Natural Language Processing


[9]
V. Blaschke, H. Schütze and B. Plank.
A Survey of Corpora for Germanic Low-Resource Languages and Dialects.
24th Nordic Conference on Computational Linguistics (NoDaLiDa 2023). Tórshavn, Faroe Islands, May 22-24, 2023. URL.
MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[8]
V. Blaschke, H. Schütze and B. Plank.
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages.
10th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023) at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Verena Blaschke

Verena Blaschke

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[7]
X. Wang, L. Weissweiler, H. Schütze and B. Plank.
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives.
17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023). Dubrovnik, Croatia, May 02-06, 2023. DOI.
MCML Authors
Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Leonie Weissweiler

Leonie Weissweiler

Dr.

* Former member

B2 | Natural Language Processing

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[6]
J. Baan, W. Aziz, B. Plank and R. Fernández.
Stop Measuring Calibration When Humans Disagree.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[5]
E. Bassignana, M. Müller-Eberstein, M. Zhang and B. Plank.
Evidence > Intuition: Transferability Estimation for Encoder Selection.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[4]
M. Müller-Eberstein, R. van der Goot and B. Plank.
Spectral Probing.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[3]
B. Plank.
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation.
Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[2]
E. Bassignana and B. Plank.
CrossRE: A Cross-Domain Dataset for Relation Extraction.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. URL.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[1]
D. Ulmer, E. Bassignana, M. Müller-Eberstein, D. Varab, M. Zhang, R. van der Goot, C. Hardmeier and B. Plank.
Experimental Standards for Deep Learning in Natural Language Processing Research.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). Abu Dhabi, United Arab Emirates, Nov 07-11, 2022. DOI.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing