Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

MCML Authors

Siyao Peng

Dr.

* Former Member

Group Barbara Plank
AI and Computational Linguistics

Barbara Plank

Prof. Dr.

Core PI

AI and Computational Linguistics

Abstract

Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of human label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially true for high-quality datasets and for datasets beyond the English CoNLL03. This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian. We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions. We survey student annotations on a subset of difficult entities and substantiate the feasibility and necessity of manifold annotations for understanding named entity ambiguities from a distributional perspective.

inproceedings PSL+24

UnImplicit 2024

3rd Workshop on Understanding Implicit and Underspecified Language. Malta, Mar 21, 2024.

Authors

S. Peng • Z. Sun • S. Loftus • B. Plank

Research Area

B2 | Natural Language Processing

BibTeXKey: PSL+24