
CrossRE: A Cross-Domain Dataset for Relation Extraction

MCML Authors

Prof. Dr. Barbara Plank

Core PI, AI and Computational Linguistics

Abstract

Relation Extraction (RE) has attracted increasing attention, but current RE evaluation is limited to in-domain setups. Little is known about how well an RE system fares in challenging but realistic out-of-distribution evaluation setups. To address this gap, we propose CrossRE, a new, freely available cross-domain benchmark for RE, which comprises six distinct text domains and includes multi-label annotations. An additional innovation is that we release meta-data collected during annotation, including explanations and flags for difficult instances. We provide an empirical evaluation with a state-of-the-art model for relation classification. As the meta-data enables us to shed new light on the state-of-the-art model, we provide a comprehensive analysis of the impact of difficult cases and find correlations between model and human annotations. Overall, our empirical investigation highlights the difficulty of cross-domain RE. We release our dataset to spur more research in this direction.


Findings @EMNLP 2022

Findings of the Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates, Dec 07-11, 2022.

Authors

E. Bassignana • B. Plank


Research Area

B2 | Natural Language Processing

BibTeXKey: BP22a
