Data Augmentation Through Back-Translation for Stereotypes and Irony Detection

Abstract

Complex linguistic phenomena such as stereotypes or irony are still challenging to detect, particularly due to the lower availability of annotated data. In this paper, we explore Back-Translation (BT) as a data augmentation method to enhance such datasets by artificially introducing semantics-preserving variations. We investigate French and Italian as source languages on two multilingual datasets annotated for the presence of stereotypes or irony, and evaluate French/Italian, English, and Arabic as pivot languages for the BT process. We also investigate cross-translation, i.e., augmenting one language subset of a multilingual dataset with translated instances from the other languages. We conduct an intrinsic evaluation of the quality of back-translated instances, identifying linguistic or translation model-specific errors that may occur with BT. We also perform an extrinsic evaluation of different data augmentation configurations to train a multilingual Transformer-based classifier for stereotype or irony detection on monolingual data.
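
The back-translation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Translator` signature, the `back_translate` and `augment` helpers, the duplicate filtering, and the toy lookup table standing in for a machine translation model are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, source: str, pivot: str, translate: Translator) -> str:
    """Translate source -> pivot -> source and return the round-tripped text."""
    pivot_text = translate(text, source, pivot)
    return translate(pivot_text, pivot, source)


def augment(
    dataset: List[Tuple[str, int]],
    source: str,
    pivots: List[str],
    translate: Translator,
) -> List[Tuple[str, int]]:
    """Append back-translated copies of each instance, keeping its label.

    Copies identical to the original or to an earlier copy are skipped
    (an assumption of this sketch), since they add no variation.
    """
    augmented = list(dataset)
    for text, label in dataset:
        seen = {text}
        for pivot in pivots:
            candidate = back_translate(text, source, pivot, translate)
            if candidate not in seen:
                seen.add(candidate)
                augmented.append((candidate, label))
    return augmented


# Toy stand-in for a real machine translation model: a fixed lookup table.
_TOY_TABLE: Dict[Tuple[str, str, str], str] = {
    ("che bella giornata", "it", "en"): "what a beautiful day",
    ("what a beautiful day", "en", "it"): "che splendida giornata",
    ("che bella giornata", "it", "fr"): "quelle belle journée",
    ("quelle belle journée", "fr", "it"): "che bella giornata",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Unknown inputs are returned unchanged.
    return _TOY_TABLE.get((text, src, tgt), text)


data = [("che bella giornata", 1)]
augmented = augment(data, source="it", pivots=["en", "fr"], translate=toy_translate)
```

In this toy run the English pivot yields a paraphrase that is added with the original label, while the French pivot returns the input unchanged and is dropped. In practice `toy_translate` would be replaced by calls to an actual translation model.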

inproceedings


CLiC-it 2024

10th Italian Conference on Computational Linguistics. Pisa, Italy, Dec 04-06, 2024.

Authors

T. Bourgeade • S. Casola • A. M. Wizani • C. Bosco

Research Area

 B2 | Natural Language Processing

BibTeXKey: BCW+24
