Improving Low-Resource Languages in Pre-Trained Multilingual Language Models
MCML Authors
Viktor Hangya
Dr.
* Former Member
Abstract
Pre-trained multilingual language models are the foundation of many NLP approaches, including cross-lingual transfer solutions. However, languages with small available monolingual corpora are often not well supported by these models, leading to poor performance. We propose an unsupervised approach to improve the cross-lingual representations of low-resource languages by bootstrapping word translation pairs from monolingual corpora and using them to improve language alignment in pre-trained language models. We perform experiments on nine languages, using contextual word retrieval and zero-shot named entity recognition to measure both intrinsic cross-lingual word representation quality and downstream task performance, showing improvements on both tasks. Our results show that it is possible to improve pre-trained multilingual language models by relying only on non-parallel resources.
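As a rough illustration of what mining word translation pairs without parallel data can look like, the sketch below keeps only mutual nearest neighbours under cosine similarity. This is not the paper's procedure: it assumes the source and target word vectors already live in a shared space, and the function names, toy words, and vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec, candidates):
    """Return the word in `candidates` whose vector is most similar to `query_vec`."""
    return max(candidates, key=lambda w: cosine(query_vec, candidates[w]))

def mutual_nearest_pairs(src_vecs, tgt_vecs):
    """Keep (source, target) pairs that are each other's nearest neighbour."""
    pairs = []
    for s_word, s_vec in src_vecs.items():
        t_word = nearest(s_vec, tgt_vecs)
        # Accept the pair only if the target word points back to the source word.
        if nearest(tgt_vecs[t_word], src_vecs) == s_word:
            pairs.append((s_word, t_word))
    return pairs

# Hypothetical toy vectors, assumed to be in a shared cross-lingual space.
src = {"hund": [0.9, 0.1], "katze": [0.1, 0.9]}
tgt = {"dog": [1.0, 0.2], "cat": [0.2, 1.0], "pet": [0.6, 0.6]}
pairs = mutual_nearest_pairs(src, tgt)
print(pairs)
```

The mutual check is a common filter for noisy candidates: a target word such as "pet" that is moderately close to many source words is never selected, because no source word has it as its best match.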
inproceedings HSF22
EMNLP 2022
Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates, Nov 07-11, 2022.
Authors
V. Hangya • H. S. Saadi • A. Fraser
Links
DOI
Research Area
BibTeXKey: HSF22