
Improving Low-Resource Languages in Pre-Trained Multilingual Language Models

MCML Authors


Prof. Dr. Alexander Fraser, Principal Investigator

Abstract

Pre-trained multilingual language models are the foundation of many NLP approaches, including cross-lingual transfer solutions. However, languages with small available monolingual corpora are often not well supported by these models, leading to poor performance. We propose an unsupervised approach to improve the cross-lingual representations of low-resource languages by bootstrapping word translation pairs from monolingual corpora and using them to improve language alignment in pre-trained language models. We perform experiments on nine languages, using contextual word retrieval and zero-shot named entity recognition to measure both intrinsic cross-lingual word representation quality and downstream task performance, showing improvements on both tasks. Our results show that it is possible to improve pre-trained multilingual language models by relying only on non-parallel resources.
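To give a concrete picture of the bootstrapping idea mentioned in the abstract, the sketch below shows, on toy static embeddings, how word translation pairs can be mined from two monolingual embedding spaces: fit an alignment from a small seed and then collect mutual nearest neighbours as candidate translations. This is a hypothetical, self-contained illustration, not the paper's implementation; the seed dictionary, the Procrustes alignment step, and the synthetic data are assumptions made here for brevity, while the paper itself works with contextual representations inside a pre-trained multilingual model.

import numpy as np

# Illustrative sketch only (assumptions, not the authors' method):
# two toy monolingual embedding spaces are aligned from a small seed,
# then translation pairs are bootstrapped as mutual nearest neighbours.

rng = np.random.default_rng(0)


def normalise(mat):
    # Length-normalise rows so dot products equal cosine similarity.
    return mat / np.linalg.norm(mat, axis=1, keepdims=True)


# Toy "monolingual" embeddings: the target space is a rotated, slightly
# noisy copy of the source space, so a correct alignment is recoverable.
dim, vocab, seed_size = 64, 500, 100
src = normalise(rng.normal(size=(vocab, dim)))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = normalise(src @ rotation + 0.01 * rng.normal(size=(vocab, dim)))


def procrustes(a, b, pairs):
    # Orthogonal map W minimising ||a[i] W - b[j]|| over the given pairs.
    idx_a, idx_b = zip(*pairs)
    u, _, vt = np.linalg.svd(a[list(idx_a)].T @ b[list(idx_b)])
    return u @ vt


def mutual_nearest_neighbours(a, b):
    # Keep (i, j) only if i and j are each other's nearest neighbour.
    sim = a @ b.T
    a2b = sim.argmax(axis=1)
    b2a = sim.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(a2b) if b2a[j] == i]


# Hypothetical seed dictionary (here: the first indices are assumed to be
# translations, standing in for e.g. identical strings across languages).
seed = [(i, i) for i in range(seed_size)]
W = procrustes(src, tgt, seed)
aligned = normalise(src @ W)

# Bootstrap additional translation pairs in the aligned space.
pairs = mutual_nearest_neighbours(aligned, tgt)
precision = np.mean([i == j for i, j in pairs])
print(f"bootstrapped pairs: {len(pairs)}, precision vs. gold: {precision:.2f}")

In the paper, such bootstrapped pairs then serve as supervision for improving the alignment of low-resource languages inside the pre-trained multilingual model, rather than for mapping static embeddings as in this toy example.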



EMNLP 2022

Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates, Nov 07-11, 2022.
A* Conference

Authors

V. Hangya • H. S. Saadi • A. Fraser

Links

DOI

Research Area

 B2 | Natural Language Processing

BibTeXKey: HSF22
