
Improving Low-Resource Languages in Pre-Trained Multilingual Language Models

MCML Authors

Viktor Hangya

Dr.

* Former Member

→ Group Alexander Fraser
Data Analytics & Statistics

Alexander Fraser

Prof. Dr.

Core PI

Data Analytics & Statistics

Abstract

Pre-trained multilingual language models are the foundation of many NLP approaches, including cross-lingual transfer solutions. However, languages with small available monolingual corpora are often not well supported by these models, leading to poor performance. We propose an unsupervised approach to improve the cross-lingual representations of low-resource languages by bootstrapping word translation pairs from monolingual corpora and using them to improve language alignment in pre-trained language models. We perform experiments on nine languages, using contextual word retrieval and zero-shot named entity recognition to measure both intrinsic cross-lingual word representation quality and downstream task performance, showing improvements on both tasks. Our results show that it is possible to improve pre-trained multilingual language models by relying only on non-parallel resources.
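To make the idea of bootstrapping word translation pairs without parallel data more concrete, here is a minimal sketch of one common mining heuristic: keep a source/target word pair only if each word is the other's nearest neighbor by cosine similarity. This is an illustration, not the paper's procedure. The abstract does not specify how pairs are mined, and the function names (`cosine`, `nearest`, `mine_pairs`) and the toy two-dimensional vectors are hypothetical. The sketch also assumes the two vocabularies are already embedded in a roughly shared vector space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(vec, candidates):
    """Return the word in `candidates` whose vector is most similar to `vec`."""
    return max(candidates, key=lambda w: cosine(vec, candidates[w]))


def mine_pairs(src_emb, tgt_emb):
    """Keep (source, target) pairs that are mutual nearest neighbors."""
    pairs = []
    for s, s_vec in src_emb.items():
        t = nearest(s_vec, tgt_emb)
        # Accept the pair only if the choice is symmetric.
        if nearest(tgt_emb[t], src_emb) == s:
            pairs.append((s, t))
    return pairs


# Toy, hand-made vectors (assumption: both languages share one space).
src_emb = {"hund": [0.9, 0.1], "katze": [0.1, 0.9], "tier": [0.6, 0.5]}
tgt_emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}

pairs = mine_pairs(src_emb, tgt_emb)
print(pairs)
```

The mutual-nearest-neighbor check trades recall for precision: "tier" is closest to "dog", but "dog" is closer to "hund", so that ambiguous pair is dropped. Pairs mined this way could then serve as a weak alignment signal, under the assumptions stated above.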

inproceedings HSF22

EMNLP 2022

Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates, Nov 07-11, 2022.

Authors

V. Hangya • H. S. Saadi • A. Fraser

Research Area

B2 | Natural Language Processing

BibTeXKey: HSF22
