Massively Multilingual Language Modeling and Adaptation

MCML Authors

Dr. Peiqin Lin (former member)

Group Hinrich Schütze, Computational Linguistics

Abstract

This dissertation advances multilingual NLP for low-resource and marginalized languages by addressing challenges in data availability, model adaptation, and cross-lingual transfer. It introduces Taxi1500, a large-scale dataset covering over 1,500 languages, proposes a new language similarity metric based on conceptual alignment, and develops methods to extend pretrained models to previously unsupported languages. The work further applies these approaches to privacy-preserving hate speech detection, promoting more inclusive and equitable language technologies. (Shortened.)

phdthesis Lin25