Massively Multilingual Language Modeling and Adaptation

Abstract

This dissertation advances multilingual NLP for low-resource and marginalized languages by addressing challenges in data availability, model adaptation, and cross-lingual transfer. It introduces Taxi1500, a large-scale dataset covering over 1,500 languages, proposes a new language similarity metric based on conceptual alignment, and develops methods to extend pretrained models to previously unsupported languages. The work further applies these approaches to privacy-preserving hate speech detection, promoting more inclusive and equitable language technologies. (Shortened.)

phdthesis Lin25


Dissertation

LMU München. Oct. 2025

Authors

P. Lin

Research Area

B2 | Natural Language Processing

BibTeXKey: Lin25
