Peiqin Lin
Dr.
* Former Member
This dissertation advances multilingual NLP for low-resource and marginalized languages by addressing challenges in data availability, model adaptation, and cross-lingual transfer. It introduces Taxi1500, a large-scale dataset covering over 1,500 languages, proposes a new language similarity metric based on conceptual alignment, and develops methods to extend pretrained models to previously unsupported languages. The work further applies these approaches to privacy-preserving hate speech detection, promoting more inclusive and equitable language technologies. (Shortened.)
BibTeXKey: Lin25