Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on Bavarian
MCML Authors
Dr. Peiqin Lin
* Former Member
Abstract
When translating into a low-resource language, a language model may tend to produce translations that stay close to the source (e.g., word-by-word translations) because its pretraining data contains little rich text in that language. The output is therefore often translationese that differs considerably from what native speakers would naturally produce. To remedy this, we synthetically create a training set in which the frequency of a construction unique to the low-resource language is artificially inflated. For the case of Bavarian, we show that, after training, the language model has learned the unique construction and that native speakers judge its output as more natural. Our pilot study suggests that construction-based mitigation of translationese is a promising approach.
inproceedings LTG+25
SIGTYP @ACL 2025
7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP at the 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025.
Authors
P. Lin • M. Thaler • D. Goschala • A. H. Kargaran • Y. Liu • A. Martins • H. Schütze
BibTeXKey: LTG+25