22.03.2024


Outstanding Paper Award at EACL 2024 for Hinrich Schütze and Lütfi Kerem Senel
Advancing NLP for Low-Resource Turkic Languages
At EACL 2024, the paper “Kardeş‑NLU: Transfer to Low‑Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages” received the Outstanding Paper Award.
The work, co‑authored by MCML PI Hinrich Schütze and Junior Member Lütfi Kerem Senel together with Benedikt Ebing, Konul Baghirova, and Goran Glavaš, introduces a comprehensive benchmark and evaluation framework to advance natural language understanding in low‑resource Turkic languages through cross‑lingual transfer.
Congratulations from us!
Check out the full paper:
Kardeş-NLU: Transfer to Low-Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages.
EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics. St. Julian's, Malta, Mar 17-22, 2024. Outstanding Paper Award.
Abstract
Cross-lingual transfer (XLT) driven by massively multilingual language models (mmLMs) has been shown largely ineffective for low-resource (LR) target languages with little (or no) representation in mmLM’s pretraining, especially if they are linguistically distant from the high-resource (HR) source language. Much of the recent focus in XLT research has been dedicated to LR language families, i.e., families without any HR languages (e.g., families of African languages or indigenous languages of the Americas). In this work, in contrast, we investigate a configuration that is arguably of practical relevance for more of the world’s languages: XLT to LR languages that do have a close HR relative. To explore the extent to which a HR language can facilitate transfer to its LR relatives, we (1) introduce Kardeş-NLU, an evaluation benchmark with language understanding datasets in five LR Turkic languages: Azerbaijani, Kazakh, Kyrgyz, Uzbek, and Uyghur; and (2) investigate (a) intermediate training and (b) fine-tuning strategies that leverage Turkish in XLT to these target languages. Our experimental results show that both - integrating Turkish in intermediate training and in downstream fine-tuning - yield substantial improvements in XLT to LR Turkic languages. Finally, we benchmark cutting-edge instruction-tuned large language models on Kardeş-NLU, showing that their performance is highly task- and language-dependent.
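To give a rough idea of what "leveraging Turkish in downstream fine-tuning" can look like in practice, here is a minimal illustrative sketch, not the authors' implementation: it fine-tunes a multilingual encoder on a mix of English (high-resource source) and Turkish ("big brother") NLI data, after which zero-shot evaluation on the low-resource Turkic test sets of Kardeş-NLU would follow. The model choice (XLM-R), the XNLI data, and all hyperparameters are assumptions made for illustration only.

# Illustrative sketch: joint English + Turkish fine-tuning for cross-lingual
# transfer to low-resource Turkic languages. Not the paper's code; model,
# dataset, and hyperparameters are assumptions.
from datasets import load_dataset, concatenate_datasets
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "xlm-roberta-base"  # assumed massively multilingual LM (mmLM)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def tokenize(batch):
    # Encode premise/hypothesis pairs for NLI-style classification.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

# English source data plus Turkish relative data (XNLI provides both).
en = load_dataset("xnli", "en", split="train[:20000]")
tr = load_dataset("xnli", "tr", split="train[:20000]")
train = concatenate_datasets([en, tr]).shuffle(seed=42).map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

args = TrainingArguments(output_dir="xlt-en-tr",
                         per_device_train_batch_size=32,
                         num_train_epochs=2,
                         learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train,
        tokenizer=tokenizer).train()

# Zero-shot evaluation would then run on the Kardeş-NLU test sets
# (Azerbaijani, Kazakh, Kyrgyz, Uzbek, Uyghur), loaded analogously.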
MCML Authors
Dr. Lütfi Kerem Senel (former member)