22.03.2024

Outstanding Paper Award at EACL 2024 for Hinrich Schütze and Lütfi Kerem Senel

Advancing NLP for Low-Resource Turkic Languages

At EACL 2024, the paper “Kardeş‑NLU: Transfer to Low‑Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages” received the Outstanding Paper Award.

The work, co‑authored by MCML PI Hinrich Schütze and Junior Member Lütfi Kerem Senel together with Benedikt Ebing, Konul Baghirova, and Goran Glavaš, introduces a comprehensive benchmark and evaluation framework to advance natural language understanding in low‑resource Turkic languages through cross‑lingual transfer.

Congratulations from us!

Check out the full paper:

L. K. Senel, B. Ebing, K. Baghirova, H. Schütze and G. Glavaš.
Kardeş-NLU: Transfer to Low-Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages.
EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics. St. Julian's, Malta, Mar 17-22, 2024. Outstanding Paper Award.
Abstract

Cross-lingual transfer (XLT) driven by massively multilingual language models (mmLMs) has been shown to be largely ineffective for low-resource (LR) target languages with little (or no) representation in the mmLM’s pretraining, especially if they are linguistically distant from the high-resource (HR) source language. Much of the recent focus in XLT research has been dedicated to LR language families, i.e., families without any HR languages (e.g., families of African languages or indigenous languages of the Americas). In this work, in contrast, we investigate a configuration that is arguably of practical relevance for more of the world’s languages: XLT to LR languages that do have a close HR relative. To explore the extent to which an HR language can facilitate transfer to its LR relatives, we (1) introduce Kardeş-NLU, an evaluation benchmark with language understanding datasets in five LR Turkic languages: Azerbaijani, Kazakh, Kyrgyz, Uzbek, and Uyghur; and (2) investigate (a) intermediate training and (b) fine-tuning strategies that leverage Turkish in XLT to these target languages. Our experimental results show that both strategies, integrating Turkish in intermediate training and in downstream fine-tuning, yield substantial improvements in XLT to LR Turkic languages. Finally, we benchmark cutting-edge instruction-tuned large language models on Kardeş-NLU, showing that their performance is highly task- and language-dependent.
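
For readers curious what such a transfer setup looks like in practice, below is a minimal, illustrative Python sketch: it loads a massively multilingual encoder and encodes a Turkish sentence pair for an XNLI-style classification task. The model name, example sentences, and label setup are generic placeholders under our own assumptions, not the paper's actual models, data, or training recipe.

    # Illustrative sketch only: a generic mmLM backbone and placeholder
    # sentences, not the Kardeş-NLU paper's exact setup.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "xlm-roberta-base"  # a typical massively multilingual encoder
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=3  # entailment / neutral / contradiction
    )

    # Encode a Turkish premise-hypothesis pair. Because tokenizer and encoder
    # are shared across languages, the identical pipeline accepts Azerbaijani,
    # Kazakh, Kyrgyz, Uzbek, or Uyghur inputs at test time, which is what
    # makes zero-shot cross-lingual transfer possible.
    batch = tokenizer("Kar yağıyor.", "Hava soğuk.", return_tensors="pt")

    with torch.no_grad():
        logits = model(**batch).logits
    print(logits.softmax(dim=-1))  # untrained head: task fine-tuning would
                                   # normally precede this prediction

In the setup the abstract describes, the classification head would first be fine-tuned on task data that mixes the HR source language with Turkish, the "big brother", before evaluating on its low-resource relatives.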

MCML Authors
Lütfi Kerem Senel
Dr.
* Former Member

Hinrich Schütze
Prof. Dr.
Computational Linguistics

Related

01.08.2025

Fabian Theis Receives 2025 ISCB Innovator Award

Fabian Theis receives 2025 ISCB Innovator Award for advancing AI in biology and mentoring the next generation of scientists.

29.07.2025

Yusuf Sale Receives IJAR Young Researcher Award

MCML Junior Member Yusuf Sale received an IJAR Young Researcher Award at ISIPTA 2025 for his work.

29.07.2025

Barbara Plank Awarded 2025 Imminent Research Grant for Work on Language Data

Barbara Plank’s MaiNLP lab wins 2025 Imminent Research Grant for a project on language data with Peng and de Marneffe.

22.07.2025

Eyke Hüllermeier to Lead New DFG-Funded Research Training Group METEOR

MCML PI Eyke Hüllermeier to lead new DFG-funded RTG METEOR, uniting ML and control theory to build robust, explainable AI systems.

18.07.2025

Outstanding Paper Award at ICML 2025 for MCML Researchers

MCML researchers win ICML 2025 Outstanding Paper Award for work on prediction and identifying the worst-off.