EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

MCML Authors

Daryna Dementieva

Dr.

→ Group Alexander Fraser
Data Analytics & Statistics

Alexander Fraser

Prof. Dr.

Core PI

Data Analytics & Statistics

Abstract

While Ukrainian NLP has seen progress in many text processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce **EmoBench-UA**, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the guidelines of previous English-centric work on emotion detection (Mohammad et al., 2018; Mohammad, 2022). The dataset was created through crowdsourcing on the Toloka.ai platform, ensuring a high-quality annotation process. We then evaluate a range of approaches on the collected dataset, from linguistic-based baselines and synthetic data translated from English to large language models (LLMs). Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian and emphasize the need for further development of Ukrainian-specific models and training resources.

inproceedings DBF25

Findings @EMNLP 2025

Findings of the Conference on Empirical Methods in Natural Language Processing. Suzhou, China, Nov 04-09, 2025.

Authors

D. Dementieva • N. Babakov • A. Fraser

Research Area

B2 | Natural Language Processing

BibTeXKey: DBF25
