LMU at HaSpeeDe3: Multi-Dataset Training for Cross-Domain Hate Speech Detection

MCML Authors

Dr. Viktor Hangya

Former Member, Group Alexander Fraser
Data Analytics & Statistics

Prof. Dr. Alexander Fraser

Core PI
Data Analytics & Statistics

Abstract

We describe LMU Munich’s hate speech detection system submitted to the cross-domain track of the HaSpeeDe3 shared task at EVALITA 2023. The task focuses on the politics and religion domains, with no in-domain training data available for the latter. Our submission combines multiple training sets from various domains in a multitask prompt-training system. We experimented with both Italian and English source datasets, as well as with monolingual Italian and multilingual pre-trained language models. We found that the Italian out-of-domain datasets have the greatest influence on performance in the test domains, and that combining monolingual and multilingual language models in an ensemble gives the best results. Our system ranked second in both domains.

inproceedings HF23