Sociocultural Knowledge Is Needed for Selection of Shots in Hate Speech Detection Tasks

MCML Authors

Antonis Maronikolakis

* Former Member

→ Group Hinrich Schütze
Computational Linguistics

Abdullatif Köksal

* Former Member

→ Group Hinrich Schütze
Computational Linguistics

Hinrich Schütze

Prof. Dr.

Core PI

Computational Linguistics

Abstract

We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for Brazil, Germany, India and Kenya, to aid model development and interpretability. First, we demonstrate how HATELEXICON can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target group names. Further, we propose a culturally informed method to aid shot selection for training in low-resource settings. In few-shot learning, shot selection strongly affects model performance, so the available data must be used as effectively as possible. We train on HASOC German and Hindi data and evaluate on the Multilingual HateCheck (MHC) benchmark. We show that selecting shots based on our lexicon yields models that outperform those trained on randomly sampled shots. Thus, when only a few training examples are available, using HATELEXICON to select shots containing more sociocultural information leads to better few-shot performance. With these two use cases, we show how HATELEXICON can be used for more effective hate speech detection.
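
The abstract does not spell out how the lexicon drives shot selection, so the following is only a minimal sketch under stated assumptions: each candidate training example is scored by how many lexicon terms it contains, and the top-scoring examples are kept as shots, with random sampling as the baseline. The function names (`lexicon_hits`, `select_shots`, `random_shots`), the count-based scoring rule, and the placeholder terms are all hypothetical and are not taken from the paper.

```python
import random
import re
from typing import List, Set, Tuple

Example = Tuple[str, int]  # (text, label); the label encoding is an assumption


def lexicon_hits(text: str, lexicon: Set[str]) -> int:
    """Count tokens of `text` that appear in the lexicon (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for tok in tokens if tok in lexicon)


def select_shots(pool: List[Example], lexicon: Set[str], k: int, seed: int = 0) -> List[Example]:
    """Pick the k examples with the most lexicon hits.

    Assumed scoring rule: more lexicon terms means more sociocultural signal.
    The pool is shuffled first so that ties are broken randomly; Python's
    sort is stable, so the shuffled order is preserved among equal scores.
    """
    rng = random.Random(seed)
    shuffled = list(pool)
    rng.shuffle(shuffled)
    ranked = sorted(shuffled, key=lambda ex: lexicon_hits(ex[0], lexicon), reverse=True)
    return ranked[:k]


def random_shots(pool: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Baseline: sample k examples uniformly at random without replacement."""
    return random.Random(seed).sample(pool, k)


if __name__ == "__main__":
    # Placeholder terms stand in for real lexicon entries.
    lexicon = {"termx", "groupy"}
    pool = [
        ("hello world", 0),
        ("termx is here", 1),
        ("groupy and termx together", 1),
        ("nothing relevant", 0),
    ]
    print(select_shots(pool, lexicon, k=2))
    print(random_shots(pool, k=2))
```

In practice one would likely also balance labels among the selected shots; that step is omitted here to keep the sketch short.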

inproceedings MKS24

LT-EDI 2024

4th Workshop on Language Technology for Equality, Diversity, Inclusion. St. Julian's, Malta, Mar 21, 2024.

Authors

A. Maronikolakis • A. Köksal • H. Schütze

Research Area

B2 | Natural Language Processing

BibTeXKey: MKS24
