Sociocultural Knowledge Is Needed for Selection of Shots in Hate Speech Detection Tasks

MCML Authors

Antonis Maronikolakis

Hinrich Schütze

Prof. Dr.

Principal Investigator

Abstract

We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for Brazil, Germany, India and Kenya, to aid model development and interpretability. First, we demonstrate how HATELEXICON can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target group names. Further, we propose a culturally-informed method to aid shot selection for training in low-resource settings. In few-shot learning, shot selection is of paramount importance to model performance and we need to ensure we make the most of available data. We work with HASOC German and Hindi data for training and the Multilingual HateCheck (MHC) benchmark for evaluation. We show that selecting shots based on our lexicon leads to models performing better than models trained on shots sampled randomly. Thus, when given only a few training examples, using HATELEXICON to select shots containing more sociocultural information leads to better few-shot performance. With these two use-cases we show how our HATELEXICON can be used for more effective hate speech detection.
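The lexicon-guided shot selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (count of distinct lexicon terms per example), the function names `lexicon_hits` and `select_shots`, and the placeholder lexicon entries are all assumptions made for illustration. It only shows the general idea of preferring training examples that contain lexicon terms over sampling uniformly at random.

```python
import random
import re


def lexicon_hits(text, lexicon):
    """Count distinct lexicon terms in the text (case-insensitive, whole tokens)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return len(tokens & lexicon)


def select_shots(pool, lexicon, k, seed=0):
    """Return k examples, preferring those with more lexicon terms.

    Hypothetical scoring rule: examples are shuffled first so that ties
    between equally scored examples are broken randomly but reproducibly.
    """
    rng = random.Random(seed)
    shuffled = list(pool)
    rng.shuffle(shuffled)
    ranked = sorted(
        shuffled,
        key=lambda ex: lexicon_hits(ex["text"], lexicon),
        reverse=True,
    )
    return ranked[:k]


# Placeholder lexicon entries (assumed, not taken from HATELEXICON).
lexicon = {"term_a", "group_x"}

# Toy candidate pool standing in for a small labelled training set.
pool = [
    {"text": "nothing relevant here", "label": 0},
    {"text": "TERM_A aimed at group_x", "label": 1},
    {"text": "mentions group_x only", "label": 0},
    {"text": "another neutral sentence", "label": 0},
]

shots = select_shots(pool, lexicon, k=2)
for shot in shots:
    print(shot["label"], shot["text"])
```

A random baseline for comparison would replace the `sorted` call with `rng.sample(pool, k)`. A real setup would presumably also balance labels among the selected shots, which this sketch does not do.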

LT-EDI 2024

4th Workshop on Language Technology for Equality, Diversity, Inclusion. St. Julian's, Malta, Mar 21, 2024.

Authors

A. Maronikolakis, A. Köksal, H. Schütze

Research Area

 B2 | Natural Language Processing

BibTeXKey: MKS24
