
Detecting Gender Discrimination on Actor Level Using Linguistic Discourse Analysis


Abstract

With the use of tremendous amounts of text data for training powerful large language models such as ChatGPT, the issue of analysing and securing data quality has become more pressing than ever. Any biases, stereotypes and discriminatory patterns that exist in the training data can be reproduced, reinforced or broadly disseminated by the models in production. Therefore, it is crucial to carefully select and monitor the text data that is used as input to train the model. Due to the vast amount of training data, this process needs to be (at least partially) automated. In this work, we introduce a novel approach for automatically detecting gender discrimination in text data on the actor level based on linguistic discourse analysis. Specifically, we combine existing information extraction (IE) techniques to partly automate the qualitative research done in linguistic discourse analysis. We focus on two important steps: identifying the respective person named entity (an actor) and all forms it is referred to (Nomination), and detecting the characteristics it is ascribed (Predication). As a proof of concept, we integrate these two steps into a pipeline for automated text analysis. The separate building blocks of the pipeline could be flexibly adapted, extended, and scaled for bigger datasets to accommodate a wide range of usage scenarios and specific ML tasks or help social scientists with analysis tasks. We showcase and evaluate our approach on several real and simulated exemplary texts.
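The two steps can be illustrated with a minimal, rule-based sketch. This is a hypothetical toy, not the authors' pipeline: a real implementation would use trained NER and coreference-resolution models for Nomination and dependency parsing for Predication, whereas the sketch below matches only an exact actor name plus a small pronoun list and looks for a copula immediately after each mention.

```python
# Toy illustration of the two steps named in the abstract:
#   Nomination  - collect all surface forms referring to one actor
#   Predication - collect characteristics ascribed to that actor
# All word lists and the matching rules are simplifying assumptions.

COPULAS = {"is", "was", "seems", "remains"}
PRONOUNS = {"he", "she", "they", "him", "her", "them"}

def analyse_actor(tokens, actor_name):
    """Return (nominations, predications) for one actor in a token list."""
    nominations, predications = [], []
    for i, tok in enumerate(tokens):
        if tok == actor_name or tok.lower() in PRONOUNS:
            nominations.append(tok)  # Nomination: a surface form of the actor
            # Predication: pattern "<mention> <copula> <characteristic>"
            if i + 2 < len(tokens) and tokens[i + 1].lower() in COPULAS:
                predications.append(tokens[i + 2])
    return nominations, predications

tokens = "Maria is ambitious . She was praised . Her team is loyal .".split()
noms, preds = analyse_actor(tokens, "Maria")
print(noms)   # ['Maria', 'She', 'Her']
print(preds)  # ['ambitious', 'praised']
```

Note that the naive pronoun rule over-collects ("Her" in "Her team" refers to Maria, but a possessive inside another noun phrase may need different handling), which is exactly the kind of ambiguity a proper coreference model resolves.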



GeBNLP 2024

5th Workshop on Gender Bias in Natural Language Processing. Bangkok, Thailand, Aug 16, 2024.

Authors

S. Urchs • V. Thurner • M. Aßenmacher • C. Heumann • S. Thiemichen


Research Area

 A1 | Statistical Foundations & Explainability

BibTeX Key: UTA+24
