Large Language Models (LLMs) offer promising alternatives to traditional occupational coding approaches in survey research. Using a German dataset, we examine the extent to which LLM-based occupational coding differs by gender. Our findings reveal systematic disparities: gendered job titles (e.g., “Autor” vs. “Autorin”, meaning “male author” vs. “female author”) frequently receive diverging occupation codes, even though they are semantically identical. Across all models, 54%–82% of gendered inputs yield different Top-5 suggestions. The practical impact, however, depends on the model. GPT includes the correct code most often (62%) but exhibits a female bias (up to +18 pp). IBM is less accurate (51%) but largely balanced. Alibaba, Gemini, and MiniLM achieve about 50% correct-code inclusion, and their small (< 10 pp), direction-flipping gaps may indicate sampling noise rather than gender bias. We discuss these findings in the context of fairness and reproducibility in NLP applications for social data.
Type: inproceedings
BibTeX key: KHK25
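
As a rough illustration of the pairwise comparison the abstract describes, the sketch below computes a Top-5 divergence rate, correct-code inclusion per gendered form, and the resulting gap in percentage points. The `GenderedPair` record, the `evaluate` helper, and the example occupation codes are hypothetical assumptions for illustration, not the paper's actual pipeline or data.

```python
"""Minimal sketch of the gendered-pair evaluation outlined in the abstract.

Assumption: each record pairs the male and female form of a job title
(e.g. "Autor" / "Autorin") with a gold occupation code and the Top-5
code suggestions a model returned for each form. All values are made up.
"""

from dataclasses import dataclass


@dataclass
class GenderedPair:
    title_m: str            # male job title, e.g. "Autor"
    title_f: str            # female job title, e.g. "Autorin"
    gold_code: str          # correct occupation code (illustrative)
    top5_m: list[str]       # model's Top-5 codes for the male form
    top5_f: list[str]       # model's Top-5 codes for the female form


def evaluate(pairs: list[GenderedPair]) -> dict[str, float]:
    n = len(pairs)
    # Share of pairs whose Top-5 suggestions differ between the two forms
    # (order-insensitive comparison of the suggested code sets).
    divergence = sum(set(p.top5_m) != set(p.top5_f) for p in pairs) / n
    # Correct-code inclusion: gold code appears anywhere in the Top-5.
    hit_m = sum(p.gold_code in p.top5_m for p in pairs) / n
    hit_f = sum(p.gold_code in p.top5_f for p in pairs) / n
    return {
        "top5_divergence": divergence,
        "hit_rate_male_form": hit_m,
        "hit_rate_female_form": hit_f,
        "gender_gap_pp": (hit_f - hit_m) * 100,  # positive: favors female form
    }


# Illustrative usage with a single made-up pair and made-up codes.
example = [
    GenderedPair(
        "Autor", "Autorin", "92332",
        ["92332", "92333", "28210", "94102", "73112"],
        ["92333", "92332", "94102", "28210", "73104"],
    )
]
print(evaluate(example))
```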