This work examines the representation of protected attributes in tabular datasets used in algorithmic fairness research. Drawing on international human rights and anti-discrimination law, we compile a set of protected attributes and investigate both their availability in datasets and their usage in the literature. Our analysis reveals a significant underrepresentation of certain attributes, which is exacerbated by a strong focus on race and sex in dataset usage. We also identify a geographical bias towards the Global North, particularly North America, which may limit the applicability of fairness detection and mitigation strategies in less-represented regions. The study exposes critical blind spots in fairness research, highlighting the need for a more inclusive and representative approach to data collection and usage in the field. We propose a shift away from a narrow focus on a small number of datasets and advocate for initiatives aimed at sourcing more diverse and representative data.