Aligning NLP Models With Target Population Perspectives Using PAIR: Population-Aligned Instance Replication

MCML Authors

Christoph Kern

Prof. Dr.

Associate

Barbara Plank

Prof. Dr.

Principal Investigator

Frauke Kreuter

Prof. Dr.

Principal Investigator

Abstract

Models trained on crowdsourced annotations may not reflect population views if those who work as annotators do not represent the broader population. In this paper, we propose PAIR: Population-Aligned Instance Replication, a post-processing method that adjusts training data to better reflect target population characteristics without collecting additional annotations. Using simulation studies on offensive language and hate speech detection with varying annotator compositions, we show that non-representative pools degrade model calibration while leaving accuracy largely unchanged. PAIR corrects these calibration problems by replicating annotations from underrepresented annotator groups to match population proportions. We conclude with recommendations for improving the representativity of training data and model performance.
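
The replication idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`replication_factors`, `pair_replicate`), the `group` field, and the choice to round ratios to integer copy counts are all assumptions made for illustration. It assumes each annotation carries a known annotator group and that target population shares are supplied for every observed group.

```python
from collections import Counter

def replication_factors(groups, target_props):
    """Integer copy count per group so the replicated pool approximates target_props.

    Hypothetical helper: rounding to whole copies is an illustrative assumption.
    """
    counts = Counter(groups)
    n = len(groups)
    # Ratio of target population share to observed annotator share, per group
    ratios = {g: target_props[g] / (counts[g] / n) for g in counts}
    # Scale so the most over-represented group is kept exactly once
    base = min(ratios.values())
    return {g: max(1, round(r / base)) for g, r in ratios.items()}

def pair_replicate(annotations, target_props):
    """Return a new list in which each annotation is repeated per its group's factor."""
    factors = replication_factors([a["group"] for a in annotations], target_props)
    replicated = []
    for a in annotations:
        replicated.extend([a] * factors[a["group"]])
    return replicated

# Toy pool: group A supplies 8 of 10 annotations, but the target population is 50/50.
annotations = (
    [{"text_id": i, "label": 1, "group": "A"} for i in range(8)]
    + [{"text_id": i, "label": 0, "group": "B"} for i in range(2)]
)
target = {"A": 0.5, "B": 0.5}

factors = replication_factors([a["group"] for a in annotations], target)
replicated = pair_replicate(annotations, target)
print(factors)  # {'A': 1, 'B': 4}
```

In this toy case the two annotations from group B are each copied four times, so the replicated pool holds eight annotations per group. A group with no annotations at all cannot be recovered by replication, which is a limitation of the sketch rather than a statement about the paper.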

inproceedings EMK+25


NLPerspectives @EMNLP 2025

4th Workshop on Perspectivist Approaches to NLP at the Conference on Empirical Methods in Natural Language Processing. Suzhou, China, Nov 04-09, 2025.

Authors

S. Eckman • B. Ma • C. Kern • R. Chew • B. Plank • F. Kreuter

Research Areas

 B2 | Natural Language Processing

 C4 | Computational Social Sciences

BibTeXKey: EMK+25