Aligning NLP Models With Target Population Perspectives Using PAIR: Population-Aligned Instance Replication
Abstract
Models trained on crowdsourced annotations may not reflect population views if the annotators do not represent the broader population. In this paper, we propose PAIR: Population-Aligned Instance Replication, a post-processing method that adjusts training data to better reflect target population characteristics without collecting additional annotations. Using simulation studies on offensive language and hate speech detection with varying annotator compositions, we show that non-representative pools degrade model calibration while leaving accuracy largely unchanged. PAIR corrects these calibration problems by replicating annotations from underrepresented annotator groups to match population proportions. We conclude with recommendations for improving the representativity of training data and model performance.
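The replication step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `pair_replicate`, the `group` field, the dictionary layout of an annotation, and the integer rounding rule are all assumptions made for illustration. The sketch computes, for each annotator group, the ratio of its target population share to its observed share in the annotation pool, then copies each annotation in proportion to that ratio. Because the ratios are rounded to whole copies, the resulting shares only approximate the targets.

```python
from collections import Counter


def pair_replicate(annotations, target_props, group_key="group"):
    """Replicate annotations so group shares approximate target_props.

    annotations: list of dicts, each holding an annotator group under
        group_key (hypothetical layout, not taken from the paper).
    target_props: dict mapping each group to its target population share.
        Every group present in annotations must appear here.
    """
    counts = Counter(a[group_key] for a in annotations)
    total = len(annotations)

    # Ratio of target share to observed share; > 1 means underrepresented.
    ratios = {g: target_props[g] / (counts[g] / total) for g in counts}

    # Scale so the most overrepresented group keeps exactly one copy;
    # nothing is dropped, annotations are only replicated.
    base = min(ratios.values())

    replicated = []
    for a in annotations:
        copies = max(1, round(ratios[a[group_key]] / base))
        replicated.extend(dict(a) for _ in range(copies))
    return replicated


# Toy pool: group A supplies 3 of 4 annotations, target is an even split.
pool = [
    {"text_id": 1, "label": 0, "group": "A"},
    {"text_id": 2, "label": 1, "group": "A"},
    {"text_id": 3, "label": 0, "group": "A"},
    {"text_id": 1, "label": 1, "group": "B"},
]
balanced = pair_replicate(pool, {"A": 0.5, "B": 0.5})
print(Counter(a["group"] for a in balanced))
```

In this toy pool the single annotation from group B is copied until both groups contribute equally. A model trained on `balanced` then sees labels weighted toward the target population mix rather than the annotator pool mix.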
inproceedings EMK+25
NLPerspectives @EMNLP 2025
4th Workshop on Perspectivist Approaches to NLP at the Conference on Empirical Methods in Natural Language Processing. Suzhou, China, Nov 04-09, 2025.
Authors
S. Eckman • B. Ma • C. Kern • R. Chew • B. Plank • F. Kreuter
BibTeXKey: EMK+25