Home | Research | Groups | Christoph Kern

Research Group Christoph Kern

Christoph Kern

Prof. Dr.

Associate

C4 | Computational Social Sciences

Social Data Science and AI Lab

Christoph Kern

is Junior Professor of Social Data Science and Statistical Learning at LMU Munich.

His work focuses on the reliable use of machine learning methods and new data sources in social science, survey research, and algorithmic fairness.

Team members @MCML

PhD Students

Unai Fischer Abaigar

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Leonhard Kestel

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Marcus Novotny

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Jan Simson

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Publications @MCML

2025

[21]

U. Fischer Abaigar, C. Kern and J. Perdomo.
The Value of Prediction in Identifying the Worst-Off.
ICML 2025 - 42nd International Conference on Machine Learning. Vancouver, Canada, Jul 13-19, 2025. Spotlight Presentation. To be published. Preprint available. arXiv

Abstract

Machine learning is increasingly used in government programs to identify and support the most vulnerable individuals, prioritizing assistance for those at greatest risk over optimizing aggregate outcomes. This paper examines the welfare impacts of prediction in equity-driven contexts, and how they compare to other policy levers, such as expanding bureaucratic capacity. Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems.

MCML Authors

Unai Fischer Abaigar

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[20]

U. Fischer Abaigar, C. Kern and F. Kreuter.
Adjusting survey estimates with multi-accuracy post-processing.
ITACOSM 2025 - Italian Conference on Survey Methodology. Bologna, Italy, Jul 01-04, 2025. Invited talk. publish_preprint.

Abstract

With the rise of non-probability samples and new data sources, survey researchers face growing challenges related to selection bias. One emerging line of work adapts algorithmic tools from machine learning to improve robustness in such settings. This talk introduces multi-accuracy boosting (Kim et al., 2019), a post-processing method that reduces subgroup-level prediction error. Originally developed in the context of fairness, it has since been explored for use in survey adjustment tasks (Kim & Kern et al., 2022). I offer an accessible overview of the method and share reflections on its potential, and open questions for future research.

MCML Authors

Unai Fischer Abaigar

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Frauke Kreuter

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI

[19]

C. Kern, U. Fischer-Abaigar, J. Schweisthal, D. Frauen, R. Ghani, S. Feuerriegel, M. van der Schaar and F. Kreuter.
Algorithms for reliable decision-making need causal reasoning.
Nature Computational Science 5 (May. 2025). DOI

Abstract

Decision-making inherently involves cause–effect relationships that introduce causal challenges. We argue that reliable algorithms for decision-making need to build upon causal reasoning. Addressing these causal challenges requires explicit assumptions about the underlying causal structure to ensure identifiability and estimatability, which means that the computational methods must successfully align with decision-making objectives in real-world tasks. Algorithmic decision-making (ADM) has become common in a wide range of domains, including precision medicine, manufacturing, education, hiring, the public sector, and smart cities. At the core of ADM systems are data-driven models that learn from data to recommend decisions, often with the goal of maximizing a defined utility function1. For example, in smart city contexts, ADM is frequently used to optimize traffic flow through predictive models that analyze real-time data, thereby reducing congestion and improving urban mobility. Another prominent application area for ADM are normative decision support systems (often subsumed under ‘prescriptive analytics’) or, more recently, artificial intelligence (AI) agents that either inform or automatically execute managerial and operational decisions in industry. Yet, the applications of ADM to high-stakes decisions face safety and reliability issues1,2,3. Often, the objectives of ADM systems fail to align with the nuanced goals of real-world decision-making, thus creating a tension between the potential of ADM and the risk of harm and failure. Especially when deployed in dynamic, real-world environments, ADM can amplify systemic disadvantages for vulnerable communities and lead to flawed decisions. In this Comment, we argue that reliable algorithmic decision-making — systems that perform safely and robustly under deployment conditions — must be grounded in causal reasoning.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Jonas Schweisthal

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Artificial Intelligence in Management

Dennis Frauen

C4 | Computational Social Sciences
→ Group Stefan Feuerriegel

Artificial Intelligence in Management

Stefan Feuerriegel

Prof. Dr.

C4 | Computational Social Sciences

Artificial Intelligence in Management

Frauke Kreuter

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI

[18]

R. L. Bach and C. Kern.
Fairness, Justice, and Social Inequality in Machine Learning.
Preprint (May. 2025). DOI

Abstract

As machine learning (ML) systems increasingly shape decision-making across crucial societal domains, the discourse around fairness in algorithmic systems (fairML) has intensified. Although fairML research is rapidly expanding, contributions from social science, particularly sociology, remain limited. This chapter aims to address this gap by examining fairness in ML through a sociological lens, focusing on the interplay between algorithmic decision-making and social inequality. We argue that fairML frameworks must explicitly distinguish technical fairness—focused on unbiased predictions—from normative justice, which addresses broader ethical and distributive considerations. We identify and discuss five key challenges confronting fairML today: (1) clearly separating fairness and justice, (2) developing more sophisticated measures of vulnerability and protected attributes, (3) incorporating historical disadvantage and social origin into fairness evaluations, (4) assessing unintended social consequences of algorithmic interventions, and (5) empirically investigating stakeholder preferences toward AI systems. By highlighting these sociologically informed challenges, this chapter advocates for a more holistic, context-sensitive approach to algorithmic fairness. Ultimately, our analysis proposes a sociologically grounded research agenda aimed at critically assessing and enhancing the role of fairML in either perpetuating or alleviating social inequalities.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[17]

J. Simson, F. Draxler, S. Mehr and C. Kern.
Preventing Harmful Data Practices by Using Participatory Input to Navigate the Machine Learning Multiverse.
CHI 2025 - Conference on Human Factors in Computing Systems. Yokohama, Japan, Apr 26-May 01, 2025. DOI

Abstract

In light of inherent trade-offs regarding fairness, privacy, interpretability and performance, as well as normative questions, the machine learning (ML) pipeline needs to be made accessible for public input, critical reflection and engagement of diverse stakeholders. In this work, we introduce a participatory approach to gather
input from the general public on the design of an ML pipeline. We show how people’s input can be used to navigate and constrain the multiverse of decisions during both model development and evaluation. We highlight that central design decisions should be democratized rather than “optimized” to acknowledge their critical impact on the system’s output downstream. We describe the iterative development of our approach and its exemplary implementation on a citizen science platform. Our results demonstrate how public participation can inform critical design decisions along the model-building pipeline and combat widespread lazy data practices.

MCML Authors

Jan Simson

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[16]

C. Kern, M. P. Kim and A. Zhou.
Multi-Accurate CATE is Robust to Unknown Covariate Shifts.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. URL

Abstract

Estimating heterogeneous treatment effects is important to tailor treatments to those individuals who would most likely benefit. However, conditional average treatment effect predictors may often be trained on one population but possibly deployed on different, possibly unknown populations. We use methodology for learning multi-accurate predictors to post-process CATE T-learners (differenced regressions) to become robust to unknown covariate shifts at the time of deployment. The method works in general for pseudo-outcome regression, such as the DR-learner. We show how this approach can combine (large) confounded observational and (smaller) randomized datasets by learning a confounded predictor from the observational dataset, and auditing for multi-accuracy on the randomized controlled trial. We show improvements in bias and mean squared error in simulations with increasingly larger covariate shift, and on a semi-synthetic case study of a parallel large observational study and smaller randomized controlled experiment. Overall, we establish a connection between methods developed for multi-distribution learning and achieve appealing desiderata (e.g. external validity) in causal inference and machine learning.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[15]

E. Achterhold, M. Mühlböck, N. Steiber and C. Kern.
Fairness in Algorithmic Profiling: The AMAS Case.
Minds and Machines 35.9 (Jan. 2025). DOI

Abstract

We study a controversial application of algorithmic profiling in the public sector, the Austrian AMAS system. AMAS was supposed to help caseworkers at the Public Employment Service (PES) Austria to allocate support measures to job seekers based on their predicted chance of (re-)integration into the labor market. Shortly after its release, AMAS was criticized for its apparent unequal treatment of job seekers based on gender and citizenship. We systematically investigate the AMAS model using a novel real-world dataset of young job seekers from Vienna, which allows us to provide the first empirical evaluation of the AMAS model with a focus on fairness measures. We further apply bias mitigation strategies to study their effectiveness in our real-world setting. Our findings indicate that the prediction performance of the AMAS model is insufficient for use in practice, as more than 30% of job seekers would be misclassified in our use case. Further, our results confirm that the original model is biased with respect to gender as it tends to (incorrectly) assign women to the group with high chances of re-employment, which is not prioritized in the PES’ allocation of support measures. However, most bias mitigation strategies were able to improve fairness without compromising performance and thus may form an important building block in revising profiling schemes in the present context.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[14]

S. Eckman, B. Ma, C. Kern, R. Chew, B. Plank and F. Kreuter.
Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR).
Preprint (Jan. 2025). arXiv

Abstract

Models trained on crowdsourced labels may not reflect broader population views when annotator pools are not representative. Since collecting representative labels is challenging, we propose Population-Aligned Instance Replication (PAIR), a method to address this bias through statistical adjustment. Using a simulation study of hate speech and offensive language detection, we create two types of annotators with different labeling tendencies and generate datasets with varying proportions of the types. Models trained on unbalanced annotator pools show poor calibration compared to those trained on representative data. However, PAIR, which duplicates labels from underrepresented annotator groups to match population proportions, significantly reduces bias without requiring new data collection. These results suggest statistical techniques from survey research can help align model training with target populations even when representative annotator pools are unavailable. We conclude with three practical recommendations for improving training data quality.

MCML Authors

Bolei Ma

C4 | Computational Social Sciences
→ Group Frauke Kreuter

Social Data Science and AI

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Barbara Plank

Prof. Dr.

B2 | Natural Language Processing

AI and Computational Linguistics

Frauke Kreuter

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI

2024

[13]

U. Fischer Abaigar, C. Kern, N. Barda and F. Kreuter.
Bridging the gap: Towards an expanded toolkit for AI-driven decision-making in the public sector.
Government Information Quarterly 41.4 (Dec. 2024). DOI

Abstract

AI-driven decision-making systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, these systems face the challenge of aligning machine learning (ML) models with the complex realities of public sector decision-making. In this paper, we examine five key challenges where misalignment can occur, including distribution shifts, label bias, the influence of past decision-making on the data side, as well as competing objectives and human-in-the-loop on the model output side. Our findings suggest that standard ML methods often rely on assumptions that do not fully account for these complexities, potentially leading to unreliable and harmful predictions. To address this, we propose a shift in modeling efforts from focusing solely on predictive accuracy to improving decision-making outcomes. We offer guidance for selecting appropriate modeling frameworks, including counterfactual prediction and policy learning, by considering how the model estimand connects to the decision-maker’s utility. Additionally, we outline technical methods that address specific challenges within each modeling approach. Finally, we argue for the importance of external input from domain experts and stakeholders to ensure that model assumptions and design choices align with real-world policy objectives, taking a step towards harmonizing AI and public sector objectives.

MCML Authors

Unai Fischer Abaigar

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Frauke Kreuter

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI

[12]

C. Kern, R. Bach, H. Mautner and F. Kreuter.
When Small Decisions Have Big Impact: Fairness Implications of Algorithmic Profiling Schemes.
ACM Journal on Responsible Computing (Nov. 2024). DOI

Abstract

Algorithmic profiling is increasingly used in the public sector with the hope of allocating limited public resources more effectively and objectively. One example is the prediction-based profiling of job seekers to guide the allocation of support measures by public employment services. However, empirical evaluations of potential side-effects such as unintended discrimination and fairness concerns are rare in this context. We systematically compare and evaluate statistical models for predicting job seekers’ risk of becoming long-term unemployed concerning subgroup prediction performance, fairness metrics, and vulnerabilities to data analysis decisions. Focusing on Germany as a use case, we evaluate profiling models under realistic conditions using large-scale administrative data. We show that despite achieving high prediction performance on average, profiling models can be considerably less accurate for vulnerable social subgroups. In this setting, different classification policies can have very different fairness implications. We therefore call for rigorous auditing processes before such models are put to practice.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Frauke Kreuter

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI

[11]

P. O. Schenk and C. Kern.
Connecting algorithmic fairness to quality dimensions in machine learning in official statistics and survey production.
AStA Wirtschafts- und Sozialstatistisches Archiv 18 (Oct. 2024). DOI

Abstract

National Statistical Organizations (NSOs) increasingly draw on Machine Learning (ML) to improve the timeliness and cost-effectiveness of their products. When introducing ML solutions, NSOs must ensure that high standards with respect to robustness, reproducibility, and accuracy are upheld as codified, e.g., in the Quality Framework for Statistical Algorithms (QF4SA; Yung et al. 2022, Statistical Journal of the IAOS). At the same time, a growing body of research focuses on fairness as a pre-condition of a safe deployment of ML to prevent disparate social impacts in practice. However, fairness has not yet been explicitly discussed as a quality aspect in the context of the application of ML at NSOs. We employ the QF4SA quality framework and present a mapping of its quality dimensions to algorithmic fairness. We thereby extend the QF4SA framework in several ways: First, we investigate the interaction of fairness with each of these quality dimensions. Second, we argue for fairness as its own, additional quality dimension, beyond what is contained in the QF4SA so far. Third, we emphasize and explicitly address data, both on its own and its interaction with applied methodology. In parallel with empirical illustrations, we show how our mapping can contribute to methodology in the domains of official statistics, algorithmic fairness, and trustworthy machine learning.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[10]

C. Kern, M. P. Kim and A. Zhou.
Multi-Accurate CATE is Robust to Unknown Covariate Shifts.
Transactions on Machine Learning Research (Oct. 2024). URL

Abstract

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[9]

U. Fischer Abaigar, C. Kern and F. Kreuter.
The Missing Link: Allocation Performance in Causal Machine Learning.
ICML 2024 - Workshop Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact at the 41st International Conference on Machine Learning. Vienna, Austria, Jul 21-27, 2024. arXiv URL

Abstract

Automated decision-making (ADM) systems are being deployed across a diverse range of critical problem areas such as social welfare and healthcare. Recent work highlights the importance of causal ML models in ADM systems, but implementing them in complex social environments poses significant challenges. Research on how these challenges impact the performance in specific downstream decision-making tasks is limited. Addressing this gap, we make use of a comprehensive real-world dataset of jobseekers to illustrate how the performance of a single CATE model can vary significantly across different decision-making scenarios and highlight the differential influence of challenges such as distribution shifts on predictions and allocations.

MCML Authors

Unai Fischer Abaigar

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Frauke Kreuter

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI

[8]

E. Kraus and C. Kern.
Measurement Modeling of Predictors and Outcomes in Algorithmic Fairness.
EWAF 2024 - 3rd European Workshop on Algorithmic Fairness. Mainz, Germany, Jul 01-03, 2024. PDF

Abstract

This contribution investigates structural equation modeling (SEM) as a pre-processing approach to mitigate measurement bias in algorithmic decision-making systems. We construct latent predictors and latent targets based on different measurement modeling strategies and evaluate their interplay in simulations and an application study. We systematically compare SEMs which preserve group-differences (group-overarching) to models which equalize group-differences (group-specific) in predictors and outcomes. In our simulations, we find that group-overarching models are a more effective strategy than group-specific models and lead to smaller subgroup prediction error and better calibrated risk scores. In the application study we apply SEM to a health risk prediction task and find support for the benefit of group-overarching models. We conclude that tackling fairness concerns by utilizing measurement models of both the predictors and the outcome can contribute to the fairness of ADM systems. Utilizing SEM during preprocessing allows to incorporate substantive knowledge about the prediction task into the model implementation.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[7]

J. Simson, A. Fabris and C. Kern.
Unveiling the Blindspots: Examining Availability and Usage of Protected Attributes in Fairness Datasets.
EWAF 2024 - 3rd European Workshop on Algorithmic Fairness. Mainz, Germany, Jul 01-03, 2024. PDF

Abstract

This work examines the representation of protected attributes across tabular datasets used in algorithmic fairness research. Drawing from international human rights and anti-discrimination laws, we compile a set of protected attributes and investigate both their availability and usage in the literature. Our analysis reveals a significant underrepresentation of certain attributes in datasets that is exacerbated by a strong focus on race and sex in dataset usage. We identify a geographical bias towards the Global North, particularly North America, potentially limiting the applicability of fairness detection and mitigation strategies in less-represented regions. The study exposes critical blindspots in fairness research, highlighting the need for a more inclusive and representative approach to data collection and usage in the field. We propose a shift away from a narrow focus on a small number of datasets and advocate for initiatives aimed at sourcing more diverse and representative data.

MCML Authors

Jan Simson

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[6]

C. Strasser Ceballos and C. Kern.
Deciding the Future of Refugees: Rolling the Dice or Algorithmic Location Assignment?
EWAF 2024 - 3rd European Workshop on Algorithmic Fairness. Mainz, Germany, Jul 01-03, 2024. PDF

Abstract

Upon arrival in Germany, refugees are distributed among the 16 federal states. This distribution decision is based on a fixed formula consisting of two components: tax revenue and the population size of the federal state. Research suggests that optimal refugee-location matching enhances refugee integration into the labor market. However, the current mechanism fails to align refugees’ characteristics with their assigned locations, resulting in a missed opportunity to leverage synergies. To this end, we use comprehensive refugee data in Germany and exploit an existing machine learning matching tool to assign refugees to states algorithmically. Our findings reveal potential improvements in refugee employment, depending on the modeling setup. Our study provides two key contributions. First, we evaluate the effectiveness of an algorithmic matching tool within Germany. Second, we investigate the fairness implications of such an algorithmic decision-making tool by evaluating the impact of different train data setups on group-specific model performance.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[5]

S. Jaime and C. Kern.
Ethnic Classifications in Algorithmic Fairness: Concepts, Measures and Implications in Practice.
ACM FAccT 2024 - 7th ACM Conference on Fairness, Accountability, and Transparency. Rio de Janeiro, Brazil, Jun 03-06, 2024. DOI

Abstract

We address the challenges and implications of ensuring fairness in algorithmic decision-making (ADM) practices related to ethnicity. Expanding beyond the U.S.-centric approach to race, we provide an overview of ethnic classification schemes in European countries and emphasize how the distinct approaches to ethnicity in Europe can impact fairness assessments in ADM. Drawing on large-scale German survey data, we highlight differences in ethnic disadvantage across subpopulations defined by different measures of ethnicity. We build prediction models in the labor market, health, and finance domain and investigate the fairness implications of different ethnic classification schemes across multiple prediction tasks and fairness metrics. Our results show considerable variation in fairness scores across ethnic classifications, where error disparities for the same model can be twice as large when using different operationalizations of ethnicity. We argue that ethnic classifications differ in their ability to identify ethnic disadvantage across ADM domains and advocate for context-sensitive operationalizations of ethnicity and its transparent reporting in fair machine learning (ML) applications.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[4]

J. Simson, A. Fabris and C. Kern.
Lazy Data Practices Harm Fairness Research.
ACM FAccT 2024 - 7th ACM Conference on Fairness, Accountability, and Transparency. Rio de Janeiro, Brazil, Jun 03-06, 2024. DOI

Abstract

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations, (2) the widespread exclusion of minorities during data preprocessing, and (3) a lack of transparency about consequential yet overlooked dataset processing choices. We further note additional factors, such as limitations in publicly available data, privacy considerations and a general lack of awareness that further contribute to these issues. Through exemplary analyses on the usage of popular datasets, we demonstrate how opaque data choices significantly impact minorities, fairness metrics, and the resulting model comparison. To address these challenges, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.

MCML Authors

Jan Simson

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[3]

J. Simson, F. Pfisterer and C. Kern.
One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions.
ACM FAccT 2024 - 7th ACM Conference on Fairness, Accountability, and Transparency. Rio de Janeiro, Brazil, Jun 03-06, 2024. DOI

Abstract

A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a systems’ design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are made implicitly, without knowing exactly how they will influence the final system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit decisions during design and evaluation into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible “universes” of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can investigate the variability and robustness of fairness scores and see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand fairness implications of design and evaluation decisions using an exemplary case study of predicting public health care coverage for vulnerable populations. Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model. This is problematic, as a nefarious actor could optimise or “hack” a fairness metric to portray a discriminating model as fair merely by changing how it is evaluated. We illustrate how a multiverse analysis can help to address this issue.

MCML Authors

Jan Simson

C4 | Computational Social Sciences
→ Group Christoph Kern

Social Data Science and AI Lab

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[2]

Á. F. Junquera and C. Kern.
From rules to forests: rule-based versus statistical models for jobseeker profiling.
Preprint (Jun. 2024). DOI

Abstract

Public employment services (PES) commonly apply profiling models to target labour market programs to jobseekers at risk of becoming long-term unemployed. Such allocation systems often codify institutional experiences in a set of profiling rules, whose predictive ability, however, is seldomly tested. We systematically evaluate the predictive performance of a rule-based profiling procedure currently implemented by the PES of Catalonia, Spain, in comparison to the performance of statistical models in predicting future long-term unemployment (LTU) episodes. Using comprehensive administrative data, we develop logit and machine learning models and evaluate their performance with respect to both discrimination and calibration. Compared to the current rule-based procedure of Catalonia, our machine learning models achieve greater discrimination ability and remarkable improvements in calibration. Particularly, our random forest model is able to accurately forecast LTU episodes and outperforms the rule-based model by offering robust predictions that perform well under stress tests. This paper presents the first performance comparison between a complex, currently implemented, rule-based approach and complex statistical profiling models. Our work illustrates the importance of assessing the calibration of profiling models and the potential of statistical tools to assist public employment offices in Spain.

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

[1]

C. Kern, M. Kim and A. Zhou.
Multi-CATE: Multi-Accurate Conditional Average Treatment Effect Estimation Robust to Unknown Covariate Shifts.
Preprint (May. 2024). arXiv

Abstract

MCML Authors

Christoph Kern

Prof. Dr.

C4 | Computational Social Sciences

Social Data Science and AI Lab

Research Group Christoph Kern

Christoph Kern

Team members @MCML

PhD Students

Recent News @MCML

Why Causal Reasoning Is Crucial for Reliable AI Decisions

MCML Researchers With Seven Papers at CHI 2025

MCML Researchers With 51 Papers at ICLR 2025

Publications @MCML

2025

2024