Home | Research | Groups | Frauke Kreuter

Research Group Frauke Kreuter

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Frauke Kreuter

holds the Chair of Statistics and Data Science at LMU Munich.

Her reserach is interested in all matters of social data science, ranging from fairness in automated decision-making to the use of new data sources in the social sciences, multiple imputation methods, survey methodology, social NLP, and statistical training.

Team members @MCML

Link to Sarah Ball

Sarah Ball

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Jacob Beck

Jacob Beck

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Unai Fischer Abaigar

Unai Fischer Abaigar

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Olga Kononykhina

Olga Kononykhina

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Ailin Liu

Ailin Liu

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Patrick Schenk

Patrick Schenk

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Malte Schierholz

Malte Schierholz

Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Jan Simson

Jan Simson

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Anna Steinberg

Anna Steinberg

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Leah von der Heyde

Leah von der Heyde

Social Data Science and AI Lab

C4 | Computational Social Sciences

Publications @MCML

[12]
B. Ma, X. Wang, T. Hu, A.-C. Haensch, M. A. Hedderich, B. Plank and F. Kreuter.
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models.
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami, FL, USA, Nov 12-16, 2024. To be published. Preprint at arXiv. arXiv.
Abstract

Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may capture and convey. These cognitive-behavioral traits include typically Attitudes, Opinions, Values (AOVs). However, measuring AOVs embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of clarity on how different studies are related to each other and how they can be interpreted. This paper aims to bridge this gap by providing a comprehensive overview of recent works on the evaluation of AOVs in LLMs. Moreover, we survey related approaches in different stages of the evaluation pipeline in these works. By doing so, we address the potential and challenges with respect to understanding the model, human-AI alignment, and downstream application in social sciences. Finally, we provide practical insights into evaluation methods, model enhancement, and interdisciplinary collaboration, thereby contributing to the evolving landscape of evaluating AOVs in LLMs.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Michael Hedderich

Michael Hedderich

Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[11]
X. Wang, B. Ma, C. Hu, L. Weber-Genzel, P. Röttger, F. Kreuter, D. Hovy and B. Plank.
My Answer is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models.
Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
MCML Authors
Link to Xinpeng Wang

Xinpeng Wang

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Leon Weber-Genzel

Leon Weber-Genzel

Dr.

* Former member

B2 | Natural Language Processing

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing


[10]
A. Dimmelmeier, H. Doll, M. Schierholz, E. Kormanyos, M. Fehr, B. Ma, J. Beck, A. Fraser and F. Kreuter.
Informing climate risk analysis using textual information - A research agenda.
Workshop Natural Language Processing meets Climate Change (ClimateNLP 2024) at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). Bangkok, Thailand, Aug 11-16, 2024. URL.
MCML Authors
Link to Malte Schierholz

Malte Schierholz

Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Jacob Beck

Jacob Beck

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Alexander Fraser

Alexander Fraser

Prof. Dr.

Data Analytics & Statistics

B2 | Natural Language Processing

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[9]
S. Eckman, B. Plank and F. Kreuter.
Position: Insights from Survey Methodology can Improve Training Data.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
MCML Authors
Link to Barbara Plank

Barbara Plank

Prof. Dr.

Artificial Intelligence and Computational Linguistics

B2 | Natural Language Processing

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[8]
U. Fischer Abaigar, C. Kern and F. Kreuter.
The Missing Link: Allocation Performance in Causal Machine Learning.
Workshop Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. arXiv. URL.
MCML Authors
Link to Unai Fischer Abaigar

Unai Fischer Abaigar

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[7]
P. Resnik, B. Ma, A. Hoyle, P. Goel, R. Sarkar, M. Gearing, A.-C. Haensch and F. Kreuter.
TOPCAT: Topic-Oriented Protocol for Content Analysis of Text – A Preliminary Study.
6th Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024) at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico City, Mexico, Jun 16-21, 2024. To be published.
Abstract

Identifying constructs in text data is a labor-intensive task in social science research. Despite the potential richness of open-ended survey responses, the complexity of analyzing them often leads researchers to underutilize or ignore them entirely. While topic modeling offers a technological solution, qualitative researchers may remain skeptical of its rigor. In this paper, we introduce TOPCAT: Topic-Oriented Protocol for Content Analysis of Text, a systematic approach that integrates off-the-shelf topic modeling with human decisionmaking and curation. Our method aims to provide a viable solution for topicalizing open-ended responses in survey research, ensuring both efficiency and trustworthiness. We present the TOPCAT protocol, define an evaluation process, and demonstrate its effectiveness using open-ended responses from a U.S. survey on COVID-19 impact. Our findings suggest that TOPCAT enables efficient and rigorous qualitative analysis, offering a promising avenue for future research in this domain. Furthermore, our findings challenge the adequacy of expert coding schemes as ''gold'' standards, emphasizing the subjectivity inherent in qualitative content interpretation.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[6]
J. Beck, S. Eckman, B. Ma, R. Chew and F. Kreuter.
Order Effects in Annotation Tasks: Further Evidence of Annotation Sensitivity.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

The data-centric revolution in AI has revealed the importance of high-quality training data for developing successful AI models. However, annotations are sensitive to annotator characteristics, training materials, and to the design and wording of the data collection instrument. This paper explores the impact of observation order on annotations. We find that annotators’ judgments change based on the order in which they see observations. We use ideas from social psychology to motivate hypotheses about why this order effect occurs. We believe that insights from social science can help AI researchers improve data and model quality.

MCML Authors
Link to Jacob Beck

Jacob Beck

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[5]
B. Ma, E. Nie, S. Yuan, H. Schmid, M. Färber, F. Kreuter and H. Schütze.
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks.
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). St. Julians, Malta, Mar 17-22, 2024. URL.
Abstract

Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based method for token-level sequence labeling tasks. The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer, especially for languages that are typologically different from the source language English. Our method also attains state-of-the-art performance when employed with the mT5 model. Besides, our exploratory study in multilingual large language models shows that ToPro performs much better than the current in-context learning method. Overall, the performance improvements show that ToPro could potentially serve as a novel and simple benchmarking method for sequence labeling tasks.

MCML Authors
Link to Bolei Ma

Bolei Ma

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Ercong Nie

Ercong Nie

Statistical NLP and Deep Learning

B2 | Natural Language Processing

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Hinrich Schütze

Hinrich Schütze

Prof. Dr.

Statistical NLP and Deep Learning

B2 | Natural Language Processing


[4]
U. Fischer Abaigar, C. Kern, N. Barda and F. Kreuter.
Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Link to Unai Fischer Abaigar

Unai Fischer Abaigar

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Christoph Kern

Christoph Kern

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences

Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[3]
M. Trappmann, G.-C. Haas, S. Malich, F. Keusch, S. Bähr, F. Kreuter and S. Schwarz.
Augmenting survey data with digital trace data: Is there a threat to panel retention?.
Journal of Survey Statistics and Methodology 11.3 (Jun. 2023). DOI.
MCML Authors
Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[2]
K. E. Riehm, E. Badillo Goicoechea, F. M. Wang, E. Kim, L. R. Aldridge, C. P. Lupton-Smith, R. Presskreischer, T.-H. Chang, S. LaRocca, F. Kreuter and E. A. Stuart.
Association of Non-Pharmaceutical Interventions to Reduce the Spread of SARS-CoV-2 With Anxiety and Depressive Symptoms: A Multi-National Study of 43 Countries.
International Journal of Public Health 67 (Mar. 2022). DOI.
MCML Authors
Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences


[1]
R. Valliant, J. A. Dever, F. Kreuter and M. R. Valliant.
Package ‘PracTools’.
2022. URL.
MCML Authors
Link to Frauke Kreuter

Frauke Kreuter

Prof. Dr.

Social Data Science and AI Lab

C4 | Computational Social Sciences