
TOPCAT: Topic-Oriented Protocol for Content Analysis of Text – A Preliminary Study


Abstract

Identifying constructs in text data is a labor-intensive task in social science research. Despite the potential richness of open-ended survey responses, the complexity of analyzing them often leads researchers to underutilize or ignore them entirely. While topic modeling offers a technological solution, qualitative researchers may remain skeptical of its rigor. In this paper, we introduce TOPCAT: Topic-Oriented Protocol for Content Analysis of Text, a systematic approach that integrates off-the-shelf topic modeling with human decision-making and curation. Our method aims to provide a viable solution for topicalizing open-ended responses in survey research, ensuring both efficiency and trustworthiness. We present the TOPCAT protocol, define an evaluation process, and demonstrate its effectiveness using open-ended responses from a U.S. survey on COVID-19 impact. Our findings suggest that TOPCAT enables efficient and rigorous qualitative analysis, offering a promising avenue for future research in this domain. Furthermore, our findings challenge the adequacy of expert coding schemes as "gold" standards, emphasizing the subjectivity inherent in qualitative content interpretation.

inproceedings


NLP+CSS @NAACL 2024

6th Workshop on Natural Language Processing and Computational Social Science at the Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mexico City, Mexico, Jun 16-21, 2024.

Authors

P. Resnik • B. Ma • A. Hoyle • P. Goel • R. Sarkar • M. Gearing • A.-C. Haensch • F. Kreuter


Research Area

 C4 | Computational Social Sciences

BibTeXKey: RMH+24
