This dissertation examines annotation quality, bias, and automation in machine learning data collection. Through five empirical studies, it shows how annotation outcomes are shaped by task design, annotator characteristics, and cognitive biases, and how these factors affect training data quality. The work further evaluates large language models in automated and human-in-the-loop annotation pipelines, demonstrating substantial cost savings and efficiency gains while highlighting new risks introduced by automation.
BibTeX key: Bec25