Multi-Instance Learning for Social Media-Based Spatiotemporal Public Opinion Analysis

Abstract

This work-in-progress explores applying the framework of multi-instance learning (MIL) to spatiotemporally grounded public opinion analysis using social media data. While traditional surveys offer depth and precision, social media provides a scalable, cost-effective complement for real-time tracking of public sentiment. However, weak supervision in data collection often results in a large volume of ambiguous or uninformative posts, complicating both prediction accuracy and interpretability. We address these challenges by framing public opinion analysis as a MIL task, where social media posts (instances) are grouped into bags based on shared spatial (e.g., city, region) or temporal (e.g., daily, weekly intervals) attributes. This formulation supports learning both at the bag level (e.g., tracking how opinion shifts over time or across locations) and at the instance level (e.g., identifying specific posts that drive a shift or reflect conflicting viewpoints). In recently completed but unpublished work, we treated geo-tagged tweets from specific buildings as instances and used non-deep MIL models to infer building functionality. That study demonstrated MIL’s ability to handle noisy data and model rare or underrepresented classes. Building on this, we are developing a more robust MIL framework aimed at public opinion modeling. Drawing on established use cases of MIL in computer vision (e.g., tumor region identification) and NLP (e.g., document-level sentiment and relation extraction), we define bags by shared spatiotemporal and demographic features and pursue two core objectives:

Implicit Noise Handling: MIL enables the model to learn directly from weakly labeled data by distinguishing informative from uninformative instances without explicit filtering.

Interpretability via Instance Scoring: By modeling both the bag and its constituent instances, the framework reveals which posts contribute to opinion dynamics or internal disagreement in a region or time window.

While our current work focuses on developing the MIL framework and evaluating its suitability for spatiotemporal opinion modeling and interpretability, we acknowledge that selecting a specific public opinion task, dataset, and labeling strategy is essential for empirical validation. To that end, we are currently surveying existing social media datasets with geo-temporal metadata (e.g., Twitter, Reddit) and exploring options for weak labeling. Our aim is to apply this framework to a real-world public opinion case study, enhancing the accountability, transparency, and actionability of models trained on noisy, weakly supervised social media data.
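
The bag formulation in the abstract can be illustrated with a small sketch. It is not the authors' model: the `Post` fields, the (city, ISO week) bag key, and the hand-set `relevance` and `stance` values are all assumptions made for illustration. In a trained MIL model, the relevance scores would be learned rather than supplied. The softmax pooling shown here is one common attention-style way to obtain a bag-level estimate together with per-instance weights.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    """One social media post (a MIL instance). All fields are hypothetical."""
    text: str
    city: str         # assumed spatial attribute
    day: date         # assumed temporal attribute
    relevance: float  # stand-in for a learned instance score (logit)
    stance: float     # stand-in for an instance-level opinion in [-1, 1]


def bag_key(post):
    """Bag identifier: (city, (ISO year, ISO week)). An assumed grouping."""
    iso = post.day.isocalendar()
    return (post.city, (iso[0], iso[1]))


def build_bags(posts):
    """Group instances into bags that share a city and a calendar week."""
    bags = defaultdict(list)
    for post in posts:
        bags[bag_key(post)].append(post)
    return dict(bags)


def instance_weights(bag):
    """Softmax over relevance scores; the weights in a bag sum to 1."""
    peak = max(p.relevance for p in bag)  # subtract the max for stability
    exps = [math.exp(p.relevance - peak) for p in bag]
    total = sum(exps)
    return [e / total for e in exps]


def bag_opinion(bag):
    """Bag-level estimate: relevance-weighted average of instance stances."""
    weights = instance_weights(bag)
    return sum(w * p.stance for w, p in zip(weights, bag))


if __name__ == "__main__":
    posts = [
        Post("clear statement of support", "Munich", date(2025, 1, 6), 2.0, 1.0),
        Post("off-topic remark", "Munich", date(2025, 1, 7), 0.0, -1.0),
        Post("mild approval", "Berlin", date(2025, 1, 6), 0.0, 0.5),
    ]
    for key, bag in build_bags(posts).items():
        ranked = sorted(zip(instance_weights(bag), bag), key=lambda wp: -wp[0])
        print(key, round(bag_opinion(bag), 3), [p.text for _, p in ranked])
```

Low-relevance posts receive small weights and so contribute little to the bag estimate, which mirrors the implicit noise handling objective. Ranking posts by weight corresponds to the instance-scoring view of interpretability.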

inproceedings BKZ25a


NLPOR @COLM 2025

1st Workshop on Bridging NLP and Public Opinion Research at the Conference on Language Modeling. Montreal, Canada, Oct 07-09, 2025.

Authors

S. Bai • A. Kruspe • X. Zhu

Research Area

 C3 | Physics and Geo Sciences

BibTeXKey: BKZ25a
