
MultiClimate: Multimodal Stance Detection on Climate Change Videos


Abstract

Climate change (CC) has attracted increasing attention in NLP in recent years. However, detecting the stance on CC in multimodal data is understudied and remains challenging due to a lack of reliable datasets. To improve the understanding of public opinions and communication strategies, this paper presents MultiClimate, the first open-source, manually annotated stance detection dataset with 100 CC-related YouTube videos and 4,209 frame-transcript pairs. We deploy state-of-the-art vision and language models, as well as multimodal models, for MultiClimate stance detection. Results show that text-only BERT significantly outperforms image-only ResNet50 and ViT. Combining both modalities achieves state-of-the-art results of 0.747/0.749 in accuracy/F1. Our 100M-sized fusion models also beat CLIP and BLIP, as well as the much larger 9B-sized multimodal IDEFICS and text-only Llama3 and Gemma2, indicating that multimodal stance detection remains challenging for large language models.
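The fusion approach described in the abstract could be sketched as follows. This is a minimal illustration only: the exact fusion architecture, hidden sizes, and label set are assumptions not stated here, and random tensors stand in for the BERT and ViT embeddings that the real pipeline would compute per frame-transcript pair.

```python
import torch
import torch.nn as nn

class LateFusionStanceClassifier(nn.Module):
    """Concatenate text and image embeddings, then classify into
    stance labels (assumed here to be 3, e.g. oppose/neutral/support)."""

    def __init__(self, text_dim=768, image_dim=768, hidden=256, num_labels=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, text_emb, image_emb):
        # Late fusion: join the two modality embeddings along the feature axis
        fused = torch.cat([text_emb, image_emb], dim=-1)
        return self.head(fused)

# Stand-in embeddings for a batch of 4 frame-transcript pairs
text_emb = torch.randn(4, 768)   # placeholder for BERT [CLS] embeddings
image_emb = torch.randn(4, 768)  # placeholder for ViT pooled embeddings

model = LateFusionStanceClassifier()
logits = model(text_emb, image_emb)
print(logits.shape)  # one score per stance label for each pair
```

A concatenation-plus-MLP head like this is a common baseline for two-modality fusion; the paper's actual models may differ in how the modalities are combined.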

inproceedings


NLP4PI @EMNLP 2024

3rd Workshop on NLP for Positive Impact at the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024.

Authors

J. Wang • L. Zuo • S. Peng • B. Plank

Links

DOI GitHub

Research Area

 B2 | Natural Language Processing

BibTeX Key: WZP+24
