
ChatEarthBench: Benchmarking Multimodal Large Language Models for Earth Observation


Abstract

Recent advances in multimodal large language models (MLLMs) open new opportunities for Earth observation (EO) tasks by enhancing reasoning and analysis capabilities. However, fair and systematic evaluation of these models remains challenging: existing assessments often suffer from dataset biases, which can inflate reported performance and make comparisons across MLLMs inconsistent. To address this issue, we introduce ChatEarthBench, a comprehensive benchmark specifically designed for zero-shot evaluation of MLLMs in EO. ChatEarthBench comprises 10 image-text datasets spanning three data modalities. Importantly, none of these datasets were seen by the MLLMs evaluated in our work, enabling rigorous and fair zero-shot evaluation across diverse real-world EO tasks. By systematically analyzing MLLM performance across these tasks, we provide critical insights into the capabilities and limitations of current models. Our findings offer essential guidance for the development of more robust and generalizable MLLMs for EO applications.
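
To make the zero-shot protocol concrete, below is a minimal sketch of the kind of evaluation loop such a benchmark enables. The Sample structure, the query_mllm stub, and exact-match scoring are illustrative assumptions for this sketch, not ChatEarthBench's actual datasets, API, or metrics.

```python
# Minimal sketch of zero-shot MLLM evaluation on EO image-text data.
# All names below (Sample, query_mllm, exact-match scoring) are
# hypothetical placeholders, not ChatEarthBench's actual interface.
from dataclasses import dataclass


@dataclass
class Sample:
    image_path: str  # EO image (e.g., optical, SAR, or multispectral)
    question: str    # natural-language prompt posed to the MLLM
    answer: str      # ground-truth answer from the benchmark dataset


def query_mllm(image_path: str, question: str) -> str:
    """Placeholder MLLM call; swap in a real multimodal model client.

    Zero-shot means the model receives only the image and the prompt,
    with no fine-tuning or in-context examples from the benchmark.
    """
    return "unknown"  # trivial baseline so the sketch runs end to end


def exact_match_accuracy(samples: list[Sample]) -> float:
    """Score a dataset by case-insensitive exact match on the answer."""
    correct = sum(
        query_mllm(s.image_path, s.question).strip().lower()
        == s.answer.strip().lower()
        for s in samples
    )
    return correct / len(samples)


if __name__ == "__main__":
    toy = [
        Sample("scene_001.png", "What land-cover class dominates?", "forest"),
        Sample("scene_002.png", "Is a ship visible in the image?", "yes"),
    ]
    print(f"Exact-match accuracy: {exact_match_accuracy(toy):.2f}")
```

Because each dataset is unseen at training time, a loop like this measures generalization rather than memorization; running it per dataset and per modality yields the kind of systematic comparison across MLLMs the abstract describes.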



IEEE Geoscience and Remote Sensing Magazine

Early Access, Jan. 2026.

Authors

Z. Yuan • Z. Xiong • T. Dujardin • X. Li • L. Mou • X. Zhu

Links

DOI

Research Area

C3 | Physics and Geo Sciences

BibTeX Key: YXD+26
