
FINER: MLLMs Hallucinate Under Fine-Grained Negative Queries

Abstract

Multimodal large language models (MLLMs) struggle with hallucinations, particularly under fine-grained queries, a challenge underrepresented by existing benchmarks that focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with genuinely present elements in the image. To address this, we propose FINER-Tuning, which leverages Direct Preference Optimization (DPO) on FINER-inspired data. Fine-tuning four frontier MLLMs with FINER-Tuning yields gains of up to 24.2% (InternVL3.5-14B) on our hallucination benchmarks, while simultaneously improving performance on eight existing hallucination suites and enhancing general multimodal capabilities across six benchmarks.
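The abstract names Direct Preference Optimization (DPO) as the training objective behind FINER-Tuning. As a minimal sketch of that objective, the snippet below implements the standard DPO loss (Rafailov et al., 2023); the function signature, tensor shapes, and beta value are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Illustrative sketch, not the FINER-Tuning codebase. Each input
    # holds summed token log-probabilities for a batch of (chosen,
    # rejected) response pairs, under the policy being tuned and under
    # a frozen reference model.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # DPO widens the margin between the preferred and dispreferred
    # log-ratios; beta scales the implicit KL penalty that keeps the
    # policy close to the reference model.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

Under FINER-Tuning as described in the abstract, each preference pair would presumably contrast a response that correctly rejects a fine-grained mismatch (chosen) with one that hallucinates the queried element (rejected), constructed from FINER-inspired data.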


CVPR 2026

IEEE/CVF Conference on Computer Vision and Pattern Recognition. Denver, CO, USA, Jun 03-07, 2026. To be published. Preprint available.
A* Conference

Authors

R. Xiao • S. Kim • Y. Xian • Z. Akata • S. Alaniz

Links

arXiv • GitHub

Research Area

B1 | Computer Vision

BibTeX Key: XKX+26
