
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering


Abstract

Visual Question Answering (VQA) systems have witnessed significant advances in recent years thanks to the development of large-scale Vision-Language Pre-trained Models (VLPMs). As application scenarios and user demands change over time, an advanced VQA system is expected to continuously expand its knowledge and capabilities, not only to handle new tasks (i.e., new question types or visual scenes) but also to answer questions in new specialized domains without forgetting previously acquired knowledge and skills. Existing works studying continual learning (CL) on VQA tasks primarily consider answer- and question-type incremental learning or scene- and function-incremental learning, whereas how VQA systems perform when they encounter new domains and growing user demands has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, on which we conduct extensive experiments with 4 VLPMs, 5 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon in the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches can help mitigate forgetting in VLPMs, and how to design CL approaches suitable for VLPMs in this challenging continual learning environment. To facilitate future work on developing an advanced All-in-One VQA system, we will release our datasets and code.
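The benchmark evaluates VLPMs trained sequentially on VQA datasets from different domains, where performance is typically summarized by an accuracy matrix recorded after each training step. The sketch below illustrates two standard CL summary metrics (average accuracy and forgetting) on such a matrix; the domain names and accuracy values are made-up assumptions for illustration, not results or the exact metric definitions from the paper.

```python
import numpy as np

# Hypothetical accuracy matrix acc[i][j]: accuracy on domain j after
# sequential training up to domain i. Domain names and numbers are
# illustrative placeholders, not figures from the paper.
domains = ["natural", "medical", "remote_sensing", "document", "art"]
acc = np.array([
    [72.1,  8.3,  5.0,  4.2,  6.1],
    [55.4, 61.8,  7.9,  5.5,  6.8],
    [48.2, 47.5, 58.3,  6.1,  7.0],
    [41.7, 40.2, 44.9, 63.5,  7.4],
    [38.9, 36.1, 40.3, 49.8, 59.2],
])

T = len(domains)

# Average accuracy: mean accuracy over all domains after the final step.
avg_acc = acc[-1].mean()

# Forgetting: for each earlier domain, the drop from its best accuracy
# during the sequence to its accuracy at the end, averaged over domains.
forgetting = np.mean([acc[:T - 1, j].max() - acc[-1, j] for j in range(T - 1)])

print(f"Average accuracy after the full sequence: {avg_acc:.1f}")
print(f"Average forgetting over earlier domains:  {forgetting:.1f}")
```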



WACV 2025

IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025.
A Conference

Authors

Y. Zhang • H. Chen • A. Frikha • Y. Yang • D. Krompass • G. Zhang • J. Gu • V. Tresp

Links

DOI

Research Area

A3 | Computational Models

BibTeX Key: ZCF+25
