RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance
Abstract
Conversational AI tools for generating and discussing accurate radiology reports could transform radiology by enabling collaborative, human-in-the-loop diagnostic processes, saving time and enhancing report quality. Large Vision-Language Models (LVLMs) hold promise to this end, but current methods either lack clinical correctness or are single-task models without conversational abilities. We propose a novel architecture and dataset to address these limitations. First, we propose a secondary image branch that explicitly focuses on structured clinical findings, improving the clinical correctness score by 13.3%. Second, we propose a catastrophic forgetting mitigation strategy and an instruct dataset with variable dialog-based tasks, enabling our model to handle a multitude of different queries. RaDialog marks a foundational step toward clinical dialog systems: it outperforms existing medical LVLMs by 15.0% in clinical correctness in report generation and by 23.4% in interactive report correction, and radiologists prefer it over a comparative method in 84.0% of cases.
inproceedings POP+25
MIDL 2025
Medical Imaging with Deep Learning. Salt Lake City, UT, USA, Jul 09-11, 2025.
Authors
C. Pellegrini • E. Özsoy • B. Busam • B. Wiestler • N. Navab • M. Keicher
Links
URL GitHub
BibTeXKey: POP+25