27.05.2026

©Florian Generotzky / LMU

Medical Diagnoses: How AI Explanations Help Doctors

Stefan Feuerriegel Shows That AI Models Like ChatGPT Can Improve Diagnostic Accuracy in Radiology

There is increasing discussion around the use of large language models like ChatGPT to support medical diagnosis. These LLMs can summarize information, suggest diagnoses, and justify their assessments in simple language. This represents a key promise of such systems: As well as providing a diagnosis, they can explain why a certain diagnosis is appropriate. But it has not yet been established whether such explanations actually help physicians – and which format is most useful.

©NEJM

Radiological images such as CT and MRI scans were at the heart of the study. This MRI image of a skull shows diffuse contrast-enhancing lesions in the brain. It is the job of radiologists to correctly classify these as, for example, inflammation, a tumor, or multiple sclerosis. With the right clinical questions, AI can provide support in reaching a diagnosis.

Not All Forms of AI Assistance Are Equally Helpful

A research team from LMU Munich, LMU University Hospital, Karlsruhe Institute of Technology, and the University of Bayreuth has now investigated how different forms of AI explanations influence diagnostic accuracy in radiology. In a randomized experiment, 101 radiologists were asked to review real patient cases with radiological images such as CT or MRI scans and provide a diagnosis for each case in the form of an open-ended text.

“Radiology often involves combining complex imaging findings with clinical information,” explains Boj Friedrich Hoppe from LMU University Hospital. “In principle, language models can support radiologists here. Our study shows, however, that not every form of AI assistance is equally helpful. What’s crucial is whether the physicians can follow the reasoning and critically evaluate the recommendation.”

Diagnosis Alone Is Not Enough

Participants were randomly assigned to one of four groups. One group worked without AI support, while the other three received different outputs from a multimodal language model. The AI provided either a diagnosis alone, a differential diagnosis, or a chain-of-thought explanation. The chain-of-thought explanation set out imaging characteristics, clinical indications, and exclusion criteria in a verifiable manner, which particularly helped physicians compare the recommendation against their domain knowledge.

“For clinical practice, it’s not enough for an AI system to just give a plausible-sounding answer,” says Hoppe. “Physicians must be able to follow which indications provide grounds for a particular diagnosis and where possible uncertainties exist.”

Step-by-Step Explanations Improve Accuracy

The study shows that radiologists achieved the highest diagnostic accuracy with step-by-step AI explanations: the success rate was 12.2 percentage points above that of the control group without AI. Simple diagnostic outputs and differential diagnoses performed less well. Participants followed the differential diagnosis more frequently, particularly when the AI's suggestions were incorrect, which points to automation bias. Step-by-step explanations, by contrast, helped the physicians adopt correct suggestions in a more informed manner while also making them more likely to recognize errors.

The results suggest that what is decisive is not the quality of the diagnosis alone but also the format of the explanation, which helps physicians critically evaluate the recommendation. Step-by-step justifications make the model’s argumentation more visible and allow doctors to compare it against their domain knowledge.

Differential diagnoses are important in medicine. In conjunction with language models, however, they can give the impression that the various diagnoses they present cover the entire diagnostic space. When dealing with rare or complex cases, this can make physicians less likely to think beyond the diagnoses provided by the AI.

Significance Beyond Medicine

Although the study focuses on radiology, its results apply well beyond this field, according to MCML PI Stefan Feuerriegel of the LMU Munich School of Management, the study's corresponding author. Systems like ChatGPT are increasingly being used for decision-making in everyday personal and professional contexts. “Our results show that people can use such AI systems much more effectively if they do not just ask for an answer, but also for an account of the reasoning.”

What matters here is not only the capabilities of the models but also the type of interaction. Users should actively assess AI answers, notes Feuerriegel: “A good AI answer is not just correct, but verifiable.”

Errors That Sound Convincing

The researchers emphasize that language models can make errors, both in their diagnoses and in the justifications they give. Accordingly, AI systems should not be used as a substitute for medical expertise, but as tools to support physicians.

Step-by-step explanations in particular can render the AI’s assumptions visible and help doctors critically evaluate recommendations. The study demonstrates that AI improves diagnostic performance above all when its suggestions are presented along with explanations of its reasoning. By contrast, short answers and unelaborated lists can foster misplaced confidence in the AI’s suggestions.

©LMU Munich

