
27.05.2026


Medical Diagnoses: How AI Explanations Help Doctors

Stefan Feuerriegel Shows That AI Models Like ChatGPT Can Improve Diagnostic Accuracy in Radiology


There is growing discussion about using large language models (LLMs) such as ChatGPT to support medical diagnosis. These models can summarize information, suggest diagnoses, and justify their assessments in plain language. This is one of the key promises of such systems: beyond providing a diagnosis, they can explain why that diagnosis is appropriate. It has not yet been established, however, whether such explanations actually help physicians, or which format is most useful.



Radiological images such as CT and MRI scans were at the heart of the study. This MRI scan of the head shows diffuse contrast-enhancing lesions in the brain. It is the radiologist's job to classify such findings correctly, for example as inflammation, a tumor, or multiple sclerosis. Given the right clinical questions, AI can provide support in reaching a diagnosis.


Not All Forms of AI Assistance Are Equally Helpful

A research team from LMU Munich, LMU University Hospital, the Karlsruhe Institute of Technology, and the University of Bayreuth has now investigated how different forms of AI explanation influence diagnostic accuracy in radiology. In a randomized experiment, 101 radiologists reviewed real patient cases, including radiological images such as CT or MRI scans, and gave a free-text diagnosis for each case.

“Radiology often involves combining complex imaging findings with clinical information,” explains Boj Friedrich Hoppe from LMU University Hospital. “In principle, language models can support radiologists here. Our study shows, however, that not every form of AI assistance is equally helpful. What’s crucial is whether the physicians can follow the reasoning and critically evaluate the recommendation.”

Diagnosis Alone Is Not Enough

Participants were randomly assigned to one of four groups. One group worked without AI support, while the other three received different outputs from a multimodal language model: a diagnosis alone, a differential diagnosis, or a chain-of-thought (step-by-step) explanation. The chain-of-thought explanation set out imaging characteristics, clinical indications, and exclusion criteria in a verifiable way, which made it especially easy for physicians to compare the recommendation against their domain knowledge.

“For clinical practice, it’s not enough for an AI system to just give a plausible-sounding answer,” says Hoppe. “Physicians must be able to follow which indications provide grounds for a particular diagnosis and where possible uncertainties exist.”


«Our results show that people can use such AI systems much more effectively if they do not just ask for an answer, but also for an account of the reasoning. A good AI answer is not just correct, but verifiable.»


Stefan Feuerriegel

MCML PI


Step-by-Step Explanations Improve Accuracy

The study shows that radiologists achieved the highest diagnostic accuracy with step-by-step AI explanations: their success rate was 12.2 percentage points higher than that of the control group working without AI. Diagnosis-only outputs and differential diagnoses performed less well. When the AI's suggestion was incorrect, participants were especially likely to follow it if it came as a differential diagnosis, which points to automation bias, the tendency to over-rely on automated suggestions. Step-by-step explanations, by contrast, helped physicians adopt correct suggestions in a more informed way and made them more likely to recognize errors.

The results suggest that the quality of the AI's diagnosis alone is not decisive; the format of the explanation also matters, because it determines how well physicians can critically evaluate the recommendation. Step-by-step justifications make the model's argumentation more visible and allow doctors to compare it against their domain knowledge.

Differential diagnoses are important in medicine. When generated by a language model, however, a list of differential diagnoses can give the impression that it covers every plausible diagnosis. In rare or complex cases, this can make physicians less likely to consider possibilities beyond those the AI suggested.

Significance Beyond Medicine

Although the study focuses on radiology, its results are relevant well beyond this field, according to MCML PI Stefan Feuerriegel of the LMU Munich School of Management, the study's corresponding author. Systems like ChatGPT are increasingly being used for decision-making in everyday personal and professional contexts. "Our results show that people can use such AI systems much more effectively if they do not just ask for an answer, but also for an account of the reasoning."

What matters here is not only the capabilities of the models but also how people interact with them. Users should actively assess AI answers, notes Feuerriegel: "A good AI answer is not just correct, but verifiable."

Errors That Sound Convincing

The researchers emphasize that language models can make errors, both in their diagnoses and in the justifications they give for them. Accordingly, AI systems should not be used as a substitute for medical expertise, but as tools to support physicians.

Step-by-step explanations in particular can make the AI's assumptions visible and help doctors critically evaluate its recommendations. The study shows that AI improves diagnostic performance above all when its suggestions come with explanations of its reasoning. Short answers and unelaborated lists, by contrast, can foster misplaced confidence in the AI's suggestions.

