11.05.2026
Research Stay at Imperial College London
Jun Li – Funded by the MCML AI X-Change Program
After the MCML delegation trip to London, I was very happy to stay for one week at Imperial College London and visit Professor Wenjia Bai’s group as part of the MCML AI X-Change Program. It was a short but very meaningful week: I deepened our research exchange with Wenjia’s group, discussed new directions for medical vision-language models, and also had the chance to visit Google for a research discussion.

A memorable moment with Wenjia's research group at Imperial College London.
Research Discussions and New Directions
As a PhD student at the Technical University of Munich and the Munich Center for Machine Learning, my research focuses on vision-language models for medical AI. During the week, I met with Wenjia and several members of his group, including Dr. Che Liu and Yicheng Wu, and also had a discussion with Rishabh Kabra, a Research Engineer at Google DeepMind. Across these meetings, our conversations kept returning to a few larger questions.
What’s next in medical AI? With Wenjia, we discussed the roles of two central modalities in medicine: language and images. Language is currently an exceptionally strong modality because it carries patient history, clinical reasoning, diagnostic impressions, and the way doctors communicate uncertainty and decisions. Medical images provide anatomical and pathological evidence, but the same image can often mean different things depending on the clinical context. This made us think about when a model should rely more on visual evidence, when it should rely more on textual context, and how the two should support each other rather than being forced into a single representation. The discussion also led to broader questions around privacy, ethics, and human-centered AI: as models become stronger, future research may need to start less from the model itself and more from the people whose data, decisions, and lives are involved.
Hyde Park near Imperial College London, a peaceful moment to reflect on the productive week.
To unify or not in medical multimodal AI? With Che, we discussed representation learning for medical multimodal models, especially the role of encoders. Should medical AI aim for one unified encoder across modalities and tasks, or are specialized encoders still necessary because medical data differs so much across imaging protocols, organs, diseases, and clinical contexts? A unified encoder is appealing because it promises scalability and shared representation learning, but medical data may require inductive biases that are not easily captured by a single generic representation. This made me think more carefully about what “foundation” should mean in medical foundation models: not only large-scale pretraining, but also the ability to preserve clinically meaningful distinctions. This question also reflects a broader challenge in medical multimodal learning. A medical model should not only answer isolated questions; it should help connect patient history, imaging, clinical notes, uncertainty, and possible next steps into a coherent clinical picture.
Research discussion at Google.
How should uncertainty be quantified in medical AI? With Yicheng, we spent a lot of time discussing uncertainty. This felt especially important because uncertainty is not a side issue in medicine; it is often part of the decision itself. Many diagnoses, treatment plans, and next-step recommendations are made under incomplete information. We discussed both aleatoric uncertainty, which comes from inherent ambiguity in the data or clinical situation, and epistemic uncertainty, which comes from limited knowledge, limited evidence, or model ignorance. The key question is not only how to predict uncertainty, but how to quantify it in a way that can support real clinical decisions. A useful medical AI system should help characterize what is known, what remains uncertain, and whether more evidence is needed.
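One way to make the aleatoric/epistemic distinction concrete is the standard entropy decomposition over a deep ensemble: the predictive entropy of the averaged prediction splits into an expected-entropy term (aleatoric) and a disagreement term (epistemic). The sketch below is purely illustrative and not a method from our discussion; `decompose_uncertainty` is a hypothetical helper name.

```python
import numpy as np

def entropy(p, axis=-1):
    # Shannon entropy of categorical distributions, in nats
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

def decompose_uncertainty(member_probs):
    """Illustrative sketch: split ensemble uncertainty into parts.

    member_probs: array of shape (n_members, n_classes), each row a
    softmax output from one ensemble member for the same input.
    Returns (total, aleatoric, epistemic), where
        total     = H[mean_m p_m]      (predictive entropy)
        aleatoric = mean_m H[p_m]      (expected entropy)
        epistemic = total - aleatoric  (mutual information / disagreement)
    """
    mean_p = member_probs.mean(axis=0)
    total = entropy(mean_p)
    aleatoric = entropy(member_probs).mean()
    return total, aleatoric, total - aleatoric

# Members agree that the case is ambiguous -> purely aleatoric
agree = np.array([[0.5, 0.5], [0.5, 0.5]])
# Members confidently disagree -> mostly epistemic
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
```

The two toy inputs show why the distinction matters clinically: the same predictive entropy can mean "the case is inherently ambiguous" (no extra evidence will help) or "the model is uncertain" (more data or expert review could help).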

Imperial College London campus entrance.
When should we align modalities? The broader London visit also gave me the opportunity to talk with Rishabh at DeepMind. We discussed how to align different modalities and how to avoid potential representation collapse. One idea that stayed with me is that different modalities can be seen as different observations of the same object or state, but they may contain very different amounts and types of information. Some modalities may be rich, while others may be more limited or noisy. Therefore, the alignment strategy should depend on the information inherent in each modality and on the goal of the model, rather than applying one unified recipe to every setting.
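To illustrate one common alignment recipe and why collapse is a risk, the sketch below implements a symmetric InfoNCE objective over a batch of paired embeddings, as in CLIP-style training. It is a minimal NumPy illustration, not anything discussed at DeepMind; in-batch negatives are exactly what penalizes a collapsed (constant) representation.

```python
import numpy as np

def info_nce(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img, txt: arrays of shape (batch, dim); row i of img is paired
    with row i of txt. The other rows in the batch act as negatives,
    which is what discourages a collapsed (constant) representation.
    """
    # L2-normalise so the dot product is cosine similarity
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    labels = np.arange(len(img))        # matching pair sits on the diagonal

    def xent(l):
        # numerically stable cross-entropy toward the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

If every embedding collapses to the same vector, all similarities become equal and the loss saturates at log(batch size), so the objective itself signals that no pair-specific information survives; well-aligned, distinct pairs drive the loss toward zero.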
Research discussion and knowledge exchange at the department.
Collaboration and Outcomes
The visit also strengthened my collaboration with Wenjia’s group. I was very happy that our work on test-time decision learning for abnormality grounding in rare diseases was accepted to ICML. The research stay also gave us the chance to continue the exchange around this direction and think about possible next steps.
The Royal Albert Hall, an iconic landmark of London's cultural heritage.
Reflections
Looking back, this week in London was short but extremely productive. One lesson I took from the visit is that defining the problem may become increasingly important in medical AI. Models are advancing very fast, especially as large language models continue to scale. If we want AI systems to support clinical reasoning and medical decisions, we first need to identify the right problem, define it carefully, and think about how it can be quantified. I also realized that even researchers working at the frontier are unsure about what the future of AI will ultimately look like. Perhaps one grand vision is to build a world model, and perhaps we as researchers are also gradually building our own world models, shaped by the questions we ask and the problems we choose to care about. What I hope, and still believe, is that even if we work on different topics, we are moving in the same direction: AI for good and AI for humans.
For me, this means: “Keep your eyes on the stars, and your feet on the ground.” Stay open to ambitious ideas, while remaining anchored in concrete problems where research can be meaningful. In medical AI, this means not only chasing more powerful models, but also asking how our work can help people and respect the responsibility that comes with real-world medicine. I am very grateful to the MCML AI X-Change Program for supporting this visit and to everyone I met for the meaningful discussions.