13.10.2024

©TUM

Success for the Group of Nassir Navab at MICCAI 2024

Multiple Awards Highlight Their Research in Medical Imaging and AI

The group of our PI Nassir Navab achieved remarkable recognition at MICCAI 2024, one of the world’s leading conferences in medical image computing and computer‑assisted interventions. Their research was honored with five awards: one in the main conference and four in affiliated workshops.

MICCAI Best Paper Runner‑up

E. Özsoy, C. Pellegrini, M. Keicher and N. Navab.
ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. Main Conference Best Paper Runner-up.

Abstract

Every day, countless surgeries are performed worldwide, each within the distinct settings of operating rooms (ORs) that vary not only in their setups but also in the personnel, tools, and equipment used. This inherent diversity poses a substantial challenge for achieving a holistic understanding of the OR, as it requires models to generalize beyond their initial training datasets. To reduce this gap, we introduce ORacle, an advanced vision-language model designed for holistic OR domain modeling, which incorporates multi-view and temporal capabilities and can leverage external knowledge during inference, enabling it to adapt to previously unseen surgical scenarios. This capability is further enhanced by our novel data augmentation framework, which significantly diversifies the training dataset, ensuring ORacle’s proficiency in applying the provided knowledge effectively. In rigorous testing, in scene graph generation, and downstream tasks on the 4D-OR dataset, ORacle not only demonstrates state-of-the-art performance but does so requiring less data than existing models. Furthermore, its adaptability is displayed through its ability to interpret unseen views, actions, and appearances of tools and equipment. This demonstrates ORacle’s potential to significantly enhance the scalability and affordability of OR domain modeling and opens a pathway for future advancements in surgical data science.

MCML Authors

Ege Özsoy

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Chantal Pellegrini

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Matthias Keicher

Dr.

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Principal Investigator

Computer Aided Medical Procedures & Augmented Reality

MICCAI GRAIL Best Paper

Ç. Köksal, G. Ghazaei, F. Holm, A. Farshad and N. Navab.
SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction.
GRAIL @MICCAI 2024 - 6th Workshop on GRaphs in biomedicAl Image anaLysis at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. GRAIL @MICCAI 2024 Best Paper Award.

Abstract

Graph-based holistic scene representations facilitate surgical workflow understanding and have recently demonstrated significant success. However, this task is often hindered by the limited availability of densely annotated surgical scene data. In this work, we introduce an end-to-end framework for the generation and optimization of surgical scene graphs on a downstream task. Our approach leverages the flexibility of graph-based spectral clustering and the generalization capability of foundation models to generate unsupervised scene graphs with learnable properties. We reinforce the initial spatial graph with sparse temporal connections using local matches between consecutive frames to predict temporally consistent clusters across a temporal neighborhood. By jointly optimizing the spatiotemporal relations and node features of the dynamic scene graph with the downstream task of phase segmentation, we address the costly and annotation-burdensome task of semantic scene comprehension and scene graph generation in surgical videos using only weak surgical phase labels. Further, by incorporating effective intermediate scene representation disentanglement steps within the pipeline, our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow recognition.

MCML Authors

Felix Holm

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Azade Farshad

Dr.

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Principal Investigator

Computer Aided Medical Procedures & Augmented Reality

MICCAI CLIP Best Paper

F. De Benetti, Y. Yeganeh, C. Belka, S. Corradini, N. Navab, C. Kurz, G. Landry, S. Albarqouni and T. Wendler.
CloverNet – Leveraging Planning Annotations for Enhanced Procedural MR Segmentation: An Application to Adaptive Radiation Therapy.
CLIP @MICCAI 2024 - 13th International Workshop on Clinical Image-Based Procedures at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. CLIP @MICCAI 2024 Best Paper Award.

Abstract

In radiation therapy (RT), an accurate delineation of the regions of interest (ROI) and organs at risk (OAR) allows for a more targeted irradiation with reduced side effects. The current clinical workflow for combined MR-linear accelerator devices (MR-linacs) requires the acquisition of a planning MR volume (MR-P), in which the ROI and OAR are accurately segmented by the clinical team. These segmentation maps (S-P) are transferred to the MR acquired on the day of the RT fraction (MR-Fx) using registration, followed by time-consuming manual corrections. The goal of this paper is to enable accurate automatic segmentation of MR-Fx using S-P without clinical workflow disruption. We propose a novel UNet-based architecture, CloverNet, that takes as inputs MR-Fx and S-P in two separate encoder branches, whose latent spaces are concatenated in the bottleneck to generate an improved segmentation of MR-Fx. CloverNet improves the absolute Dice Score by 3.73% (relative +4.34%, p<0.001) when compared with a conventional 3D UNet. Moreover, we believe this approach is potentially applicable to other longitudinal use cases in which a prior segmentation of the ROI is available.
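For readers unfamiliar with the metric, the difference between an absolute and a relative Dice improvement can be illustrated with a minimal sketch. The toy masks and the helper names `dice_score` and `improvements` below are hypothetical and chosen only for illustration; they are not taken from the CloverNet paper or its data.

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

def improvements(baseline, improved):
    """Return (absolute, relative) improvement of one score over another."""
    absolute = improved - baseline
    relative = absolute / baseline
    return absolute, relative

# Hypothetical 1D masks standing in for segmentation volumes.
target        = np.array([0, 1, 1, 1, 1, 0, 0, 0])
baseline_pred = np.array([0, 0, 1, 1, 1, 1, 0, 0])
improved_pred = np.array([0, 1, 1, 1, 1, 1, 0, 0])

d_base = dice_score(baseline_pred, target)
d_impr = dice_score(improved_pred, target)
abs_gain, rel_gain = improvements(d_base, d_impr)
print(f"baseline={d_base:.3f} improved={d_impr:.3f} "
      f"absolute=+{abs_gain:.3f} relative=+{rel_gain:.1%}")
```

The absolute gain is a difference in score points, while the relative gain divides that difference by the baseline score, which is why the two figures quoted in the abstract differ.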

MCML Authors

Nassir Navab

Prof. Dr.

Principal Investigator

Computer Aided Medical Procedures & Augmented Reality

MICCAI EARTH Best Paper

Y. Yeganeh, R. Lazuardi, A. Shamseddin, E. Dari, Y. Thirani, N. Navab and A. Farshad.
VISAGE: Video Synthesis using Action Graphs for Surgery.
EARTH @MICCAI 2024 - Workshop on Embodied AI and Robotics for HealTHcare at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. EARTH @MICCAI 2024 Best Paper Award.

Abstract

Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts the future frames given only a single initial frame, and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures, which enables various applications in SDS.

MCML Authors

Yousef Yeganeh

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Nassir Navab

Prof. Dr.

Principal Investigator

Computer Aided Medical Procedures & Augmented Reality

Azade Farshad

Dr.

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

MICCAI ASMUS Best Paper Runner‑up

F. Dülmer, W. Simson, M. F. Azampour, M. Wysocki, A. Karlas and N. Navab.
PHOCUS: Physics-Based Deconvolution for Ultrasound Resolution Enhancement.
ASMUS @MICCAI 2024 - 5th International Workshop on Advances in Simplifying Medical Ultrasound at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. ASMUS @MICCAI 2024 Best Paper Award.

Abstract

Ultrasound is widely used in medical diagnostics allowing for accessible and powerful imaging but suffers from resolution limitations due to diffraction and the finite aperture of the imaging system, which restricts diagnostic use. The impulse function of an ultrasound imaging system is called the point spread function (PSF), which is convolved with the spatial distribution of reflectors in the image formation process. Recovering high-resolution reflector distributions by removing image distortions induced by the convolution process improves image clarity and detail. Conventionally, deconvolution techniques attempt to rectify the imaging system’s dependent PSF, working directly on the radio-frequency (RF) data. However, RF data is often not readily accessible. Therefore, we introduce a physics-based deconvolution process using a modeled PSF, working directly on the more commonly available B-mode images. By leveraging Implicit Neural Representations (INRs), we learn a continuous mapping from spatial locations to their respective echogenicity values, effectively compensating for the discretized image space. Our contribution consists of a novel methodology for retrieving a continuous echogenicity map directly from a B-mode image through a differentiable physics-based rendering pipeline for ultrasound resolution enhancement. We qualitatively and quantitatively evaluate our approach on synthetic data, demonstrating improvements over traditional methods in metrics such as PSNR and SSIM. Furthermore, we show qualitative enhancements on an ultrasound phantom and an in-vivo acquisition of a carotid artery.
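The image formation model described in the abstract, a PSF convolved with the reflector distribution, can be sketched in one dimension. The example below is not the PHOCUS method: it swaps in a classical Wiener-style regularized deconvolution with a known Gaussian PSF, on noise-free synthetic data, purely to show what "removing the convolution" means. The function names, the Gaussian PSF, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def gaussian_psf(n, sigma):
    """Gaussian PSF on n samples, centred at index 0 (circular), summing to 1."""
    x = np.arange(n)
    d = np.minimum(x, n - x)  # circular distance to index 0
    psf = np.exp(-0.5 * (d / sigma) ** 2)
    return psf / psf.sum()

def blur(reflectors, psf):
    """Image formation: circular convolution of reflectors with the PSF."""
    return np.real(np.fft.ifft(np.fft.fft(reflectors) * np.fft.fft(psf)))

def wiener_deconvolve(image, psf, eps=1e-3):
    """Regularized inverse filter; eps keeps near-zero frequencies stable."""
    H = np.fft.fft(psf)
    G = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft(np.fft.fft(image) * G))

# Hypothetical sparse reflector distribution (three point scatterers).
n = 256
reflectors = np.zeros(n)
reflectors[[60, 70, 180]] = [1.0, 0.8, 0.5]

psf = gaussian_psf(n, sigma=3.0)
image = blur(reflectors, psf)            # what the system "sees"
estimate = wiener_deconvolve(image, psf)  # resolution-enhanced estimate

err_blurred = np.mean((image - reflectors) ** 2)
err_estimate = np.mean((estimate - reflectors) ** 2)
print(f"MSE blurred: {err_blurred:.5f}, MSE deconvolved: {err_estimate:.5f}")
```

In this idealized setting the deconvolved estimate is closer to the true reflector distribution than the blurred image. Real B-mode data adds noise, envelope detection, and a spatially varying PSF, which is where learned or physics-based approaches such as the one in the paper come in.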

MCML Authors