C | Domain-Specific Machine Learning

Domain-specific machine learning shows immense potential, as both universities host several highly visible scientific domains with internationally renowned experts. This area facilitates the translation of ML concepts and technologies to many different domains.

C1 | Medicine

Research at MCML in Medicine and Healthcare focuses on the objectives that must be met to overcome the hurdles to deploying ML approaches in clinical environments. In particular, advances are required in interpretable and explainable deep learning, robust and data-efficient learning, privacy-preserving learning, and in the trust and safety of autonomous AI and ML systems.

Michael Ingrisch, Prof. Dr. (Clinical Data Science in Radiology)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Julia Schnabel, Prof. Dr. (Computational Imaging and AI in Medicine)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)
Martin Menten, Dr., JRG Leader AI for Vision (Artificial Intelligence in Healthcare and Medicine)
Peter Schüffler, Prof. Dr. (Computational Pathology)

Publications in Research Area C1
[130]
D. Huang, N. Navab and Z. Jiang.
Improving Probe Localization for Freehand 3D Ultrasound using Lightweight Cameras.
ICRA 2025 - IEEE International Conference on Robotics and Automation. Atlanta, GA, USA, May 19-23, 2025. To be published.
Abstract

Ultrasound (US) probe localization relative to the examined subject is essential for freehand 3D US imaging, which offers significant clinical value due to its affordability and unrestricted field of view. However, existing methods often rely on expensive tracking systems or bulky probes, while recent US image-based deep learning methods suffer from accumulated errors during probe maneuvering. To address these challenges, this study proposes a versatile, cost-effective probe pose localization method for freehand 3D US imaging, utilizing two lightweight cameras. To eliminate accumulated errors during US scans, we introduce PoseNet, which directly predicts the probe’s 6D pose relative to a preset world coordinate system based on camera observations. We first jointly train pose and camera image encoders based on pairs of 6D pose and camera observations densely sampled in simulation. This will encourage each pair of probe pose and its corresponding camera observation to share the same representation in latent space. To ensure the two encoders handle unseen images and poses effectively, we incorporate a triplet loss that enforces smaller differences in latent features between nearby poses compared to distant ones. Then, the pose decoder uses the latent representation of the camera images to predict the probe’s 6D pose. To bridge the sim-to-real gap, in the real world, we use the trained image encoder and pose decoder for initial predictions, followed by an additional MLP layer to refine the estimated pose, improving accuracy. The results obtained from an arm phantom demonstrate the effectiveness of the proposed method, which notably surpasses state-of-the-art techniques, achieving average positional and rotational errors of 2.03 mm and 0.37°, respectively.
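
The triplet constraint on the latent space can be illustrated with a minimal PyTorch sketch (batch shapes, the margin value, and how near/far pose pairs are sampled are assumptions here, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def pose_triplet_loss(anchor, near, far, margin=0.2):
    # Latent features ([B, D]) of camera observations whose poses are
    # close to (near) or distant from (far) the anchor pose.
    d_near = F.pairwise_distance(anchor, near)
    d_far = F.pairwise_distance(anchor, far)
    # Enforce d_near + margin <= d_far, as described in the abstract.
    return F.relu(d_near - d_far + margin).mean()
```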

MCML Authors
Dianye Huang (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Zhongliang Jiang, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[129]
S. Dahan, G. Bénédict, L. Z. J. Williams, Y. Guo, D. Rückert, R. Leech and E. C. Robinson.
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv GitHub
Abstract

Current AI frameworks for brain decoding and encoding typically train and test models within the same datasets. This limits their utility for brain computer interfaces (BCI) or neurofeedback, for which it would be useful to pool experiences across individuals to better simulate stimuli not sampled during training. A key obstacle to model generalisation is the degree of variability of inter-subject cortical organisation, which makes it difficult to align or compare cortical signals across participants. In this paper we address this through the use of surface vision transformers, which build a generalisable model of cortical functional dynamics through encoding the topography of cortical networks and their interactions as a moving image across a surface. This is then combined with tri-modal self-supervised contrastive (CLIP) alignment of audio, video, and fMRI modalities to enable the retrieval of visual and auditory stimuli from patterns of cortical activity (and vice-versa). We validate our approach on 7T task-fMRI data from 174 healthy participants engaged in the movie-watching experiment from the Human Connectome Project (HCP). Results show that it is possible to detect which movie clips an individual is watching purely from their brain activity, even for individuals and movies not seen during training. Further analysis of attention maps reveals that our model captures individual patterns of brain activity that reflect semantic and visual systems. This opens the door to future personalised simulations of brain function.
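
For each pair of modalities, CLIP-style alignment reduces to a symmetric contrastive objective; a minimal sketch for one fMRI-video pair follows (embedding shapes and the temperature are illustrative assumptions; the full method applies this across the three modality pairs):

```python
import torch
import torch.nn.functional as F

def clip_pair_loss(fmri_emb, video_emb, temperature=0.07):
    # L2-normalise both modality embeddings, shape [B, D].
    a = F.normalize(fmri_emb, dim=-1)
    b = F.normalize(video_emb, dim=-1)
    logits = a @ b.t() / temperature          # [B, B] similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Matching fMRI/video pairs sit on the diagonal.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```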

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[128]
L. Lux, A. H. Berger, A. Weers, N. Stucki, D. Rückert, U. Bauer and J. C. Paetzold.
Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Topological correctness plays a critical role in many image segmentation tasks, yet most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy. Existing topology-aware methods often lack robust topological guarantees, are limited to specific use cases, or impose high computational costs. In this work, we propose a novel, graph-based framework for topologically accurate image segmentation that is both computationally efficient and generally applicable. Our method constructs a component graph that fully encodes the topological information of both the prediction and ground truth, allowing us to efficiently identify topologically critical regions and aggregate a loss based on local neighborhood information. Furthermore, we introduce a strict topological metric capturing the homotopy equivalence between the union and intersection of prediction-label pairs. We formally prove the topological guarantees of our approach and empirically validate its effectiveness on binary and multi-class datasets. Our loss demonstrates state-of-the-art performance with up to fivefold faster loss computation compared to persistent homology methods.
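
The simplest topological discrepancy such a component graph captures is a mismatch in the number of connected components; a toy check in that spirit is shown below (using scipy labelling, not the paper's component-graph construction):

```python
import numpy as np
from scipy import ndimage

def betti0_error(pred, gt, connectivity=1):
    # Label connected components in binary 2D masks; the connectivity
    # choice (1 = 4-neighbourhood, 2 = 8-neighbourhood) matters, which
    # is exactly the kind of subtlety topology-aware metrics must fix.
    s = ndimage.generate_binary_structure(2, connectivity)
    _, n_pred = ndimage.label(pred, structure=s)
    _, n_gt = ndimage.label(gt, structure=s)
    return abs(n_pred - n_gt)
```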

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Nico Stucki (Applied Topology and Geometry)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Ulrich Bauer, Prof. Dr. (Applied Topology and Geometry)


[127]
J. R. Jostan, L. M. Rodriguez, D. Z. Bernal, J. O. Berdugo, V. Aljure, F. Lopez, J. R. Lopez, N. Navab, D. Mateus and V. G. Duque.
Ultrasound Nerve Segmentation with Deep Learning for Leprosy.
ISBI 2025 - IEEE 22nd International Symposium on Biomedical Imaging. Houston, TX, USA, Apr 14-17, 2025. To be published.
Abstract

Purpose: This study aims to provide an AI tool for detecting nerves in ultrasound images to help diagnose Hansen’s disease (Leprosy) in rural areas. The significant difference in the cross-sectional area (CSA) of superficial nerves in symmetrical extremities is a landmark in the early stages of the disease. Despite its potential, ultrasound nerve evaluation is limited due to the difficulty in accurately identifying nerves in ultrasound images.
Methodology: We propose the first Leprosy video nerve segmentation pipeline based on YOLOv8 and X-Mem architectures to automate frame detection, segmentation, and label propagation. We ensure alignment with clinical practices and evaluate the method’s real-time inference and energy efficiency, confirming the approach’s feasibility in resource-limited settings.
Results: We establish a baseline for nerve segmentation of ultrasound Leprosy videos, presenting the first results to identify relevant frames, segment, and propagate labels. To support further research, we have open-sourced a new leprosy test dataset and created a demo web page to try our method on real patient data. This initiative aims to promote research on AI techniques to improve healthcare in rural communities, where healthcare professionals are scarce and assistance is essential.

MCML Authors
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[126]
Ö. Turgut, P. Müller, P. Hager, S. Shit, S. Starck, M. J. Menten, E. Martens and D. Rückert.
Unlocking the diagnostic potential of electrocardiograms through information transfer from cardiac magnetic resonance imaging.
Medical Image Analysis 101.103451 (Apr. 2025). DOI GitHub
Abstract

Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19% for risk prediction and 27.59% for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest.
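
The masked data modelling component can be sketched as random patch masking of the ECG signal prior to reconstruction (patch length and mask ratio below are illustrative assumptions, not the paper's settings):

```python
import torch

def mask_ecg(x, mask_ratio=0.75, patch_len=25):
    # x: [batch, leads, time]; time must be divisible by patch_len.
    b, c, t = x.shape
    assert t % patch_len == 0
    n = t // patch_len
    keep = torch.rand(b, n) > mask_ratio            # keep ~25% of patches
    mask = keep.repeat_interleave(patch_len, dim=1) # expand to sample level
    mask = mask[:, None, :].to(x.dtype)             # broadcast over leads
    return x * mask, mask                           # masked signal + mask
```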

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[125]
A. H. Berger, L. Lux, S. Shit, I. Ezhov, G. Kaissis, M. Menten, D. Rückert and J. C. Paetzold.
Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task’s complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method’s utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Georgios Kaissis, Dr. (Artificial Intelligence in Healthcare and Medicine)
Martin Menten, Dr. (Artificial Intelligence in Healthcare and Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[124]
Y. Li, M. Ghahremani, Y. Wally and C. Wachinger.
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Diagnosing dementia, particularly for Alzheimer’s Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study.
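
DiaMond's bi-attention is a novel mechanism; as a generic reference point for how two imaging modalities can attend to each other, plain cross-attention between MRI and PET token sequences looks like this (dimensions are assumptions):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

def cross_fuse(mri_tokens, pet_tokens):
    # Queries come from MRI, keys/values from PET: each MRI token
    # attends to the PET volume ([B, N, 256] token sequences assumed).
    fused, _ = attn(query=mri_tokens, key=pet_tokens, value=pet_tokens)
    return fused
```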

MCML Authors
Yitong Li (Artificial Intelligence in Medical Imaging)
Morteza Ghahremani, Dr. (Artificial Intelligence in Medical Imaging)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[123]
Y. Shen, Z. Zhuang, K. Yuan, M.-I. Nicolae, N. Navab, N. Padoy and M. Fritz.
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision making and results analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing for the medical domain has focused on classification; however, existing attacks are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs. ADA-STEAL relies on natural images, which are public and widely available, as opposed to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the data distribution gap between natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.
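
One concrete way to add adversarial noise to natural images in the spirit of this attack is a plain FGSM step (a generic sketch, not ADA-STEAL's full objective):

```python
import torch

def fgsm_step(model, images, loss_fn, epsilon=2 / 255):
    # Perturb images along the sign of the loss gradient; loss_fn is
    # assumed to map model outputs to a scalar objective.
    images = images.clone().requires_grad_(True)
    loss = loss_fn(model(images))
    loss.backward()
    return (images + epsilon * images.grad.sign()).detach()
```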

MCML Authors
Kun Yuan (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[122]
Y. Bi, Y. Su, N. Navab and Z. Jiang.
Gaze-Guided Robotic Vascular Ultrasound Leveraging Human Intention Estimation.
IEEE Robotics and Automation Letters Early Access (Feb. 2025). DOI
Abstract

Medical ultrasound has been widely used to examine vascular structure in modern clinical practice. However, traditional ultrasound examination often faces challenges related to inter- and intra-operator variation. The robotic ultrasound system (RUSS) appears as a potential solution for such challenges because of its superiority in stability and reproducibility. Given the complex anatomy of human vasculature, multiple vessels often appear in ultrasound images, or a single vessel bifurcates into branches, complicating the examination process. To tackle this challenge, this work presents a gaze-guided RUSS for vascular applications. A gaze tracker captures the eye movements of the operator. The extracted gaze signal guides the RUSS to follow the correct vessel when it bifurcates. Additionally, a gaze-guided segmentation network is proposed to enhance segmentation robustness by exploiting gaze information. However, gaze signals are often noisy, requiring interpretation to accurately discern the operator’s true intentions. To this end, this study proposes a stabilization module to process raw gaze data. The inferred attention heatmap is utilized as a region proposal to aid segmentation and serve as a trigger signal when the operator needs to adjust the scanning target, such as when a bifurcation appears. To ensure appropriate contact between the probe and surface during scanning, an automatic ultrasound confidence-based orientation correction method is developed. In experiments, we demonstrated the efficiency of the proposed gaze-guided segmentation pipeline by comparing it with other methods. Besides, the performance of the proposed gaze-guided RUSS was also validated as a whole on a realistic arm phantom with an uneven surface.

MCML Authors
Yuan Bi (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Zhongliang Jiang, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[121]
W. Li, H. Xu, J. Huang, H. Jung, P. Yu, N. Navab and B. Busam.
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation.
Preprint (Feb. 2025). arXiv GitHub
Abstract

A key challenge in model-free category-level pose estimation is the extraction of contextual object features that generalize across varying instances within a specific category. Recent approaches leverage foundational features to capture semantic and geometry cues from data. However, these approaches fail under partial visibility. We overcome this with a first-complete-then-aggregate strategy for feature extraction utilizing class priors. In this paper, we present GCE-Pose, a method that enhances pose estimation for novel instances by integrating category-level global context prior. GCE-Pose performs semantic shape reconstruction with a proposed Semantic Shape Reconstruction (SSR) module. Given an unseen partial RGB-D object instance, our SSR module reconstructs the instance’s global geometry and semantics by deforming category-specific 3D semantic prototypes through a learned deep Linear Shape Model. We further introduce a Global Context Enhanced (GCE) feature fusion module that effectively fuses features from partial RGB-D observations and the reconstructed global context. Extensive experiments validate the impact of our global context prior and the effectiveness of the GCE fusion module, demonstrating that GCE-Pose significantly outperforms existing methods on challenging real-world datasets HouseCat6D and NOCS-REAL275.

MCML Authors
Junwen Huang (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Benjamin Busam, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[120]
S. Grosu, M. P. Fabritius, M. Winkelmann, D. Puhr-Westerheide, M. Ingenerf, S. Maurus, A. Graser, C. Schulz, T. Knösel, C. C. Cyran, J. Ricke, P. M. Kazmierczak, M. Ingrisch and P. Wesp.
Effect of artificial intelligence-aided differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
European Radiology Early Access (Jan. 2025). DOI
Abstract

Objectives: Adenomatous colorectal polyps require endoscopic resection, as opposed to non-adenomatous hyperplastic colorectal polyps. This study aims to evaluate the effect of artificial intelligence (AI)-assisted differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
Materials and methods: Five board-certified radiologists evaluated CT colonography images with colorectal polyps of all sizes and morphologies retrospectively and decided whether the depicted polyps required endoscopic resection. After a primary unassisted reading based on current guidelines, a second reading with access to the classification of a radiomics-based random-forest AI model labelling each polyp as ‘non-adenomatous’ or ‘adenomatous’ was performed. Performance was evaluated using polyp histopathology as the reference standard.
Results: 77 polyps in 59 patients comprising 118 polyp image series (47% supine position, 53% prone position) were evaluated unassisted and AI-assisted by five independent board-certified radiologists, resulting in a total of 1180 readings (subsequent polypectomy: yes or no). AI-assisted readings had higher accuracy (76% ± 1% vs. 84% ± 1%), sensitivity (78% ± 6% vs. 85% ± 1%), and specificity (73% ± 8% vs. 82% ± 2%) in selecting polyps eligible for polypectomy (p < 0.001). Inter-reader agreement was improved in the AI-assisted readings (Fleiss’ kappa 0.69 vs. 0.92).
Conclusion: AI-based characterisation of colorectal polyps at CT colonography as a second reader might enable a more precise selection of polyps eligible for subsequent endoscopic resection. However, further studies are needed to confirm this finding, and histopathologic polyp evaluation remains mandatory.
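
A radiomics-based random-forest classifier of the kind described can be set up in a few lines; the sketch below uses stand-in data and omits feature extraction and the study's exact validation protocol:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(77, 100))      # stand-in radiomics features per polyp
y = rng.integers(0, 2, size=77)     # stand-in labels: 1 = adenomatous

clf = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
print(f"mean cross-validated AUC: {auc:.2f}")
```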

MCML Authors
Michael Ingrisch, Prof. Dr. (Clinical Data Science in Radiology)
Philipp Wesp, Dr. (Clinical Data Science in Radiology)


[119]
J. Li, T. Su, B. Zhao, F. Lv, Q. Wang, N. Navab, Y. Hu and Z. Jiang.
Ultrasound Report Generation With Cross-Modality Feature Alignment via Unsupervised Guidance.
IEEE Transactions on Medical Imaging 44.1 (Jan. 2025). DOI
Abstract

Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations with other state-of-the-art approaches exhibit its superior performance across all three datasets.

MCML Authors
Jun Li (Computational Imaging and AI in Medicine)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Zhongliang Jiang, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[118]
A. Bitarafan, M. Mozafari, M. F. Azampour, M. S. Baghshah, N. Navab and A. Farshad.
Self-supervised 3D medical image segmentation by flow-guided mask propagation learning.
Medical Image Analysis 103478 (Jan. 2025). Journal pre-proof. DOI GitHub
Abstract

Despite significant progress in 3D medical image segmentation using deep learning, manual annotation remains a labor-intensive bottleneck. Self-supervised mask propagation (SMP) methods have emerged to alleviate this challenge, allowing intra-volume segmentation with just a single slice annotation. However, the previous SMP methods often rely on 2D information and ignore volumetric contexts. While our previous work, called Vol2Flow, attempts to address this concern, it exhibits limitations, including not focusing enough on local (i.e., slice-pair) information, neglecting global information (i.e., volumetric contexts) in the objective function, and error accumulation during slice-to-slice reconstruction. This paper introduces Flow2Mask, a novel SMP method, developed to overcome the limitations of previous SMP approaches, particularly Vol2Flow. During training, Flow2Mask proposes the Local-to-Global (L2G) loss to learn inter-slice flow fields among all consecutive slices within a volume in an unsupervised manner. This dynamic loss is based on curriculum learning to gradually learn information within a volume from local to global contexts. Additionally, the Inter-Slice Smoothness (ISS) loss is introduced as a regularization term to encourage changes between the slices occur consistently and continuously. During inference, Flow2Mask leverages these 3D flow fields for inter-slice mask propagation in a 3D image, spreading annotation from a single annotated slice to the entire volume. Moreover, we propose an automatic strategy to select the most representative slice as initial annotation in the mask propagation process. Experimental evaluations on different abdominal datasets demonstrate that our proposed SMP method outperforms previous approaches and improves the overall mean DSC of Vol2Flow by +2.1%, +8.2%, and +4.0% for the Sliver, CHAOS, and 3D-IRCAD datasets, respectively. Furthermore, Flow2Mask even exhibits substantial improvements in weakly-supervised and self-supervised few-shot segmentation methods when applied as a mask completion tool.
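
At inference, propagating a mask with a learned flow field amounts to image warping; a minimal PyTorch sketch follows (the normalized-coordinate convention and 2D slice setting are assumptions):

```python
import torch
import torch.nn.functional as F

def warp_mask(mask, flow):
    # mask: float [B, 1, H, W]; flow: [B, H, W, 2] offsets in normalized
    # [-1, 1] coordinates. Build an identity grid, then sample the mask
    # at displaced locations to propagate it to the next slice.
    b, _, h, w = mask.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).expand(b, -1, -1, -1)
    return F.grid_sample(mask, grid + flow, align_corners=True)
```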

MCML Authors
Mohammad Farid Azampour (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Azade Farshad, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[117]
F. Drexel, V. Sideri-Lampretsa, H. Bast, A. W. Marka, T. Koehler, F. T. Gassert, D. Pfeiffer, D. Rückert and F. Pfeiffer.
Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment.
Preprint (Jan. 2025). arXiv
Abstract

Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state. Our work aims to add a new perspective to these previous assessments by locally comparing dark-field lung information between different respiratory states. To this end, we discuss suitable image registration methods for dark-field chest radiographs to enable consistent spatial alignment of the lung in distinct breathing states. Utilizing full inspiration and expiration scans from a clinical chronic obstructive pulmonary disease study, we assess the performance of the proposed registration framework and outline applicable evaluation approaches. Our regional characterization of lung dark-field signal changes between the breathing states provides a proof-of-principle that dynamic radiography-based lung function assessment approaches may benefit from considering registered dark-field images in addition to standard plain chest radiographs.

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[116]
F. Duelmer, M. Azampour and N. Navab.
UltraRay: Full-Path Ray Tracing for Enhancing Realism in Ultrasound Simulation.
Preprint (Jan. 2025). arXiv
Abstract

Traditional ultrasound simulators solve the wave equation to model pressure distribution fields, achieving high accuracy but requiring significant computational time and resources. To address this, ray tracing approaches have been introduced, modeling wave propagation as rays interacting with boundaries and scatterers. However, existing models simplify ray propagation, generating echoes at interaction points without considering return paths to the sensor. This can result in unrealistic artifacts and necessitates careful scene tuning for plausible results. We propose a novel ultrasound simulation pipeline that utilizes a ray tracing algorithm to generate echo data, tracing each ray from the transducer through the scene and back to the sensor. To replicate advanced ultrasound imaging, we introduce a ray emission scheme optimized for plane wave imaging, incorporating delay and steering capabilities. Furthermore, we integrate a standard signal processing pipeline to simulate end-to-end ultrasound image formation. We showcase the efficacy of the proposed pipeline by modeling synthetic scenes featuring highly reflective objects, such as bones. In doing so, our proposed approach, UltraRay, not only enhances the overall visual quality but also improves the realism of the simulated images by accurately capturing secondary reflections and reducing unnatural artifacts. By building on top of a differentiable framework, the proposed pipeline lays the groundwork for a fast and differentiable ultrasound simulation tool necessary for gradient-based optimization, enabling advanced ultrasound beamforming strategies, neural network integration, and accurate inverse scene reconstruction.

MCML Authors
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[115]
Z. Haouari, J. Weidner, I. Ezhov, A. Varma, D. Rückert, B. Menze and B. Wiestler.
Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models.
Preprint (Jan. 2025). arXiv
Abstract

Glioblastoma, a highly aggressive brain tumor, poses major challenges due to its poor prognosis and high morbidity rates. Partial differential equation-based models offer promising potential to enhance therapeutic outcomes by simulating patient-specific tumor behavior for improved radiotherapy planning. However, model calibration remains a bottleneck due to the high computational demands of optimization methods like Monte Carlo sampling and evolutionary algorithms. To address this, we recently introduced an approach leveraging a neural forward solver with gradient-based optimization to significantly reduce calibration time. This approach requires a highly accurate and fully differentiable forward model. We investigate multiple architectures, including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a 3D Vision Transformer (ViT). The optimized TumorSurrogate achieved the best overall results, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration. It halved the MSE relative to the baseline model and achieved the highest Dice score across all tumor cell concentration thresholds. Our study demonstrates significant enhancement in forward solver performance and outlines important future research directions.

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)


[114]
B. Jian, J. Pan, Y. Li, F. Bongratz, R. Li, D. Rückert, B. Wiestler and C. Wachinger.
TimeFlow: Longitudinal Brain Image Registration and Aging Progression Analysis.
Preprint (Jan. 2025). arXiv
Abstract

Predicting future brain states is crucial for understanding healthy aging and neurodegenerative diseases. Longitudinal brain MRI registration, a cornerstone for such analyses, has long been limited by its inability to forecast future developments, reliance on extensive, dense longitudinal data, and the need to balance registration accuracy with temporal smoothness. In this work, we present TimeFlow, a novel framework for longitudinal brain MRI registration that overcomes all these challenges. Leveraging a U-Net architecture with temporal conditioning inspired by diffusion models, TimeFlow enables accurate longitudinal registration and facilitates prospective analyses through future image prediction. Unlike traditional methods that depend on explicit smoothness regularizers and dense sequential data, TimeFlow achieves temporal consistency and continuity without these constraints. Experimental results highlight its superior performance in both future timepoint prediction and registration accuracy compared to state-of-the-art methods. Additionally, TimeFlow supports novel biological brain aging analyses, effectively differentiating neurodegenerative conditions from healthy aging. It eliminates the need for segmentation, thereby avoiding the challenges of non-trivial annotation and inconsistent segmentation errors. TimeFlow paves the way for accurate, data-efficient, and annotation-free prospective analyses of brain aging and chronic diseases.
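
Temporal conditioning in the diffusion-model style mentioned above typically means injecting a sinusoidal embedding of the inter-scan time gap into the U-Net; a generic sketch (the embedding dimension is an assumption, not TimeFlow's exact code):

```python
import math
import torch

def time_embedding(t, dim=128):
    # t: [B] tensor of time gaps (e.g., years between scans).
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # [B, dim]
```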

MCML Authors
Bailiang Jian (Artificial Intelligence in Medical Imaging)
Yitong Li (Artificial Intelligence in Medical Imaging)
Fabian Bongratz (Artificial Intelligence in Medical Imaging)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[113]
T. N. Wolf and C. Wachinger.
WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPport Vectors.
Preprint (Jan. 2025). arXiv
Abstract

The deployment of deep learning models in critical domains necessitates a balance between high accuracy and interpretability. We introduce WASUP, an inherently interpretable neural network that provides local and global explanations of its decision-making process. We prove that these explanations are faithful by fulfilling established axioms for explanations. Leveraging the concept of case-based reasoning, WASUP extracts class-representative support vectors from training images, ensuring they capture relevant features while suppressing irrelevant ones. Classification decisions are made by calculating and aggregating similarity scores between these support vectors and the input’s latent feature vector. We employ B-Cos transformations, which align model weights with inputs to enable faithful mappings of latent features back to the input space, facilitating local explanations in addition to global explanations of case-based reasoning. We evaluate WASUP on three tasks: fine-grained classification on Stanford Dogs, multi-label classification on Pascal VOC, and pathology detection on the RSNA dataset. Results indicate that WASUP not only achieves competitive accuracy compared to state-of-the-art black-box models but also offers insightful explanations verified through theoretical analysis. Our findings underscore WASUP’s potential for applications where understanding model decisions is as critical as the decisions themselves.
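
The case-based classification step, scoring an input by its similarity to class-wise support vectors, can be sketched as follows (tensor shapes and mean-aggregation are assumptions; WASUP's B-cos machinery is omitted):

```python
import torch
import torch.nn.functional as F

def class_logits(latent, supports):
    # latent: [B, D] input features; supports: [C, K, D], K support
    # vectors per class. Cosine similarities are aggregated per class.
    sims = torch.einsum("bd,ckd->bck",
                        F.normalize(latent, dim=-1),
                        F.normalize(supports, dim=-1))
    return sims.mean(dim=-1)  # [B, C] class scores
```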

MCML Authors
Tom Nuno Wolf (Artificial Intelligence in Medical Imaging)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[112]
J. Wang, M. Ghahremani, Y. Li, B. Ommer and C. Wachinger.
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub
Abstract

Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model’s precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet.

MCML Authors
Morteza Ghahremani, Dr. (Artificial Intelligence in Medical Imaging)
Yitong Li (Artificial Intelligence in Medical Imaging)
Björn Ommer, Prof. Dr. (Computer Vision & Learning)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[111]
A. H. Berger, L. Lux, A. Weers, M. Menten, D. Rückert and J. C. Paetzold.
Pitfalls of topology-aware image segmentation.
Preprint (Dec. 2024). arXiv
Abstract

Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues’ profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Martin Menten, Dr. (Artificial Intelligence in Healthcare and Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[110]
M. Fischer, P. Neher, P. J. Schüffler, S. Ziegler, S. Xiao, R. Peretzke, D. Clunie, C. Ulrich, M. Baumgartner, A. Muckenhuber, S. Dias Almeida, M. Götz, J. Kleesiek, M. Nolden, R. Braren and K. Maier-Hein.
Unlocking the Potential of Digital Pathology: Novel Baselines for Compression.
Preprint (Dec. 2024). arXiv
Abstract

Digital pathology offers a groundbreaking opportunity to transform clinical practice in histopathological image analysis, yet faces a significant hurdle: the substantial file sizes of pathological Whole Slide Images (WSI). While current digital pathology solutions rely on lossy JPEG compression to address this issue, lossy compression can introduce color and texture disparities, potentially impacting clinical decision-making. While prior research addresses perceptual image quality and downstream performance independently of each other, we jointly evaluate compression schemes for perceptual and downstream task quality on four different datasets. In addition, we collect an initially uncompressed dataset for an unbiased perceptual evaluation of compression schemes. Our results show that deep learning models fine-tuned for perceptual quality outperform conventional compression schemes like JPEG-XL or WebP for further compression of WSI. However, they exhibit a significant bias towards the compression artifacts present in the training data and struggle to generalize across various compression schemes. We introduce a novel evaluation metric based on feature similarity between original files and compressed files that aligns very well with the actual downstream performance on the compressed WSI. Our metric allows for a general and standardized evaluation of lossy compression schemes and mitigates the requirement to independently assess different downstream tasks. Our study provides novel insights for the assessment of lossy compression schemes for WSI and encourages a unified evaluation of lossy compression schemes to accelerate the clinical uptake of digital pathology.
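
The proposed metric compares original and compressed slides in a learned feature space; in its simplest form, that is an embedding similarity, as sketched below (the encoder choice and cosine similarity are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def feature_similarity(encoder, original, compressed):
    # original/compressed: [B, 3, H, W] patch batches from the same WSI,
    # before and after lossy compression.
    with torch.no_grad():
        f_o = encoder(original)
        f_c = encoder(compressed)
    return F.cosine_similarity(f_o, f_c, dim=-1).mean()
```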

MCML Authors
Peter Schüffler, Prof. Dr. (Computational Pathology)


[109]
S. Liang, S. Wang, K. Li, M. Niemeyer, S. Gasperini, N. Navab and F. Tombari.
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians.
Preprint (Dec. 2024). arXiv
Abstract

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthesis, more recent works investigated how to extend it with scene understanding and language features. However, existing methods lack a detailed comprehension of scenes, limiting their ability to segment and interpret complex structures. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural Gaussians to learn instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation of 2D language features into 3D space. Through Super-Gaussians, our method enables high-dimensional language feature rendering without extreme increases in GPU memory. Extensive experiments demonstrate that SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.

MCML Authors
Kunyi Li (Computer Aided Medical Procedures & Augmented Reality)
Stefano Gasperini (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[108]
A. Reithmeir, V. Spieker, V. Sideri-Lampretsa, D. Rückert, J. A. Schnabel and V. A. Zimmer.
From Model Based to Learned Regularization in Medical Image Registration: A Comprehensive Review.
Preprint (Dec. 2024). arXiv
Abstract

Image registration is fundamental in medical imaging applications, such as disease progression analysis or radiation therapy planning. The primary objective of image registration is to precisely capture the deformation between two or more images, typically achieved by minimizing an optimization problem. Due to its inherent ill-posedness, regularization is a key component in driving the solution toward anatomically meaningful deformations. A wide range of regularization methods has been proposed for both conventional and deep learning-based registration. However, the appropriate application of regularization techniques often depends on the specific registration problem, and no one-fits-all method exists. Despite its importance, regularization is often overlooked or addressed with default approaches, assuming existing methods are sufficient. A comprehensive and structured review remains missing. This review addresses this gap by introducing a novel taxonomy that systematically categorizes the diverse range of proposed regularization methods. It highlights the emerging field of learned regularization, which leverages data-driven techniques to automatically derive deformation properties from the data. Moreover, this review examines the transfer of regularization methods from conventional to learning-based registration, identifies open challenges, and outlines future research directions. By emphasizing the critical role of regularization in image registration, we hope to inspire the research community to reconsider regularization strategies in modern registration algorithms and to explore this rapidly evolving field further.
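
A canonical example of the model-based regularization the review starts from is the diffusion (smoothness) penalty on the displacement field u in the variational registration objective:

```latex
% Variational registration: similarity between the fixed image I_f and
% the warped moving image I_m, plus a smoothness penalty weighted by lambda.
\hat{u} = \arg\min_{u} \;
  \mathcal{D}\bigl(I_f,\, I_m \circ (\mathrm{Id} + u)\bigr)
  + \lambda \int_{\Omega} \lVert \nabla u(x) \rVert^2 \, dx
```

Learned regularization, as surveyed in the paper, replaces the hand-crafted second term with a data-driven prior on plausible deformations.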

MCML Authors
Anna Reithmeir (Computational Imaging and AI in Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Julia Schnabel, Prof. Dr. (Computational Imaging and AI in Medicine)


[107]
J. Weidner, M. Balcerak, I. Ezhov, A. Datchev, L. Lux, L. Zimmer, D. Rückert, B. Menze and B. Wiestler.
Spatial Brain Tumor Concentration Estimation for Individualized Radiotherapy Planning.
Preprint (Dec. 2024). arXiv
Abstract

Biophysical modeling of brain tumors has emerged as a promising strategy for personalizing radiotherapy planning by estimating the otherwise hidden distribution of tumor cells within the brain. However, many existing state-of-the-art methods are computationally intensive, limiting their widespread translation into clinical practice. In this work, we propose an efficient and direct method that utilizes soft physical constraints to estimate the tumor cell concentration from preoperative MRI of brain tumor patients. Our approach optimizes a 3D tumor concentration field by simultaneously minimizing the difference between the observed MRI and a physically informed loss function. Compared to existing state-of-the-art techniques, our method significantly improves predicting tumor recurrence on two public datasets with a total of 192 patients while maintaining a clinically viable runtime of under one minute - a substantial reduction from the 30 minutes required by the current best approach. Furthermore, we showcase the generalizability of our framework by incorporating additional imaging information and physical constraints, highlighting its potential to translate to various medical diffusion phenomena with imperfect data.
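
A toy version of fitting a concentration field against MRI-derived masks with a soft physical prior is sketched below (thresholds, sigmoid sharpness, and the smoothness stand-in are assumptions; the paper's physically informed constraints are more elaborate):

```python
import torch

def concentration_loss(c, edema_mask, core_mask, lam=1.0):
    # c: [D, H, W] tumor-cell concentration in [0, 1]; masks are binary
    # MRI segmentations. Thresholds 0.25/0.5 are illustrative only.
    soft_edema = torch.sigmoid(50 * (c - 0.25))
    soft_core = torch.sigmoid(50 * (c - 0.5))
    data = ((soft_edema - edema_mask) ** 2).mean() \
         + ((soft_core - core_mask) ** 2).mean()
    grads = torch.stack(torch.gradient(c), dim=0)  # spatial smoothness
    return data + lam * (grads ** 2).mean()
```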

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)


[106]
Y. Yeganeh, I. Charisiadis, M. Hasny, M. Hartenberger, B. Ommer, N. Navab, A. Farshad and E. Adeli.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis.
Preprint (Dec. 2024). arXiv
Abstract

Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models; however, such large datasets are not always accessible in medical imaging due to cost and privacy issues, which contradicts one of the main applications of such models to produce synthetic samples where real data is scarce. Also, finetuning on pre-trained general models has been a challenge due to the distribution shift between the medical domain and the pre-trained models. Here, we propose Latent Drift (LD) for diffusion models that can be adopted for any fine-tuning method to mitigate the issues faced by the distribution shift or employed in inference time as a condition. Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation, which is crucial to investigate how parameters such as gender, age, and adding or removing diseases in a patient would alter the medical images. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation. Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes. The source code of this work will be publicly released upon its acceptance.

MCML Authors
Yousef Yeganeh (Computer Aided Medical Procedures & Augmented Reality)
Björn Ommer, Prof. Dr. (Computer Vision & Learning)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Azade Farshad, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[105]
Y. Yeganeh, R. Xiao, G. Guvercin, N. Navab and A. Farshad.
Conformable Convolution for Topologically Aware Learning of Complex Anatomical Structures.
Preprint (Dec. 2024). arXiv
Abstract

While conventional computer vision emphasizes pixel-level and feature-based objectives, medical image analysis of intricate biological structures necessitates explicit representation of their complex topological properties. Despite their successes, deep learning models often struggle to accurately capture the connectivity and continuity of fine, sometimes pixel-thin, yet critical structures due to their reliance on implicit learning from data. Such shortcomings can significantly impact the reliability of analysis results and hinder clinical decision-making. To address this challenge, we introduce Conformable Convolution, a novel convolutional layer designed to explicitly enforce topological consistency. Conformable Convolution learns adaptive kernel offsets that preferentially focus on regions of high topological significance within an image. This prioritization is guided by our proposed Topological Posterior Generator (TPG) module, which leverages persistent homology. The TPG module identifies key topological features and guides the convolutional layers by applying persistent homology to feature maps transformed into cubical complexes. Our proposed modules are architecture-agnostic, enabling them to be integrated seamlessly into various architectures. We showcase the effectiveness of our framework in the segmentation task, where preserving the interconnectedness of structures is critical. Experimental results on three diverse datasets demonstrate that our framework effectively preserves the topology in the segmentation downstream task, both quantitatively and qualitatively.

MCML Authors
Yousef Yeganeh (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Azade Farshad, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[104]
R. Liao, M. Erler, H. Wang, G. Zhai, G. Zhang, Y. Ma and V. Tresp.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI GitHub
Abstract

In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA.

MCML Authors
Ruotong Liao (Database Systems and Data Mining)
Guangyao Zhai (Computer Aided Medical Procedures & Augmented Reality)
Gengyuan Zhang (Database Systems and Data Mining)
Yunpu Ma, Dr. (Artificial Intelligence and Machine Learning)
Volker Tresp, Prof. Dr. (Database Systems and Data Mining)


[103]
A. T. Stüber, M. M. Heimer, J. Ta, M. P. Fabritius, B. F. Hoppe, G. Sheikh, M. Brendel, L. Unterrainer, P. Jurmeister, A. Tufman, J. Ricke, C. C. Cyran and M. Ingrisch.
Replication study of PD-L1 status prediction in NSCLC using PET/CT radiomics.
European Journal of Radiology (Nov. 2024). In press. DOI
Abstract

This study investigates the predictive capability of radiomics in determining programmed cell death ligand 1 (PD-L1) expression (>=1%) status in non-small cell lung cancer (NSCLC) patients using a newly collected [18F]FDG PET/CT dataset. We aimed to replicate and validate the radiomics-based machine learning (ML) model proposed by Zhao et al. [2] predicting PD-L1 status from PET/CT-imaging.
An independent cohort of 254 NSCLC patients underwent [18F]FDG PET/CT imaging, with primary tumor segmentation conducted using lung tissue window (LTW) and more conservative soft tissue window (STW) methods. Radiomics models (“Rad-score” and “complex model”) and a clinical-stage model from Zhao et al. were evaluated via 10-fold cross-validation and AUC analysis, alongside a benchmark study comparing different ML model pipelines. Clinicopathological data were collected from medical records.
On our data, the Rad-score model yielded mean AUCs of 0.593 (STW) and 0.573 (LTW), below Zhao et al.’s 0.761. The complex model achieved mean AUCs of 0.505 (STW) and 0.519 (LTW), lower than Zhao et al.’s 0.769. The clinical model showed a mean AUC of 0.555, below Zhao et al.’s 0.64. All models performed significantly lower than Zhao et al.’s findings. Our benchmark study on four ML pipelines revealed consistently low performance across all configurations.
Our study failed to replicate the original findings, suggesting poor model performance and questioning the predictive value of radiomics features in classifying PD-L1 expression from PET/CT imaging. These results highlight challenges in replicating radiomics-based ML models and stress the need for rigorous validation.

MCML Authors
Theresa Stüber (Clinical Data Science in Radiology)
Michael Ingrisch, Prof. Dr. (Clinical Data Science in Radiology)


[102]
M. Azampour, K. Mach, E. Fatemizadeh, B. Demiray, K. Westenfelder, K. Steiger, M. Eiber, T. Wendler, B. Kainz and N. Navab.
Multitask Weakly Supervised Generative Network for MR-US Registration.
IEEE Transactions on Medical Imaging 43.11 (Nov. 2024). DOI
Abstract

Registering pre-operative modalities, such as magnetic resonance imaging or computed tomography, to ultrasound images is crucial for guiding clinicians during surgeries and biopsies. Recently, deep-learning approaches have been proposed to increase the speed and accuracy of this registration problem. However, all of these approaches need expensive supervision from the ultrasound domain. In this work, we propose a multitask generative framework that needs weak supervision only from the pre-operative imaging domain during training. To perform a deformable registration, the proposed framework translates a magnetic resonance image to the ultrasound domain while preserving the structural content. To demonstrate the efficacy of the proposed method, we tackle the registration problem of pre-operative 3D MR to transrectal ultrasonography images as necessary for targeted prostate biopsies. We use an in-house dataset of 600 patients, divided into 540 for training, 30 for validation, and the remaining for testing. An expert manually segmented the prostate in both modalities for validation and test sets to assess the performance of our framework. The proposed framework achieves a 3.58 mm target registration error on the expert-selected landmarks, 89.2% in the Dice score, and 1.81 mm 95th percentile Hausdorff distance on the prostate masks in the test set. Our experiments demonstrate that the proposed generative model successfully translates magnetic resonance images into the ultrasound domain. The translated image contains the structural content and fine details due to an ultrasound-specific two-path design of the generative model. The proposed framework enables training learning-based registration methods while only weak supervision from the pre-operative domain is available.

MCML Authors
Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[101]
Y. Li, Y. Zhang, K. Kawaguchi, A. Khakzar, B. Bischl and M. Rezaei.
A Dual-Perspective Approach to Evaluating Feature Attribution Methods.
Transactions on Machine Learning Research (Nov. 2024). URL
Abstract

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model’s behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
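
As a concrete illustration of the perturbation-based lens discussed above (not the paper's soundness/completeness metrics), the sketch below occludes the top-attributed features of a toy model and measures the resulting score drop.

```python
# Minimal sketch of perturbation-based faithfulness testing: occlude the
# top-k attributed features and measure the drop in the model's score.
import numpy as np

def prediction(x):                       # toy "model": a fixed linear scorer
    w = np.array([3.0, -1.0, 0.5, 0.0])
    return float(w @ x)

def faithfulness_drop(x, attribution, k=2, baseline=0.0):
    top_k = np.argsort(-np.abs(attribution))[:k]    # most relevant features
    x_perturbed = x.copy()
    x_perturbed[top_k] = baseline                   # occlude them
    return prediction(x) - prediction(x_perturbed)  # large drop => faithful

x = np.array([1.0, 2.0, 3.0, 4.0])
attr = np.array([3.0, -2.0, 1.5, 0.0])              # some attribution map
print(faithfulness_drop(x, attr))
```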

MCML Authors
Link to website

Yawei Li

Statistical Learning and Data Science

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning and Data Science


[100]
F. Bongratz, M. Karmann, A. Holz, M. Bonhoeffer, V. Neumaier, S. Deli, B. Schmitz-Koep, C. Zimmer, C. Sorg, M. Thalhammer, D. M. Hedderich and C. Wachinger.
MLV2-Net: Rater-Based Majority-Label Voting for Consistent Meningeal Lymphatic Vessel Segmentation.
Preprint (Nov. 2024). arXiv
Abstract

Meningeal lymphatic vessels (MLVs) are responsible for the drainage of waste products from the human brain. An impairment in their functionality has been associated with aging as well as brain disorders like multiple sclerosis and Alzheimer’s disease. However, MLVs have only recently been described for the first time in magnetic resonance imaging (MRI), and their ramified structure renders manual segmentation particularly difficult. Further, as there is no consistent notion of their appearance, human-annotated MLV structures contain a high inter-rater variability that most automatic segmentation methods cannot take into account. In this work, we propose a new rater-aware training scheme for the popular nnU-Net model, and we explore rater-based ensembling strategies for accurate and consistent segmentation of MLVs. This enables us to boost nnU-Net’s performance while obtaining explicit predictions in different annotation styles and a rater-based uncertainty estimation. Our final model, MLV2-Net, achieves a Dice similarity coefficient of 0.806 with respect to the human reference standard. The model further matches the human inter-rater reliability and replicates age-related associations with MLV volume.
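
The core voting step can be illustrated with a minimal sketch (an assumed setup, not the MLV2-Net code): each rater-conditioned prediction casts a vote per voxel, and the majority decides the consensus mask.

```python
# Minimal sketch of rater-based majority-label voting for binary
# segmentation masks.
import numpy as np

def majority_vote(masks: np.ndarray) -> np.ndarray:
    """masks: (n_raters, H, W) binary arrays -> (H, W) consensus mask."""
    votes = masks.sum(axis=0)
    return (votes > masks.shape[0] / 2).astype(np.uint8)

rater_masks = np.stack([
    np.array([[1, 1], [0, 0]]),
    np.array([[1, 0], [0, 1]]),
    np.array([[1, 1], [0, 0]]),
])
print(majority_vote(rater_masks))   # [[1 1] [0 0]]
```

The per-voxel disagreement between raters can additionally serve as a simple uncertainty estimate.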

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[99]
V. Ehm, N. El Amrani, Y. Xie, L. Bastian, M. Gao, W. Wang, L. Sang, D. Cao, Z. Lähner, D. Cremers and F. Bernard.
Beyond Complete Shapes: A Quantitative Evaluation of 3D Shape Matching Algorithms.
Preprint (Nov. 2024). arXiv
Abstract

Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. While approaches based on machine learning dominate modern 3D shape matching, almost all existing (learning-based) methods require that at least one of the involved shapes is complete. In contrast, the most challenging and arguably most practically relevant setting of matching partially observed shapes is currently underexplored. One important factor is that existing datasets contain only a small number of shapes (typically below 100), which are unable to serve data-hungry machine learning approaches, particularly in the unsupervised regime. In addition, the type of partiality present in existing datasets is often artificial and far from realistic. To address these limitations and to encourage research on these relevant settings, we provide a generic and flexible framework for the procedural generation of challenging partial shape matching scenarios. Our framework allows for a virtually infinite generation of partial shape matching instances from a finite set of shapes with complete geometry. Further, we manually create cross-dataset correspondences between seven existing (complete geometry) shape matching datasets, leading to a total of 2543 shapes. Based on this, we propose several challenging partial benchmark settings, for which we evaluate respective state-of-the-art methods as baselines.

MCML Authors
Link to website

Viktoria Ehm

Computer Vision & Artificial Intelligence

Link to website

Lennart Bastian

Computer Aided Medical Procedures & Augmented Reality

Link to website

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[98]
M. Szép, D. Rückert, R. Eisenhart-Rothe and F. Hinterwimmer.
A Practical Guide to Fine-tuning Language Models with Limited Data.
Preprint (Nov. 2024). arXiv
Abstract

Employing pre-trained Large Language Models (LLMs) has become the de facto standard in Natural Language Processing (NLP) despite their extensive data requirements. Motivated by the recent surge in research focused on training LLMs with limited data, particularly in low-resource domains and languages, this paper surveys recent transfer learning approaches to optimize model performance in downstream tasks where data is scarce. We first address initial and continued pre-training strategies to better leverage prior knowledge in unseen domains and languages. We then examine how to maximize the utility of limited data during fine-tuning and few-shot learning. The final section takes a task-specific perspective, reviewing models and methods suited for different levels of data scarcity. Our goal is to provide practitioners with practical guidelines for overcoming the challenges posed by constrained data while also highlighting promising directions for future research.
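
One tactic surveyed in this line of work is to freeze the pre-trained encoder and tune only the task head when labeled data is scarce. Below is a hedged sketch using Hugging Face transformers; the model name and the two-example dataset are illustrative only.

```python
# Hedged sketch of low-data fine-tuning: freeze the pre-trained body,
# train only the classification head. Requires `transformers` and `torch`.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

for p in model.distilbert.parameters():   # freeze the body; tune the head only
    p.requires_grad = False

texts, labels = ["great result", "poor result"], torch.tensor([1, 0])
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad),
                        lr=1e-3)
for _ in range(5):                        # tiny illustrative training loop
    loss = model(**batch, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
print(float(loss))
```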

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[97]
O. Wysocki, Y. Tan, T. Froech, Y. Xia, M. Wysocki, L. Hoegner, D. Cremers and C. Holst.
ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset.
Preprint (Nov. 2024). arXiv
Abstract

Facade semantic segmentation is a long-standing challenge in photogrammetry and computer vision. Although the last decades have witnessed an influx of facade segmentation methods, there is a lack of comprehensive facade classes and of data covering the architectural variability. In ZAHA, we introduce the Level of Facade Generalization (LoFG): novel hierarchical facade classes designed based on international urban modeling standards, ensuring compatibility with challenging real-world classes and a uniform comparison of methods. Realizing the LoFG, we present the largest semantic 3D facade segmentation dataset to date, providing 601 million annotated points at five and 15 classes for LoFG2 and LoFG3, respectively. Moreover, we analyze the performance of baseline semantic segmentation methods on the introduced LoFG classes and data, complementing this with a discussion of the unresolved challenges for facade segmentation. We firmly believe that ZAHA will facilitate further development of 3D facade semantic segmentation methods, enabling the robust segmentation indispensable for creating urban digital twins.

MCML Authors
Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to website

Magdalena Wysocki

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[96]
A. Ranne, L. Kuang, Y. Velikova, N. Navab and F. Baena.
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers.
IROS 2024 - IEEE/RSJ International Conference on Intelligent Robots and Systems. Abu Dhabi, United Arab Emirates, Oct 14-18, 2024. DOI
Abstract

In minimally invasive endovascular procedures, contrast-enhanced angiography remains the most robust imaging technique. However, it is at the expense of the patient and clinician’s health due to prolonged radiation exposure. As an alternative, interventional ultrasound has notable benefits such as being radiation-free, fast to deploy, and having a small footprint in the operating room. Yet, ultrasound is hard to interpret, and highly prone to artifacts and noise. Additionally, interventional radiologists must undergo extensive training before they become qualified to diagnose and treat patients effectively, leading to a shortage of staff and a lack of open-source datasets. In this work, we seek to address both problems by introducing a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images, without demanding any labeled data. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism, and is capable of learning feature changes across time and space. To facilitate training, we used synthetic ultrasound data based on physics-driven catheter insertion simulations, and translated the data into a unique CT-Ultrasound common domain, CACTUSS, to improve the segmentation performance. We generated ground truth segmentation masks by computing the optical flow between adjacent frames using FlowNet2, and performed thresholding to obtain a binary map estimate. Finally, we validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicone aorta phantoms, thus demonstrating its potential for applications to clinical data in the future.
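
The pseudo-labeling idea can be sketched as follows; classical Farneback flow from OpenCV is substituted here for FlowNet2, and the frames are synthetic stand-ins.

```python
# Hedged sketch: estimate dense optical flow between adjacent frames and
# threshold its magnitude into a binary motion mask (pseudo-label).
# Requires `opencv-python`.
import cv2
import numpy as np

prev = (np.random.rand(128, 128) * 255).astype(np.uint8)  # stand-in frame
curr = np.roll(prev, 2, axis=1)                           # simulated motion

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
magnitude = np.linalg.norm(flow, axis=2)        # per-pixel motion speed
mask = (magnitude > 1.0).astype(np.uint8)       # binary pseudo-label
print(mask.sum(), "moving pixels")
```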

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[95]
F. Dülmer, W. Simson, M. F. Azampour, M. Wysocki, A. Karlas and N. Navab.
PHOCUS: Physics-Based Deconvolution for Ultrasound Resolution Enhancement.
ASMUS @MICCAI 2024 - 5th International Workshop on Advances in Simplifying Medical Ultrasound at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. ASMUS @MICCAI 2024 Best Paper. DOI
Abstract

Ultrasound is widely used in medical diagnostics allowing for accessible and powerful imaging but suffers from resolution limitations due to diffraction and the finite aperture of the imaging system, which restricts diagnostic use. The impulse function of an ultrasound imaging system is called the point spread function (PSF), which is convolved with the spatial distribution of reflectors in the image formation process. Recovering high-resolution reflector distributions by removing image distortions induced by the convolution process improves image clarity and detail. Conventionally, deconvolution techniques attempt to rectify the imaging system’s dependent PSF, working directly on the radio-frequency (RF) data. However, RF data is often not readily accessible. Therefore, we introduce a physics-based deconvolution process using a modeled PSF, working directly on the more commonly available B-mode images. By leveraging Implicit Neural Representations (INRs), we learn a continuous mapping from spatial locations to their respective echogenicity values, effectively compensating for the discretized image space. Our contribution consists of a novel methodology for retrieving a continuous echogenicity map directly from a B-mode image through a differentiable physics-based rendering pipeline for ultrasound resolution enhancement. We qualitatively and quantitatively evaluate our approach on synthetic data, demonstrating improvements over traditional methods in metrics such as PSNR and SSIM. Furthermore, we show qualitative enhancements on an ultrasound phantom and an in-vivo acquisition of a carotid artery.
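
A minimal sketch of such a physics-based forward model follows, under simplifying assumptions: a fixed Gaussian PSF and a small coordinate MLP stand in for the paper's full differentiable rendering pipeline.

```python
# Sketch: an implicit network maps (y, x) coordinates to echogenicity, a
# fixed Gaussian PSF is convolved over that map, and the result is fitted
# to the observed B-mode image by gradient descent.
import torch
import torch.nn.functional as F

H = W = 32
obs = torch.rand(1, 1, H, W)                     # stand-in B-mode image

xs = torch.linspace(-1, 1, W)
ys = torch.linspace(-1, 1, H)
coords = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), -1).reshape(-1, 2)

inr = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))  # continuous echogenicity

g = torch.exp(-(torch.arange(5) - 2.0) ** 2 / 2.0)  # 5x5 Gaussian PSF
psf = g[:, None] * g[None, :]
psf = (psf / psf.sum()).view(1, 1, 5, 5)

opt = torch.optim.Adam(inr.parameters(), lr=1e-2)
for _ in range(100):
    echo = inr(coords).view(1, 1, H, W)
    pred = F.conv2d(echo, psf, padding=2)        # PSF forward model
    loss = F.mse_loss(pred, obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```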

MCML Authors
Link to website

Felix Dülmer

Computer Aided Medical Procedures & Augmented Reality

Walter Simson

Dr.

* Former Member

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to website

Magdalena Wysocki

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[94]
F. De Benetti, Y. Yeganeh, C. Belka, S. Corradini, N. Navab, C. Kurz, G. Landry, S. Albarqouni and T. Wendler.
CloverNet – Leveraging Planning Annotations for Enhanced Procedural MR Segmentation: An Application to Adaptive Radiation Therapy.
CLIP @MICCAI 2024 - 13th International Workshop on Clinical Image-Based Procedures at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. CLIP @MICCAI 2024 Best Paper. DOI
Abstract

In radiation therapy (RT), an accurate delineation of the regions of interest (ROI) and organs at risk (OAR) allows for a more targeted irradiation with reduced side effects. The current clinical workflow for combined MR-linear accelerator devices (MR-linacs) requires the acquisition of a planning MR volume (MR-P), in which the ROI and OAR are accurately segmented by the clinical team. These segmentation maps (S-P) are transferred to the MR acquired on the day of the RT fraction (MR-Fx) using registration, followed by time-consuming manual corrections. The goal of this paper is to enable accurate automatic segmentation of MR-Fx using S-P without clinical workflow disruption. We propose a novel UNet-based architecture, CloverNet, that takes as inputs MR-Fx and S-P in two separate encoder branches, whose latent spaces are concatenated in the bottleneck to generate an improved segmentation of MR-Fx. CloverNet improves the absolute Dice Score by 3.73% (relative +4.34%, p<0.001) when compared with conventional 3D UNet. Moreover, we believe this approach is potentially applicable to other longitudinal use cases in which a prior segmentation of the ROI is available.
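
The two-encoder, concatenated-bottleneck idea can be sketched in a few lines of PyTorch; this is an assumption-level illustration, not the CloverNet architecture itself.

```python
# Sketch: encode the fraction-day MR and the planning segmentation
# separately, concatenate the latents in the bottleneck, decode a mask.
import torch
import torch.nn as nn

def encoder(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())

class DualBranchSeg(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc_img = encoder(1)   # MR-Fx branch
        self.enc_seg = encoder(1)   # S-P branch
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 2, stride=2))

    def forward(self, mr_fx, s_p):
        z = torch.cat([self.enc_img(mr_fx), self.enc_seg(s_p)], dim=1)
        return self.decoder(z)      # per-class logits

model = DualBranchSeg()
print(model(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)).shape)
```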

MCML Authors
Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[93]
D. Daum, R. Osuala, A. Riess, G. Kaissis, J. A. Schnabel and M. Di Folco.
On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models.
DGM4 @MICCAI 2024 - 4th International Workshop on Deep Generative Models at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Generally, the small size of public medical imaging datasets, coupled with stringent privacy concerns, hampers the advancement of data-hungry deep learning models in medical imaging. This study addresses these challenges for 3D cardiac MRI images in the short-axis view. We propose Latent Diffusion Models that generate synthetic images conditioned on medical attributes, while ensuring patient privacy through differentially private model training. To our knowledge, this is the first work to apply and quantify differential privacy in 3D medical image generation. We pre-train our models on public data and finetune them with differential privacy on the UK Biobank dataset. Our experiments reveal that pre-training significantly improves model performance, achieving a Fréchet Inception Distance (FID) of 26.77 at ϵ=10, compared to 92.52 for models without pre-training. Additionally, we explore the trade-off between privacy constraints and image quality, investigating how tighter privacy budgets affect output controllability and may lead to degraded performance. Our results demonstrate that proper consideration during training with differential privacy can substantially improve the quality of synthetic cardiac MRI images, but there are still notable challenges in achieving consistent medical realism.
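
Differentially private training of such models typically relies on DP-SGD. The following is a generic, hedged sketch of one DP-SGD step (per-sample gradient clipping plus Gaussian noise), not the paper's training stack.

```python
# Sketch of one DP-SGD step: clip each per-sample gradient to norm C,
# then add Gaussian noise scaled by a noise multiplier.
import torch

def dp_sgd_step(model, loss_fn, xs, ys, opt, clip_norm=1.0, noise_mult=1.1):
    grads = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xs, ys):                  # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
        scale = min(1.0, clip_norm / (float(norm) + 1e-12))
        for g, p in zip(grads, model.parameters()):
            g += p.grad * scale               # clipped contribution
    for g, p in zip(grads, model.parameters()):
        noise = torch.randn_like(g) * noise_mult * clip_norm
        p.grad = (g + noise) / len(xs)        # noisy averaged gradient
    opt.step()

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
dp_sgd_step(model, torch.nn.CrossEntropyLoss(),
            torch.rand(8, 4), torch.randint(0, 2, (8,)), opt)
```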

MCML Authors
Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[92]
A. Riess, A. Ziller, S. Kolek, D. Rückert, J. A. Schnabel and G. Kaissis.
Complex-Valued Federated Learning with Differential Privacy and MRI Applications.
DeCaF @MICCAI 2024 - 5th Workshop on Distributed, Collaborative and Federated Learning at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

Federated learning enhanced with Differential Privacy (DP) is a powerful privacy-preserving strategy to protect individuals sharing their sensitive data for processing in fields such as medicine and healthcare. Many medical applications, for example magnetic resonance imaging (MRI), rely on complex-valued signal processing techniques for data acquisition and analysis. However, the appropriate application of DP to complex-valued data is still underexplored. To address this issue, from the theoretical side, we introduce the complex-valued Gaussian mechanism, whose behaviour we characterise in terms of f-DP, (ε, δ)-DP and Rényi-DP. Moreover, we generalise the fundamental algorithm DP stochastic gradient descent to complex-valued neural networks and present novel complex-valued neural network primitives compatible with DP. Experimentally, we showcase a proof-of-concept by training federated complex-valued neural networks with DP on a real-world task (MRI pulse sequence classification in k-space), yielding excellent utility and privacy. Our results highlight the relevance of combining federated learning with robust privacy-preserving techniques in the MRI context.
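
Under the natural convention of splitting the noise power evenly between the real and imaginary parts, a complex-valued Gaussian mechanism can be sketched as follows; calibrating `sigma_mult` to a concrete privacy budget is omitted and would follow the paper's analysis.

```python
# Minimal numpy sketch (illustrative assumptions, not the paper's exact
# calibration): perturb a complex query output with i.i.d. Gaussian noise
# on the real and imaginary parts, scaled to the query's L2 sensitivity.
import numpy as np

def complex_gaussian_mechanism(value: np.ndarray, sensitivity: float,
                               sigma_mult: float,
                               rng=np.random.default_rng()):
    sigma = sigma_mult * sensitivity
    noise = rng.normal(0, sigma / np.sqrt(2), value.shape) \
          + 1j * rng.normal(0, sigma / np.sqrt(2), value.shape)
    return value + noise        # noise std split evenly across Re and Im

kspace = np.fft.fft2(np.random.rand(8, 8))   # toy complex-valued signal
print(complex_gaussian_mechanism(kspace, sensitivity=1.0, sigma_mult=2.0)[0, 0])
```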

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine

Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Artificial Intelligence in Healthcare and Medicine


[91]
Y. Yeganeh, R. Lazuardi, A. Shamseddin, E. Dari, Y. Thirani, N. Navab and A. Farshad.
VISAGE: Video Synthesis using Action Graphs for Surgery.
EARTH @MICCAI 2024 - Workshop on Embodied AI and Robotics for HealTHcare at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. EARTH @MICCAI 2024 Best Paper. DOI
Abstract

Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts the future frames given only a single initial frame, and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures, which enables various applications in SDS.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[90]
Ç. Köksal, G. Ghazaei, F. Holm, A. Farshad and N. Navab.
SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction.
GRAIL @MICCAI 2024 - 6th Workshop on GRaphs in biomedicAl Image anaLysis at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. GRAIL @MICCAI 2024 Best Paper. arXiv
Abstract

Graph-based holistic scene representations facilitate surgical workflow understanding and have recently demonstrated significant success. However, this task is often hindered by the limited availability of densely annotated surgical scene data. In this work, we introduce an end-to-end framework for the generation and optimization of surgical scene graphs on a downstream task. Our approach leverages the flexibility of graph-based spectral clustering and the generalization capability of foundation models to generate unsupervised scene graphs with learnable properties. We reinforce the initial spatial graph with sparse temporal connections using local matches between consecutive frames to predict temporally consistent clusters across a temporal neighborhood. By jointly optimizing the spatiotemporal relations and node features of the dynamic scene graph with the downstream task of phase segmentation, we address the costly and annotation-burdensome task of semantic scene comprehension and scene graph generation in surgical videos using only weak surgical phase labels. Further, by incorporating effective intermediate scene representation disentanglement steps within the pipeline, our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow recognition.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[89]
A. H. Berger, L. Lux, N. Stucki, V. Bürgin, S. Shit, A. Banaszaka, D. Rückert, U. Bauer and J. C. Paetzold.
Topologically faithful multi-class segmentation in medical images.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology, making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.
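
The projection step, reducing the N-class problem to N one-vs-rest binary tasks, is simple to state in code; the persistence-based loss itself is beyond a short sketch.

```python
# Minimal sketch of the class-projection step: split an N-class label map
# into N binary masks, to which a single-parameter persistence-based loss
# can then be applied per class.
import numpy as np

def project_to_binary(label_map: np.ndarray, n_classes: int):
    """(H, W) integer labels -> list of N binary (H, W) masks."""
    return [(label_map == c).astype(np.uint8) for c in range(n_classes)]

pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
for c, mask in enumerate(project_to_binary(pred, 3)):
    print(f"class {c}: {mask.sum()} foreground pixels")
```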

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to website

Nico Stucki

Applied Topology and Geometry

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Ulrich Bauer

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry


[88]
M. Domínguez, Y. Velikova, N. Navab and M. F. Azampour.
Diffusion as Sound Propagation: Physics-Inspired Model for Ultrasound Image Generation.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Deep learning (DL) methods typically require large datasets to effectively learn data distributions. However, in the medical field, data is often limited in quantity, and acquiring labeled data can be costly. To mitigate this data scarcity, data augmentation techniques are commonly employed. Among these techniques, generative models play a pivotal role in expanding datasets. However, when it comes to ultrasound (US) imaging, the authenticity of generated data often diminishes due to the oversight of ultrasound physics.
We propose a novel approach to improve the quality of generated US images by introducing a physics-based diffusion model that is specifically designed for this image modality. The proposed model incorporates an US-specific scheduler scheme that mimics the natural behavior of sound wave propagation in ultrasound imaging. Our analysis demonstrates how the proposed method aids in modeling the attenuation dynamics in US imaging. We present both qualitative and quantitative results based on standard generative model metrics, showing that our proposed method results in overall more plausible images.

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality


[87]
S. M. Fischer, L. Felsner, R. Osuala, J. Kiechle, D. M. Lang, J. C. Peeken and J. A. Schnabel.
Progressive Growing of Patch Size: Resource-Efficient Curriculum Learning for Dense Prediction Tasks.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

In this work, we introduce Progressive Growing of Patch Size, a resource-efficient implicit curriculum learning approach for dense prediction tasks. Our curriculum approach is defined by growing the patch size during model training, which gradually increases the task’s difficulty. We integrated our curriculum into the nnU-Net framework and evaluated the methodology on all 10 tasks of the Medical Segmentation Decathlon. With our approach, we are able to substantially reduce runtime, computational costs, and emissions of network training compared to classical constant patch size training. In our experiments, the curriculum approach resulted in improved convergence. We are able to outperform standard nnU-Net training, which is trained with constant patch size, in terms of Dice Score on 7 out of 10 MSD tasks while only spending roughly 50% of the original training runtime. To the best of our knowledge, our Progressive Growing of Patch Size is the first successful employment of a sample-length curriculum in the form of patch size in the field of computer vision.
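
A minimal sketch of a patch-size curriculum under an assumed linear schedule follows; the nnU-Net integration and exact schedule are the paper's own.

```python
# Sketch: the random crop fed to the network grows with the epoch, so
# early epochs see small, easier patches.
import numpy as np

def sample_patch(image: np.ndarray, patch_size: int,
                 rng=np.random.default_rng()):
    h = rng.integers(0, image.shape[0] - patch_size + 1)
    w = rng.integers(0, image.shape[1] - patch_size + 1)
    return image[h:h + patch_size, w:w + patch_size]

def patch_size_for_epoch(epoch, max_epochs, min_size=32, max_size=128):
    frac = epoch / max(1, max_epochs - 1)
    size = int(min_size + frac * (max_size - min_size))
    return size // 8 * 8          # keep a multiple of 8 for pooling layers

image = np.random.rand(256, 256)
for epoch in range(0, 10, 3):
    p = patch_size_for_epoch(epoch, 10)
    print(epoch, p, sample_patch(image, p).shape)
```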

MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[86]
Y. Li, I. Yakushev, D. M. Hedderich and C. Wachinger.
PASTA: Pathology-Aware MRI to PET Cross-Modal Translation with Diffusion Models.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Positron emission tomography (PET) is a well-established functional imaging technique for diagnosing brain disorders. However, PET’s high costs and radiation exposure limit its widespread use. In contrast, magnetic resonance imaging (MRI) does not have these limitations. Although it also captures neurodegenerative changes, MRI is a less sensitive diagnostic tool than PET. To close this gap, we aim to generate synthetic PET from MRI. Herewith, we introduce PASTA, a novel pathology-aware image translation framework based on conditional diffusion models. Compared to the state-of-the-art methods, PASTA excels in preserving both structural and pathological details in the target modality, which is achieved through its highly interactive dual-arm architecture and multi-modal condition integration. A cycle exchange consistency and volumetric generation strategy elevate PASTA’s capability to produce high-quality 3D PET scans. Our qualitative and quantitative results confirm that the synthesized PET scans from PASTA not only reach the best quantitative scores but also preserve the pathology correctly. For Alzheimer’s classification, the performance of synthesized scans improves over MRI by 4%, almost reaching the performance of actual PET.

MCML Authors
Link to website

Yitong Li

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[85]
A. Reithmeir, L. Felsner, R. Braren, J. A. Schnabel and V. A. Zimmer.
Data-Driven Tissue- and Subject-Specific Elastic Regularization for Medical Image Registration.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Physics-inspired regularization is desired for intra-patient image registration since it can effectively capture the biomechanical characteristics of anatomical structures. However, a major challenge lies in the reliance on physical parameters: Parameter estimations vary widely across the literature, and the physical properties themselves are inherently subject-specific. In this work, we introduce a novel data-driven method that leverages hypernetworks to learn the tissue-dependent elasticity parameters of an elastic regularizer. Notably, our approach facilitates the estimation of patient-specific parameters without the need to retrain the network. We evaluate our method on three publicly available 2D and 3D lung CT and cardiac MR datasets. We find that with our proposed subject-specific tissue-dependent regularization, a higher registration quality is achieved across all datasets compared to using a global regularizer.

MCML Authors
Link to website

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[84]
O. Tmenova, Y. Velikova, M. Saleh and N. Navab.
Deep Spectral Methods for Unsupervised Ultrasound Image Interpretation.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

Ultrasound imaging is challenging to interpret due to non-uniform intensities, low contrast, and inherent artifacts, necessitating extensive training for non-specialists. Advanced representation with clear tissue structure separation could greatly assist clinicians in mapping underlying anatomy and distinguishing between tissue layers. Decomposing an image into semantically meaningful segments is mainly achieved using supervised segmentation algorithms. Unsupervised methods are beneficial, as acquiring large labeled datasets is difficult and costly, but despite their advantages they remain underexplored in ultrasound. This paper proposes a novel unsupervised deep learning strategy tailored to ultrasound to obtain easily interpretable tissue separations. We integrate key concepts from unsupervised deep spectral methods, which combine spectral graph theory with deep learning methods. We utilize self-supervised transformer features for spectral clustering to generate meaningful segments based on ultrasound-specific metrics and shape and positional priors, ensuring semantic consistency across the dataset. We evaluate our unsupervised deep learning strategy on three ultrasound datasets, showcasing qualitative results across anatomical contexts without label requirements. We also conduct a comparative analysis against other clustering algorithms to demonstrate superior segmentation performance, boundary preservation, and label consistency.
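
The generic deep spectral pipeline, an affinity graph over features, Laplacian eigenvectors, then clustering, can be sketched as follows; the features here are random stand-ins for self-supervised transformer features.

```python
# Sketch of the deep spectral recipe (generic pipeline, not the paper's
# implementation): affinity graph -> graph Laplacian eigenvectors ->
# k-means over the spectral embedding.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))    # stand-in transformer patch features

dists = np.square(feats[:, None] - feats[None, :]).sum(-1)
affinity = np.exp(-dists / 16.0)      # Gaussian-kernel affinity graph
degree = np.diag(affinity.sum(axis=1))
laplacian = degree - affinity

_, eigvecs = eigh(laplacian)          # eigenvalues in ascending order
spectral_embedding = eigvecs[:, 1:4]  # skip the trivial constant eigenvector
segments = KMeans(n_clusters=3, n_init=10).fit_predict(spectral_embedding)
print(np.bincount(segments))
```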

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[83]
H. Zerouaoui, G. P. Oderinde, R. Lefdali, K. Echihabi, S. P. Akpulu, N. A. Agbon, A. S. Musa, Y. Yeganeh, A. Farshad and N. Navab.
AMONuSeg: A Histological Dataset for African Multi-organ Nuclei Semantic Segmentation.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Nuclei semantic segmentation is a key component for advancing machine learning and deep learning applications in digital pathology. However, most existing segmentation models are trained and tested on high-quality data acquired with expensive equipment, such as whole slide scanners, which are not accessible to most pathologists in developing countries. These pathologists rely on low-resource data acquired with low-precision microscopes, smartphones, or digital cameras, which have different characteristics and challenges than high-resource data. Therefore, there is a gap between the state-of-the-art segmentation models and the real-world needs of low-resource settings. This work aims to bridge this gap by presenting the first fully annotated African multi-organ dataset for histopathology nuclei semantic segmentation acquired with a low-precision microscope. We also evaluate state-of-the-art segmentation models, including spectral feature extraction encoder and vision transformer-based models, and stain normalization techniques for color normalization of Hematoxylin and Eosin-stained histopathology slides. Our results provide important insights for future research on nuclei histopathology segmentation with low-resource data.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[82]
E. Özsoy, C. Pellegrini, M. Keicher and N. Navab.
ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. Main Conference Best Paper Runner-up. DOI GitHub
Abstract

Every day, countless surgeries are performed worldwide, each within the distinct settings of operating rooms (ORs) that vary not only in their setups but also in the personnel, tools, and equipment used. This inherent diversity poses a substantial challenge for achieving a holistic understanding of the OR, as it requires models to generalize beyond their initial training datasets. To reduce this gap, we introduce ORacle, an advanced vision-language model designed for holistic OR domain modeling, which incorporates multi-view and temporal capabilities and can leverage external knowledge during inference, enabling it to adapt to previously unseen surgical scenarios. This capability is further enhanced by our novel data augmentation framework, which significantly diversifies the training dataset, ensuring ORacle’s proficiency in applying the provided knowledge effectively. In rigorous testing, in scene graph generation, and downstream tasks on the 4D-OR dataset, ORacle not only demonstrates state-of-the-art performance but does so requiring less data than existing models. Furthermore, its adaptability is displayed through its ability to interpret unseen views, actions, and appearances of tools and equipment. This demonstrates ORacle’s potential to significantly enhance the scalability and affordability of OR domain modeling and opens a pathway for future advancements in surgical data science.

MCML Authors
Link to website

Ege Özsoy

Computer Aided Medical Procedures & Augmented Reality

Link to website

Chantal Pellegrini

Computer Aided Medical Procedures & Augmented Reality

Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[81]
M. Fischer, P. Neher, T. Wald, S. Dias Almeida, S. Xiao, P. J. Schüffler, R. Braren, M. Götz, A. Muckenhuber, J. Kleesiek, M. Nolden and K. Maier-Hein.
Learned Image Compression for HE-Stained Histopathological Images via Stain Deconvolution.
MOVI @MICCAI 2024 - 2nd International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Processing histopathological Whole Slide Images (WSI) leads to massive storage requirements for clinics worldwide. Even after lossy image compression during image acquisition, additional lossy compression is frequently possible without substantially affecting the performance of deep learning-based (DL) downstream tasks. In this paper, we show that the commonly used JPEG algorithm is not best suited for further compression and we propose Stain Quantized Latent Compression (SQLC), a novel DL based histopathology data compression approach. SQLC compresses staining and RGB channels before passing them through a compression autoencoder (CAE) in order to obtain quantized latent representations for maximizing the compression. We show that our approach yields superior performance in a classification downstream task, compared to traditional approaches like JPEG, while image quality metrics like the Multi-Scale Structural Similarity Index (MS-SSIM) are largely preserved.
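
Since SQLC operates on staining channels, it helps to recall classical H&E stain deconvolution (in the Ruifrok-Johnston style); the stain matrix below uses standard literature values, not the paper's.

```python
# Sketch of classical H&E stain deconvolution: convert RGB to optical
# density (Beer-Lambert), then invert the stain matrix to recover
# per-pixel stain concentrations.
import numpy as np

# Normalized stain vectors: hematoxylin, eosin, and a residual channel.
stains = np.array([[0.650, 0.704, 0.286],
                   [0.072, 0.990, 0.105],
                   [0.268, 0.570, 0.776]])
deconv = np.linalg.inv(stains)          # OD = C @ stains  =>  C = OD @ inv

rgb = np.clip(np.random.rand(4, 4, 3), 0.01, 1.0)  # stand-in H&E tile in [0,1]
od = -np.log(rgb)                                  # optical density
concentrations = od.reshape(-1, 3) @ deconv        # per-pixel stain amounts
print(concentrations.reshape(4, 4, 3)[0, 0])
```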

MCML Authors
Link to Profile Peter Schüffler

Peter Schüffler

Prof. Dr.

Computational Pathology


[80]
D. Bani-Harouni, N. Navab and M. Keicher.
MAGDA: Multi-agent Guideline-Driven Diagnostic Assistance.
MedAGI @MICCAI 2024 - 2nd International Workshop on Foundation Models for General Medical AI at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

In emergency departments, rural hospitals, or clinics in less developed regions, clinicians often lack fast image analysis by trained radiologists, which can have a detrimental effect on patients’ healthcare. Large Language Models (LLMs) have the potential to alleviate some pressure from these clinicians by providing insights that can help them in their decision-making. While these LLMs achieve high test results on medical exams showcasing their great theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new approach for zero-shot guideline-driven decision support. We model a system of multiple LLM agents augmented with a contrastive vision-language model that collaborate to reach a patient diagnosis. After providing the agents with simple diagnostic guidelines, they will synthesize prompts and screen the image for findings following these guidelines. Finally, they provide understandable chain-of-thought reasoning for their diagnosis, which is then self-refined to consider inter-dependencies between diseases. As our method is zero-shot, it is adaptable to settings with rare diseases, where training data is limited, but expert-crafted disease descriptions are available. We evaluate our method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showcasing performance improvement over existing zero-shot methods and generalizability to rare diseases.

MCML Authors
Link to website

David Bani-Harouni

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality


[79]
D. Grzech, L. Le Folgoc, M. F. Azampour, A. Vlontzos, B. Glocker, N. Navab, J. A. Schnabel and B. Kainz.
Unsupervised Similarity Learning for Image Registration with Energy-Based Models.
WBIR @MICCAI 2024 - 11th International Workshop on Biomedical Image Registration at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

We present a new model for deformable image registration, which learns in an unsupervised way a data-specific similarity metric. The proposed method consists of two neural networks, one that maps pairs of input images to transformations which align them, and one that provides the similarity metric whose maximisation guides the image alignment. We parametrise the similarity metric as an energy-based model, which is simple to train and allows us to improve the accuracy of image registration compared to other models with learnt similarity metrics by taking advantage of a more general mathematical formulation, as well as larger datasets. We also achieve substantial improvement in the accuracy of inter-patient image registration on MRI scans from the OASIS dataset compared to models that rely on traditional functions.

MCML Authors
Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[78]
B. Jian, J. Pan, M. Ghahremani, D. Rückert, C. Wachinger and B. Wiestler.
Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration.
WBIR @MICCAI 2024 - 11th International Workshop on Biomedical Image Registration at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

VoxelMorph, proposed in 2018, utilizes Convolutional Neural Networks (CNNs) to address medical image registration problems. In 2021 TransMorph advanced this approach by replacing CNNs with Attention mechanisms, claiming enhanced performance. More recently, the rise of Mamba with selective state space models has led to MambaMorph, which substituted Attention with Mamba blocks, asserting superior registration. These developments prompt a critical question: does chasing the latest computational trends with “more advanced” computational blocks genuinely enhance registration accuracy, or is it merely hype? Furthermore, the role of classic high-level registration-specific designs, such as coarse-to-fine pyramid mechanism, correlation calculation, and iterative optimization, warrants scrutiny, particularly in differentiating their influence from the aforementioned low-level computational blocks. In this study, we critically examine these questions through a rigorous evaluation in brain MRI registration. We employed modularized components for each block and ensured unbiased comparisons across all methods and designs to disentangle their effects on performance. Our findings indicate that adopting “advanced” computational elements fails to significantly improve registration accuracy. Instead, well-established registration-specific designs offer fair improvements, enhancing results by a marginal 1.5% over the baseline. Our findings emphasize the importance of rigorous, unbiased evaluation and contribution disentanglement of all low- and high-level registration components, rather than simply following the computer vision trends with “more advanced” computational blocks. We advocate for simpler yet effective solutions and novel evaluation metrics that go beyond conventional registration accuracy, warranting further research across various organs and modalities.

MCML Authors
Link to website

Bailiang Jian

Artificial Intelligence in Medical Imaging

Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[77]
M. M. Heimer, Y. Dikhtyar, B. F. Hoppe, F. L. Herr, A. T. Stüber, T. Burkard, E. Zöller, M. P. Fabritius, L. Unterrainer, L. Adams, A. Thurner, D. Kaufmann, T. Trzaska, M. Kopp, O. Hamer, K. Maurer, I. Ristow, M. S. May, A. Tufman, J. Spiro, M. Brendel, M. Ingrisch, J. Ricke and C. C. Cyran.
Software-assisted structured reporting and semi-automated TNM classification for NSCLC staging in a multicenter proof of concept study.
Insights into Imaging 15.258 (Oct. 2024). DOI
Abstract

In this multi-center study, we proposed a structured reporting (SR) framework for non-small cell lung cancer (NSCLC) and developed a software-assisted tool to automatically translate image-based findings and annotations into TNM classifications. The aim of this study was to validate the software-assisted SR tool for NSCLC, assess its potential clinical impact in a proof-of-concept study, and evaluate current reporting standards in participating institutions.

MCML Authors
Link to website

Theresa Stüber

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[76]
S. Lüpke, Y. Yeganeh, E. Adeli, N. Navab and A. Farshad.
Physics-Informed Latent Diffusion for Multimodal Brain MRI Synthesis.
Preprint (Oct. 2024). arXiv
Abstract

Recent advances in generative models for medical imaging have shown promise in representing multiple modalities. However, the variability in modality availability across datasets limits the general applicability of the synthetic data they produce. To address this, we present a novel physics-informed generative model capable of synthesizing a variable number of brain MRI modalities, including those not present in the original dataset. Our approach utilizes latent diffusion models and a two-step generative process: first, unobserved physical tissue property maps are synthesized using a latent diffusion model, and then these maps are combined with a physical signal model to generate the final MRI scan. Our experiments demonstrate the efficacy of this approach in generating unseen MR contrasts and preserving physical plausibility. Furthermore, we validate the distributions of generated tissue properties by comparing them to those measured in real brain tissue.
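
The second step, mapping tissue property maps through a physical signal model, can be illustrated with the textbook spoiled gradient-echo equation; this is a generic stand-in, and the paper's actual signal model may differ.

```python
# Sketch: combine tissue property maps (PD, T1, T2*) with a closed-form
# steady-state spoiled gradient-echo signal equation.
import numpy as np

def spoiled_gre_signal(pd, t1, t2s, tr=0.025, te=0.005, flip_deg=15.0):
    """Steady-state spoiled GRE magnitude signal from tissue property maps."""
    a = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    return pd * np.sin(a) * (1 - e1) / (1 - np.cos(a) * e1) * np.exp(-te / t2s)

pd  = np.full((4, 4), 0.8)    # proton density (a.u.)
t1  = np.full((4, 4), 1.2)    # T1 in seconds (grey-matter-like)
t2s = np.full((4, 4), 0.05)   # T2* in seconds
print(spoiled_gre_signal(pd, t1, t2s)[0, 0])
```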

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[75]
P. Müller, G. Kaissis and D. Rückert.
ChEX: Interactive Localization and Region Description in Chest X-rays.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI GitHub
Abstract

Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX’s interactive capabilities.

MCML Authors
Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[74]
G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI
Abstract

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Our code and models are open-sourced.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to website

Ruotong Liao

Database Systems and Data Mining

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[73]
A. Mittermeier, M. Aßenmacher, B. Schachtner, S. Grosu, V. Dakovic, V. Kandratovich, B. Sabel and M. Ingrisch.
Automatische ICD-10-Codierung.
Die Radiologie 64 (Aug. 2024). DOI
Abstract

Background: Medical coding of radiology reports is essential for good quality of care and correct billing, but at the same time it is a laborious and error-prone task.
Objective: To assess the applicability of natural language processing (NLP) for ICD-10 coding of German-language radiology reports by fine-tuning suitable language models.
Materials and methods: In this retrospective study, all magnetic resonance imaging (MRI) reports from our institution between 2010 and 2020 were considered. The ICD-10 codes at discharge were matched to the corresponding reports to create a dataset for multiclass classification. Fine-tuning of GermanBERT and flanT5 was performed on the full dataset (dstotal) with 1035 distinct ICD-10 codes and on two reduced datasets containing the 100 (ds100) and 50 (ds50) most frequent codes. Model performance was evaluated with top-k accuracy for k = 1, 3, 5. In an ablation study, both models were also trained on the associated metadata and on the report alone.
Results: The full dataset consisted of 100,672 radiology reports; the reduced datasets comprised 68,103 (ds100) and 52,293 (ds50) reports. Model performance increased when several of the model's top predictions were considered, when the number of target classes was reduced, and when the metadata were combined with the report. FlanT5 outperformed GermanBERT across all datasets and metrics and is best suited as a medical coding assistant, reaching a top-3 accuracy of almost 70% on the realistic dataset dstotal.
Conclusion: Fine-tuning language models promises reliable prediction of ICD-10 codes for German radiology MRI reports across different scenarios. As a coding assistant, flanT5 can help medical coders make informed decisions and potentially reduce their workload.
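
The top-k accuracy used in this evaluation is straightforward to compute; below is a generic helper with synthetic scores, not the study's code.

```python
# Sketch of top-k accuracy: a prediction counts as correct if the true
# ICD-10 code appears among the model's k highest-scoring codes.
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    top_k = np.argsort(-scores, axis=1)[:, :k]    # best k codes per report
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

rng = np.random.default_rng(0)
scores = rng.random((6, 50))                # 6 reports, 50 candidate codes
labels = rng.integers(0, 50, size=6)
for k in (1, 3, 5):
    print(k, top_k_accuracy(scores, labels, k))
```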

MCML Authors
Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to website

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Link to website

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[72]
R. Klaar, M. Rabe, A. T. Stüber, S. Hering, S. Corradini, C. Eze, S. Marschner, C. Belka, G. Landry, J. Dinkel and C. Kurz.
MRI-based ventilation and perfusion imaging to predict radiation-induced pneumonitis in lung tumor patients at a 0.35T MR-Linac.
Radiotherapy and Oncology (Aug. 2024). DOI
Abstract

Radiation-induced pneumonitis (RP), diagnosed 6–12 weeks after treatment, is a complication of lung tumor radiotherapy. So far, clinical and dosimetric parameters have not been reliable in predicting RP. We propose using non-contrast enhanced magnetic resonance imaging (MRI) based functional parameters acquired over the treatment course for patient stratification for improved follow-up.

MCML Authors
Link to website

Theresa Stüber

Clinical Data Science in Radiology


[71]
T. Löhr, M. Ingrisch and E. Hüllermeier.
Towards Aleatoric and Epistemic Uncertainty in Medical Image Classification.
AIME 2024 - 22nd International Conference on Artificial Intelligence in Medicine. Salt Lake City, UT, USA, Jul 09-12, 2024. DOI
Abstract

Medical domain applications require a detailed understanding of the decision making process, in particular when data-driven modeling via machine learning is involved, and quantifying uncertainty in the process adds trust and interpretability to predictive models. However, current uncertainty measures in medical imaging are mostly monolithic and do not distinguish between different sources and types of uncertainty. In this paper, we advocate the distinction between so-called aleatoric and epistemic uncertainty in the medical domain and illustrate its potential in clinical decision making for the case of PET/CT image classification.
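
One widely used way to make this distinction operational (a common construction, not necessarily the paper's exact measure) is the entropy decomposition over an ensemble of predictors: total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the members, and the epistemic part is their difference, the mutual information. A NumPy sketch:

    import numpy as np

    def uncertainty_decomposition(probs: np.ndarray):
        """probs: (n_members, n_classes) class probabilities predicted by
        each ensemble member for a single input."""
        eps = 1e-12
        mean_p = probs.mean(axis=0)
        total = -(mean_p * np.log(mean_p + eps)).sum()                  # entropy of the mean
        aleatoric = -(probs * np.log(probs + eps)).sum(axis=1).mean()   # mean member entropy
        epistemic = total - aleatoric                                   # mutual information
        return total, aleatoric, epistemic

    # members agree -> low epistemic; members disagree -> high epistemic
    agree = np.array([[0.9, 0.1], [0.88, 0.12], [0.92, 0.08]])
    disagree = np.array([[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]])
    print(uncertainty_decomposition(agree))
    print(uncertainty_decomposition(disagree))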

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning


[70]
F. Bongratz, V. Golkov, L. Mautner, L. Della Libera, F. Heetmeyer, F. Czaja, J. Rodemann and D. Cremers.
How to Choose a Reinforcement-Learning Algorithm.
Preprint (Jul. 2024). arXiv GitHub
Abstract

The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we streamline the process of choosing reinforcement-learning algorithms and action-distribution families. We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods.

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to website

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[69]
Y. Chen, Y. Di, G. Zhai, F. Manhardt, C. Zhang, R. Zhang, F. Tombari, N. Navab and B. Busam.
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI
Abstract

Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with DINOv2 features to establish a consistent object representation under SE(3) transformations, facilitating the mapping from camera space to the pre-defined canonical space, thus further enhancing pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state-of-the-art. Moreover, on the more complex dataset HouseCat6D, which provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[68]
M. Ghahremani, M. Khateri, B. Jian, B. Wiestler, E. Adeli and C. Wachinger.
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI GitHub
Abstract

This paper introduces a novel top-down representation approach for deformable image registration, which estimates the deformation field by capturing various short- and long-range flow features at different scale levels. As a Hierarchical Vision Transformer (H-ViT), we propose a dual self-attention and cross-attention mechanism that uses high-level features in the deformation field to represent low-level ones, enabling information streams in the deformation field across all voxel patch embeddings irrespective of their spatial proximity. Since high-level features contain abstract flow patterns, such patterns are expected to effectively contribute to the representation of the deformation field in lower scales. When the self-attention module utilizes within-scale short-range patterns for representation, the cross-attention modules dynamically look for the key tokens across different scales to further interact with the local query voxel patches. Our method shows superior accuracy and visual quality over the state-of-the-art registration methods on five publicly available datasets, highlighting a substantial enhancement in the performance of medical imaging registration.

MCML Authors
Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Medical Imaging

Link to website

Bailiang Jian

Artificial Intelligence in Medical Imaging

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[67]
J. Huang, H. Yu, K.-T. Yu, N. Navab, S. Ilic and B. Busam.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI
Abstract

Recent learning methods for object pose estimation require resource-intensive training for each individual object instance or category, hampering their scalability in real applications when confronted with previously unseen objects. In this paper, we propose MatchU, a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images. MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects. We rely on learning geometric 3D descriptors that are rotation-invariant by design. By encoding pose-agnostic geometry, the learned descriptors naturally generalize to unseen objects and capture symmetries. To tackle ambiguous associations using 3D geometry only, we fuse additional RGB information into our descriptor. This is achieved through a novel attention-based mechanism that fuses cross-modal information, together with a matching loss that leverages the latent space learned from RGB data to guide the descriptor learning process. Extensive experiments reveal the generalizability of both the RGB-D fusion strategy as well as the descriptor efficacy. Benefiting from the novel designs, MatchU surpasses all existing methods by a significant margin in terms of both accuracy and speed, even without the requirement of expensive re-training or rendering.

MCML Authors
Link to website

Junwen Huang

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[66]
H. Jung, S.-C. Wu, P. Ruhkamp, G. Zhai, H. Schieber, G. Rizzoli, P. Wang, H. Zhao, L. Garattoni, D. Roth, S. Meier, N. Navab and B. Busam.
HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI
Abstract

Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. Addressing this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[65]
S. M. Fischer, J. Kiechle, D. M. Lang, J. C. Peeken and J. A. Schnabel.
Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge.
Machine Learning for Biomedical Imaging 2 (Jun. 2024). DOI GitHub
Abstract

Pathological lymph node delineation is crucial in cancer diagnosis, progression assessment, and treatment planning. The MICCAI 2023 Lymph Node Quantification Challenge published the first public dataset for pathological lymph node segmentation in the mediastinum. As lymph node annotations are expensive, the challenge was formed as a weakly supervised learning task, where only a subset of all lymph nodes in the training set have been annotated. For the challenge submission, multiple methods for training on these weakly supervised data were explored, including noisy label training, loss masking of unlabeled data, and an approach that integrated the TotalSegmentator toolbox as a form of pseudo labeling in order to reduce the number of unknown voxels. Furthermore, multiple public TCIA datasets were incorporated into the training to improve the performance of the deep learning model. Our submitted model achieved a Dice score of 0.628 and an average symmetric surface distance of 5.8 mm on the challenge test set. With our submitted model, we accomplished the third rank in the MICCAI2023 LNQ challenge. A finding of our analysis was that the integration of all visible lymph nodes, including non-pathological ones, improved the overall segmentation performance on pathological lymph nodes of the test set. Furthermore, segmentation models trained only on clinically enlarged lymph nodes, as given in the challenge scenario, could not generalize to smaller pathological lymph nodes.
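
Among the strategies listed, loss masking is the easiest to sketch: voxels whose annotation status is unknown are simply excluded from the loss. A PyTorch toy version for binary segmentation (the `known` mask is a hypothetical stand-in for the challenge's annotation map):

    import torch
    import torch.nn.functional as F

    def masked_bce_loss(logits, target, known):
        """logits, target, known: tensors of identical shape.
        `known` is 1 where the voxel label is trusted, 0 where it is unknown."""
        loss = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
        # zero out unknown voxels and average only over the known ones
        return (loss * known).sum() / known.sum().clamp(min=1)

    logits = torch.randn(1, 1, 8, 8, 8)                 # toy 3D volume
    target = (torch.rand(1, 1, 8, 8, 8) > 0.5).float()
    known = (torch.rand(1, 1, 8, 8, 8) > 0.3).float()   # roughly 70% of voxels annotated
    print(masked_bce_loss(logits, target, known))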

MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[64]
J. Kiechle, S. M. Fischer, D. M. Lang, M. Folco, S. C. Foreman, V. K. N. Rösner, A.-K. Lohse, C. Mogler, C. Knebel, M. R. Makowski, K. Woertler, S. E. Combs, A. S. Gersing, J. C. Peeken and J. A. Schnabel.
Unifying local and global shape descriptors to grade soft-tissue sarcomas using graph convolutional networks.
ISBI 2024 - IEEE 21st International Symposium on Biomedical Imaging. Athens, Greece, May 27-30, 2024. DOI
Abstract

The tumor grading of patients suffering from soft-tissue sarcomas is a critical task, as an accurate classification of this high-mortality cancer entity constitutes a decisive factor in devising optimal treatment strategies. In this work, we focus on distinguishing soft-tissue sarcoma subtypes solely based on their 3D morphological characteristics, derived from tumor segmentation masks. Notably, we direct attention to overcoming the limitations of texture-based methodologies, which often fall short of providing adequate shape delineation. To this end, we propose a novel yet elegant modular geometric deep learning framework coined Global Local Graph Convolutional Network (GloLo-GCN) that integrates local and global shape characteristics into a meaningful unified shape descriptor. Evaluated on a multi-center dataset, our proposed model performs better in soft-tissue sarcoma grading than GCNs based on state-of-the-art graph convolutions and a volumetric 3D convolutional neural network, also evaluated on binary segmentation masks exclusively.

MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[63]
N. Stolt-Ansó, V. Sideri-Lampretsa, M. Dannecker and D. Rückert.
Intensity-based 3D motion correction for cardiac MR images.
ISBI 2024 - IEEE 21st International Symposium on Biomedical Imaging. Athens, Greece, May 27-30, 2024. DOI
Abstract

Cardiac magnetic resonance (CMR) image acquisition requires subjects to hold their breath while 2D cine images are acquired. This process assumes that the heart remains in the same position across all slices. However, differences in breath-hold positions or patient motion introduce 3D slice misalignments. In this work, we propose an algorithm that simultaneously aligns all short-axis (SA) and long-axis (LA) slices by maximizing the pairwise intensity agreement between their intersections. Unlike previous works, our approach is formulated as a subject-specific optimization problem and requires no prior knowledge of the underlying anatomy. We quantitatively demonstrate that the proposed method is robust against a large range of rotations and translations by synthetically misaligning 10 motion-free datasets and aligning them back using the proposed method.

MCML Authors
Link to website

Nil Stolt-Ansó

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[62]
Y. Zhang, N. Stolt-Ansó, J. Pan, W. Huang, K. Hammernik and D. Rückert.
Direct Cardiac Segmentation from Undersampled K-Space using Transformers.
ISBI 2024 - IEEE 21st International Symposium on Biomedical Imaging. Athens, Greece, May 27-30, 2024. DOI
Abstract

The prevailing deep learning-based methods of predicting cardiac segmentation involve reconstructed magnetic resonance (MR) images. The heavy dependency of segmentation approaches on image quality significantly limits the acceleration rate in fast MR reconstruction. Moreover, the practice of treating reconstruction and segmentation as separate sequential processes leads to artifact generation and information loss in the intermediate stage. These issues pose a great risk to achieving high-quality outcomes. To leverage the redundant k-space information overlooked in this dual-step pipeline, we introduce a novel approach to directly deriving segmentations from sparse k-space samples using a transformer (DiSK). DiSK operates by globally extracting latent features from 2D+time k-space data with attention blocks and subsequently predicting the segmentation label of query points. We evaluate our model under various acceleration factors (ranging from 4 to 64) and compare against two image-based segmentation baselines. Our model consistently outperforms the baselines in Dice and Hausdorff distances across foreground classes for all presented sampling rates.

MCML Authors
Link to website

Nil Stolt-Ansó

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[61]
Y. Velikova, M. F. Azampour, W. Simson, M. Esposito and N. Navab.
Implicit Neural Representations for Breathing-compensated Volume Reconstruction in Robotic Ultrasound Aorta Screening.
ICRA 2024 - IEEE International Conference on Robotics and Automation. Yokohama, Japan, May 13-17, 2024. DOI
Abstract

Ultrasound (US) imaging is widely used in diagnosing and staging abdominal diseases due to its lack of ionizing radiation and prevalent availability. However, significant inter-operator variability and inconsistent image acquisition hinder the widespread adoption of extensive screening programs. Robotic ultrasound systems have emerged as a promising solution, offering standardized acquisition protocols and the possibility of automated acquisition. Additionally, these systems enable access to 3D data via robotic tracking, enhancing volumetric reconstruction for improved ultrasound interpretation and precise disease diagnosis. However, the interpretability of 3D US reconstruction of abdominal images can be affected by the patient's breathing motion. This study introduces a method to compensate for breathing motion in 3D US compounding by leveraging implicit neural representations. Our approach employs a robotic ultrasound system for automated screenings. To demonstrate the method's effectiveness, we evaluate our proposed method for the diagnosis and monitoring of abdominal aortic aneurysms as a representative use case. Our experiments demonstrate that our proposed pipeline facilitates robust automated robotic acquisition, mitigating artifacts from breathing motion, and yields smoother 3D reconstructions for enhanced screening and medical diagnosis.
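
At the heart of such a pipeline is an implicit neural representation: a coordinate network that maps spatial position to intensity and can therefore be queried on a motion-compensated grid. A toy PyTorch sketch of a coordinate MLP with random Fourier features, fitted to scattered intensity samples (an illustration of the representation, not the authors' architecture):

    import torch
    import torch.nn as nn

    class CoordinateMLP(nn.Module):
        """Maps 3D coordinates to intensity via random Fourier features + MLP."""
        def __init__(self, n_features=64, hidden=128, scale=5.0):
            super().__init__()
            self.register_buffer("B", torch.randn(3, n_features) * scale)
            self.net = nn.Sequential(
                nn.Linear(2 * n_features, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, xyz):                      # xyz: (N, 3) in [0, 1]^3
            proj = 2 * torch.pi * xyz @ self.B
            feats = torch.cat([proj.sin(), proj.cos()], dim=-1)
            return self.net(feats).squeeze(-1)

    model = CoordinateMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    xyz = torch.rand(1024, 3)                        # toy sample locations
    intensity = torch.sin(10 * xyz[:, 0])            # toy "ultrasound" intensities
    for _ in range(200):
        opt.zero_grad()
        loss = ((model(xyz) - intensity) ** 2).mean()
        loss.backward()
        opt.step()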

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Walter Simson

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[60]
J. Kiechle, S. C. Foreman, S. Fischer, D. Rusche, V. Rösner, A.-K. Lohse, C. Mogler, S. E. Combs, M. R. Makowski, K. Woertler, D. M. Lang, J. A. Schnabel, A. S. Gersing and J. C. Peeken.
Investigating the role of morphology in deep learning-based liposarcoma grading.
ESTRO 2024 - Annual Meeting of the European Society for Radiotherapy and Oncology. Glasgow, UK, May 03-07, 2024. URL
MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to website

Stefan Fischer

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[59]
A. Kazemi, A. Rasouli-Saravani, M. Gharib, T. Albuquerque, S. Eslami and P. J. Schüffler.
A systematic review of machine learning-based tumor-infiltrating lymphocytes analysis in colorectal cancer: Overview of techniques, performance metrics, and clinical outcomes.
Computers in Biology and Medicine 173 (May. 2024). DOI
Abstract

The incidence of colorectal cancer (CRC), one of the deadliest cancers around the world, is increasing. Tissue microenvironment (TME) features such as tumor-infiltrating lymphocytes (TILs) can have a crucial impact on diagnosis or decision-making for treating patients with CRC. While clinical studies showed that TILs improve the host immune response, leading to a better prognosis, inter-observer agreement for quantifying TILs is not perfect. Incorporating machine learning (ML) based applications in clinical routine may promote diagnosis reliability. Recently, ML has shown potential for making progress in routine clinical procedures. We aim to systematically review the TILs analysis based on ML in CRC histological images. Deep learning (DL) and non-DL techniques can aid pathologists in identifying TILs, and automated TILs are associated with patient outcomes. However, a large multi-institutional CRC dataset with a diverse and multi-ethnic population is necessary to generalize ML methods.

MCML Authors
Link to Profile Peter Schüffler

Peter Schüffler

Prof. Dr.

Computational Pathology


[58]
K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber, J. Topalis, T. Weber, P. Wesp, B. O. Sabel, J. Ricke and M. Ingrisch.
ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports.
European Radiology 34 (May. 2024). DOI
Abstract

Objectives: To assess the quality of simplified radiology reports generated with the large language model (LLM) ChatGPT and to discuss challenges and chances of ChatGPT-like LLMs for medical text simplification.
Methods: In this exploratory case study, a radiologist created three fictitious radiology reports, which we simplified by prompting ChatGPT with ‘Explain this medical report to a child using simple language.’ In a questionnaire, we tasked 15 radiologists to rate the quality of the simplified radiology reports with respect to their factual correctness, completeness, and potential harm for patients. We used Likert scale analysis and inductive free-text categorization to assess the quality of the simplified reports.
Results: Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed relevant medical information, and potentially harmful passages were reported.
Conclusion: While we see a need for further adaption to the medical field, the initial insights of this study indicate a tremendous potential in using LLMs like ChatGPT to improve patient-centered care in radiology and other medical domains.
Clinical relevance statement: Patients have started to use ChatGPT to simplify and explain their medical reports, which is expected to affect patient-doctor interaction. This phenomenon raises several opportunities and challenges for clinical routine.

MCML Authors
Link to website

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Link to website

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to website

Jakob Dexl

Clinical Data Science in Radiology

Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to website

Theresa Stüber

Clinical Data Science in Radiology

Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[57]
V. G. Duque, A. Marquardt, Y. Velikova, L. Lacourpaille, A. Nordez, M. Crouzier, H. J. Lee, D. Mateus and N. Navab.
Ultrasound segmentation analysis via distinct and completed anatomical borders.
International Journal of Computer Assisted Radiology and Surgery 19 (May. 2024). DOI
Abstract

Segmenting ultrasound images is important for precise area and/or volume calculations, ensuring reliable diagnosis and effective treatment evaluation for diseases. Recently, many segmentation methods have been proposed and shown impressive performance. However, currently, there is no deeper understanding of how networks segment target regions or how they define the boundaries. In this paper, we present a new approach that analyzes ultrasound segmentation networks in terms of learned borders because border delimitation is challenging in ultrasound.

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to website

Hong Joo Lee

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[56]
M.-A. Gafencu, Y. Velikova, M. Saleh, T. Ungi, N. Navab, T. Wendler and M. F. Azampour.
Shape completion in the dark: completing vertebrae morphology from 3D ultrasound.
International Journal of Computer Assisted Radiology and Surgery 19 (May. 2024). DOI
Abstract

Ultrasound (US) imaging, while advantageous for its radiation-free nature, is challenging to interpret due to only partially visible organs and a lack of complete 3D information. While performing US-based diagnosis or investigation, medical professionals therefore create a mental map of the 3D anatomy. In this work, we aim to replicate this process and enhance the visual representation of anatomical structures.

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality


[55]
Y. Li, T. Wolf, S. Pölsterl, I. Yakushev, D. M. Hedderich and C. Wachinger.
From Barlow Twins to Triplet Training: Differentiating Dementia with Limited Data.
Preprint (Apr. 2024). arXiv
Abstract

Differential diagnosis of dementia is challenging due to overlapping symptoms, with structural magnetic resonance imaging (MRI) being the primary method for diagnosis. Despite the clinical value of computer-aided differential diagnosis, research has been limited, mainly due to the absence of public datasets that contain diverse types of dementia. This leaves researchers with small in-house datasets that are insufficient for training deep neural networks (DNNs). Self-supervised learning shows promise for utilizing unlabeled MRI scans in training, but small batch sizes for volumetric brain scans make its application challenging. To address these issues, we propose Triplet Training for differential diagnosis with limited target data. It consists of three key stages: (i) self-supervised pre-training on unlabeled data with Barlow Twins, (ii) self-distillation on task-related data, and (iii) fine-tuning on the target dataset. Our approach significantly outperforms traditional training strategies, achieving a balanced accuracy of 75.6%. We further provide insights into the training process by visualizing changes in the latent space after each step. Finally, we validate the robustness of Triplet Training in terms of its individual components in a comprehensive ablation study.
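
Stage (i) uses the Barlow Twins objective, which pushes the cross-correlation matrix of the embeddings of two augmented views toward the identity: the diagonal terms enforce invariance, the off-diagonal terms reduce redundancy. A compact PyTorch sketch (the trade-off weight lambd is an assumed value):

    import torch

    def barlow_twins_loss(z1, z2, lambd=5e-3):
        """z1, z2: (batch, dim) embeddings of two augmented views."""
        n, d = z1.shape
        z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)      # standardize per dimension
        z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
        c = (z1.T @ z2) / n                               # cross-correlation matrix (d, d)
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()    # invariance term
        off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()  # redundancy term
        return on_diag + lambd * off_diag

    z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
    print(barlow_twins_loss(z1, z2))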

MCML Authors
Link to website

Yitong Li

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[54]
T. Weber, J. Dexl, D. Rügamer and M. Ingrisch.
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition.
Preprint (Apr. 2024). arXiv
Abstract

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model’s parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
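
A channel-mode (Tucker-2) factorization of a convolution kernel can be sketched with a plain higher-order SVD in NumPy; this is a generic illustration of the decomposition, not the authors' exact post-training procedure. The two factor matrices correspond to 1x1 convolutions before and after a smaller core convolution, which is where the FLOP savings come from:

    import numpy as np

    def tucker2_conv_kernel(W, r_out, r_in):
        """W: conv kernel of shape (C_out, C_in, k, k).
        Returns factors (U_out, U_in) and core G with channel ranks (r_out, r_in)."""
        c_out, c_in, kh, kw = W.shape
        # truncated SVDs of the mode-0 and mode-1 unfoldings (HOSVD)
        U_out, _, _ = np.linalg.svd(W.reshape(c_out, -1), full_matrices=False)
        U_out = U_out[:, :r_out]
        U_in, _, _ = np.linalg.svd(W.transpose(1, 0, 2, 3).reshape(c_in, -1),
                                   full_matrices=False)
        U_in = U_in[:, :r_in]
        # core tensor: contract W with the transposed factor matrices
        G = np.einsum("oikl,or,is->rskl", W, U_out, U_in)
        return U_out, U_in, G

    W = np.random.randn(64, 32, 3, 3)
    U_out, U_in, G = tucker2_conv_kernel(W, r_out=16, r_in=8)
    W_approx = np.einsum("rskl,or,is->oikl", G, U_out, U_in)
    print(G.shape, np.linalg.norm(W - W_approx) / np.linalg.norm(W))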

MCML Authors
Link to website

Jakob Dexl

Clinical Data Science in Radiology

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[53]
C. Wachinger, D. Hedderich and F. Bongratz.
Stochastic Cortical Self-Reconstruction.
Preprint (Mar. 2024). arXiv
Abstract

Magnetic resonance imaging (MRI) is critical for diagnosing neurodegenerative diseases, yet accurately assessing mild cortical atrophy remains a challenge due to its subtlety. Automated cortex reconstruction, paired with healthy reference ranges, aids in pinpointing pathological atrophy, yet their generalization is limited by biases from image acquisition and processing. We introduce the concept of stochastic cortical self-reconstruction (SCSR) that creates a subject-specific healthy reference by taking MRI-derived thicknesses as input and, therefore, implicitly accounting for potential confounders. SCSR randomly corrupts parts of the cortex and self-reconstructs them from the remaining information. Trained exclusively on healthy individuals, repeated self-reconstruction generates a stochastic reference cortex for assessing deviations from the norm. We present three implementations of this concept: XGBoost applied on parcels, and two autoencoders on vertex level – one based on a multilayer perceptron and the other using a spherical U-Net. These models were trained on healthy subjects from the UK Biobank and subsequently evaluated across four public Alzheimer’s datasets. Finally, we deploy the model on clinical in-house data, where deviation maps’ high spatial resolution aids in discriminating between four types of dementia.
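
The self-reconstruction step can be pictured as a denoising scheme on per-parcel thickness vectors: randomly mask parcels, predict them from the remaining ones, and repeat the stochastic reconstruction at test time to obtain a subject-specific reference. A toy PyTorch stand-in (an MLP in place of the paper's three implementations, with simulated thickness data):

    import torch
    import torch.nn as nn

    D = 68                                          # e.g., number of cortical parcels
    model = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, D))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    healthy = torch.randn(512, D) * 0.1 + 2.5       # toy thickness data (mm)
    for _ in range(200):                            # train on healthy subjects only
        mask = (torch.rand(512, D) > 0.3).float()   # keep ~70%, corrupt the rest
        recon = model(healthy * mask)
        loss = (((recon - healthy) ** 2) * (1 - mask)).mean()  # score masked parcels
        opt.zero_grad()
        loss.backward()
        opt.step()

    # repeated stochastic reconstruction yields a per-parcel reference distribution
    with torch.no_grad():
        x = healthy[:1]
        samples = torch.stack([model(x * (torch.rand_like(x) > 0.3).float())
                               for _ in range(64)])
        z = (x - samples.mean(0)) / (samples.std(0) + 1e-6)   # deviation from the norm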

MCML Authors
Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Fabian Bongratz

Artificial Intelligence in Medical Imaging


[52]
T. N. Wolf, F. Bongratz, A.-M. Rickmann, S. Pölsterl and C. Wachinger.
Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI
Abstract

Explaining predictions of black-box neural networks is crucial when applied to decision-critical tasks. Thus, attribution maps are commonly used to identify important image regions, despite prior work showing that humans prefer explanations based on similar examples. To this end, ProtoPNet learns a set of class-representative feature vectors (prototypes) for case-based reasoning. During inference, similarities of latent features to prototypes are linearly classified to form predictions and attribution maps are provided to explain the similarity. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations using the example of ProtoPNet. We show that such architectures allow the extraction of faithful explanations. However, we prove that the attribution maps used to explain the similarities violate the axioms. We propose a new procedure to extract explanations for trained ProtoPNets, named ProtoPFaith. Conceptually, these explanations are Shapley values, calculated on the similarity scores of each prototype. They allow one to faithfully answer which prototypes are present in an unseen image and quantify each pixel’s contribution to that presence, thereby complying with all axioms. The theoretical violations of ProtoPNet manifest in our experiments on three datasets (CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet, ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a qualitative difference between the explanations given by ProtoPNet and ProtoPFaith. Additionally, we quantify the explanations with the Area Over the Perturbation Curve, on which ProtoPFaith outperforms ProtoPNet on all experiments by a factor >10^3.

MCML Authors
Link to website

Tom Nuno Wolf

Artificial Intelligence in Medical Imaging

Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[51]
A. Reithmeir, J. A. Schnabel and V. A. Zimmer.
Learning physics-inspired regularization for medical image registration with hypernetworks.
SPIE 2024 - SPIE Medical Imaging: Image Processing. San Diego, CA, USA, Feb 18-22, 2024. DOI GitHub
Abstract

Medical image registration aims to identify the spatial deformation between images of the same anatomical region and is fundamental to image-based diagnostics and therapy. To date, the majority of the deep learning-based registration methods employ regularizers that enforce global spatial smoothness, e.g., the diffusion regularizer. However, such regularizers are not tailored to the data and might not be capable of reflecting the complex underlying deformation. In contrast, physics-inspired regularizers promote physically plausible deformations. One such regularizer is the linear elastic regularizer, which models the deformation of elastic material. These regularizers are driven by parameters that define the material’s physical properties. For biological tissue, a wide range of estimations of such parameters can be found in the literature, and it remains an open challenge to identify suitable parameter values for successful registration. To overcome this problem and to incorporate physical properties into learning-based registration, we propose to use a hypernetwork that learns the effect of the physical parameters of a physics-inspired regularizer on the resulting spatial deformation field. In particular, we adapt the HyperMorph framework to learn the effect of the two elasticity parameters of the linear elastic regularizer. Our approach enables the efficient discovery of suitable, data-specific physical parameters at test time. To the best of our knowledge, we are the first to use a hypernetwork to learn physics-inspired regularization for medical image registration. We evaluate our approach on 3D intrapatient lung CT images. The results show that the linear elastic regularizer can yield comparable results to the diffusion regularizer in unsupervised learning-based registration while predicting deformations with fewer foldings. With our method, the adaptation of the physical parameters to the data can successfully be performed at test time.
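
The HyperMorph-style mechanism can be sketched as a small hypernetwork that generates the weights of a layer in the registration network from the sampled physical parameters, so a single training run covers a continuum of elasticity settings. A toy PyTorch sketch (the layer shapes and the two-parameter input are assumptions for illustration):

    import torch
    import torch.nn as nn

    class HyperConv(nn.Module):
        """A conv layer whose weights are generated from physical parameters."""
        def __init__(self, c_in, c_out, k=3, n_params=2, hidden=32):
            super().__init__()
            self.shape = (c_out, c_in, k, k)
            self.c_out = c_out
            n_weights = c_out * c_in * k * k + c_out
            self.hyper = nn.Sequential(nn.Linear(n_params, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_weights))

        def forward(self, x, phys_params):           # phys_params: (n_params,)
            w = self.hyper(phys_params)
            weight = w[:-self.c_out].view(self.shape)
            bias = w[-self.c_out:]
            return nn.functional.conv2d(x, weight, bias, padding=1)

    layer = HyperConv(2, 8)
    moving_fixed = torch.randn(1, 2, 64, 64)          # stacked toy image pair
    mu_lam = torch.tensor([0.5, 1.5])                 # sampled elasticity parameters
    feat = layer(moving_fixed, mu_lam)
    print(feat.shape)
    # training sketch: loss = similarity(warped, fixed)
    #                       + elastic_regularizer(deformation_field, mu_lam)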

MCML Authors
Link to website

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[50]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI Reconstruction.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI
Abstract

Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space, reducing acquisition times at the cost of decreased image quality. A popular approach is to employ undersampling patterns following various strategies, e.g., variable density sampling or radial trajectories. In this work, we propose a method that directly learns the undersampling masks from data points, thereby also providing task- and domain-specific patterns. To solve the resulting discrete optimization problem, we propose a general optimization routine called ProM: A fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks, demonstrating the benefits of using custom masks, tailored for a downstream task. For example, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.
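
The core of such a learned mask can be sketched as a vector of per-line sampling probabilities, rescaled to meet the acceleration budget and sampled with a straight-through estimator so gradients reach the probabilities. This is a simplified stand-in for ProM's convex-constrained formulation:

    import torch

    class ProbabilisticMask(torch.nn.Module):
        """Learnable per-line sampling probabilities for k-space undersampling."""
        def __init__(self, n_lines=256, acceleration=8.0):
            super().__init__()
            self.logits = torch.nn.Parameter(torch.zeros(n_lines))
            self.budget = 1.0 / acceleration        # target fraction of sampled lines

        def forward(self):
            p = torch.sigmoid(self.logits)
            # approximate projection onto the budget constraint
            p = (p * (self.budget / p.mean().clamp(min=1e-8))).clamp(max=1.0)
            # Bernoulli sample with a straight-through gradient estimator
            hard = (torch.rand_like(p) < p).float()
            return hard + p - p.detach()            # forward: hard; backward: through p

    mask = ProbabilisticMask()
    m = mask()                                      # (n_lines,) in {0, 1}
    print(m.mean())                                 # roughly 1 / acceleration
    # training sketch: loss = task_loss(reconstruct(kspace * m))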

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[49]
A. Farshad.
Learning to Learn Neural Representations with Limited Data and Supervision.
Dissertation 2024. URL
Abstract

Learning to learn is a powerful paradigm that enables machine learning models to leverage the previously learned features for new tasks and domains more effectively. This thesis explores different aspects of learning to learn from data, models, and semantics, and shows how they can enhance various computer vision and medical imaging tasks. In the first part of the thesis, we present novel and fundamental research on learning to learn from data, and in the second part, we investigate the use of high-level semantics in generative models.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[48]
P. Wesp.
Application of machine learning in CT colonography and radiological age assessment: enhancing traditional diagnostics in radiology.
Dissertation 2024. DOI
Abstract

Machine learning can address limitations in radiology where traditional methods fall short, as shown by this work’s focus on two clinical problems: differentiating premalignant from benign colorectal polyps and continuous age prediction through clavicle ossification in CT scans. For colorectal polyps, a random forest classifier and CNN models enabled non-invasive differentiation between benign and premalignant types in CT colonography, potentially supporting more precise cancer prevention. For age assessment, a deep learning model trained on automatically detected clavicle regions achieved superior accuracy compared to human estimates, demonstrating machine learning’s potential to enhance radiological diagnostics in complex cases. (Shortened).

MCML Authors
Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology


[47]
P. Wesp, B. M. Schachtner, K. Jeblick, J. Topalis, M. Weber, F. Fischer, R. Penning, J. Ricke, M. Ingrisch and B. O. Sabel.
Radiological age assessment based on clavicle ossification in CT: enhanced accuracy through deep learning.
International Journal of Legal Medicine (Jan. 2024). DOI
Abstract

Background: Radiological age assessment using reference studies is inherently limited in accuracy due to a finite number of assignable skeletal maturation stages. To overcome this limitation, we present a deep learning approach for continuous age assessment based on clavicle ossification in computed tomography (CT).
Methods: Thoracic CT scans were retrospectively collected from the picture archiving and communication system. Individuals aged 15.0 to 30.0 years examined in routine clinical practice were included. All scans were automatically cropped around the medial clavicular epiphyseal cartilages. A deep learning model was trained to predict a person’s chronological age based on these scans. Performance was evaluated using mean absolute error (MAE). Model performance was compared to an optimistic human reader performance estimate for an established reference study method.
Results: The deep learning model was trained on 4,400 scans of 1,935 patients (training set: mean age = 24.2 years ± 4.0, 1132 female) and evaluated on 300 scans of 300 patients with a balanced age and sex distribution (test set: mean age = 22.5 years ± 4.4, 150 female). Model MAE was 1.65 years, and the highest absolute error was 6.40 years for females and 7.32 years for males; these largest errors could be attributed to norm variants or pathologic disorders. Human reader estimate MAE was 1.84 years and the highest absolute error was 3.40 years for females and 3.78 years for males.
Conclusions: We present a deep learning approach for continuous age predictions using CT volumes highlighting the medial clavicular epiphyseal cartilage with performance comparable to the human reader estimate.

MCML Authors
Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology

Link to website

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[46]
F. Bongratz, A.-M. Rickmann and C. Wachinger.
Neural deformation fields for template-based reconstruction of cortical surfaces from MRI.
Medical Image Analysis 93 (Jan. 2024). DOI
Abstract

The reconstruction of cortical surfaces is a prerequisite for quantitative analyses of the cerebral cortex in magnetic resonance imaging (MRI). Existing segmentation-based methods separate the surface registration from the surface extraction, which is computationally inefficient and prone to distortions. We introduce Vox2Cortex-Flow (V2C-Flow), a deep mesh-deformation technique that learns a deformation field from a brain template to the cortical surfaces of an MRI scan. To this end, we present a geometric neural network that models the deformation-describing ordinary differential equation in a continuous manner. The network architecture comprises convolutional and graph-convolutional layers, which allows it to work with images and meshes at the same time. V2C-Flow is not only very fast, requiring less than two seconds to infer all four cortical surfaces, but also establishes vertex-wise correspondences to the template during reconstruction. In addition, V2C-Flow is the first approach for cortex reconstruction that models white matter and pial surfaces jointly, therefore avoiding intersections between them. Our comprehensive experiments on internal and external test data demonstrate that V2C-Flow results in cortical surfaces that are state-of-the-art in terms of accuracy. Moreover, we show that the established correspondences are more consistent than in FreeSurfer and that they can directly be utilized for cortex parcellation and group analyses of cortical thickness.
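
The deformation-describing ODE can be illustrated by explicit Euler integration of template vertices through a learned velocity field; because every template vertex is transported individually, vertex-wise correspondence to the template comes for free. A toy PyTorch sketch (the velocity MLP is a simple stand-in for V2C-Flow's graph network):

    import torch
    import torch.nn as nn

    velocity = nn.Sequential(                 # toy stand-in for the learned field
        nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 3))

    def deform_template(vertices, n_steps=10, h=0.1):
        """Euler integration of dx/dt = v(x) from template to target surface.
        vertices: (V, 3) template mesh coordinates."""
        x = vertices
        for _ in range(n_steps):
            x = x + h * velocity(x)           # one explicit Euler step
        return x

    template = torch.rand(1000, 3)            # toy template vertices
    reconstructed = deform_template(template)
    print(reconstructed.shape)                # (1000, 3), correspondence preserved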

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[45]
D. Zhu, Q. Khan and D. Cremers.
Multi-vehicle trajectory prediction and control at intersections using state and intention information.
Neurocomputing 574 (Jan. 2024). DOI GitHub
Abstract

Traditional deep learning approaches for prediction of future trajectory of multiple road agents rely on knowing information about their past trajectory. In contrast, this work utilizes information of only the current state and intended direction to predict the future trajectory of multiple vehicles at intersections. Incorporating intention information has two distinct advantages: (1) It allows to not just predict the future trajectory but also control the multiple vehicles. (2) By manipulating the intention, the interaction among the vehicles is adapted accordingly to achieve desired behavior. Both these advantages would otherwise not be possible using only past trajectory information Our model utilizes message passing of information between the vehicle nodes for a more holistic overview of the environment, resulting in better trajectory prediction and control of the vehicles. This work also provides a thorough investigation and discussion into the disparity between offline and online metrics for the task of multi-agent control. We particularly show why conducting only offline evaluation would not suffice, thereby necessitating online evaluation. We demonstrate the superiority of utilizing intention information rather than past trajectory in online scenarios. Lastly, we show the capability of our method in adapting to different domains through experiments conducted on two distinct simulation platforms i.e. SUMO and CARLA.

MCML Authors
Link to website

Dekai Zhu

Computer Aided Medical Procedures & Augmented Reality

Link to website

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[44]
F. Bongratz, J. Fecht, A.-M. Rickmann and C. Wachinger.
V2C-Long: Longitudinal Cortex Reconstruction with Spatiotemporal Correspondence.
Preprint (2024). arXiv
Abstract

Reconstructing the cortex from longitudinal MRI is indispensable for analyzing morphological changes in the human brain. Despite the recent disruption of cortical surface reconstruction with deep learning, challenges arising from longitudinal data are still persistent. Especially the lack of strong spatiotemporal point correspondence hinders downstream analyses due to the introduced noise. To address this issue, we present V2C-Long, the first dedicated deep learning-based cortex reconstruction method for longitudinal MRI. In contrast to existing methods, V2C-Long surfaces are directly comparable in a cross-sectional and longitudinal manner. We establish strong inherent spatiotemporal correspondences via a novel composition of two deep mesh deformation networks and fast aggregation of feature-enhanced within-subject templates. The results on internal and external test data demonstrate that V2C-Long yields cortical surfaces with improved accuracy and consistency compared to previous methods. Finally, this improvement manifests in higher sensitivity to regional cortical atrophy in Alzheimer’s disease.

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[43]
M. Ghahremani and C. Wachinger.
RegBN: Batch Normalization of Multimodal Data with Regularization.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL GitHub
Abstract

Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in integrating multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces RegBN, a novel approach for multimodal Batch Normalization with REGularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks.
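
The effect of such a normalization can be illustrated with a linear residualization: the part of one modality's features that is linearly predictable from another modality is estimated by ridge-regularized least squares and subtracted before fusion. This is a simplification for illustration only, not RegBN's exact Frobenius-norm formulation:

    import torch

    def residualize(f, g, ridge=1e-2):
        """Remove the component of f (n, d_f) linearly predictable from g (n, d_g).
        The ridge term plays the role of the regularizer on the projection."""
        gtg = g.T @ g + ridge * torch.eye(g.shape[1])
        W = torch.linalg.solve(gtg, g.T @ f)       # least-squares map g -> f
        return f - g @ W                           # residual: f with g's effect removed

    f = torch.randn(128, 16)
    g = torch.randn(128, 8)
    f_mixed = f + g @ torch.randn(8, 16)           # f confounded by g
    f_clean = residualize(f_mixed, g)
    print((g.T @ f_clean).norm())                  # near zero (up to the ridge term)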

MCML Authors
Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[42]
G. Zhai, E. P. Örnek, S.-C. Wu, Y. Di, F. Tombari, N. Navab and B. Busam.
CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL
Abstract

Controllable scene synthesis aims to create interactive environments for numerous industrial use cases. Scene graphs provide a highly suitable interface to facilitate these applications by abstracting the scene context in a compact manner. Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense. Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes via latent diffusion, capturing global scene-object and local inter-object relationships in the scene graph while preserving shape diversity. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model. Due to the lack of a scene graph dataset offering high-quality object-level meshes with relations, we also construct SG-FRONT, enriching the off-the-shelf indoor dataset 3D-FRONT with additional scene graph labels. Extensive experiments are conducted on SG-FRONT, where CommonScenes shows clear advantages over other methods regarding generation consistency, quality, and diversity. Codes and the dataset are available on the website.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[41]
Y. Zhang, Y. Li, H. Brown, M. Rezaei, B. Bischl, P. Torr, A. Khakzar and K. Kawaguchi.
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments.
XAIA @NeurIPS 2023 - Workshop XAI in Action: Past, Present, and Future Applications at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL
Abstract

Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
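
The sanity-check logic miniaturizes well: hand-set the weights of a tiny 'network' so that only known features can influence the output, then verify that an attribution method assigns zero relevance to the remaining inputs. A toy gradient-times-input example in PyTorch:

    import torch

    # hand-designed linear 'network': only features 0 and 1 are used by construction
    weights = torch.tensor([2.0, -1.0, 0.0, 0.0])

    x = torch.tensor([1.0, 3.0, 5.0, 7.0], requires_grad=True)
    y = (weights * x).sum()
    y.backward()

    attribution = x.grad * x.detach()      # gradient x input
    print(attribution)                     # features 2 and 3 must get exactly 0
    # a faithful method must match the designed ground truth [2., -3., 0., 0.]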

MCML Authors
Link to website

Yawei Li

Statistical Learning and Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning and Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Ashkan Khakzar

Dr.

* Former Member


[40]
M. F. Azampour, Y. Velikova, E. Fatemizadeh, S. P. Dakua and N. Navab.
Self-supervised Probe Pose Regression via Optimized Ultrasound Representations for US-CT Fusion.
MICAD 2023 - International Conference on Medical Imaging and Computer-Aided Diagnosis. Cambridge, UK, Dec 09-10, 2023. DOI GitHub
Abstract

Aligning 2D ultrasound images with 3D CT scans of the liver holds significant clinical value in enhancing diagnostic precision, surgical planning, and treatment delivery. Conventional approaches primarily rely on optimization techniques, which often have a limited capture range and are susceptible to initialization errors. To address these limitations, we define the problem as “probe pose regression” and leverage deep learning for a more robust and efficient solution for liver US-CT registration without access to paired data. The proposed method is a three-part framework that combines ultrasound rendering, a generative model, and pose regression. In the first stage, we exploit a differentiable ultrasound rendering model designed to synthesize ultrasound images given segmentation labels. We let the downstream task optimize the rendering parameters, enhancing the performance of the overall method. In the second stage, a generative model bridges the gap between real and rendered ultrasound images, enabling application on real B-mode images. Finally, we use a patient-specific pose regression network, trained self-supervised with only synthetic images and their known poses. We use ultrasound and CT scans from a dual-modality human abdomen phantom to validate the proposed method.
Our experimental results indicate that the proposed method can estimate probe poses within an acceptable error margin, which can later be fine-tuned using conventional methods. This capability confirms that the proposed framework can serve as a reliable initialization step for US-CT fusion and achieve fully automated US-CT fusion when coupled with conventional methods.

MCML Authors
Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[39]
A. T. Stüber, S. Coors, B. Schachtner, T. Weber, D. Rügamer, A. Bender, A. Mittermeier, O. Öcal, M. Seidensticker, J. Ricke, B. Bischl and M. Ingrisch.
A comprehensive machine learning benchmark study for radiomics-based survival analysis of CT imaging data in patients with hepatic metastases of CRC.
Investigative Radiology 58.12 (Dec. 2023). DOI
Abstract

Optimizing a machine learning (ML) pipeline for radiomics analysis involves numerous choices in data set composition, preprocessing, and model selection. Objective identification of the optimal setup is complicated by correlated features, interdependency structures, and a multitude of available ML algorithms. Therefore, we present a radiomics-based benchmarking framework to optimize a comprehensive ML pipeline for the prediction of overall survival. This study is conducted on an image set of patients with hepatic metastases of colorectal cancer, for which radiomics features of the whole liver and of metastases from computed tomography images were calculated. A mixed model approach was used to find the optimal pipeline configuration and to identify the added prognostic value of radiomics features.

MCML Authors
Link to website

Theresa Stüber

Clinical Data Science in Radiology

Link to website

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[38]
C. Wachinger, T. N. Wolf and S. Pölsterl.
Deep learning for the prediction of type 2 diabetes mellitus from neck-to-knee Dixon MRI in the UK biobank.
Heliyon 9.11 (Nov. 2023). DOI
Abstract

Rationale and objectives: We evaluate the automatic identification of type 2 diabetes from neck-to-knee, two-point Dixon MRI scans with 3D convolutional neural networks on a large, population-based dataset. To this end, we assess the best combination of MRI contrasts and stations for diabetes prediction, and the benefit of integrating risk factors.
Materials and methods: Subjects with type 2 diabetes mellitus have been identified in the prospective UK Biobank Imaging study, and a matched control sample has been created to avoid confounding bias. Five-fold cross-validation is used for the evaluation. All scans from the two-point Dixon neck-to-knee sequence have been standardized. A neural network that considers multi-channel MRI input was developed and integrates clinical information in tabular format. An ensemble strategy is used to combine multi-station MRI predictions. A subset with quantitative fat measurements is identified for comparison to prior approaches.
Results: MRI scans from 3406 subjects (mean age, 66.2 years ± 7.1 [standard deviation]; 1128 women) were analyzed, of whom 1703 had diabetes. A balanced accuracy of 78.7%, an AUC ROC of 0.872, and an average precision of 0.878 were obtained for the classification of diabetes. The ensemble over multiple Dixon MRI stations yields better performance than selecting the individually best station. Moreover, combining fat and water scans as multi-channel inputs to the networks improves upon just using single contrasts as input. Integrating clinical information about known risk factors of diabetes in the network boosts the performance across all stations and the ensemble. The neural network achieved superior results compared to the prediction based on quantitative MRI measurements.
Conclusions: The developed deep learning model accurately predicted type 2 diabetes from neck-to-knee two-point Dixon MRI scans.

MCML Authors
Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Link to website

Tom Nuno Wolf

Artificial Intelligence in Medical Imaging


[37]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Unreading Race: Purging Protected Features from Chest X-ray Embeddings.
Under review. Preprint available (Nov. 2023). arXiv
Abstract

Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Besides removing the protected features' influence on pathology classification while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
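A minimal numpy sketch of the orthogonalization idea, under the assumption that it amounts to linear residualization: each embedding dimension is regressed on the protected features and only the residual is kept, so the resulting embeddings are linearly uncorrelated with age, sex, and race.

```python
# Sketch of embedding orthogonalization as linear residualization
# (our assumed formulation, not the authors' released code).
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 128))          # CXR embeddings (n x d)
A = rng.normal(size=(1000, 3))            # protected features: age, sex, race (encoded)
A = np.hstack([np.ones((1000, 1)), A])    # include an intercept column

# Least-squares fit of each embedding dimension on the protected features,
# then keep only the residual (the part A cannot explain linearly).
beta, *_ = np.linalg.lstsq(A, Z, rcond=None)
Z_orth = Z - A @ beta

# Sanity check: residual embeddings are (numerically) uncorrelated with A.
print(np.abs(A.T @ Z_orth).max())         # ~0 up to floating-point error
```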

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[36]
N. Stolt-Ansó, J. McGinnis, J. Pan, K. Hammernik and D. Rückert.
NISF: Neural implicit segmentation functions.
MICCAI 2023 - 26th International Conference on Medical Image Computing and Computer Assisted Intervention. Vancouver, Canada, Oct 08-12, 2023. DOI
Abstract

Segmentation of anatomical shapes from medical images has taken an important role in the automation of clinical measurements. While typical deep-learning segmentation approaches are performed on discrete voxels, the underlying objects being analysed exist in a real-valued continuous space. Approaches that rely on convolutional neural networks (CNNs) are limited to grid-like inputs and not easily applicable to sparse or partial measurements. We propose a novel family of image segmentation models that tackle many of CNNs’ shortcomings: Neural Implicit Segmentation Functions (NISF). Our framework takes inspiration from the field of neural implicit functions, where a network learns a mapping from a real-valued coordinate space to a shape representation. NISFs have the ability to segment anatomical shapes in high-dimensional continuous spaces. Training is not limited to voxelized grids and covers applications with sparse and partial data. Interpolation between observations is learnt naturally in the training procedure and requires no post-processing. Furthermore, NISFs allow the leveraging of learnt shape priors to make predictions for regions outside of the original image plane. We go on to show the framework achieves competitive Dice scores on a (3D+t) short-axis cardiac segmentation task using the UK Biobank dataset. We also provide a qualitative analysis of our framework's ability to perform segmentation and image interpolation on unseen regions of an image volume at arbitrary resolutions.
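The core mechanism can be sketched as a coordinate MLP: a network maps a continuous coordinate, concatenated with a per-subject latent code, to a segmentation probability, so predictions can be queried at arbitrary points rather than on a voxel grid. Layer sizes and latent handling below are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch of a neural implicit segmentation function.
import torch
import torch.nn as nn

class NISF(nn.Module):
    def __init__(self, latent_dim=64, coord_dim=4):  # (x, y, z, t)
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(coord_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, coords, z):
        # coords: (N, 4) continuous coordinates; z: (latent_dim,) subject code
        z = z.expand(coords.shape[0], -1)
        return torch.sigmoid(self.mlp(torch.cat([coords, z], dim=-1)))

model = NISF()
z = torch.zeros(64)           # subject-specific latent (optimized jointly in training)
coords = torch.rand(4096, 4)  # sample points anywhere, not on a fixed grid
probs = model(coords, z)      # per-point segmentation probabilities
```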

MCML Authors
Link to website

Nil Stolt-Ansó

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[35]
Y. Yeganeh, A. Farshad and N. Navab.
Anatomy-Aware Masking for Inpainting in Medical Imaging.
ShapeMI @MICCAI 2023 - 3rd Workshop on Shape in Medical Imaging at the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). Vancouver, Canada, Oct 08-12, 2023. DOI GitHub
Abstract

Inpainting has recently been employed as a successful deep-learning technique for unsupervised model discovery in medical image analysis by taking advantage of the strong priors learned by models to reconstruct the structure and texture of missing parts in images. Even though the learned features depend on the masks as well as the images, the masks used for inpainting are typically random and independent of the dataset, due to the unpredictability of the content of images, i.e., different objects and shapes can appear in different locations in images. However, this is rarely the case for medical imaging data since they are obtained from similar anatomies. Still, random square masks are the most popular technique for inpainting in medical imaging. In this work, we propose a pipeline to generate, position and sample the masks to efficiently learn the shape and structures of the anatomy and generate a myriad of diverse anatomy-aware masks, aiding the model in learning a statistical shape prior over the topology of the organs of interest. We demonstrate the impact of our approach compared to other masking mechanisms in the reconstruction of anatomy. We compare the effectiveness of our proposed masking approach over square-shaped masks, which are traditionally used in medical imaging, and irregular-shape masks, which are used in SOTA inpainting literature.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[34]
A. Farshad, Y. Yeganeh, Y. Chi, C. Shen, B. Ommer and N. Navab.
SceneGenie: Scene graph guided diffusion models for image synthesis.
ICCV 2023 - Workshop at the IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI
Abstract

Text-conditioned image generation has made significant progress in recent years with generative adversarial networks and more recently, diffusion models. While diffusion models conditioned on text prompts have produced impressive and high-quality images, accurately representing complex text prompts such as the number of instances of a specific object remains challenging. To address this limitation, we propose a novel guidance approach for the sampling process in the diffusion model that leverages bounding box and segmentation map information at inference time without additional training data. Through a novel loss in the sampling process, our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints, leading to high-resolution images that accurately represent the scene. To obtain bounding box and segmentation map information, we structure the text prompt as a scene graph and enrich the nodes with CLIP embeddings. Our proposed model achieves state-of-the-art performance on two public benchmarks for image generation from scene graphs, surpassing both scene graph to image and text-based diffusion models in various metrics. Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process for more accurate text-to-image generation.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Computer Vision & Learning

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[33]
Y. Yeganeh, A. Farshad, P. Weinberger, S.-A. Ahmadi, E. Adeli and N. Navab.
Transformers pay attention to convolutions leveraging emerging properties of ViTs by dual attention-image network.
ICCV 2023 - Workshop at the IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI
Abstract

Although purely transformer-based architectures pretrained on large datasets are introduced as foundation models for general computer vision tasks, hybrid models that incorporate combinations of convolution and transformer blocks showed state-of-the-art performance in more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to convolutional networks, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose a novel and simple architecture based on only convolutional layers and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network, complex transformer-based networks and even 3D architectures are outperformed at much lower computational cost. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model in the other branch. The results of our experiments on medical imaging datasets show that the extracted attention map visualizations from the attention heads of a pre-trained transformer architecture combined with the image provide strong prior knowledge for a pure CNN architecture to outperform CNN-based and transformer-based architectures.
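A hedged sketch of the dual-branch design described above: one convolutional encoder takes the image, a second takes the DINO attention-map visualizations (one channel per head), and the features are fused before prediction. Channel counts and the fusion are assumptions for illustration.

```python
# Illustrative two-branch CNN fusing an image with precomputed DINO
# attention maps (a simplification, not the paper's exact model).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class DualBranchSeg(nn.Module):
    def __init__(self, n_heads=6):
        super().__init__()
        self.image_enc = conv_block(1, 32)
        self.attn_enc = conv_block(n_heads, 32)   # one channel per attention head
        self.decoder = nn.Conv2d(64, 1, 1)        # fuse and predict a mask

    def forward(self, image, attn_maps):
        feats = torch.cat([self.image_enc(image), self.attn_enc(attn_maps)], dim=1)
        return torch.sigmoid(self.decoder(feats))

model = DualBranchSeg()
image = torch.randn(2, 1, 224, 224)
attn = torch.randn(2, 6, 224, 224)   # stand-in for DINO attention visualizations
mask = model(image, attn)
```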

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[32]
F. Bongratz, A.-M. Rickmann and C. Wachinger.
Abdominal organ segmentation via deep diffeomorphic mesh deformations.
Scientific Reports 13.1 (Oct. 2023). DOI
Abstract

Abdominal organ segmentation from CT and MRI is an essential prerequisite for surgical planning and computer-aided navigation systems. It is challenging due to the high variability in the shape, size, and position of abdominal organs. Three-dimensional numeric representations of abdominal shapes with point-wise correspondence to a template are further important for quantitative and statistical analyses thereof. Recently, template-based surface extraction methods have shown promising advances for direct mesh reconstruction from volumetric scans. However, the generalization of these deep learning-based approaches to different organs and datasets, a crucial property for deployment in clinical environments, has not yet been assessed. We close this gap and employ template-based mesh reconstruction methods for joint liver, kidney, pancreas, and spleen segmentation. Our experiments on manually annotated CT and MRI data reveal limited generalization capabilities of previous methods to organs of different geometry and weak performance on small datasets. We alleviate these issues with a novel deep diffeomorphic mesh-deformation architecture and an improved training scheme. The resulting method, UNetFlow, generalizes well to all four organs and can be easily fine-tuned on new data. Moreover, we propose a simple registration-based post-processing that aligns voxel and mesh outputs to boost segmentation accuracy.

MCML Authors
Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[31]
A. Stüber, S. Coors and M. Ingrisch.
Revitalize the Potential of Radiomics: Interpretation and Feature Stability in Medical Imaging Analyses through Groupwise Feature Importance.
LB-D-DC @xAI 2023 - Late-breaking Work, Demos and Doctoral Consortium at the 1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. PDF
Abstract

Radiomics, involving analysis of calculated, quantitative features from medical images with machine learning tools, shares the instability challenge with other high-dimensional data analyses due to variations in the training set. This instability affects model interpretation and feature importance assessment. To enhance stability and interpretability, we introduce grouped feature importance, shedding light on tool limitations and advocating for more reliable radiomics-based analysis methods.
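The grouped feature importance idea can be illustrated with groupwise permutation importance: an entire feature group is permuted jointly and the resulting performance drop is attributed to the group. The groups, model, and data below are placeholders, not the study's setup.

```python
# Sketch of groupwise permutation importance for radiomics-style features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)); y = rng.integers(0, 2, 300)
groups = {"shape": list(range(0, 7)), "texture": list(range(7, 20))}

model = RandomForestClassifier(random_state=0).fit(X, y)
base = roc_auc_score(y, model.predict_proba(X)[:, 1])

for name, cols in groups.items():
    Xp = X.copy()
    Xp[:, cols] = Xp[rng.permutation(len(X))][:, cols]  # permute the group jointly
    drop = base - roc_auc_score(y, model.predict_proba(Xp)[:, 1])
    print(f"group {name}: importance {drop:.3f}")
```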

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[30]
A. Farshad.
Representation learning for semantic scene understanding.
HHAI 2023 - 2nd International Conference on Hybrid Human-Artificial Intelligence. Munich, Germany, Jun 26-30, 2023. DOI
Abstract

Recent advances in semantic scene understanding have underscored its growing significance in the field of computer vision. Enhanced representations can be achieved by incorporating semantic information derived from textual data and applying it to generative models for scene modeling. Nevertheless, the features extracted from text prompts may not seamlessly model a scene.
Scene graphs offer a robust solution to address this challenge, serving as a powerful representation for semantic image generation and manipulation. In this study, we delve into the utilization of scene graphs for this purpose and propose novel methodologies to augment both the representation and learning processes involved in image generation and manipulation.
For image generation, we examine meta-learning for producing images in unprecedented scenes and refine the generated images using an autoregressive scene graph generation model. In terms of image manipulation, we put forth a novel self-supervised method that eliminates the need for paired before-and-after data. Additionally, we boost image manipulation performance by disentangling latent and graph representations in a self-supervised manner.
By evaluating the efficacy of our proposed approaches on a diverse range of publicly available benchmarks, we demonstrate their superiority, ultimately achieving state-of-the-art performance in the domain of semantic image generation and manipulation.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[29]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis.
PAKDD 2023 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka, Japan, May 25-28, 2023. DOI
Abstract

While recent advances in large-scale foundational models show promising results, their application to the medical domain has not yet been explored in detail. In this paper, we progress into the realms of large-scale modeling in medical synthesis by proposing Cheff - a foundational cascaded latent diffusion model, which generates highly realistic chest radiographs, providing state-of-the-art quality on a 1-megapixel scale. We further propose MaCheX, a unified interface for public chest datasets that forms the largest open collection of chest X-rays to date. With Cheff conditioned on radiological reports, we further guide the synthesis process over text prompts and unveil the research area of report-to-chest-X-ray generation.

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[28]
Y. Yeganeh, A. Farshad, G. Guevercin, A. Abu-zer, R. Xiao, Y. Tang, E. Adeli and N. Navab.
SCOPE: Structural Continuity Preservation for Medical Image Segmentation.
Preprint (Apr. 2023). arXiv
Abstract

Although the preservation of shape continuity and physiological anatomy is a natural assumption in the segmentation of medical images, it is often neglected by deep learning methods that mostly aim for the statistical modeling of input data as pixels rather than interconnected structures. In biological structures, however, organs are not separate entities; for example, in reality, a severed vessel is an indication of an underlying problem, but traditional segmentation models are not designed to strictly enforce the continuity of anatomy, potentially leading to inaccurate medical diagnoses. To address this issue, we propose a graph-based approach that enforces the continuity and connectivity of anatomical topology in medical images. Our method encodes the continuity of shapes as a graph constraint, ensuring that the network’s predictions maintain this continuity. We evaluate our method on two public benchmarks for retinal vessel segmentation, showing significant improvements in connectivity metrics compared to traditional methods while achieving better or on-par performance on segmentation metrics.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[27]
Y. Yeganeh, A. Farshad, P. Weinberger, S.-A. Ahmadi, E. Adeli and N. Navab.
DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation.
Preprint (Apr. 2023). arXiv
Abstract

Although purely transformer-based architectures showed promising performance in many computer vision tasks, many hybrid models consisting of CNN and transformer blocks are introduced to fit more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to CNNs in medical imaging segmentation, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose simple architectures based on purely convolutional layers and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network (e.g., DINO), one can outperform complex transformer-based networks at much lower computational cost. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model (as multiple channels) in the other branch. The results of our experiments on two publicly available medical imaging datasets show that the proposed pipeline outperforms U-Net and the state-of-the-art medical image segmentation models.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[26]
A. Khakzar.
Rethinking Feature Attribution for Neural Network Explanation.
Dissertation 2023. DOI
Abstract

Feature attribution is arguably the predominant approach for illuminating black-box neural networks. This dissertation rethinks feature attribution by leveraging critical neural pathways, identifying input features with predictive information, and evaluating feature attribution using the neural network model. The dissertation also rethinks feature attribution for the explanation of medical imaging models.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member


[25]
A. Mittermeier.
Robust evaluation of contrast-enhanced imaging for perfusion quantification.
Dissertation 2023. DOI
Abstract

This thesis advances the quantification and prediction of hemodynamic parameters in dynamic contrast-enhanced (DCE) imaging through two innovative approaches. The Bayesian Tofts model (BTM) improves the reliability and uncertainty estimation of perfusion parameters, demonstrating its potential for enhanced treatment response assessment in cancer care. Additionally, the development of a deep learning model offers a promising alternative by directly predicting clinical endpoints from raw DCE-CT data, eliminating the need for traditional tracer-kinetic modeling and paving the way for more efficient and accurate clinical applications in stroke and other conditions. (Shortened.)

MCML Authors
Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology


[24]
A. Farshad, Y. Yeganeh, H. Dhamo, F. Tombari and N. Navab.
DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation.
BMVC 2022 - 33rd British Machine Vision Conference. London, UK, Nov 21-24, 2022. URL GitHub
Abstract

Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or the edges in the graph. Although existing works have shown promising results in modifying the placement and pose of objects, scene manipulation often leads to losing some visual characteristics like the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs in a self-supervised manner. Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph. In addition to producing more realistic images due to the decomposition of features like pose and identity, our method takes advantage of the probabilistic sampling in the intermediate features to generate more diverse images in object replacement or addition tasks. The results of our experiments show that disentangling the feature representations in the latent manifold of the model outperforms the previous works qualitatively and quantitatively on two public benchmarks.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[23]
A. Farshad, A. Makarevich, V. Belagiannis and N. Navab.
MetaMedSeg: Volumetric Meta-learning for Few-Shot Organ Segmentation.
DART @MICCAI 2022 - 4th Workshop on Domain Adaptation and Representation Transfer at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI GitHub
Abstract

The lack of sufficient annotated image data is a common issue in medical image segmentation. For some organs and densities, the annotation may be scarce, leading to poor model training convergence, while other organs have plenty of annotated data. In this work, we present MetaMedSeg, a gradient-based meta-learning algorithm that redefines the meta-learning task for the volumetric medical data with the goal of capturing the variety between the slices. We also explore different weighting schemes for gradients aggregation, arguing that different tasks might have different complexity and hence contribute differently to the initialization. We propose an importance-aware weighting scheme to train our model. In the experiments, we evaluate our method on the medical decathlon dataset by extracting 2D slices from CT and MRI volumes of different organs and performing semantic segmentation. The results show that our proposed volumetric task definition leads to improvements in IoU compared to related baselines. The proposed update rule is also shown to improve the performance for complex scenarios where the data distribution of the target organ is very different from the source organs.
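A minimal sketch of a gradient-based meta-update with importance-aware task weights, in the spirit of the weighting schemes discussed above. It uses a Reptile-style parameter-difference update as a simplification; the inner loop, learning rates, and the assumption that each task is a callable returning a segmentation loss are ours.

```python
# Reptile-style meta-update with per-task weights (illustrative assumption).
import copy
import torch

def meta_step(model, tasks, weights, inner_steps=5, inner_lr=1e-3, meta_lr=1e-2):
    deltas = []
    for task in tasks:
        local = copy.deepcopy(model)
        opt = torch.optim.SGD(local.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            loss = task(local)            # task returns a loss on a sampled batch
            opt.zero_grad(); loss.backward(); opt.step()
        # Parameter change induced by adapting to this task.
        deltas.append([lp.data - p.data
                       for lp, p in zip(local.parameters(), model.parameters())])
    # Importance-weighted aggregation of the per-task updates.
    for i, p in enumerate(model.parameters()):
        p.data += meta_lr * sum(w * d[i] for w, d in zip(weights, deltas))
```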

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[22]
Y. Yeganeh, A. Farshad, J. Boschmann, R. Gaus, M. Frantzen and N. Navab.
FedAP: Adaptive Personalization in Federated Learning for Non-IID Data.
DeCaF FAIR @MICCAI 2022 - 3rd Workshop on Distributed, Collaborative, and Federated Learning, and Affordable AI and Healthcare for Resource Diverse Global Health at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI
Abstract

Federated learning (FL) is a distributed learning method that offers medical institutes the prospect of collaboration in a global model while preserving the privacy of their patients. Although most medical centers conduct similar medical imaging tasks, their differences, such as specializations, number of patients, and devices, lead to distinctive data distributions. Data heterogeneity poses a challenge for FL and the personalization of the local models. In this work, we investigate an adaptive hierarchical clustering method for FL to produce intermediate semi-global models, so clients with similar data distribution have the chance of forming a more specialized model. Our method forms several clusters consisting of clients with the most similar data distributions; then, each cluster continues to train separately. Inside the cluster, we use meta-learning to improve the personalization of the participants’ models. We compare the clustering approach with classical FedAvg and centralized training by evaluating our proposed methods on the HAM10k dataset for skin lesion classification with extreme heterogeneous data distribution. Our experiments demonstrate significant performance gain in heterogeneous distribution compared to standard FL methods in classification accuracy. Moreover, we show that the models converge faster if applied in clusters and outperform centralized training while using only a small subset of data.
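The clustering step can be sketched by hierarchically clustering clients on the similarity of their model updates, a common proxy for data-distribution similarity; the cosine metric and distance threshold below are assumptions, not the paper's exact procedure.

```python
# Sketch: group clients by similarity of their (flattened) model updates,
# then let each cluster continue training its own semi-global model.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
client_updates = rng.normal(size=(10, 256))   # one update vector per client (stand-in)

Z = linkage(client_updates, method="average", metric="cosine")
clusters = fcluster(Z, t=0.5, criterion="distance")
print(clusters)   # cluster id per client; each cluster trains separately
```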

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[21]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Implicit Embeddings via GAN Inversion for High Resolution Chest Radiographs.
MAD @MICCAI 2022 - 1st Workshop on Medical Applications with Disentanglements at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI
Abstract

Generative models allow for the creation of highly realistic artificial samples, opening up promising applications in medical imaging. In this work, we propose a multi-stage encoder-based approach to invert the generator of a generative adversarial network (GAN) for high resolution chest radiographs. This gives direct access to its implicitly formed latent space, makes generative models more accessible to researchers, and makes it possible to apply generative techniques to actual patients' images. We investigate various applications for this embedding, including image compression, disentanglement in the encoded dataset, guided image manipulation, and creation of stylized samples. We find that this type of GAN inversion is a promising research direction in the domain of chest radiograph modeling and opens up new ways to combine realistic X-ray sample synthesis with radiological image analysis.

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[20]
A. Farshad, Y. Yeganeh, P. Gehlbach and N. Navab.
Y-Net: A Spatiospectral Dual-Encoder Network for Medical Image Segmentation.
MICCAI 2022 - 25th International Conference on Medical Image Computing and Computer Assisted Intervention. Singapore, Sep 18-22, 2022. DOI GitHub
Abstract

Automated segmentation of retinal optical coherence tomography (OCT) images has become an important recent direction in machine learning for medical applications. We hypothesize that the anatomic structure of layers and their high-frequency variation in OCT images make retinal OCT a fitting choice for extracting spectral domain features and combining them with spatial domain features. In this work, we present Y-Net, an architecture that combines the frequency domain features with the image domain to improve the segmentation performance of OCT images. The results of this work demonstrate that the introduction of two branches, one for spectral and one for spatial domain features, brings a very significant improvement in fluid segmentation performance, outperforming the well-known U-Net model. Our improvement was 13% on the fluid segmentation Dice score and 1.9% on the average Dice score. Finally, removing selected frequency ranges in the spectral domain demonstrates the impact of these features on fluid segmentation performance.
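A small sketch of the spectral-branch idea: frequency-domain features are computed with a 2D FFT and fused with spatial convolution features. The layer sizes and the real/imaginary channel stacking are illustrative assumptions, not Y-Net's exact design.

```python
# Illustrative spatio-spectral block: fuse spatial features with features
# computed on the FFT of the input.
import torch
import torch.nn as nn

class SpatioSpectralBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.spatial = nn.Conv2d(1, 16, 3, padding=1)
        self.spectral = nn.Conv2d(2, 16, 3, padding=1)  # real + imaginary channels

    def forward(self, x):
        spec = torch.fft.fft2(x)                          # frequency-domain view
        spec = torch.cat([spec.real, spec.imag], dim=1)   # (B, 2, H, W)
        return torch.cat([self.spatial(x), self.spectral(spec)], dim=1)

block = SpatioSpectralBlock()
oct_scan = torch.randn(2, 1, 128, 128)
features = block(oct_scan)                                # (2, 32, 128, 128)
```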

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[19]
P. Engstler, M. Keicher, D. Schinz, K. Mach, A. S. Gersing, S. C. Foreman, S. S. Goller, J. Weissinger, J. Rischewski, A.-S. Dietrich, B. Wiestler, J. S. Kirschke, A. Khakzar and N. Navab.
Interpretable Vertebral Fracture Diagnosis.
iMIMIC @MICCAI 2022 - Workshop on Interpretability of Machine Intelligence in Medical Image Computing at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI GitHub
Abstract

Do black-box neural network models learn clinically relevant features for fracture diagnosis? The answer not only establishes reliability and quenches scientific curiosity, but also leads to explainable and verbose findings that can assist radiologists in the final diagnosis and increase trust. This work identifies the concepts networks use for vertebral fracture diagnosis in CT images. This is achieved by associating concepts to neurons highly correlated with a specific diagnosis in the dataset. The concepts are either associated with neurons by radiologists pre-hoc or are visualized during a specific prediction and left for the user’s interpretation. We evaluate which concepts lead to correct diagnosis and which concepts lead to false positives. The proposed frameworks and analysis pave the way for reliable and explainable vertebral fracture diagnosis.

MCML Authors
Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[18]
A. Khakzar, Y. Li, Y. Zhang, M. Sanisoglu, S. T. Kim, M. Rezaei, B. Bischl and N. Navab.
Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models.
IMLH @ICML 2022 - 2nd Workshop on Interpretable Machine Learning in Healthcare at the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 17-23, 2022. arXiv
Abstract

One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and the imbalance problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and show up only when looking at the models through this lens.
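As a concrete example of one cost-sensitive strategy commonly studied in this context, the snippet below uses class-weighted cross-entropy with inverse-frequency weights; the class counts are placeholders.

```python
# Class-weighted cross-entropy: errors on the rare class cost more.
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 100.0])            # class frequencies (assumed)
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 2)
labels = torch.randint(0, 2, (16,))
loss = loss_fn(logits, labels)
```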

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to website

Yawei Li

Statistical Learning and Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning and Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[17]
Y. Yeganeh, A. Farshad and N. Navab.
Shape-Aware Masking for Inpainting in Medical Imaging.
Preprint (Jul. 2022). arXiv
Abstract

Inpainting has recently been proposed as a successful deep learning technique for unsupervised medical image model discovery. The masks used for inpainting are generally independent of the dataset and are not tailored to perform on different given classes of anatomy. In this work, we introduce a method for generating shape-aware masks for inpainting, which aims at learning the statistical shape prior. We hypothesize that although the variation of masks improves the generalizability of inpainting models, the shape of the masks should follow the topology of the organs of interest. Hence, we propose an unsupervised guided masking approach based on an off-the-shelf inpainting model and a superpixel over-segmentation algorithm to generate a wide range of shape-dependent masks. Experimental results on abdominal MR image reconstruction show the superiority of our proposed masking method over standard methods using square-shaped masks or datasets of irregular-shape masks.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[16]
A. Khakzar, P. Khorsandi, R. Nobahari and N. Navab.
Do Explanations Explain? Model Knows Best.
CVPR 2022 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA, Jun 19-24, 2022. DOI GitHub
Abstract

It is a mystery which input features contribute to a neural network’s output. Various explanation (feature attribution) methods are proposed in the literature to shed light on the problem. One peculiar observation is that these explanations (attributions) point to different features as being important. The phenomenon raises the question, which explanation to trust? We propose a framework for evaluating the explanations using the neural network model itself. The framework leverages the network to generate input features that impose a particular behavior on the output. Using the generated features, we devise controlled experimental setups to evaluate whether an explanation method conforms to an axiom. Thus we propose an empirical framework for axiomatic evaluation of explanation methods. We evaluate well-known and promising explanation solutions using the proposed framework. The framework provides a toolset to reveal properties and drawbacks within existing and future explanation solutions.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[15]
M. Keicher, K. Zaripova, T. Czempiel, K. Mach, A. Khakzar and N. Navab.
Few-shot Structured Radiology Report Generation Using Natural Language Prompts.
Preprint (Mar. 2022). arXiv
Abstract

The automation of chest X-ray reporting has garnered significant interest due to the time-consuming nature of the task. However, the clinical accuracy of free-text reports has proven challenging to quantify using natural language processing metrics, given the complexity of medical information, the variety of writing styles, and the potential for typos and inconsistencies. Structured reporting and standardized reports, on the other hand, can provide consistency and formalize the evaluation of clinical correctness. However, high-quality annotations for structured reporting are scarce. Therefore, we propose a method to predict clinical findings defined by sentences in structured reporting templates, which can be used to fill such templates. The approach involves training a contrastive language-image model using chest X-rays and related free-text radiological reports, then creating textual prompts for each structured finding and optimizing a classifier to predict clinical findings in the medical image. Results show that even with limited image-level annotations for training, the method can accomplish the structured reporting tasks of severity assessment of cardiomegaly and localizing pathologies in chest X-rays.
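The prompt-based prediction step can be sketched with an off-the-shelf contrastive language-image model: each structured finding is phrased as a textual prompt and scored against the image. The open_clip backbone, the prompts, and the file name below are assumptions for illustration; the paper trains its own chest X-ray model.

```python
# Zero-shot finding classification via text prompts (illustrative stand-in).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

prompts = ["no cardiomegaly", "mild cardiomegaly", "severe cardiomegaly"]
text = tokenizer(prompts)
image = preprocess(Image.open("chest_xray.png")).unsqueeze(0)  # hypothetical file

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```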

MCML Authors
Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to website

Kamilia Zaripova

Computer Aided Medical Procedures & Augmented Reality

Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[14]
W. Simson.
Physics-Informed Deep Learning for Advanced Medical Ultrasound.
Dissertation 2022. DOI
Abstract

Freehand ultrasound imaging is an important medical imaging modality due to its ease of applicability and wide application spectrum. Still, modern ultrasound imaging is a largely passive imaging modality, and does not dynamically adapt to the physics in the medium of interest. This dissertation presents the application of physics-informed deep learning for ultrasound imaging applied to sound speed estimation.

MCML Authors
Walter Simson

Walter Simson

Dr.

* Former Member


[13]
Y. Zhang, A. Khakzar, Y. Li, A. Farshad, S. T. Kim and N. Navab.
Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information.
NeurIPS 2021 - Track on Datasets and Benchmarks at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. URL
Abstract

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network’s prediction. The predictive information of features is recently proposed as a proxy for the measure of their importance. So far, the predictive information is only identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features’ information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to website

Yawei Li

Statistical Learning and Data Science

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[12]
T. Weber, M. Ingrisch, M. Fabritius, B. Bischl and D. Rügamer.
Survival-oriented embeddings for improving accessibility to complex data structures.
NeurIPS 2021 - Workshop on Bridging the Gap: from Machine Learning Research to Clinical Practice at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. arXiv
Abstract

Deep learning excels in the analysis of unstructured data, and recent advancements allow these techniques to be extended to survival analysis. In the context of clinical radiology, this enables, e.g., relating unstructured volumetric images to a risk score or a prognosis of life expectancy and supports clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients usually accept black-box models as the reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[11]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation.
NeurIPS 2021 - Workshop on Deep Generative Models and Downstream Applications at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. PDF
Abstract

The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This allows advancing methods in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with a survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that models hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.
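A hedged sketch of the latent hazard walk: gradient ascent (or descent) on a trained hazard head's output with respect to the latent code, followed by decoding the shifted code back to image space. Here hazard_head and decoder stand for trained modules and are assumptions, not the authors' code.

```python
# Illustrative latent walk toward higher (or lower) predicted hazard.
import torch

def hazard_walk(z, hazard_head, decoder, steps=20, lr=0.1, maximize=True):
    z = z.clone().requires_grad_(True)
    sign = 1.0 if maximize else -1.0
    for _ in range(steps):
        h = hazard_head(z).sum()
        (sign * h).backward()
        with torch.no_grad():
            z += lr * z.grad          # move along the hazard gradient
            z.grad.zero_()
    return decoder(z.detach())        # project the change back to image space
```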

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[10]
A. Farshad, S. Musatian, H. Dhamo and N. Navab.
MIGS: Meta Image Generation from Scene Graphs.
BMVC 2021 - 32nd British Machine Vision Conference. Virtual, Nov 22-25, 2021. URL GitHub
Abstract

Generation of images from scene graphs is a promising direction towards explicit scene generation and manipulation. However, the images generated from scene graphs lack quality, which partly stems from the high difficulty and diversity of the data. We propose MIGS (Meta Image Generation from Scene Graphs), a meta-learning based approach for few-shot image generation from graphs that enables adapting the model to different scenes and increases the image quality by training on diverse sets of tasks. By sampling the data in a task-driven fashion, we train the generator using meta-learning on different sets of tasks that are categorized based on the scene attributes. Our results show that using this meta-learning approach for the generation of images from scene graphs achieves state-of-the-art performance in terms of image quality and capturing the semantic relationships in the scene.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[9]
S. Garg, H. Dhamo, A. Farshad, S. Musatian, N. Navab and F. Tombari.
Unconditional Scene Graph Generation.
ICCV 2021 - IEEE/CVF International Conference on Computer Vision. Virtual, Oct 11-17, 2021. DOI
Abstract

Despite recent advancements in single-domain or single-object image generation, it is still challenging to generate complex scenes containing diverse, multiple objects and their interactions. Scene graphs, composed of nodes as objects and directed-edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images. We hypothesize that a generative model for scene graphs might be able to learn the underlying semantic structure of real-world scenes more effectively than images, and hence, generate realistic novel scenes in the form of scene graphs. In this work, we explore a new task for the unconditional generation of semantic scene graphs. We develop a deep auto-regressive model called SceneGraphGen which can directly learn the probability distribution over labelled and directed graphs using a hierarchical recurrent architecture. The model takes a seed object as input and generates a scene graph in a sequence of steps, each step generating an object node, followed by a sequence of relationship edges connecting to the previous nodes. We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes. Additionally, we demonstrate the application of the generated graphs in image synthesis, anomaly detection and scene graph completion.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[8]
A. Khakzar, S. Musatian, J. Buchberger, I. V. Quiroz, N. Pinger, S. Baselizadeh, S. T. Kim and N. Navab.
Towards Semantic Interpretation of Thoracic Disease and COVID-19 Diagnosis Models.
MICCAI 2021 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France, Sep 27-Oct 01, 2021. DOI GitHub
Abstract

Convolutional neural networks are showing promise in the automatic diagnosis of thoracic pathologies on chest x-rays. Their black-box nature has sparked many recent works to explain the prediction via input feature attribution methods (aka saliency methods). However, input feature attribution methods merely identify the importance of input regions for the prediction and lack semantic interpretation of model behavior. In this work, we first identify the semantics associated with internal units (feature maps) of the network. We proceed to investigate the following questions; Does a regression model that is only trained with COVID-19 severity scores implicitly learn visual patterns associated with thoracic pathologies? Does a network that is trained on weakly labeled data (e.g. healthy, unhealthy) implicitly learn pathologies? Moreover, we investigate the effect of pretraining and data imbalance on the interpretability of learned features. In addition to the analysis, we propose semantic attribution to semantically explain each prediction. We present our findings using publicly available chest pathologies (CheXpert [5], NIH ChestX-ray8 [25]) and COVID-19 datasets (BrixIA [20], and COVID-19 chest X-ray segmentation dataset [4]).

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[7]
A. Khakzar, Y. Zhang, W. Mansour, Y. Cai, Y. Li, Y. Zhang, S. T. Kim and N. Navab.
Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features.
MICCAI 2021 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France, Sep 27-Oct 01, 2021. DOI GitHub
Abstract

Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. In order to establish trust in the clinical routine, the networks’ prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution. Feature attribution methods identify the importance of input features for the output prediction. Building on Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network’s output. Original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-agnostic feature importance metrics on NIH Chest X-ray8 and BrixIA datasets.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to website

Yawei Li

Statistical Learning and Data Science

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[6]
M. P. Fabritius, M. Seidensticker, J. Rueckel, C. Heinze, M. Pech, K. J. Paprottka, P. M. Paprottka, J. Topalis, A. Bender, J. Ricke, A. Mittermeier and M. Ingrisch.
Bi-Centric Independent Validation of Outcome Prediction after Radioembolization of Primary and Secondary Liver Cancer.
Journal of Clinical Medicine 10.16 (Aug. 2021). DOI
Abstract

Background: Yttrium-90 radioembolization (RE) plays an important role in the treatment of liver malignancies. Optimal patient selection is crucial for an effective and safe treatment. In this study, we aim to validate the prognostic performance of a previously established random survival forest (RSF) with an external validation cohort from a different national center. Furthermore, we compare outcome prediction models with different established metrics. Methods: A previously established RSF model, trained on a consecutive cohort of 366 patients who had received RE due to primary or secondary liver tumor at a national center (center 1), was used to predict the outcome of an independent consecutive cohort of 202 patients from a different national center (center 2) and vice versa. Prognostic performance was evaluated using the concordance index (C-index) and the integrated Brier score (IBS). The prognostic importance of designated baseline parameters was measured with the minimal depth concept, and the influence on the predicted outcome was analyzed with accumulated local effects plots. RSF models were compared to conventional Cox proportional hazards (CPH) models in terms of C-index and IBS. Results: The established RSF model achieved a C-index of 0.67 for center 2, comparable to the results obtained for center 1, which it was trained on (0.66). The RSF model trained on center 2 achieved a C-index of 0.68 on center 2 data and 0.66 on center 1 data. CPH models showed comparable results on both cohorts, with C-index ranging from 0.68 to 0.72. IBS validation showed more differentiated results depending on which cohort was trained on and which cohort was predicted (range: 0.08 to 0.20). Baseline cholinesterase was the most important variable for survival prediction. Conclusion: The previously developed predictive RSF model was successfully validated with an independent external cohort. C-index and IBS are suitable metrics to compare outcome prediction models, with IBS showing more differentiated results. The findings corroborate that survival after RE is critically determined by functional hepatic reserve and thus baseline liver function should play a key role in patient selection.
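For reference, Harrell's concordance index used throughout this comparison can be computed directly: over all comparable patient pairs, it is the fraction that the risk score orders correctly (ties counted as half). A small self-contained version:

```python
# Harrell's C-index from scratch (for illustration; survival libraries
# such as lifelines or scikit-survival provide tested implementations).
import numpy as np

def c_index(time, event, risk):
    num, den = 0.0, 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                       # pairs are anchored at observed events
        comparable = time > time[i]        # subjects who survived longer than i
        den += comparable.sum()
        num += (risk[i] > risk[comparable]).sum() \
               + 0.5 * (risk[i] == risk[comparable]).sum()
    return num / den

time = np.array([5.0, 8.0, 3.0, 10.0]); event = np.array([1, 0, 1, 1], bool)
risk = np.array([0.9, 0.4, 1.2, 0.2])
print(c_index(time, event, risk))          # 1.0 for a perfectly ranked cohort
```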

MCML Authors
Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[5]
A. Khakzar, S. Baselizadeh, S. Khanduja, C. Rupprecht, S. T. Kim and N. Navab.
Neural Response Interpretation through the Lens of Critical Pathways.
CVPR 2021 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, Jun 19-25, 2021. DOI
Abstract

Is critical input information encoded in specific sparse pathways within the neural network? In this work, we discuss the problem of identifying these critical pathways and subsequently leverage them for interpreting the network’s response to an input. The pruning objective — selecting the smallest group of neurons for which the response remains equivalent to the original network — has been previously proposed for identifying critical pathways. We demonstrate that sparse pathways derived from pruning do not necessarily encode critical input information. To ensure sparse pathways include critical fragments of the encoded input information, we propose pathway selection via neurons’ contribution to the response. We proceed to explain how critical pathways can reveal critical input features. We prove that pathways selected via neuron contribution are locally linear (in an ℓ2-ball), a property that we use for proposing a feature attribution method: ‘pathway gradient’. We validate our interpretation method using mainstream evaluation experiments. The validation of pathway gradient interpretation method further confirms that selected pathways using neuron contributions correspond to critical input features. The code is publicly available.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[4]
P. Kopper, S. Pölsterl, C. Wachinger, B. Bischl, A. Bender and D. Rügamer.
Semi-Structured Deep Piecewise Exponential Models.
AAAI-SPACA 2021 - AAAI Spring Symposium Series on Survival Prediction: Algorithms, Challenges and Applications. Palo Alto, California, USA, Mar 21-24, 2021. PDF
Abstract

We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects into the model, we embed the model class in a neural network and thereby enable the simultaneous estimation of both inherently interpretable structured regression inputs as well as deep neural network components which can potentially process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer’s disease progression based on tabular and 3D point cloud data and applying it to synthetic data.

MCML Authors
Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[3]
B. Busam.
High Performance Visual Pose Computation.
Dissertation 2021. URL
Abstract

An outside-in system uses binocular stereo and a probabilistic sparse point cloud matcher to track objects with micrometre precision in real time. Miniaturizing the system results in a markerless inside-out stereo method with improved rotational accuracy. Reducing the constraints, we reformulate marker-free monocular pose estimation as an action decision process in which the next best pose is determined using a render-and-compare strategy. This allows instance-agnostic pose estimation that generalizes to unseen objects. The methods are applied to a set of medical and industrial applications.
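
The action-decision formulation can be sketched in a few lines. Everything below is a toy stand-in (the render and similarity functions are placeholders, not a real renderer): perturb the current pose estimate, render each candidate, and keep the candidate most similar to the observation:

```python
import numpy as np

def render(pose):                 # placeholder for a real renderer
    return np.sin(pose).sum()     # toy scalar "image" summary

def similarity(obs, img):
    return -abs(obs - img)        # higher is better

def render_and_compare(obs, pose, steps=50, scale=0.1):
    for _ in range(steps):
        # Actions: small perturbations of the 6D pose (3 rot + 3 trans).
        candidates = pose + scale * np.random.randn(8, 6)
        scores = [similarity(obs, render(c)) for c in candidates]
        best = candidates[int(np.argmax(scores))]
        if similarity(obs, render(best)) > similarity(obs, render(pose)):
            pose = best           # greedy "next best pose" decision
    return pose

target = np.array([0.3, -0.2, 0.1, 0.05, 0.0, 0.4])
print(render_and_compare(render(target), np.zeros(6)))
```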

MCML Authors
Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[2]
S. Denner, A. Khakzar, M. Sajid, M. Saleh, Z. Spiclin, S. T. Kim and N. Navab.
Spatio-temporal learning from longitudinal data for multiple sclerosis lesion segmentation.
BrainLes @MICCAI 2020 - Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). Virtual, Oct 04-08, 2020. DOI GitHub
Abstract

Segmentation of Multiple Sclerosis (MS) lesions in longitudinal brain MR scans is performed for monitoring the progression of MS lesions. We hypothesize that the spatio-temporal cues in longitudinal data can aid the segmentation algorithm. Therefore, we propose a multi-task learning approach that defines an auxiliary self-supervised task of deformable registration between two time-points to guide the neural network toward learning from spatio-temporal changes. We show the efficacy of our method on a clinical dataset comprising 70 patients with one follow-up study for each patient. Our results show that spatio-temporal information in longitudinal data is a beneficial cue for improving segmentation. We improve on the current state of the art by 2.6% in terms of overall score (p < 0.05).
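
A minimal sketch of such a multi-task setup, under my own architectural assumptions (a toy 2D encoder rather than the network used in the paper): a shared encoder feeds a segmentation head plus an auxiliary head that predicts a deformation field, and the registration loss is self-supervised by warping one time-point onto the other:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatioTemporalNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder sees both time-points stacked along the channel axis.
        self.encoder = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(16, 1, 1)    # lesion mask logits
        self.reg_head = nn.Conv2d(16, 2, 1)    # 2D displacement field

    def forward(self, t0, t1):
        feats = self.encoder(torch.cat([t0, t1], dim=1))
        return self.seg_head(feats), self.reg_head(feats)

def warp(img, flow):
    """Warp img with a dense displacement field via grid_sample."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).repeat(n, 1, 1, 1)
    return F.grid_sample(img, grid + flow.permute(0, 2, 3, 1),
                         align_corners=True)

net = SpatioTemporalNet()
t0, t1 = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
mask = torch.randint(0, 2, (1, 1, 64, 64)).float()
logits, flow = net(t0, t1)
# Supervised segmentation loss + self-supervised registration loss.
loss = F.binary_cross_entropy_with_logits(logits, mask) \
     + F.mse_loss(warp(t0, flow), t1)
loss.backward()
```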

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1]
Y. Yeganeh, A. Farshad, N. Navab and S. Albarqouni.
Inverse Distance Aggregation for Federated Learning with Non-IID Data.
DART DCL @MICCAI 2020 - Workshop on Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). Virtual, Oct 04-08, 2020. DOI
Abstract

Federated learning (FL) has been a promising approach in the field of medical imaging in recent years. A critical problem in FL, specifically in medical scenarios, is to obtain an accurate shared model that is robust to noisy and out-of-distribution clients. In this work, we tackle the problem of statistical heterogeneity in FL data, which is highly plausible in medical settings where, for example, the data comes from different sites with different scanner settings. We propose IDA (Inverse Distance Aggregation), a novel adaptive weighting approach for clients based on meta-information, which handles unbalanced and non-IID data. We extensively analyze and evaluate our method against the well-known FL approach Federated Averaging as a baseline.
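
The aggregation rule itself is simple to sketch. The following toy example (my reading of the abstract, not the authors' released implementation) weights each client inversely to the distance of its parameters from the mean model, so outlying clients are down-weighted relative to plain federated averaging:

```python
import numpy as np

def ida_aggregate(client_weights, eps=1e-8):
    """client_weights: list of flattened parameter vectors, one per client."""
    stacked = np.stack(client_weights)
    mean = stacked.mean(axis=0)
    # Inverse distance to the mean model as the aggregation coefficient.
    inv_dist = 1.0 / (np.linalg.norm(stacked - mean, axis=1) + eps)
    coeffs = inv_dist / inv_dist.sum()
    return (coeffs[:, None] * stacked).sum(axis=0), coeffs

clients = [np.random.randn(10) for _ in range(4)]
clients.append(np.random.randn(10) * 10)        # a noisy, outlying client
global_model, coeffs = ida_aggregate(clients)
print(coeffs)  # the outlier receives the smallest coefficient
```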

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


C2 | Biology

MCML focuses on crucial issues in Biology and Biomedicine, addressing AI challenges such as liability, black-box behavior, and privacy. The goals include advancing personalized healthcare and fostering collaboration between algorithms and human experts. Additionally, MCML aims to be a key training hub for the next generation of AI-empowered professionals in medical and biological fields.

Link to Profile Julien Gagneur

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine

Link to Profile Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Link to Profile Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Link to Profile Ralf Zimmer

Ralf Zimmer

Prof. Dr.

Bioinformatics

Publication in Research Area C2
[23]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning with the Gromov-Monge Gap.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.
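
To give a flavor of the geometric term (a deliberately simplified toy, not the Gromov-Monge gap estimator itself, which relies on quadratic optimal transport): penalizing the mismatch between input and latent pairwise distances regularizes the encoder toward geometry-preserving maps:

```python
import torch

def geometry_distortion(x, z):
    """Mean squared mismatch between input and latent pairwise distances."""
    return ((torch.cdist(x, x) - torch.cdist(z, z)) ** 2).mean()

encoder = torch.nn.Linear(8, 2)   # toy encoder
x = torch.randn(32, 8)
z = encoder(x)
# Combined with a prior-matching term, a penalty of this kind encourages
# the encoder to align the latent distribution with the prior while moving
# points with as little distortion of pairwise distances as possible.
loss = geometry_distortion(x, z)
loss.backward()
```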

MCML Authors
Link to website

Luca Eyring

Interpretable and Reliable Machine Learning

Link to website

Karsten Roth

Interpretable and Reliable Machine Learning

Link to Profile Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[22]
E. Ailer, C. L. Müller and N. Kilbertus.
Instrumental variable estimation for compositional treatments.
Scientific Reports 15.5158 (Feb. 2025). DOI
Abstract

Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions and warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.
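
The multivariate strategy the abstract advocates can be sketched with a log-ratio transform followed by standard two-stage least squares. All names and data below are synthetic placeholders, not the paper's method or datasets:

```python
import numpy as np

def clr(p, eps=1e-9):
    """Centered log-ratio transform of compositions (rows sum to one)."""
    logp = np.log(p + eps)
    return logp - logp.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n, k = 500, 4
Z = rng.normal(size=(n, 4))                       # instruments
raw = np.exp(Z @ rng.normal(size=(4, k)) + rng.normal(size=(n, k)))
X = raw / raw.sum(axis=1, keepdims=True)          # compositional cause
beta = np.array([1.0, -0.5, 0.2, -0.7])           # sums to zero (CLR space)
y = clr(X) @ beta + rng.normal(size=n)            # outcome

# Stage 1: regress the transformed composition on the instruments.
B1 = np.linalg.lstsq(Z, clr(X), rcond=None)[0]
X_hat = Z @ B1
# Stage 2: regress the outcome on the fitted composition.
beta_hat = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print(beta_hat)  # roughly recovers beta, up to sampling noise
```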

MCML Authors
Link to website

Elisabeth Ailer

Ethics in Systems Design and Machine Learning

Link to Profile Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science


[21]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning through Geometry Preservation with the Gromov-Monge Gap.
SPIGM @ICML 2024 - Workshop on Structured Probabilistic Inference & Generative Modeling at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. arXiv
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.

MCML Authors
Link to website

Luca Eyring

Interpretable and Reliable Machine Learning

Link to website

Karsten Roth

Interpretable and Reliable Machine Learning

Link to Profile Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning