C | Domain-Specific Machine Learning

Domain-specific machine learning shows immense potential, as both universities host several highly visible scientific domains with internationally renowned experts. This area facilitates the translation of ML concepts and technologies to many different domains.

C1 | Medicine

Research at MCML in Medicine and Healthcare focuses on the objectives that must be met to overcome the hurdles to deploying ML approaches in clinical environments. In particular, advances are required in interpretable and explainable deep learning, robust and data-efficient learning, privacy-preserving learning, and in the trust and safety of autonomous AI and ML systems.

Michael Ingrisch, Prof. Dr. (Clinical Data Science in Radiology)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Julia Schnabel, Prof. Dr. (Computational Imaging and AI in Medicine)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)
Martin Menten, Dr., JRG Leader AI for Vision (Artificial Intelligence in Healthcare and Medicine)
Peter Schüffler, Prof. Dr. (Computational Pathology)

Publications in Research Area C1
[130]
D. Huang, N. Navab and Z. Jiang.
Improving Probe Localization for Freehand 3D Ultrasound using Lightweight Cameras.
ICRA 2025 - IEEE International Conference on Robotics and Automation. Atlanta, GA, USA, May 19-23, 2025. To be published.
Abstract

Ultrasound (US) probe localization relative to the examined subject is essential for freehand 3D US imaging, which offers significant clinical value due to its affordability and unrestricted field of view. However, existing methods often rely on expensive tracking systems or bulky probes, while recent US image-based deep learning methods suffer from accumulated errors during probe maneuvering. To address these challenges, this study proposes a versatile, cost-effective probe pose localization method for freehand 3D US imaging, utilizing two lightweight cameras. To eliminate accumulated errors during US scans, we introduce PoseNet, which directly predicts the probe’s 6D pose relative to a preset world coordinate system based on camera observations. We first jointly train pose and camera image encoders based on pairs of 6D pose and camera observations densely sampled in simulation. This will encourage each pair of probe pose and its corresponding camera observation to share the same representation in latent space. To ensure the two encoders handle unseen images and poses effectively, we incorporate a triplet loss that enforces smaller differences in latent features between nearby poses compared to distant ones. Then, the pose decoder uses the latent representation of the camera images to predict the probe’s 6D pose. To bridge the sim-to-real gap, in the real world, we use the trained image encoder and pose decoder for initial predictions, followed by an additional MLP layer to refine the estimated pose, improving accuracy. The results obtained from an arm phantom demonstrate the effectiveness of the proposed method, which notably surpasses state-of-the-art techniques, achieving average positional and rotational errors of 2.03 mm and 0.37°, respectively.
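
The triplet constraint on the latent space can be illustrated with a minimal PyTorch sketch (batch shapes, the margin value, and how near/far pose pairs are sampled are assumptions here, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def pose_triplet_loss(anchor, near, far, margin=0.2):
    # Latent features ([B, D]) of camera observations whose poses are
    # close to (near) or distant from (far) the anchor pose.
    d_near = F.pairwise_distance(anchor, near)
    d_far = F.pairwise_distance(anchor, far)
    # Enforce d_near + margin <= d_far, as described in the abstract.
    return F.relu(d_near - d_far + margin).mean()
```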

MCML Authors
Dianye Huang (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Zhongliang Jiang, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[129]
S. Dahan, G. Bénédict, L. Z. J. Williams, Y. Guo, D. Rückert, R. Leech and E. C. Robinson.
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv GitHub
Abstract

Current AI frameworks for brain decoding and encoding typically train and test models within the same datasets. This limits their utility for brain computer interfaces (BCI) or neurofeedback, for which it would be useful to pool experiences across individuals to better simulate stimuli not sampled during training. A key obstacle to model generalisation is the degree of variability of inter-subject cortical organisation, which makes it difficult to align or compare cortical signals across participants. In this paper we address this through the use of surface vision transformers, which build a generalisable model of cortical functional dynamics through encoding the topography of cortical networks and their interactions as a moving image across a surface. This is then combined with tri-modal self-supervised contrastive (CLIP) alignment of audio, video, and fMRI modalities to enable the retrieval of visual and auditory stimuli from patterns of cortical activity (and vice-versa). We validate our approach on 7T task-fMRI data from 174 healthy participants engaged in the movie-watching experiment from the Human Connectome Project (HCP). Results show that it is possible to detect which movie clips an individual is watching purely from their brain activity, even for individuals and movies not seen during training. Further analysis of attention maps reveals that our model captures individual patterns of brain activity that reflect semantic and visual systems. This opens the door to future personalised simulations of brain function.
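
For each pair of modalities, CLIP-style alignment reduces to a symmetric contrastive objective; a minimal sketch for one fMRI-video pair follows (embedding shapes and the temperature are illustrative assumptions; the full method applies this across the three modality pairs):

```python
import torch
import torch.nn.functional as F

def clip_pair_loss(fmri_emb, video_emb, temperature=0.07):
    # L2-normalise both modality embeddings, shape [B, D].
    a = F.normalize(fmri_emb, dim=-1)
    b = F.normalize(video_emb, dim=-1)
    logits = a @ b.t() / temperature          # [B, B] similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Matching fMRI/video pairs sit on the diagonal.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```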

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[128]
L. Lux, A. H. Berger, A. Weers, N. Stucki, D. Rückert, U. Bauer and J. C. Paetzold.
Topograph: An efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Topological correctness plays a critical role in many image segmentation tasks, yet most networks are trained using pixel-wise loss functions, such as Dice, neglecting topological accuracy. Existing topology-aware methods often lack robust topological guarantees, are limited to specific use cases, or impose high computational costs. In this work, we propose a novel, graph-based framework for topologically accurate image segmentation that is both computationally efficient and generally applicable. Our method constructs a component graph that fully encodes the topological information of both the prediction and ground truth, allowing us to efficiently identify topologically critical regions and aggregate a loss based on local neighborhood information. Furthermore, we introduce a strict topological metric capturing the homotopy equivalence between the union and intersection of prediction-label pairs. We formally prove the topological guarantees of our approach and empirically validate its effectiveness on binary and multi-class datasets. Our loss demonstrates state-of-the-art performance with up to fivefold faster loss computation compared to persistent homology methods.
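
The simplest topological discrepancy such a component graph captures is a mismatch in the number of connected components; a toy check in that spirit is shown below (using scipy labelling, not the paper's component-graph construction):

```python
import numpy as np
from scipy import ndimage

def betti0_error(pred, gt, connectivity=1):
    # Label connected components in binary 2D masks; the connectivity
    # choice (1 = 4-neighbourhood, 2 = 8-neighbourhood) matters, which
    # is exactly the kind of subtlety topology-aware metrics must fix.
    s = ndimage.generate_binary_structure(2, connectivity)
    _, n_pred = ndimage.label(pred, structure=s)
    _, n_gt = ndimage.label(gt, structure=s)
    return abs(n_pred - n_gt)
```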

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Nico Stucki (Applied Topology and Geometry)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Ulrich Bauer, Prof. Dr. (Applied Topology and Geometry)


[127]
J. R. Jostan, L. M. Rodriguez, D. Z. Bernal, J. O. Berdugo, V. Aljure, F. Lopez, J. R. Lopez, N. Navab, D. Mateus and V. G. Duque.
Ultrasound Nerve Segmentation with Deep Learning for Leprosy.
ISBI 2025 - IEEE 22nd International Symposium on Biomedical Imaging. Houston, TX, USA, Apr 14-17, 2025. To be published.
Abstract

Purpose: This study aims to provide an AI tool for detecting nerves in ultrasound images to help diagnose Hansen’s disease (Leprosy) in rural areas. The significant difference in the cross-sectional area (CSA) of superficial nerves in symmetrical extremities is a landmark in the early stages of the disease. Despite its potential, ultrasound nerve evaluation is limited due to the difficulty in accurately identifying nerves in ultrasound images.
Methodology: We propose the first Leprosy video nerve segmentation pipeline based on YOLOv8 and X-Mem architectures to automate frame detection, segmentation, and label propagation. We ensure alignment with clinical practices and evaluate the method’s real-time inference and energy efficiency, confirming the approach’s feasibility in resource-limited settings.
Results: We establish a baseline for nerve segmentation of ultrasound Leprosy videos, presenting the first results to identify relevant frames, segment, and propagate labels. To support further research, we have open-sourced a new leprosy test dataset and created a demo web page to try our method on real patient data. This initiative aims to promote research on AI techniques to improve healthcare in rural communities, where healthcare professionals are scarce and assistance is essential.

MCML Authors
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[126]
Ö. Turgut, P. Müller, P. Hager, S. Shit, S. Starck, M. J. Menten, E. Martens and D. Rückert.
Unlocking the diagnostic potential of electrocardiograms through information transfer from cardiac magnetic resonance imaging.
Medical Image Analysis 101.103451 (Apr. 2025). DOI GitHub
Abstract

Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19% for risk prediction and 27.59% for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest.
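
The masked data modelling component can be sketched as random patch masking of the ECG signal prior to reconstruction (patch length and mask ratio below are illustrative assumptions, not the paper's settings):

```python
import torch

def mask_ecg(x, mask_ratio=0.75, patch_len=25):
    # x: [batch, leads, time]; time must be divisible by patch_len.
    b, c, t = x.shape
    assert t % patch_len == 0
    n = t // patch_len
    keep = torch.rand(b, n) > mask_ratio            # keep ~25% of patches
    mask = keep.repeat_interleave(patch_len, dim=1) # expand to sample level
    mask = mask[:, None, :].to(x.dtype)             # broadcast over leads
    return x * mask, mask                           # masked signal + mask
```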

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[125]
A. H. Berger, L. Lux, S. Shit, I. Ezhov, G. Kaissis, M. Menten, D. Rückert and J. C. Paetzold.
Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task’s complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method’s utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Georgios Kaissis, Dr. (Artificial Intelligence in Healthcare and Medicine)
Martin Menten, Dr. (Artificial Intelligence in Healthcare and Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[124]
Y. Li, M. Ghahremani, Y. Wally and C. Wachinger.
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET.
WACV 2025 - IEEE/CVF Winter Conference on Applications of Computer Vision. Tucson, AZ, USA, Feb 28-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Diagnosing dementia, particularly for Alzheimer’s Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study.
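
DiaMond's bi-attention is a novel mechanism; as a generic reference point for how two imaging modalities can attend to each other, plain cross-attention between MRI and PET token sequences looks like this (dimensions are assumptions):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

def cross_fuse(mri_tokens, pet_tokens):
    # Queries come from MRI, keys/values from PET: each MRI token
    # attends to the PET volume ([B, N, 256] token sequences assumed).
    fused, _ = attn(query=mri_tokens, key=pet_tokens, value=pet_tokens)
    return fused
```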

MCML Authors
Yitong Li (Artificial Intelligence in Medical Imaging)
Morteza Ghahremani, Dr. (Artificial Intelligence in Medical Imaging)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[123]
Y. Shen, Z. Zhuang, K. Yuan, M.-I. Nicolae, N. Navab, N. Padoy and M. Fritz.
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment.
AAAI 2025 - 39th Conference on Artificial Intelligence. Philadelphia, PA, USA, Feb 25-Mar 04, 2025. To be published. Preprint available. arXiv
Abstract

Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision making and results analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing for the medical domain has focused on classification; however, existing attacks are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs. ADA-STEAL relies on natural images, which are public and widely available, as opposed to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the data distribution gap between natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.
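
One concrete way to add adversarial noise to natural images in the spirit of this attack is a plain FGSM step (a generic sketch, not ADA-STEAL's full objective):

```python
import torch

def fgsm_step(model, images, loss_fn, epsilon=2 / 255):
    # Perturb images along the sign of the loss gradient; loss_fn is
    # assumed to map model outputs to a scalar objective.
    images = images.clone().requires_grad_(True)
    loss = loss_fn(model(images))
    loss.backward()
    return (images + epsilon * images.grad.sign()).detach()
```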

MCML Authors
Kun Yuan (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[122]
Y. Bi, Y. Su, N. Navab and Z. Jiang.
Gaze-Guided Robotic Vascular Ultrasound Leveraging Human Intention Estimation.
IEEE Robotics and Automation Letters Early Access (Feb. 2025). DOI
Abstract

Medical ultrasound has been widely used to examine vascular structure in modern clinical practice. However, traditional ultrasound examination often faces challenges related to inter- and intra-operator variation. The robotic ultrasound system (RUSS) appears as a potential solution for such challenges because of its superiority in stability and reproducibility. Given the complex anatomy of human vasculature, multiple vessels often appear in ultrasound images, or a single vessel bifurcates into branches, complicating the examination process. To tackle this challenge, this work presents a gaze-guided RUSS for vascular applications. A gaze tracker captures the eye movements of the operator. The extracted gaze signal guides the RUSS to follow the correct vessel when it bifurcates. Additionally, a gaze-guided segmentation network is proposed to enhance segmentation robustness by exploiting gaze information. However, gaze signals are often noisy, requiring interpretation to accurately discern the operator’s true intentions. To this end, this study proposes a stabilization module to process raw gaze data. The inferred attention heatmap is utilized as a region proposal to aid segmentation and serve as a trigger signal when the operator needs to adjust the scanning target, such as when a bifurcation appears. To ensure appropriate contact between the probe and surface during scanning, an automatic ultrasound confidence-based orientation correction method is developed. In experiments, we demonstrated the efficiency of the proposed gaze-guided segmentation pipeline by comparing it with other methods. Besides, the performance of the proposed gaze-guided RUSS was also validated as a whole on a realistic arm phantom with an uneven surface.

MCML Authors
Yuan Bi (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Zhongliang Jiang, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[121]
W. Li, H. Xu, J. Huang, H. Jung, P. Yu, N. Navab and B. Busam.
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation.
Preprint (Feb. 2025). arXiv GitHub
Abstract

A key challenge in model-free category-level pose estimation is the extraction of contextual object features that generalize across varying instances within a specific category. Recent approaches leverage foundational features to capture semantic and geometry cues from data. However, these approaches fail under partial visibility. We overcome this with a first-complete-then-aggregate strategy for feature extraction utilizing class priors. In this paper, we present GCE-Pose, a method that enhances pose estimation for novel instances by integrating category-level global context prior. GCE-Pose performs semantic shape reconstruction with a proposed Semantic Shape Reconstruction (SSR) module. Given an unseen partial RGB-D object instance, our SSR module reconstructs the instance’s global geometry and semantics by deforming category-specific 3D semantic prototypes through a learned deep Linear Shape Model. We further introduce a Global Context Enhanced (GCE) feature fusion module that effectively fuses features from partial RGB-D observations and the reconstructed global context. Extensive experiments validate the impact of our global context prior and the effectiveness of the GCE fusion module, demonstrating that GCE-Pose significantly outperforms existing methods on challenging real-world datasets HouseCat6D and NOCS-REAL275.

MCML Authors
Junwen Huang (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Benjamin Busam, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[120]
S. Grosu, M. P. Fabritius, M. Winkelmann, D. Puhr-Westerheide, M. Ingenerf, S. Maurus, A. Graser, C. Schulz, T. Knösel, C. C. Cyran, J. Ricke, P. M. Kazmierczak, M. Ingrisch and P. Wesp.
Effect of artificial intelligence-aided differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
European Radiology Early Access (Jan. 2025). DOI
Abstract

Objectives: Adenomatous colorectal polyps require endoscopic resection, as opposed to non-adenomatous hyperplastic colorectal polyps. This study aims to evaluate the effect of artificial intelligence (AI)-assisted differentiation of adenomatous and non-adenomatous colorectal polyps at CT colonography on radiologists’ therapy management.
Materials and methods: Five board-certified radiologists evaluated CT colonography images with colorectal polyps of all sizes and morphologies retrospectively and decided whether the depicted polyps required endoscopic resection. After a primary unassisted reading based on current guidelines, a second reading with access to the classification of a radiomics-based random-forest AI model labelling each polyp as ‘non-adenomatous’ or ‘adenomatous’ was performed. Performance was evaluated using polyp histopathology as the reference standard.
Results: 77 polyps in 59 patients comprising 118 polyp image series (47% supine position, 53% prone position) were evaluated unassisted and AI-assisted by five independent board-certified radiologists, resulting in a total of 1180 readings (subsequent polypectomy: yes or no). AI-assisted readings had higher accuracy (76% ± 1% vs. 84% ± 1%), sensitivity (78% ± 6% vs. 85% ± 1%), and specificity (73% ± 8% vs. 82% ± 2%) in selecting polyps eligible for polypectomy (p < 0.001). Inter-reader agreement was improved in the AI-assisted readings (Fleiss’ kappa 0.69 vs. 0.92).
Conclusion: AI-based characterisation of colorectal polyps at CT colonography as a second reader might enable a more precise selection of polyps eligible for subsequent endoscopic resection. However, further studies are needed to confirm this finding, and histopathologic polyp evaluation remains mandatory.
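
A radiomics-based random-forest classifier of the kind described can be set up in a few lines; the sketch below uses stand-in data and omits feature extraction and the study's exact validation protocol:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(77, 100))      # stand-in radiomics features per polyp
y = rng.integers(0, 2, size=77)     # stand-in labels: 1 = adenomatous

clf = RandomForestClassifier(n_estimators=500, random_state=0)
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
print(f"mean cross-validated AUC: {auc:.2f}")
```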

MCML Authors
Michael Ingrisch, Prof. Dr. (Clinical Data Science in Radiology)
Philipp Wesp, Dr. (Clinical Data Science in Radiology)


[119]
J. Li, T. Su, B. Zhao, F. Lv, Q. Wang, N. Navab, Y. Hu and Z. Jiang.
Ultrasound Report Generation With Cross-Modality Feature Alignment via Unsupervised Guidance.
IEEE Transactions on Medical Imaging 44.1 (Jan. 2025). DOI
Abstract

Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations with other state-of-the-art approaches exhibit its superior performance across all three datasets.

MCML Authors
Jun Li (Computational Imaging and AI in Medicine)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Zhongliang Jiang, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[118]
A. Bitarafan, M. Mozafari, M. F. Azampour, M. S. Baghshah, N. Navab and A. Farshad.
Self-supervised 3D medical image segmentation by flow-guided mask propagation learning.
Medical Image Analysis 103478 (Jan. 2025). Journal pre-proof. DOI GitHub
Abstract

Despite significant progress in 3D medical image segmentation using deep learning, manual annotation remains a labor-intensive bottleneck. Self-supervised mask propagation (SMP) methods have emerged to alleviate this challenge, allowing intra-volume segmentation with just a single slice annotation. However, the previous SMP methods often rely on 2D information and ignore volumetric contexts. While our previous work, called Vol2Flow, attempts to address this concern, it exhibits limitations, including not focusing enough on local (i.e., slice-pair) information, neglecting global information (i.e., volumetric contexts) in the objective function, and error accumulation during slice-to-slice reconstruction. This paper introduces Flow2Mask, a novel SMP method, developed to overcome the limitations of previous SMP approaches, particularly Vol2Flow. During training, Flow2Mask proposes the Local-to-Global (L2G) loss to learn inter-slice flow fields among all consecutive slices within a volume in an unsupervised manner. This dynamic loss is based on curriculum learning to gradually learn information within a volume from local to global contexts. Additionally, the Inter-Slice Smoothness (ISS) loss is introduced as a regularization term to encourage changes between the slices occur consistently and continuously. During inference, Flow2Mask leverages these 3D flow fields for inter-slice mask propagation in a 3D image, spreading annotation from a single annotated slice to the entire volume. Moreover, we propose an automatic strategy to select the most representative slice as initial annotation in the mask propagation process. Experimental evaluations on different abdominal datasets demonstrate that our proposed SMP method outperforms previous approaches and improves the overall mean DSC of Vol2Flow by +2.1%, +8.2%, and +4.0% for the Sliver, CHAOS, and 3D-IRCAD datasets, respectively. Furthermore, Flow2Mask even exhibits substantial improvements in weakly-supervised and self-supervised few-shot segmentation methods when applied as a mask completion tool.
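
At inference, propagating a mask with a learned flow field amounts to image warping; a minimal PyTorch sketch follows (the normalized-coordinate convention and 2D slice setting are assumptions):

```python
import torch
import torch.nn.functional as F

def warp_mask(mask, flow):
    # mask: float [B, 1, H, W]; flow: [B, H, W, 2] offsets in normalized
    # [-1, 1] coordinates. Build an identity grid, then sample the mask
    # at displaced locations to propagate it to the next slice.
    b, _, h, w = mask.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).expand(b, -1, -1, -1)
    return F.grid_sample(mask, grid + flow, align_corners=True)
```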

MCML Authors
Mohammad Farid Azampour (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Azade Farshad, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[117]
F. Drexel, V. Sideri-Lampretsa, H. Bast, A. W. Marka, T. Koehler, F. T. Gassert, D. Pfeiffer, D. Rückert and F. Pfeiffer.
Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment.
Preprint (Jan. 2025). arXiv
Abstract

Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state. Our work aims to add a new perspective to these previous assessments by locally comparing dark-field lung information between different respiratory states. To this end, we discuss suitable image registration methods for dark-field chest radiographs to enable consistent spatial alignment of the lung in distinct breathing states. Utilizing full inspiration and expiration scans from a clinical chronic obstructive pulmonary disease study, we assess the performance of the proposed registration framework and outline applicable evaluation approaches. Our regional characterization of lung dark-field signal changes between the breathing states provides a proof-of-principle that dynamic radiography-based lung function assessment approaches may benefit from considering registered dark-field images in addition to standard plain chest radiographs.

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[116]
F. Duelmer, M. Azampour and N. Navab.
UltraRay: Full-Path Ray Tracing for Enhancing Realism in Ultrasound Simulation.
Preprint (Jan. 2025). arXiv
Abstract

Traditional ultrasound simulators solve the wave equation to model pressure distribution fields, achieving high accuracy but requiring significant computational time and resources. To address this, ray tracing approaches have been introduced, modeling wave propagation as rays interacting with boundaries and scatterers. However, existing models simplify ray propagation, generating echoes at interaction points without considering return paths to the sensor. This can result in unrealistic artifacts and necessitates careful scene tuning for plausible results. We propose a novel ultrasound simulation pipeline that utilizes a ray tracing algorithm to generate echo data, tracing each ray from the transducer through the scene and back to the sensor. To replicate advanced ultrasound imaging, we introduce a ray emission scheme optimized for plane wave imaging, incorporating delay and steering capabilities. Furthermore, we integrate a standard signal processing pipeline to simulate end-to-end ultrasound image formation. We showcase the efficacy of the proposed pipeline by modeling synthetic scenes featuring highly reflective objects, such as bones. In doing so, our proposed approach, UltraRay, not only enhances the overall visual quality but also improves the realism of the simulated images by accurately capturing secondary reflections and reducing unnatural artifacts. By building on top of a differentiable framework, the proposed pipeline lays the groundwork for a fast and differentiable ultrasound simulation tool necessary for gradient-based optimization, enabling advanced ultrasound beamforming strategies, neural network integration, and accurate inverse scene reconstruction.

MCML Authors
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[115]
Z. Haouari, J. Weidner, I. Ezhov, A. Varma, D. Rückert, B. Menze and B. Wiestler.
Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models.
Preprint (Jan. 2025). arXiv
Abstract

Glioblastoma, a highly aggressive brain tumor, poses major challenges due to its poor prognosis and high morbidity rates. Partial differential equation-based models offer promising potential to enhance therapeutic outcomes by simulating patient-specific tumor behavior for improved radiotherapy planning. However, model calibration remains a bottleneck due to the high computational demands of optimization methods like Monte Carlo sampling and evolutionary algorithms. To address this, we recently introduced an approach leveraging a neural forward solver with gradient-based optimization to significantly reduce calibration time. This approach requires a highly accurate and fully differentiable forward model. We investigate multiple architectures, including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a 3D Vision Transformer (ViT). The optimized TumorSurrogate achieved the best overall results, excelling in both tumor outline matching and voxel-level prediction of tumor cell concentration. It halved the MSE relative to the baseline model and achieved the highest Dice score across all tumor cell concentration thresholds. Our study demonstrates significant enhancement in forward solver performance and outlines important future research directions.

MCML Authors
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)


[114]
B. Jian, J. Pan, Y. Li, F. Bongratz, R. Li, D. Rückert, B. Wiestler and C. Wachinger.
TimeFlow: Longitudinal Brain Image Registration and Aging Progression Analysis.
Preprint (Jan. 2025). arXiv
Abstract

Predicting future brain states is crucial for understanding healthy aging and neurodegenerative diseases. Longitudinal brain MRI registration, a cornerstone for such analyses, has long been limited by its inability to forecast future developments, reliance on extensive, dense longitudinal data, and the need to balance registration accuracy with temporal smoothness. In this work, we present TimeFlow, a novel framework for longitudinal brain MRI registration that overcomes all these challenges. Leveraging a U-Net architecture with temporal conditioning inspired by diffusion models, TimeFlow enables accurate longitudinal registration and facilitates prospective analyses through future image prediction. Unlike traditional methods that depend on explicit smoothness regularizers and dense sequential data, TimeFlow achieves temporal consistency and continuity without these constraints. Experimental results highlight its superior performance in both future timepoint prediction and registration accuracy compared to state-of-the-art methods. Additionally, TimeFlow supports novel biological brain aging analyses, effectively differentiating neurodegenerative conditions from healthy aging. It eliminates the need for segmentation, thereby avoiding the challenges of non-trivial annotation and inconsistent segmentation errors. TimeFlow paves the way for accurate, data-efficient, and annotation-free prospective analyses of brain aging and chronic diseases.
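
Temporal conditioning in the diffusion-model style mentioned above typically means injecting a sinusoidal embedding of the inter-scan time gap into the U-Net; a generic sketch (the embedding dimension is an assumption, not TimeFlow's exact code):

```python
import math
import torch

def time_embedding(t, dim=128):
    # t: [B] tensor of time gaps (e.g., years between scans).
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # [B, dim]
```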

MCML Authors
Bailiang Jian (Artificial Intelligence in Medical Imaging)
Yitong Li (Artificial Intelligence in Medical Imaging)
Fabian Bongratz (Artificial Intelligence in Medical Imaging)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[113]
T. N. Wolf and C. Wachinger.
WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPport Vectors.
Preprint (Jan. 2025). arXiv
Abstract

The deployment of deep learning models in critical domains necessitates a balance between high accuracy and interpretability. We introduce WASUP, an inherently interpretable neural network that provides local and global explanations of its decision-making process. We prove that these explanations are faithful by fulfilling established axioms for explanations. Leveraging the concept of case-based reasoning, WASUP extracts class-representative support vectors from training images, ensuring they capture relevant features while suppressing irrelevant ones. Classification decisions are made by calculating and aggregating similarity scores between these support vectors and the input’s latent feature vector. We employ B-Cos transformations, which align model weights with inputs to enable faithful mappings of latent features back to the input space, facilitating local explanations in addition to global explanations of case-based reasoning. We evaluate WASUP on three tasks: fine-grained classification on Stanford Dogs, multi-label classification on Pascal VOC, and pathology detection on the RSNA dataset. Results indicate that WASUP not only achieves competitive accuracy compared to state-of-the-art black-box models but also offers insightful explanations verified through theoretical analysis. Our findings underscore WASUP’s potential for applications where understanding model decisions is as critical as the decisions themselves.
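
The case-based classification step, scoring an input by its similarity to class-wise support vectors, can be sketched as follows (tensor shapes and mean-aggregation are assumptions; WASUP's B-cos machinery is omitted):

```python
import torch
import torch.nn.functional as F

def class_logits(latent, supports):
    # latent: [B, D] input features; supports: [C, K, D], K support
    # vectors per class. Cosine similarities are aggregated per class.
    sims = torch.einsum("bd,ckd->bck",
                        F.normalize(latent, dim=-1),
                        F.normalize(supports, dim=-1))
    return sims.mean(dim=-1)  # [B, C] class scores
```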

MCML Authors
Tom Nuno Wolf (Artificial Intelligence in Medical Imaging)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[112]
J. Wang, M. Ghahremani, Y. Li, B. Ommer and C. Wachinger.
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation.
NeurIPS 2024 - 38th Conference on Neural Information Processing Systems. Vancouver, Canada, Dec 10-15, 2024. URL GitHub
Abstract

Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model’s precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet.

MCML Authors
Morteza Ghahremani, Dr. (Artificial Intelligence in Medical Imaging)
Yitong Li (Artificial Intelligence in Medical Imaging)
Björn Ommer, Prof. Dr. (Computer Vision & Learning)
Christian Wachinger, Prof. Dr. (Artificial Intelligence in Medical Imaging)


[111]
A. H. Berger, L. Lux, A. Weers, M. Menten, D. Rückert and J. C. Paetzold.
Pitfalls of topology-aware image segmentation.
Preprint (Dec. 2024). arXiv
Abstract

Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues’ profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Martin Menten, Dr. (Artificial Intelligence in Healthcare and Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)


[110]
M. Fischer, P. Neher, P. J. Schüffler, S. Ziegler, S. Xiao, R. Peretzke, D. Clunie, C. Ulrich, M. Baumgartner, A. Muckenhuber, S. Dias Almeida, M. Götz, J. Kleesiek, M. Nolden, R. Braren and K. Maier-Hein.
Unlocking the Potential of Digital Pathology: Novel Baselines for Compression.
Preprint (Dec. 2024). arXiv
Abstract

Digital pathology offers a groundbreaking opportunity to transform clinical practice in histopathological image analysis, yet faces a significant hurdle: the substantial file sizes of pathological Whole Slide Images (WSI). While current digital pathology solutions rely on lossy JPEG compression to address this issue, lossy compression can introduce color and texture disparities, potentially impacting clinical decision-making. While prior research addresses perceptual image quality and downstream performance independently of each other, we jointly evaluate compression schemes for perceptual and downstream task quality on four different datasets. In addition, we collect an initially uncompressed dataset for an unbiased perceptual evaluation of compression schemes. Our results show that deep learning models fine-tuned for perceptual quality outperform conventional compression schemes like JPEG-XL or WebP for further compression of WSI. However, they exhibit a significant bias towards the compression artifacts present in the training data and struggle to generalize across various compression schemes. We introduce a novel evaluation metric based on feature similarity between original files and compressed files that aligns very well with the actual downstream performance on the compressed WSI. Our metric allows for a general and standardized evaluation of lossy compression schemes and mitigates the requirement to independently assess different downstream tasks. Our study provides novel insights for the assessment of lossy compression schemes for WSI and encourages a unified evaluation of lossy compression schemes to accelerate the clinical uptake of digital pathology.
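
The proposed metric compares original and compressed slides in a learned feature space; in its simplest form, that is an embedding similarity, as sketched below (the encoder choice and cosine similarity are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def feature_similarity(encoder, original, compressed):
    # original/compressed: [B, 3, H, W] patch batches from the same WSI,
    # before and after lossy compression.
    with torch.no_grad():
        f_o = encoder(original)
        f_c = encoder(compressed)
    return F.cosine_similarity(f_o, f_c, dim=-1).mean()
```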

MCML Authors
Peter Schüffler, Prof. Dr. (Computational Pathology)


[109]
S. Liang, S. Wang, K. Li, M. Niemeyer, S. Gasperini, N. Navab and F. Tombari.
SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians.
Preprint (Dec. 2024). arXiv
Abstract

3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthesis, more recent works investigated how to extend it with scene understanding and language features. However, existing methods lack a detailed comprehension of scenes, limiting their ability to segment and interpret complex structures. To this end, we introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural Gaussians to learn instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation of 2D language features into 3D space. Through Super-Gaussians, our method enables high-dimensional language feature rendering without extreme increases in GPU memory. Extensive experiments demonstrate that SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.

MCML Authors
Kunyi Li (Computer Aided Medical Procedures & Augmented Reality)
Stefano Gasperini (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)


[108]
A. Reithmeir, V. Spieker, V. Sideri-Lampretsa, D. Rückert, J. A. Schnabel and V. A. Zimmer.
From Model Based to Learned Regularization in Medical Image Registration: A Comprehensive Review.
Preprint (Dec. 2024). arXiv
Abstract

Image registration is fundamental in medical imaging applications, such as disease progression analysis or radiation therapy planning. The primary objective of image registration is to precisely capture the deformation between two or more images, typically achieved by minimizing an optimization problem. Due to its inherent ill-posedness, regularization is a key component in driving the solution toward anatomically meaningful deformations. A wide range of regularization methods has been proposed for both conventional and deep learning-based registration. However, the appropriate application of regularization techniques often depends on the specific registration problem, and no one-fits-all method exists. Despite its importance, regularization is often overlooked or addressed with default approaches, assuming existing methods are sufficient. A comprehensive and structured review remains missing. This review addresses this gap by introducing a novel taxonomy that systematically categorizes the diverse range of proposed regularization methods. It highlights the emerging field of learned regularization, which leverages data-driven techniques to automatically derive deformation properties from the data. Moreover, this review examines the transfer of regularization methods from conventional to learning-based registration, identifies open challenges, and outlines future research directions. By emphasizing the critical role of regularization in image registration, we hope to inspire the research community to reconsider regularization strategies in modern registration algorithms and to explore this rapidly evolving field further.
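
A canonical example of the model-based regularization the review starts from is the diffusion (smoothness) penalty on the displacement field u in the variational registration objective:

```latex
% Variational registration: similarity between the fixed image I_f and
% the warped moving image I_m, plus a smoothness penalty weighted by lambda.
\hat{u} = \arg\min_{u} \;
  \mathcal{D}\bigl(I_f,\, I_m \circ (\mathrm{Id} + u)\bigr)
  + \lambda \int_{\Omega} \lVert \nabla u(x) \rVert^2 \, dx
```

Learned regularization, as surveyed in the paper, replaces the hand-crafted second term with a data-driven prior on plausible deformations.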

MCML Authors
Anna Reithmeir (Computational Imaging and AI in Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Julia Schnabel, Prof. Dr. (Computational Imaging and AI in Medicine)


[107]
J. Weidner, M. Balcerak, I. Ezhov, A. Datchev, L. Lux, L. Zimmer, D. Rückert, B. Menze and B. Wiestler.
Spatial Brain Tumor Concentration Estimation for Individualized Radiotherapy Planning.
Preprint (Dec. 2024). arXiv
Abstract

Biophysical modeling of brain tumors has emerged as a promising strategy for personalizing radiotherapy planning by estimating the otherwise hidden distribution of tumor cells within the brain. However, many existing state-of-the-art methods are computationally intensive, limiting their widespread translation into clinical practice. In this work, we propose an efficient and direct method that utilizes soft physical constraints to estimate the tumor cell concentration from preoperative MRI of brain tumor patients. Our approach optimizes a 3D tumor concentration field by simultaneously minimizing the difference between the observed MRI and a physically informed loss function. Compared to existing state-of-the-art techniques, our method significantly improves predicting tumor recurrence on two public datasets with a total of 192 patients while maintaining a clinically viable runtime of under one minute - a substantial reduction from the 30 minutes required by the current best approach. Furthermore, we showcase the generalizability of our framework by incorporating additional imaging information and physical constraints, highlighting its potential to translate to various medical diffusion phenomena with imperfect data.
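
A toy version of fitting a concentration field against MRI-derived masks with a soft physical prior is sketched below (thresholds, sigmoid sharpness, and the smoothness stand-in are assumptions; the paper's physically informed constraints are more elaborate):

```python
import torch

def concentration_loss(c, edema_mask, core_mask, lam=1.0):
    # c: [D, H, W] tumor-cell concentration in [0, 1]; masks are binary
    # MRI segmentations. Thresholds 0.25/0.5 are illustrative only.
    soft_edema = torch.sigmoid(50 * (c - 0.25))
    soft_core = torch.sigmoid(50 * (c - 0.5))
    data = ((soft_edema - edema_mask) ** 2).mean() \
         + ((soft_core - core_mask) ** 2).mean()
    grads = torch.stack(torch.gradient(c), dim=0)  # spatial smoothness
    return data + lam * (grads ** 2).mean()
```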

MCML Authors
Laurin Lux (Artificial Intelligence in Healthcare and Medicine)
Daniel Rückert, Prof. Dr. (Artificial Intelligence in Healthcare and Medicine)
Benedikt Wiestler, Prof. Dr. (AI for Image-Guided Diagnosis and Therapy)


[106]
Y. Yeganeh, I. Charisiadis, M. Hasny, M. Hartenberger, B. Ommer, N. Navab, A. Farshad and E. Adeli.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis.
Preprint (Dec. 2024). arXiv
Abstract

Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models; however, such large datasets are not always accessible in medical imaging due to cost and privacy issues, which contradicts one of the main applications of such models to produce synthetic samples where real data is scarce. Also, finetuning on pre-trained general models has been a challenge due to the distribution shift between the medical domain and the pre-trained models. Here, we propose Latent Drift (LD) for diffusion models that can be adopted for any fine-tuning method to mitigate the issues faced by the distribution shift or employed in inference time as a condition. Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation, which is crucial to investigate how parameters such as gender, age, and adding or removing diseases in a patient would alter the medical images. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation. Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes. The source code of this work will be publicly released upon its acceptance.

MCML Authors
Yousef Yeganeh (Computer Aided Medical Procedures & Augmented Reality)
Björn Ommer, Prof. Dr. (Computer Vision & Learning)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Azade Farshad, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[105]
Y. Yeganeh, R. Xiao, G. Guvercin, N. Navab and A. Farshad.
Conformable Convolution for Topologically Aware Learning of Complex Anatomical Structures.
Preprint (Dec. 2024). arXiv
Abstract

While conventional computer vision emphasizes pixel-level and feature-based objectives, medical image analysis of intricate biological structures necessitates explicit representation of their complex topological properties. Despite their successes, deep learning models often struggle to accurately capture the connectivity and continuity of fine, sometimes pixel-thin, yet critical structures due to their reliance on implicit learning from data. Such shortcomings can significantly impact the reliability of analysis results and hinder clinical decision-making. To address this challenge, we introduce Conformable Convolution, a novel convolutional layer designed to explicitly enforce topological consistency. Conformable Convolution learns adaptive kernel offsets that preferentially focus on regions of high topological significance within an image. This prioritization is guided by our proposed Topological Posterior Generator (TPG) module, which leverages persistent homology. The TPG module identifies key topological features and guides the convolutional layers by applying persistent homology to feature maps transformed into cubical complexes. Our proposed modules are architecture-agnostic, enabling them to be integrated seamlessly into various architectures. We showcase the effectiveness of our framework in the segmentation task, where preserving the interconnectedness of structures is critical. Experimental results on three diverse datasets demonstrate that our framework effectively preserves the topology in the segmentation downstream task, both quantitatively and qualitatively.

MCML Authors
Yousef Yeganeh (Computer Aided Medical Procedures & Augmented Reality)
Nassir Navab, Prof. Dr. (Computer Aided Medical Procedures & Augmented Reality)
Azade Farshad, Dr. (Computer Aided Medical Procedures & Augmented Reality)


[104]
R. Liao, M. Erler, H. Wang, G. Zhai, G. Zhang, Y. Ma and V. Tresp.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs.
EMNLP 2024 - Findings of the Conference on Empirical Methods in Natural Language Processing. Miami, FL, USA, Nov 12-16, 2024. DOI GitHub
Abstract

In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA.

MCML Authors
Ruotong Liao (Database Systems and Data Mining)
Guangyao Zhai (Computer Aided Medical Procedures & Augmented Reality)
Gengyuan Zhang (Database Systems and Data Mining)
Yunpu Ma, Dr. (Artificial Intelligence and Machine Learning)
Volker Tresp, Prof. Dr. (Database Systems and Data Mining)


[103]
A. T. Stüber, M. M. Heimer, J. Ta, M. P. Fabritius, B. F. Hoppe, G. Sheikh, M. Brendel, L. Unterrainer, P. Jurmeister, A. Tufman, J. Ricke, C. C. Cyran and M. Ingrisch.
Replication study of PD-L1 status prediction in NSCLC using PET/CT radiomics.
European Journal of Radiology (Nov. 2024). In press. DOI
Abstract

This study investigates the predictive capability of radiomics in determining programmed cell death ligand 1 (PD-L1) expression (>=1%) status in non-small cell lung cancer (NSCLC) patients using a newly collected [18F]FDG PET/CT dataset. We aimed to replicate and validate the radiomics-based machine learning (ML) model proposed by Zhao et al. [2] predicting PD-L1 status from PET/CT-imaging.
An independent cohort of 254 NSCLC patients underwent [18F]FDG PET/CT imaging, with primary tumor segmentation conducted using lung tissue window (LTW) and more conservative soft tissue window (STW) methods. Radiomics models (“Rad-score” and “complex model”) and a clinical-stage model from Zhao et al. were evaluated via 10-fold cross-validation and AUC analysis, alongside a benchmark study comparing different ML model pipelines. Clinicopathological data were collected from medical records.
On our data, the Rad-score model yielded mean AUCs of 0.593 (STW) and 0.573 (LTW), below Zhao et al.’s 0.761. The complex model achieved mean AUCs of 0.505 (STW) and 0.519 (LTW), lower than Zhao et al.’s 0.769. The clinical model showed a mean AUC of 0.555, below Zhao et al.’s 0.64. All models performed significantly lower than Zhao et al.’s findings. Our benchmark study on four ML pipelines revealed consistently low performance across all configurations.
Our study failed to replicate the original findings, suggesting poor model performance and questioning the predictive value of radiomics features in classifying PD-L1 expression from PET/CT imaging. These results highlight challenges in replicating radiomics-based ML models and stress the need for rigorous validation.

MCML Authors
Theresa Stüber (Clinical Data Science in Radiology)
Michael Ingrisch, Prof. Dr. (Clinical Data Science in Radiology)


[102]
M. Azampour, K. Mach, E. Fatemizadeh, B. Demiray, K. Westenfelder, K. Steiger, M. Eiber, T. Wendler, B. Kainz and N. Navab.
Multitask Weakly Supervised Generative Network for MR-US Registration.
IEEE Transactions on Medical Imaging 43.11 (Nov. 2024). DOI
Abstract

Registering pre-operative modalities, such as magnetic resonance imaging or computed tomography, to ultrasound images is crucial for guiding clinicians during surgeries and biopsies. Recently, deep-learning approaches have been proposed to increase the speed and accuracy of this registration problem. However, all of these approaches need expensive supervision from the ultrasound domain. In this work, we propose a multitask generative framework that needs weak supervision only from the pre-operative imaging domain during training. To perform a deformable registration, the proposed framework translates a magnetic resonance image to the ultrasound domain while preserving the structural content. To demonstrate the efficacy of the proposed method, we tackle the registration problem of pre-operative 3D MR to transrectal ultrasonography images as necessary for targeted prostate biopsies. We use an in-house dataset of 600 patients, divided into 540 for training, 30 for validation, and the remaining for testing. An expert manually segmented the prostate in both modalities for validation and test sets to assess the performance of our framework. The proposed framework achieves a 3.58 mm target registration error on the expert-selected landmarks, 89.2% in the Dice score, and 1.81 mm 95th percentile Hausdorff distance on the prostate masks in the test set. Our experiments demonstrate that the proposed generative model successfully translates magnetic resonance images into the ultrasound domain. The translated image contains the structural content and fine details due to an ultrasound-specific two-path design of the generative model. The proposed framework enables training learning-based registration methods while only weak supervision from the pre-operative domain is available.

MCML Authors
Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[101]
Y. Li, Y. Zhang, K. Kawaguchi, A. Khakzar, B. Bischl and M. Rezaei.
A Dual-Perspective Approach to Evaluating Feature Attribution Methods.
Transactions on Machine Learning Research (Nov. 2024). URL
Abstract

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model’s behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
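
As a concrete illustration of the perturbation-based lens discussed above (not the paper's soundness/completeness metrics), the sketch below occludes the top-attributed features of a toy model and measures the resulting score drop.

```python
# Minimal sketch of perturbation-based faithfulness testing: occlude the
# top-k attributed features and measure the drop in the model's score.
import numpy as np

def prediction(x):                       # toy "model": a fixed linear scorer
    w = np.array([3.0, -1.0, 0.5, 0.0])
    return float(w @ x)

def faithfulness_drop(x, attribution, k=2, baseline=0.0):
    top_k = np.argsort(-np.abs(attribution))[:k]    # most relevant features
    x_perturbed = x.copy()
    x_perturbed[top_k] = baseline                   # occlude them
    return prediction(x) - prediction(x_perturbed)  # large drop => faithful

x = np.array([1.0, 2.0, 3.0, 4.0])
attr = np.array([3.0, -2.0, 1.5, 0.0])              # some attribution map
print(faithfulness_drop(x, attr))
```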

MCML Authors
Link to website

Yawei Li

Statistical Learning and Data Science

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning and Data Science


[100]
F. Bongratz, M. Karmann, A. Holz, M. Bonhoeffer, V. Neumaier, S. Deli, B. Schmitz-Koep, C. Zimmer, C. Sorg, M. Thalhammer, D. M. Hedderich and C. Wachinger.
MLV2-Net: Rater-Based Majority-Label Voting for Consistent Meningeal Lymphatic Vessel Segmentation.
Preprint (Nov. 2024). arXiv
Abstract

Meningeal lymphatic vessels (MLVs) are responsible for the drainage of waste products from the human brain. An impairment in their functionality has been associated with aging as well as brain disorders like multiple sclerosis and Alzheimer’s disease. However, MLVs have only recently been described for the first time in magnetic resonance imaging (MRI), and their ramified structure renders manual segmentation particularly difficult. Further, as there is no consistent notion of their appearance, human-annotated MLV structures contain a high inter-rater variability that most automatic segmentation methods cannot take into account. In this work, we propose a new rater-aware training scheme for the popular nnU-Net model, and we explore rater-based ensembling strategies for accurate and consistent segmentation of MLVs. This enables us to boost nnU-Net’s performance while obtaining explicit predictions in different annotation styles and a rater-based uncertainty estimation. Our final model, MLV2-Net, achieves a Dice similarity coefficient of 0.806 with respect to the human reference standard. The model further matches the human inter-rater reliability and replicates age-related associations with MLV volume.
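
The core voting step can be illustrated with a minimal sketch (an assumed setup, not the MLV2-Net code): each rater-conditioned prediction casts a vote per voxel, and the majority decides the consensus mask.

```python
# Minimal sketch of rater-based majority-label voting for binary
# segmentation masks.
import numpy as np

def majority_vote(masks: np.ndarray) -> np.ndarray:
    """masks: (n_raters, H, W) binary arrays -> (H, W) consensus mask."""
    votes = masks.sum(axis=0)
    return (votes > masks.shape[0] / 2).astype(np.uint8)

rater_masks = np.stack([
    np.array([[1, 1], [0, 0]]),
    np.array([[1, 0], [0, 1]]),
    np.array([[1, 1], [0, 0]]),
])
print(majority_vote(rater_masks))   # [[1 1] [0 0]]
```

The per-voxel disagreement between raters can additionally serve as a simple uncertainty estimate.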

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[99]
V. Ehm, N. El Amrani, Y. Xie, L. Bastian, M. Gao, W. Wang, L. Sang, D. Cao, Z. Lähner, D. Cremers and F. Bernard.
Beyond Complete Shapes: A Quantitative Evaluation of 3D Shape Matching Algorithms.
Preprint (Nov. 2024). arXiv
Abstract

Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. While approaches based on machine learning dominate modern 3D shape matching, almost all existing (learning-based) methods require that at least one of the involved shapes is complete. In contrast, the most challenging and arguably most practically relevant setting of matching partially observed shapes is currently underexplored. One important factor is that existing datasets contain only a small number of shapes (typically below 100), which are unable to serve data-hungry machine learning approaches, particularly in the unsupervised regime. In addition, the type of partiality present in existing datasets is often artificial and far from realistic. To address these limitations and to encourage research on these relevant settings, we provide a generic and flexible framework for the procedural generation of challenging partial shape matching scenarios. Our framework allows for a virtually infinite generation of partial shape matching instances from a finite set of shapes with complete geometry. Further, we manually create cross-dataset correspondences between seven existing (complete geometry) shape matching datasets, leading to a total of 2543 shapes. Based on this, we propose several challenging partial benchmark settings, for which we evaluate respective state-of-the-art methods as baselines.

MCML Authors
Link to website

Viktoria Ehm

Computer Vision & Artificial Intelligence

Link to website

Lennart Bastian

Computer Aided Medical Procedures & Augmented Reality

Link to website

Maolin Gao

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[98]
M. Szép, D. Rückert, R. Eisenhart-Rothe and F. Hinterwimmer.
A Practical Guide to Fine-tuning Language Models with Limited Data.
Preprint (Nov. 2024). arXiv
Abstract

Employing pre-trained Large Language Models (LLMs) has become the de facto standard in Natural Language Processing (NLP) despite their extensive data requirements. Motivated by the recent surge in research focused on training LLMs with limited data, particularly in low-resource domains and languages, this paper surveys recent transfer learning approaches to optimize model performance in downstream tasks where data is scarce. We first address initial and continued pre-training strategies to better leverage prior knowledge in unseen domains and languages. We then examine how to maximize the utility of limited data during fine-tuning and few-shot learning. The final section takes a task-specific perspective, reviewing models and methods suited for different levels of data scarcity. Our goal is to provide practitioners with practical guidelines for overcoming the challenges posed by constrained data while also highlighting promising directions for future research.
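
One tactic surveyed in this line of work is to freeze the pre-trained encoder and tune only the task head when labeled data is scarce. Below is a hedged sketch using Hugging Face transformers; the model name and the two-example dataset are illustrative only.

```python
# Hedged sketch of low-data fine-tuning: freeze the pre-trained body,
# train only the classification head. Requires `transformers` and `torch`.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

for p in model.distilbert.parameters():   # freeze the body; tune the head only
    p.requires_grad = False

texts, labels = ["great result", "poor result"], torch.tensor([1, 0])
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad),
                        lr=1e-3)
for _ in range(5):                        # tiny illustrative training loop
    loss = model(**batch, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
print(float(loss))
```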

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[97]
O. Wysocki, Y. Tan, T. Froech, Y. Xia, M. Wysocki, L. Hoegner, D. Cremers and C. Holst.
ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset.
Preprint (Nov. 2024). arXiv
Abstract

Facade semantic segmentation is a long-standing challenge in photogrammetry and computer vision. Although the last decades have witnessed an influx of facade segmentation methods, there is a lack of comprehensive facade classes and of data covering the architectural variability. In ZAHA, we introduce the Level of Facade Generalization (LoFG): novel hierarchical facade classes designed based on international urban modeling standards, ensuring compatibility with challenging real-world classes and a uniform comparison of methods. Realizing the LoFG, we present the largest semantic 3D facade segmentation dataset to date, providing 601 million annotated points at five and 15 classes for LoFG2 and LoFG3, respectively. Moreover, we analyze the performance of baseline semantic segmentation methods on the introduced LoFG classes and data, complementing this with a discussion of the unresolved challenges for facade segmentation. We firmly believe that ZAHA will facilitate further development of 3D facade semantic segmentation methods, enabling the robust segmentation indispensable for creating urban digital twins.

MCML Authors
Link to website

Yan Xia

Dr.

Computer Vision & Artificial Intelligence

Link to website

Magdalena Wysocki

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[96]
A. Ranne, L. Kuang, Y. Velikova, N. Navab and F. Baena.
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers.
IROS 2024 - IEEE/RSJ International Conference on Intelligent Robots and Systems. Abu Dhabi, United Arab Emirates, Oct 14-18, 2024. DOI
Abstract

In minimally invasive endovascular procedures, contrast-enhanced angiography remains the most robust imaging technique. However, it is at the expense of the patient and clinician’s health due to prolonged radiation exposure. As an alternative, interventional ultrasound has notable benefits such as being radiation-free, fast to deploy, and having a small footprint in the operating room. Yet, ultrasound is hard to interpret, and highly prone to artifacts and noise. Additionally, interventional radiologists must undergo extensive training before they become qualified to diagnose and treat patients effectively, leading to a shortage of staff and a lack of open-source datasets. In this work, we seek to address both problems by introducing a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images, without demanding any labeled data. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism, and is capable of learning feature changes across time and space. To facilitate training, we used synthetic ultrasound data based on physics-driven catheter insertion simulations, and translated the data into a unique CT-Ultrasound common domain, CACTUSS, to improve the segmentation performance. We generated ground truth segmentation masks by computing the optical flow between adjacent frames using FlowNet2, and performed thresholding to obtain a binary map estimate. Finally, we validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicone aorta phantoms, thus demonstrating its potential for applications to clinical data in the future.
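
The pseudo-labeling idea can be sketched as follows; classical Farneback flow from OpenCV is substituted here for FlowNet2, and the frames are synthetic stand-ins.

```python
# Hedged sketch: estimate dense optical flow between adjacent frames and
# threshold its magnitude into a binary motion mask (pseudo-label).
# Requires `opencv-python`.
import cv2
import numpy as np

prev = (np.random.rand(128, 128) * 255).astype(np.uint8)  # stand-in frame
curr = np.roll(prev, 2, axis=1)                           # simulated motion

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
magnitude = np.linalg.norm(flow, axis=2)        # per-pixel motion speed
mask = (magnitude > 1.0).astype(np.uint8)       # binary pseudo-label
print(mask.sum(), "moving pixels")
```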

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[95]
F. Dülmer, W. Simson, M. F. Azampour, M. Wysocki, A. Karlas and N. Navab.
PHOCUS: Physics-Based Deconvolution for Ultrasound Resolution Enhancement.
ASMUS @MICCAI 2024 - 5th International Workshop on Advances in Simplifying Medical Ultrasound at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. ASMUS @MICCAI 2024 Best Paper. DOI
Abstract

Ultrasound is widely used in medical diagnostics allowing for accessible and powerful imaging but suffers from resolution limitations due to diffraction and the finite aperture of the imaging system, which restricts diagnostic use. The impulse function of an ultrasound imaging system is called the point spread function (PSF), which is convolved with the spatial distribution of reflectors in the image formation process. Recovering high-resolution reflector distributions by removing image distortions induced by the convolution process improves image clarity and detail. Conventionally, deconvolution techniques attempt to rectify the imaging system’s dependent PSF, working directly on the radio-frequency (RF) data. However, RF data is often not readily accessible. Therefore, we introduce a physics-based deconvolution process using a modeled PSF, working directly on the more commonly available B-mode images. By leveraging Implicit Neural Representations (INRs), we learn a continuous mapping from spatial locations to their respective echogenicity values, effectively compensating for the discretized image space. Our contribution consists of a novel methodology for retrieving a continuous echogenicity map directly from a B-mode image through a differentiable physics-based rendering pipeline for ultrasound resolution enhancement. We qualitatively and quantitatively evaluate our approach on synthetic data, demonstrating improvements over traditional methods in metrics such as PSNR and SSIM. Furthermore, we show qualitative enhancements on an ultrasound phantom and an in-vivo acquisition of a carotid artery.
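
A minimal sketch of such a physics-based forward model follows, under simplifying assumptions: a fixed Gaussian PSF and a small coordinate MLP stand in for the paper's full differentiable rendering pipeline.

```python
# Sketch: an implicit network maps (y, x) coordinates to echogenicity, a
# fixed Gaussian PSF is convolved over that map, and the result is fitted
# to the observed B-mode image by gradient descent.
import torch
import torch.nn.functional as F

H = W = 32
obs = torch.rand(1, 1, H, W)                     # stand-in B-mode image

xs = torch.linspace(-1, 1, W)
ys = torch.linspace(-1, 1, H)
coords = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), -1).reshape(-1, 2)

inr = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))  # continuous echogenicity

g = torch.exp(-(torch.arange(5) - 2.0) ** 2 / 2.0)  # 5x5 Gaussian PSF
psf = g[:, None] * g[None, :]
psf = (psf / psf.sum()).view(1, 1, 5, 5)

opt = torch.optim.Adam(inr.parameters(), lr=1e-2)
for _ in range(100):
    echo = inr(coords).view(1, 1, H, W)
    pred = F.conv2d(echo, psf, padding=2)        # PSF forward model
    loss = F.mse_loss(pred, obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```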

MCML Authors
Link to website

Felix Dülmer

Computer Aided Medical Procedures & Augmented Reality

Walter Simson

Dr.

* Former Member

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to website

Magdalena Wysocki

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[94]
F. De Benetti, Y. Yeganeh, C. Belka, S. Corradini, N. Navab, C. Kurz, G. Landry, S. Albarqouni and T. Wendler.
CloverNet – Leveraging Planning Annotations for Enhanced Procedural MR Segmentation: An Application to Adaptive Radiation Therapy.
CLIP @MICCAI 2024 - 13th International Workshop on Clinical Image-Based Procedures at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. CLIP @MICCAI 2024 Best Paper. DOI
Abstract

In radiation therapy (RT), an accurate delineation of the regions of interest (ROI) and organs at risk (OAR) allows for a more targeted irradiation with reduced side effects. The current clinical workflow for combined MR-linear accelerator devices (MR-linacs) requires the acquisition of a planning MR volume (MR-P), in which the ROI and OAR are accurately segmented by the clinical team. These segmentation maps (S-P) are transferred to the MR acquired on the day of the RT fraction (MR-Fx) using registration, followed by time-consuming manual corrections. The goal of this paper is to enable accurate automatic segmentation of MR-Fx using S-P without clinical workflow disruption. We propose a novel UNet-based architecture, CloverNet, that takes as inputs MR-Fx and S-P in two separate encoder branches, whose latent spaces are concatenated in the bottleneck to generate an improved segmentation of MR-Fx. CloverNet improves the absolute Dice Score by 3.73% (relative +4.34%, p<0.001) when compared with conventional 3D UNet. Moreover, we believe this approach is potentially applicable to other longitudinal use cases in which a prior segmentation of the ROI is available.
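
The two-encoder, concatenated-bottleneck idea can be sketched in a few lines of PyTorch; this is an assumption-level illustration, not the CloverNet architecture itself.

```python
# Sketch: encode the fraction-day MR and the planning segmentation
# separately, concatenate the latents in the bottleneck, decode a mask.
import torch
import torch.nn as nn

def encoder(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())

class DualBranchSeg(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc_img = encoder(1)   # MR-Fx branch
        self.enc_seg = encoder(1)   # S-P branch
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 2, stride=2))

    def forward(self, mr_fx, s_p):
        z = torch.cat([self.enc_img(mr_fx), self.enc_seg(s_p)], dim=1)
        return self.decoder(z)      # per-class logits

model = DualBranchSeg()
print(model(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)).shape)
```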

MCML Authors
Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[93]
D. Daum, R. Osuala, A. Riess, G. Kaissis, J. A. Schnabel and M. Di Folco.
On Differentially Private 3D Medical Image Synthesis with Controllable Latent Diffusion Models.
DGM4 @MICCAI 2024 - 4th International Workshop on Deep Generative Models at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Generally, the small size of public medical imaging datasets, coupled with stringent privacy concerns, hampers the advancement of data-hungry deep learning models in medical imaging. This study addresses these challenges for 3D cardiac MRI images in the short-axis view. We propose Latent Diffusion Models that generate synthetic images conditioned on medical attributes, while ensuring patient privacy through differentially private model training. To our knowledge, this is the first work to apply and quantify differential privacy in 3D medical image generation. We pre-train our models on public data and finetune them with differential privacy on the UK Biobank dataset. Our experiments reveal that pre-training significantly improves model performance, achieving a Fréchet Inception Distance (FID) of 26.77 at ϵ=10, compared to 92.52 for models without pre-training. Additionally, we explore the trade-off between privacy constraints and image quality, investigating how tighter privacy budgets affect output controllability and may lead to degraded performance. Our results demonstrate that proper consideration during training with differential privacy can substantially improve the quality of synthetic cardiac MRI images, but there are still notable challenges in achieving consistent medical realism.
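
Differentially private training of such models typically relies on DP-SGD. The following is a generic, hedged sketch of one DP-SGD step (per-sample gradient clipping plus Gaussian noise), not the paper's training stack.

```python
# Sketch of one DP-SGD step: clip each per-sample gradient to norm C,
# then add Gaussian noise scaled by a noise multiplier.
import torch

def dp_sgd_step(model, loss_fn, xs, ys, opt, clip_norm=1.0, noise_mult=1.1):
    grads = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xs, ys):                  # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
        scale = min(1.0, clip_norm / (float(norm) + 1e-12))
        for g, p in zip(grads, model.parameters()):
            g += p.grad * scale               # clipped contribution
    for g, p in zip(grads, model.parameters()):
        noise = torch.randn_like(g) * noise_mult * clip_norm
        p.grad = (g + noise) / len(xs)        # noisy averaged gradient
    opt.step()

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
dp_sgd_step(model, torch.nn.CrossEntropyLoss(),
            torch.rand(8, 4), torch.randint(0, 2, (8,)), opt)
```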

MCML Authors
Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[92]
A. Riess, A. Ziller, S. Kolek, D. Rückert, J. A. Schnabel and G. Kaissis.
Complex-Valued Federated Learning with Differential Privacy and MRI Applications.
DeCaF @MICCAI 2024 - 5th Workshop on Distributed, Collaborative and Federated Learning at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

Federated learning enhanced with Differential Privacy (DP) is a powerful privacy-preserving strategy to protect individuals sharing their sensitive data for processing in fields such as medicine and healthcare. Many medical applications, for example magnetic resonance imaging (MRI), rely on complex-valued signal processing techniques for data acquisition and analysis. However, the appropriate application of DP to complex-valued data is still underexplored. To address this issue, from the theoretical side, we introduce the complex-valued Gaussian mechanism, whose behaviour we characterise in terms of f-DP, (ε, δ)-DP and Rényi-DP. Moreover, we generalise the fundamental algorithm DP stochastic gradient descent to complex-valued neural networks and present novel complex-valued neural network primitives compatible with DP. Experimentally, we showcase a proof-of-concept by training federated complex-valued neural networks with DP on a real-world task (MRI pulse sequence classification in k-space), yielding excellent utility and privacy. Our results highlight the relevance of combining federated learning with robust privacy-preserving techniques in the MRI context.
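
Under the natural convention of splitting the noise power evenly between the real and imaginary parts, a complex-valued Gaussian mechanism can be sketched as follows; calibrating `sigma_mult` to a concrete privacy budget is omitted and would follow the paper's analysis.

```python
# Minimal numpy sketch (illustrative assumptions, not the paper's exact
# calibration): perturb a complex query output with i.i.d. Gaussian noise
# on the real and imaginary parts, scaled to the query's L2 sensitivity.
import numpy as np

def complex_gaussian_mechanism(value: np.ndarray, sensitivity: float,
                               sigma_mult: float,
                               rng=np.random.default_rng()):
    sigma = sigma_mult * sensitivity
    noise = rng.normal(0, sigma / np.sqrt(2), value.shape) \
          + 1j * rng.normal(0, sigma / np.sqrt(2), value.shape)
    return value + noise        # noise std split evenly across Re and Im

kspace = np.fft.fft2(np.random.rand(8, 8))   # toy complex-valued signal
print(complex_gaussian_mechanism(kspace, sensitivity=1.0, sigma_mult=2.0)[0, 0])
```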

MCML Authors
Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine

Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Artificial Intelligence in Healthcare and Medicine


[91]
Y. Yeganeh, R. Lazuardi, A. Shamseddin, E. Dari, Y. Thirani, N. Navab and A. Farshad.
VISAGE: Video Synthesis using Action Graphs for Surgery.
EARTH @MICCAI 2024 - Workshop on Embodied AI and Robotics for HealTHcare at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. EARTH @MICCAI 2024 Best Paper. DOI
Abstract

Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts the future frames given only a single initial frame, and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures, which enables various applications in SDS.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[90]
Ç. Köksal, G. Ghazaei, F. Holm, A. Farshad and N. Navab.
SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction.
GRAIL @MICCAI 2024 - 6th Workshop on GRaphs in biomedicAl Image anaLysis at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. GRAIL @MICCAI 2024 Best Paper. arXiv
Abstract

Graph-based holistic scene representations facilitate surgical workflow understanding and have recently demonstrated significant success. However, this task is often hindered by the limited availability of densely annotated surgical scene data. In this work, we introduce an end-to-end framework for the generation and optimization of surgical scene graphs on a downstream task. Our approach leverages the flexibility of graph-based spectral clustering and the generalization capability of foundation models to generate unsupervised scene graphs with learnable properties. We reinforce the initial spatial graph with sparse temporal connections using local matches between consecutive frames to predict temporally consistent clusters across a temporal neighborhood. By jointly optimizing the spatiotemporal relations and node features of the dynamic scene graph with the downstream task of phase segmentation, we address the costly and annotation-burdensome task of semantic scene comprehension and scene graph generation in surgical videos using only weak surgical phase labels. Further, by incorporating effective intermediate scene representation disentanglement steps within the pipeline, our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow recognition.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[89]
A. H. Berger, L. Lux, N. Stucki, V. Bürgin, S. Shit, A. Banaszaka, D. Rückert, U. Bauer and J. C. Paetzold.
Topologically faithful multi-class segmentation in medical images.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology, making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.
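
The projection step, reducing the N-class problem to N one-vs-rest binary tasks, is simple to state in code; the persistence-based loss itself is beyond a short sketch.

```python
# Minimal sketch of the class-projection step: split an N-class label map
# into N binary masks, to which a single-parameter persistence-based loss
# can then be applied per class.
import numpy as np

def project_to_binary(label_map: np.ndarray, n_classes: int):
    """(H, W) integer labels -> list of N binary (H, W) masks."""
    return [(label_map == c).astype(np.uint8) for c in range(n_classes)]

pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
for c, mask in enumerate(project_to_binary(pred, 3)):
    print(f"class {c}: {mask.sum()} foreground pixels")
```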

MCML Authors
Link to website

Laurin Lux

Artificial Intelligence in Healthcare and Medicine

Link to website

Nico Stucki

Applied Topology and Geometry

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Ulrich Bauer

Ulrich Bauer

Prof. Dr.

Applied Topology and Geometry


[88]
M. Domínguez, Y. Velikova, N. Navab and M. F. Azampour.
Diffusion as Sound Propagation: Physics-Inspired Model for Ultrasound Image Generation.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Deep learning (DL) methods typically require large datasets to effectively learn data distributions. However, in the medical field, data is often limited in quantity, and acquiring labeled data can be costly. To mitigate this data scarcity, data augmentation techniques are commonly employed. Among these techniques, generative models play a pivotal role in expanding datasets. However, when it comes to ultrasound (US) imaging, the authenticity of generated data often diminishes due to the oversight of ultrasound physics.
We propose a novel approach to improve the quality of generated US images by introducing a physics-based diffusion model that is specifically designed for this image modality. The proposed model incorporates an US-specific scheduler scheme that mimics the natural behavior of sound wave propagation in ultrasound imaging. Our analysis demonstrates how the proposed method aids in modeling the attenuation dynamics in US imaging. We present both qualitative and quantitative results based on standard generative model metrics, showing that our proposed method results in overall more plausible images.

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality


[87]
S. M. Fischer, L. Felsner, R. Osuala, J. Kiechle, D. M. Lang, J. C. Peeken and J. A. Schnabel.
Progressive Growing of Patch Size: Resource-Efficient Curriculum Learning for Dense Prediction Tasks.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

In this work, we introduce Progressive Growing of Patch Size, a resource-efficient implicit curriculum learning approach for dense prediction tasks. Our curriculum approach is defined by growing the patch size during model training, which gradually increases the task’s difficulty. We integrated our curriculum into the nnU-Net framework and evaluated the methodology on all 10 tasks of the Medical Segmentation Decathlon. With our approach, we are able to substantially reduce runtime, computational costs, and emissions of network training compared to classical constant patch size training. In our experiments, the curriculum approach resulted in improved convergence. We are able to outperform standard nnU-Net training, which is trained with constant patch size, in terms of Dice Score on 7 out of 10 MSD tasks while only spending roughly 50% of the original training runtime. To the best of our knowledge, our Progressive Growing of Patch Size is the first successful employment of a sample-length curriculum in the form of patch size in the field of computer vision.
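
A minimal sketch of a patch-size curriculum under an assumed linear schedule follows; the nnU-Net integration and exact schedule are the paper's own.

```python
# Sketch: the random crop fed to the network grows with the epoch, so
# early epochs see small, easier patches.
import numpy as np

def sample_patch(image: np.ndarray, patch_size: int,
                 rng=np.random.default_rng()):
    h = rng.integers(0, image.shape[0] - patch_size + 1)
    w = rng.integers(0, image.shape[1] - patch_size + 1)
    return image[h:h + patch_size, w:w + patch_size]

def patch_size_for_epoch(epoch, max_epochs, min_size=32, max_size=128):
    frac = epoch / max(1, max_epochs - 1)
    size = int(min_size + frac * (max_size - min_size))
    return size // 8 * 8          # keep a multiple of 8 for pooling layers

image = np.random.rand(256, 256)
for epoch in range(0, 10, 3):
    p = patch_size_for_epoch(epoch, 10)
    print(epoch, p, sample_patch(image, p).shape)
```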

MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[86]
Y. Li, I. Yakushev, D. M. Hedderich and C. Wachinger.
PASTA: Pathology-Aware MRI to PET Cross-Modal Translation with Diffusion Models.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Positron emission tomography (PET) is a well-established functional imaging technique for diagnosing brain disorders. However, PET’s high costs and radiation exposure limit its widespread use. In contrast, magnetic resonance imaging (MRI) does not have these limitations. Although it also captures neurodegenerative changes, MRI is a less sensitive diagnostic tool than PET. To close this gap, we aim to generate synthetic PET from MRI. Herewith, we introduce PASTA, a novel pathology-aware image translation framework based on conditional diffusion models. Compared to the state-of-the-art methods, PASTA excels in preserving both structural and pathological details in the target modality, which is achieved through its highly interactive dual-arm architecture and multi-modal condition integration. A cycle exchange consistency and volumetric generation strategy elevate PASTA’s capability to produce high-quality 3D PET scans. Our qualitative and quantitative results confirm that the synthesized PET scans from PASTA not only reach the best quantitative scores but also preserve the pathology correctly. For Alzheimer’s classification, the performance of synthesized scans improves over MRI by 4%, almost reaching the performance of actual PET.

MCML Authors
Link to website

Yitong Li

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[85]
A. Reithmeir, L. Felsner, R. Braren, J. A. Schnabel and V. A. Zimmer.
Data-Driven Tissue- and Subject-Specific Elastic Regularization for Medical Image Registration.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Physics-inspired regularization is desired for intra-patient image registration since it can effectively capture the biomechanical characteristics of anatomical structures. However, a major challenge lies in the reliance on physical parameters: Parameter estimations vary widely across the literature, and the physical properties themselves are inherently subject-specific. In this work, we introduce a novel data-driven method that leverages hypernetworks to learn the tissue-dependent elasticity parameters of an elastic regularizer. Notably, our approach facilitates the estimation of patient-specific parameters without the need to retrain the network. We evaluate our method on three publicly available 2D and 3D lung CT and cardiac MR datasets. We find that with our proposed subject-specific tissue-dependent regularization, a higher registration quality is achieved across all datasets compared to using a global regularizer.

MCML Authors
Link to website

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[84]
O. Tmenova, Y. Velikova, M. Saleh and N. Navab.
Deep Spectral Methods for Unsupervised Ultrasound Image Interpretation.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

Ultrasound imaging is challenging to interpret due to non-uniform intensities, low contrast, and inherent artifacts, necessitating extensive training for non-specialists. Advanced representation with clear tissue structure separation could greatly assist clinicians in mapping underlying anatomy and distinguishing between tissue layers. Decomposing an image into semantically meaningful segments is mainly achieved using supervised segmentation algorithms. Unsupervised methods are beneficial, as acquiring large labeled datasets is difficult and costly, but despite their advantages they remain underexplored in ultrasound. This paper proposes a novel unsupervised deep learning strategy tailored to ultrasound to obtain easily interpretable tissue separations. We integrate key concepts from unsupervised deep spectral methods, which combine spectral graph theory with deep learning methods. We utilize self-supervised transformer features for spectral clustering to generate meaningful segments based on ultrasound-specific metrics and shape and positional priors, ensuring semantic consistency across the dataset. We evaluate our unsupervised deep learning strategy on three ultrasound datasets, showcasing qualitative results across anatomical contexts without label requirements. We also conduct a comparative analysis against other clustering algorithms to demonstrate superior segmentation performance, boundary preservation, and label consistency.
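
The generic deep spectral pipeline, an affinity graph over features, Laplacian eigenvectors, then clustering, can be sketched as follows; the features here are random stand-ins for self-supervised transformer features.

```python
# Sketch of the deep spectral recipe (generic pipeline, not the paper's
# implementation): affinity graph -> graph Laplacian eigenvectors ->
# k-means over the spectral embedding.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))    # stand-in transformer patch features

dists = np.square(feats[:, None] - feats[None, :]).sum(-1)
affinity = np.exp(-dists / 16.0)      # Gaussian-kernel affinity graph
degree = np.diag(affinity.sum(axis=1))
laplacian = degree - affinity

_, eigvecs = eigh(laplacian)          # eigenvalues in ascending order
spectral_embedding = eigvecs[:, 1:4]  # skip the trivial constant eigenvector
segments = KMeans(n_clusters=3, n_init=10).fit_predict(spectral_embedding)
print(np.bincount(segments))
```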

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[83]
H. Zerouaoui, G. P. Oderinde, R. Lefdali, K. Echihabi, S. P. Akpulu, N. A. Agbon, A. S. Musa, Y. Yeganeh, A. Farshad and N. Navab.
AMONuSeg: A Histological Dataset for African Multi-organ Nuclei Semantic Segmentation.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Nuclei semantic segmentation is a key component for advancing machine learning and deep learning applications in digital pathology. However, most existing segmentation models are trained and tested on high-quality data acquired with expensive equipment, such as whole slide scanners, which are not accessible to most pathologists in developing countries. These pathologists rely on low-resource data acquired with low-precision microscopes, smartphones, or digital cameras, which have different characteristics and challenges than high-resource data. Therefore, there is a gap between the state-of-the-art segmentation models and the real-world needs of low-resource settings. This work aims to bridge this gap by presenting the first fully annotated African multi-organ dataset for histopathology nuclei semantic segmentation acquired with a low-precision microscope. We also evaluate state-of-the-art segmentation models, including spectral feature extraction encoder and vision transformer-based models, and stain normalization techniques for color normalization of Hematoxylin and Eosin-stained histopathology slides. Our results provide important insights for future research on nuclei histopathology segmentation with low-resource data.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[82]
E. Özsoy, C. Pellegrini, M. Keicher and N. Navab.
ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling.
MICCAI 2024 - 27th International Conference on Medical Image Computing and Computer Assisted Intervention. Marrakesh, Morocco, Oct 06-10, 2024. Main Conference Best Paper Runner-up. DOI GitHub
Abstract

Every day, countless surgeries are performed worldwide, each within the distinct settings of operating rooms (ORs) that vary not only in their setups but also in the personnel, tools, and equipment used. This inherent diversity poses a substantial challenge for achieving a holistic understanding of the OR, as it requires models to generalize beyond their initial training datasets. To reduce this gap, we introduce ORacle, an advanced vision-language model designed for holistic OR domain modeling, which incorporates multi-view and temporal capabilities and can leverage external knowledge during inference, enabling it to adapt to previously unseen surgical scenarios. This capability is further enhanced by our novel data augmentation framework, which significantly diversifies the training dataset, ensuring ORacle’s proficiency in applying the provided knowledge effectively. In rigorous testing, in scene graph generation, and downstream tasks on the 4D-OR dataset, ORacle not only demonstrates state-of-the-art performance but does so requiring less data than existing models. Furthermore, its adaptability is displayed through its ability to interpret unseen views, actions, and appearances of tools and equipment. This demonstrates ORacle’s potential to significantly enhance the scalability and affordability of OR domain modeling and opens a pathway for future advancements in surgical data science.

MCML Authors
Link to website

Ege Özsoy

Computer Aided Medical Procedures & Augmented Reality

Link to website

Chantal Pellegrini

Computer Aided Medical Procedures & Augmented Reality

Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[81]
M. Fischer, P. Neher, T. Wald, S. Dias Almeida, S. Xiao, P. J. Schüffler, R. Braren, M. Götz, A. Muckenhuber, J. Kleesiek, M. Nolden and K. Maier-Hein.
Learned Image Compression for HE-Stained Histopathological Images via Stain Deconvolution.
MOVI @MICCAI 2024 - 2nd International Workshop on Medical Optical Imaging and Virtual Microscopy Image Analysis at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI GitHub
Abstract

Processing histopathological Whole Slide Images (WSI) leads to massive storage requirements for clinics worldwide. Even after lossy image compression during image acquisition, additional lossy compression is frequently possible without substantially affecting the performance of deep learning-based (DL) downstream tasks. In this paper, we show that the commonly used JPEG algorithm is not best suited for further compression and we propose Stain Quantized Latent Compression (SQLC), a novel DL based histopathology data compression approach. SQLC compresses staining and RGB channels before passing them through a compression autoencoder (CAE) in order to obtain quantized latent representations for maximizing the compression. We show that our approach yields superior performance in a classification downstream task, compared to traditional approaches like JPEG, while image quality metrics like the Multi-Scale Structural Similarity Index (MS-SSIM) are largely preserved.
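
Since SQLC operates on staining channels, it helps to recall classical H&E stain deconvolution (in the Ruifrok-Johnston style); the stain matrix below uses standard literature values, not the paper's.

```python
# Sketch of classical H&E stain deconvolution: convert RGB to optical
# density (Beer-Lambert), then invert the stain matrix to recover
# per-pixel stain concentrations.
import numpy as np

# Normalized stain vectors: hematoxylin, eosin, and a residual channel.
stains = np.array([[0.650, 0.704, 0.286],
                   [0.072, 0.990, 0.105],
                   [0.268, 0.570, 0.776]])
deconv = np.linalg.inv(stains)          # OD = C @ stains  =>  C = OD @ inv

rgb = np.clip(np.random.rand(4, 4, 3), 0.01, 1.0)  # stand-in H&E tile in [0,1]
od = -np.log(rgb)                                  # optical density
concentrations = od.reshape(-1, 3) @ deconv        # per-pixel stain amounts
print(concentrations.reshape(4, 4, 3)[0, 0])
```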

MCML Authors
Link to Profile Peter Schüffler

Peter Schüffler

Prof. Dr.

Computational Pathology


[80]
D. Bani-Harouni, N. Navab and M. Keicher.
MAGDA: Multi-agent Guideline-Driven Diagnostic Assistance.
MedAGI @MICCAI 2024 - 2nd International Workshop on Foundation Models for General Medical AI at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

In emergency departments, rural hospitals, or clinics in less developed regions, clinicians often lack fast image analysis by trained radiologists, which can have a detrimental effect on patients’ healthcare. Large Language Models (LLMs) have the potential to alleviate some pressure from these clinicians by providing insights that can help them in their decision-making. While these LLMs achieve high test results on medical exams showcasing their great theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new approach for zero-shot guideline-driven decision support. We model a system of multiple LLM agents augmented with a contrastive vision-language model that collaborate to reach a patient diagnosis. After providing the agents with simple diagnostic guidelines, they will synthesize prompts and screen the image for findings following these guidelines. Finally, they provide understandable chain-of-thought reasoning for their diagnosis, which is then self-refined to consider inter-dependencies between diseases. As our method is zero-shot, it is adaptable to settings with rare diseases, where training data is limited, but expert-crafted disease descriptions are available. We evaluate our method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showcasing performance improvement over existing zero-shot methods and generalizability to rare diseases.

MCML Authors
Link to website

David Bani-Harouni

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality


[79]
D. Grzech, L. Le Folgoc, M. F. Azampour, A. Vlontzos, B. Glocker, N. Navab, J. A. Schnabel and B. Kainz.
Unsupervised Similarity Learning for Image Registration with Energy-Based Models.
WBIR @MICCAI 2024 - 11th International Workshop on Biomedical Image Registration at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

We present a new model for deformable image registration, which learns in an unsupervised way a data-specific similarity metric. The proposed method consists of two neural networks, one that maps pairs of input images to transformations which align them, and one that provides the similarity metric whose maximisation guides the image alignment. We parametrise the similarity metric as an energy-based model, which is simple to train and allows us to improve the accuracy of image registration compared to other models with learnt similarity metrics by taking advantage of a more general mathematical formulation, as well as larger datasets. We also achieve substantial improvement in the accuracy of inter-patient image registration on MRI scans from the OASIS dataset compared to models that rely on traditional functions.

MCML Authors
Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[78]
B. Jian, J. Pan, M. Ghahremani, D. Rückert, C. Wachinger and B. Wiestler.
Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration.
WBIR @MICCAI 2024 - 11th International Workshop on Biomedical Image Registration at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). Marrakesh, Morocco, Oct 06-10, 2024. DOI
Abstract

VoxelMorph, proposed in 2018, utilizes Convolutional Neural Networks (CNNs) to address medical image registration problems. In 2021 TransMorph advanced this approach by replacing CNNs with Attention mechanisms, claiming enhanced performance. More recently, the rise of Mamba with selective state space models has led to MambaMorph, which substituted Attention with Mamba blocks, asserting superior registration. These developments prompt a critical question: does chasing the latest computational trends with “more advanced” computational blocks genuinely enhance registration accuracy, or is it merely hype? Furthermore, the role of classic high-level registration-specific designs, such as coarse-to-fine pyramid mechanism, correlation calculation, and iterative optimization, warrants scrutiny, particularly in differentiating their influence from the aforementioned low-level computational blocks. In this study, we critically examine these questions through a rigorous evaluation in brain MRI registration. We employed modularized components for each block and ensured unbiased comparisons across all methods and designs to disentangle their effects on performance. Our findings indicate that adopting “advanced” computational elements fails to significantly improve registration accuracy. Instead, well-established registration-specific designs offer fair improvements, enhancing results by a marginal 1.5% over the baseline. Our findings emphasize the importance of rigorous, unbiased evaluation and contribution disentanglement of all low- and high-level registration components, rather than simply following the computer vision trends with “more advanced” computational blocks. We advocate for simpler yet effective solutions and novel evaluation metrics that go beyond conventional registration accuracy, warranting further research across various organs and modalities.

MCML Authors
Link to website

Bailiang Jian

Artificial Intelligence in Medical Imaging

Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy


[77]
M. M. Heimer, Y. Dikhtyar, B. F. Hoppe, F. L. Herr, A. T. Stüber, T. Burkard, E. Zöller, M. P. Fabritius, L. Unterrainer, L. Adams, A. Thurner, D. Kaufmann, T. Trzaska, M. Kopp, O. Hamer, K. Maurer, I. Ristow, M. S. May, A. Tufman, J. Spiro, M. Brendel, M. Ingrisch, J. Ricke and C. C. Cyran.
Software-assisted structured reporting and semi-automated TNM classification for NSCLC staging in a multicenter proof of concept study.
Insights into Imaging 15.258 (Oct. 2024). DOI
Abstract

In this multi-center study, we proposed a structured reporting (SR) framework for non-small cell lung cancer (NSCLC) and developed a software-assisted tool to automatically translate image-based findings and annotations into TNM classifications. The aim of this study was to validate the software-assisted SR tool for NSCLC, assess its potential clinical impact in a proof-of-concept study, and evaluate current reporting standards in participating institutions.

MCML Authors
Link to website

Theresa Stüber

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[76]
S. Lüpke, Y. Yeganeh, E. Adeli, N. Navab and A. Farshad.
Physics-Informed Latent Diffusion for Multimodal Brain MRI Synthesis.
Preprint (Oct. 2024). arXiv
Abstract

Recent advances in generative models for medical imaging have shown promise in representing multiple modalities. However, the variability in modality availability across datasets limits the general applicability of the synthetic data they produce. To address this, we present a novel physics-informed generative model capable of synthesizing a variable number of brain MRI modalities, including those not present in the original dataset. Our approach utilizes latent diffusion models and a two-step generative process: first, unobserved physical tissue property maps are synthesized using a latent diffusion model, and then these maps are combined with a physical signal model to generate the final MRI scan. Our experiments demonstrate the efficacy of this approach in generating unseen MR contrasts and preserving physical plausibility. Furthermore, we validate the distributions of generated tissue properties by comparing them to those measured in real brain tissue.
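
The second step, mapping tissue property maps through a physical signal model, can be illustrated with the textbook spoiled gradient-echo equation; this is a generic stand-in, and the paper's actual signal model may differ.

```python
# Sketch: combine tissue property maps (PD, T1, T2*) with a closed-form
# steady-state spoiled gradient-echo signal equation.
import numpy as np

def spoiled_gre_signal(pd, t1, t2s, tr=0.025, te=0.005, flip_deg=15.0):
    """Steady-state spoiled GRE magnitude signal from tissue property maps."""
    a = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    return pd * np.sin(a) * (1 - e1) / (1 - np.cos(a) * e1) * np.exp(-te / t2s)

pd  = np.full((4, 4), 0.8)    # proton density (a.u.)
t1  = np.full((4, 4), 1.2)    # T1 in seconds (grey-matter-like)
t2s = np.full((4, 4), 0.05)   # T2* in seconds
print(spoiled_gre_signal(pd, t1, t2s)[0, 0])
```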

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[75]
P. Müller, G. Kaissis and D. Rückert.
ChEX: Interactive Localization and Region Description in Chest X-rays.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI GitHub
Abstract

Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX’s interactive capabilities.

MCML Authors
Link to Profile Georgios Kaissis

Georgios Kaissis

Dr.

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[74]
G. Zhai, E. P. Örnek, D. Z. Chen, R. Liao, Y. Di, N. Navab, F. Tombari and B. Busam.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion.
ECCV 2024 - 18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024. DOI
Abstract

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Our code and models are open-sourced.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to website

Ruotong Liao

Database Systems and Data Mining

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[73]
A. Mittermeier, M. Aßenmacher, B. Schachtner, S. Grosu, V. Dakovic, V. Kandratovich, B. Sabel and M. Ingrisch.
Automatische ICD-10-Codierung.
Die Radiologie 64 (Aug. 2024). DOI
Abstract

Background: Medical coding of radiology reports is essential for good quality of care and correct billing, but at the same time it is a laborious and error-prone task.
Objective: To assess the applicability of natural language processing (NLP) for ICD-10 coding of German-language radiology reports by fine-tuning suitable language models.
Materials and methods: In this retrospective study, all magnetic resonance imaging (MRI) reports from our institution between 2010 and 2020 were considered. The ICD-10 codes at discharge were matched to the corresponding reports to create a dataset for multiclass classification. Fine-tuning of GermanBERT and flanT5 was performed on the full dataset (dstotal) with 1035 distinct ICD-10 codes and on two reduced datasets containing the 100 (ds100) and 50 (ds50) most frequent codes. Model performance was evaluated with top-k accuracy for k = 1, 3, 5. In an ablation study, both models were also trained on the associated metadata and on the report alone.
Results: The full dataset consisted of 100,672 radiology reports; the reduced datasets comprised 68,103 (ds100) and 52,293 (ds50) reports. Model performance increased when several of the model's top predictions were considered, when the number of target classes was reduced, and when the metadata were combined with the report. FlanT5 outperformed GermanBERT across all datasets and metrics and is best suited as a medical coding assistant, reaching a top-3 accuracy of almost 70% on the realistic dataset dstotal.
Conclusion: Fine-tuning language models promises reliable prediction of ICD-10 codes for German radiology MRI reports across different scenarios. As a coding assistant, flanT5 can help medical coders make informed decisions and potentially reduce their workload.
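
The top-k accuracy used in this evaluation is straightforward to compute; below is a generic helper with synthetic scores, not the study's code.

```python
# Sketch of top-k accuracy: a prediction counts as correct if the true
# ICD-10 code appears among the model's k highest-scoring codes.
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    top_k = np.argsort(-scores, axis=1)[:, :k]    # best k codes per report
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

rng = np.random.default_rng(0)
scores = rng.random((6, 50))                # 6 reports, 50 candidate codes
labels = rng.integers(0, 50, size=6)
for k in (1, 3, 5):
    print(k, top_k_accuracy(scores, labels, k))
```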

MCML Authors
Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to website

Matthias Aßenmacher

Dr.

Statistical Learning and Data Science

Link to website

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[72]
R. Klaar, M. Rabe, A. T. Stüber, S. Hering, S. Corradini, C. Eze, S. Marschner, C. Belka, G. Landry, J. Dinkel and C. Kurz.
MRI-based ventilation and perfusion imaging to predict radiation-induced pneumonitis in lung tumor patients at a 0.35T MR-Linac.
Radiotherapy and Oncology (Aug. 2024). DOI
Abstract

Radiation-induced pneumonitis (RP), diagnosed 6–12 weeks after treatment, is a complication of lung tumor radiotherapy. So far, clinical and dosimetric parameters have not been reliable in predicting RP. We propose using non-contrast enhanced magnetic resonance imaging (MRI) based functional parameters acquired over the treatment course for patient stratification for improved follow-up.

MCML Authors
Link to website

Theresa Stüber

Clinical Data Science in Radiology


[71]
T. Löhr, M. Ingrisch and E. Hüllermeier.
Towards Aleatoric and Epistemic Uncertainty in Medical Image Classification.
AIME 2024 - 22nd International Conference on Artificial Intelligence in Medicine. Salt Lake City, UT, USA, Jul 09-12, 2024. DOI
Abstract

Medical domain applications require a detailed understanding of the decision making process, in particular when data-driven modeling via machine learning is involved, and quantifying uncertainty in the process adds trust and interpretability to predictive models. However, current uncertainty measures in medical imaging are mostly monolithic and do not distinguish between different sources and types of uncertainty. In this paper, we advocate the distinction between so-called aleatoric and epistemic uncertainty in the medical domain and illustrate its potential in clinical decision making for the case of PET/CT image classification.
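
One widely used way to make this distinction operational (a common construction, not necessarily the paper's exact measure) is the entropy decomposition over an ensemble of predictors: total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the members, and the epistemic part is their difference, the mutual information. A NumPy sketch:

    import numpy as np

    def uncertainty_decomposition(probs: np.ndarray):
        """probs: (n_members, n_classes) class probabilities predicted by
        each ensemble member for a single input."""
        eps = 1e-12
        mean_p = probs.mean(axis=0)
        total = -(mean_p * np.log(mean_p + eps)).sum()                  # entropy of the mean
        aleatoric = -(probs * np.log(probs + eps)).sum(axis=1).mean()   # mean member entropy
        epistemic = total - aleatoric                                   # mutual information
        return total, aleatoric, epistemic

    # members agree -> low epistemic; members disagree -> high epistemic
    agree = np.array([[0.9, 0.1], [0.88, 0.12], [0.92, 0.08]])
    disagree = np.array([[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]])
    print(uncertainty_decomposition(agree))
    print(uncertainty_decomposition(disagree))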

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Eyke Hüllermeier

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence and Machine Learning


[70]
F. Bongratz, V. Golkov, L. Mautner, L. Della Libera, F. Heetmeyer, F. Czaja, J. Rodemann and D. Cremers.
How to Choose a Reinforcement-Learning Algorithm.
Preprint (Jul. 2024). arXiv GitHub
Abstract

The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we streamline the process of choosing reinforcement-learning algorithms and action-distribution families. We provide a structured overview of existing methods and their properties, as well as guidelines for when to choose which methods.

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to website

Vladimir Golkov

Dr.

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[69]
Y. Chen, Y. Di, G. Zhai, F. Manhardt, C. Zhang, R. Zhang, F. Tombari, N. Navab and B. Busam.
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI
Abstract

Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with DINOv2 features to establish a consistent object representation under SE(3) transformations, facilitating the mapping from camera space to the pre-defined canonical space, thus further enhancing pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state-of-the-art. Moreover, on the more complex dataset HouseCat6D, which provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[68]
M. Ghahremani, M. Khateri, B. Jian, B. Wiestler, E. Adeli and C. Wachinger.
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI GitHub
Abstract

This paper introduces a novel top-down representation approach for deformable image registration, which estimates the deformation field by capturing various short- and long-range flow features at different scale levels. As a Hierarchical Vision Transformer (H-ViT), we propose a dual self-attention and cross-attention mechanism that uses high-level features in the deformation field to represent low-level ones, enabling information streams in the deformation field across all voxel patch embeddings irrespective of their spatial proximity. Since high-level features contain abstract flow patterns, such patterns are expected to effectively contribute to the representation of the deformation field in lower scales. When the self-attention module utilizes within-scale short-range patterns for representation, the cross-attention modules dynamically look for the key tokens across different scales to further interact with the local query voxel patches. Our method shows superior accuracy and visual quality over the state-of-the-art registration methods on five publicly available datasets, highlighting a substantial enhancement in the performance of medical imaging registration.

MCML Authors
Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Medical Imaging

Link to website

Bailiang Jian

Artificial Intelligence in Medical Imaging

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[67]
J. Huang, H. Yu, K.-T. Yu, N. Navab, S. Ilic and B. Busam.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI
Abstract

Recent learning methods for object pose estimation require resource-intensive training for each individual object instance or category, hampering their scalability in real applications when confronted with previously unseen objects. In this paper, we propose MatchU, a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images. MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects. We rely on learning geometric 3D descriptors that are rotation-invariant by design. By encoding pose-agnostic geometry, the learned descriptors naturally generalize to unseen objects and capture symmetries. To tackle ambiguous associations using 3D geometry only, we fuse additional RGB information into our descriptor. This is achieved through a novel attention-based mechanism that fuses cross-modal information, together with a matching loss that leverages the latent space learned from RGB data to guide the descriptor learning process. Extensive experiments reveal the generalizability of both the RGB-D fusion strategy as well as the descriptor efficacy. Benefiting from the novel designs, MatchU surpasses all existing methods by a significant margin in terms of both accuracy and speed, even without the requirement of expensive re-training or rendering.

MCML Authors
Link to website

Junwen Huang

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[66]
H. Jung, S.-C. Wu, P. Ruhkamp, G. Zhai, H. Schieber, G. Rizzoli, P. Wang, H. Zhao, L. Garattoni, D. Roth, S. Meier, N. Navab and B. Busam.
HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios.
CVPR 2024 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA, Jun 17-21, 2024. DOI
Abstract

Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. Addressing this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multi-modality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[65]
S. M. Fischer, J. Kiechle, D. M. Lang, J. C. Peeken and J. A. Schnabel.
Mask the Unknown: Assessing Different Strategies to Handle Weak Annotations in the MICCAI2023 Mediastinal Lymph Node Quantification Challenge.
Machine Learning for Biomedical Imaging 2 (Jun. 2024). DOI GitHub
Abstract

Pathological lymph node delineation is crucial in cancer diagnosis, progression assessment, and treatment planning. The MICCAI 2023 Lymph Node Quantification Challenge published the first public dataset for pathological lymph node segmentation in the mediastinum. As lymph node annotations are expensive, the challenge was formed as a weakly supervised learning task, where only a subset of all lymph nodes in the training set have been annotated. For the challenge submission, multiple methods for training on these weakly supervised data were explored, including noisy label training, loss masking of unlabeled data, and an approach that integrated the TotalSegmentator toolbox as a form of pseudo labeling in order to reduce the number of unknown voxels. Furthermore, multiple public TCIA datasets were incorporated into the training to improve the performance of the deep learning model. Our submitted model achieved a Dice score of 0.628 and an average symmetric surface distance of 5.8 mm on the challenge test set. With our submitted model, we accomplished the third rank in the MICCAI2023 LNQ challenge. A finding of our analysis was that the integration of all visible lymph nodes, including non-pathological ones, improved the overall segmentation performance on pathological lymph nodes of the test set. Furthermore, segmentation models trained only on clinically enlarged lymph nodes, as given in the challenge scenario, could not generalize to smaller pathological lymph nodes.
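
Among the strategies listed, loss masking is the easiest to sketch: voxels whose annotation status is unknown are simply excluded from the loss. A PyTorch toy version for binary segmentation (the `known` mask is a hypothetical stand-in for the challenge's annotation map):

    import torch
    import torch.nn.functional as F

    def masked_bce_loss(logits, target, known):
        """logits, target, known: tensors of identical shape.
        `known` is 1 where the voxel label is trusted, 0 where it is unknown."""
        loss = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
        # zero out unknown voxels and average only over the known ones
        return (loss * known).sum() / known.sum().clamp(min=1)

    logits = torch.randn(1, 1, 8, 8, 8)                 # toy 3D volume
    target = (torch.rand(1, 1, 8, 8, 8) > 0.5).float()
    known = (torch.rand(1, 1, 8, 8, 8) > 0.3).float()   # roughly 70% of voxels annotated
    print(masked_bce_loss(logits, target, known))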

MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[64]
J. Kiechle, S. M. Fischer, D. M. Lang, M. Folco, S. C. Foreman, V. K. N. Rösner, A.-K. Lohse, C. Mogler, C. Knebel, M. R. Makowski, K. Woertler, S. E. Combs, A. S. Gersing, J. C. Peeken and J. A. Schnabel.
Unifying local and global shape descriptors to grade soft-tissue sarcomas using graph convolutional networks.
ISBI 2024 - IEEE 21st International Symposium on Biomedical Imaging. Athens, Greece, May 27-30, 2024. DOI
Abstract

The tumor grading of patients suffering from soft-tissue sarcomas is a critical task, as an accurate classification of this high-mortality cancer entity constitutes a decisive factor in devising optimal treatment strategies. In this work, we focus on distinguishing soft-tissue sarcoma subtypes solely based on their 3D morphological characteristics, derived from tumor segmentation masks. Notably, we direct attention to overcoming the limitations of texture-based methodologies, which often fall short of providing adequate shape delineation. To this end, we propose a novel yet elegant modular geometric deep learning framework coined Global Local Graph Convolutional Network (GloLo-GCN) that integrates local and global shape characteristics into a meaningful unified shape descriptor. Evaluated on a multi-center dataset, our proposed model performs better in soft-tissue sarcoma grading than GCNs based on state-of-the-art graph convolutions and a volumetric 3D convolutional neural network, also evaluated on binary segmentation masks exclusively.

MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[63]
N. Stolt-Ansó, V. Sideri-Lampretsa, M. Dannecker and D. Rückert.
Intensity-based 3D motion correction for cardiac MR images.
ISBI 2024 - IEEE 21st International Symposium on Biomedical Imaging. Athens, Greece, May 27-30, 2024. DOI
Abstract

Cardiac magnetic resonance (CMR) image acquisition requires subjects to hold their breath while 2D cine images are acquired. This process assumes that the heart remains in the same position across all slices. However, differences in breath-hold positions or patient motion introduce 3D slice misalignments. In this work, we propose an algorithm that simultaneously aligns all short-axis (SA) and long-axis (LA) slices by maximizing the pairwise intensity agreement between their intersections. Unlike previous works, our approach is formulated as a subject-specific optimization problem and requires no prior knowledge of the underlying anatomy. We quantitatively demonstrate that the proposed method is robust against a large range of rotations and translations by synthetically misaligning 10 motion-free datasets and aligning them back using the proposed method.

MCML Authors
Link to website

Nil Stolt-Ansó

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[62]
Y. Zhang, N. Stolt-Ansó, J. Pan, W. Huang, K. Hammernik and D. Rückert.
Direct Cardiac Segmentation from Undersampled K-Space using Transformers.
ISBI 2024 - IEEE 21st International Symposium on Biomedical Imaging. Athens, Greece, May 27-30, 2024. DOI
Abstract

The prevailing deep learning-based methods of predicting cardiac segmentation involve reconstructed magnetic resonance (MR) images. The heavy dependency of segmentation approaches on image quality significantly limits the acceleration rate in fast MR reconstruction. Moreover, the practice of treating reconstruction and segmentation as separate sequential processes leads to artifact generation and information loss in the intermediate stage. These issues pose a great risk to achieving high-quality outcomes. To leverage the redundant k-space information overlooked in this dual-step pipeline, we introduce a novel approach to directly deriving segmentations from sparse k-space samples using a transformer (DiSK). DiSK operates by globally extracting latent features from 2D+time k-space data with attention blocks and subsequently predicting the segmentation label of query points. We evaluate our model under various acceleration factors (ranging from 4 to 64) and compare against two image-based segmentation baselines. Our model consistently outperforms the baselines in Dice and Hausdorff distances across foreground classes for all presented sampling rates.

MCML Authors
Link to website

Nil Stolt-Ansó

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[61]
Y. Velikova, M. F. Azampour, W. Simson, M. Esposito and N. Navab.
Implicit Neural Representations for Breathing-compensated Volume Reconstruction in Robotic Ultrasound Aorta Screening.
ICRA 2024 - IEEE International Conference on Robotics and Automation. Yokohama, Japan, May 13-17, 2024. DOI
Abstract

Ultrasound (US) imaging is widely used in diagnosing and staging abdominal diseases due to its lack of ionizing radiation and prevalent availability. However, significant inter-operator variability and inconsistent image acquisition hinder the widespread adoption of extensive screening programs. Robotic ultrasound systems have emerged as a promising solution, offering standardized acquisition protocols and the possibility of automated acquisition. Additionally, these systems enable access to 3D data via robotic tracking, enhancing volumetric reconstruction for improved ultrasound interpretation and precise disease diagnosis. However, the interpretability of 3D US reconstruction of abdominal images can be affected by the patient's breathing motion. This study introduces a method to compensate for breathing motion in 3D US compounding by leveraging implicit neural representations. Our approach employs a robotic ultrasound system for automated screenings. To demonstrate the method's effectiveness, we evaluate our proposed method for the diagnosis and monitoring of abdominal aortic aneurysms as a representative use case. Our experiments demonstrate that our proposed pipeline facilitates robust automated robotic acquisition, mitigating artifacts from breathing motion, and yields smoother 3D reconstructions for enhanced screening and medical diagnosis.
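
At the heart of such a pipeline is an implicit neural representation: a coordinate network that maps spatial position to intensity and can therefore be queried on a motion-compensated grid. A toy PyTorch sketch of a coordinate MLP with random Fourier features, fitted to scattered intensity samples (an illustration of the representation, not the authors' architecture):

    import torch
    import torch.nn as nn

    class CoordinateMLP(nn.Module):
        """Maps 3D coordinates to intensity via random Fourier features + MLP."""
        def __init__(self, n_features=64, hidden=128, scale=5.0):
            super().__init__()
            self.register_buffer("B", torch.randn(3, n_features) * scale)
            self.net = nn.Sequential(
                nn.Linear(2 * n_features, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, xyz):                      # xyz: (N, 3) in [0, 1]^3
            proj = 2 * torch.pi * xyz @ self.B
            feats = torch.cat([proj.sin(), proj.cos()], dim=-1)
            return self.net(feats).squeeze(-1)

    model = CoordinateMLP()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    xyz = torch.rand(1024, 3)                        # toy sample locations
    intensity = torch.sin(10 * xyz[:, 0])            # toy "ultrasound" intensities
    for _ in range(200):
        opt.zero_grad()
        loss = ((model(xyz) - intensity) ** 2).mean()
        loss.backward()
        opt.step()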

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Walter Simson

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[60]
J. Kiechle, S. C. Foreman, S. Fischer, D. Rusche, V. Rösner, A.-K. Lohse, C. Mogler, S. E. Combs, M. R. Makowski, K. Woertler, D. M. Lang, J. A. Schnabel, A. S. Gersing and J. C. Peeken.
Investigating the role of morphology in deep learning-based liposarcoma grading.
ESTRO 2024 - Annual Meeting of the European Society for Radiotherapy and Oncology. Glasgow, UK, May 03-07, 2024. URL
MCML Authors
Link to website

Johannes Kiechle

Computational Imaging and AI in Medicine

Link to website

Stefan Fischer

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[59]
A. Kazemi, A. Rasouli-Saravani, M. Gharib, T. Albuquerque, S. Eslami and P. J. Schüffler.
A systematic review of machine learning-based tumor-infiltrating lymphocytes analysis in colorectal cancer: Overview of techniques, performance metrics, and clinical outcomes.
Computers in Biology and Medicine 173 (May. 2024). DOI
Abstract

The incidence of colorectal cancer (CRC), one of the deadliest cancers around the world, is increasing. Tissue microenvironment (TME) features such as tumor-infiltrating lymphocytes (TILs) can have a crucial impact on diagnosis or decision-making for treating patients with CRC. While clinical studies showed that TILs improve the host immune response, leading to a better prognosis, inter-observer agreement for quantifying TILs is not perfect. Incorporating machine learning (ML) based applications in clinical routine may promote diagnosis reliability. Recently, ML has shown potential for making progress in routine clinical procedures. We aim to systematically review the TILs analysis based on ML in CRC histological images. Deep learning (DL) and non-DL techniques can aid pathologists in identifying TILs, and automated TILs are associated with patient outcomes. However, a large multi-institutional CRC dataset with a diverse and multi-ethnic population is necessary to generalize ML methods.

MCML Authors
Link to Profile Peter Schüffler

Peter Schüffler

Prof. Dr.

Computational Pathology


[58]
K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber, J. Topalis, T. Weber, P. Wesp, B. O. Sabel, J. Ricke and M. Ingrisch.
ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports.
European Radiology 34 (May. 2024). DOI
Abstract

Objectives: To assess the quality of simplified radiology reports generated with the large language model (LLM) ChatGPT and to discuss challenges and chances of ChatGPT-like LLMs for medical text simplification.
Methods: In this exploratory case study, a radiologist created three fictitious radiology reports, which we simplified by prompting ChatGPT with ‘Explain this medical report to a child using simple language.’ In a questionnaire, we tasked 15 radiologists to rate the quality of the simplified radiology reports with respect to their factual correctness, completeness, and potential harm for patients. We used Likert scale analysis and inductive free-text categorization to assess the quality of the simplified reports.
Results: Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed relevant medical information, and potentially harmful passages were reported.
Conclusion: While we see a need for further adaption to the medical field, the initial insights of this study indicate a tremendous potential in using LLMs like ChatGPT to improve patient-centered care in radiology and other medical domains.
Clinical relevance statement: Patients have started to use ChatGPT to simplify and explain their medical reports, which is expected to affect patient-doctor interaction. This phenomenon raises several opportunities and challenges for clinical routine.

MCML Authors
Link to website

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Link to website

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to website

Jakob Dexl

Clinical Data Science in Radiology

Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to website

Theresa Stüber

Clinical Data Science in Radiology

Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[57]
V. G. Duque, A. Marquardt, Y. Velikova, L. Lacourpaille, A. Nordez, M. Crouzier, H. J. Lee, D. Mateus and N. Navab.
Ultrasound segmentation analysis via distinct and completed anatomical borders.
International Journal of Computer Assisted Radiology and Surgery 19 (May. 2024). DOI
Abstract

Segmenting ultrasound images is important for precise area and/or volume calculations, ensuring reliable diagnosis and effective treatment evaluation for diseases. Recently, many segmentation methods have been proposed and shown impressive performance. However, currently, there is no deeper understanding of how networks segment target regions or how they define the boundaries. In this paper, we present a new approach that analyzes ultrasound segmentation networks in terms of learned borders because border delimitation is challenging in ultrasound.

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to website

Hong Joo Lee

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[56]
M.-A. Gafencu, Y. Velikova, M. Saleh, T. Ungi, N. Navab, T. Wendler and M. F. Azampour.
Shape completion in the dark: completing vertebrae morphology from 3D ultrasound.
International Journal of Computer Assisted Radiology and Surgery 19 (May. 2024). DOI
Abstract

Ultrasound (US) imaging, while advantageous for its radiation-free nature, is challenging to interpret due to only partially visible organs and a lack of complete 3D information. While performing US-based diagnosis or investigation, medical professionals therefore create a mental map of the 3D anatomy. In this work, we aim to replicate this process and enhance the visual representation of anatomical structures.

MCML Authors
Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality


[55]
Y. Li, T. Wolf, S. Pölsterl, I. Yakushev, D. M. Hedderich and C. Wachinger.
From Barlow Twins to Triplet Training: Differentiating Dementia with Limited Data.
Preprint (Apr. 2024). arXiv
Abstract

Differential diagnosis of dementia is challenging due to overlapping symptoms, with structural magnetic resonance imaging (MRI) being the primary method for diagnosis. Despite the clinical value of computer-aided differential diagnosis, research has been limited, mainly due to the absence of public datasets that contain diverse types of dementia. This leaves researchers with small in-house datasets that are insufficient for training deep neural networks (DNNs). Self-supervised learning shows promise for utilizing unlabeled MRI scans in training, but small batch sizes for volumetric brain scans make its application challenging. To address these issues, we propose Triplet Training for differential diagnosis with limited target data. It consists of three key stages: (i) self-supervised pre-training on unlabeled data with Barlow Twins, (ii) self-distillation on task-related data, and (iii) fine-tuning on the target dataset. Our approach significantly outperforms traditional training strategies, achieving a balanced accuracy of 75.6%. We further provide insights into the training process by visualizing changes in the latent space after each step. Finally, we validate the robustness of Triplet Training in terms of its individual components in a comprehensive ablation study.
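
Stage (i) uses the Barlow Twins objective, which pushes the cross-correlation matrix of the embeddings of two augmented views toward the identity: the diagonal terms enforce invariance, the off-diagonal terms reduce redundancy. A compact PyTorch sketch (the trade-off weight lambd is an assumed value):

    import torch

    def barlow_twins_loss(z1, z2, lambd=5e-3):
        """z1, z2: (batch, dim) embeddings of two augmented views."""
        n, d = z1.shape
        z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)      # standardize per dimension
        z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
        c = (z1.T @ z2) / n                               # cross-correlation matrix (d, d)
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()    # invariance term
        off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()  # redundancy term
        return on_diag + lambd * off_diag

    z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
    print(barlow_twins_loss(z1, z2))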

MCML Authors
Link to website

Yitong Li

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[54]
T. Weber, J. Dexl, D. Rügamer and M. Ingrisch.
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition.
Preprint (Apr. 2024). arXiv
Abstract

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model’s parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
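
A channel-mode (Tucker-2) factorization of a convolution kernel can be sketched with a plain higher-order SVD in NumPy; this is a generic illustration of the decomposition, not the authors' exact post-training procedure. The two factor matrices correspond to 1x1 convolutions before and after a smaller core convolution, which is where the FLOP savings come from:

    import numpy as np

    def tucker2_conv_kernel(W, r_out, r_in):
        """W: conv kernel of shape (C_out, C_in, k, k).
        Returns factors (U_out, U_in) and core G with channel ranks (r_out, r_in)."""
        c_out, c_in, kh, kw = W.shape
        # truncated SVDs of the mode-0 and mode-1 unfoldings (HOSVD)
        U_out, _, _ = np.linalg.svd(W.reshape(c_out, -1), full_matrices=False)
        U_out = U_out[:, :r_out]
        U_in, _, _ = np.linalg.svd(W.transpose(1, 0, 2, 3).reshape(c_in, -1),
                                   full_matrices=False)
        U_in = U_in[:, :r_in]
        # core tensor: contract W with the transposed factor matrices
        G = np.einsum("oikl,or,is->rskl", W, U_out, U_in)
        return U_out, U_in, G

    W = np.random.randn(64, 32, 3, 3)
    U_out, U_in, G = tucker2_conv_kernel(W, r_out=16, r_in=8)
    W_approx = np.einsum("rskl,or,is->oikl", G, U_out, U_in)
    print(G.shape, np.linalg.norm(W - W_approx) / np.linalg.norm(W))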

MCML Authors
Link to website

Jakob Dexl

Clinical Data Science in Radiology

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[53]
C. Wachinger, D. Hedderich and F. Bongratz.
Stochastic Cortical Self-Reconstruction.
Preprint (Mar. 2024). arXiv
Abstract

Magnetic resonance imaging (MRI) is critical for diagnosing neurodegenerative diseases, yet accurately assessing mild cortical atrophy remains a challenge due to its subtlety. Automated cortex reconstruction, paired with healthy reference ranges, aids in pinpointing pathological atrophy, yet their generalization is limited by biases from image acquisition and processing. We introduce the concept of stochastic cortical self-reconstruction (SCSR) that creates a subject-specific healthy reference by taking MRI-derived thicknesses as input and, therefore, implicitly accounting for potential confounders. SCSR randomly corrupts parts of the cortex and self-reconstructs them from the remaining information. Trained exclusively on healthy individuals, repeated self-reconstruction generates a stochastic reference cortex for assessing deviations from the norm. We present three implementations of this concept: XGBoost applied on parcels, and two autoencoders on vertex level – one based on a multilayer perceptron and the other using a spherical U-Net. These models were trained on healthy subjects from the UK Biobank and subsequently evaluated across four public Alzheimer’s datasets. Finally, we deploy the model on clinical in-house data, where deviation maps’ high spatial resolution aids in discriminating between four types of dementia.
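
The self-reconstruction step can be pictured as a denoising scheme on per-parcel thickness vectors: randomly mask parcels, predict them from the remaining ones, and repeat the stochastic reconstruction at test time to obtain a subject-specific reference. A toy PyTorch stand-in (an MLP in place of the paper's three implementations, with simulated thickness data):

    import torch
    import torch.nn as nn

    D = 68                                          # e.g., number of cortical parcels
    model = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, D))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    healthy = torch.randn(512, D) * 0.1 + 2.5       # toy thickness data (mm)
    for _ in range(200):                            # train on healthy subjects only
        mask = (torch.rand(512, D) > 0.3).float()   # keep ~70%, corrupt the rest
        recon = model(healthy * mask)
        loss = (((recon - healthy) ** 2) * (1 - mask)).mean()  # score masked parcels
        opt.zero_grad()
        loss.backward()
        opt.step()

    # repeated stochastic reconstruction yields a per-parcel reference distribution
    with torch.no_grad():
        x = healthy[:1]
        samples = torch.stack([model(x * (torch.rand_like(x) > 0.3).float())
                               for _ in range(64)])
        z = (x - samples.mean(0)) / (samples.std(0) + 1e-6)   # deviation from the norm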

MCML Authors
Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Fabian Bongratz

Artificial Intelligence in Medical Imaging


[52]
T. N. Wolf, F. Bongratz, A.-M. Rickmann, S. Pölsterl and C. Wachinger.
Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning.
AAAI 2024 - 38th Conference on Artificial Intelligence. Vancouver, Canada, Feb 20-27, 2024. DOI
Abstract

Explaining predictions of black-box neural networks is crucial when applied to decision-critical tasks. Thus, attribution maps are commonly used to identify important image regions, despite prior work showing that humans prefer explanations based on similar examples. To this end, ProtoPNet learns a set of class-representative feature vectors (prototypes) for case-based reasoning. During inference, similarities of latent features to prototypes are linearly classified to form predictions and attribution maps are provided to explain the similarity. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations using the example of ProtoPNet. We show that such architectures allow the extraction of faithful explanations. However, we prove that the attribution maps used to explain the similarities violate the axioms. We propose a new procedure to extract explanations for trained ProtoPNets, named ProtoPFaith. Conceptually, these explanations are Shapley values, calculated on the similarity scores of each prototype. They allow one to faithfully answer which prototypes are present in an unseen image and quantify each pixel’s contribution to that presence, thereby complying with all axioms. The theoretical violations of ProtoPNet manifest in our experiments on three datasets (CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet, ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a qualitative difference between the explanations given by ProtoPNet and ProtoPFaith. Additionally, we quantify the explanations with the Area Over the Perturbation Curve, on which ProtoPFaith outperforms ProtoPNet on all experiments by a factor >10^3.

MCML Authors
Link to website

Tom Nuno Wolf

Artificial Intelligence in Medical Imaging

Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[51]
A. Reithmeir, J. A. Schnabel and V. A. Zimmer.
Learning physics-inspired regularization for medical image registration with hypernetworks.
SPIE 2024 - SPIE Medical Imaging: Image Processing. San Diego, CA, USA, Feb 18-22, 2024. DOI GitHub
Abstract

Medical image registration aims to identify the spatial deformation between images of the same anatomical region and is fundamental to image-based diagnostics and therapy. To date, the majority of the deep learning-based registration methods employ regularizers that enforce global spatial smoothness, e.g., the diffusion regularizer. However, such regularizers are not tailored to the data and might not be capable of reflecting the complex underlying deformation. In contrast, physics-inspired regularizers promote physically plausible deformations. One such regularizer is the linear elastic regularizer, which models the deformation of elastic material. These regularizers are driven by parameters that define the material’s physical properties. For biological tissue, a wide range of estimations of such parameters can be found in the literature, and it remains an open challenge to identify suitable parameter values for successful registration. To overcome this problem and to incorporate physical properties into learning-based registration, we propose to use a hypernetwork that learns the effect of the physical parameters of a physics-inspired regularizer on the resulting spatial deformation field. In particular, we adapt the HyperMorph framework to learn the effect of the two elasticity parameters of the linear elastic regularizer. Our approach enables the efficient discovery of suitable, data-specific physical parameters at test time. To the best of our knowledge, we are the first to use a hypernetwork to learn physics-inspired regularization for medical image registration. We evaluate our approach on 3D intrapatient lung CT images. The results show that the linear elastic regularizer can yield comparable results to the diffusion regularizer in unsupervised learning-based registration while predicting deformations with fewer foldings. With our method, the adaptation of the physical parameters to the data can successfully be performed at test time.
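
The HyperMorph-style mechanism can be sketched as a small hypernetwork that generates the weights of a layer in the registration network from the sampled physical parameters, so a single training run covers a continuum of elasticity settings. A toy PyTorch sketch (the layer shapes and the two-parameter input are assumptions for illustration):

    import torch
    import torch.nn as nn

    class HyperConv(nn.Module):
        """A conv layer whose weights are generated from physical parameters."""
        def __init__(self, c_in, c_out, k=3, n_params=2, hidden=32):
            super().__init__()
            self.shape = (c_out, c_in, k, k)
            self.c_out = c_out
            n_weights = c_out * c_in * k * k + c_out
            self.hyper = nn.Sequential(nn.Linear(n_params, hidden), nn.ReLU(),
                                       nn.Linear(hidden, n_weights))

        def forward(self, x, phys_params):           # phys_params: (n_params,)
            w = self.hyper(phys_params)
            weight = w[:-self.c_out].view(self.shape)
            bias = w[-self.c_out:]
            return nn.functional.conv2d(x, weight, bias, padding=1)

    layer = HyperConv(2, 8)
    moving_fixed = torch.randn(1, 2, 64, 64)          # stacked toy image pair
    mu_lam = torch.tensor([0.5, 1.5])                 # sampled elasticity parameters
    feat = layer(moving_fixed, mu_lam)
    print(feat.shape)
    # training sketch: loss = similarity(warped, fixed)
    #                       + elastic_regularizer(deformation_field, mu_lam)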

MCML Authors
Link to website

Anna Reithmeir

Computational Imaging and AI in Medicine

Link to Profile Julia Schnabel

Julia Schnabel

Prof. Dr.

Computational Imaging and AI in Medicine


[50]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI Reconstruction.
WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, Hawaii, Jan 04-08, 2024. DOI
Abstract

Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space, reducing acquisition times at the cost of decreased image quality. A popular approach is to employ undersampling patterns following various strategies, e.g., variable density sampling or radial trajectories. In this work, we propose a method that directly learns the undersampling masks from data points, thereby also providing task- and domain-specific patterns. To solve the resulting discrete optimization problem, we propose a general optimization routine called ProM: A fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks, demonstrating the benefits of using custom masks, tailored for a downstream task. For example, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.
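
The core of such a learned mask can be sketched as a vector of per-line sampling probabilities, rescaled to meet the acceleration budget and sampled with a straight-through estimator so gradients reach the probabilities. This is a simplified stand-in for ProM's convex-constrained formulation:

    import torch

    class ProbabilisticMask(torch.nn.Module):
        """Learnable per-line sampling probabilities for k-space undersampling."""
        def __init__(self, n_lines=256, acceleration=8.0):
            super().__init__()
            self.logits = torch.nn.Parameter(torch.zeros(n_lines))
            self.budget = 1.0 / acceleration        # target fraction of sampled lines

        def forward(self):
            p = torch.sigmoid(self.logits)
            # approximate projection onto the budget constraint
            p = (p * (self.budget / p.mean().clamp(min=1e-8))).clamp(max=1.0)
            # Bernoulli sample with a straight-through gradient estimator
            hard = (torch.rand_like(p) < p).float()
            return hard + p - p.detach()            # forward: hard; backward: through p

    mask = ProbabilisticMask()
    m = mask()                                      # (n_lines,) in {0, 1}
    print(m.mean())                                 # roughly 1 / acceleration
    # training sketch: loss = task_loss(reconstruct(kspace * m))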

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[49]
A. Farshad.
Learning to Learn Neural Representations with Limited Data and Supervision.
Dissertation 2024. URL
Abstract

Learning to learn is a powerful paradigm that enables machine learning models to leverage the previously learned features for new tasks and domains more effectively. This thesis explores different aspects of learning to learn from data, models, and semantics, and shows how they can enhance various computer vision and medical imaging tasks. In the first part of the thesis, we present novel and fundamental research on learning to learn from data, and in the second part, we investigate the use of high-level semantics in generative models.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[48]
P. Wesp.
Application of machine learning in CT colonography and radiological age assessment: enhancing traditional diagnostics in radiology.
Dissertation 2024. DOI
Abstract

Machine learning can address limitations in radiology where traditional methods fall short, as shown by this work’s focus on two clinical problems: differentiating premalignant from benign colorectal polyps and continuous age prediction through clavicle ossification in CT scans. For colorectal polyps, a random forest classifier and CNN models enabled non-invasive differentiation between benign and premalignant types in CT colonography, potentially supporting more precise cancer prevention. For age assessment, a deep learning model trained on automatically detected clavicle regions achieved superior accuracy compared to human estimates, demonstrating machine learning’s potential to enhance radiological diagnostics in complex cases. (Shortened).

MCML Authors
Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology


[47]
P. Wesp, B. M. Schachtner, K. Jeblick, J. Topalis, M. Weber, F. Fischer, R. Penning, J. Ricke, M. Ingrisch and B. O. Sabel.
Radiological age assessment based on clavicle ossification in CT: enhanced accuracy through deep learning.
International Journal of Legal Medicine (Jan. 2024). DOI
Abstract

Background: Radiological age assessment using reference studies is inherently limited in accuracy due to a finite number of assignable skeletal maturation stages. To overcome this limitation, we present a deep learning approach for continuous age assessment based on clavicle ossification in computed tomography (CT).
Methods: Thoracic CT scans were retrospectively collected from the picture archiving and communication system. Individuals aged 15.0 to 30.0 years examined in routine clinical practice were included. All scans were automatically cropped around the medial clavicular epiphyseal cartilages. A deep learning model was trained to predict a person’s chronological age based on these scans. Performance was evaluated using mean absolute error (MAE). Model performance was compared to an optimistic human reader performance estimate for an established reference study method.
Results: The deep learning model was trained on 4,400 scans of 1,935 patients (training set: mean age = 24.2 years ± 4.0, 1132 female) and evaluated on 300 scans of 300 patients with a balanced age and sex distribution (test set: mean age = 22.5 years ± 4.4, 150 female). Model MAE was 1.65 years, and the highest absolute error was 6.40 years for females and 7.32 years for males; these largest errors could be attributed to norm variants or pathologic disorders. Human reader estimate MAE was 1.84 years and the highest absolute error was 3.40 years for females and 3.78 years for males.
Conclusions: We present a deep learning approach for continuous age predictions using CT volumes highlighting the medial clavicular epiphyseal cartilage with performance comparable to the human reader estimate.

MCML Authors
Link to website

Philipp Wesp

Dr.

Clinical Data Science in Radiology

Link to website

Katharina Jeblick

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[46]
F. Bongratz, A.-M. Rickmann and C. Wachinger.
Neural deformation fields for template-based reconstruction of cortical surfaces from MRI.
Medical Image Analysis 93 (Jan. 2024). DOI
Abstract

The reconstruction of cortical surfaces is a prerequisite for quantitative analyses of the cerebral cortex in magnetic resonance imaging (MRI). Existing segmentation-based methods separate the surface registration from the surface extraction, which is computationally inefficient and prone to distortions. We introduce Vox2Cortex-Flow (V2C-Flow), a deep mesh-deformation technique that learns a deformation field from a brain template to the cortical surfaces of an MRI scan. To this end, we present a geometric neural network that models the deformation-describing ordinary differential equation in a continuous manner. The network architecture comprises convolutional and graph-convolutional layers, which allows it to work with images and meshes at the same time. V2C-Flow is not only very fast, requiring less than two seconds to infer all four cortical surfaces, but also establishes vertex-wise correspondences to the template during reconstruction. In addition, V2C-Flow is the first approach for cortex reconstruction that models white matter and pial surfaces jointly, therefore avoiding intersections between them. Our comprehensive experiments on internal and external test data demonstrate that V2C-Flow results in cortical surfaces that are state-of-the-art in terms of accuracy. Moreover, we show that the established correspondences are more consistent than in FreeSurfer and that they can directly be utilized for cortex parcellation and group analyses of cortical thickness.
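
The deformation-describing ODE can be illustrated by explicit Euler integration of template vertices through a learned velocity field; because every template vertex is transported individually, vertex-wise correspondence to the template comes for free. A toy PyTorch sketch (the velocity MLP is a simple stand-in for V2C-Flow's graph network):

    import torch
    import torch.nn as nn

    velocity = nn.Sequential(                 # toy stand-in for the learned field
        nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 3))

    def deform_template(vertices, n_steps=10, h=0.1):
        """Euler integration of dx/dt = v(x) from template to target surface.
        vertices: (V, 3) template mesh coordinates."""
        x = vertices
        for _ in range(n_steps):
            x = x + h * velocity(x)           # one explicit Euler step
        return x

    template = torch.rand(1000, 3)            # toy template vertices
    reconstructed = deform_template(template)
    print(reconstructed.shape)                # (1000, 3), correspondence preserved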

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[45]
D. Zhu, Q. Khan and D. Cremers.
Multi-vehicle trajectory prediction and control at intersections using state and intention information.
Neurocomputing 574 (Jan. 2024). DOI GitHub
Abstract

Traditional deep learning approaches for prediction of future trajectory of multiple road agents rely on knowing information about their past trajectory. In contrast, this work utilizes information of only the current state and intended direction to predict the future trajectory of multiple vehicles at intersections. Incorporating intention information has two distinct advantages: (1) It allows to not just predict the future trajectory but also control the multiple vehicles. (2) By manipulating the intention, the interaction among the vehicles is adapted accordingly to achieve desired behavior. Both these advantages would otherwise not be possible using only past trajectory information Our model utilizes message passing of information between the vehicle nodes for a more holistic overview of the environment, resulting in better trajectory prediction and control of the vehicles. This work also provides a thorough investigation and discussion into the disparity between offline and online metrics for the task of multi-agent control. We particularly show why conducting only offline evaluation would not suffice, thereby necessitating online evaluation. We demonstrate the superiority of utilizing intention information rather than past trajectory in online scenarios. Lastly, we show the capability of our method in adapting to different domains through experiments conducted on two distinct simulation platforms i.e. SUMO and CARLA.

MCML Authors
Link to website

Dekai Zhu

Computer Aided Medical Procedures & Augmented Reality

Link to website

Qadeer Khan

Computer Vision & Artificial Intelligence

Link to Profile Daniel Cremers

Daniel Cremers

Prof. Dr.

Computer Vision & Artificial Intelligence


[44]
F. Bongratz, J. Fecht, A.-M. Rickmann and C. Wachinger.
V2C-Long: Longitudinal Cortex Reconstruction with Spatiotemporal Correspondence.
Preprint (2024). arXiv
Abstract

Reconstructing the cortex from longitudinal MRI is indispensable for analyzing morphological changes in the human brain. Despite the recent disruption of cortical surface reconstruction with deep learning, challenges arising from longitudinal data are still persistent. Especially the lack of strong spatiotemporal point correspondence hinders downstream analyses due to the introduced noise. To address this issue, we present V2C-Long, the first dedicated deep learning-based cortex reconstruction method for longitudinal MRI. In contrast to existing methods, V2C-Long surfaces are directly comparable in a cross-sectional and longitudinal manner. We establish strong inherent spatiotemporal correspondences via a novel composition of two deep mesh deformation networks and fast aggregation of feature-enhanced within-subject templates. The results on internal and external test data demonstrate that V2C-Long yields cortical surfaces with improved accuracy and consistency compared to previous methods. Finally, this improvement manifests in higher sensitivity to regional cortical atrophy in Alzheimer’s disease.

MCML Authors
Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[43]
M. Ghahremani and C. Wachinger.
RegBN: Batch Normalization of Multimodal Data with Regularization.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL GitHub
Abstract

Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in integrating multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces RegBN, a novel approach for multimodal Batch Normalization with REGularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks.
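
The effect of such a normalization can be illustrated with a linear residualization: the part of one modality's features that is linearly predictable from another modality is estimated by ridge-regularized least squares and subtracted before fusion. This is a simplification for illustration only, not RegBN's exact Frobenius-norm formulation:

    import torch

    def residualize(f, g, ridge=1e-2):
        """Remove the component of f (n, d_f) linearly predictable from g (n, d_g).
        The ridge term plays the role of the regularizer on the projection."""
        gtg = g.T @ g + ridge * torch.eye(g.shape[1])
        W = torch.linalg.solve(gtg, g.T @ f)       # least-squares map g -> f
        return f - g @ W                           # residual: f with g's effect removed

    f = torch.randn(128, 16)
    g = torch.randn(128, 8)
    f_mixed = f + g @ torch.randn(8, 16)           # f confounded by g
    f_clean = residualize(f_mixed, g)
    print((g.T @ f_clean).norm())                  # near zero (up to the ridge term)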

MCML Authors
Link to website

Morteza Ghahremani

Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[42]
G. Zhai, E. P. Örnek, S.-C. Wu, Y. Di, F. Tombari, N. Navab and B. Busam.
CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs.
NeurIPS 2023 - 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA, Dec 10-16, 2023. URL
Abstract

Controllable scene synthesis aims to create interactive environments for numerous industrial use cases. Scene graphs provide a highly suitable interface to facilitate these applications by abstracting the scene context in a compact manner. Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense. Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes via latent diffusion, capturing global scene-object and local inter-object relationships in the scene graph while preserving shape diversity. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model. Due to the lack of a scene graph dataset offering high-quality object-level meshes with relations, we also construct SG-FRONT, enriching the off-the-shelf indoor dataset 3D-FRONT with additional scene graph labels. Extensive experiments are conducted on SG-FRONT, where CommonScenes shows clear advantages over other methods regarding generation consistency, quality, and diversity. Codes and the dataset are available on the website.

MCML Authors
Link to website

Guangyao Zhai

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[41]
Y. Zhang, Y. Li, H. Brown, M. Rezaei, B. Bischl, P. Torr, A. Khakzar and K. Kawaguchi.
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments.
XAIA @NeurIPS 2023 - Workshop XAI in Action: Past, Present, and Future Applications at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA, Dec 10-16, 2023. URL
Abstract

Feature attribution explains neural network outputs by identifying relevant input features. How do we know if the identified features are indeed relevant to the network? This notion is referred to as faithfulness, an essential property that reflects the alignment between the identified (attributed) features and the features used by the model. One recent trend to test faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them with these designed ground truth features. However, this idea has the underlying assumption that the neural network learns to use all and only these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we solve this missing link by explicitly designing the neural network by manually setting its weights, along with designing data, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in AttributionLab, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods. If an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
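
The sanity-check logic miniaturizes well: hand-set the weights of a tiny 'network' so that only known features can influence the output, then verify that an attribution method assigns zero relevance to the remaining inputs. A toy gradient-times-input example in PyTorch:

    import torch

    # hand-designed linear 'network': only features 0 and 1 are used by construction
    weights = torch.tensor([2.0, -1.0, 0.0, 0.0])

    x = torch.tensor([1.0, 3.0, 5.0, 7.0], requires_grad=True)
    y = (weights * x).sum()
    y.backward()

    attribution = x.grad * x.detach()      # gradient x input
    print(attribution)                     # features 2 and 3 must get exactly 0
    # a faithful method must match the designed ground truth [2., -3., 0., 0.]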

MCML Authors
Link to website

Yawei Li

Statistical Learning and Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning and Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Ashkan Khakzar

Dr.

* Former Member


[40]
M. F. Azampour, Y. Velikova, E. Fatemizadeh, S. P. Dakua and N. Navab.
Self-supervised Probe Pose Regression via Optimized Ultrasound Representations for US-CT Fusion.
MICAD 2023 - International Conference on Medical Imaging and Computer-Aided Diagnosis. Cambridge, UK, Dec 09-10, 2023. DOI GitHub
Abstract

Aligning 2D ultrasound images with 3D CT scans of the liver holds significant clinical value in enhancing diagnostic precision, surgical planning, and treatment delivery. Conventional approaches primarily rely on optimization techniques, which often have a limited capture range and are susceptible to initialization errors. To address these limitations, we define the problem as “probe pose regression” and leverage deep learning for a more robust and efficient solution for liver US-CT registration without access to paired data. The proposed method is a three-part framework that combines ultrasound rendering, a generative model, and pose regression. In the first stage, we exploit a differentiable ultrasound rendering model designed to synthesize ultrasound images given segmentation labels. We let the downstream task optimize the rendering parameters, enhancing the performance of the overall method. In the second stage, a generative model bridges the gap between real and rendered ultrasound images, enabling application on real B-mode images. Finally, we use a patient-specific pose regression network, trained self-supervised with only synthetic images and their known poses. We use ultrasound and CT scans from a dual-modality human abdomen phantom to validate the proposed method.
Our experimental results indicate that the proposed method can estimate probe poses within an acceptable error margin, which can later be fine-tuned using conventional methods. This capability confirms that the proposed framework can serve as a reliable initialization step for US-CT fusion and achieve fully automated US-CT fusion when coupled with conventional methods.

MCML Authors
Link to website

Mohammad Farid Azampour

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yordanka Velikova

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[39]
A. T. Stüber, S. Coors, B. Schachtner, T. Weber, D. Rügamer, A. Bender, A. Mittermeier, O. Öcal, M. Seidensticker, J. Ricke, B. Bischl and M. Ingrisch.
A comprehensive machine learning benchmark study for radiomics-based survival analysis of CT imaging data in patients with hepatic metastases of CRC.
Investigative Radiology 58.12 (Dec. 2023). DOI
Abstract

Optimizing a machine learning (ML) pipeline for radiomics analysis involves numerous choices in data set composition, preprocessing, and model selection. Objective identification of the optimal setup is complicated by correlated features, interdependency structures, and a multitude of available ML algorithms. Therefore, we present a radiomics-based benchmarking framework to optimize a comprehensive ML pipeline for the prediction of overall survival. This study is conducted on an image set of patients with hepatic metastases of colorectal cancer, for which radiomics features of the whole liver and of metastases from computed tomography images were calculated. A mixed model approach was used to find the optimal pipeline configuration and to identify the added prognostic value of radiomics features.

MCML Authors
Link to website

Theresa Stüber

Clinical Data Science in Radiology

Link to website

Balthasar Schachtner

Dr.

Clinical Data Science in Radiology

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning

Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[38]
C. Wachinger, T. N. Wolf and S. Pölsterl.
Deep learning for the prediction of type 2 diabetes mellitus from neck-to-knee Dixon MRI in the UK biobank.
Heliyon 9.11 (Nov. 2023). DOI
Abstract

Rationale and objectives: We evaluate the automatic identification of type 2 diabetes from neck-to-knee, two-point Dixon MRI scans with 3D convolutional neural networks on a large, population-based dataset. To this end, we assess the best combination of MRI contrasts and stations for diabetes prediction, and the benefit of integrating risk factors.
Materials and methods: Subjects with type 2 diabetes mellitus have been identified in the prospective UK Biobank Imaging study, and a matched control sample has been created to avoid confounding bias. Five-fold cross-validation is used for the evaluation. All scans from the two-point Dixon neck-to-knee sequence have been standardized. A neural network that considers multi-channel MRI input was developed and integrates clinical information in tabular format. An ensemble strategy is used to combine multi-station MRI predictions. A subset with quantitative fat measurements is identified for comparison to prior approaches.
Results: MRI scans from 3406 subjects (mean age, 66.2 years ± 7.1 [standard deviation]; 1128 women) were analyzed, of whom 1703 had diabetes. A balanced accuracy of 78.7%, an AUC ROC of 0.872, and an average precision of 0.878 were obtained for the classification of diabetes. The ensemble over multiple Dixon MRI stations yields better performance than selecting the individually best station. Moreover, combining fat and water scans as multi-channel inputs to the networks improves upon just using single contrasts as input. Integrating clinical information about known risk factors of diabetes in the network boosts the performance across all stations and the ensemble. The neural network achieved superior results compared to the prediction based on quantitative MRI measurements.
Conclusions: The developed deep learning model accurately predicted type 2 diabetes from neck-to-knee two-point Dixon MRI scans.

MCML Authors
Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Link to website

Tom Nuno Wolf

Artificial Intelligence in Medical Imaging


[37]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Unreading Race: Purging Protected Features from Chest X-ray Embeddings.
Under review. Preprint available (Nov. 2023). arXiv
Abstract

Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Besides removing the protected features' influence on pathology classification while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
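A minimal numpy sketch of the orthogonalization idea, under the assumption that it amounts to linear residualization: each embedding dimension is regressed on the protected features and only the residual is kept, so the resulting embeddings are linearly uncorrelated with age, sex, and race.

```python
# Sketch of embedding orthogonalization as linear residualization
# (our assumed formulation, not the authors' released code).
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 128))          # CXR embeddings (n x d)
A = rng.normal(size=(1000, 3))            # protected features: age, sex, race (encoded)
A = np.hstack([np.ones((1000, 1)), A])    # include an intercept column

# Least-squares fit of each embedding dimension on the protected features,
# then keep only the residual (the part A cannot explain linearly).
beta, *_ = np.linalg.lstsq(A, Z, rcond=None)
Z_orth = Z - A @ beta

# Sanity check: residual embeddings are (numerically) uncorrelated with A.
print(np.abs(A.T @ Z_orth).max())         # ~0 up to floating-point error
```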

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[36]
N. Stolt-Ansó, J. McGinnis, J. Pan, K. Hammernik and D. Rückert.
NISF: Neural implicit segmentation functions.
MICCAI 2023 - 26th International Conference on Medical Image Computing and Computer Assisted Intervention. Vancouver, Canada, Oct 08-12, 2023. DOI
Abstract

Segmentation of anatomical shapes from medical images has taken an important role in the automation of clinical measurements. While typical deep-learning segmentation approaches are performed on discrete voxels, the underlying objects being analysed exist in a real-valued continuous space. Approaches that rely on convolutional neural networks (CNNs) are limited to grid-like inputs and not easily applicable to sparse or partial measurements. We propose a novel family of image segmentation models that tackle many of CNNs’ shortcomings: Neural Implicit Segmentation Functions (NISF). Our framework takes inspiration from the field of neural implicit functions, where a network learns a mapping from a real-valued coordinate space to a shape representation. NISFs have the ability to segment anatomical shapes in high-dimensional continuous spaces. Training is not limited to voxelized grids and covers applications with sparse and partial data. Interpolation between observations is learnt naturally in the training procedure and requires no post-processing. Furthermore, NISFs allow the leveraging of learnt shape priors to make predictions for regions outside of the original image plane. We go on to show the framework achieves competitive Dice scores on a (3D+t) short-axis cardiac segmentation task using the UK Biobank dataset. We also provide a qualitative analysis of our framework's ability to perform segmentation and image interpolation on unseen regions of an image volume at arbitrary resolutions.
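The core mechanism can be sketched as a coordinate MLP: a network maps a continuous coordinate, concatenated with a per-subject latent code, to a segmentation probability, so predictions can be queried at arbitrary points rather than on a voxel grid. Layer sizes and latent handling below are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch of a neural implicit segmentation function.
import torch
import torch.nn as nn

class NISF(nn.Module):
    def __init__(self, latent_dim=64, coord_dim=4):  # (x, y, z, t)
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(coord_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, coords, z):
        # coords: (N, 4) continuous coordinates; z: (latent_dim,) subject code
        z = z.expand(coords.shape[0], -1)
        return torch.sigmoid(self.mlp(torch.cat([coords, z], dim=-1)))

model = NISF()
z = torch.zeros(64)           # subject-specific latent (optimized jointly in training)
coords = torch.rand(4096, 4)  # sample points anywhere, not on a fixed grid
probs = model(coords, z)      # per-point segmentation probabilities
```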

MCML Authors
Link to website

Nil Stolt-Ansó

Artificial Intelligence in Healthcare and Medicine

Link to Profile Daniel Rückert

Daniel Rückert

Prof. Dr.

Artificial Intelligence in Healthcare and Medicine


[35]
Y. Yeganeh, A. Farshad and N. Navab.
Anatomy-Aware Masking for Inpainting in Medical Imaging.
ShapeMI @MICCAI 2023 - 3rd Workshop on Shape in Medical Imaging at the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). Vancouver, Canada, Oct 08-12, 2023. DOI GitHub
Abstract

Inpainting has recently been employed as a successful deep-learning technique for unsupervised model discovery in medical image analysis by taking advantage of the strong priors learned by models to reconstruct the structure and texture of missing parts in images. Even though the learned features depend on the masks as well as the images, the masks used for inpainting are typically random and independent of the dataset, due to the unpredictability of the content of images, i.e., different objects and shapes can appear in different locations in images. However, this is rarely the case for medical imaging data since they are obtained from similar anatomies. Still, random square masks are the most popular technique for inpainting in medical imaging. In this work, we propose a pipeline to generate, position and sample the masks to efficiently learn the shape and structures of the anatomy and generate a myriad of diverse anatomy-aware masks, aiding the model in learning a statistical shape prior over the topology of the organs of interest. We demonstrate the impact of our approach compared to other masking mechanisms in the reconstruction of anatomy. We compare the effectiveness of our proposed masking approach over square-shaped masks, which are traditionally used in medical imaging, and irregular-shape masks, which are used in SOTA inpainting literature.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[34]
A. Farshad, Y. Yeganeh, Y. Chi, C. Shen, B. Ommer and N. Navab.
SceneGenie: Scene graph guided diffusion models for image synthesis.
ICCV 2023 - Workshop at the IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI
Abstract

Text-conditioned image generation has made significant progress in recent years with generative adversarial networks and more recently, diffusion models. While diffusion models conditioned on text prompts have produced impressive and high-quality images, accurately representing complex text prompts such as the number of instances of a specific object remains challenging. To address this limitation, we propose a novel guidance approach for the sampling process in the diffusion model that leverages bounding box and segmentation map information at inference time without additional training data. Through a novel loss in the sampling process, our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints, leading to high-resolution images that accurately represent the scene. To obtain bounding box and segmentation map information, we structure the text prompt as a scene graph and enrich the nodes with CLIP embeddings. Our proposed model achieves state-of-the-art performance on two public benchmarks for image generation from scene graphs, surpassing both scene graph to image and text-based diffusion models in various metrics. Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process for more accurate text-to-image generation.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Björn Ommer

Björn Ommer

Prof. Dr.

Computer Vision & Learning

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[33]
Y. Yeganeh, A. Farshad, P. Weinberger, S.-A. Ahmadi, E. Adeli and N. Navab.
Transformers pay attention to convolutions leveraging emerging properties of ViTs by dual attention-image network.
ICCV 2023 - Workshop at the IEEE/CVF International Conference on Computer Vision. Paris, France, Oct 02-06, 2023. DOI
Abstract

Although purely transformer-based architectures pretrained on large datasets are introduced as foundation models for general computer vision tasks, hybrid models that incorporate combinations of convolution and transformer blocks showed state-of-the-art performance in more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to convolutional networks, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose a novel and simple architecture based on only convolutional layers and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network, complex transformer-based networks and even 3D architectures are outperformed at much lower computational cost. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model in the other branch. The results of our experiments on medical imaging datasets show that the extracted attention map visualizations from the attention heads of a pre-trained transformer architecture combined with the image provide strong prior knowledge for a pure CNN architecture to outperform CNN-based and transformer-based architectures.
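A hedged sketch of the dual-branch design described above: one convolutional encoder takes the image, a second takes the DINO attention-map visualizations (one channel per head), and the features are fused before prediction. Channel counts and the fusion are assumptions for illustration.

```python
# Illustrative two-branch CNN fusing an image with precomputed DINO
# attention maps (a simplification, not the paper's exact model).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class DualBranchSeg(nn.Module):
    def __init__(self, n_heads=6):
        super().__init__()
        self.image_enc = conv_block(1, 32)
        self.attn_enc = conv_block(n_heads, 32)   # one channel per attention head
        self.decoder = nn.Conv2d(64, 1, 1)        # fuse and predict a mask

    def forward(self, image, attn_maps):
        feats = torch.cat([self.image_enc(image), self.attn_enc(attn_maps)], dim=1)
        return torch.sigmoid(self.decoder(feats))

model = DualBranchSeg()
image = torch.randn(2, 1, 224, 224)
attn = torch.randn(2, 6, 224, 224)   # stand-in for DINO attention visualizations
mask = model(image, attn)
```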

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[32]
F. Bongratz, A.-M. Rickmann and C. Wachinger.
Abdominal organ segmentation via deep diffeomorphic mesh deformations.
Scientific Reports 13.1 (Oct. 2023). DOI
Abstract

Abdominal organ segmentation from CT and MRI is an essential prerequisite for surgical planning and computer-aided navigation systems. It is challenging due to the high variability in the shape, size, and position of abdominal organs. Three-dimensional numeric representations of abdominal shapes with point-wise correspondence to a template are further important for quantitative and statistical analyses thereof. Recently, template-based surface extraction methods have shown promising advances for direct mesh reconstruction from volumetric scans. However, the generalization of these deep learning-based approaches to different organs and datasets, a crucial property for deployment in clinical environments, has not yet been assessed. We close this gap and employ template-based mesh reconstruction methods for joint liver, kidney, pancreas, and spleen segmentation. Our experiments on manually annotated CT and MRI data reveal limited generalization capabilities of previous methods to organs of different geometry and weak performance on small datasets. We alleviate these issues with a novel deep diffeomorphic mesh-deformation architecture and an improved training scheme. The resulting method, UNetFlow, generalizes well to all four organs and can be easily fine-tuned on new data. Moreover, we propose a simple registration-based post-processing that aligns voxel and mesh outputs to boost segmentation accuracy.

MCML Authors
Fabian Bongratz

Fabian Bongratz

Artificial Intelligence in Medical Imaging

Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging


[31]
A. Stüber, S. Coors and M. Ingrisch.
Revitalize the Potential of Radiomics: Interpretation and Feature Stability in Medical Imaging Analyses through Groupwise Feature Importance.
LB-D-DC @xAI 2023 - Late-breaking Work, Demos and Doctoral Consortium at the 1st World Conference on eXplainable Artificial Intelligence (xAI 2023). Lisbon, Portugal, Jul 26-28, 2023. PDF
Abstract

Radiomics, involving analysis of calculated, quantitative features from medical images with machine learning tools, shares the instability challenge with other high-dimensional data analyses due to variations in the training set. This instability affects model interpretation and feature importance assessment. To enhance stability and interpretability, we introduce grouped feature importance, shedding light on tool limitations and advocating for more reliable radiomics-based analysis methods.
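The grouped feature importance idea can be illustrated with groupwise permutation importance: an entire feature group is permuted jointly and the resulting performance drop is attributed to the group. The groups, model, and data below are placeholders, not the study's setup.

```python
# Sketch of groupwise permutation importance for radiomics-style features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)); y = rng.integers(0, 2, 300)
groups = {"shape": list(range(0, 7)), "texture": list(range(7, 20))}

model = RandomForestClassifier(random_state=0).fit(X, y)
base = roc_auc_score(y, model.predict_proba(X)[:, 1])

for name, cols in groups.items():
    Xp = X.copy()
    Xp[:, cols] = Xp[rng.permutation(len(X))][:, cols]  # permute the group jointly
    drop = base - roc_auc_score(y, model.predict_proba(Xp)[:, 1])
    print(f"group {name}: importance {drop:.3f}")
```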

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[30]
A. Farshad.
Representation learning for semantic scene understanding.
HHAI 2023 - 2nd International Conference on Hybrid Human-Artificial Intelligence. Munich, Germany, Jun 26-30, 2023. DOI
Abstract

Recent advances in semantic scene understanding have underscored its growing significance in the field of computer vision. Enhanced representations can be achieved by incorporating semantic information derived from textual data and applying it to generative models for scene modeling. Nevertheless, the features extracted from text prompts may not seamlessly model a scene.
Scene graphs offer a robust solution to address this challenge, serving as a powerful representation for semantic image generation and manipulation. In this study, we delve into the utilization of scene graphs for this purpose and propose novel methodologies to augment both the representation and learning processes involved in image generation and manipulation.
For image generation, we examine meta-learning for producing images in unprecedented scenes and refine the generated images using an autoregressive scene graph generation model. In terms of image manipulation, we put forth a novel self-supervised method that eliminates the need for paired before-and-after data. Additionally, we boost image manipulation performance by disentangling latent and graph representations in a self-supervised manner.
By evaluating the efficacy of our proposed approaches on a diverse range of publicly available benchmarks, we demonstrate their superiority, ultimately achieving state-of-the-art performance in the domain of semantic image generation and manipulation.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality


[29]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis.
PAKDD 2023 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Osaka, Japan, May 25-28, 2023. DOI
Abstract

While recent advances in large-scale foundational models show promising results, their application to the medical domain has not yet been explored in detail. In this paper, we progress into the realms of large-scale modeling in medical synthesis by proposing Cheff - a foundational cascaded latent diffusion model, which generates highly realistic chest radiographs, providing state-of-the-art quality on a 1-megapixel scale. We further propose MaCheX, a unified interface for public chest datasets that forms the largest open collection of chest X-rays to date. With Cheff conditioned on radiological reports, we further guide the synthesis process over text prompts and unveil the research area of report-to-chest-X-ray generation.

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[28]
Y. Yeganeh, A. Farshad, G. Guevercin, A. Abu-zer, R. Xiao, Y. Tang, E. Adeli and N. Navab.
SCOPE: Structural Continuity Preservation for Medical Image Segmentation.
Preprint (Apr. 2023). arXiv
Abstract

Although the preservation of shape continuity and physiological anatomy is a natural assumption in the segmentation of medical images, it is often neglected by deep learning methods that mostly aim for the statistical modeling of input data as pixels rather than interconnected structures. In biological structures, however, organs are not separate entities; for example, in reality, a severed vessel is an indication of an underlying problem, but traditional segmentation models are not designed to strictly enforce the continuity of anatomy, potentially leading to inaccurate medical diagnoses. To address this issue, we propose a graph-based approach that enforces the continuity and connectivity of anatomical topology in medical images. Our method encodes the continuity of shapes as a graph constraint, ensuring that the network’s predictions maintain this continuity. We evaluate our method on two public benchmarks for retinal vessel segmentation, showing significant improvements in connectivity metrics compared to traditional methods while achieving better or on-par performance on segmentation metrics.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[27]
Y. Yeganeh, A. Farshad, P. Weinberger, S.-A. Ahmadi, E. Adeli and N. Navab.
DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation.
Preprint (Apr. 2023). arXiv
Abstract

Although purely transformer-based architectures showed promising performance in many computer vision tasks, many hybrid models consisting of CNN and transformer blocks are introduced to fit more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to CNNs in medical imaging segmentation, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose simple architectures based on purely convolutional layers and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network (e.g., DINO), one can outperform complex transformer-based networks at much lower computational cost. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model (as multiple channels) in the other branch. The results of our experiments on two publicly available medical imaging datasets show that the proposed pipeline outperforms U-Net and the state-of-the-art medical image segmentation models.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[26]
A. Khakzar.
Rethinking Feature Attribution for Neural Network Explanation.
Dissertation 2023. DOI
Abstract

Feature attribution is arguably the predominant approach for illuminating black-box neural networks. This dissertation rethinks feature attribution by leveraging critical neural pathways, identifying input features with predictive information, and evaluating feature attribution using the neural network model. The dissertation also rethinks feature attribution for the explanation of medical imaging models.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member


[25]
A. Mittermeier.
Robust evaluation of contrast-enhanced imaging for perfusion quantification.
Dissertation 2023. DOI
Abstract

This thesis advances the quantification and prediction of hemodynamic parameters in dynamic contrast-enhanced (DCE) imaging through two innovative approaches. The Bayesian Tofts model (BTM) improves the reliability and uncertainty estimation of perfusion parameters, demonstrating its potential for enhanced treatment response assessment in cancer care. Additionally, the development of a deep learning model offers a promising alternative by directly predicting clinical endpoints from raw DCE-CT data, eliminating the need for traditional tracer-kinetic modeling and paving the way for more efficient and accurate clinical applications in stroke and other conditions. (Shortened.)

MCML Authors
Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology


[24]
A. Farshad, Y. Yeganeh, H. Dhamo, F. Tombari and N. Navab.
DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation.
BMVC 2022 - 33rd British Machine Vision Conference. London, UK, Nov 21-24, 2022. URL GitHub
Abstract

Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or the edges in the graph. Although existing works have shown promising results in modifying the placement and pose of objects, scene manipulation often leads to losing some visual characteristics like the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs in a self-supervised manner. Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph. In addition to producing more realistic images due to the decomposition of features like pose and identity, our method takes advantage of the probabilistic sampling in the intermediate features to generate more diverse images in object replacement or addition tasks. The results of our experiments show that disentangling the feature representations in the latent manifold of the model outperforms the previous works qualitatively and quantitatively on two public benchmarks.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[23]
A. Farshad, A. Makarevich, V. Belagiannis and N. Navab.
MetaMedSeg: Volumetric Meta-learning for Few-Shot Organ Segmentation.
DART @MICCAI 2022 - 4th Workshop on Domain Adaptation and Representation Transfer at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI GitHub
Abstract

The lack of sufficient annotated image data is a common issue in medical image segmentation. For some organs and densities, the annotation may be scarce, leading to poor model training convergence, while other organs have plenty of annotated data. In this work, we present MetaMedSeg, a gradient-based meta-learning algorithm that redefines the meta-learning task for the volumetric medical data with the goal of capturing the variety between the slices. We also explore different weighting schemes for gradients aggregation, arguing that different tasks might have different complexity and hence contribute differently to the initialization. We propose an importance-aware weighting scheme to train our model. In the experiments, we evaluate our method on the medical decathlon dataset by extracting 2D slices from CT and MRI volumes of different organs and performing semantic segmentation. The results show that our proposed volumetric task definition leads to improvements in IoU compared to related baselines. The proposed update rule is also shown to improve the performance for complex scenarios where the data distribution of the target organ is very different from the source organs.
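A minimal sketch of a gradient-based meta-update with importance-aware task weights, in the spirit of the weighting schemes discussed above. It uses a Reptile-style parameter-difference update as a simplification; the inner loop, learning rates, and the assumption that each task is a callable returning a segmentation loss are ours.

```python
# Reptile-style meta-update with per-task weights (illustrative assumption).
import copy
import torch

def meta_step(model, tasks, weights, inner_steps=5, inner_lr=1e-3, meta_lr=1e-2):
    deltas = []
    for task in tasks:
        local = copy.deepcopy(model)
        opt = torch.optim.SGD(local.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            loss = task(local)            # task returns a loss on a sampled batch
            opt.zero_grad(); loss.backward(); opt.step()
        # Parameter change induced by adapting to this task.
        deltas.append([lp.data - p.data
                       for lp, p in zip(local.parameters(), model.parameters())])
    # Importance-weighted aggregation of the per-task updates.
    for i, p in enumerate(model.parameters()):
        p.data += meta_lr * sum(w * d[i] for w, d in zip(weights, deltas))
```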

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[22]
Y. Yeganeh, A. Farshad, J. Boschmann, R. Gaus, M. Frantzen and N. Navab.
FedAP: Adaptive Personalization in Federated Learning for Non-IID Data.
DeCaF FAIR @MICCAI 2022 - 3rd Workshop on Distributed, Collaborative, and Federated Learning, and Affordable AI and Healthcare for Resource Diverse Global Health at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI
Abstract

Federated learning (FL) is a distributed learning method that offers medical institutes the prospect of collaboration in a global model while preserving the privacy of their patients. Although most medical centers conduct similar medical imaging tasks, their differences, such as specializations, number of patients, and devices, lead to distinctive data distributions. Data heterogeneity poses a challenge for FL and the personalization of the local models. In this work, we investigate an adaptive hierarchical clustering method for FL to produce intermediate semi-global models, so clients with similar data distribution have the chance of forming a more specialized model. Our method forms several clusters consisting of clients with the most similar data distributions; then, each cluster continues to train separately. Inside the cluster, we use meta-learning to improve the personalization of the participants’ models. We compare the clustering approach with classical FedAvg and centralized training by evaluating our proposed methods on the HAM10k dataset for skin lesion classification with extreme heterogeneous data distribution. Our experiments demonstrate significant performance gain in heterogeneous distribution compared to standard FL methods in classification accuracy. Moreover, we show that the models converge faster if applied in clusters and outperform centralized training while using only a small subset of data.
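The clustering step can be sketched by hierarchically clustering clients on the similarity of their model updates, a common proxy for data-distribution similarity; the cosine metric and distance threshold below are assumptions, not the paper's exact procedure.

```python
# Sketch: group clients by similarity of their (flattened) model updates,
# then let each cluster continue training its own semi-global model.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
client_updates = rng.normal(size=(10, 256))   # one update vector per client (stand-in)

Z = linkage(client_updates, method="average", metric="cosine")
clusters = fcluster(Z, t=0.5, criterion="distance")
print(clusters)   # cluster id per client; each cluster trains separately
```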

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[21]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Implicit Embeddings via GAN Inversion for High Resolution Chest Radiographs.
MAD @MICCAI 2022 - 1st Workshop on Medical Applications with Disentanglements at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI
Abstract

Generative models allow for the creation of highly realistic artificial samples, opening up promising applications in medical imaging. In this work, we propose a multi-stage encoder-based approach to invert the generator of a generative adversarial network (GAN) for high resolution chest radiographs. This gives direct access to its implicitly formed latent space, makes generative models more accessible to researchers, and makes it possible to apply generative techniques to actual patients' images. We investigate various applications for this embedding, including image compression, disentanglement in the encoded dataset, guided image manipulation, and creation of stylized samples. We find that this type of GAN inversion is a promising research direction in the domain of chest radiograph modeling and opens up new ways to combine realistic X-ray sample synthesis with radiological image analysis.

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[20]
A. Farshad, Y. Yeganeh, P. Gehlbach and N. Navab.
Y-Net: A Spatiospectral Dual-Encoder Network for Medical Image Segmentation.
MICCAI 2022 - 25th International Conference on Medical Image Computing and Computer Assisted Intervention. Singapore, Sep 18-22, 2022. DOI GitHub
Abstract

Automated segmentation of retinal optical coherence tomography (OCT) images has become an important recent direction in machine learning for medical applications. We hypothesize that the anatomic structure of layers and their high-frequency variation in OCT images make retinal OCT a fitting choice for extracting spectral domain features and combining them with spatial domain features. In this work, we present Y-Net, an architecture that combines the frequency domain features with the image domain to improve the segmentation performance of OCT images. The results of this work demonstrate that the introduction of two branches, one for spectral and one for spatial domain features, brings a very significant improvement in fluid segmentation performance, outperforming the well-known U-Net model. Our improvement was 13% on the fluid segmentation Dice score and 1.9% on the average Dice score. Finally, removing selected frequency ranges in the spectral domain demonstrates the impact of these features on fluid segmentation performance.
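A small sketch of the spectral-branch idea: frequency-domain features are computed with a 2D FFT and fused with spatial convolution features. The layer sizes and the real/imaginary channel stacking are illustrative assumptions, not Y-Net's exact design.

```python
# Illustrative spatio-spectral block: fuse spatial features with features
# computed on the FFT of the input.
import torch
import torch.nn as nn

class SpatioSpectralBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.spatial = nn.Conv2d(1, 16, 3, padding=1)
        self.spectral = nn.Conv2d(2, 16, 3, padding=1)  # real + imaginary channels

    def forward(self, x):
        spec = torch.fft.fft2(x)                          # frequency-domain view
        spec = torch.cat([spec.real, spec.imag], dim=1)   # (B, 2, H, W)
        return torch.cat([self.spatial(x), self.spectral(spec)], dim=1)

block = SpatioSpectralBlock()
oct_scan = torch.randn(2, 1, 128, 128)
features = block(oct_scan)                                # (2, 32, 128, 128)
```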

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[19]
P. Engstler, M. Keicher, D. Schinz, K. Mach, A. S. Gersing, S. C. Foreman, S. S. Goller, J. Weissinger, J. Rischewski, A.-S. Dietrich, B. Wiestler, J. S. Kirschke, A. Khakzar and N. Navab.
Interpretable Vertebral Fracture Diagnosis.
iMIMIC @MICCAI 2022 - Workshop on Interpretability of Machine Intelligence in Medical Image Computing at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). Singapore, Sep 18-22, 2022. DOI GitHub
Abstract

Do black-box neural network models learn clinically relevant features for fracture diagnosis? The answer not only establishes reliability and quenches scientific curiosity, but also leads to explainable and verbose findings that can assist radiologists in the final diagnosis and increase trust. This work identifies the concepts networks use for vertebral fracture diagnosis in CT images. This is achieved by associating concepts to neurons highly correlated with a specific diagnosis in the dataset. The concepts are either associated with neurons by radiologists pre-hoc or are visualized during a specific prediction and left for the user’s interpretation. We evaluate which concepts lead to correct diagnosis and which concepts lead to false positives. The proposed frameworks and analysis pave the way for reliable and explainable vertebral fracture diagnosis.

MCML Authors
Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Benedikt Wiestler

Benedikt Wiestler

Prof. Dr.

AI for Image-Guided Diagnosis and Therapy

Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[18]
A. Khakzar, Y. Li, Y. Zhang, M. Sanisoglu, S. T. Kim, M. Rezaei, B. Bischl and N. Navab.
Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models.
IMLH @ICML 2022 - 2nd Workshop on Interpretable Machine Learning in Healthcare at the 39th International Conference on Machine Learning (ICML 2022). Baltimore, MD, USA, Jul 17-23, 2022. arXiv
Abstract

One challenging property lurking in medical datasets is the imbalanced data distribution, where the frequency of the samples between the different classes is not balanced. Training a model on an imbalanced dataset can introduce unique challenges to the learning problem where a model is biased towards the highly frequent class. Many methods are proposed to tackle the distributional differences and the imbalance problem. However, the impact of these approaches on the learned features is not well studied. In this paper, we look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features. We study several popular cost-sensitive approaches for handling data imbalance and analyze the feature maps of the convolutional neural networks from multiple perspectives: analyzing the alignment of salient features with pathologies and analyzing the pathology-related concepts encoded by the networks. Our study reveals differences and insights regarding the trained models that are not reflected by quantitative metrics such as AUROC and AP and show up only when looking at the models through this lens.
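As a concrete example of one cost-sensitive strategy commonly studied in this context, the snippet below uses class-weighted cross-entropy with inverse-frequency weights; the class counts are placeholders.

```python
# Class-weighted cross-entropy: errors on the rare class cost more.
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 100.0])            # class frequencies (assumed)
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 2)
labels = torch.randint(0, 2, (16,))
loss = loss_fn(logits, labels)
```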

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to website

Yawei Li

Statistical Learning and Data Science

Link to website

Mina Rezaei

Dr.

Statistical Learning and Data Science

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[17]
Y. Yeganeh, A. Farshad and N. Navab.
Shape-Aware Masking for Inpainting in Medical Imaging.
Preprint (Jul. 2022). arXiv
Abstract

Inpainting has recently been proposed as a successful deep learning technique for unsupervised medical image model discovery. The masks used for inpainting are generally independent of the dataset and are not tailored to perform on different given classes of anatomy. In this work, we introduce a method for generating shape-aware masks for inpainting, which aims at learning the statistical shape prior. We hypothesize that although the variation of masks improves the generalizability of inpainting models, the shape of the masks should follow the topology of the organs of interest. Hence, we propose an unsupervised guided masking approach based on an off-the-shelf inpainting model and a superpixel over-segmentation algorithm to generate a wide range of shape-dependent masks. Experimental results on abdominal MR image reconstruction show the superiority of our proposed masking method over standard methods using square-shaped masks or datasets of irregular-shape masks.

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[16]
A. Khakzar, P. Khorsandi, R. Nobahari and N. Navab.
Do Explanations Explain? Model Knows Best.
CVPR 2022 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA, Jun 19-24, 2022. DOI GitHub
Abstract

It is a mystery which input features contribute to a neural network’s output. Various explanation (feature attribution) methods are proposed in the literature to shed light on the problem. One peculiar observation is that these explanations (attributions) point to different features as being important. The phenomenon raises the question, which explanation to trust? We propose a framework for evaluating the explanations using the neural network model itself. The framework leverages the network to generate input features that impose a particular behavior on the output. Using the generated features, we devise controlled experimental setups to evaluate whether an explanation method conforms to an axiom. Thus we propose an empirical framework for axiomatic evaluation of explanation methods. We evaluate well-known and promising explanation solutions using the proposed framework. The framework provides a toolset to reveal properties and drawbacks within existing and future explanation solutions.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[15]
M. Keicher, K. Zaripova, T. Czempiel, K. Mach, A. Khakzar and N. Navab.
Few-shot Structured Radiology Report Generation Using Natural Language Prompts.
Preprint (Mar. 2022). arXiv
Abstract

The automation of chest X-ray reporting has garnered significant interest due to the time-consuming nature of the task. However, the clinical accuracy of free-text reports has proven challenging to quantify using natural language processing metrics, given the complexity of medical information, the variety of writing styles, and the potential for typos and inconsistencies. Structured reporting and standardized reports, on the other hand, can provide consistency and formalize the evaluation of clinical correctness. However, high-quality annotations for structured reporting are scarce. Therefore, we propose a method to predict clinical findings defined by sentences in structured reporting templates, which can be used to fill such templates. The approach involves training a contrastive language-image model using chest X-rays and related free-text radiological reports, then creating textual prompts for each structured finding and optimizing a classifier to predict clinical findings in the medical image. Results show that even with limited image-level annotations for training, the method can accomplish the structured reporting tasks of severity assessment of cardiomegaly and localizing pathologies in chest X-rays.
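The prompt-based prediction step can be sketched with an off-the-shelf contrastive language-image model: each structured finding is phrased as a textual prompt and scored against the image. The open_clip backbone, the prompts, and the file name below are assumptions for illustration; the paper trains its own chest X-ray model.

```python
# Zero-shot finding classification via text prompts (illustrative stand-in).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

prompts = ["no cardiomegaly", "mild cardiomegaly", "severe cardiomegaly"]
text = tokenizer(prompts)
image = preprocess(Image.open("chest_xray.png")).unsqueeze(0)  # hypothetical file

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```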

MCML Authors
Link to website

Matthias Keicher

Computer Aided Medical Procedures & Augmented Reality

Link to website

Kamilia Zaripova

Computer Aided Medical Procedures & Augmented Reality

Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[14]
W. Simson.
Physics-Informed Deep Learning for Advanced Medical Ultrasound.
Dissertation 2022. DOI
Abstract

Freehand ultrasound imaging is an important medical imaging modality due to its ease of applicability and wide application spectrum. Still, modern ultrasound imaging is a largely passive imaging modality, and does not dynamically adapt to the physics in the medium of interest. This dissertation presents the application of physics-informed deep learning for ultrasound imaging applied to sound speed estimation.

MCML Authors
Walter Simson

Walter Simson

Dr.

* Former Member


[13]
Y. Zhang, A. Khakzar, Y. Li, A. Farshad, S. T. Kim and N. Navab.
Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information.
NeurIPS 2021 - Track on Datasets and Benchmarks at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. URL
Abstract

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network’s prediction. The predictive information of features is recently proposed as a proxy for the measure of their importance. So far, the predictive information is only identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features’ information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to website

Yawei Li

Statistical Learning and Data Science

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[12]
T. Weber, M. Ingrisch, M. Fabritius, B. Bischl and D. Rügamer.
Survival-oriented embeddings for improving accessibility to complex data structures.
NeurIPS 2021 - Workshop on Bridging the Gap: from Machine Learning Research to Clinical Practice at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. arXiv
Abstract

Deep learning excels in the analysis of unstructured data, and recent advancements allow these techniques to be extended to survival analysis. In the context of clinical radiology, this enables, e.g., relating unstructured volumetric images to a risk score or a prognosis of life expectancy and supports clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients usually accept black-box models as the reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[11]
T. Weber, M. Ingrisch, B. Bischl and D. Rügamer.
Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation.
NeurIPS 2021 - Workshop on Deep Generative Models and Downstream Applications at the 35th Conference on Neural Information Processing Systems. Virtual, Dec 06-14, 2021. PDF
Abstract

The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This allows advancing methods in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with a survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that models hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.
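A hedged sketch of the latent hazard walk: gradient ascent (or descent) on a trained hazard head's output with respect to the latent code, followed by decoding the shifted code back to image space. Here hazard_head and decoder stand for trained modules and are assumptions, not the authors' code.

```python
# Illustrative latent walk toward higher (or lower) predicted hazard.
import torch

def hazard_walk(z, hazard_head, decoder, steps=20, lr=0.1, maximize=True):
    z = z.clone().requires_grad_(True)
    sign = 1.0 if maximize else -1.0
    for _ in range(steps):
        h = hazard_head(z).sum()
        (sign * h).backward()
        with torch.no_grad():
            z += lr * z.grad          # move along the hazard gradient
            z.grad.zero_()
    return decoder(z.detach())        # project the change back to image space
```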

MCML Authors
Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[10]
A. Farshad, S. Musatian, H. Dhamo and N. Navab.
MIGS: Meta Image Generation from Scene Graphs.
BMVC 2021 - 32nd British Machine Vision Conference. Virtual, Nov 22-25, 2021. URL GitHub
Abstract

Generation of images from scene graphs is a promising direction towards explicit scene generation and manipulation. However, the images generated from scene graphs lack quality, which partly stems from the high difficulty and diversity of the data. We propose MIGS (Meta Image Generation from Scene Graphs), a meta-learning based approach for few-shot image generation from graphs that enables adapting the model to different scenes and increases the image quality by training on diverse sets of tasks. By sampling the data in a task-driven fashion, we train the generator using meta-learning on different sets of tasks that are categorized based on the scene attributes. Our results show that using this meta-learning approach for the generation of images from scene graphs achieves state-of-the-art performance in terms of image quality and capturing the semantic relationships in the scene.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[9]
S. Garg, H. Dhamo, A. Farshad, S. Musatian, N. Navab and F. Tombari.
Unconditional Scene Graph Generation.
ICCV 2021 - IEEE/CVF International Conference on Computer Vision. Virtual, Oct 11-17, 2021. DOI
Abstract

Despite recent advancements in single-domain or single-object image generation, it is still challenging to generate complex scenes containing diverse, multiple objects and their interactions. Scene graphs, composed of nodes as objects and directed-edges as relationships among objects, offer an alternative representation of a scene that is more semantically grounded than images. We hypothesize that a generative model for scene graphs might be able to learn the underlying semantic structure of real-world scenes more effectively than images, and hence, generate realistic novel scenes in the form of scene graphs. In this work, we explore a new task for the unconditional generation of semantic scene graphs. We develop a deep auto-regressive model called SceneGraphGen which can directly learn the probability distribution over labelled and directed graphs using a hierarchical recurrent architecture. The model takes a seed object as input and generates a scene graph in a sequence of steps, each step generating an object node, followed by a sequence of relationship edges connecting to the previous nodes. We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes. Additionally, we demonstrate the application of the generated graphs in image synthesis, anomaly detection and scene graph completion.

MCML Authors
Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[8]
A. Khakzar, S. Musatian, J. Buchberger, I. V. Quiroz, N. Pinger, S. Baselizadeh, S. T. Kim and N. Navab.
Towards Semantic Interpretation of Thoracic Disease and COVID-19 Diagnosis Models.
MICCAI 2021 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France, Sep 27-Oct 01, 2021. DOI GitHub
Abstract

Convolutional neural networks are showing promise in the automatic diagnosis of thoracic pathologies on chest x-rays. Their black-box nature has sparked many recent works to explain the prediction via input feature attribution methods (aka saliency methods). However, input feature attribution methods merely identify the importance of input regions for the prediction and lack semantic interpretation of model behavior. In this work, we first identify the semantics associated with internal units (feature maps) of the network. We proceed to investigate the following questions; Does a regression model that is only trained with COVID-19 severity scores implicitly learn visual patterns associated with thoracic pathologies? Does a network that is trained on weakly labeled data (e.g. healthy, unhealthy) implicitly learn pathologies? Moreover, we investigate the effect of pretraining and data imbalance on the interpretability of learned features. In addition to the analysis, we propose semantic attribution to semantically explain each prediction. We present our findings using publicly available chest pathologies (CheXpert [5], NIH ChestX-ray8 [25]) and COVID-19 datasets (BrixIA [20], and COVID-19 chest X-ray segmentation dataset [4]).

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[7]
A. Khakzar, Y. Zhang, W. Mansour, Y. Cai, Y. Li, Y. Zhang, S. T. Kim and N. Navab.
Explaining COVID-19 and Thoracic Pathology Model Predictions by Identifying Informative Input Features.
MICCAI 2021 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France, Sep 27-Oct 01, 2021. DOI GitHub
Abstract

Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. In order to establish trust in the clinical routine, the networks’ prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution. Feature attribution methods identify the importance of input features for the output prediction. Building on Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network’s output. Original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-agnostic feature importance metrics on NIH Chest X-ray8 and BrixIA datasets.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to website

Yawei Li

Statistical Learning and Data Science

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[6]
M. P. Fabritius, M. Seidensticker, J. Rueckel, C. Heinze, M. Pech, K. J. Paprottka, P. M. Paprottka, J. Topalis, A. Bender, J. Ricke, A. Mittermeier and M. Ingrisch.
Bi-Centric Independent Validation of Outcome Prediction after Radioembolization of Primary and Secondary Liver Cancer.
Journal of Clinical Medicine 10.16 (Aug. 2021). DOI
Abstract

Background: Yttrium-90 radioembolization (RE) plays an important role in the treatment of liver malignancies. Optimal patient selection is crucial for an effective and safe treatment. In this study, we aim to validate the prognostic performance of a previously established random survival forest (RSF) with an external validation cohort from a different national center. Furthermore, we compare outcome prediction models with different established metrics. Methods: A previously established RSF model, trained on a consecutive cohort of 366 patients who had received RE due to primary or secondary liver tumor at a national center (center 1), was used to predict the outcome of an independent consecutive cohort of 202 patients from a different national center (center 2) and vice versa. Prognostic performance was evaluated using the concordance index (C-index) and the integrated Brier score (IBS). The prognostic importance of designated baseline parameters was measured with the minimal depth concept, and the influence on the predicted outcome was analyzed with accumulated local effects plots. RSF models were compared to conventional Cox proportional hazards (CPH) models in terms of C-index and IBS. Results: The established RSF model achieved a C-index of 0.67 for center 2, comparable to the results obtained for center 1, which it was trained on (0.66). The RSF model trained on center 2 achieved a C-index of 0.68 on center 2 data and 0.66 on center 1 data. CPH models showed comparable results on both cohorts, with C-index ranging from 0.68 to 0.72. IBS validation showed more differentiated results depending on which cohort was trained on and which cohort was predicted (range: 0.08 to 0.20). Baseline cholinesterase was the most important variable for survival prediction. Conclusion: The previously developed predictive RSF model was successfully validated with an independent external cohort. C-index and IBS are suitable metrics to compare outcome prediction models, with IBS showing more differentiated results. The findings corroborate that survival after RE is critically determined by functional hepatic reserve and thus baseline liver function should play a key role in patient selection.
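For reference, Harrell's concordance index used throughout this comparison can be computed directly: over all comparable patient pairs, it is the fraction that the risk score orders correctly (ties counted as half). A small self-contained version:

```python
# Harrell's C-index from scratch (for illustration; survival libraries
# such as lifelines or scikit-survival provide tested implementations).
import numpy as np

def c_index(time, event, risk):
    num, den = 0.0, 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                       # pairs are anchored at observed events
        comparable = time > time[i]        # subjects who survived longer than i
        den += comparable.sum()
        num += (risk[i] > risk[comparable]).sum() \
               + 0.5 * (risk[i] == risk[comparable]).sum()
    return num / den

time = np.array([5.0, 8.0, 3.0, 10.0]); event = np.array([1, 0, 1, 1], bool)
risk = np.array([0.9, 0.4, 1.2, 0.2])
print(c_index(time, event, risk))          # 1.0 for a perfectly ranked cohort
```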

MCML Authors
Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to website

Andreas Mittermeier

Dr.

Clinical Data Science in Radiology

Link to Profile Michael Ingrisch

Michael Ingrisch

Prof. Dr.

Clinical Data Science in Radiology


[5]
A. Khakzar, S. Baselizadeh, S. Khanduja, C. Rupprecht, S. T. Kim and N. Navab.
Neural Response Interpretation through the Lens of Critical Pathways.
CVPR 2021 - IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, Jun 19-25, 2021. DOI
Abstract

Is critical input information encoded in specific sparse pathways within the neural network? In this work, we discuss the problem of identifying these critical pathways and subsequently leverage them for interpreting the network’s response to an input. The pruning objective — selecting the smallest group of neurons for which the response remains equivalent to the original network — has been previously proposed for identifying critical pathways. We demonstrate that sparse pathways derived from pruning do not necessarily encode critical input information. To ensure sparse pathways include critical fragments of the encoded input information, we propose pathway selection via neurons’ contribution to the response. We proceed to explain how critical pathways can reveal critical input features. We prove that pathways selected via neuron contribution are locally linear (in an ℓ2-ball), a property that we use for proposing a feature attribution method: ‘pathway gradient’. We validate our interpretation method using mainstream evaluation experiments. The validation of pathway gradient interpretation method further confirms that selected pathways using neuron contributions correspond to critical input features. The code is publicly available.

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[4]
P. Kopper, S. Pölsterl, C. Wachinger, B. Bischl, A. Bender and D. Rügamer.
Semi-Structured Deep Piecewise Exponential Models.
AAAI-SPACA 2021 - AAAI Spring Symposium Series on Survival Prediction: Algorithms, Challenges and Applications. Palo Alto, California, USA, Mar 21-24, 2021. PDF
Abstract

We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data sources and higher-order interaction effects into the model, we embed the model class in a neural network and thereby enable the simultaneous estimation of both inherently interpretable structured regression inputs as well as deep neural network components which can potentially process additional unstructured data sources. A proof of concept is provided by using the framework to predict Alzheimer’s disease progression based on tabular and 3D point cloud data and applying it to synthetic data.

MCML Authors
Link to Profile Christian Wachinger

Christian Wachinger

Prof. Dr.

Artificial Intelligence in Medical Imaging

Link to Profile Bernd Bischl

Bernd Bischl

Prof. Dr.

Statistical Learning and Data Science

Link to website

Andreas Bender

Dr.

Machine Learning Consulting Unit (MLCU)

Link to Profile David Rügamer

David Rügamer

Prof. Dr.

Statistics, Data Science and Machine Learning


[3]
B. Busam.
High Performance Visual Pose Computation.
Dissertation 2021. URL
Abstract

An outside-in system uses binocular stereo and a probabilistic sparse point cloud matcher to track objects with micrometre precision in real time. Miniaturizing the system results in a markerless inside-out stereo method with improved rotational accuracy. Reducing the constraints, we reformulate marker-free monocular pose estimation as an action decision process in which the next best pose is determined using a render-and-compare strategy. This allows instance-agnostic pose estimation that generalizes to unseen objects. The methods are applied to a set of medical and industrial applications.
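
The action-decision formulation can be sketched in a few lines. Everything below is a toy stand-in (the render and similarity functions are placeholders, not a real renderer): perturb the current pose estimate, render each candidate, and keep the candidate most similar to the observation:

```python
import numpy as np

def render(pose):                 # placeholder for a real renderer
    return np.sin(pose).sum()     # toy scalar "image" summary

def similarity(obs, img):
    return -abs(obs - img)        # higher is better

def render_and_compare(obs, pose, steps=50, scale=0.1):
    for _ in range(steps):
        # Actions: small perturbations of the 6D pose (3 rot + 3 trans).
        candidates = pose + scale * np.random.randn(8, 6)
        scores = [similarity(obs, render(c)) for c in candidates]
        best = candidates[int(np.argmax(scores))]
        if similarity(obs, render(best)) > similarity(obs, render(pose)):
            pose = best           # greedy "next best pose" decision
    return pose

target = np.array([0.3, -0.2, 0.1, 0.05, 0.0, 0.4])
print(render_and_compare(render(target), np.zeros(6)))
```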

MCML Authors
Link to website

Benjamin Busam

Dr.

Computer Aided Medical Procedures & Augmented Reality


[2]
S. Denner, A. Khakzar, M. Sajid, M. Saleh, Z. Spiclin, S. T. Kim and N. Navab.
Spatio-temporal learning from longitudinal data for multiple sclerosis lesion segmentation.
BrainLes @MICCAI 2020 - Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). Virtual, Oct 04-08, 2020. DOI GitHub
Abstract

Segmentation of Multiple Sclerosis (MS) lesions in longitudinal brain MR scans is performed for monitoring the progression of MS lesions. We hypothesize that the spatio-temporal cues in longitudinal data can aid the segmentation algorithm. Therefore, we propose a multi-task learning approach that defines an auxiliary self-supervised task of deformable registration between two time-points to guide the neural network toward learning from spatio-temporal changes. We show the efficacy of our method on a clinical dataset comprising 70 patients with one follow-up study for each patient. Our results show that spatio-temporal information in longitudinal data is a beneficial cue for improving segmentation. We improve on the current state of the art by 2.6% in terms of overall score (p < 0.05).
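
A minimal sketch of such a multi-task setup, under my own architectural assumptions (a toy 2D encoder rather than the network used in the paper): a shared encoder feeds a segmentation head plus an auxiliary head that predicts a deformation field, and the registration loss is self-supervised by warping one time-point onto the other:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatioTemporalNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder sees both time-points stacked along the channel axis.
        self.encoder = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(16, 1, 1)    # lesion mask logits
        self.reg_head = nn.Conv2d(16, 2, 1)    # 2D displacement field

    def forward(self, t0, t1):
        feats = self.encoder(torch.cat([t0, t1], dim=1))
        return self.seg_head(feats), self.reg_head(feats)

def warp(img, flow):
    """Warp img with a dense displacement field via grid_sample."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).repeat(n, 1, 1, 1)
    return F.grid_sample(img, grid + flow.permute(0, 2, 3, 1),
                         align_corners=True)

net = SpatioTemporalNet()
t0, t1 = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
mask = torch.randint(0, 2, (1, 1, 64, 64)).float()
logits, flow = net(t0, t1)
# Supervised segmentation loss + self-supervised registration loss.
loss = F.binary_cross_entropy_with_logits(logits, mask) \
     + F.mse_loss(warp(t0, flow), t1)
loss.backward()
```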

MCML Authors
Ashkan Khakzar

Ashkan Khakzar

Dr.

* Former Member

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


[1]
Y. Yeganeh, A. Farshad, N. Navab and S. Albarqouni.
Inverse Distance Aggregation for Federated Learning with Non-IID Data.
DART DCL @MICCAI 2020 - Workshop on Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). Virtual, Oct 04-08, 2020. DOI
Abstract

Federated learning (FL) has been a promising approach in the field of medical imaging in recent years. A critical problem in FL, specifically in medical scenarios, is to obtain an accurate shared model that is robust to noisy and out-of-distribution clients. In this work, we tackle the problem of statistical heterogeneity in FL data, which is highly plausible in medical settings where, for example, the data comes from different sites with different scanner settings. We propose IDA (Inverse Distance Aggregation), a novel adaptive weighting approach for clients based on meta-information, which handles unbalanced and non-IID data. We extensively analyze and evaluate our method against the well-known FL approach Federated Averaging as a baseline.
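
The aggregation rule itself is simple to sketch. The following toy example (my reading of the abstract, not the authors' released implementation) weights each client inversely to the distance of its parameters from the mean model, so outlying clients are down-weighted relative to plain federated averaging:

```python
import numpy as np

def ida_aggregate(client_weights, eps=1e-8):
    """client_weights: list of flattened parameter vectors, one per client."""
    stacked = np.stack(client_weights)
    mean = stacked.mean(axis=0)
    # Inverse distance to the mean model as the aggregation coefficient.
    inv_dist = 1.0 / (np.linalg.norm(stacked - mean, axis=1) + eps)
    coeffs = inv_dist / inv_dist.sum()
    return (coeffs[:, None] * stacked).sum(axis=0), coeffs

clients = [np.random.randn(10) for _ in range(4)]
clients.append(np.random.randn(10) * 10)        # a noisy, outlying client
global_model, coeffs = ida_aggregate(clients)
print(coeffs)  # the outlier receives the smallest coefficient
```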

MCML Authors
Link to website

Yousef Yeganeh

Computer Aided Medical Procedures & Augmented Reality

Link to website

Azade Farshad

Dr.

Computer Aided Medical Procedures & Augmented Reality

Link to Profile Nassir Navab

Nassir Navab

Prof. Dr.

Computer Aided Medical Procedures & Augmented Reality


C2 | Biology

MCML focuses on crucial issues in Biology and Biomedicine, addressing AI challenges such as liability, black-box behavior, and privacy. The goals include advancing personalized healthcare and fostering collaboration between algorithms and human experts. Additionally, MCML aims to be a key training hub for the next generation of AI-empowered professionals in medical and biological fields.

Link to Profile Julien Gagneur

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine

Link to Profile Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

Link to Profile Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Link to Profile Ralf Zimmer

Ralf Zimmer

Prof. Dr.

Bioinformatics

Publication in Research Area C2
[23]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning with the Gromov-Monge Gap.
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. To be published. Preprint available. arXiv
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.
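
To give a flavor of the geometric term (a deliberately simplified toy, not the Gromov-Monge gap estimator itself, which relies on quadratic optimal transport): penalizing the mismatch between input and latent pairwise distances regularizes the encoder toward geometry-preserving maps:

```python
import torch

def geometry_distortion(x, z):
    """Mean squared mismatch between input and latent pairwise distances."""
    return ((torch.cdist(x, x) - torch.cdist(z, z)) ** 2).mean()

encoder = torch.nn.Linear(8, 2)   # toy encoder
x = torch.randn(32, 8)
z = encoder(x)
# Combined with a prior-matching term, a penalty of this kind encourages
# the encoder to align the latent distribution with the prior while moving
# points with as little distortion of pairwise distances as possible.
loss = geometry_distortion(x, z)
loss.backward()
```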

MCML Authors
Link to website

Luca Eyring

Interpretable and Reliable Machine Learning

Link to website

Karsten Roth

Interpretable and Reliable Machine Learning

Link to Profile Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning


[22]
E. Ailer, C. L. Müller and N. Kilbertus.
Instrumental variable estimation for compositional treatments.
Scientific Reports 15.5158 (Feb. 2025). DOI
Abstract

Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions and warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.
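
The multivariate strategy the abstract advocates can be sketched with a log-ratio transform followed by standard two-stage least squares. All names and data below are synthetic placeholders, not the paper's method or datasets:

```python
import numpy as np

def clr(p, eps=1e-9):
    """Centered log-ratio transform of compositions (rows sum to one)."""
    logp = np.log(p + eps)
    return logp - logp.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n, k = 500, 4
Z = rng.normal(size=(n, 4))                       # instruments
raw = np.exp(Z @ rng.normal(size=(4, k)) + rng.normal(size=(n, k)))
X = raw / raw.sum(axis=1, keepdims=True)          # compositional cause
beta = np.array([1.0, -0.5, 0.2, -0.7])           # sums to zero (CLR space)
y = clr(X) @ beta + rng.normal(size=n)            # outcome

# Stage 1: regress the transformed composition on the instruments.
B1 = np.linalg.lstsq(Z, clr(X), rcond=None)[0]
X_hat = Z @ B1
# Stage 2: regress the outcome on the fitted composition.
beta_hat = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print(beta_hat)  # roughly recovers beta, up to sampling noise
```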

MCML Authors
Link to website

Elisabeth Ailer

Ethics in Systems Design and Machine Learning

Link to Profile Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science


[21]
T. Uscidda, L. Eyring, K. Roth, F. J. Theis, Z. Akata and M. Cuturi.
Disentangled Representation Learning through Geometry Preservation with the Gromov-Monge Gap.
SPIGM @ICML 2024 - Workshop on Structured Probabilistic Inference & Generative Modeling at the 41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. arXiv
Abstract

Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general. To address these challenges, we introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We formulate the problem using Gromov-Monge maps that transport one distribution onto another with minimal distortion of predefined geometric features, preserving them as much as can be achieved. To compute such maps, we propose the Gromov-Monge-Gap (GMG), a regularizer quantifying whether a map moves a reference distribution with minimal geometry distortion. We demonstrate the effectiveness of our approach for disentanglement across four standard benchmarks, outperforming other methods leveraging geometric considerations.

MCML Authors
Link to website

Luca Eyring

Interpretable and Reliable Machine Learning

Link to website

Karsten Roth

Interpretable and Reliable Machine Learning

Link to Profile Fabian Theis

Fabian Theis

Prof. Dr.

Mathematical Modelling of Biological Systems

Link to Profile Zeynep Akata

Zeynep Akata

Prof. Dr.

Interpretable and Reliable Machine Learning