heads the Statistical Consulting Unit (StaBLab) at LMU Munich, which is known for providing expert statistical guidance to both academic researchers and industries.
His research interests include statistical modeling, measurement error, and misclassification, with a focus on applying statistical techniques to real-world data, including the analysis of COVID-19 data.
Understanding how assignments of instances to clusters can be attributed to the features can be vital in many applications. However, research to provide such feature attributions has been limited. Clustering algorithms with built-in explanations are scarce. Common algorithm-agnostic approaches involve dimension reduction and subsequent visualization, which transforms the original features used to cluster the data; or training a supervised learning classifier on the found cluster labels, which adds additional and intractable complexity. We present FACT (feature attributions for clustering), an algorithm-agnostic framework that preserves the integrity of the data and does not introduce additional models. As the defining characteristic of FACT, we introduce a set of work stages: sampling, intervention, reassignment, and aggregation. Furthermore, we propose two novel FACT methods: SMART (scoring metric after permutation) measures changes in cluster assignments by custom scoring functions after permuting selected features; IDEA (isolated effect on assignment) indicates local and global changes in cluster assignments after making uniform changes to selected features.
Recurrent events analysis plays an important role in many applications, including the study of chronic diseases or recurrence of infections. Historically, many models for recurrent events have been variants of the Cox model. In this article we introduce and describe the application of the piece-wise exponential Additive Mixed Model (PAMM) for recurrent events analysis and illustrate how PAMMs can be used to flexibly model the dependencies in recurrent events data. Simulations confirm that PAMMs provide unbiased estimates as well as equivalence to the Cox model when proportional hazards are assumed. Applications to recurrence of staphylococcus aureus and malaria in children illustrate the estimation of seasonality, bivariate non-linear effects, multiple timescales and relaxation of the proportional hazards assumption via time-varying effects. The R package pammtools is extended to facilitate estimation and visualization of PAMMs for recurrent events data.
The association between protein intake and the need for mechanical ventilation (MV) is controversial. We aimed to investigate the associations between protein intake and outcomes in ventilated critically ill patients.
Real-time surveillance is a crucial element in the response to infectious disease outbreaks. However, the interpretation of incidence data is often hampered by delays occurring at various stages of data gathering and reporting. As a result, recent values are biased downward, which obscures current trends. Statistical nowcasting techniques can be employed to correct these biases, allowing for accurate characterization of recent developments and thus enhancing situational awareness. In this paper, we present a preregistered real-time assessment of eight nowcasting approaches, applied by independent research teams to German 7-day hospitalization incidences during the COVID-19 pandemic. This indicator played an important role in the management of the outbreak in Germany and was linked to levels of non-pharmaceutical interventions via certain thresholds. Due to its definition, in which hospitalization counts are aggregated by the date of case report rather than admission, German hospitalization incidences are particularly affected by delays and can take several weeks or months to fully stabilize. For this study, all methods were applied from 22 November 2021 to 29 April 2022, with probabilistic nowcasts produced each day for the current and 28 preceding days. Nowcasts at the national, state, and age-group levels were collected in the form of quantiles in a public repository and displayed in a dashboard. Moreover, a mean and a median ensemble nowcast were generated. We find that overall, the compared methods were able to remove a large part of the biases introduced by delays. Most participating teams underestimated the importance of very long delays, though, resulting in nowcasts with a slight downward bias. The accompanying prediction intervals were also too narrow for almost all methods. Averaged over all nowcast horizons, the best performance was achieved by a model using case incidences as a covariate and taking into account longer delays than the other approaches. For the most recent days, which are often considered the most relevant in practice, a mean ensemble of the submitted nowcasts performed best. We conclude by providing some lessons learned on the definition of nowcasting targets and practical challenges.
Maximilian Weigert
* Former member
Antibody studies analyze immune responses to SARS-CoV-2 vaccination and infection, which is crucial for selecting vaccination strategies. In the KoCo-Impf study, conducted between 16 June and 16 December 2021, 6088 participants aged 18 and above from Munich were recruited to monitor antibodies, particularly in healthcare workers (HCWs) at higher risk of infection. Roche Elecsys® Anti-SARS-CoV-2 assays on dried blood spots were used to detect prior infections (anti-Nucleocapsid antibodies) and to indicate combinations of vaccinations/infections (anti-Spike antibodies). The anti-Spike seroprevalence was 94.7%, whereas, for anti-Nucleocapsid, it was only 6.9%. HCW status and contact with SARS-CoV-2-positive individuals were identified as infection risk factors, while vaccination and current smoking were associated with reduced risk. Older age correlated with higher anti-Nucleocapsid antibody levels, while vaccination and current smoking decreased the response. Vaccination alone or combined with infection led to higher anti-Spike antibody levels. Increasing time since the second vaccination, advancing age, and current smoking reduced the anti-Spike response. The cumulative number of cases in Munich affected the anti-Spike response over time but had no impact on anti-Nucleocapsid antibody development/seropositivity. Due to the significantly higher infection risk faced by HCWs and the limited number of significant risk factors, it is suggested that all HCWs require protection regardless of individual traits.
Maximilian Weigert
* Former member
As early as March 2020, the authors of this letter started to work on surveillance data to obtain a clearer picture of the pandemic’s dynamic. This letter outlines the lessons learned during this peculiar time, emphasizing the benefits that better data collection, management, and communication processes would bring to the table. We further want to promote nuanced data analyses as a vital element of general political discussion as opposed to drawing conclusions from raw data, which are often flawed in epidemiological surveillance data, and therefore underline the overall need for statistics to play a more central role in public discourse.
Maximilian Weigert
* Former member
Despite huge advances in local and systemic therapies, the 5-year relative survival rate for patients with metastatic CRC is still low. To avoid over- or undertreatment, proper risk stratification with regard to treatment strategy is highly needed. As EMT (epithelial-mesenchymal transition) is a major step in metastatic spread, this study analysed the prognostic effect of EMT-related genes in stage IV colorectal cancer patients using the study cohort of the FIRE-3 trial, an open-label multi-centre randomised controlled phase III trial of stage IV colorectal cancer patients. Overall, the prognostic relevance of EMT-related genes seems stage-dependent. EMT-related genes have no prognostic relevance in stage IV CRC as opposed to stage II/III.
Motivation: In this article, we consider how to evaluate survival distribution predictions with measures of discrimination. This is non-trivial as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages.
Results: Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons or ‘C-hacking’. We demonstrate by example how simple it can be to manipulate results and use this to argue for better reporting guidelines and transparency in the literature. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation.
Over the course of the COVID-19 pandemic, Generalized Additive Models (GAMs) have been successfully employed on numerous occasions to obtain vital data-driven insights. In this article we further substantiate the success story of GAMs, demonstrating their flexibility by focusing on three relevant pandemic-related issues. First, we examine the interdepency among infections in different age groups, concentrating on school children. In this context, we derive the setting under which parameter estimates are independent of the (unknown) case-detection ratio, which plays an important role in COVID-19 surveillance data. Second, we model the incidence of hospitalizations, for which data is only available with a temporal delay. We illustrate how correcting for this reporting delay through a nowcasting procedure can be naturally incorporated into the GAM framework as an offset term. Third, we propose a multinomial model for the weekly occupancy of intensive care units (ICU), where we distinguish between the number of COVID-19 patients, other patients and vacant beds. With these three examples, we aim to showcase the practical and ‘off-the-shelf’ applicability of GAMs to gain new insights from real-world data.
Maximilian Weigert
* Former member
High- and low pressure systems of the large-scale atmospheric circulation in the mid-latitudes drive European weather and climate. Potential future changes in the occurrence of circulation types are highly relevant for society. Classifying the highly dynamic atmospheric circulation into discrete classes of circulation types helps to categorize the linkages between atmospheric forcing and surface conditions (e.g. extreme events). Previous studies have revealed a high internal variability of projected changes of circulation types. Dealing with this high internal variability requires the employment of a single-model initial-condition large ensemble (SMILE) and an automated classification method, which can be applied to large climate data sets. One of the most established classifications in Europe are the 29 subjective circulation types called Grosswetterlagen by Hess & Brezowsky (HB circulation types). We developed, in the first analysis of its kind, an automated version of this subjective classification using deep learning. Our classifier reaches an overall accuracy of 41.1% on the test sets of nested cross-validation. It outperforms the state-of-the-art automatization of the HB circulation types in 20 of the 29 classes. We apply the deep learning classifier to the SMHI-LENS, a SMILE of the Coupled Model Intercomparison Project phase 6, composed of 50 members of the EC-Earth3 model under the SSP37.0 scenario. For the analysis of future frequency changes of the 29 circulation types, we use the signal-to-noise ratio to discriminate the climate change signal from the noise of internal variability. Using a 5%-significance level, we find significant frequency changes in 69% of the circulation types when comparing the future (2071–2100) to a reference period (1991–2020).
Maximilian Weigert
* Former member
Age-Period-Cohort (APC) analysis aims to determine relevant drivers for long-term developments and is used in many fields of science (Yang & Land, 2013). The R package APCtools offers modern visualization techniques and general routines to facilitate the interpretability of the interdependent temporal structures and to simplify the workflow of an APC analysis. Separation of the temporal effects is performed utilizing a semiparametric regression approach. We shortly discuss the challenges of APC analysis, give an overview of existing statistical software packages and outline the main functionalities of the package.
Alexander Bauer
* Former member
Maximilian Weigert
* Former member
This dissertation develops new approaches for robustly estimating functional data structures and analyzing age-period-cohort (APC) effects, with applications in seismology and tourism science. The first part introduces a method that separates amplitude and phase variation in functional data, adapting a likelihood-based registration approach for generalized and incomplete data, demonstrated on seismic data. The second part presents generalized functional additive models (GFAMs) for analyzing associations between functional data and scalar covariates, along with practical guidelines and an R package. The final part addresses APC analysis, proposing new visualization techniques and a semiparametric estimation approach to disentangle temporal dimensions, with applications to tourism data, and is supported by the APCtools R package. (Shortened.)
Alexander Bauer
* Former member
As the COVID-19 pandemic continues to threaten various regions around the world, obtaining accurate and reliable COVID-19 data is crucial for governments and local communities aiming at rigorously assessing the extent and magnitude of the virus spread and deploying efficient interventions. Using data reported between January and February 2020 in China, we compared counts of COVID-19 from near-real-time spatially disaggregated data (city level) with fine-spatial scale predictions from a Bayesian downscaling regression model applied to a reference province-level data set. The results highlight discrepancies in the counts of coronavirus-infected cases at the district level and identify districts that may require further investigation.
Accounting for phase variability is a critical challenge in functional data analysis. To separate it from amplitude variation, functional data are registered, i.e., their observed domains are deformed elastically so that the resulting functions are aligned with template functions. At present, most available registration approaches are limited to datasets of complete and densely measured curves with Gaussian noise. However, many real-world functional data sets are not Gaussian and contain incomplete curves, in which the underlying process is not recorded over its entire domain. In this work, we extend and refine a framework for joint likelihood-based registration and latent Gaussian process-based generalized functional principal component analysis that is able to handle incomplete curves. Our approach is accompanied by sophisticated open-source software, allowing for its application in diverse non-Gaussian data settings and a public code repository to reproduce all results. We register data from a seismological application comprising spatially indexed, incomplete ground velocity time series with a highly volatile Gamma structure. We describe, implement and evaluate the approach for such incomplete non-Gaussian functional data and compare it to existing routines.
Alexander Bauer
* Former member
Several thousand people die every year worldwide because of terrorist attacks perpetrated by non-state actors. In this context, reliable and accurate short-term predictions of non-state terrorism at the local level are key for policy makers to target preventative measures. Using only publicly available data, we show that predictive models that include structural and procedural predictors can accurately predict the occurrence of non-state terrorism locally and a week ahead in regions affected by a relatively high prevalence of terrorism. In these regions, theoretically informed models systematically outperform models using predictors built on past terrorist events only. We further identify and interpret the local effects of major global and regional terrorism drivers. Our study demonstrates the potential of theoretically informed models to predict and explain complex forms of political violence at policy-relevant scales.
This study investigates how age, period, and birth cohorts are related to altering travel distances. We analyze a repeated cross-sectional survey of German pleasure travels for the period 1971–2018 using a holistic age–period–cohort (APC) analysis framework. Changes in travel distances are attributed to the life cycle (age effect), macro-level developments (period effect), and generational membership (cohort effect). We introduce ridgeline matrices and partial APC plots as innovative visualization techniques facilitating the intuitive interpretation of complex temporal structures. Generalized additive models are used to circumvent the identification problem by fitting a bivariate tensor product spline between age and period. The results indicate that participation in short-haul trips is mainly associated with age, while participation in long-distance travel predominantly changed over the period. Generational membership shows less association with destination choice concerning travel distance. The presented APC approach is promising to address further questions of interest in tourism research.
Maximilian Weigert
* Former member
Alexander Bauer
* Former member
©all images: LMU | TUM