
Research Group Christian Müller


Prof. Dr. Christian Müller
Principal Investigator
Biomedical Statistics and Data Science

Christian Müller is head of the Workgroup for Biomedical Statistics and Data Science at LMU Munich.

His group focuses on developing and applying computational statistics and data science methods for the analysis of biological systems and is involved in multiple projects, ranging from the study of microbial communities to the dissection of epigenetic datasets.

Team members @MCML

Stefanie Peschel (Biomedical Statistics and Data Science)
Viet Tran (Biomedical Statistics and Data Science)

Publications @MCML

2023


[4]
C. Kolb, B. Bischl, C. L. Müller and D. Rügamer.
Sparse Modality Regression.
IWSM 2023 - 37th International Workshop on Statistical Modelling. Dortmund, Germany, Jul 17-21, 2023. Best Paper Award.
Abstract

Deep neural networks (DNNs) enable learning from various data modalities, such as images or text. This concept has also found its way into statistical modelling through the use of semi-structured regression, a model additively combining structured predictors with unstructured effects from arbitrary data modalities learned through a DNN. This paper introduces a new framework called sparse modality regression (SMR). SMR is a regression model combining different data modalities and uses a group lasso-type regularization approach to perform modality selection by zeroing out potentially uninformative modalities.
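The modality-selection mechanism can be illustrated with a small sketch (not the authors' implementation; the data, block sizes, and plain linear predictors are assumptions made for illustration): a linear model is fitted over two feature blocks standing in for two modalities, and a group-lasso proximal step is applied per block, so an uninformative modality is zeroed out entirely. In the paper itself, the unstructured modality effects are learned through a DNN rather than linear predictors.

```python
import numpy as np

def group_soft_threshold(w, lam):
    """Proximal operator of the group-lasso penalty lam * ||w||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(w)
    return np.zeros_like(w) if norm <= lam else (1.0 - lam / norm) * w

def sparse_modality_regression(modalities, y, lam=0.1, lr=0.01, n_iter=2000):
    """Linear model over concatenated modality blocks, fitted by proximal gradient
    descent with one group-lasso penalty per modality block. Blocks belonging to
    uninformative modalities are zeroed out entirely."""
    X = np.hstack(modalities)
    sizes = [m.shape[1] for m in modalities]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w - lr * X.T @ (X @ w - y) / len(y)   # gradient step on the smooth squared-error loss
        start = 0
        for s in sizes:                            # blockwise proximal step per modality
            w[start:start + s] = group_soft_threshold(w[start:start + s], lr * lam * np.sqrt(s))
            start += s
    return np.split(w, np.cumsum(sizes)[:-1])

# toy example: the second "modality" is pure noise and should be zeroed out
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(200, 5)), rng.normal(size=(200, 8))
y = X1 @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
w1, w2 = sparse_modality_regression([X1, X2], y, lam=0.5)
print(np.round(w1, 2), np.round(w2, 2))
```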

MCML Authors
Chris Kolb (Statistical Learning & Data Science)
Prof. Dr. Bernd Bischl (Statistical Learning & Data Science)
Prof. Dr. Christian Müller (Biomedical Statistics and Data Science)
Prof. Dr. David Rügamer (Data Science Group)


[3]
C. Kolb, C. L. Müller, B. Bischl and D. Rügamer.
Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization.
Under Review (Jul. 2023). Preprint available on arXiv.
Abstract

We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning. The proposed optimization transfer comprises an overparameterization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. We prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions. Additionally, our theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. We comprehensively review sparsity-inducing parametrizations across different fields that are covered by our general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of our approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.
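The core optimization transfer can be illustrated with a toy sketch (synthetic data, least-squares loss, plain gradient descent; not the paper's code): overparametrizing beta as a Hadamard product u * v and penalizing the smooth surrogate (lam/2)(||u||^2 + ||v||^2) induces the non-smooth penalty lam * ||beta||_1 on the base parametrization, so vanilla gradient descent yields sparse coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.05 * rng.normal(size=100)

lam, lr = 0.1, 0.01
u = np.full(20, 0.5)    # overparametrize beta = u * v (Hadamard product);
v = np.zeros(20)        # asymmetric init so each coordinate of beta can take either sign

for _ in range(20000):
    g = X.T @ (X @ (u * v) - y) / len(y)   # gradient of the smooth loss w.r.t. beta
    # smooth surrogate penalty (lam/2)(||u||^2 + ||v||^2) induces lam * ||beta||_1 on beta = u * v
    gu, gv = g * v + lam * u, g * u + lam * v
    u, v = u - lr * gu, v - lr * gv

print(np.round(u * v, 3))   # near-zero except on the three informative coordinates
```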

MCML Authors
Chris Kolb (Statistical Learning & Data Science)
Prof. Dr. Christian Müller (Biomedical Statistics and Data Science)
Prof. Dr. Bernd Bischl (Statistical Learning & Data Science)
Prof. Dr. David Rügamer (Data Science Group)


[2]
T. Ullmann, S. Peschel, P. Finger, C. L. Müller and A.-L. Boulesteix.
Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering.
PLOS Computational Biology 19.1 (Jan. 2023).
Abstract

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
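The study design can be sketched as follows (a hypothetical toy version using synthetic data, the silhouette score as evaluation criterion, and k-means/Ward clustering as candidate methods; the actual study used American Gut Project data and microbiome-specific method combinations): the best-scoring method combination is chosen on a discovery set, re-evaluated on a held-out validation set, and the average gap between the two scores quantifies the over-optimism effect.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))        # stand-in for preprocessed abundance data

# hypothetical "method combinations" the researcher might try
candidates = {
    f"{name}-k{k}": cls(n_clusters=k)
    for k in (2, 3, 4, 5)
    for name, cls in (("kmeans", KMeans), ("ward", AgglomerativeClustering))
}

gaps = []
for seed in range(10):                # repeated random discovery/validation splits
    disc, val = train_test_split(X, test_size=0.5, random_state=seed)
    scores = {name: silhouette_score(disc, m.fit_predict(disc))
              for name, m in candidates.items()}
    best = max(scores, key=scores.get)                       # result the researcher would report
    val_score = silhouette_score(val, candidates[best].fit_predict(val))
    gaps.append(scores[best] - val_score)                    # over-optimism on this split

print(f"mean over-optimism gap: {np.mean(gaps):.3f}")
```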

MCML Authors
Dr. Theresa Ullmann (Biometry in Molecular Medicine)
Stefanie Peschel (Biomedical Statistics and Data Science)
Prof. Dr. Christian Müller (Biomedical Statistics and Data Science)
Prof. Dr. Anne-Laure Boulesteix (Biometry in Molecular Medicine)


2022


[1]
D. Rügamer, A. Bender, S. Wiegrebe, D. Racek, B. Bischl, C. L. Müller and C. Stachl.
Factorized Structured Regression for Large-Scale Varying Coefficient Models.
ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Grenoble, France, Sep 19-23, 2022.
Abstract

Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structured regression models, including time-aware varying coefficients models, are, however, limited in their applicability to categorical effects and inclusion of a large number of interactions. Here, we propose Factorized Structured Regression (FaStR) for scalable varying coefficient models. FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation. This fusion provides a scalable framework for the estimation of statistical models in previously infeasible data settings. Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques, while scaling notably better and also being competitive with other time-aware RS in terms of prediction performance. We illustrate FaStR’s performance and interpretability on a large-scale behavioral study with smartphone user data.
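The factorization idea can be sketched with a hypothetical minimal model (not the FaStR implementation): the varying coefficient beta(user, time) is represented as an inner product of low-rank user and time embeddings inside a small neural network, so the number of coefficient parameters grows additively in the numbers of users and time points rather than multiplicatively.

```python
import torch

torch.manual_seed(0)
n_users, n_times, rank = 200, 50, 4

class FactorizedVaryingCoef(torch.nn.Module):
    """Varying-coefficient model y = intercept + beta(user, time) * x,
    with beta(user, time) factorized as <user_embedding, time_embedding>."""
    def __init__(self):
        super().__init__()
        self.user_emb = torch.nn.Embedding(n_users, rank)
        self.time_emb = torch.nn.Embedding(n_times, rank)
        self.intercept = torch.nn.Parameter(torch.zeros(1))

    def forward(self, user, time, x):
        beta = (self.user_emb(user) * self.time_emb(time)).sum(-1)  # factorized coefficient
        return self.intercept + beta * x

# toy data: the coefficient of x drifts smoothly over time
user = torch.randint(0, n_users, (4096,))
time = torch.randint(0, n_times, (4096,))
x = torch.randn(4096)
y = 0.5 + (0.3 + 0.2 * torch.sin(time.float() / 8)) * x + 0.1 * torch.randn(4096)

model = FactorizedVaryingCoef()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(user, time, x), y)
    loss.backward()
    opt.step()
print(f"final training MSE: {loss.item():.4f}")
```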

MCML Authors
Prof. Dr. David Rügamer (Data Science Group)
Dr. Andreas Bender (Machine Learning Consulting Unit (MLCU))
Prof. Dr. Bernd Bischl (Statistical Learning & Data Science)
Prof. Dr. Christian Müller (Biomedical Statistics and Data Science)