
Research Group Christian Müller

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

C2 | Biology

Christian Müller is head of the Workgroup for Biomedical Statistics and Data Science at LMU Munich.

His group focuses on developing and applying computational statistics and data science methods for the analysis of biological systems. It is involved in multiple projects, ranging from the study of microbial communities to the dissection of epigenetic datasets.

Team members @MCML

Stefanie Peschel

Biomedical Statistics and Data Science

C2 | Biology

Viet Tran

Biomedical Statistics and Data Science

C2 | Biology

Publications @MCML

[4]
C. Kolb, B. Bischl, C. L. Müller and D. Rügamer.
Sparse Modality Regression.
37th International Workshop on Statistical Modelling (IWSM 2023). Dortmund, Germany, Jul 17-21, 2023. Best Paper Award. PDF.
MCML Authors

Chris Kolb

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

C2 | Biology

David Rügamer

Prof. Dr.

Data Science Group

A1 | Statistical Foundations & Explainability


[3]
C. Kolb, C. L. Müller, B. Bischl and D. Rügamer.
Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization.
Under Review (Jul. 2023). arXiv.
MCML Authors

Chris Kolb

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

C2 | Biology

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

David Rügamer

Prof. Dr.

Data Science Group

A1 | Statistical Foundations & Explainability


[2]
T. Ullmann, S. Peschel, P. Finger, C. L. Müller and A.-L. Boulesteix.
Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering.
PLOS Computational Biology 19.1 (Jan. 2023). DOI.
Abstract

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. 
In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
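
The selection-then-validation procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes pure-noise data in place of the American Gut Project dataset, treats each pair of features as a hypothetical "method combination", and uses the absolute Pearson correlation of that pair as a stand-in evaluation criterion. The names `pearson`, `criterion`, and `mean_over_optimism` are illustrative only.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def criterion(rows, pair):
    """Hypothetical evaluation criterion: |correlation| of one feature pair."""
    i, j = pair
    return abs(pearson([r[i] for r in rows], [r[j] for r in rows]))


def mean_over_optimism(rows, n_splits=20, seed=0):
    """Average (discovery score - validation score) of the 'best' candidate."""
    rng = random.Random(seed)
    candidates = list(combinations(range(len(rows[0])), 2))
    gaps = []
    for _ in range(n_splits):
        # Random split into discovery and validation halves.
        shuffled = rows[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        discovery, validation = shuffled[:half], shuffled[half:]
        # Pick the candidate that looks best on the discovery data only.
        best = max(candidates, key=lambda c: criterion(discovery, c))
        # Re-evaluate the same candidate on the held-out validation data.
        gaps.append(criterion(discovery, best) - criterion(validation, best))
    return sum(gaps) / len(gaps)


# Assumed toy data: 100 samples, 10 independent Gaussian features (no real signal).
data_rng = random.Random(42)
noise = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(100)]
gap = mean_over_optimism(noise)
print(f"mean discovery-minus-validation gap: {gap:.3f}")
```

Because the toy data contain no real structure, any positive average gap here comes purely from choosing the best-looking candidate on the discovery half, which is the kind of over-optimism effect the study quantifies.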

MCML Authors

Stefanie Peschel

Biomedical Statistics and Data Science

C2 | Biology

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

C2 | Biology

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[1]
D. Rügamer, A. Bender, S. Wiegrebe, D. Racek, B. Bischl, C. L. Müller and C. Stachl.
Factorized Structured Regression for Large-Scale Varying Coefficient Models.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022). Grenoble, France, Sep 19-22, 2022. DOI.
MCML Authors

David Rügamer

Prof. Dr.

Data Science Group

A1 | Statistical Foundations & Explainability

Andreas Bender

Dr.

Statistical Learning & Data Science

Coordinator Statistical and Machine Learning Consulting

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

C2 | Biology