Research Group Anne-Laure Boulesteix

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix is Professor of Biometry in Molecular Medicine at LMU Munich.

Her working group develops advanced biostatistical methods for prediction modeling and high-dimensional data analysis, with applications in biomedical research, especially to omics data. The group also engages in metascience, examining research practices to improve the reliability of studies and to address issues such as selective reporting and researchers’ degrees of freedom.

Team members @MCML

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Roman Hornung

Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Julian Lange

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Maximilian Mandl

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Hannah Schulz-Kümpel

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Milena Wünsch

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Publications @MCML

[24]
M. Herrmann, F. J. D. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix and B. Bischl.
Position: Why We Must Rethink Empirical Research in Machine Learning.
41st International Conference on Machine Learning (ICML 2024). Vienna, Austria, Jul 21-27, 2024. URL.
Abstract

We warn against a common but incomplete understanding of empirical research in machine learning (ML) that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue most current empirical ML research is fashioned as confirmatory research while it should rather be considered exploratory.

MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Marcel Wever

Dr.

* Former member

A3 | Computational Models

Matthias Feurer

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

David Rügamer

Prof. Dr.

Data Science Group

A1 | Statistical Foundations & Explainability

Eyke Hüllermeier

Prof. Dr.

Artificial Intelligence & Machine Learning

A3 | Computational Models

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability


[23]
M. M. Mandl, S. Hoffmann, S. Bieringer, A. E. Jacob, M. Kraft, S. Lemster and A.-L. Boulesteix.
Raising awareness of uncertain choices in empirical data analysis: A teaching concept towards replicable research practices.
PLOS Computational Biology 20.3 (2024). DOI.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[22]
M. M. Mandl, A. S. Becker-Pennrich, L. C. Hinske, S. Hoffmann and A.-L. Boulesteix.
Addressing researcher degrees of freedom through minP adjustment.
Preprint at arXiv (Jan. 2024). arXiv.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[21]
Z. S. Dunias, B. Van Calster, D. Timmerman, A.-L. Boulesteix and M. van Smeden.
A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study.
Statistics in Medicine (Jan. 2024). DOI.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[20]
M. Wünsch, C. Sauer, P. Callahan, L. C. Hinske and A.-L. Boulesteix.
From RNA sequencing measurements to the final results: a practical guide to navigating the choices and uncertainties of gene set analysis.
Wiley Interdisciplinary Reviews: Computational Statistics 16.1 (Jan. 2024). DOI.
Abstract

Gene set analysis (GSA), a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In the last years, a multitude of methods have been developed for this task. However, clear guidance is lacking: choosing the right method is the first hurdle a researcher is confronted with. No less challenging than overcoming this so-called method uncertainty is the procedure of preprocessing, from knowing which steps are required to selecting a corresponding approach from the plethora of valid options to create the accepted input object (data preprocessing uncertainty), with clear guidance again being scarce. Here, we provide a practical guide through all steps required to conduct GSA, beginning with a concise overview of a selection of established methods, including Gene Set Enrichment Analysis and Database for Annotation, Visualization, and Integrated Discovery (DAVID). We thereby lay a special focus on reviewing and explaining the necessary preprocessing steps for each method under consideration (e.g., the necessity of a transformation of the RNA sequencing data)—an essential aspect that is typically paid only limited attention to in both existing reviews and applications. To raise awareness of the spectrum of uncertainties, our review is accompanied by an extensive overview of the literature on valid approaches for each step and illustrative R code demonstrating the complex analysis pipelines. It ends with a discussion and recommendations to both users and developers to ensure that the results of GSA are, despite the above-mentioned uncertainties, replicable and transparent.

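As a minimal, hedged illustration of one ingredient shared by several of the reviewed GSA approaches (the paper itself provides illustrative R code for complete pipelines), the following Python sketch runs a hypergeometric over-representation test for a single gene set. All gene lists and cut-offs are invented placeholders, not material from the paper.

# Hypothetical over-representation analysis (ORA) for one gene set via a
# hypergeometric test; inputs are placeholders, not data from the paper.
from scipy.stats import hypergeom

background = {f"gene{i}" for i in range(1, 20001)}   # all measured genes
de_genes = {f"gene{i}" for i in range(1, 301)}       # differentially expressed genes
gene_set = {f"gene{i}" for i in range(250, 400)}     # one annotated gene set

N = len(background)             # population size
K = len(gene_set & background)  # gene-set members among measured genes
n = len(de_genes)               # number of DE genes drawn
k = len(de_genes & gene_set)    # DE genes falling into the gene set

# P(X >= k) under the hypergeometric null hypothesis of no enrichment
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"overlap: {k} genes, enrichment p-value: {p_value:.3g}")
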
MCML Authors
Milena Wünsch

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Christina Sauer (née Nießl)

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[19]
R. Hornung, M. Nalenz, L. Schneider, A. Bender, L. Bothmann, B. Bischl, T. Augustin and A.-L. Boulesteix.
Evaluating machine learning models in non-standard settings: An overview and new findings.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Roman Hornung

Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Lennart Schneider

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Andreas Bender

Dr.

Statistical Learning & Data Science

Coordinator Statistical and Machine Learning Consulting

A1 | Statistical Foundations & Explainability

Ludwig Bothmann

Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[18]
S. Hoffmann, F. Scheipl and A.-L. Boulesteix.
Reproduzierbare und replizierbare Forschung [Reproducible and Replicable Research].
Moderne Verfahren der Angewandten Statistik (Sep. 2023). URL.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[17]
I. van Mechelen, A.-L. Boulesteix, R. Dangl, N. Dean, C. Hennig, F. Leisch, D. Steinley and M. J. Warrens.
A white paper on good research practices in benchmarking: The case of cluster analysis.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13.6 (Jul. 2023). DOI.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[16]
R. Hornung, F. Ludwigs, J. Hagenberg and A.-L. Boulesteix.
Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study.
Wiley Interdisciplinary Reviews: Computational Statistics 16 (Jun. 2023). DOI.
MCML Authors
Roman Hornung

Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[15]
C. Nießl, S. Hoffmann, T. Ullmann and A.-L. Boulesteix.
Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment.
Biometrical Journal (Mar. 2023). DOI.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[14]
B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A.-L. Boulesteix, D. Deng and M. Lindauer.
Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13.2 (Mar. 2023). DOI.
Abstract

Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.

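The following Python sketch is a generic, simplified illustration of the setting reviewed above: random search over a small hyperparameter space with resampling-based (cross-validated) error estimation. The model, search space, and data are arbitrary placeholders, not taken from the paper.

# Minimal random-search HPO loop with 5-fold cross-validation as the
# resampling-based performance estimate; everything here is a placeholder.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

best_score, best_config = -np.inf, None
for _ in range(20):  # HPO budget: 20 random configurations
    config = {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [1e-3, 1]
        "max_depth": int(rng.integers(1, 6)),
        "n_estimators": int(rng.integers(50, 301)),
    }
    score = cross_val_score(GradientBoostingClassifier(**config), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_config = score, config

print("best configuration:", best_config, "CV accuracy:", round(best_score, 3))
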
MCML Authors
Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Martin Binder

Statistical Learning & Data Science

Coordinator for Open Source & Open Data

A1 | Statistical Foundations & Explainability

Michel Lang

Dr.

* Former member

A1 | Statistical Foundations & Explainability

Tobias Pielok

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Stefan Coors

* Former member

A1 | Statistical Foundations & Explainability

Marc Becker

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[13]
T. Ullmann, S. Peschel, P. Finger, C. L. Müller and A.-L. Boulesteix.
Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering.
PLOS Computational Biology 19.1 (Jan. 2023). DOI.
Abstract

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.

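To make the discovery/validation scheme concrete, here is a deliberately simplified Python sketch: several clustering pipelines are tried on discovery data, the one with the best internal criterion is selected, and the selected pipeline is then re-evaluated on validation data. Methods, criterion, and data are generic stand-ins, not the microbiome analyses of the paper.

# Simplified illustration of quantifying over-optimism with repeated
# discovery/validation splits; all choices below are placeholders.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=2.5, random_state=1)
gaps = []
for seed in range(10):  # repeated random splits into discovery/validation
    X_disc, X_val = train_test_split(X, test_size=0.5, random_state=seed)
    candidates = [KMeans(n_clusters=k, n_init=10, random_state=0) for k in (2, 3, 4, 5)]
    candidates += [AgglomerativeClustering(n_clusters=k) for k in (2, 3, 4, 5)]
    # select the "best" pipeline on the discovery set according to the silhouette
    best = max(candidates, key=lambda m: silhouette_score(X_disc, m.fit_predict(X_disc)))
    disc_score = silhouette_score(X_disc, best.fit_predict(X_disc))
    val_score = silhouette_score(X_val, best.fit_predict(X_val))  # refit on validation data
    gaps.append(disc_score - val_score)

print("mean over-optimism (discovery minus validation silhouette):", round(np.mean(gaps), 3))
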
MCML Authors
Stefanie Peschel

Biomedical Statistics and Data Science

C2 | Biology

Christian Müller

Prof. Dr.

Biomedical Statistics and Data Science

C2 | Biology

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[12]
M. van Smeden, G. Heinze, B. Van Calster, F. W. Asselbergs, P. E. Vardas, N. Bruining, P. de Jaegere, J. H. Moore, S. Denaxas, A.-L. Boulesteix and K. G. M. Moons.
Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease.
European Heart Journal 43.31 (Aug. 2022). DOI.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[11]
T. Ullmann, C. Hennig and A.-L. Boulesteix.
Validation of cluster analysis results on validation data: A systematic framework.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12.3 (May 2022). DOI.
Abstract

Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To assess the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic, structured review of the existing literature about this topic. For this purpose, we outline a formal framework that covers most existing approaches for validating clustering results on validation data. In particular, we review classical validation techniques such as internal and external validation, stability analysis, and visual validation, and show how they can be interpreted in terms of our framework. We define and formalize different types of validation of clustering results on a validation dataset, and give examples of how clustering studies from the applied literature that used a validation dataset can be seen as instances of our framework.

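As one possible instance of validating a clustering result on a validation dataset, the Python sketch below transfers a discovery clustering to validation data via nearest centroids and compares it with an independent clustering of the validation data using the adjusted Rand index. This is a simplified example on assumed data, not the framework's full formalization.

# One concrete validation strategy: transfer the discovery partition and
# compare it with an independent validation-data clustering (placeholder data).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

X, _ = make_blobs(n_samples=600, centers=3, random_state=7)
X_disc, X_val = train_test_split(X, test_size=0.5, random_state=7)

km_disc = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_disc)
transferred = km_disc.predict(X_val)  # discovery clustering transferred via nearest centroids
independent = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_val)

# High agreement suggests the discovery clustering replicates on new data
print("adjusted Rand index:", round(adjusted_rand_score(transferred, independent), 3))
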
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[10]
T. Ullmann, A. Beer, M. Hünemörder, T. Seidl and A.-L. Boulesteix.
Over-optimistic evaluation and reporting of novel cluster algorithms: An illustrative study.
Advances in Data Analysis and Classification 17 (Mar. 2022). DOI.
Abstract

When researchers publish new cluster algorithms, they usually demonstrate the strengths of their novel approaches by comparing the algorithms’ performance with existing competitors. However, such studies are likely to be optimistically biased towards the new algorithms, as the authors have a vested interest in presenting their method as favorably as possible in order to increase their chances of getting published. Therefore, the superior performance of newly introduced cluster algorithms is over-optimistic and might not be confirmed in independent benchmark studies performed by neutral and unbiased authors. This problem is known among many researchers, but so far, the different mechanisms leading to over-optimism in cluster algorithm evaluation have never been systematically studied and discussed. Researchers are thus often not aware of the full extent of the problem. We present an illustrative study to illuminate the mechanisms by which authors—consciously or unconsciously—paint their cluster algorithm’s performance in an over-optimistic light. Using the recently published cluster algorithm Rock as an example, we demonstrate how optimization of the used datasets or data characteristics, of the algorithm’s parameters and of the choice of the competing cluster algorithms leads to Rock’s performance appearing better than it actually is. Our study is thus a cautionary tale that illustrates how easy it can be for researchers to claim apparent 'superiority' of a new cluster algorithm. This illuminates the vital importance of strategies for avoiding the problems of over-optimism (such as, e.g., neutral benchmark studies), which we also discuss in the article.

MCML Authors
Anna Beer

Dr.

* Former member

A3 | Computational Models

Thomas Seidl

Prof. Dr.

Database Systems & Data Mining

A3 | Computational Models

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[9]
C. Nießl, M. Herrmann, C. Wiedemann, G. Casalicchio and A.-L. Boulesteix.
Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12.2 (Mar. 2022). DOI.
Abstract

In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness for this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices lead to biased interpretations of benchmark results and to over-optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.

MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[8]
H. Seibold, A. Charlton, A.-L. Boulesteix and S. Hoffmann.
Statisticians roll up your sleeves! There’s a crisis to be solved.
Significance 18.4 (Aug. 2021). DOI.
Abstract

Statisticians play a key role in almost all scientific research. As such, they may be key to solving the reproducibility crisis. Heidi Seibold, Alethea Charlton, Anne-Laure Boulesteix and Sabine Hoffmann urge statisticians to take an active role in promoting more credible science.

MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[7]
S. Klau, S. Hoffmann, C. Patel, J. P. A. Ioannidis and A.-L. Boulesteix.
Examining the robustness of observational associations to model, measurement and sampling uncertainty with the vibration of effects framework.
International Journal of Epidemiology 50.1 (Feb. 2021). DOI.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[6]
A.-L. Boulesteix, A. Charlton, S. Hoffmann and H. Seibold.
A replication crisis in methodological research?
Significance 17.5 (Oct. 2020). DOI.
Abstract

Statisticians have been keen to critique statistical aspects of the “replication crisis” in other scientific disciplines. But new statistical tools are often published and promoted without any thought to replicability. This needs to change, argue Anne-Laure Boulesteix, Sabine Hoffmann, Alethea Charlton and Heidi Seibold.

MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[5]
M. Herrmann, P. Probst, R. Hornung, V. Jurinovic and A.-L. Boulesteix.
Large-scale benchmark study of survival prediction methods using multi-omics data.
Briefings in Bioinformatics (Aug. 2020). DOI.
Abstract

Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database ‘The Cancer Genome Atlas’ (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan–Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups—especially clinical variables—from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited.

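For illustration of the kind of evaluation metric used in such benchmarks, the following Python sketch computes Harrell's concordance index by hand for right-censored survival data (the study itself reports Uno's C-index and the integrated Brier score). The survival times and risk scores below are invented.

# Minimal Harrell's concordance index for right-censored data; the example
# values are hypothetical and only illustrate the metric, not the benchmark.
import numpy as np

def harrell_c_index(time, event, risk_score):
    """Fraction of comparable pairs whose risk ordering matches survival ordering."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if subject i had an observed event before time j
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    return concordant / comparable

time = np.array([5.0, 8.0, 12.0, 3.0, 9.0])
event = np.array([1, 0, 1, 1, 0])            # 1 = event observed, 0 = censored
risk = np.array([2.1, 0.4, 0.8, 3.0, 0.2])   # higher = predicted higher risk
print("Harrell's C:", round(harrell_c_index(time, event, risk), 3))
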
MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Roman Hornung

Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[4]
N. Ellenbach, A.-L. Boulesteix, B. Bischl, K. Unger and R. Hornung.
Improved outcome prediction across data sources through robust parameter tuning.
Journal of Classification (Jul. 2020). DOI.
Abstract

In many application areas, prediction rules trained based on high-dimensional data are subsequently applied to make predictions for observations from other sources, but they do not always perform well in this setting. This is because data sets from different sources can feature (slightly) differing distributions, even if they come from similar populations. In the context of high-dimensional data and beyond, most prediction methods involve one or several tuning parameters. Their values are commonly chosen by maximizing the cross-validated prediction performance on the training data. This procedure, however, implicitly presumes that the data to which the prediction rule will be ultimately applied, follow the same distribution as the training data. If this is not the case, less complex prediction rules that slightly underfit the training data may be preferable. Indeed, a tuning parameter does not only control the degree of adjustment of a prediction rule to the training data, but also, more generally, the degree of adjustment to the distribution of the training data. On the basis of this idea, in this paper we compare various approaches including new procedures for choosing tuning parameter values that lead to better generalizing prediction rules than those obtained based on cross-validation. Most of these approaches use an external validation data set. In our extensive comparison study based on a large collection of 15 transcriptomic data sets, tuning on external data and robust tuning with a tuned robustness parameter are the two approaches leading to better generalizing prediction rules.

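A toy Python comparison of standard cross-validated tuning with tuning on external data, in the spirit of the idea described above; the simulated distribution shift, model, and grid of penalty values are assumptions for illustration only, not the transcriptomic study design of the paper.

# Compare a penalty chosen by internal CV with one chosen on external data
# (invented shift and grid; illustrative only).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train, y_train, coef = make_regression(n_samples=100, n_features=200,
                                         n_informative=20, coef=True,
                                         noise=10, random_state=0)
# external data from a shifted but related distribution (hypothetical shift)
X_ext = rng.normal(0.3, 1.2, size=(100, 200))
y_ext = X_ext @ coef + rng.normal(0, 10, size=100)

alphas = [0.01, 0.1, 1, 10, 100, 1000]
cv_scores = [cross_val_score(Ridge(alpha=a), X_train, y_train, cv=5).mean() for a in alphas]
ext_scores = [Ridge(alpha=a).fit(X_train, y_train).score(X_ext, y_ext) for a in alphas]

print("alpha chosen by internal CV:  ", alphas[int(np.argmax(cv_scores))])
print("alpha chosen on external data:", alphas[int(np.argmax(ext_scores))])
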
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Roman Hornung

Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[3]
S. Klau, M.-L. Martin-Magniette, A.-L. Boulesteix and S. Hoffmann.
Sampling uncertainty versus method uncertainty: a general framework with applications to omics biomarker selection.
Biometrical Journal 62.3 (May 2020). DOI.
Abstract

Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.

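The sketch below is a toy Python version of the resampling idea: agreement of biomarker rankings is compared across two subsamples with the same method (sampling uncertainty) and across two methods on the same subsample (method uncertainty). The simulated data, the two ranking methods, and the agreement measure are simplified assumptions, not the study's actual analyses.

# Toy comparison of sampling uncertainty vs method uncertainty for ranking
# mock biomarkers; everything here is a simplified placeholder.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=50, n_informative=10, noise=5, random_state=0)
rng = np.random.default_rng(0)

def rank_univariate(X, y):
    return np.abs(f_regression(X, y)[0])           # rank features by univariate F statistic

def rank_lasso(X, y):
    return np.abs(Lasso(alpha=0.5).fit(X, y).coef_)  # rank features by |lasso coefficient|

sampling_agree, method_agree = [], []
for _ in range(50):
    idx1 = rng.choice(len(y), size=len(y) // 2, replace=False)
    idx2 = rng.choice(len(y), size=len(y) // 2, replace=False)
    # sampling uncertainty: same method applied to two different subsamples
    sampling_agree.append(spearmanr(rank_univariate(X[idx1], y[idx1]),
                                    rank_univariate(X[idx2], y[idx2]))[0])
    # method uncertainty: two different methods applied to the same subsample
    method_agree.append(spearmanr(rank_univariate(X[idx1], y[idx1]),
                                  rank_lasso(X[idx1], y[idx1]))[0])

print("mean rank agreement across subsamples:", round(np.mean(sampling_agree), 3))
print("mean rank agreement across methods:   ", round(np.mean(method_agree), 3))
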
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[2]
P. Probst, A.-L. Boulesteix and B. Bischl.
Tunability: Importance of Hyperparameters of Machine Learning Algorithms.
Journal of Machine Learning Research 20 (Mar. 2019). PDF.
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability


[1]
P. Probst, M. Wright and A.-L. Boulesteix.
Hyperparameters and Tuning Strategies for Random Forest.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9.3 (Jan. 2019). DOI.
Abstract

The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after presenting a brief overview of tuning strategies, we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.

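As a hedged Python analogue (the paper's tuneRanger package is R-based and uses model-based optimization), the sketch below tunes random-forest hyperparameters with random search and cross-validation on placeholder data; parameter names follow scikit-learn rather than ranger, and random search stands in for MBO.

# Random-search tuning of RF hyperparameters with 5-fold CV; placeholder data,
# scikit-learn parameter names as rough counterparts of mtry, min.node.size,
# and sample fraction.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

param_distributions = {
    "max_features": randint(1, 31),         # counterpart of mtry
    "min_samples_leaf": randint(1, 11),     # counterpart of min.node.size
    "max_samples": [0.5, 0.7, 0.9, None],   # fraction of observations drawn per tree
}
search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=300, random_state=0),
    param_distributions,
    n_iter=25,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
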
MCML Authors
Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability