
Evaluation and Benchmarking

MCML Authors

Abstract

Machine learning models can only be deployed in practice if they are robustly evaluated to estimate a model's generalization performance, i.e., how well it will perform on new data. Resampling strategies, including cross-validation and bootstrapping, can be used to estimate the generalization performance. Models can be compared to one another in a benchmark experiment, which uses the same resampling strategies and measures to fairly compare models and to help practitioners decide which model to use in practice.

This chapter introduces resampling strategies in mlr3, including cross-validation, repeated cross-validation, leave-one-out, bootstrapping, and custom strategies. These are then demonstrated with the resample() function, which is used to resample a single learner with a given strategy. Benchmarking is then introduced and the benchmark() function is demonstrated for comparing multiple learners. The chapter concludes with a deep dive into binary classification evaluation, including ROC analysis and the Area Under the Curve metric.
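The workflow the abstract describes can be sketched in a few lines of mlr3 code (a minimal, illustrative example assuming the current mlr3 API; the sonar task, rpart learner, and measures are stand-in choices, not prescribed by the chapter):

library(mlr3)

# Resample a single learner: 5-fold cross-validation on a built-in binary task
task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")
rr = resample(task, learner, rsmp("cv", folds = 5))
rr$aggregate(msr("classif.acc"))  # accuracy aggregated across folds

# Compare multiple learners in a benchmark experiment
design = benchmark_grid(
  tasks = task,
  learners = lrns(c("classif.rpart", "classif.featureless"), predict_type = "prob"),
  resamplings = rsmp("cv", folds = 5)
)
bmr = benchmark(design)
bmr$aggregate(msr("classif.auc"))  # AUC per learner, as in the chapter's ROC section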



Applied Machine Learning Using mlr3 in R

I.3. Jan. 2024.

Authors

G. Casalicchio, L. Burk

Links

DOI

Research Area

 A1 | Statistical Foundations & Explainability

BibTeX Key: CB24
