Collecting Empirical Data About Hyperparameters for Data Driven AutoML

MCML Authors

Martin Binder

→ Group Bernd Bischl
Statistical Learning and Data Science

Florian Pfisterer

Dr.

* Former Member

→ Group Bernd Bischl
Statistical Learning and Data Science

Bernd Bischl

Prof. Dr.

Director

Statistical Learning and Data Science

Abstract

All optimization needs some kind of prior over the functions it is optimizing over. We used a large computing cluster to collect empirical data about the behavior of ML performance by randomly sampling hyperparameter values and performing cross-validation. We also collected information about cross-validation error by repeating some evaluations multiple times, and about how performance progresses with training data size by performing some evaluations on data subsets. We describe how we collected the data, present preliminary analyses of the surrogate models that can be built from it, and give an outlook on the analyses this should enable.
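
The collection scheme described in the abstract (random hyperparameter sampling, cross-validation, repeated evaluations, and evaluations on data subsets) can be illustrated with a minimal Python sketch. This is not the authors' code or setup: the synthetic data, the one-dimensional ridge learner, the log-uniform sampling range, and all function names (`make_data`, `fit_ridge_1d`, `cv_mse`, `collect`) are assumptions made purely for illustration.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D regression data, y = 2x + noise (illustrative assumption)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 0.1)) for x in xs]


def fit_ridge_1d(train, lam):
    """Closed-form 1-D ridge regression without intercept."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def cv_mse(data, lam, k, rng):
    """k-fold cross-validated mean squared error for one hyperparameter value."""
    idx = list(range(len(data)))
    rng.shuffle(idx)  # a fresh random split per call, so repeats differ
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        w = fit_ridge_1d(train, lam)
        errors.extend((data[i][1] - w * data[i][0]) ** 2 for i in fold)
    return statistics.fmean(errors)


def collect(data, n_configs, k=5, repeats=3, fractions=(0.25, 1.0), seed=0):
    """Randomly sample hyperparameters and record CV error per (config, subset, repeat)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_configs):
        lam = 10 ** rng.uniform(-3, 2)  # hypothetical log-uniform search range
        for frac in fractions:
            n_sub = max(k, int(frac * len(data)))
            subset = rng.sample(data, n_sub)  # evaluation on a data subset
            for rep in range(repeats):  # repeated evaluation of the same config
                records.append({
                    "lam": lam,
                    "fraction": frac,
                    "repeat": rep,
                    "cv_mse": cv_mse(subset, lam, k, rng),
                })
    return records


records = collect(make_data(100, random.Random(1)), n_configs=4)
```

Each record pairs a sampled hyperparameter value with a data fraction, a repeat index, and the resulting cross-validation error. A table of this shape is the kind of input a surrogate model could be fitted to, and the spread across repeats gives a rough view of the noise in the cross-validation estimate.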

inproceedings BPB20a