This cumulative dissertation consists of five articles divided into three parts. The first part extends the mlr package in R to implement and benchmark multilabel classification methods. The second part focuses on simplifying benchmark experiments with OpenML.org, introducing the OpenML R package and the OpenML100 benchmarking suite for standardized dataset and result management. The third part addresses model evaluation and interpretability, proposing the residual-based predictiveness (RBP) curve to improve upon the predictiveness curve and introducing new visualization tools, including the Shapley feature importance (SFIMP) measure for model interpretation. (Shortened.)
BibTeXKey: Cas19