The surge of frameworks for automated unsupervised clustering has exposed a notable gap in performance assessment, unified datasets, and methodologies for this field. The lack of standardization and of clearly defined clustering goals obscures the applicability and suitability of such solutions. We therefore propose a benchmark to bridge this gap by offering a comparative analysis of AutoML frameworks for clustering across several criteria and a comprehensive set of benchmarking problems. Four prominent unsupervised AutoML frameworks (AutoML4Clust, Autocluster, cSmartML, and ML2DAC) were compared following our methodology. By extending the evaluation beyond quantitative metrics, this research contributes to a more nuanced understanding of the applicability and performance of AutoML on a diverse set of clustering problems. Our analysis shows a clear need for further work on pipeline synthesis (i.e., the search for and optimization of complete pipelines), clustering goal definition, and suitable analysis dimensions.
Entry type: inproceedings
BibTeXKey: DLT+24
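
The multi-criteria comparison described in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's actual protocol; it only shows how one candidate clustering (here a plain scikit-learn KMeans standing in for the output of an AutoML framework such as AutoML4Clust, Autocluster, cSmartML, or ML2DAC) could be scored on a small benchmark suite using both internal (silhouette, Davies-Bouldin) and external (adjusted Rand index) validity criteria. The dataset selection and the evaluate_candidate helper are illustrative assumptions, not part of the benchmark.

    # Minimal, illustrative sketch of multi-criteria clustering evaluation.
    # KMeans is only a stand-in for the pipeline produced by an AutoML framework.
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris, load_wine
    from sklearn.metrics import (adjusted_rand_score, davies_bouldin_score,
                                 silhouette_score)

    # Hypothetical benchmark suite: name -> (features, ground-truth labels).
    BENCHMARKS = {
        "iris": load_iris(return_X_y=True),
        "wine": load_wine(return_X_y=True),
    }

    def evaluate_candidate(X, y_true, n_clusters):
        """Score one candidate clustering on internal and external criteria."""
        y_pred = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(X)
        return {
            "ARI": adjusted_rand_score(y_true, y_pred),          # external validity
            "silhouette": silhouette_score(X, y_pred),            # internal validity
            "davies_bouldin": davies_bouldin_score(X, y_pred),    # internal, lower is better
        }

    if __name__ == "__main__":
        for name, (X, y_true) in BENCHMARKS.items():
            scores = evaluate_candidate(X, y_true, n_clusters=len(set(y_true)))
            print(name, {metric: round(value, 3) for metric, value in scores.items()})

Reporting several criteria side by side, rather than a single score, reflects the abstract's point that quantitative metrics alone give an incomplete picture of a framework's suitability for a given clustering goal.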