CHB: A Diagnostic Toolkit for Hardness-Aware Clustering Evaluation

Abstract

Clustering is commonly compared through leaderboards that collapse performance into a single aggregate ranking. Such summaries obscure why methods succeed, which data properties align with failure, and how conclusions shift under representation changes and realistic tuning constraints. We present CHB, a diagnostic toolkit for hardness-aware clustering evaluation. CHB maps each dataset-representation pair to an interpretable hardness fingerprint capturing (i) separation, (ii) cohesion and scale heterogeneity, and (iii) topology through scalable persistent-homology summaries. Using this diagnostic space, CHB evaluates clustering algorithms under standardized, compute-aware tracks. Conditioning results on hardness coordinates turns comparison into diagnosis: across a broad range of datasets and their representations, CHB reveals reproducible structural regimes, uncovers regime-dependent ranking reversals across method families, and surfaces robustness signatures, including topology-linked breakdowns. CHB further enables representation auditing by attributing gains to measurable shifts in the hardness fingerprint rather than just external performance changes. We release CHB as an open, extensible artifact for evaluating new clustering methods and embeddings within a shared diagnostic framework.


ICML 2026

43rd International Conference on Machine Learning. Seoul, South Korea, Jul 06-11, 2026. To be published.
A* Conference

Authors

W. Durani • P. Jahn • C. Leiber • D. B. Hoffmann • T. Seidl • C. Plant • C. Böhm

Research Area

 A3 | Computational Models

BibTeXKey: DJL+26