
Going Offline: An Evaluation of the Offline Phase in Stream Clustering


Abstract

Data streams are a challenging and ever more relevant setting for clustering methods as more data arrives faster and faster. Stream clustering strategies either determine the clusters in an online manner directly as the instances appear, or they employ an offline phase where the online summarization structures are processed to obtain a clustering result. A recent analysis found that offline clustering may often be unnecessary or even counterproductive. The methods used in the offline phase are usually fixed for each stream clustering approach and typically stem from only a handful of clustering techniques. In this paper, we perform a broad experimental analysis specifically targeting the offline phase of stream clustering. We analyze several ways of extracting information from the summarization structures, including a novel strategy based on data generation. Ultimately, we showcase that an offline phase is an impactful design choice for stream clustering. We also find that the chosen offline method significantly impacts the clustering performance, with the clustering quality improving drastically for some settings.
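The abstract contrasts an online step that maintains summarization structures with an offline phase that turns those summaries into a final clustering. The following is a minimal Python sketch of that two-phase pattern. It assumes CluStream-style micro-clusters (count, linear sum, squared sum) as the summaries and weighted k-means as the offline clusterer. All names (`MicroCluster`, `online_phase`, `offline_centroids`, `offline_generated`) are hypothetical. `offline_generated` only illustrates the general idea of generating data from summaries by sampling a diagonal Gaussian per micro-cluster. It is not the paper's data-generation strategy, whose details are not given here.

```python
# Hypothetical sketch of online summarization + offline clustering; not the paper's implementation.
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """CluStream-style cluster feature: count, per-dimension linear sum and squared sum."""
    n: int
    ls: list
    ss: list

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), [v * v for v in x])

    def add(self, x):
        # Absorbing a point only updates the additive statistics; the point itself is discarded.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def center(self):
        return [s / self.n for s in self.ls]

    def std(self):
        # Per-dimension std sqrt(SS/n - (LS/n)^2), clamped at 0 against rounding error.
        return [math.sqrt(max(q / self.n - (s / self.n) ** 2, 0.0))
                for s, q in zip(self.ls, self.ss)]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def online_phase(stream, radius):
    """Absorb each point into the nearest micro-cluster if it lies within `radius`
    of that center; otherwise open a new micro-cluster."""
    mcs = []
    for x in stream:
        if mcs:
            nearest = min(mcs, key=lambda m: dist2(m.center(), x))
            if dist2(nearest.center(), x) <= radius ** 2:
                nearest.add(x)
                continue
        mcs.append(MicroCluster.from_point(x))
    return mcs


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's k-means in which each point pulls its assigned center with weight w."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: dist2(centers[c], p))
            totals[j] += w
            for d, v in enumerate(p):
                sums[j][d] += w * v
        # Keep a center in place if no point was assigned to it this round.
        centers = [[s / totals[j] for s in sums[j]] if totals[j] > 0 else centers[j]
                   for j in range(k)]
    return centers


def offline_centroids(mcs, k):
    """Offline option A: cluster the micro-cluster centers, weighted by their counts."""
    return weighted_kmeans([m.center() for m in mcs], [m.n for m in mcs], k)


def offline_generated(mcs, k, seed=0):
    """Offline option B (illustrative only): draw m.n synthetic points from a diagonal
    Gaussian fitted to each micro-cluster, then cluster the synthetic data set."""
    rng = random.Random(seed)
    samples = []
    for m in mcs:
        mu, sd = m.center(), m.std()
        samples.extend([rng.gauss(a, b) for a, b in zip(mu, sd)] for _ in range(m.n))
    return weighted_kmeans(samples, [1.0] * len(samples), k)


# Usage: two well-separated 2-D blobs arriving in shuffled order.
rng = random.Random(42)
stream = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(200)]
rng.shuffle(stream)

mcs = online_phase(stream, radius=2.0)
centers_a = sorted(offline_centroids(mcs, k=2))
centers_b = sorted(offline_generated(mcs, k=2))
```

Both offline options read the same micro-clusters. Only the offline step changes, which mirrors the kind of comparison between offline methods that the abstract describes.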

inproceedings


ECML-PKDD 2025

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Porto, Portugal, Sep 15-19, 2025.
A Conference

Authors

P. Jahn • W. Durani • C. Leiber • A. Beer • T. Seidl

Links

DOI GitHub

Research Area

 A3 | Computational Models

BibTeX key: JDL+25
