
A Generic Summary Structure for Arbitrarily Oriented Subspace Clustering in Data Streams

MCML Authors

Peer Kröger

Prof. Dr.

Principal Investigator

* Former Principal Investigator

Abstract

Nowadays, as lots of data is gathered in large volumes and with high velocity, the development of algorithms capable of handling complex data streams in (near) real-time is a major challenge. In this work, we present the algorithm CORRSTREAM, which tackles the problem of detecting arbitrarily oriented subspace clusters in high-dimensional data streams. The proposed method follows a two-phase approach, where the continuous online phase aggregates data points within a proper microcluster structure that stores all necessary information to define a microcluster’s subspace and is generic enough to cope with a variety of offline procedures. Given several such microclusters, the offline phase is able to build a final clustering model which reveals arbitrarily oriented subspaces in which the data tend to cluster. In our experimental evaluation, we show that CORRSTREAM not only has an acceptable throughput but also outperforms static counterpart algorithms by orders of magnitude when considering the runtime. At the same time, the loss of accuracy is quite small.
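The abstract does not spell out what the microcluster structure stores, so the following is only a minimal sketch of one common way to build such a summary, not the CORRSTREAM implementation. It assumes a microcluster keeps a point count, a linear sum, and a sum of outer products; from these the covariance matrix can be recovered at any time, and its dominant eigenvector gives the orientation along which the points spread. The class `MicroCluster` and the function `dominant_direction` are hypothetical names introduced for illustration.

```python
from typing import List, Sequence


class MicroCluster:
    """Hypothetical additive summary (an assumption, not the paper's structure):
    point count, per-dimension linear sum, and sum of outer products."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.outer_sum = [[0.0] * dim for _ in range(dim)]

    def insert(self, x: Sequence[float]) -> None:
        """Online phase: absorb one point in O(dim^2) without storing it."""
        self.n += 1
        for i in range(self.dim):
            self.linear_sum[i] += x[i]
            for j in range(self.dim):
                self.outer_sum[i][j] += x[i] * x[j]

    def merge(self, other: "MicroCluster") -> None:
        """All statistics are sums, so two summaries merge by addition."""
        self.n += other.n
        for i in range(self.dim):
            self.linear_sum[i] += other.linear_sum[i]
            for j in range(self.dim):
                self.outer_sum[i][j] += other.outer_sum[i][j]

    def mean(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def covariance(self) -> List[List[float]]:
        """Population covariance: E[x x^T] - mean mean^T."""
        m = self.mean()
        return [
            [self.outer_sum[i][j] / self.n - m[i] * m[j] for j in range(self.dim)]
            for i in range(self.dim)
        ]


def dominant_direction(cov: List[List[float]], iterations: int = 100) -> List[float]:
    """Power iteration: approximates the eigenvector of the largest eigenvalue,
    i.e. the direction of greatest variance within the microcluster."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            break  # zero covariance: no preferred direction
        v = [c / norm for c in w]
    return v


# Usage: points lying on the line y = 2x, split across two summaries.
a, b = MicroCluster(2), MicroCluster(2)
for p in [(0.0, 0.0), (1.0, 2.0)]:
    a.insert(p)
for p in [(2.0, 4.0), (3.0, 6.0)]:
    b.insert(p)
a.merge(b)
direction = dominant_direction(a.covariance())
```

Because every stored quantity is a plain sum, an offline procedure can merge or re-analyze such summaries without revisiting the raw stream, which is the property the abstract's "generic" microcluster structure is meant to provide. Any decay or weighting of old points is left out of this sketch.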

inproceedings


SISAP 2019

12th International Conference on Similarity Search and Applications. Newark, NJ, USA, Oct 02-04, 2019.

Authors

F. Borutta • P. Kröger • T. Hubauer


Research Area

 A3 | Computational Models

BibTeXKey: BKH19
