Home  | Publications | BAH+21

KISS - A Fast KNN-Based Importance Score for Subspaces

MCML Authors

Abstract

In high-dimensional datasets some dimensions or attributes can be more important than others. Whereas most algorithms neglect one or more dimensions for all points of a dataset or at least for all points of a certain cluster together, our method KISS (textbf{k}NN-based textbf{I}mportance textbf{S}core of textbf{S}ubspaces) detects the most important dimensions for each point individually. It is fully unsupervised and does not depend on distorted multidimensional distance measures. Instead, the $k$ nearest neighbors ($k$NN) in one-dimensional projections of the data points are used to calculate the score for every dimension's importance. Experiments across a variety of settings show that those scores reflect well the structure of the data. KISS can be used for subspace clustering. What sets it apart from other methods for this task is its runtime, which is linear in the number of dimensions and $O(n log(n))$ in the number of points, as opposed to quadratic or even exponential runtimes for previous algorithms.

inproceedings


EDBT 2021

24th International Conference on Extending Database Technology. Nicosia, Cyprus, Mar 23-26, 2021.
Conference logo
A Conference

Authors

A. Beer • E. Allerborn • V. Hartmann • T. Seidl

Links

PDF

Research Area

 A3 | Computational Models

BibTeXKey: BAH+21

Back to Top