Orderings of Data - More Than a Tripping Hazard

MCML Authors

Anna Beer

Dr.

* Former Member

→ Group Thomas Seidl
Database Systems, Data Mining and AI

Thomas Seidl

Prof. Dr.

Director

Database Systems, Data Mining and AI

Abstract

As data processing techniques grow more sophisticated every day, many of us researchers get lost in the details and subtleties of the algorithms we are developing and far too easily forget to look at the very first step of every algorithm: the input of the data. Since there are plenty of library functions for this task, we do not have to think about this part of the pipeline anymore. But maybe we should. All data is stored and loaded into a program in some order. In this vision paper, we study how ignoring this order can (1) lead to performance issues and (2) make research results unreproducible. We furthermore examine desirable properties of a data ordering and why current approaches are often not suited to tackle these two problems.
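
To make the reproducibility point concrete, the sketch below shows one way input order can change a result. It is an illustration only and is not taken from the paper: it uses a simple greedy "leader" clustering on one-dimensional points, and the function name `leader_clustering`, the `radius` parameter, and the toy data are all assumptions chosen for this example.

```python
def leader_clustering(points, radius):
    """Greedy one-pass clustering (illustrative, hypothetical helper).

    Each point joins the first existing leader within `radius`;
    if there is none, the point becomes a new leader.
    """
    leaders = []
    labels = []
    for p in points:
        for idx, leader in enumerate(leaders):
            if abs(p - leader) <= radius:
                labels.append(idx)
                break
        else:
            # No leader was close enough: open a new cluster.
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels


# The same three points, loaded in two different orders.
data = [0.0, 1.0, 2.0]
reordered = [1.0, 0.0, 2.0]

leaders_a, labels_a = leader_clustering(data, radius=1.0)
leaders_b, labels_b = leader_clustering(reordered, radius=1.0)

print(len(leaders_a), len(leaders_b))  # prints "2 1"
```

With the first order, the point 0.0 becomes a leader and 2.0 is too far from it, so two clusters form. With the second order, 1.0 becomes the leader and both other points fall within its radius, so only one cluster forms. Any one-pass or greedy procedure of this kind may behave similarly, which is why an experiment that does not record the order of its input can be hard to reproduce.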

inproceedings BHS20