Research Group Fabian Scheipl

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Fabian Scheipl is Head of the Workgroup Functional Data Analysis at LMU Munich.

The group develops methodology and software for processing, describing, visualizing and modeling functional data such as curves, trajectories, or higher-dimensional surfaces. Its research focuses on regression for functional data with generalized additive models and on supervised and unsupervised methods for functional data, for example automated outlier detection and dimension reduction.

Publications @MCML

[18]
M. Herrmann, D. Kazempour, F. Scheipl and P. Kröger.
Enhancing cluster analysis via topological manifold learning.
Data Mining and Knowledge Discovery 38 (Apr. 2024). DOI.
MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Daniyal Kazempour

Dr.

* Former member

A3 | Computational Models

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Peer Kröger

Prof. Dr.

* Former member

A3 | Computational Models


[17]
J. Gauss, F. Scheipl and M. Herrmann.
DCSI--An improved measure of cluster separability based on separation and connectedness.
Preprint at arXiv (Oct. 2023). arXiv.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability


[16]
L. Bothmann, S. Strickroth, G. Casalicchio, D. Rügamer, M. Lindauer, F. Scheipl and B. Bischl.
Developing Open Source Educational Resources for Machine Learning and Data Science.
3rd Teaching Machine Learning and Artificial Intelligence Workshop. Grenoble, France, Sep 19-23, 2023. URL.
MCML Authors
Ludwig Bothmann

Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

Giuseppe Casalicchio

Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability

David Rügamer

Prof. Dr.

Data Science Group

A1 | Statistical Foundations & Explainability

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability


[15]
S. Hoffmann, F. Scheipl and A.-L. Boulesteix.
Reproduzierbare und replizierbare Forschung.
Moderne Verfahren der Angewandten Statistik (Sep. 2023). URL.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Anne-Laure Boulesteix

Prof. Dr.

Biometry in Molecular Medicine

A1 | Statistical Foundations & Explainability


[14]
A. Volkmann, A. Stöcker, F. Scheipl and S. Greven.
Multivariate Functional Additive Mixed Models.
Statistical Modelling 23.4 (Aug. 2023). DOI.
Abstract

Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.

MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[13]
M. Herrmann, F. Pfisterer and F. Scheipl.
A geometric framework for outlier detection in high-dimensional data.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery e1491 (Apr. 2023). DOI.
Abstract

Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework which exploits the metric structure of a data set. Our approach rests on the manifold assumption, that is, that the observed, nominally high-dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high dimensional data. We also suggest a novel, mathematically precise and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high-dimensional data, but the framework we propose is completely general and we include image and graph data applications. Our results show that the outlier structure of high-dimensional and non-tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors.

MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Florian Pfisterer

Dr.

* Former member

A1 | Statistical Foundations & Explainability

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[12]
J. Goldsmith and F. Scheipl.
tf: S3 classes and methods for tidy functional data. R package.
2022. GitHub.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[11]
J. Goldsmith and F. Scheipl.
tidyfun: Clean, wholesome, tidy fun with functional data in R. R package.
2022. GitHub.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[10]
M. Herrmann and F. Scheipl.
A Geometric Perspective on Functional Outlier Detection.
Stats 4.4 (Nov. 2021). DOI.
Abstract

We consider functional outlier detection from a geometric perspective, specifically: for functional datasets drawn from a functional manifold, which is defined by the data’s modes of variation in shape, translation, and phase. Based on this manifold, we developed a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed taxonomies. Our theoretical and experimental analyses demonstrated several important advantages of this perspective: it considerably improves theoretical understanding and allows describing and analyzing complex functional outlier scenarios consistently and in full generality, by differentiating between structurally anomalous outlier data that are off-manifold and distributionally outlying data that are on-manifold, but at its margins. This improves the practical feasibility of functional outlier detection: we show that simple manifold-learning methods can be used to reliably infer and visualize the geometric structure of functional datasets. We also show that standard outlier-detection methods requiring tabular data inputs can be applied to functional data very successfully by simply using their vector-valued representations learned from manifold learning methods as the input features. Our experiments on synthetic and real datasets demonstrated that this approach leads to outlier detection performances at least on par with existing functional-data-specific methods in a large variety of settings, without the highly specialized, complex methodology and narrow domain of application these methods often entail.
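The general idea in this abstract (embed the curves, then score the embedding vectors with a tabular outlier method) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: classical MDS on pairwise L2 distances stands in for the manifold-learning step, the distance to the k-th nearest neighbour stands in for the outlier score, and the data are synthetic phase-shifted sine curves plus one straight line.

```python
import numpy as np


def l2_distances(curves, grid):
    """Pairwise L2 distances between curves sampled on a common grid."""
    sq_diff = (curves[:, None, :] - curves[None, :, :]) ** 2
    # Trapezoidal rule for the integral of the squared difference.
    integral = ((sq_diff[..., 1:] + sq_diff[..., :-1]) / 2 * np.diff(grid)).sum(axis=-1)
    return np.sqrt(integral)


def classical_mds(dist, n_components=2):
    """Embed points from a distance matrix via classical (Torgerson) MDS."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by floating-point error.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def knn_outlier_scores(embedding, k=5):
    """Score each point by the distance to its k-th nearest neighbour."""
    d = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
    # After sorting, column 0 holds the zero self-distance.
    return np.sort(d, axis=1)[:, k]


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)

# 30 synthetic curves that differ only in phase (the "on-manifold" data).
shifts = rng.uniform(-0.1, 0.1, size=30)
regular = np.sin(2 * np.pi * (grid[None, :] + shifts[:, None]))

# One curve with a different shape (a straight line), stored at index 30.
anomaly = (2 * grid - 1)[None, :]

curves = np.vstack([regular, anomaly])
embedding = classical_mds(l2_distances(curves, grid))
scores = knn_outlier_scores(embedding)
print("index of highest outlier score:", int(np.argmax(scores)))
```

Any embedding method and any tabular outlier score could be substituted for the two stand-ins; the sketch only shows how the two steps are chained.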

MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[9]
A. Bauer, F. Scheipl and H. Küchenhoff.
Registration for Incomplete Non-Gaussian Functional Data.
Preprint at arXiv (Aug. 2021). arXiv.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Helmut Küchenhoff

Prof. Dr.

Statistical Consulting Unit (StaBLab)

C4 | Computational Social Sciences


[8]
M. Herrmann and F. Scheipl.
Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction.
Preprint at arXiv (Dec. 2020). arXiv.
Abstract

In recent years, manifold methods have moved into focus as tools for dimension reduction. Assuming that the high-dimensional data actually lie on or close to a low-dimensional nonlinear manifold, these methods have shown convincing results in several settings. This manifold assumption is often reasonable for functional data, i.e., data representing continuously observed functions, as well. However, the performance of manifold methods recently proposed for tabular or image data has not been systematically assessed in the case of functional data yet. Moreover, it is unclear how to evaluate the quality of learned embeddings that do not yield invertible mappings, since the reconstruction error cannot be used as a performance measure for such representations. In this work, we describe and investigate the specific challenges for nonlinear dimension reduction posed by the functional data setting. The contributions of the paper are three-fold: First of all, we define a theoretical framework that allows a systematic assessment of the specific challenges that arise in the functional data context, transfer several nonlinear dimension reduction methods for tabular and image data to functional data, and show that manifold methods can be used successfully in this setting. Secondly, we subject performance assessment and tuning strategies to a thorough and systematic evaluation based on several different functional data settings and point out some previously undescribed weaknesses and pitfalls which can jeopardize reliable judgment of embedding quality. Thirdly, we propose a nuanced approach to make trustworthy decisions for or against competing nonconforming embeddings more objectively.

MCML Authors
Moritz Herrmann

Dr.

Biometry in Molecular Medicine

Coordinator for Reproducibility & Open Science

A1 | Statistical Foundations & Explainability

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[7]
A. Bender, D. Rügamer, F. Scheipl and B. Bischl.
A General Machine Learning Framework for Survival Analysis.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2020). Virtual, Sep 14-18, 2020. DOI.
MCML Authors
Andreas Bender

Dr.

Statistical Learning & Data Science

Coordinator Statistical and Machine Learning Consulting

A1 | Statistical Foundations & Explainability

David Rügamer

Prof. Dr.

Data Science Group

A1 | Statistical Foundations & Explainability

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability


[6]
F. Scheipl, J. Goldsmith and J. Wrobel.
tidyfun: Tools for Tidy Functional Data. R package.
2020. URL. GitHub.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[5]
J. Wrobel, A. Bauer, J. Goldsmith, E. McDonnell and F. Scheipl.
registr: Curve Registration for Exponential Family Functional Data. R package.
2020. GitHub.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[4]
F. Pfisterer, L. Beggel, X. Sun, F. Scheipl and B. Bischl.
Benchmarking time series classification -- Functional data vs machine learning approaches.
Preprint at arXiv (Nov. 2019). arXiv.
Abstract

Time series classification problems have drawn increasing attention in the machine learning and statistical community. Closely related is the field of functional data analysis (FDA): it refers to the range of problems that deal with the analysis of data that is continuously indexed over some domain. While often employing different methods, both fields strive to answer similar questions, a common example being classification or regression problems with functional covariates. We study methods from functional data analysis, such as functional generalized additive models, as well as functionality to concatenate (functional-) feature extraction or basis representations with traditional machine learning algorithms like support vector machines or classification trees. In order to assess the methods and implementations, we run a benchmark on a wide variety of representative (time series) data sets, with in-depth analysis of empirical results, and strive to provide a reference ranking for which method(s) to use for non-expert practitioners. Additionally, we provide a software framework in R for functional data analysis for supervised learning, including machine learning and more linear approaches from statistics. This allows convenient access, and in connection with the machine-learning toolbox mlr, those methods can now also be tuned and benchmarked.

MCML Authors
Florian Pfisterer

Dr.

* Former member

A1 | Statistical Foundations & Explainability

Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability

Bernd Bischl

Prof. Dr.

Statistical Learning & Data Science

A1 | Statistical Foundations & Explainability


[3]
C. Happ, F. Scheipl, A. A. Gabriel and S. Greven.
A general framework for multivariate functional principal component analysis of amplitude and phase variation.
Stat 8.2 (Feb. 2019). DOI.
Abstract

Functional data typically contain amplitude and phase variation. In many data situations, phase variation is treated as a nuisance effect and is removed during preprocessing, although it may contain valuable information. In this note, we focus on joint principal component analysis (PCA) of amplitude and phase variation. As the space of warping functions has a complex geometric structure, one key element of the analysis is transforming the warping functions to $L^2(\mathcal{T})$. We present different transformation approaches and show how they fit into a general class of transformations. This allows us to compare their strengths and limitations. In the context of PCA, our results offer arguments in favour of the centred log-ratio transformation. We further embed two existing approaches from the literature for joint PCA of amplitude and phase variation into the framework of multivariate functional PCA, where we study the properties of the estimators based on an appropriate metric. The approach is illustrated through an application from seismology.
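As a rough illustration of the centred log-ratio (clr) transformation mentioned in this abstract, the sketch below applies the standard clr definition to the derivative of a warping function, treated as a density on [0, 1]. The setup is an assumption made for illustration and is not taken from the paper: the warping function gamma(t) = (exp(t) - 1) / (e - 1), the grid, and the helper names `trapezoid`, `clr` and `clr_inverse` are all hypothetical.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return float(np.sum((values[1:] + values[:-1]) / 2 * np.diff(grid)))


def clr(density, grid):
    """Centred log-ratio: log-density minus its mean value over the domain."""
    log_density = np.log(density)
    mean_log = trapezoid(log_density, grid) / (grid[-1] - grid[0])
    return log_density - mean_log


def clr_inverse(v, grid):
    """Map a clr-transformed function back to a density that integrates to 1."""
    unnormalised = np.exp(v)
    return unnormalised / trapezoid(unnormalised, grid)


grid = np.linspace(0.0, 1.0, 201)

# Hypothetical warping function gamma(t) = (exp(t) - 1) / (e - 1) on [0, 1];
# its derivative is strictly positive and integrates to 1, i.e. it is a density.
density = np.exp(grid) / (np.e - 1.0)

v = clr(density, grid)             # unconstrained function that integrates to 0
recovered = clr_inverse(v, grid)   # back to the density of the warping function

print("integral of clr transform:", trapezoid(v, grid))
print("max reconstruction error:", float(np.max(np.abs(recovered - density))))
```

The transformed function is no longer constrained to be positive or to integrate to one, which is what makes linear tools such as PCA applicable before mapping results back with the inverse.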

MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[2]
J. Goldsmith, F. Scheipl, L. Huang, J. Wrobel, C. Di, J. Gellar, J. Harezlak, M. W. McLean, B. Swihart, L. Xiao, C. Crainiceanu and P. T. Reiss.
refund: Regression with Functional Data.
2019. URL.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability


[1]
J. Minkwitz, F. Scheipl, E. Binder, C. Sander, U. Hegerl and H. Himmerich.
Generalised functional additive models for brain arousal state dynamics (Poster).
20th International Pharmaco-EEG Society for Preclinical and Clinical Electrophysiological Brain Research Meeting (IPEG 2018). Zurich, Switzerland, Nov 21-25, 2018. DOI.
MCML Authors
Fabian Scheipl

PD Dr.

Functional Data Analysis

A1 | Statistical Foundations & Explainability