Colloquium
Data Thinning and Beyond
Daniela Witten, University of Washington
6 May 2026
4:15 pm - 5:45 pm
LMU Munich, Department of Statistics, and via Zoom
The lecture addresses the problem of using the same dataset twice in one analysis, for example to generate a hypothesis and then to test it. This double use creates a dependence between the two steps that can invalidate classical statistical inference methods.
As a solution, the lecture introduces “data thinning”, a method for splitting data into independent training and test sets so that inference remains valid. However, this approach requires strong assumptions about the data distribution. The lecture therefore also presents alternative strategies that avoid such assumptions, for example by adjusting summary statistics or orthogonalizing dependent datasets.
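For readers unfamiliar with the idea, the following is a minimal sketch of thinning in the simplest textbook case. It assumes the counts are exactly Poisson distributed, which is the kind of strong distributional assumption mentioned above. The function name `poisson_thin` and the parameter `eps` are hypothetical and chosen for illustration only; this is not the speaker's implementation. It relies on a standard fact: if X is Poisson with mean lambda and X_train given X is Binomial(X, eps), then X_train and X - X_train are independent Poisson variables with means eps * lambda and (1 - eps) * lambda.

```python
import numpy as np

def poisson_thin(x, eps, rng):
    """Split Poisson counts x into two parts by binomial thinning.

    Assumes each entry of x is Poisson distributed (illustrative assumption).
    Each unit of count goes to the training part with probability eps.
    """
    x = np.asarray(x)
    x_train = rng.binomial(x, eps)  # training part
    x_test = x - x_train            # the remainder is the test part
    return x_train, x_test

rng = np.random.default_rng(0)
lam = 5.0  # hypothetical Poisson mean for the simulation
x = rng.poisson(lam, size=100_000)

x_train, x_test = poisson_thin(x, eps=0.5, rng=rng)

# Under the Poisson assumption the two parts are independent,
# so their empirical correlation should be close to zero.
corr = np.corrcoef(x_train, x_test)[0, 1]
print(f"mean train: {x_train.mean():.3f}, mean test: {x_test.mean():.3f}")
print(f"correlation between parts: {corr:.4f}")
```

If the data are not Poisson, the two parts produced this way are in general no longer independent, which is the limitation that motivates the assumption-free alternatives.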
Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington and holds the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning.