Interpretable Self-Supervised Learning via Representer Landmarks and Nyström Approximation

MCML Authors

Debarghya Ghoshdastidar

Prof. Dr.

Core PI

Theoretical Foundations of Artificial Intelligence

Abstract

Self-supervised learning (SSL) learns representations from massive unlabeled data, yet the resulting models typically operate as black boxes, necessitating domain-specific explanations. We introduce KREPES, a unified framework to analytically interpret the learned representations of SSL objectives, including SimCLR, BYOL, and VICReg. By bridging empirical neural tangent kernel approximations of neural networks with the Representer Theorem for kernels, we express the learned latent space directly via 'Representer Landmarks', which are the representations of influential unlabeled training examples. We introduce novel metrics, 'Sample-Specific Influence Score', 'Concept-Conditioned Influence Score' and 'Feature Alignment Gap', to quantify the transparency of the learned representations. KREPES enables direct audit of the latent space without supervision, for example, revealing an algorithmic bias in the Adult-1M dataset where SSL uses demographic proxies for income. Finally, to ensure scalability to benchmarks with 1M+ samples (ImageNet-1K, Adult-1M), KREPES introduces a novel Nyström approximation-based analytical inference framework for SSL objectives.
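
For readers unfamiliar with the Nyström approximation mentioned in the abstract, the following is a minimal generic sketch, not the KREPES implementation. It assumes an RBF kernel as a stand-in for the empirical neural tangent kernel and picks landmarks uniformly at random; the names `rbf_kernel`, `nystrom_factors`, `gamma`, and `jitter` are hypothetical and chosen only for illustration. The idea is to approximate the full n x n kernel matrix K by C W⁺ Cᵀ, where C holds kernel values between all points and m landmarks, and W holds kernel values among the landmarks.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """RBF kernel matrix between rows of X and rows of Y (illustrative stand-in kernel)."""
    sq_dists = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_factors(X, landmark_idx, gamma=0.5, jitter=1e-8):
    """Return C (n x m) and the pseudo-inverse of W (m x m) for K ~= C @ W_pinv @ C.T."""
    landmarks = X[landmark_idx]
    C = rbf_kernel(X, landmarks, gamma)          # kernel between all points and landmarks
    W = rbf_kernel(landmarks, landmarks, gamma)  # kernel among the landmarks
    # A small jitter on the diagonal keeps the inversion numerically stable.
    W_pinv = np.linalg.pinv(W + jitter * np.eye(len(landmark_idx)))
    return C, W_pinv


# Toy data: 200 points in 5 dimensions, 40 uniformly sampled landmarks (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
idx = rng.choice(200, size=40, replace=False)

C, W_pinv = nystrom_factors(X, idx)
K_approx = C @ W_pinv @ C.T   # low-rank approximation, built only for comparison here
K_exact = rbf_kernel(X, X)    # full kernel, affordable only because n is small
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
print(f"relative Frobenius error: {rel_err:.3f}")
```

In a large-scale setting one would keep only the factors `C` and `W_pinv`, which need O(nm) memory instead of O(n²), and never form `K_approx` explicitly. How the paper selects landmarks and which kernel it uses may differ from this sketch.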

inproceedings ZTW+26

ICML 2026

43rd International Conference on Machine Learning. Seoul, South Korea, Jul 06-11, 2026. To be published. Preprint available.

Authors

M. Zarvandi • M. Timothy • T. Wasserer • D. Ghoshdastidar

Research Area

A1 | Statistical Foundations & Explainability
