
Sparse Autoencoders Reveal Selective Remapping of Visual Concepts During Adaptation

MCML Authors

Hyesu Lim

Dr.

→ Group Steffen Schneider
Dynamical Inference

Jinho Choi

→ Group Steffen Schneider
Dynamical Inference

Steffen Schneider

Dr.

Collaborating PI

Dynamical Inference

Abstract

Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.
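To make the idea of patch-wise concept activations concrete, here is a minimal sketch of a generic sparse autoencoder applied to per-patch token activations. It is not the PatchSAE implementation: the abstract does not specify the architecture, loss, or training setup, so the ReLU encoder, the L1 sparsity penalty, the decoder-bias centering, and all names (`encode`, `decode`, `sae_loss`, `patch_map`) are assumptions chosen for illustration. The tokens are random numbers standing in for CLIP vision transformer activations.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def encode(token, W_enc, b_enc, b_dec):
    """Map one patch token to non-negative latent activations.

    W_enc holds one weight row per latent (assumed layout).
    The token is centered by the decoder bias first, a common SAE
    convention that is assumed here, not taken from the paper.
    """
    centered = [t - b for t, b in zip(token, b_dec)]
    return [relu(sum(c * w for c, w in zip(centered, row)) + b)
            for row, b in zip(W_enc, b_enc)]


def decode(latent, W_dec, b_dec):
    """Reconstruct a token as a weighted sum of decoder rows plus bias."""
    return [b + sum(z * row[j] for z, row in zip(latent, W_dec))
            for j, b in enumerate(b_dec)]


def sae_loss(tokens, latents, recons, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 sparsity penalty."""
    n, d = len(tokens), len(tokens[0])
    mse = sum((r - t) ** 2
              for tok, rec in zip(tokens, recons)
              for t, r in zip(tok, rec)) / (n * d)
    l1 = sum(sum(abs(z) for z in lat) for lat in latents) / n
    return mse + l1_coeff * l1


def patch_map(latents, concept_idx, height, width):
    """Arrange one latent's per-patch activations on the patch grid."""
    acts = [lat[concept_idx] for lat in latents]
    return [acts[r * width:(r + 1) * width] for r in range(height)]


# Toy example: a 2x2 patch grid with random stand-in activations.
rng = random.Random(0)
height, width, d_model, d_latent = 2, 2, 4, 8
tokens = [[rng.gauss(0, 1) for _ in range(d_model)]
          for _ in range(height * width)]
W_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_latent)]
W_dec = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_latent)]
b_enc = [0.0] * d_latent
b_dec = [0.0] * d_model

latents = [encode(t, W_enc, b_enc, b_dec) for t in tokens]
recons = [decode(z, W_dec, b_dec) for z in latents]
loss = sae_loss(tokens, latents, recons)
heatmap = patch_map(latents, concept_idx=3, height=height, width=width)
```

In this sketch, `heatmap` is the spatial attribution of one latent: each cell holds how strongly that latent fires on the corresponding image patch. Comparing such maps between an adapted and a non-adapted model is one way to picture the kind of analysis the abstract describes.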

inproceedings LCC+25

ICLR 2025

13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025.

Authors

H. Lim • J. Choi • J. Choo • S. Schneider


Research Area

A3 | Computational Models

BibTeXKey: LCC+25
