
Revealing Task-Dependent Layer Relevance via Attentive Multi-Layer Fusion


Abstract

Efficiently adapting large-scale foundation models to downstream tasks is a central challenge in modern deep learning. While linear probing is a standard and computationally efficient method, it typically operates exclusively on the final layer's representation. In this work, we present experimental evidence that this approach discards crucial task-relevant information distributed across other layers of the network. To investigate this, we introduce Attentive Layer Fusion (ALF), a probing mechanism that dynamically fuses representations from all layers of Vision Transformers. Acting as an investigative tool, ALF reveals that optimal representation depth is highly task-dependent: while tasks similar to the pre-training domain rely on the final layer, specialized domains (e.g., medical, satellite) benefit significantly from intermediate layers. Furthermore, by analyzing representational similarities, we show that intermediate layers often achieve high downstream performance despite having low similarity to the final layer, indicating they encode distinct, complementary features. Across 19 diverse datasets and 9 foundation models, our hierarchical approach achieves consistent gains, offering a new lens into how foundation models organize information.
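
To illustrate the kind of probing mechanism the abstract describes, the sketch below shows one way an attention-based fusion over frozen per-layer features could be implemented. It is a minimal sketch, not the authors' code: the class name AttentiveLayerFusionProbe, the learned query, the key projection, and the scaling are all assumptions about how such a probe might be built.

```python
# Minimal sketch (hypothetical, not the authors' implementation) of an
# attention-based multi-layer fusion probe over a frozen Vision Transformer.
# Assumes per-layer [CLS] features have already been extracted.
import torch
import torch.nn as nn


class AttentiveLayerFusionProbe(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Learnable query that scores how relevant each layer is for the task.
        self.query = nn.Parameter(torch.randn(feat_dim))
        self.key_proj = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, layer_feats: torch.Tensor) -> torch.Tensor:
        # layer_feats: (batch, num_layers, feat_dim), features from the frozen backbone.
        keys = self.key_proj(layer_feats)                       # (B, L, D)
        scores = keys @ self.query / keys.shape[-1] ** 0.5      # (B, L) per-layer relevance
        weights = scores.softmax(dim=-1)                        # attention over layers
        fused = (weights.unsqueeze(-1) * layer_feats).sum(dim=1)  # (B, D) fused representation
        return self.classifier(fused)


# Usage with made-up dimensions (12 layers, 768-d features, 10 classes):
# probe = AttentiveLayerFusionProbe(feat_dim=768, num_classes=10)
# logits = probe(torch.randn(4, 12, 768))
```

Because only the query, key projection, and linear classifier are trained while the backbone stays frozen, such a probe remains as lightweight as standard linear probing, and the learned layer weights can be inspected to read off which depths a given task relies on.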

inproceedings MCT+26


Sci4DL @ICLR 2026

Workshop on Scientific Methods for Understanding Deep Learning at the 14th International Conference on Learning Representations. Rio de Janeiro, Brazil, Apr 23-27, 2026. To be published. Preprint available.

Authors

M. Morik • L. Ciernik • L. Thede • L. Eyring • S. Nakajima • Z. Akata • L. Muttenthaler



Research Area

 B1 | Computer Vision

BibTeX Key: MCT+26
