The rapid growth of digital pathology has produced vast repositories of hematoxylin and eosin (H&E) stained whole slide images, yet most of them remain unindexed or unlabelled, which limits their utility for computational analysis. Reverse image search provides a scalable way to organize and access these archives by retrieving visually similar images. Retrieval systems are already deployed, but they rely on manual configuration, which strongly affects their performance. We therefore propose CLEAR-WSI (Constant Length Embedding & Automatic Retrieval), a fully automated pathology reverse image search engine that combines Vision Transformer foundation models for histopathology with attention-based multiple instance learning (AttentionMIL). The AttentionMIL framework jointly identifies diagnostically relevant whole slide images and predicts slide-level diagnoses. To further improve performance, we introduce a self-reviewing classifier filtering mechanism: retrieved candidates are filtered according to their predicted labels, which in most cases outperforms class-informed filters. On two public datasets, CAMELYON16 (lymph node metastases) and BRACS (breast cancer subtypes), our method establishes new state-of-the-art results, improving from 77.49% to 89.92% on CAMELYON16, from 54.12% to 75.86% on BRACS level-1, and from 36.47% to 51.72% on BRACS level-2. Our general-purpose, annotation-free, dataset-agnostic search engine scales across diverse data sources and is openly available: https://github.com/youssefwally/CLEAR-WSI.
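The two ideas named in the abstract, a constant-length slide embedding obtained through attention pooling and retrieval filtered by predicted labels, can be illustrated with a minimal NumPy sketch. This is not the CLEAR-WSI implementation: the function names `attention_pool` and `retrieve`, the random projection parameters `V` and `w`, the toy dimensions, and the fallback to unfiltered search are all assumptions made for illustration, and the filter is assumed to keep only database slides whose predicted label matches the query's predicted label. In practice the patch embeddings would come from a histopathology foundation model and the attention parameters would be learned.

```python
import numpy as np


def attention_pool(patches, V, w):
    """Collapse (n_patches, d) patch embeddings into one (d,) slide embedding.

    Uses the standard attention-MIL form: softmax over w^T tanh(V^T h_i).
    V and w are hypothetical parameters; a trained model would learn them.
    """
    scores = np.tanh(patches @ V) @ w            # one score per patch
    scores = scores - scores.max()               # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ patches, weights            # weighted mean, attention weights


def retrieve(query_emb, query_label, db_embs, db_labels, k=5):
    """Return indices of the top-k cosine neighbours among database entries
    whose predicted label equals the query's predicted label (assumed filter)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q
    keep = np.flatnonzero(db_labels == query_label)
    if keep.size == 0:                           # assumed fallback: no filtering
        keep = np.arange(len(db_labels))
    order = keep[np.argsort(-sims[keep])]        # most similar first
    return order[:k]


# Toy data: six "slides" with different patch counts but a fixed embedding size.
rng = np.random.default_rng(0)
d, hidden = 16, 8
V = rng.normal(size=(d, hidden))
w = rng.normal(size=hidden)
slides = [rng.normal(size=(rng.integers(5, 20), d)) for _ in range(6)]

pooled = [attention_pool(s, V, w) for s in slides]
db_embs = np.stack([emb for emb, _ in pooled])   # shape (6, d) regardless of patch count
db_labels = np.array([0, 1, 0, 1, 0, 1])         # stand-in for classifier predictions

hits = retrieve(db_embs[0], db_labels[0], db_embs, db_labels, k=3)
```

Because every slide is reduced to a vector of the same length, slides with very different patch counts can be compared with a single cosine similarity, and the label filter narrows the candidate set before ranking.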
BibTeXKey: WLW+25