
Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning

MCML Authors


Björn Ommer

Prof. Dr.

Principal Investigator

Abstract

Learning compact image embeddings that yield semantic similarities between images and that generalize to unseen test classes is at the core of deep metric learning (DML). Finding a mapping from a rich, localized image feature map onto a compact embedding vector is challenging: although similarity emerges between tuples of images, DML approaches marginalize out information in an individual image before considering another image to which similarity is to be computed. Instead, we propose during training to condition the embedding of an image on the image we want to compare it to. Rather than embedding by a simple pooling as in standard DML, we use cross-attention so that one image can identify relevant features in the other image. Consequently, the attention mechanism establishes a hierarchy of conditional embeddings that gradually incorporates information about the tuple to steer the representation of an individual image. The cross-attention layers bridge the gap between the original unconditional embedding and the final similarity and allow backpropagation to update encodings more directly than through a lossy pooling layer. At test time we use the resulting improved unconditional embeddings, thus requiring no additional parameters or computational overhead. Experiments on established DML benchmarks show that our cross-attention conditional embedding during training improves the underlying standard DML pipeline significantly so that it outperforms the state-of-the-art.
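The core mechanism the abstract describes — conditioning one image's embedding on another via cross-attention, with queries drawn from the comparison image and keys/values from the image being embedded — can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and weight names (`w_q`, `w_k`, `w_v` are hypothetical projection matrices), not the authors' actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_embedding(feat_a, feat_b, w_q, w_k, w_v):
    """Embed image A conditioned on image B.

    feat_a, feat_b: (n_patches, d) localized feature maps.
    Queries come from B, keys/values from A, so B selects
    which regions of A contribute to A's conditional embedding.
    """
    q = feat_b @ w_q                                         # (n_b, d_k)
    k = feat_a @ w_k                                         # (n_a, d_k)
    v = feat_a @ w_v                                         # (n_a, d_v)
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)   # (n_b, n_a)
    conditioned = attn @ v                                   # (n_b, d_v)
    # pool over query positions into one conditional embedding of A
    return conditioned.mean(axis=0)

rng = np.random.default_rng(0)
d, dk = 8, 4
fa = rng.normal(size=(16, d))   # feature map of image A
fb = rng.normal(size=(16, d))   # feature map of image B
wq, wk, wv = (rng.normal(size=(d, dk)) for _ in range(3))
emb = cross_attention_embedding(fa, fb, wq, wk, wv)
print(emb.shape)  # (4,)
```

Per the abstract, such conditional embeddings are used only during training to shape the encoder; at test time the standard unconditional embedding is kept, so inference incurs no extra cost.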

inproceedings


CVPR 2023

IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada, Jun 18-23, 2023.
A* Conference

Authors

D. Kotovenko • P. Ma • T. Milbich • B. Ommer

Links

DOI

Research Area

 B1 | Computer Vision

BibTeXKey: KMM+23
