
Decoupling Common and Unique Representations for Multimodal Self-Supervised Learning

MCML Authors

Chenying Liu

→ Group Xiaoxiang Zhu
Data Science in Earth Observation

Zhitong Xiong

Dr.

* Former Member

→ Group Xiaoxiang Zhu
Data Science in Earth Observation

Xiaoxiang Zhu

Prof. Dr.

Core PI

Data Science in Earth Observation

Abstract

The increasing availability of multi-sensor data has sparked wide interest in multimodal self-supervised learning. However, most existing approaches learn only common representations across modalities while ignoring intra-modal training and modality-unique representations. We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning. By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities. We evaluate DeCUR in three common multimodal scenarios (radar-optical, RGB-elevation, and RGB-depth) and demonstrate consistent improvements across architectures, in both multimodal and modality-missing settings. With thorough experiments and comprehensive analysis, we hope this work can provide valuable insights and raise more interest in researching the hidden relationships of multimodal representations.
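The decoupling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Barlow Twins-style redundancy-reduction loss on cross-correlation matrices, and every name in it (`cross_correlation`, `redundancy_reduction`, `decoupled_loss`, `n_common`, `lam`) is hypothetical. Within each modality, all embedding dimensions of two augmented views are aligned. Across modalities, the first `n_common` dimensions are aligned and the remaining "unique" dimensions are decorrelated.

```python
import numpy as np


def cross_correlation(za, zb, eps=1e-9):
    """Cross-correlation matrix (D x D) of two embedding batches of shape (N, D)."""
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + eps)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + eps)
    return za.T @ zb / za.shape[0]


def redundancy_reduction(c, target_diag, lam=0.005):
    """Pull the diagonal of c toward target_diag and the off-diagonal toward 0."""
    diag = np.diagonal(c)
    on_diag = np.sum((diag - target_diag) ** 2)
    off_diag = np.sum((c - np.diag(diag)) ** 2)
    return on_diag + lam * off_diag


def decoupled_loss(z1a, z1b, z2a, z2b, n_common, lam=0.005):
    """Illustrative loss; z1a/z1b are two views of modality 1, z2a/z2b of modality 2."""
    # Intra-modal terms: align all dimensions between two views of one modality.
    intra = (
        redundancy_reduction(cross_correlation(z1a, z1b), 1.0, lam)
        + redundancy_reduction(cross_correlation(z2a, z2b), 1.0, lam)
    )
    # Cross-modal term: split the correlation matrix into common and unique blocks.
    c = cross_correlation(z1a, z2a)
    common = redundancy_reduction(c[:n_common, :n_common], 1.0, lam)  # align
    unique = redundancy_reduction(c[n_common:, n_common:], 0.0, lam)  # decorrelate
    return intra + common + unique
```

In practice the four embedding batches would come from modality-specific encoders applied to augmented views of the same scene. The split point `n_common` and the off-diagonal weight `lam` are hyperparameters of this sketch, and the values shown are placeholders.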

inproceedings WAB+24

ECCV 2024

18th European Conference on Computer Vision. Milano, Italy, Sep 29-Oct 04, 2024.

Authors

Y. Wang • C. M. Albrecht • N. A. A. Braham • C. Liu • Z. Xiong • X. Zhu


Research Area

C3 | Physics and Geo Sciences

BibTeXKey: WAB+24
