
Cross-Modal Common Representation Learning With Triplet Loss Functions

MCML Authors


David Rügamer

Prof. Dr.

Principal Investigator


Bernd Bischl

Prof. Dr.

Director

Abstract

Common representation learning (CRL) learns a shared embedding between two or more modalities to improve performance on a given task over using only one modality. CRL from different data types such as images and time-series data (e.g., audio or text data) requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the triplet loss, which uses positive and negative identities to create sample pairs with different labels, for CRL between image and time-series modalities. By adapting the triplet loss for CRL, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information from the auxiliary (image classification) task. Our experiments on synthetic data and handwriting recognition data from sensor-enhanced pens show improved classification accuracy, faster convergence, and better generalizability.
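To illustrate the core idea, here is a minimal sketch of the standard triplet loss as it could be applied across modalities. This is not the authors' implementation; the embeddings, labels, and margin value below are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive embedding toward the
    anchor and push the negative embedding at least `margin` farther
    away, i.e. max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Cross-modal pairing (illustrative): the anchor comes from one modality
# (e.g. a time-series embedding), while the positive and negative come
# from the other modality (e.g. image embeddings) with matching and
# non-matching labels, respectively.
anchor   = np.array([1.0, 0.0])
positive = np.array([1.1, 0.1])   # same label, other modality
negative = np.array([-1.0, 0.5])  # different label, other modality

loss = triplet_loss(anchor, positive, negative)
```

Minimizing this loss over such cross-modal triplets draws embeddings of the same identity together across modalities while separating different identities, which is the mechanism the paper exploits for the shared representation.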



Preprint

Mar. 2022

Authors

F. Ott • D. Rügamer • L. Heublein • B. Bischl • C. Mutschler

Links

DOI

Research Area

 A1 | Statistical Foundations & Explainability

BibTeX Key: ORH+22
