
Hyperbolic Contrastive Learning for Document Representations – A Multi-View Approach With Paragraph-Level Similarities

Abstract

Self-supervised learning (SSL) has gained prominence due to the increasing availability of unlabeled data and advances in computational efficiency, revolutionizing natural language processing with pre-trained language models like BERT and GPT. Representation learning, a core concept in SSL, aims to reduce data dimensionality while preserving meaningful aspects. Conventional SSL methods typically embed data in Euclidean space. However, recent research has revealed that alternative geometries can hold even richer representations, unlocking more meaningful insights from the data. Motivated by this, we propose two novel methods for integrating hyperbolic geometry into self-supervised learning for efficient document embedding. First, we present a method that directly incorporates hyperbolic geometry into the standard Euclidean contrastive learning framework. Additionally, we propose a multi-view hyperbolic contrastive learning framework that contrasts both documents and paragraphs. Our findings demonstrate that contrasting only paragraphs, rather than entire documents, can lead to superior efficiency and effectiveness.
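The core idea described in the abstract, a contrastive objective computed with a hyperbolic rather than Euclidean similarity, could be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an InfoNCE-style loss with negative Poincaré-ball distance as the similarity score, and the function names and temperature value are hypothetical.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    # Geodesic distance in the Poincare ball model of hyperbolic space:
    # d(u, v) = arccosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))).
    sq = np.sum((u - v) ** 2, axis=-1)
    nu = np.clip(np.sum(u ** 2, axis=-1), 0.0, 1.0 - eps)
    nv = np.clip(np.sum(v ** 2, axis=-1), 0.0, 1.0 - eps)
    return np.arccosh(1.0 + 2.0 * sq / ((1.0 - nu) * (1.0 - nv)))

def hyperbolic_contrastive_loss(anchors, positives, temperature=0.1):
    # InfoNCE-style loss: each anchor's positive is the matching row of
    # `positives`; every other row in the batch serves as a negative.
    # Similarity is the negative hyperbolic distance.
    n = anchors.shape[0]
    d = np.array([[poincare_distance(anchors[i], positives[j])
                   for j in range(n)] for i in range(n)])
    logits = -d / temperature
    # Log-softmax over candidates, then negative log-likelihood of the
    # diagonal (the true positive pairs).
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(n), np.arange(n)])
```

In the multi-view setting the abstract describes, `anchors` and `positives` would hold embeddings of two views of the same text (e.g. a document and one of its paragraphs, or two paragraphs of the same document), with embeddings constrained to lie inside the unit ball.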

ECAI 2024

27th European Conference on Artificial Intelligence. Santiago de Compostela, Spain, Oct 19-24, 2024.
A Conference

Authors

J. Nam • I. Chalkidis • M. Rezaei

Links

DOI

Research Area

 A1 | Statistical Foundations & Explainability

BibTeXKey: NCR24
