MCML - OneProt: Towards Multi-Modal Protein Foundation Models


MCML Authors

Dr. Vincent Fortuin, Associate, Bayesian Deep Learning

Abstract

Recent AI advances have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal AI for proteins that integrates structural, sequence, alignment, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of modality encoders along protein sequences. It demonstrates strong performance in retrieval tasks and surpasses state-of-the-art methods in various downstream tasks, including metal ion binding classification, gene-ontology annotation, and enzyme function prediction. This work expands multi-modal capabilities in protein models, paving the way for applications in drug discovery, biocatalytic reaction planning, and protein engineering.
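The alignment idea in the abstract can be illustrated with a minimal sketch. It assumes an ImageBind-style symmetric InfoNCE (contrastive) objective in which sequence embeddings act as the anchor and each other modality is pulled toward its paired sequence. The function names, the temperature value, and the toy embeddings are hypothetical and are not taken from the OneProt code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, others, temperature=0.07):
    """Mean cross-entropy of matching anchors[i] to others[i] within the batch."""
    a = [l2_normalize(v) for v in anchors]
    b = [l2_normalize(v) for v in others]
    loss = 0.0
    for i, ai in enumerate(a):
        # Similarity of anchor i to every candidate, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(a)


def symmetric_alignment_loss(seq_emb, mod_emb, temperature=0.07):
    """Sum of both directions: sequence-to-modality and modality-to-sequence."""
    return info_nce(seq_emb, mod_emb, temperature) + info_nce(mod_emb, seq_emb, temperature)


# Toy batch of three proteins (hypothetical 2-D embeddings).
seq_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
struct_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Same structure embeddings, but paired with the wrong sequences.
struct_shuffled = [struct_aligned[1], struct_aligned[2], struct_aligned[0]]

loss_aligned = symmetric_alignment_loss(seq_emb, struct_aligned)
loss_shuffled = symmetric_alignment_loss(seq_emb, struct_shuffled)
print(loss_aligned < loss_shuffled)  # correctly paired embeddings give the lower loss
```

In a setup like this, one such loss term per non-sequence modality (structure, alignment, binding site) would be summed, so that all encoders share a latent space through the sequence anchor without needing every pair of modalities to co-occur in the training data.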


Preprint

Nov. 2024

Authors

K. Flöge • S. Udayakumar • J. Sommer • M. Piraud • S. Kesselheim • V. Fortuin • S. Günnemann • K. J. van der Weg • H. Gohlke • E. Merdivan • A. Bazarova

Links

arXiv

Research Area

A1 | Statistical Foundations & Explainability

BibTeXKey: FUS+24

