Recent large-scale single-cell foundation models have shown promise for exploring cellular states, yet they often underperform compared to simpler, domain-specific methods, raising concerns about their broader applicability. A key limitation lies in their reliance on masked language modeling, which is well suited to generative language tasks but poorly aligned with learning the rich cell-level embeddings required in single-cell biology. Moreover, the proliferation of transcriptomic technologies, from whole-transcriptome dissociated assays to image-based targeted profiling, poses a major challenge for cross-platform generalization. Here, following recent advances in machine learning, we move beyond reconstruction-based objectives, which often fail to capture important sample-level variation. We present scConcept (“contrastive cell pre-training”), a transformer-based contrastive learning framework that directly optimizes cell embeddings by contrasting multiple views of the same cell. By replacing gene-level reconstruction with a cell-level identification task, scConcept learns robust representations that are invariant to count distributions and gene panel selection across diverse assays and technologies. To demonstrate the capability of the proposed framework, we pretrain scConcept on a corpus of over 30 million single-cell RNA-seq profiles, comparable to those used by recent foundation models. Our approach outperforms not only state-of-the-art pretrained foundation models but also domain-specific methods on a range of downstream tasks, including cell-type annotation, technology integration, dissociated-to-spatial cell-type transfer, spatial imputation, gene panel optimization, and mapping new technologies onto existing atlases. Our results highlight contrastive pretraining as a powerful alternative to reconstruction-based strategies for single-cell modeling, providing a path toward general-purpose, technology-agnostic cell representations.
misc
BibTeXKey: BTB+25
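
The abstract gives no implementation details, but the core idea, contrasting multiple views of the same cell so that matched views identify each other within a batch, can be illustrated with a minimal sketch. The InfoNCE-style loss, the gene-masking augmentation, the toy MLP encoder, and all names and hyperparameters below are assumptions for illustration, not the paper's actual architecture or training recipe.

```python
# Minimal sketch of a cell-level contrastive pretraining objective of the kind
# the abstract describes (InfoNCE/SimCLR-style). Encoder, augmentations, and
# hyperparameters are illustrative assumptions, not details from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CellEncoder(nn.Module):
    """Toy stand-in for a transformer cell encoder: maps a gene-expression
    vector to an L2-normalized cell embedding."""

    def __init__(self, n_genes: int, d_embed: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes, 512), nn.ReLU(),
            nn.Linear(512, d_embed),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)


def augment(x: torch.Tensor, keep_frac: float = 0.5) -> torch.Tensor:
    """Create one 'view' of each cell by masking a random subset of genes,
    loosely mimicking differing gene-panel selections across technologies."""
    mask = (torch.rand_like(x) < keep_frac).float()
    return x * mask


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Cell-identification loss: each cell's two views must pick each other
    out against all other cells in the batch (InfoNCE / NT-Xent)."""
    logits = z1 @ z2.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))        # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))


# Usage on random data standing in for normalized expression counts.
n_cells, n_genes = 256, 2000
counts = torch.rand(n_cells, n_genes)
encoder = CellEncoder(n_genes)
z1, z2 = encoder(augment(counts)), encoder(augment(counts))
loss = info_nce(z1, z2)
loss.backward()
```

Note on the design point the abstract emphasizes: because the loss depends only on cell embeddings rather than gene-level reconstruction, subsampling genes changes the input view but not the learning target, which is what encourages representations that are invariant to gene panel selection.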