
Research Group Julien Gagneur


Julien Gagneur

Prof. Dr.

Principal Investigator

Computational Molecular Medicine

Julien Gagneur is Assistant Professor for Computational Biology at TU Munich.

His research focuses on delineating the genetic basis of gene regulation and its implications in disease. To this end, he develops statistical and machine learning algorithms and works with experimentalists to design novel experimental approaches. His group also develops strategies to pinpoint the cause of genetic disorders by integrating data from genetics and 'multiomics' disciplines such as transcriptomics and proteomics.

Team members @MCML

PhD Students

Pedro Tomaz da Silva

Computational Molecular Medicine

Johannes Hingerl

Computational Molecular Medicine

Alexander Karollus

Computational Molecular Medicine

Publications @MCML

2025


[7]
J. Hingerl, L. D. Martens, A. Karollus, T. Manz, J. D. Buenrostro, F. J. Theis and J. Gagneur.
scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution.
Preprint (Mar. 2025). DOI
Abstract

Understanding how regulatory DNA elements shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build unifying models of gene regulation capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi as a foundation model, equip it with a cell-specific decoder, and fine-tune its sequence embeddings. Specifically, we condition the decoder on the cell position in a precomputed single-cell embedding resulting in strong generalization capability. Applied to a hematopoiesis dataset, scooby recapitulates cell-specific expression levels of held-out genes and cells, and identifies regulators and their putative target genes through in silico motif deletion. Moreover, accurate variant effect prediction with scooby allows for breaking down bulk eQTL effects into single-cell effects and delineating their impact on chromatin accessibility and gene expression. We anticipate scooby to aid unraveling the complexities of gene regulation at the resolution of individual cells.
Competing Interest Statement: J.D.B. holds patents related to ATAC-seq and is an SAB member of Camp4 and seqWell. F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd and Omniscope Ltd, and has ownership interest in Dermagnostix GmbH and Cellarity.

MCML Authors

Johannes Hingerl

Computational Molecular Medicine

Alexander Karollus

Computational Molecular Medicine

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


2024


[6]
J. Hingerl, A. Karollus and J. Gagneur.
Flashzoi: An enhanced Borzoi model for accelerated genomic analysis.
Preprint (Dec. 2024). DOI
Abstract

Accurately predicting how DNA sequence drives gene regulation and how genetic variants alter gene expression is a central challenge in genomics. Borzoi, which models over ten thousand genomic assays including RNA-seq coverage from over half a megabase of sequence context alone, promises to become an important foundation model in regulatory genomics, both for massively annotating variants and for further model development. However, its reliance on handcrafted, relative positional encodings within the transformer architecture limits its computational efficiency. Here we present Flashzoi, an enhanced Borzoi model that leverages rotary positional encodings and FlashAttention-2. This achieves over 3-fold faster training and inference and up to 2.4-fold reduced memory usage, while maintaining or improving accuracy in modeling various genomic assays including RNA-seq coverage, predicting variant effects, and enhancer-promoter linking. Flashzoi's improved efficiency facilitates large-scale genomic analyses and opens avenues for exploring more complex regulatory mechanisms and modeling.
Competing Interest Statement: The authors have declared no competing interest.

MCML Authors

Johannes Hingerl

Computational Molecular Medicine

Alexander Karollus

Computational Molecular Medicine

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[5]
B. Clarke, E. Holtkamp, H. Öztürk, M. Mück, M. Wahlberg, K. Meyer, F. Munzlinger, F. Brechtmann, F. R. Hölzlwimmer, J. Lindner, Z. Chen, J. Gagneur and O. Stegle.
Integration of variant annotations using deep set networks boosts rare variant association testing.
Nature Genetics 56 (Sep. 2024). DOI
Abstract

Rare genetic variants can have strong effects on phenotypes, yet accounting for rare variants in genetic analyses is statistically challenging due to the limited number of allele carriers and the burden of multiple testing. While rich variant annotations promise to enable well-powered rare variant association tests, methods integrating variant annotations in a data-driven manner are lacking. Here we propose deep rare variant association testing (DeepRVAT), a model based on set neural networks that learns a trait-agnostic gene impairment score from rare variant annotations and phenotypes, enabling both gene discovery and trait prediction. On 34 quantitative and 63 binary traits, using whole-exome-sequencing data from UK Biobank, we find that DeepRVAT yields substantial gains in gene discoveries and improved detection of individuals at high genetic risk. Finally, we demonstrate how DeepRVAT enables calibrated and computationally efficient rare variant tests at biobank scale, aiding the discovery of genetic risk factors for human disease traits.

MCML Authors

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[4]
P. Silva, A. Karollus, J. Hingerl, G. Galindez, N. Wagner, X. Hernandez-Alias, D. Incarnato and J. Gagneur.
Nucleotide dependency analysis of DNA language models reveals genomic functional elements.
Preprint (Jul. 2024). DOI
Abstract

Deciphering how nucleotides in genomes encode regulatory instructions and molecular machines is a long-standing goal in biology. DNA language models (LMs) implicitly capture functional elements and their organization from genomic sequences alone by modeling probabilities of each nucleotide given its sequence context. However, using DNA LMs for discovering functional genomic elements has been challenging due to the lack of interpretable methods. Here, we introduce nucleotide dependencies which quantify how nucleotide substitutions at one genomic position affect the probabilities of nucleotides at other positions. We generated genome-wide maps of pairwise nucleotide dependencies within kilobase ranges for animal, fungal, and bacterial species. We show that nucleotide dependencies indicate deleteriousness of human genetic variants more effectively than sequence alignment and DNA LM reconstruction. Regulatory elements appear as dense blocks in dependency maps, enabling the systematic identification of transcription factor binding sites as accurately as models trained on experimental binding data. Nucleotide dependencies also highlight bases in contact within RNA structures, including pseudoknots and tertiary structure contacts, with remarkable accuracy. This led to the discovery of four novel, experimentally validated RNA structures in Escherichia coli. Finally, using dependency maps, we reveal critical limitations of several DNA LM architectures and training sequence selection strategies by benchmarking and visual diagnosis. Altogether, nucleotide dependency analysis opens a new avenue for discovering and studying functional elements and their interactions in genomes.
Competing Interest Statement: The authors have declared no competing interest.

MCML Authors

Alexander Karollus

Computational Molecular Medicine

Johannes Hingerl

Computational Molecular Medicine

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


2023


[3]
F. Brechtmann, T. Bechtler, S. Londhe, C. Mertes and J. Gagneur.
Evaluation of input data modality choices on functional gene embeddings.
NAR Genomics and Bioinformatics 5.4 (Dec. 2023). DOI
Abstract

Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein–protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype–gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein–protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.

MCML Authors

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[2]
A. Karollus, J. Hingerl, D. Gankin, M. Grosshauser, K. Klemon and J. Gagneur.
Species-aware DNA language models capture regulatory elements and their evolution.
Genome Biology 35.83 (Apr. 2023). DOI
Abstract

Background: The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution.
Results: Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery.
Conclusions: Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.

MCML Authors

Alexander Karollus

Computational Molecular Medicine

Johannes Hingerl

Computational Molecular Medicine

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine


[1]
P. T. da Silva, Y. Zhang, E. Theodorakis, L. D. Martens, V. A. Yépez, V. Pelechano and J. Gagneur.
Cellular energy regulates mRNA translation and degradation in a codon-specific manner.
Preprint (2023). DOI
Abstract

Background: Codon optimality is a major determinant of mRNA translation and degradation rates. However, whether and through which mechanisms its effects are regulated remains poorly understood.
Results: Here we show that codon optimality associates with up to 2-fold change in mRNA stability variations between human tissues, and that its effect is attenuated in tissues with high energy metabolism and amplifies with age. Biochemical modeling and perturbation data through oxygen deprivation and ATP synthesis inhibition reveal that cellular energy variations non-uniformly affect the decoding kinetics of different codons.
Conclusions: This new mechanism of codon effect regulation, independent of tRNA regulation, provides a fundamental mechanistic link between cellular energy metabolism and eukaryotic gene expression.

MCML Authors

Pedro Tomaz da Silva

Computational Molecular Medicine

Julien Gagneur

Prof. Dr.

Computational Molecular Medicine