Home  | Publications | SGD+23

Neural Architecture Search for Genomic Sequence Data

MCML Authors

Abstract

Deep learning has enabled outstanding progress on bioinformatics datasets and a variety of tasks, such as protein structure prediction, identification of regulatory regions, genome annotation, and interpretation of the noncoding genome. The layout and configuration of neural networks used for these tasks have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Therefore, there is growing interest in automated neural architecture search (NAS) methods in bioinformatics. In this paper, we present a novel search space for NAS algorithms that operate on genome data, thus creating extensions for existing NAS algorithms for sequence data that we name Genome-DARTS, Genome-P-DARTS, Genome-BONAS, Genome-SH, and Genome-RS. Moreover, we introduce two novel NAS algorithms, CWP-DARTS and EDPDARTS, that build on and extend the idea of P-DARTS. We evaluate the presented methods and compare them to manually designed neural architectures on a widely used genome sequence machine learning task to show that NAS methods can be adapted well for bioinformatics sequence datasets. Our experiments show that architectures optimized by our NAS methods outperform manually developed architectures while having significantly fewer parameters.

inproceedings


CIBCB 2023

20th IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology. Eindhoven, The Netherlands, Aug 29-31, 2023.

Authors

A. Scheppach • H. A. GündüzE. Dorigatti • P. C. Münch • A. C. McHardy • B. BischlM. RezaeiM. Binder

Links

DOI

Research Area

 A1 | Statistical Foundations & Explainability

BibTeXKey: SGD+23

Back to Top