We consider the reconstruction of a codeword from multiple noisy copies, each independently corrupted by insertion, deletion, and substitution errors. This problem arises, for example, in DNA data storage. A common code construction uses a concatenated code with an outer linear block code and an inner marker code, decoded via Belief Propagation (BP) and the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm, respectively. However, the complexity of the BCJR algorithm scales exponentially with the number of noisy copies, making reconstruction from more than about four copies infeasible. In this paper, we introduce BCJRFormer, a transformer-based neural inner decoder for marker codes. BCJRFormer achieves error rates comparable to the BCJR algorithm for single-message transmissions while scaling only quadratically with the number of noisy copies. This makes BCJRFormer well suited for DNA data storage, where multiple reads of the same DNA sequence are common. To further reduce the bit error rate, we replace the BP outer decoder with a transformer-based decoder. Combined, these components yield a performant and efficient end-to-end transformer-based pipeline for decoding multiple noisy copies corrupted by insertion, deletion, and substitution errors.
inproceedings
BibTeXKey: SWH25