Applying the Safe-And-Complete Framework to Practical Genome Assembly

Schmidt, Sebastian; Toivonen, Santeri; Medvedev, Paul; Tomescu, Alexandru I

doi:10.4230/LIPIcs.WABI.2024.8


Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and the C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational costs and either no or a small increase in the number of misassemblies.
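For readers unfamiliar with the baseline that the modified assemblers replace, the sketch below computes unitigs under the common textbook definition: maximal paths whose internal nodes each have exactly one incoming and one outgoing edge. It is a minimal illustration under that assumption only. It is not the code used in wtdbg2 or Flye, nor the simple omnitig algorithm from the paper. The function name `unitigs` and the edge-list input format are hypothetical.

```python
from collections import defaultdict


def unitigs(edges):
    """Hypothetical helper: return maximal non-branching paths (textbook unitigs).

    `edges` is a list of (u, v) pairs of a directed graph; each path is
    returned as a list of nodes. Isolated cycles are returned with their
    start node repeated at the end.
    """
    out_adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        out_adj[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    def one_in_one_out(x):
        # A node can be an internal node of a unitig only if it does not branch.
        return indeg[x] == 1 and len(out_adj[x]) == 1

    paths = []
    visited = set()  # non-branching nodes already placed on some path
    for v in sorted(nodes):
        if one_in_one_out(v):
            continue
        # Start one unitig on every out-edge of a branching (or tip) node.
        for w in out_adj[v]:
            path = [v, w]
            while one_in_one_out(w):
                visited.add(w)
                w = out_adj[w][0]
                path.append(w)
            paths.append(path)

    # Any non-branching node not yet visited lies on an isolated cycle.
    for v in sorted(nodes):
        if one_in_one_out(v) and v not in visited:
            cycle = [v]
            visited.add(v)
            w = out_adj[v][0]
            while w != v:
                visited.add(w)
                cycle.append(w)
                w = out_adj[w][0]
            cycle.append(v)
            paths.append(cycle)
    return paths


if __name__ == "__main__":
    example = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("X", "Y"), ("Y", "X")]
    print(unitigs(example))
```

On the example graph, the unitig starting at `A` stops at `C` because `C` has two outgoing edges. Unitigs stop at every branching node of this kind. According to the abstract, the omnitig approach aims to improve contiguity, which it does by continuing past some branching nodes where doing so remains safe.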

Award ID(s):: 2138585, 1931531

PAR ID:: 10616429


Editor(s):: Pissis, Solon P; Sung, Wing-Kin

Publisher / Repository:: Schloss Dagstuhl – Leibniz-Zentrum für Informatik

Date Published:: 2024-01-01

Volume:: 312

ISSN:: 1868-8969

ISBN:: 978-3-95977-340-9

Page Range / eLocation ID:: 8:1-8:16

Subject(s) / Keyword(s):: Genome assembly; Omnitigs; Safe-and-complete framework; graph algorithm; HiFi sequencing data; Assembly evaluation; Applied computing → Computational biology; Mathematics of computing → Paths and connectivity problems; Theory of computation → Graph algorithms analysis

Format(s):: application/pdf; 16 pages; 1,032,186 bytes

Right(s):: Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.4230/LIPIcs.WABI.2024.8
