Posted Content
Pangenome graph augmentation from unassembled long reads
Denti, Luca [Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia] (ORCID:0000000187862276); Bonizzoni, Paola [Department of Computer Science, University of Milano-Bicocca, Milan, Italy] (ORCID:0000000172894988); Brejova, Brona [Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia] (ORCID:0000000294831766); Chikhi, Rayan [Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, F-75015 Paris, France] (ORCID:0000000310998735); Krannich, Thomas [Genome Competence Center, Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany] (ORCID:0000000255251849); Vinar, Tomas [Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia] (ORCID:0000000338983447); Hormozdiari, Fereydoun [Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA, USA] (ORCID:0000000327039274)
<title>Abstract</title>
Pangenomes are becoming increasingly popular data structures for genomics analyses due to their ability to compactly represent the genetic diversity within populations. Constructing a pangenome graph, however, is still a time-consuming and expensive process. A promising approach for pangenome construction consists of progressively augmenting a pangenome graph with additional high-quality assemblies. Currently, there is no method for augmenting a pangenome graph with unassembled reads from newly sequenced samples without first aligning the reads to a reference genome and performing variant calling and genotyping on the new individuals. In this work, we present the first assembly-free and mapping-free approach for augmenting an existing pangenome graph using unassembled long reads from an individual not already present in the pangenome.
Our approach consists of finding sample-specific sequences in reads using efficient indexes, clustering reads corresponding to the same novel variant(s), and then building a consensus sequence to be added to the pangenome graph for each variant separately. Using simulated reads based on Human Pangenome Reference Consortium (HPRC) assemblies, we demonstrate the effectiveness of the proposed approach for progressively augmenting the pangenome with long reads, without the need for <italic>de novo</italic> assembly or predicting genetic variants of the new sample. The software is freely available at <ext-link ext-link-type='uri' href='https://github.com/ldenti/palss'>https://github.com/ldenti/palss</ext-link>.
bioRxiv; 2025-02-08; 10672681; https://doi.org/10.1101/2025.02.07.637057; 2042518; National Science Foundation
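The three steps named in the abstract (finding sample-specific sequences, clustering reads, building a consensus) can be illustrated with a toy sketch. This is not the palss implementation. It assumes a plain Python set of k-mers in place of the efficient indexes, groups reads that share at least one novel k-mer using union-find, and takes a column-wise majority vote over reads that are assumed to be equal-length and already aligned. All function names, the k-mer size, and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer size (assumption, chosen only for the example)


def kmers(seq, k=K):
    """Return the set of all k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(read, known, k=K):
    """k-mers of the read that are absent from the pangenome k-mer set."""
    return kmers(read, k) - known


def cluster_reads(reads, known, k=K):
    """Group reads sharing at least one novel k-mer (union-find).

    Reads without any novel k-mer are dropped.
    """
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # novel k-mer -> index of the first read containing it
    for i, read in enumerate(reads):
        for km in novel_kmers(read, known, k):
            if km in first_seen:
                parent[find(i)] = find(first_seen[km])
            else:
                first_seen[km] = i

    clusters = defaultdict(list)
    for i, read in enumerate(reads):
        if novel_kmers(read, known, k):
            clusters[find(i)].append(read)
    return list(clusters.values())


def consensus(seqs):
    """Column-wise majority vote; assumes equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Hypothetical toy data: one pangenome sequence and five error-prone reads.
known = kmers("AAAAACCCCCGGGGG")
reads = [
    "ACCCCCTTTGGGGG",  # carries a TTT insertion
    "ACCCCCTTTGGGGG",  # same insertion
    "ACCCCCTATGGGGG",  # same insertion with one sequencing error
    "AAAAACCCCC",      # matches the pangenome, no novel k-mers
    "AAAGAACCCC",      # a different novel variant
]
for cluster in cluster_reads(reads, known):
    print(len(cluster), consensus(cluster))
```

A real tool would need a compressed index over the graph, error-tolerant clustering, and a proper multiple-alignment consensus; the sketch only shows how the three stages feed into each other.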