Pangenome graph augmentation from unassembled long reads

Denti, Luca (ORCID:0000000187862276); Bonizzoni, Paola (ORCID:0000000172894988); Brejova, Brona (ORCID:0000000294831766); Chikhi, Rayan (ORCID:0000000310998735); Krannich, Thomas (ORCID:0000000255251849); Vinar, Tomas (ORCID:0000000338983447); Hormozdiari, Fereydoun (ORCID:0000000327039274)

doi:10.1101/2025.02.07.637057

Abstract Pangenomes are becoming increasingly popular data structures for genomics analyses due to their ability to compactly represent the genetic diversity within populations. Constructing a pangenome graph, however, is still a time-consuming and expensive process. A promising approach for pangenome construction consists of progressively augmenting a pangenome graph with additional high-quality assemblies. Currently, there is no method for augmenting a pangenome graph with unassembled reads from newly sequenced samples without first aligning the reads to a reference genome and performing variant calling and genotyping on the new individuals. In this work, we present the first assembly-free and mapping-free approach for augmenting an existing pangenome graph using unassembled long reads from an individual not already present in the pangenome. Our approach consists of finding sample-specific sequences in reads using efficient indexes, clustering reads corresponding to the same novel variant(s), and then building a consensus sequence to be added to the pangenome graph for each variant separately. Using simulated reads based on Human Pangenome Reference Consortium (HPRC) assemblies, we demonstrate the effectiveness of the proposed approach for progressively augmenting the pangenome with long reads, without the need for de novo assembly or predicting genetic variants of the new sample. The software is freely available at https://github.com/ldenti/palss.
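The three steps named in the abstract (finding sample-specific sequences, clustering reads, building a consensus) can be illustrated with a toy sketch. This is not the palss implementation: a plain Python set of k-mers stands in for the efficient indexes, reads are assumed error-free and on the same strand as the pangenome paths, clustering is reduced to grouping by the first novel k-mer, and the consensus is a simple majority vote. All function names and parameters (`kmers`, `specific_segments`, `novel_consensus`, `min_support`) are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of all k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_segments(read, index, k):
    """Return maximal substrings of read covered only by k-mers absent from index."""
    segments = []
    start = None
    for i in range(len(read) - k + 1):
        novel = read[i:i + k] not in index
        if novel and start is None:
            start = i  # a run of novel k-mers begins here
        elif not novel and start is not None:
            # the last novel k-mer started at i - 1 and ends at i + k - 2
            segments.append(read[start:i + k - 1])
            start = None
    if start is not None:
        segments.append(read[start:])  # run extends to the end of the read
    return segments


def novel_consensus(reads, pangenome_paths, k, min_support=2):
    """Toy pipeline: index paths, extract novel segments, cluster, vote."""
    # Stand-in for an efficient index: every k-mer spelled by a pangenome path.
    index = set()
    for path in pangenome_paths:
        index |= kmers(path, k)

    # Cluster segments by their first k-mer (assumption: a shared anchor
    # means the reads cover the same novel variant).
    clusters = defaultdict(list)
    for read in reads:
        for segment in specific_segments(read, index, k):
            clusters[segment[:k]].append(segment)

    # Consensus by majority vote, keeping only sufficiently supported clusters.
    consensus = []
    for segments in clusters.values():
        if len(segments) >= min_support:
            consensus.append(Counter(segments).most_common(1)[0][0])
    return sorted(consensus)


reference = "AAAACCCCGGGGTTTT"
reads = [
    "AAAACCCCTATAGGGGTTTT",  # carries a TATA insertion
    "AACCCCTATAGGGGTT",      # shorter read over the same insertion
    "AAAACCCCGGGGTTTT",      # matches the pangenome, contributes nothing
]
print(novel_consensus(reads, [reference], k=4))
```

Each reported segment includes up to k-1 flanking bases on either side of the novel sequence, which is what would let it be anchored back onto existing graph nodes. Real long reads contain sequencing errors, so exact set membership and exact-match voting would need to be replaced with error-tolerant matching and a proper consensus step.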
