Search for: All records

Award ID contains: 2029552


Total Resources: 19
Resource Type: Conference Paper (11), Journal Article (8)
Availability: Full Text / Resource Available (19)



Pangenomic genotyping with the marker array

https://doi.org/10.1186/s13015-023-00225-3

Mun, Taher ; Vaddadi, Naga Sai Kavya ; Langmead, Ben ( May 2023 , Algorithms for Molecular Biology)

Abstract
We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to a single linear reference. rowbowt can infer accurate genotypes in less time and memory compared to existing graph-based methods. The method is implemented in the open source software tool rowbowt, available at https://github.com/alshai/rowbowt.

SPUMONI 2: improved classification using a pangenome index of minimizer digests

https://doi.org/10.1186/s13059-023-02958-1

Ahmed, Omar Y. ; Rossi, Massimiliano ; Gagie, Travis ; Boucher, Christina ; Langmead, Ben ( May 2023 , Genome Biology)

Abstract
Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2’s index is 65 times smaller than minimap2’s for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
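To illustrate the minimizer idea this abstract relies on, here is a minimal Python sketch of minimizer digestion under stated assumptions. K-mers are compared lexicographically rather than by a hash, ties go to the leftmost k-mer, and the names minimizer_digest, digest_string, k and w are hypothetical. It is not SPUMONI 2's implementation or API. It only shows how a sequence shrinks to a shorter run of sampled k-mers, over which a smaller index could be built.

```python
def minimizer_digest(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) minimizers of seq, each chosen occurrence once.

    Every window of w consecutive k-mers contributes its lexicographically
    smallest k-mer (leftmost on ties); adjacent windows that choose the same
    occurrence are reported only once.
    """
    if k < 1 or w < 1:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picks: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal offset, which gives leftmost tie-breaking.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Chosen positions never decrease, so comparing with the last pick deduplicates.
        if not picks or picks[-1][0] != pos:
            picks.append((pos, kmers[pos]))
    return picks


def digest_string(seq: str, k: int, w: int) -> str:
    """Concatenate the chosen minimizers into a shorter 'digested' sequence."""
    return "".join(kmer for _, kmer in minimizer_digest(seq, k, w))
```

For example, minimizer_digest("ACGTACGA", 2, 3) picks the k-mers at positions 0, 1 and 4, so the digest is "ACCGAC". For the digests of a reference and a read to be comparable, both must be built with the same k and w.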

Augmented Thresholds for MONI

https://doi.org/10.1109/DCC55655.2023.00035

Martínez-Guardiola, César ; Brown, Nathaniel K. ; Silva-Coira, Fernando ; Köppl, Dominik ; Gagie, Travis ; Ladra, Susana ( March 2023 , IEEE Data Compression Conference (DCC))

Full Text Available
Recursive Prefix-Free Parsing for Building Big BWTs

https://doi.org/10.1109/DCC55655.2023.00014

Oliva, Marco ; Gagie, Travis ; Boucher, Christina ( March 2023 , IEEE Data Compression Conference (DCC))

Full Text Available
Efficient taxa identification using a pangenome index

https://doi.org/10.1101/gr.277642.123

Ahmed, Omar ; Rossi, Massimiliano ; Boucher, Christina ; Langmead, Ben ( January 2023 , Genome Research)

Full Text Available
MONI Can Find k-MEMs

Tatarnikov, Igor ; Shahrabi Farahani, Ardavan ; Kashgouli, Sana ; Gagie, Travis ( January 2023 , CPM)

Full Text Available
LZ77 via Prefix-Free Parsing

Hong, Aaron ; Rossi, Massimiliano ( January 2023 , SIAM ALENEX)

Full Text Available
CSTs for Terabyte-Sized Data

https://doi.org/10.1109/DCC52660.2022.00017

Oliva, Marco ; Cenzato, Davide ; Rossi, Massimiliano ; Lipták, Zsuzsanna ; Gagie, Travis ; Boucher, Christina ( March 2022 , IEEE Data Compression Conference (DCC))

Abstract
Generating pangenomic datasets is becoming increasingly common, but there are still few tools able to handle them and even fewer accessible to non-specialists. Building compressed suffix trees (CSTs) for pangenomic datasets is still a major challenge but could be enormously beneficial to the community. In this paper, we present a method, which we refer to as RePFP-CST, for building CSTs in a manner that is scalable. To accomplish this, we show how to build a CST directly from VCF files without decompressing them, and how to prune, from the prefix-free parse (PFP), phrase boundaries whose removal reduces the total size of the dictionary and the parse. We show that these improvements reduce the time and space required for the construction of the CST, and the memory footprint of the finished CST, enabling us to build a CST for a terabyte of DNA for the first time in the literature.
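The construction in this abstract builds on prefix-free parsing. The Python sketch below is a rough illustration only and computes a plain, non-recursive prefix-free parse. A length-w window counts as a trigger string when a polynomial hash of it is divisible by p. The text is padded with w copies of an assumed sentinel '#', and consecutive trigger strings delimit phrases that overlap by w characters. Several details are assumptions: the hash is recomputed for each window rather than being a rolling Karp-Rabin hash, and the sentinel and the names window_hash, prefix_free_parse and unparse are hypothetical. The sketch omits the boundary pruning and the VCF-based construction that the abstract describes.

```python
def window_hash(window: str, base: int = 257, mod: int = 1_000_000_007) -> int:
    """Deterministic polynomial hash of one window (recomputed, not rolling)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod
    return h


def prefix_free_parse(text: str, w: int, p: int) -> tuple[list[str], list[int]]:
    """Return (dictionary, parse): sorted distinct phrases and the phrase ranks."""
    if w < 1 or p < 1:
        raise ValueError("w and p must be positive")
    if "#" in text:
        raise ValueError("'#' is reserved as the padding sentinel")
    padded = "#" * w + text + "#" * w
    sentinel_window = "#" * w
    # Trigger strings: the all-sentinel windows at both ends, plus any
    # length-w window whose hash is divisible by p.
    triggers = [
        i for i in range(len(padded) - w + 1)
        if padded[i:i + w] == sentinel_window or window_hash(padded[i:i + w]) % p == 0
    ]
    # Each phrase runs from one trigger string through the end of the next,
    # so consecutive phrases overlap by exactly w characters.
    phrases = [padded[a:b + w] for a, b in zip(triggers, triggers[1:])]
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def unparse(dictionary: list[str], parse: list[int], w: int) -> str:
    """Invert prefix_free_parse by gluing phrases on their w-character overlaps."""
    phrases = [dictionary[r] for r in parse]
    padded = phrases[0] + "".join(phrase[w:] for phrase in phrases[1:])
    return padded[w:len(padded) - w]
```

With p = 1 every window is a trigger, so each phrase has only w + 1 characters. Larger values of p give fewer and longer phrases. On repetitive input, repeated regions tend to produce the same phrases, so the sorted dictionary of distinct phrases plus the parse of phrase ranks can be much smaller than the text. unparse confirms that this representation is lossless.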
Full Text Available
Computational graph pangenomics: a tutorial on data structures and their applications

https://doi.org/10.1007/s11047-022-09882-6

Baaijens, Jasmijn A. ; Bonizzoni, Paola ; Boucher, Christina ; Della Vedova, Gianluca ; Pirola, Yuri ; Rizzi, Raffaella ; Sirén, Jouni ( March 2022 , Natural Computing)

Abstract
Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
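To make the idea of a graph pangenome concrete, here is a toy Python sketch. Nodes carry sequence, edges connect nodes, and each haplotype is stored as a path of node IDs, loosely mirroring the segment, link and path records of the GFA format. The class VariationGraph and its methods are hypothetical. The sketch also assumes forward-strand nodes only, whereas real pangenome graphs are bidirected and use far more compact haplotype encodings.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy variation graph: sequence-labelled nodes, directed edges, named paths."""
    labels: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.labels[node_id] = seq

    def add_edge(self, src: int, dst: int) -> None:
        # Forward-strand only (an assumption): real pangenome graphs are bidirected.
        if src not in self.labels or dst not in self.labels:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.add((src, dst))

    def add_path(self, name: str, nodes: list[int]) -> None:
        # A haplotype is valid only if its nodes exist and consecutive pairs are edges.
        if any(n not in self.labels for n in nodes):
            raise KeyError(f"path {name!r} uses an unknown node")
        for a, b in zip(nodes, nodes[1:]):
            if (a, b) not in self.edges:
                raise ValueError(f"path {name!r} uses missing edge {a}->{b}")
        self.paths[name] = list(nodes)

    def spell(self, name: str) -> str:
        # Concatenate node labels along the path to recover the haplotype sequence.
        return "".join(self.labels[n] for n in self.paths[name])

    def haplotypes_through(self, node_id: int) -> list[str]:
        # Names of haplotypes whose path visits node_id, in insertion order.
        return [name for name, nodes in self.paths.items() if node_id in nodes]


# Reference ACGTTGCA with a SNP (T -> C) at its fourth base.
g = VariationGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "C"), (4, "TGCA")]:
    g.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    g.add_edge(src, dst)
g.add_path("ref", [1, 2, 4])
g.add_path("alt", [1, 3, 4])
```

Here the reference and alternate haplotypes share nodes 1 and 4 and differ only at the variant node. g.spell("alt") yields ACGCTGCA, and g.haplotypes_through(3) reports only "alt".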
Full Text Available
MONI: A Pangenomic Index for Finding Maximal Exact Matches

https://doi.org/10.1089/cmb.2021.0290

Rossi, Massimiliano ; Oliva, Marco ; Langmead, Ben ; Gagie, Travis ; Boucher, Christina ( February 2022 , Journal of Computational Biology)

Full Text Available
