Search for: All records

Award ID contains: 1816027


Total Resources: 14

Resource Type
Conference Paper: 7
Conference Proceeding: 0
Dataset: 0
Journal Article: 7
Workshop Report: 0

Availability
Full Text / Resource Available: 14
Citation Only: 0


A fast adaptive algorithm for computing whole-genome homology maps

https://doi.org/10.1093/bioinformatics/bty597

Jain, Chirag ; Koren, Sergey ; Dilthey, Alexander ; Phillippy, Adam M. ; Aluru, Srinivas ( September 2018 , Bioinformatics)

Abstract Motivation
Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. In addition, current practical methods lack any guarantee on the characteristics of output alignments, thus making them hard to tune for different application requirements.
Results
We introduce an approximate algorithm for computing local alignment boundaries between long DNA sequences. Given a minimum alignment length and an identity threshold, our algorithm computes the desired alignment boundaries and identity estimates using kmer-based statistics, and maintains sufficient probabilistic guarantees on the output sensitivity. Further, to prioritize higher scoring alignment intervals, we develop a plane-sweep based filtering technique which is theoretically optimal and practically efficient. Implementation of these ideas resulted in a fast and accurate assembly-to-genome and genome-to-genome mapper. As a result, we were able to map an error-corrected whole-genome NA12878 human assembly to the hg38 human reference genome in about 1 min total execution time and <4 GB memory using eight CPU threads, achieving a significant improvement in memory usage over competing methods. Recall accuracy of computed alignment boundaries was consistently found to be >97% on multiple datasets. Finally, we performed a sensitive self-alignment of the human genome to compute all duplications of length ≥1 Kbp and ≥90% identity. The reported output achieves good recall and covers twice as many bases as the current UCSC browser's segmental duplication annotation.
Availability and implementation
https://github.com/marbl/MashMap
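The abstract says identity is estimated from k-mer statistics but does not spell out the estimator. As a minimal sketch, assuming the Mash distance model (identity ≈ 1 − d, with d = −(1/k)·ln(2j/(1+j)) and j the Jaccard similarity of two k-mer sets), the code below uses exact k-mer sets for clarity; a practical mapper would estimate j from a sampled sketch (for example, minimizers) instead. The helper names are hypothetical.

```python
import math


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_identity(j: float, k: int) -> float:
    """Convert a k-mer Jaccard similarity into an identity estimate.

    Assumed model: Mash distance d = -(1/k) * ln(2j / (1 + j)), i.e. mutations
    are independent and uniformly spread, and identity is reported as 1 - d.
    """
    if j <= 0.0:
        return 0.0  # no shared k-mers: the distance is unbounded
    d = -math.log(2.0 * j / (1.0 + j)) / k
    return max(0.0, 1.0 - d)
```

A read window and a reference window would be compared as `mash_identity(jaccard(kmers(read, k), kmers(ref, k)), k)` and kept only if the estimate clears the identity threshold.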

On the Hardness of Sequence Alignment on De Bruijn Graphs

https://doi.org/10.1089/cmb.2022.0411

Gibney, Daniel ; Thankachan, Sharma V. ; Aluru, Srinivas ( December 2022 , Journal of Computational Biology)

Algorithms for Colinear Chaining with Overlaps and Gap Costs

https://doi.org/10.1089/cmb.2022.0266

Jain, Chirag ; Gibney, Daniel ; Thankachan, Sharma V. ( November 2022 , Journal of Computational Biology)

Haplotype-aware variant selection for genome graphs

Tavakoli, Neda ; Gibney, Daniel ; Aluru, Srinivas ( August 2022 , Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

Graph-based genome representations have proven to be a powerful tool in genomic analysis due to their ability to encode variations found in multiple haplotypes and capture population genetic diversity. Such graphs also unavoidably contain paths which switch between haplotypes (i.e., recombinant paths) and thus do not fully match any of the constituent haplotypes. The number of such recombinant paths increases combinatorially with path length and causes inefficiencies and false positives when mapping reads. In this paper, we study the problem of finding reduced haplotype-aware genome graphs that incorporate only a selected subset of variants, yet contain paths corresponding to all α-long substrings of the input haplotypes (i.e., non-recombinant paths) with at most δ mismatches. Solving this problem optimally, i.e., minimizing the number of variants selected, was previously known to be NP-hard. Here, we first establish several inapproximability results regarding finding haplotype-aware reduced variation graphs of optimal size. We then present an integer linear programming (ILP) formulation for solving the problem, and experimentally demonstrate that this is a computationally feasible approach for real-world problems that achieves far greater reduction than prior approaches.
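To make the optimization problem concrete, here is a toy sketch under a deliberately simplified, assumed model: only SNPs, each haplotype given as the set of reference positions where it carries the alternate allele, and mismatches counted as Hamming differences. In that model a selected site offers both alleles and an unselected site offers only the reference allele, so an ILP would minimize Σ x_v subject to, for every haplotype window, at most δ of its alternate-allele sites having x_v = 0. The code replaces the paper's ILP with exhaustive search, so it only scales to a handful of variants; all function names are hypothetical.

```python
from itertools import combinations


def window_mismatches(alt_sites, selected, start, alpha):
    """Mismatches for the haplotype window [start, start + alpha) in the reduced graph.

    Each alternate allele the haplotype carries at an unselected site costs one
    mismatch, because the reduced graph only offers the reference base there.
    """
    return sum(1 for s in alt_sites if start <= s < start + alpha and s not in selected)


def is_valid(haplotypes, selected, length, alpha, delta):
    """True if every alpha-long window of every haplotype has <= delta mismatches."""
    for alt_sites in haplotypes:
        for start in range(length - alpha + 1):
            if window_mismatches(alt_sites, selected, start, alpha) > delta:
                return False
    return True


def min_variant_selection(haplotypes, length, alpha, delta):
    """Smallest valid set of variant sites, found by exhaustive search (toy inputs only)."""
    sites = sorted(set().union(*haplotypes))
    for size in range(len(sites) + 1):
        for subset in combinations(sites, size):
            if is_valid(haplotypes, set(subset), length, alpha, delta):
                return set(subset)
    return set(sites)  # not reached: selecting every site leaves zero mismatches
```

For example, with haplotypes `{1, 2}` and `{2, 6, 7}` on a 10-bp reference, α = 4 and δ = 1, one site from each close pair (1/2 and 6/7) must be kept, so the search returns a two-site selection.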
Co-linear Chaining with Overlaps and Gap Costs

https://doi.org/10.1007/978-3-031-04749-7_15

Jain, Chirag ; Gibney, Daniel ; Thankachan, Sharma V. ( January 2022 , Research in Computational Molecular Biology - 26th Annual International Conference, RECOMB 2022)

The Complexity of Approximate Pattern Matching on de Bruijn Graphs

https://doi.org/10.1007/978-3-031-04749-7_16

Gibney, Daniel ; Thankachan, Sharma V. ; Aluru, Srinivas ( January 2022 , Research in Computational Molecular Biology - 26th Annual International Conference, RECOMB 2022)

Feasibility of flow decomposition with subpath constraints in linear time

Gibney, Daniel ; Thankachan, Sharma V. ; Aluru, Srinivas ( January 2022 , 22nd International Workshop on Algorithms in Bioinformatics, Leibniz International Proceedings in Informatics)
Boucher, Christina ; Rahmann, Sven (Eds.)
Real-time mapping of nanopore raw signals

https://doi.org/10.1093/bioinformatics/btab264

Zhang, Haowen ; Li, Haoran ; Jain, Chirag ; Cheng, Haoyu ; Au, Kin Fai ; Li, Heng ; Aluru, Srinivas ( July 2021 , Bioinformatics)

Abstract Motivation
Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes.
Results
In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and operate fully in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool, Sigmap, and evaluated it on both simulated and real data, comparing it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping simulated yeast raw signals, and better accuracy on mapping real yeast raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which led to a significantly higher F1-score (0.9354 versus 0.8660).
Availability and implementation
Sigmap code is accessible at https://github.com/haowenz/sigmap.
Supplementary information
Supplementary data are available at Bioinformatics online.
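The abstract mentions a seed chaining step without giving its scoring. As a generic illustration, and not Sigmap's actual method, the sketch below chains anchors with the classic quadratic co-linear chaining dynamic program. Anchors are assumed to be (reference position, signal position, seed score) triples from some hypothetical seeding step; `max_gap` and `gap_penalty` are illustrative parameters.

```python
from typing import List, Tuple

Anchor = Tuple[int, int, float]  # (reference position, signal position, seed score)


def chain_anchors(anchors: List[Anchor], max_gap: int = 500,
                  gap_penalty: float = 0.01) -> Tuple[float, List[Anchor]]:
    """Best co-linear chain of anchors by O(n^2) dynamic programming.

    Consecutive anchors must increase strictly in both coordinates and lie at
    most max_gap apart on each axis; each link is charged gap_penalty times the
    difference between the two gap lengths (drift off the diagonal).
    Returns (chain score, anchors in chain order).
    """
    if not anchors:
        return 0.0, []
    pts = sorted(anchors)
    n = len(pts)
    score = [w for _, _, w in pts]  # best chain score ending at each anchor
    prev = [-1] * n                 # back-pointers for traceback
    for i in range(n):
        ri, qi, wi = pts[i]
        for j in range(i):
            rj, qj, _ = pts[j]
            dr, dq = ri - rj, qi - qj
            if dr <= 0 or dq <= 0 or dr > max_gap or dq > max_gap:
                continue  # not co-linear, or too far apart to link
            cand = score[j] + wi - gap_penalty * abs(dr - dq)
            if cand > score[i]:
                score[i], prev[i] = cand, j
    best = max(range(n), key=score.__getitem__)
    chain = []
    k = best
    while k != -1:
        chain.append(pts[k])
        k = prev[k]
    return score[best], chain[::-1]
```

The diagonal-drift penalty assumes both coordinates advance at the same rate; since raw-signal positions do not map one-to-one onto bases, a real signal-space scorer would need to account for that difference.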
A variant selection framework for genome graphs

https://doi.org/10.1093/bioinformatics/btab302

Jain, Chirag ; Tavakoli, Neda ; Aluru, Srinivas ( July 2021 , Bioinformatics)

Abstract Motivation
Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping.
Results
In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short- and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of SVs can be safely excluded from the human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis.
Availability and implementation
https://github.com/AT-CG/VF.
Supplementary information
Supplementary data are available at Bioinformatics online.
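For the SNP-only, position-minimizing case of this framework, the α/δ constraint can be read as a windowing rule. Under an assumed simplification (a path in the full graph may take the alternate allele at every SNP it crosses, so each SNP left out of the graph can cost one difference), a selection is valid when every α-long window contains at most δ excluded SNPs. The greedy scan below is a sketch under that assumption, not necessarily the algorithm in the VF repository; the function name is hypothetical and positions are assumed distinct.

```python
from bisect import bisect_left
from typing import List


def max_excludable_snps(positions: List[int], alpha: int, delta: int) -> List[int]:
    """Greedily pick SNP positions to leave out of the variation graph.

    Scans positions left to right and excludes a SNP whenever the window
    [p - alpha + 1, p] holds fewer than delta already-excluded SNPs, so no
    alpha-long window ever contains more than delta excluded SNPs.
    """
    excluded: List[int] = []
    for p in sorted(positions):
        # Number of excluded SNPs at or after the window start p - alpha + 1.
        in_window = len(excluded) - bisect_left(excluded, p - alpha + 1)
        if in_window < delta:
            excluded.append(p)
    return excluded
```

For instance, `max_excludable_snps([1, 2, 3, 10, 11, 12], 5, 1)` returns `[1, 10]`: one SNP from each tight cluster can be dropped, and the rest must stay in the graph. For this simplified window constraint, a stays-ahead exchange argument suggests the left-to-right greedy excludes as many SNPs as possible.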
On the Complexity of Sequence-to-Graph Alignment

https://doi.org/10.1089/cmb.2019.0066

Jain, Chirag ; Zhang, Haowen ; Gao, Yu ; Aluru, Srinivas ( April 2020 , Journal of Computational Biology)
