Search for: All records

Award ID contains: 2029424

Total Resources: 20

Resource Type
Conference Paper: 2
Conference Proceeding: 0
Dataset: 0
Journal Article: 18
Workshop Report: 0

Availability
Full Text / Resource Available: 19
Citation Only: 1

simpleaf: a simple, flexible, and scalable framework for single-cell data processing using alevin-fry

https://doi.org/10.1093/bioinformatics/btad614

He, Dongze ; Patro, Rob (October 2023, Bioinformatics)
Ponty, Yann (Ed.)

Abstract
Summary
The alevin-fry ecosystem provides a robust and growing suite of programs for single-cell data processing. However, as new single-cell technologies are introduced, as the community continues to adjust best practices for data processing, and as the alevin-fry ecosystem itself expands and grows, it is becoming increasingly important to manage the complexity of alevin-fry’s single-cell preprocessing workflows while retaining the performance and flexibility that make these tools enticing. We introduce simpleaf, a program that simplifies the processing of single-cell data using tools from the alevin-fry ecosystem, and adds new functionality and capabilities, while retaining the flexibility and performance of the underlying tools.
Availability and implementation
Simpleaf is written in Rust and released under a BSD 3-Clause license. It is freely available from its GitHub repository https://github.com/COMBINE-lab/simpleaf, and via bioconda. Documentation for simpleaf is available at https://simpleaf.readthedocs.io/en/latest/ and tutorials for simpleaf that have been developed can be accessed at https://combine-lab.github.io/alevin-fry-tutorials.

SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty

https://doi.org/10.1186/s13059-023-03003-x

Wu, Euphy Y. ; Singh, Noor P. ; Choi, Kwangbom ; Zakeri, Mohsen ; Vincent, Matthew ; Churchill, Gary A. ; Ackert-Bicknell, Cheryl L. ; Patro, Rob ; Love, Michael I. (July 2023, Genome Biology)

Abstract
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and aggregating isoforms to groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.

An incrementally updatable and scalable system for large-scale sequence search using the Bentley–Saxe transformation

https://doi.org/10.1093/bioinformatics/btac142

Almodaresi, Fatemeh ; Khan, Jamshed ; Madaminov, Sergey ; Ferdman, Michael ; Johnson, Rob ; Pandey, Prashant ; Patro, Rob (March 2022, Bioinformatics)
Boeva, Valentina (Ed.)

Abstract
Motivation
In the past few years, researchers have proposed numerous indexing schemes for searching large datasets of raw sequencing experiments. Most of these proposed indexes are approximate (i.e. with one-sided errors) in order to save space. Recently, researchers have published exact indexes—Mantis, VariMerge and Bifrost—that can serve as colored de Bruijn graph representations in addition to serving as k-mer indexes. This new type of index is promising because it has the potential to support more complex analyses than simple searches. However, in order to be useful as indexes for large and growing repositories of raw sequencing data, they must scale to thousands of experiments and support efficient insertion of new data.
Results
In this paper, we show how to build a scalable and updatable exact raw sequence-search index. Specifically, we extend Mantis using the Bentley–Saxe transformation to support efficient updates, and call the result Dynamic Mantis. We demonstrate Dynamic Mantis’s scalability by constructing an index of ≈40K samples from SRA by adding samples one at a time to an initial index of 10K samples. Compared to VariMerge and Bifrost, Dynamic Mantis is more efficient in terms of index-construction time and memory, query time and memory, and index size. In our benchmarks, VariMerge and Bifrost scaled to only 5K and 80 samples, respectively, while Dynamic Mantis scaled to more than 39K samples. Queries were over 24× faster in Mantis than in Bifrost (VariMerge does not immediately support the general search queries we require). Dynamic Mantis indexes were about 2.5× smaller than Bifrost’s indexes and about half as big as VariMerge’s indexes.
Availability and implementation
Dynamic Mantis implementation is available at https://github.com/splatlab/mantis/tree/mergeMSTs.
Supplementary information
Supplementary data are available at Bioinformatics online.
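
As a rough illustration of the Bentley–Saxe transformation mentioned in this abstract, the sketch below turns a static, build-once membership structure into an insert-only one by keeping a series of static levels whose sizes follow a binary counter. It is a generic textbook version of the technique, not the Dynamic Mantis implementation: the class names `StaticSortedSet` and `LogarithmicIndex` are hypothetical, a sorted list stands in for the real index, and keys are assumed to be distinct.

```python
from bisect import bisect_left


class StaticSortedSet:
    """Stand-in for a static index: built once from a batch, never modified."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def __contains__(self, key):
        i = bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key


class LogarithmicIndex:
    """Insert-only index; level i is either empty or a static set of 2**i keys."""

    def __init__(self):
        self.levels = []  # levels[i] is None or a StaticSortedSet

    def insert(self, key):
        # Like incrementing a binary counter: merge full levels upward
        # until an empty slot is found, then rebuild one static structure.
        carry = [key]
        for i, level in enumerate(self.levels):
            if level is None:
                self.levels[i] = StaticSortedSet(carry)
                return
            carry.extend(level.keys)
            self.levels[i] = None
        self.levels.append(StaticSortedSet(carry))

    def __contains__(self, key):
        # A query consults every non-empty level (O(log n) of them).
        return any(level is not None and key in level for level in self.levels)

    def level_sizes(self):
        return [0 if level is None else len(level.keys) for level in self.levels]
```

With n distinct keys inserted, the non-empty levels correspond to the 1-bits of n, so each key is rebuilt into a larger level only O(log n) times while queries touch at most O(log n) static structures.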

Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

https://doi.org/10.1186/s13059-022-02743-6

Khan, Jamshed ; Kokot, Marek ; Deorowicz, Sebastian ; Patro, Rob (September 2022, Genome Biology)

Abstract
The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state-of-the-art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58Tbp, from 4.5 days to 17–23 h; and it constructs the graph for 1.52Tbp white spruce reads in approximately 10 h, while the closest competitor requires 54–58 h, using considerably more memory.

TreeTerminus —creating transcript trees using inferential replicate counts

https://doi.org/10.1016/j.isci.2023.106961

Singh, Noor Pratap ; Love, Michael I. ; Patro, Rob (June 2023, iScience)

Free, publicly-accessible full text available June 1, 2024
Spectrum Preserving Tilings Enable Sparse and Modular Reference Indexing

Jason Fan ; Jamshed Khan ; Giulio Ermanno Pibiri ; Rob Patro (April 2023, RECOMB 2023: Research in Computational Molecular Biology)

Full Text Available
Perplexity: evaluating transcript abundance estimation in the absence of ground truth

https://doi.org/10.1186/s13015-022-00214-y

Fan, Jason ; Chan, Skylar ; Patro, Rob (December 2022, Algorithms for Molecular Biology)

Abstract
Background
There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins gene expression based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best.
Results
We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models.
Conclusions
Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.
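
For orientation, the language-model metric that this abstract adapts can be sketched as the exponentiated negative mean log-probability of held-out observations. The snippet below shows only that generic definition; the function name `perplexity` and its input (a list of per-fragment probabilities under some fitted model) are assumptions for illustration, and the paper's RNA-seq-specific handling of corner cases is not reproduced here.

```python
import math


def perplexity(fragment_probs):
    """Generic perplexity: exp of the negative mean log-probability.

    fragment_probs: probabilities a fitted model assigns to held-out
    observations (hypothetical input; each must be strictly positive).
    """
    if not fragment_probs:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 for p in fragment_probs):
        # Zero-probability observations need special treatment that this
        # sketch deliberately does not attempt.
        raise ValueError("probabilities must be strictly positive")
    mean_log = sum(math.log(p) for p in fragment_probs) / len(fragment_probs)
    return math.exp(-mean_log)
```

Lower values mean the model assigns higher probability to the held-out observations; a model that gives every observation probability 1/4 has perplexity 4.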
Full Text Available
AGAMEMNON: an Accurate metaGenomics And MEtatranscriptoMics quaNtificatiON analysis suite

https://doi.org/10.1186/s13059-022-02610-4

Skoufos, Giorgos ; Almodaresi, Fatemeh ; Zakeri, Mohsen ; Paulson, Joseph N. ; Patro, Rob ; Hatzigeorgiou, Artemis G. ; Vlachos, Ioannis S. (December 2022, Genome Biology)

Abstract
We introduce AGAMEMNON (https://github.com/ivlachos/agamemnon) for the acquisition of microbial abundances from shotgun metagenomics and metatranscriptomic samples, single-microbe sequencing experiments, or sequenced host samples. AGAMEMNON delivers accurate abundances at genus, species, and strain resolution. It incorporates a time and space-efficient indexing scheme for fast pattern matching, enabling indexing and analysis of vast datasets with widely available computational resources. Host-specific modules provide exceptional accuracy for microbial abundance quantification from tissue RNA/DNA sequencing, enabling the expansion of experiments lacking metagenomic/metatranscriptomic analyses. AGAMEMNON provides an R-Shiny application, permitting performance of investigations and visualizations from a graphics interface.
Full Text Available
Airpart: Interpretable statistical models for analyzing allelic imbalance in single-cell datasets

https://doi.org/10.1093/bioinformatics/btac212

Mu, Wancen ; Sarkar, Hirak ; Srivastava, Avi ; Choi, Kwangbom ; Patro, Rob ; Love, Michael I (April 2022, Bioinformatics)
Kendziorski, Christina (Ed.)
Abstract
Motivation
Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial-, or time-dependent AI signals may be dampened or not detected.
Results
We introduce a statistical method airpart for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data, or other spatially- or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower RMSE of allelic ratio estimates than existing methods. In real data, airpart identified DAI patterns across cell states and could be used to define trends of AI signal over spatial or time axes.
Availability
The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.
Full Text Available
Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data

https://doi.org/10.1038/s41592-022-01408-3

He, Dongze ; Zakeri, Mohsen ; Sarkar, Hirak ; Soneson, Charlotte ; Srivastava, Avi ; Patro, Rob (March 2022, Nature Methods)

Full Text Available
