Title: RRAP: RPKM Recruitment Analysis Pipeline
ABSTRACT A common method for quantifying microbial abundances in situ is through metagenomic read recruitment to genomes and normalizing read counts as reads per kilobase (of genome) per million (bases of recruited sequences) (RPKM). We created RRAP (RPKM Recruitment Analysis Pipeline), a wrapper that automates this process using Bowtie2 and SAMtools.
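The RPKM normalization described above can be sketched in Python. This is a minimal illustration under stated assumptions, not RRAP's own code: per-contig counts are assumed to arrive in the tab-separated layout printed by `samtools idxstats` (reference name, sequence length, mapped reads, unmapped reads), the contig-to-genome mapping is a hypothetical input, and the "per million" denominator is passed in as `per_million_total` so you can supply whichever total your normalization uses (the abstract specifies bases of recruited sequences; the conventional RPKM uses total reads).

```python
"""Minimal RPKM sketch: per-genome RPKM from per-contig read counts.

Assumptions (not taken from RRAP itself): counts come from text in the
`samtools idxstats` layout (name, length, mapped, unmapped), and a
hypothetical contig-to-genome mapping groups contigs into genomes.
"""
from collections import defaultdict


def parse_idxstats(text: str) -> dict[str, tuple[int, int]]:
    """Return {contig: (length_bp, mapped_reads)}, skipping the '*' line."""
    stats = {}
    for line in text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":  # unplaced reads have no reference length
            continue
        stats[name] = (int(length), int(mapped))
    return stats


def rpkm(mapped_reads: int, genome_length_bp: int, per_million_total: float) -> float:
    """Reads per kilobase of genome per million units of `per_million_total`."""
    return mapped_reads / ((genome_length_bp / 1e3) * (per_million_total / 1e6))


def genome_rpkm(stats, contig_to_genome, per_million_total):
    """Sum contig lengths and read counts per genome, then normalize each genome."""
    length = defaultdict(int)
    reads = defaultdict(int)
    for contig, (bp, mapped) in stats.items():
        genome = contig_to_genome[contig]  # hypothetical contig -> genome lookup
        length[genome] += bp
        reads[genome] += mapped
    return {g: rpkm(reads[g], length[g], per_million_total) for g in length}
```

For example, a genome made of two contigs totaling 5,000 bp with 500 recruited reads, normalized against a total of 2,000,000, gives 500 / (5 × 2) = 50 RPKM. Summing lengths and counts per genome before dividing keeps the result independent of how the genome is split into contigs.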
Award ID(s): 1931113, 1945279
PAR ID: 10408621
Editor(s): Newton, Irene L.
Journal Name: Microbiology Resource Announcements
Volume: 11
Issue: 9
ISSN: 2576-098X
Sponsoring Org: National Science Foundation
More like this
  1. This dataset contains measurements of Eastern oyster (Crassostrea virginica) recruitment to standardized ceramic tiles deployed across intertidal oyster reef sites in the Virginia Coast Reserve. Recruitment is defined as the number of macroscopic oyster recruits (less than or equal to 25 mm shell height) per square centimeter of tile surface, capturing settlement and early post-settlement survival. Data were collected in 2018, 2019, and 2021 across 9-16 reef sites per year, including both natural and restored reefs. The dataset supports research on spatial and environmental drivers of oyster recruitment and has been validated against natural reef substrate data for comparability. 
  2. Classified as a complex big-data analytics problem, DNA short read alignment is a major sequential bottleneck in processing the massive amounts of data generated by next-generation sequencing platforms. With von Neumann computing architectures struggling to handle such a computationally expensive and memory-intensive task, Processing-in-Memory (PIM) platforms are attracting growing interest. In this paper, an energy-efficient and parallel PIM accelerator (AlignS) is proposed to execute DNA short read alignment based on an optimized, hardware-friendly alignment algorithm. We first develop the AlignS platform, which harnesses SOT-MRAM as computational memory and transforms it into a fundamental processing unit for short read alignment. Accordingly, we present a novel, customized, highly parallel read alignment algorithm that requires only the proposed simple and parallel in-memory operations (i.e., comparisons and additions). AlignS is then optimized through a new correlated data partitioning and mapping methodology that allows local storage and processing of DNA sequences to fully exploit algorithm-level parallelism and to accelerate both exact and inexact matches. Device-to-architecture co-simulation results show that AlignS improves short read alignment throughput per Watt per mm^2 by ~12X compared to the ASIC accelerator. Compared to a recent FM-index-based ReRAM platform, AlignS achieves 1.6X higher throughput per Watt.
  3. Anticipating the next generation of forests requires an understanding of recruitment responses to habitat change. Tree distribution and abundance depend not only on climate, but also on habitat variables, such as soils and drainage, and on competition beneath a shaded canopy. Recent analyses show that North American tree species are migrating in response to climate change, which is exposing each population to novel climate-habitat interactions (CHI). Because CHI have not been estimated for either adult trees or regeneration (recruits per year per adult basal area), we cannot evaluate migration potential into the future. Using the Masting Inference and Forecasting (MASTIF) network of tree fecundity and new continent-wide observations of tree recruitment, we quantify impacts for redistribution across life stages from adults to fecundity to recruitment. We jointly modeled the response of adult abundance and recruitment rate to climate/habitat conditions, combined with fecundity sensitivity, to evaluate whether shifting CHI explain community reorganization. To compare climate effects with tree fecundity, which is estimated from trees and thus is "conditional" on tree presence, we demonstrate how to quantify this conditional status for regeneration. We found that fecundity was regulated by temperature to a greater degree than other stages, yet exhibited limited responses to moisture deficit. Recruitment rate expressed strong sensitivities to CHI, more like adults than fecundity, but still with substantial differences. Communities reorganized from adults to fecundity, but there was a re-coalescence of groups as seedling recruitment partially reverted to community structure similar to that of adults. Results provide the first estimates of continent-wide community sensitivity and their implications for reorganization across three life-history stages under climate change.
  4. Growth rates are central to understanding microbial interactions and community dynamics. Metagenomic growth estimators have been developed, specifically codon usage bias (CUB) for maximum growth rates and “peak-to-trough ratio” (PTR) for in situ rates. Both were originally tested with pure cultures, but natural populations are more heterogeneous, especially in individual cell histories pertinent to PTR. To test these methods, we compared predictors with observed growth rates of freshly collected marine prokaryotes in unamended seawater. We prefiltered and diluted samples to remove grazers and greatly reduce virus infection, so net growth approximated gross growth. We sampled over 44 h for abundances and metagenomes, generating 101 metagenome-assembled genomes (MAGs), including Actinobacteria, Verrucomicrobia, SAR406, and MGII archaea. We tracked each MAG population by cell-abundance-normalized read recruitment, finding growth rates of 0 to 5.99 per day (the first reported rates for several groups), and used these rates as benchmarks. PTR, calculated by three methods, rarely correlated with growth (r ≈ −0.26 to 0.08), except for rapidly growing γ-Proteobacteria (r ≈ 0.63 to 0.92), while CUB correlated moderately well with observed maximum growth rates (r = 0.57). This suggests that current PTR approaches poorly predict the actual growth of most marine bacterial populations, but maximum growth rates can be approximated from genomic characteristics.
  5. Background: PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data. Results: Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells, producing multiple statistics and publication-quality plots describing the quality of the data, including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent. Conclusions: SequelTools is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at https://github.com/ISUgenomics/SequelTools.
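Item 2 describes a read alignment algorithm built only from comparisons and additions. The AlignS algorithm and its SOT-MRAM mapping are not given here, so the following is only a generic, hypothetical illustration of those two operation types: a naive ungapped alignment that compares each read base with the reference at every offset and adds up mismatches, reporting exact matches (zero mismatches) and inexact matches (up to `max_mismatches`). It ignores insertions and deletions and is not the AlignS method.

```python
def ungapped_hits(read: str, reference: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (offset, mismatches) for every ungapped placement within the limit.

    Hypothetical illustration: uses only base comparisons and counter additions.
    """
    hits = []
    m = len(read)
    for offset in range(len(reference) - m + 1):
        mismatches = 0
        for i in range(m):
            if read[i] != reference[offset + i]:  # comparison
                mismatches += 1                   # addition
                if mismatches > max_mismatches:   # early exit once over the limit
                    break
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits
```

Each offset is scored independently of the others, which is the kind of data-level parallelism an in-memory design could exploit; this sketch simply evaluates the offsets one after another.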
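Item 4 derives growth rates by tracking each MAG through cell-abundance-normalized read recruitment. A minimal sketch of that idea, assuming (hypothetically) that a MAG's absolute abundance is its fraction of recruited reads multiplied by the total cell count, and that the population grows exponentially over the sampling window, so the specific growth rate is the least-squares slope of ln(abundance) against time. The paper's exact normalization and fitting procedure may differ.

```python
import math


def mag_abundance(mag_reads: int, total_reads: int, cells_per_ml: float) -> float:
    """Hypothetical normalization: MAG read fraction scaled by total cell count."""
    return mag_reads / total_reads * cells_per_ml


def growth_rate_per_day(times_h: list[float], abundances: list[float]) -> float:
    """Least-squares slope of ln(abundance) versus time, in units of per day.

    Assumes exponential growth N(t) = N0 * exp(mu * t); abundances must be > 0.
    """
    if len(times_h) != len(abundances) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    xs = [t / 24.0 for t in times_h]          # hours -> days
    ys = [math.log(a) for a in abundances]    # log-transform for a linear fit
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Converting hours to days before fitting puts the slope in units of per day, the units of the 0 to 5.99 per day range reported in item 4.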
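Item 5 lists N50 among SequelTools' quality statistics and minimum CLR length among its filtering options. The standard N50 definition and a simple length filter can be sketched as below; these operate on a plain list of read lengths and are illustrative helpers, not SequelTools' implementation (reading lengths from Sequel BAM files is not shown).

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest reads first
        running += length
        if 2 * running >= total:                  # reached half of total bases
            return length
    raise AssertionError("unreachable: running total always reaches total")


def filter_min_length(lengths: list[int], min_len: int) -> list[int]:
    """Keep only reads at least `min_len` bases long (illustrative filter)."""
    return [length for length in lengths if length >= min_len]
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is the first running total to reach half of the 54 total bases.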