Title: Controlling taxa abundance improves metatranscriptomics differential analysis
Abstract. Background: A common task in analyzing metatranscriptomics data is to identify microbial metabolic pathways with differential RNA abundances across multiple sample groups. With information from paired metagenomics data, some differential methods control for either DNA or taxa abundances to address their strong correlation with RNA abundance. However, it remains unknown if both factors need to be controlled for simultaneously. Results: We discovered that when either DNA or taxa abundance is controlled for, RNA abundance still has a strong partial correlation with the other factor. In both simulation studies and a real data analysis, we demonstrated that controlling for both DNA and taxa abundances leads to superior performance compared to only controlling for one factor. Conclusions: To fully address the confounding effects in analyzing metatranscriptomics data, both DNA and taxa abundances need to be controlled for in the differential analysis.
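The partial-correlation idea in the Results can be illustrated with a minimal sketch on simulated data. This is not the authors' method or data: the variable names, sample size, and coefficients below are assumptions chosen only to show how RNA abundance can remain correlated with one factor after the other has been regressed out.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals of y after least-squares regression on the covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after removing the linear effect of the covariates."""
    rx = residualize(x, covariates)
    ry = residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical simulation (all coefficients are illustrative assumptions):
# DNA abundance tracks taxa abundance, and RNA abundance depends on both.
rng = np.random.default_rng(0)
n = 500
taxa = rng.normal(size=n)
dna = 0.7 * taxa + rng.normal(size=n)
rna = 0.8 * dna + 0.8 * taxa + rng.normal(size=n)

# Controlling for only one factor leaves a clear partial correlation with the other.
pc_taxa_given_dna = partial_corr(rna, taxa, [dna])
pc_dna_given_taxa = partial_corr(rna, dna, [taxa])
print(f"partial corr(RNA, taxa | DNA) = {pc_taxa_given_dna:.2f}")
print(f"partial corr(RNA, DNA | taxa) = {pc_dna_given_taxa:.2f}")
```

In this toy setup both partial correlations stay well above zero, which is the kind of pattern that would motivate including both DNA and taxa abundances as covariates in a differential model.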
Award ID(s):
2133504
PAR ID:
10400673
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
BMC Microbiology
Volume:
23
Issue:
1
ISSN:
1471-2180
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary: Differential abundance tests for compositional data are essential and fundamental in various biomedical applications, such as single-cell, bulk RNA-seq and microbiome data analysis. However, because of the compositional constraint and the prevalence of zero counts in the data, differential abundance analysis on compositional data remains a complicated and unsolved statistical problem. This article proposes a new differential abundance test, the robust differential abundance test, to address these challenges. Compared with existing methods, the robust differential abundance test is simple and computationally efficient, is robust to prevalent zero counts in compositional datasets, can take the data’s compositional nature into account, and has a theoretical guarantee of controlling false discoveries in a general setting. Furthermore, in the presence of observed covariates, the robust differential abundance test can work with covariate-balancing techniques to remove potential confounding effects and draw reliable conclusions. The proposed test is applied to several numerical examples, and its merits are demonstrated using both simulated and real datasets.
  2. Abstract. Background: Studying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the perplexing and delicate relationship between the microbe, host, and diseases. It is also critical to develop a tool for investigating co-occurrence networks and differential abundance analyses to reveal the disease-related taxa–taxa relationship. In addition, it is necessary to tighten the co-occurrence network into smaller modules to increase the ability for functional annotation and interpretability of these taxa–taxa relationships. It is also critical to retain the phylogenetic relationship among the taxa to identify differential abundance patterns, which can be used to resolve contradicting functions reported by different studies. Results: In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphic user interfaces (Shiny applications), one of them dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn’s disease. We discovered clusters of study and disease-dependent taxa that overlap with known functional taxa studied by other discovery studies and differential abundance analyses. Conclusion: C3NA offers a new microbial data analysis pipeline for refined and enriched taxa–taxa co-occurrence network analyses, and the usability was further expanded via the built-in Shiny applications for interactive investigation.
  3. Abstract. Aim: Abundance–occupancy relationships posit that more locally abundant species occupy more sites than less abundant species. Although widely supported, the occurrence and detection of abundance–occupancy relationships is sensitive to sampling and detection processes. Data from large‐scale standardized sampling efforts are key to address abundance–occupancy relationships. We aimed to use such a dataset to evaluate the occurrence of abundance–occupancy relationships across different spatial grains and over time for aquatic and terrestrial taxa. Location: USA. Time period: 2014–2019. Major taxa studied: Birds, mammals, beetles, ticks, fishes, macroinvertebrates and zooplankton. Methods: Species abundance and occupancy data were obtained from the National Ecological Observatory Network (NEON). Species mean abundance and occupancy (fraction of sampled locations that were occupied) were estimated for three different spatial grains (i.e., plot, site and domain) for all years sampled. Linear models were used to explore the consistency of interspecific abundance–occupancy relationships. The slope coefficients of these models were related to temporal and spatial variables and to species richness while controlling for taxa in a linear mixed‐effects model (LMM) framework. Results: We found evidence for positive abundance–occupancy relationships across the three spatial grains and over time for all taxa we studied. However, our linear models had low explanatory power, suggesting that relationships, although general, were weak. Abundance–occupancy relationships were slightly stronger at the smallest spatial grain than at the largest spatial grain, but showed no detectable change over time for any taxa. Finally, species richness was not associated with the strength of these relationships. Main conclusions: Together, our results suggest that positive interspecific abundance–occupancy relationships are fairly general but are not capable of explaining substantial variation in spatial patterns of abundance, and that other factors, such as species traits and niche, are also likely to influence these relationships.
  4. Mathelier, Anthony (Ed.)
    Abstract. Motivation: As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility by allowing long reads, real-time signal analysis, and the ability to read both DNA and RNA. We need flexible and efficient designs that match nanopore’s capabilities, but relatively few designs have been explored and many have significant inefficiency in read density, error rate, or compute time. To address these problems, we designed a new single-read per-strand decoder that achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs. Results: We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. Our evaluation for each decoder is applied to the same population of nanopore reads collected from a synthesized library of strands. These same strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state of the art soft decoder can achieve error rates of 2.25%. However, that design also suffers a low throughput of 183 s/read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257× with the trade-off of a higher byte error rate of 3.52% compared to the state of the art. Furthermore, we use the faster speed of our algorithm to explore more design options. We show that read densities of 0.33 bits/base can be achieved, which is 4× larger than prior MSA-based decoders. We also compare RNA to DNA, and find that RNA has 85% as many error-free reads when compared to DNA. Availability and implementation: Source code for our soft decoder and data used to generate figures is available publicly in the GitHub repository https://github.com/dna-storage/hedges-soft-decoder (10.5281/zenodo.11454877). All raw FAST5/FASTQ data are available at 10.5281/zenodo.11985454 and 10.5281/zenodo.12014515.
  5. The ecological response of benthic foraminifera to bioavailable Potentially Toxic Elements (PTEs) was evaluated in Lagos Lagoon (Nigeria). We sampled and analyzed PTEs across Lagos Lagoon with the aim to investigate the extent of contaminated sediments, to document their distribution, and to explore the relationship between PTE concentration and the spatial distribution, composition, abundance, and species richness of benthic foraminifera biotas. PTE measurements showed a wide range, reflecting diffuse contamination, and the Contamination and Enrichment Factors suggest low to extremely polluted sediments. Findings of a previous survey of the benthic foraminifera inhabiting Lagos Lagoon revealed diverse assemblages of benthic taxa, species-specific distribution patterns, gradients of species richness and abundance, and a disjunct distribution of agglutinated and hyaline-perforate/porcelaneous taxa along a pronounced salinity gradient. Correlation matrix analysis shows that, except for selenium, all PTE total concentrations positively correlate with mud, Total Organic Carbon (TOC), and two of the most abundant agglutinated taxa, Ammotium salsum and Trochammina sp. 1. Moreover, both species display significant positive correlations with CrF4-CoF2-F3-F4-total-CuF4-total-NiF3-F4-total-AlF4-total-FeF3-F4-total-ZnF3-F4-total. On the other hand, both foraminifers correlate negatively with PbF4-SeF3-Setotal. The overall significant positive correlation of these PTEs suggests that they behave as micronutrients when complexed with organic matter. No significant positive correlation with any of the PTEs in any fraction was found for either species richness or the most abundant hyaline perforate species (Ammonia aoteana). Some PTE fractions were found to correlate either positively or negatively with individual species, suggesting that they function as micronutrients and/or stressors. The resulting Contamination Factor of the PTE total concentrations shows that only a few sample sites can be classified as “moderately” polluted for chromium, zinc, and copper and that all sampled sites are classified as “highly polluted” for selenium. The highest concentrations for Cr, Cu, Ni, and Zn were found towards the industrialized western part, an area that is characterized by moderate to high diversity but low abundances.