Title: Optimizing UniFrac with OpenACC Yields Greater Than One Thousand Times Speed Increase
ABSTRACT UniFrac is an important tool in microbiome research that is used for phylogenetically comparing microbiome profiles to one another (beta diversity). Striped UniFrac recently added the ability to split the problem into many independent subproblems, exhibiting nearly linear scaling but suffering from memory contention. Here, we adapt UniFrac to graphics processing units using OpenACC, enabling greater than 1,000× computational improvement, and apply it to 307,237 samples, the largest 16S rRNA V4 uniformly preprocessed microbiome data set analyzed to date.
IMPORTANCE UniFrac is an important tool in microbiome research that is used for phylogenetically comparing microbiome profiles to one another. Here, we adapt UniFrac to operate on graphics processing units, enabling a 1,000× computational improvement. To highlight this advance, we perform what may be the largest microbiome analysis to date, applying UniFrac to 307,237 16S rRNA V4 microbiome samples preprocessed with Deblur. These scaling improvements turn UniFrac into a real-time tool for common data sets and unlock new research questions as more microbiome data are collected.
Award ID(s):
2038509
NSF-PAR ID:
10336375
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Greene, Casey S.
Date Published:
Journal Name:
mSystems
Volume:
7
Issue:
3
ISSN:
2379-5077
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Staley, Christopher (Ed.)
    The microbiota-gut-brain axis is a bidirectional circuit that links the neural, endocrine, and immunological systems with gut microbial communities. The gut microbiome plays significant roles in human mind and behavior, specifically pain perception, learning capacity, memory, and temperament. Studies have shown that disruptions in the gut microbiota are associated with substance use disorders. The interplay of gut microbiota in substance abuse disorders has not been elucidated; however, postmortem microbiome profiles may produce promising avenues for future forensic investigations. The goal of the current study was to determine gut microbiome composition in substance abuse disorder cases using transverse colon tissues of 21 drug overdose versus 19 non-overdose-related cases. We hypothesized that postmortem samples of the same cause of death will reveal similar microbial taxonomic relationships. We compared microbial diversity profiles using amplicon-based sequencing of the 16S rRNA gene V4 hypervariable region. The results demonstrated that younger cases had significantly more operational taxonomic units than older cases. Using weighted UniFrac analysis, the influence of substances in overdose cases was found to be a significant factor in determining microbiome similarity. The results also revealed that samples of the same cause of death cluster together, showing a high degree of similarity between samples and a low degree of similarity among samples of different causes of death. In conclusion, our examination of human transverse colon microflora in decomposing remains extends emerging literature on postmortem microbial communities, which will ultimately contribute to advanced knowledge of human putrefaction.
  2. Moreno-Hagelsieb, Gabriel (Ed.)
    Advances in the analysis of amplicon sequence datasets have introduced a methodological shift in how research teams investigate microbial biodiversity, away from sequence identity-based clustering (producing Operational Taxonomic Units, OTUs) to denoising methods (producing amplicon sequence variants, ASVs). While denoising methods have several inherent properties that make them desirable compared to clustering-based methods, questions remain as to the influence that these pipelines have on the ecological patterns being assessed, especially when compared to other methodological choices made when processing data (e.g., rarefaction) and computing diversity indices. We compared the respective influences of two widely used methods, namely DADA2 (a denoising method) vs. Mothur (a clustering method) on 16S rRNA gene amplicon datasets (hypervariable region V4), and compared such effects to the rarefaction of the community table and OTU identity threshold (97% vs. 99%) on the ecological signals detected. We used a dataset comprising freshwater invertebrate (three Unionidae species) gut and environmental (sediment, seston) communities sampled in six rivers in the southeastern USA. We ranked the respective effects of each methodological choice on alpha and beta diversity, and taxonomic composition. The choice of the pipeline significantly influenced alpha and beta diversities and changed the ecological signal detected, especially on presence/absence indices such as the richness index and unweighted UniFrac. Interestingly, the discrepancy between OTU and ASV-based diversity metrics could be attenuated by the use of rarefaction. The identification of major classes and genera also revealed significant discrepancies across pipelines. Compared to the pipeline’s effect, OTU threshold and rarefaction had a minimal impact on all measurements.
  3. Abstract

    Sponges occur across diverse marine biomes and host internal microbial communities that can provide critical ecological functions. While strong patterns of host specificity have been observed consistently in sponge microbiomes, the precise ecological relationships between hosts and their symbiotic microbial communities remain to be fully delineated. In the current study, we investigate the relative roles of host population genetics and biogeography in structuring the microbial communities hosted by the excavating sponge Cliona delitrix. A total of 53 samples, previously used to demarcate the population genetic structure of C. delitrix, were selected from two locations in the Caribbean Sea and from eight locations across the reefs of Florida and the Bahamas. Microbial community diversity and composition were measured using Illumina‐based high‐throughput sequencing of the 16S rRNA V4 region and related to host population structure and geographic distribution. Most operational taxonomic units (OTUs) specific to Cliona delitrix microbiomes were rare, while other OTUs were shared with congeneric hosts. Across a large regional scale (>1,000 km), geographic distance was associated with considerable variability of the sponge microbiome, suggesting a distance–decay relationship, but little impact over smaller spatial scales (<300 km) was observed. Host population structure had a moderate effect on the structure of these microbial communities, regardless of geographic distance. These results support the interplay between geographic, environmental, and host factors as forces determining the community structure of microbiomes associated with C. delitrix. Moreover, these data suggest that the mechanisms of host regulation can be observed at the population genetic scale, prior to the onset of speciation.

     
  4. We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent from taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project dataset, and more accurate prediction of human age by the gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies.
    Importance Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. However, current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution compared to 16S rRNA amplicon sequence variant analysis. To solve these challenges, we introduce Operational Genomic Units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition while (ii) permitting use of phylogeny-aware tools. Our analysis of real-world datasets shows several advantages over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGU as standard practice in metagenomic studies.
  5. Summary

    Universal primers for SSU rRNA genes allow profiling of natural communities by simultaneously amplifying templates from Bacteria, Archaea, and Eukaryota in a single PCR reaction. Despite the potential to show relative abundance for all rRNA genes, universal primers are rarely used, due to various concerns including amplicon length variation and its effect on bioinformatic pipelines. We thus developed 16S and 18S rRNA mock communities and a bioinformatic pipeline to validate this approach. Using these mocks, we show that universal primers (515Y/926R) outperformed eukaryote‐specific V4 primers in observed versus expected abundance correlations (slope = 0.88 vs. 0.67–0.79), and mock community members with single mismatches to the primer were strongly underestimated (threefold to eightfold). Using field samples, both primers yielded similar 18S beta‐diversity patterns (Mantel test, p < 0.001) but differences in relative proportions of many rarer taxa. To test for length biases, we mixed mock communities (16S + 18S) before PCR and found a twofold underestimation of 18S sequences due to sequencing bias. Correcting for the twofold underestimation, we estimate that, in Southern California field samples (1.2–80 μm), there were averages of 35% 18S, 28% chloroplast 16S, and 37% prokaryote 16S rRNA genes. These data demonstrate the potential for universal primers to generate comprehensive microbiome profiles.

     