Title: Finer Metagenomic Reconstruction via Biodiversity Optimization
When analyzing communities of microorganisms from their sequenced DNA, an important task is taxonomic profiling: enumerating the presence and relative abundance of all organisms, or merely of all taxa, contained in the sample. This task can be tackled via compressive-sensing-based approaches, which favor communities featuring the fewest organisms among those consistent with the observed DNA data. Despite their successes, these parsimonious approaches sometimes conflict with biological realism by overlooking organism similarities. Here, we leverage a recently developed notion of biological diversity that simultaneously accounts for organism similarities and retains the optimization strategy underlying compressive-sensing-based approaches. We demonstrate that minimizing biological diversity still produces sparse taxonomic profiles and we experimentally validate superiority to existing compressive-sensing-based approaches. Despite showing that the objective function is almost never convex and often concave, generally yielding NP-hard problems, we exhibit ways of representing organism similarities for which minimizing diversity can be performed via a sequence of linear programs guaranteed to decrease diversity. Better yet, when biological similarity is quantified by k-mer co-occurrence (a popular notion in bioinformatics), minimizing diversity actually reduces to one linear program that can utilize multiple k-mer sizes to enhance performance. In proof-of-concept experiments, we verify that the latter procedure can lead to significant gains when taxonomically profiling a metagenomic sample, both in terms of reconstruction accuracy and computational performance.
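As a rough illustration of the quantity the abstract proposes to minimize, the sketch below builds a k-mer-based similarity matrix and evaluates a similarity-sensitive diversity of a taxonomic profile. The abstract does not name the diversity measure or the exact similarity, so everything here is an assumption: the diversity is taken to be of the Leinster-Cobbold type, D_q^Z(p) = (Σ_i p_i (Zp)_i^(q-1))^(1/(1-q)), the similarity is a Jaccard index on k-mer sets, and the function names and toy sequences are hypothetical. It only evaluates diversity and does not implement the paper's linear-programming procedure.

```python
import math

def kmer_set(seq, k):
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_similarity_matrix(genomes, k):
    """Hypothetical similarity: Jaccard index of the genomes' k-mer sets."""
    sets = [kmer_set(g, k) for g in genomes]
    n = len(sets)
    Z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = sets[i] | sets[j]
            Z[i][j] = len(sets[i] & sets[j]) / len(union) if union else 1.0
    return Z

def diversity(p, Z, q=2.0):
    """Similarity-sensitive diversity of order q for abundances p (summing to 1)."""
    n = len(p)
    # (Zp)_i measures how "ordinary" organism i is, counting organisms similar to it.
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1.0:
        # Limit of the general formula as q -> 1.
        return math.exp(-sum(p[i] * math.log(Zp[i]) for i in support))
    total = sum(p[i] * Zp[i] ** (q - 1.0) for i in support)
    return total ** (1.0 / (1.0 - q))

# Toy data (made up): two near-identical sequences and one unrelated one.
genomes = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
Z = jaccard_similarity_matrix(genomes, k=3)
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
uniform = [1 / 3] * 3

print("ignoring similarity:", diversity(uniform, identity))
print("with k-mer similarity:", diversity(uniform, Z))
```

With the identity matrix as Z, every organism is treated as unrelated to the others and the measure reduces to a classical effective number of species; with a k-mer-based Z, two near-identical organisms count for less than two distinct ones, which is the kind of biological realism the abstract says plain sparsity overlooks.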
Award ID(s):
2029170
PAR ID:
10247255
Author(s) / Creator(s):
;
Editor(s):
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H.
Date Published:
Journal Name:
Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Volume:
33
Page Range / eLocation ID:
9169-9179
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Synthetic biology conceptualises biological complexity as a network of biological parts, devices and systems with predetermined functionalities, and has had a revolutionary impact on fundamental and applied research. With the unprecedented ability to synthesise and transfer any DNA and RNA across organisms, the scope of synthetic biology is expanding and being recreated in previously unimaginable ways. The field has matured to a level where highly complex networks, such as artificial communities of synthetic organisms can be constructed. In parallel, computational biology became an integral part of biological studies, with computational models aiding the unravelling of the escalating complexity and emerging properties of biological phenomena. However, there is still a vast untapped potential for the complete integration of modelling into the synthetic design process, presenting exciting opportunities for scientific advancements. Here, we first highlight the most recent advances in computer-aided design of microbial communities. Next, we propose that such a design can benefit from an organism-free modular modelling approach that places its emphasis on modules of organismal function towards the design of multi-species communities. We argue for a shift in perspective from single organism-centred approaches to emphasising the functional contributions of organisms within the community. By assembling synthetic biological systems using modular computational models with mathematical descriptions of parts and circuits, we can tailor organisms to fulfil specific functional roles within the community. This approach aligns with synthetic biology strategies and presents exciting possibilities for the design of artificial communities. 
  2. Abstract Corals and sponges harbor diverse microbial communities that are integral to the functioning of the host. While the taxonomic diversity of their microbiomes has been well-established for corals and sponges, their functional roles are less well-understood. It is unclear if the similarities of symbiosis in an invertebrate host would result in functionally similar microbiomes, or if differences in host phylogeny and environmentally driven microhabitats within each host would shape functionally distinct communities. Here we addressed this question, using metatranscriptomic and 16S rRNA gene profiling techniques to compare the microbiomes of two host organisms from different phyla. Our results indicate functional similarity in carbon, nitrogen, and sulfur assimilation, and aerobic nitrogen cycling. Additionally, there were few statistical differences in pathway coverage or abundance between the two hosts. For example, we observed higher coverage of phosphonate and siderophore metabolic pathways in the star coral, Montastraea cavernosa, while there was higher coverage of chloroalkane metabolism in the giant barrel sponge, Xestospongia muta. Higher abundance of genes associated with carbon fixation pathways was also observed in M. cavernosa, while in X. muta there was higher abundance of fatty acid metabolic pathways. Metagenomic predictions based on 16S rRNA gene profiling analysis were similar, and there was high correlation between the metatranscriptome and metagenome predictions for both hosts. Our results highlight several metabolic pathways that exhibit functional similarity in these coral and sponge microbiomes despite the taxonomic differences between the two microbiomes, as well as potential specialization of some microbially based metabolism within each host.
  3. David, Lawrence A. (Ed.)
    ABSTRACT Shotgun metagenomic sequencing has transformed our understanding of microbial community ecology. However, preparing metagenomic libraries for high-throughput DNA sequencing remains a costly, labor-intensive, and time-consuming procedure, which in turn limits the utility of metagenomes. Several library preparation procedures have recently been developed to offset these costs, but it is unclear how these newer procedures compare to current standards in the field. In particular, it is not clear if all such procedures perform equally well across different types of microbial communities or if features of the biological samples being processed (e.g., DNA amount) impact the accuracy of the approach. To address these questions, we assessed how five different shotgun DNA sequence library preparation methods, including the commonly used Nextera Flex kit, perform when applied to metagenomic DNA. We measured each method’s ability to produce metagenomic data that accurately represent the underlying taxonomic and genetic diversity of the community. We performed these analyses across a range of microbial community types (e.g., soil, coral associated, and mouse gut associated) and input DNA amounts. We find that the type of community and amount of input DNA influence each method’s performance, indicating that careful consideration may be needed when selecting between methods, especially for low-complexity communities. However, the cost-effective preparation methods that we assessed are generally comparable to the current gold-standard Nextera DNA Flex kit for high-complexity communities. Overall, the results from this analysis will help expand and even facilitate access to metagenomic approaches in future studies. IMPORTANCE Metagenomic library preparation methods and sequencing technologies continue to advance rapidly, allowing researchers to characterize microbial communities in previously underexplored environmental samples and systems. However, widely accepted standardized library preparation methods can be cost-prohibitive. Newly available approaches may be less expensive, but their efficacy in comparison to standardized methods remains unknown. In this study, we compared five different metagenomic library preparation methods. We evaluated each method across a range of microbial communities varying in complexity and quantity of input DNA. Our findings demonstrate the importance of considering sample properties, including community type, composition, and DNA amount, when choosing the most appropriate metagenomic library preparation method.
  4. Abstract Background: In light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems. Current barcoding protocols often rely on short amplicon sequences, which yield accurate identification of biological entities in a community but provide limited phylogenetic resolution across broad taxonomic scales. However, the phylogenetic structure of communities is an essential component of biodiversity. Consequently, a barcoding approach is required that unites robust taxonomic assignment power and high phylogenetic utility. A possible solution is offered by sequencing long ribosomal DNA (rDNA) amplicons on the MinION platform (Oxford Nanopore Technologies). Findings: Using a dataset of various animal and plant species, with a focus on arthropods, we assemble a pipeline for long rDNA barcode analysis and introduce a new software (MiniBar) to demultiplex dual indexed Nanopore reads. We find excellent phylogenetic and taxonomic resolution offered by long rDNA sequences across broad taxonomic scales. We highlight the simplicity of our approach by field barcoding with a miniaturized, mobile laboratory in a remote rainforest. We also test the utility of long rDNA amplicons for analysis of community diversity through metabarcoding and find that they recover highly skewed diversity estimates. Conclusions: Sequencing dual indexed, long rDNA amplicons on the MinION platform is a straightforward, cost-effective, portable, and universal approach for eukaryote DNA barcoding. Although bulk community analyses using long-amplicon approaches may introduce biases, the long rDNA amplicons approach signifies a powerful tool for enabling the accurate recovery of taxonomic and phylogenetic diversity across biological communities.
  5. Abstract Efforts to catalog global biodiversity have often focused on aboveground taxonomic diversity, with limited consideration of belowground communities. However, diversity aboveground may influence the diversity of belowground communities and vice versa. In addition to taxonomic diversity, the structural diversity of plant communities may be related to the diversity of soil bacterial and fungal communities, which drive important ecosystem processes but are difficult to characterize across broad spatial scales. In forests, canopy structural diversity may influence soil microorganisms through its effects on ecosystem productivity and root architecture, and via associations between canopy structure, stand age, and species richness. Given that structural diversity is one of the few types of diversity that can be readily measured remotely (e.g., using light detection and ranging, LiDAR), establishing links between structural and microbial diversity could facilitate the detection of belowground biodiversity hotspots. We investigated the potential for using remotely sensed information about forest structural diversity as a predictor of soil microbial community richness and composition. We calculated LiDAR‐derived metrics of structural diversity as well as a suite of stand and soil properties from 38 forested plots across the central hardwoods region of Indiana, USA, to test whether forest canopy structure is linked with the community richness and diversity of four key soil microbial groups: bacteria, fungi, arbuscular mycorrhizal (AM) fungi, and ectomycorrhizal (EM) fungi. We found that the density of canopy vegetation is positively associated with the taxonomic richness (alpha diversity) of EM fungi, independent of changes in plant taxonomic richness. Further, structural diversity metrics were significantly correlated with the overall community composition of bacteria, EM, and total fungal communities. However, soil properties were the strongest predictors of variation in the taxonomic richness and community composition of microbial communities in comparison with structural diversity and tree species diversity. As remote sensing tools and algorithms are rapidly advancing, these results may have important implications for the use of remote sensing of vegetation structural diversity for management and restoration practices aimed at preserving belowground biodiversity.