Title: Digital Microbe: a genome-informed data integration framework for team science on emerging model organisms
Abstract The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. “Digital Microbes” are frameworks for interoperable and reproducible collaborative science through open source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3 with > 100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.
Award ID(s):
2019589
PAR ID:
10559564
Publisher / Repository:
Nature Portfolio
Journal Name:
Scientific Data
Volume:
11
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Setaria italica (foxtail millet), a founder crop of East Asian agriculture, is a model plant for C4 photosynthesis and developing approaches to adaptive breeding across multiple climates. Here we established the Setaria pan-genome by assembling 110 representative genomes from a worldwide collection. The pan-genome is composed of 73,528 gene families, of which 23.8%, 42.9%, 29.4% and 3.9% are core, soft core, dispensable and private genes, respectively; 202,884 nonredundant structural variants were also detected. The characterization of pan-genomic variants suggests their importance during foxtail millet domestication and improvement, as exemplified by the identification of the yield gene SiGW3, where a 366-bp presence/absence promoter variant accompanies gene expression variation. We developed a graph-based genome and performed large-scale genetic studies for 68 traits across 13 environments, identifying potential genes for millet improvement at different geographic sites. These can be used in marker-assisted breeding, genomic selection and genome editing to accelerate crop improvement under different climatic conditions.
  2. Abstract Global change is impacting biodiversity across all habitats on earth. New selection pressures from changing climatic conditions and other anthropogenic activities are creating heterogeneous ecological and evolutionary responses across many species' geographic ranges. Yet we currently lack standardised and reproducible tools to effectively predict the resulting patterns in species vulnerability to declines or range changes. We developed an informatic toolbox that integrates ecological, environmental and genomic data and analyses (environmental dissimilarity, species distribution models, landscape connectivity, neutral and adaptive genetic diversity, genotype‐environment associations and genomic offset) to estimate population vulnerability. In our toolbox, functions and data structures are coded in a standardised way so that it is applicable to any species or geographic region where appropriate data are available, for example individual or population sampling and genomic datasets (e.g. RAD‐seq, ddRAD‐seq, whole genome sequencing data) representing environmental variation across the species geographic range. To demonstrate multi‐species applicability, we apply our toolbox to three georeferenced genomic datasets for co‐occurring East African spiny reed frogs (Afrixalus fornasini, A. delicatus and A. sylvaticus) to predict their population vulnerability, as well as demonstrating that range loss projections based on adaptive variation can be accurately reproduced from a previous study using data for two European bat species (Myotis escalerai and M. crypticus). Our framework sets the stage for large scale, multi‐species genomic datasets to be leveraged in a novel climate change vulnerability framework to quantify intraspecific differences in genetic diversity, local adaptation, range shifts and population vulnerability based on exposure, sensitivity and landscape barriers.
  3. Hom, Erik_F Y (Ed.)
    ABSTRACT Viruses that infect phytoplankton are an integral part of marine ecosystems, but the vast majority of viral diversity remains uncultivated. Here, we introduce four near-complete genomic assemblies of viruses that infect the widespread marine picoeukaryote Micromonas commoda, doubling the number of reported genomes of Micromonas dsDNA viruses. All host and virus isolates were obtained from tropical waters of the North Pacific, a first for viruses infecting green algae in the order Mamiellales. Genome length of the new isolates ranges from 205 to 212 kb, and phylogenetic analysis shows that all four are members of the genus Prasinovirus. Three of the viruses form a clade that is adjacent to previously sequenced Micromonas viruses, while the fourth virus is relatively divergent from previously sequenced prasinoviruses. We identified 61 putative genes not previously found in prasinovirus isolates, including a phosphate transporter and a potential apoptosis inhibitor novel to marine viruses. Forty-eight genes in the new viruses are also found in host genome(s) and may have been acquired through horizontal gene transfer. By analyzing the coding sequences of all published prasinoviruses, we found that ~25% of prasinovirus gene content is significantly correlated with host genus identity (i.e., Micromonas, Ostreococcus, or Bathycoccus), and the functions of these genes suggest that much of the viral life cycle is differentially adapted to the three host genera. Mapping of metagenomic reads from global survey data indicates that one of the new isolates, McV-SA1, is relatively common in multiple ocean basins. IMPORTANCE The genomes analyzed here represent the first viruses from the tropical North Pacific that infect the abundant phytoplankton order Mamiellales. Comparing isolates from the same location demonstrates high genomic diversity among viruses that co-occur and presumably compete for hosts. Comparing all published prasinovirus genomes highlights gene functions that are likely associated with adaptation to different host genera. Metagenomic data indicate these viruses are globally distributed, and one of the novel isolates may be among the most abundant marine viruses.
  4. Abstract The animal gut microbiome is a complex system of diverse, predominantly anaerobic microbiota with secondary metabolite potential. These metabolites likely play roles in shaping microbial community membership and influencing animal host health. As such, novel secondary metabolites from gut microbes hold significant biotechnological and therapeutic interest. Despite their potential, gut microbes are largely untapped for secondary metabolites, with gut fungi and obligate anaerobes being particularly under-explored. To advance understanding of these metabolites, culture-based and (meta)genome-based approaches are essential. Culture-based approaches enable isolation, cultivation, and direct study of gut microbes, and (meta)genome-based approaches utilize in silico tools to mine biosynthetic gene clusters (BGCs) from microbes that have not yet been successfully cultured. In this mini-review, we highlight recent innovations in this area, including anaerobic biofoundries like ExFAB, the NSF BioFoundry for Extreme & Exceptional Fungi, Archaea, and Bacteria. These facilities enable high-throughput workflows to study oxygen-sensitive microbes and biosynthetic machinery. Such recent advances promise to improve our understanding of the gut microbiome and its secondary metabolism.
     Key points
     • Gut microbial secondary metabolites have therapeutic and biotechnological potential
     • Culture- and (meta)genome-based workflows drive gut anaerobe metabolite discovery
     • Anaerobic biofoundries enable high-throughput workflows for metabolite discovery
  5. Faust, Karoline (Ed.)
    ABSTRACT Bacillus subtilis is an important industrial and environmental microorganism known to occupy many niches and produce many compounds of interest. Although it is one of the best-studied organisms, much of this focus including the reconstruction of genome-scale metabolic models has been placed on a few key laboratory strains. Here, we substantially expand these prior models to pan-genome-scale, representing 481 genomes of B. subtilis with 2,315 orthologous gene clusters, 1,874 metabolites, and 2,239 reactions. Furthermore, we incorporate data from carbon utilization experiments for eight strains to refine and validate its metabolic predictions. This comprehensive pan-genome model enables the assessment of strain-to-strain differences related to nutrient utilization, fermentation outputs, robustness, and other metabolic aspects. Using the model and phenotypic predictions, we divide B. subtilis strains into five groups with distinct patterns of behavior that correlate across these features. The pan-genome model offers deep insights into B. subtilis’ metabolism as it varies across environments and provides an understanding as to how different strains have adapted to dynamic habitats. IMPORTANCE As the volume of genomic data and computational power have increased, so has the number of genome-scale metabolic models. These models encapsulate the totality of metabolic functions for a given organism. Bacillus subtilis strain 168 is one of the first bacteria for which a metabolic network was reconstructed. Since then, several updated reconstructions have been generated for this model microorganism. Here, we expand the metabolic model for a single strain into a pan-genome-scale model, which consists of individual models for 481 B. subtilis strains. By evaluating differences between these strains, we identified five distinct groups of strains, allowing for the rapid classification of any particular strain. Furthermore, this classification into five groups aids the rapid identification of suitable strains for any application.