Abstract Setaria italica (foxtail millet), a founder crop of East Asian agriculture, is a model plant for C4 photosynthesis and for developing approaches to adaptive breeding across multiple climates. Here we established the Setaria pan-genome by assembling 110 representative genomes from a worldwide collection. The pan-genome is composed of 73,528 gene families, of which 23.8%, 42.9%, 29.4% and 3.9% are core, soft core, dispensable and private genes, respectively; 202,884 nonredundant structural variants were also detected. The characterization of pan-genomic variants suggests their importance during foxtail millet domestication and improvement, as exemplified by the identification of the yield gene SiGW3, where a 366-bp presence/absence variant in the promoter accompanies variation in gene expression. We developed a graph-based genome and performed large-scale genetic studies for 68 traits across 13 environments, identifying potential genes for millet improvement at different geographic sites. These can be used in marker-assisted breeding, genomic selection and genome editing to accelerate crop improvement under different climatic conditions.
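The core / soft core / dispensable / private split reported above is a classification of gene families by how many of the assembled genomes carry each family. The sketch below illustrates that idea only; the 90% soft-core cutoff, the function names and the toy genomes are assumptions for illustration and are not the cutoffs or code used in the study.

```python
from collections import Counter


def classify_families(presence, n_genomes, soft_core_frac=0.9):
    """Label each gene family by how many genomes carry it.

    presence maps a gene-family ID to the set of genome IDs containing it.
    The cutoffs are illustrative assumptions, not the study's definitions:
      core        = present in every genome
      soft core   = present in >= soft_core_frac of genomes, but not all
      private     = present in exactly one genome
      dispensable = everything in between
    """
    labels = {}
    for family, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            labels[family] = "core"
        elif k >= soft_core_frac * n_genomes:
            labels[family] = "soft core"
        elif k == 1:
            labels[family] = "private"
        else:
            labels[family] = "dispensable"
    return labels


def category_percentages(labels):
    """Return the percentage of gene families in each category (1 decimal)."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


# Toy example with 10 hypothetical genomes (IDs 0-9).
presence = {
    "fam1": set(range(10)),  # all 10 genomes -> core
    "fam2": set(range(9)),   # 9 of 10 -> soft core
    "fam3": {0, 3, 5},       # 3 of 10 -> dispensable
    "fam4": {7},             # 1 of 10 -> private
}
labels = classify_families(presence, n_genomes=10)
print(labels)
print(category_percentages(labels))
```

Applied to 110 genomes, the same counting yields percentages comparable in kind to those reported, although the exact values depend on the thresholds chosen.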
Digital Microbe: a genome-informed data integration framework for team science on emerging model organisms
Abstract The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. “Digital Microbes” are frameworks for interoperable and reproducible collaborative science through open-source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3, with more than 100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas, containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.
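The identifier problem described above (one microbe, several generations of gene identifiers) can be pictured as re-keying every dataset onto one canonical ID through a crosswalk. The following is a hypothetical sketch of that idea, not the Digital Microbe software: the function name, the crosswalk and all gene IDs are invented for illustration.

```python
from collections import defaultdict


def harmonize(datasets, crosswalk):
    """Re-key per-dataset gene values onto one canonical identifier.

    datasets:  dataset name -> {gene ID (any annotation version) -> value}
    crosswalk: alias gene ID -> canonical gene ID (hypothetical mapping)
    Returns (table, unmapped), where table[canonical][dataset] = value and
    unmapped lists (dataset, gene ID) pairs missing from the crosswalk.
    If two aliases in one dataset map to the same canonical ID, the later
    value overwrites the earlier one.
    """
    table = defaultdict(dict)
    unmapped = []
    for name, values in datasets.items():
        for gene_id, value in values.items():
            canonical = crosswalk.get(gene_id)
            if canonical is None:
                unmapped.append((name, gene_id))
                continue
            table[canonical][name] = value
    return dict(table), unmapped


# Hypothetical identifiers: two annotation versions of the same genes.
crosswalk = {"old_0001": "G0001", "new_00005": "G0001", "old_0002": "G0002"}
datasets = {
    "lab_rnaseq": {"old_0001": 12.5, "old_0002": 3.1},
    "field_rnaseq": {"new_00005": 8.0, "new_99999": 1.0},
}
table, unmapped = harmonize(datasets, crosswalk)
print(table)
print(unmapped)
```

Reporting unmapped IDs instead of silently dropping them keeps gaps in the crosswalk visible to the team curating it.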
- Award ID(s):
- 2019589
- PAR ID:
- 10539840
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Scientific Data
- Volume:
- 11
- Issue:
- 1
- ISSN:
- 2052-4463
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Background: Advances in microbiome science are being driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read the genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.

Results: We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented as tables for metabolism and a variety of visualizations, including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric, the “MW-score” (metabolic weight score), and metabolic Sankey diagrams. With 40 CPU threads, METABOLIC takes ~3 h to process ~100 genomes and their corresponding metagenomic reads, of which the most compute-demanding step, hmmsearch, takes ~45 min; completing hmmsearch for ~3,600 genomes takes ~5 h. Tests of accuracy, robustness, and consistency suggest that METABOLIC performs better than other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.

Conclusion: METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at https://github.com/AnantharamanLab/METABOLIC.
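A "metabolic handoff" in the sense used above is a case where one organism produces an intermediate that another organism consumes in the next step of a sequential transformation. The sketch below shows that idea with nitrification (ammonia oxidation followed by nitrite oxidation). It is a simplified illustration written in Python, not METABOLIC's Perl/R implementation; the function name and genome IDs are hypothetical.

```python
def find_handoffs(steps, capabilities):
    """List potential metabolic handoffs between genomes.

    steps:        ordered list of (step name, substrate, product)
    capabilities: genome ID -> set of step names its genome encodes
    A handoff (donor, recipient, intermediate) is recorded when the donor
    encodes a step whose product is the substrate of the next step, and a
    different genome (the recipient) encodes that next step.
    """
    handoffs = []
    for (step1, _, product), (step2, substrate, _) in zip(steps, steps[1:]):
        if product != substrate:
            continue  # consecutive steps do not share an intermediate
        donors = [g for g, caps in capabilities.items() if step1 in caps]
        recipients = [g for g, caps in capabilities.items() if step2 in caps]
        for donor in donors:
            for recipient in recipients:
                if donor != recipient:
                    handoffs.append((donor, recipient, product))
    return handoffs


# Nitrification as a two-step example; genome IDs are hypothetical.
steps = [
    ("ammonia oxidation", "NH4+", "NO2-"),
    ("nitrite oxidation", "NO2-", "NO3-"),
]
capabilities = {
    "MAG_1": {"ammonia oxidation"},
    "MAG_2": {"nitrite oxidation"},
    "MAG_3": {"ammonia oxidation", "nitrite oxidation"},
}
for donor, recipient, metabolite in find_handoffs(steps, capabilities):
    print(f"{donor} -> {recipient} via {metabolite}")
```

Here MAG_3 encodes both steps, so it can act as donor or recipient but is never paired with itself.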
-
Abstract Global change is impacting biodiversity across all habitats on Earth. New selection pressures from changing climatic conditions and other anthropogenic activities are creating heterogeneous ecological and evolutionary responses across many species' geographic ranges. Yet we currently lack standardised and reproducible tools to effectively predict the resulting patterns in species vulnerability to declines or range changes.

We developed an informatic toolbox that integrates ecological, environmental and genomic data and analyses (environmental dissimilarity, species distribution models, landscape connectivity, neutral and adaptive genetic diversity, genotype‐environment associations and genomic offset) to estimate population vulnerability. In our toolbox, functions and data structures are coded in a standardised way so that the toolbox is applicable to any species or geographic region where appropriate data are available, for example individual or population sampling and genomic datasets (e.g. RAD‐seq, ddRAD‐seq, whole genome sequencing data) representing environmental variation across the species' geographic range.

To demonstrate multi‐species applicability, we apply our toolbox to three georeferenced genomic datasets for co‐occurring East African spiny reed frogs (Afrixalus fornasini, A. delicatus and A. sylvaticus) to predict their population vulnerability. We also demonstrate that range loss projections based on adaptive variation can be accurately reproduced from a previous study using data for two European bat species (Myotis escalerai and M. crypticus).

Our framework sets the stage for large‐scale, multi‐species genomic datasets to be leveraged in a novel climate change vulnerability framework to quantify intraspecific differences in genetic diversity, local adaptation, range shifts and population vulnerability based on exposure, sensitivity and landscape barriers.
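Genomic offset, one of the analyses listed above, measures how far a population's current genomic composition is from the composition predicted to suit a future environment. Published approaches commonly rely on methods such as gradient forest or redundancy analysis; the sketch below instead assumes a hypothetical linear genotype-environment model per locus and takes the Euclidean distance between predicted current and future allele frequencies. It illustrates the concept only and is not the toolbox's implementation.

```python
import math


def predicted_frequencies(env, models):
    """Predict allele frequencies at adaptive loci from environment values.

    env:    list of environmental variables for one population
    models: per-locus (intercept, slopes); a hypothetical linear
            genotype-environment association fitted beforehand
    """
    freqs = []
    for intercept, slopes in models:
        p = intercept + sum(b * x for b, x in zip(slopes, env))
        freqs.append(min(1.0, max(0.0, p)))  # clamp to a valid frequency
    return freqs


def genomic_offset(env_now, env_future, models):
    """Euclidean distance between predicted current and future allele frequencies."""
    now = predicted_frequencies(env_now, models)
    future = predicted_frequencies(env_future, models)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(now, future)))


# Two loci, two environmental variables (e.g. temperature, rainfall); toy values.
models = [(0.2, [0.1, 0.0]), (0.5, [0.0, -0.05])]
print(genomic_offset([1.0, 2.0], [4.0, 2.0], models))  # roughly 0.3
```

A larger offset suggests a greater mismatch between current adaptive variation and future conditions, which is why it can feed into a vulnerability estimate alongside exposure and connectivity.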
-
Faust, Karoline (Ed.) ABSTRACT Bacillus subtilis is an important industrial and environmental microorganism known to occupy many niches and produce many compounds of interest. Although it is one of the best-studied organisms, much of this focus, including the reconstruction of genome-scale metabolic models, has been placed on a few key laboratory strains. Here, we substantially expand these prior models to pan-genome scale, representing 481 genomes of B. subtilis with 2,315 orthologous gene clusters, 1,874 metabolites, and 2,239 reactions. Furthermore, we incorporate data from carbon utilization experiments for eight strains to refine and validate the model's metabolic predictions. This comprehensive pan-genome model enables the assessment of strain-to-strain differences related to nutrient utilization, fermentation outputs, robustness, and other metabolic aspects. Using the model and phenotypic predictions, we divide B. subtilis strains into five groups with distinct patterns of behavior that correlate across these features. The pan-genome model offers deep insights into B. subtilis’ metabolism as it varies across environments and provides an understanding of how different strains have adapted to dynamic habitats.

IMPORTANCE As the volume of genomic data and computational power have increased, so has the number of genome-scale metabolic models. These models encapsulate the totality of metabolic functions for a given organism. Bacillus subtilis strain 168 is one of the first bacteria for which a metabolic network was reconstructed. Since then, several updated reconstructions have been generated for this model microorganism. Here, we expand the metabolic model for a single strain into a pan-genome-scale model, which consists of individual models for 481 B. subtilis strains. By evaluating differences between these strains, we identified five distinct groups of strains, allowing for the rapid classification of any particular strain. Furthermore, this classification into five groups aids the rapid identification of suitable strains for any application.
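One simple way to picture grouping strains by predicted behavior is to encode each strain's predicted growth on a fixed list of carbon sources as a presence/absence vector and collect strains with identical vectors. This is a hypothetical sketch; the abstract does not state the grouping method, its five groups combine several features beyond carbon utilization, and a real analysis would more likely use a clustering method than exact matching. Strain names, carbon sources and predictions are invented.

```python
from collections import defaultdict


def group_by_profile(predictions, carbon_sources):
    """Group strains that share an identical predicted growth profile.

    predictions:    strain -> set of carbon sources predicted to support growth
    carbon_sources: fixed order of carbon sources defining the profile vector
    """
    groups = defaultdict(list)
    for strain, growth in predictions.items():
        profile = tuple(source in growth for source in carbon_sources)
        groups[profile].append(strain)
    return dict(groups)


# Hypothetical strains and model predictions.
carbon_sources = ["glucose", "sucrose", "xylose"]
predictions = {
    "strain_A": {"glucose", "sucrose"},
    "strain_B": {"glucose", "sucrose"},
    "strain_C": {"glucose", "xylose"},
}
for profile, strains in group_by_profile(predictions, carbon_sources).items():
    print(profile, strains)
```

Once the groups are defined, a new strain can be classified by computing its profile from its strain-specific model and looking up the matching group.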
-
Abstract Genomic methods are becoming increasingly valuable and established in ecological research, particularly in nonmodel species. Supporting their progress and adoption requires investment in resources that promote (i) reproducibility of genomic analyses, (ii) accessibility of learning tools and (iii) the ability to keep pace with rapidly developing methods and principles.

We introduce marineomics.io, an open‐source, living document to disseminate tutorials, reproducibility tools and best principles for ecological genomic research in marine and nonmodel systems.

The website's existing content spans population and functional genomics, including current recommendations for whole‐genome sequencing, RAD‐seq, Pool‐seq and RNA‐seq. With the goal of facilitating the development of new, similar resources, we describe our process for aggregating and synthesizing methodological principles from the ecological genomics community to inform website content. We also detail steps for authoring and submitting new website content, as well as protocols for providing feedback and requesting topics from the community.

These web resources were constructed following guidance for doing rigorous, reproducible science. Collaboration and contributions to the website are encouraged from scientists of all skill sets and levels of expertise.
