Title: Plant Metabolic Network 16: expansion of underrepresented plant groups and experimentally supported enzyme data
Abstract: The Plant Metabolic Network (PMN) is a free online database of plant metabolism available at https://plantcyc.org. The latest release, PMN 16, provides metabolic databases representing >1200 metabolic pathways, 1.3 million enzymes, >8000 metabolites, >10 000 reactions and >15 000 citations for 155 plant and green algal genomes, as well as a pan-plant reference database called PlantCyc. This release contains 29 additional genomes compared with PMN 15, including species listed by the African Orphan Crop Consortium and nonflowering plant species. Furthermore, 52 new enzymes with experimentally supported function information have been included in this release. The single-species databases contain a combination of experimental information from the literature and computationally predicted information obtained through PMN’s database generation pipeline for a single species, while PlantCyc contains only experimental information but for any species within Viridiplantae. PMN is a comprehensive resource for querying, visualizing, analyzing and interpreting omics data with metabolic knowledge. It also serves as a useful and interactive tool for teaching plant metabolism.
Award ID(s):
2406533 2420360 2434687 2213983
PAR ID:
10555811
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
53
Issue:
D1
ISSN:
0305-1048
Size(s):
p. D1606-D1613
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary: Polyphenols are diverse and abundant carbon sources across ecosystems, with important roles in host-associated and terrestrial systems alike. However, the microbial genes encoding polyphenol metabolic enzymes are poorly represented in commonly used annotation databases, limiting widespread surveying of this metabolism. Here we present CAMPER, a tool that combines custom annotation searches with database-derived searches to both annotate and summarize polyphenol metabolism genes for a wide audience. With CAMPER, users can identify potential polyphenol-active genes and genomes to more broadly understand microbial carbon cycling in their datasets. Availability and Implementation: CAMPER is implemented in Python and is published under the GNU General Public License Version 3. It is available both as a standalone tool and as a database in DRAM v.1.5+. The source code and full documentation are available on GitHub at https://github.com/WrightonLabCSU/CAMPER.
  2. Studies of enzymes in modern-day plants have documented the diversity of metabolic activities retained by species today but only provide limited insight into how those properties evolved. Ancestral sequence reconstruction (ASR) is an approach that provides statistical estimates of ancient plant enzyme sequences which can then be resurrected to test hypotheses about the evolution of catalytic activities and pathway assembly. Here, I review the insights that have been obtained using ASR to study plant metabolism and highlight important methodological aspects. Overall, studies of resurrected plant enzymes show that (i) exaptation is widespread such that even low or undetectable levels of ancestral activity with a substrate can later become the apparent primary activity of descendant enzymes, (ii) intramolecular epistasis may or may not limit evolutionary paths towards catalytic or substrate preference switches, and (iii) ancient pathway flux often differs from modern-day metabolic networks. These and other insights gained from ASR would not have been possible using only modern-day sequences. Future ASR studies characterizing entire ancestral metabolic networks as well as those that link ancient structures with enzymatic properties should continue to provide novel insights into how the chemical diversity of plants evolved. This article is part of the theme issue ‘The evolution of plant metabolism’. 
  3. This beginner’s guide is intended for plant biologists new to network analysis. Here, we introduce key concepts and resources for researchers interested in incorporating network analysis into research, either as a stand-alone component for generating hypotheses or as a framework for examining and visualizing experimental results. Network analysis provides a powerful tool to predict gene functions. Advances in and reduced costs for systems biology techniques, such as genomics, transcriptomics, and proteomics, have generated abundant -omics data for plants; however, the functional annotation of plant genes lags behind. Therefore, predictions from network analysis can be a starting point to annotate genes and ultimately elucidate genotype-phenotype relationships. In this paper, we introduce networks and compare network-building resources available for plant biologists, including databases and software for network analysis. We then compare four databases available for plant biologists in more detail: AraNet, GeneMANIA, ATTED-II, and STRING. AraNet and GeneMANIA are functional association networks, ATTED-II is a gene coexpression database, and STRING is a protein-protein interaction database. AraNet and ATTED-II are plant-specific databases that can analyze multiple plant species, whereas GeneMANIA builds networks for Arabidopsis thaliana and non-plant species, and STRING for multiple species. Finally, we compare the performance of the four databases in predicting known and probable gene functions of the A. thaliana Nuclear Factor-Y (NF-Y) genes. We conclude that plant biologists have an invaluable resource in these databases and discuss how users can decide which type of database to use depending on their research question.
  4. Abstract: Traits with intuitive names, a clear scope and explicit description are essential for all trait databases. The lack of unified, comprehensive, and machine-readable plant trait definitions limits the utility of trait databases, including reanalysis of data from a single database, or analyses that integrate data across multiple databases. Both can only occur if researchers are confident the trait concepts are consistent within and across sources. Here we describe the AusTraits Plant Dictionary (APD), a new data source of terms that extends the trait definitions included in a recent trait database, AusTraits. The development process of the APD included three steps: review and formalisation of the scope of each trait and the accompanying trait description; addition of trait metadata; and publication in both human and machine-readable forms. Trait definitions include keywords, references, and links to related trait concepts in other databases, enabling integration of AusTraits with other sources. The APD will both improve the usability of AusTraits and foster the integration of trait data across global and regional plant trait databases.
  5. Abstract: Genome search and/or classification typically involves finding the best-match database (reference) genomes and has become increasingly challenging due to the growing number of available database genomes and the fact that traditional methods do not scale well with large databases. By combining k-mer hashing-based probabilistic data structures (i.e. ProbMinHash, SuperMinHash, Densified MinHash and SetSketch) to estimate genomic distance, with a graph based nearest neighbor search algorithm (Hierarchical Navigable Small World Graphs, or HNSW), we created a new data structure and developed an associated computer program, GSearch, that is orders of magnitude faster than alternative tools while maintaining high accuracy and low memory usage. For example, GSearch can search 8000 query genomes against all available microbial or viral genomes for their best matches (n = ∼318 000 or ∼3 000 000, respectively) within a few minutes on a personal laptop, using ∼6 GB of memory (2.5 GB via SetSketch). Notably, GSearch has an O(log(N)) time complexity and will scale well with billions of genomes based on a database splitting strategy. Further, GSearch implements a three-step search strategy depending on the degree of novelty of the query genomes to maximize specificity and sensitivity. Therefore, GSearch solves a major bottleneck of microbiome studies that require genome search and/or classification.
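The GSearch abstract (item 5) describes two ingredients: k-mer sketches that estimate genomic distance, and a nearest-neighbor search over those sketches. The sketch below illustrates the first ingredient with classic MinHash, a simpler relative of the ProbMinHash, SuperMinHash, Densified MinHash and SetSketch structures the abstract names, and replaces HNSW with a plain linear scan. It is not GSearch's code. The function names, the seeded SHA-1 hash family, the Mash-style Jaccard-to-distance conversion and the toy sequences are all illustrative assumptions.

```python
import hashlib
import math


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (length-k substrings) of a DNA sequence."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash of a k-mer; each seed acts as a separate hash function."""
    digest = hashlib.sha1(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_signature(kmer_set: set[str], num_hashes: int = 128) -> list[int]:
    """Classic MinHash: for each hash function, keep the minimum hash over all k-mers."""
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of hash functions whose minima agree estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of hash functions")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def mash_distance(jaccard: float, k: int) -> float:
    """Mash-style conversion of a Jaccard estimate into an approximate per-base distance."""
    if jaccard <= 0.0:
        return 1.0  # no shared k-mers observed: report the maximal distance
    return -math.log(2 * jaccard / (1 + jaccard)) / k


def best_match(query_sig: list[int], database: dict[str, list[int]]) -> str:
    """Linear O(N) scan for the reference with the highest estimated Jaccard similarity.
    GSearch instead searches an HNSW graph built over the reference sketches."""
    return max(database, key=lambda name: estimate_jaccard(query_sig, database[name]))


if __name__ == "__main__":
    K = 5
    references = {  # hypothetical toy "genomes"
        "genome_A": "ACGTACGTTAGCCGATCGATCGGATCCA",
        "genome_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCC",
    }
    db = {name: minhash_signature(kmers(seq, K)) for name, seq in references.items()}
    # Query: genome_A with its last base changed.
    query = minhash_signature(kmers("ACGTACGTTAGCCGATCGATCGGATCCT", K))
    hit = best_match(query, db)
    print(hit, mash_distance(estimate_jaccard(query, db[hit]), K))  # hit is "genome_A"
```

Slicing SHA-1 digests keeps the example dependency-free but is slow. Production tools use fast non-cryptographic hashes and one-permutation or densified schemes so that a single pass over the k-mers fills the whole sketch.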