
Title: The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes
Abstract: For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by: (i) including compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) being extensible by the user community, with all data stored in GitHub; and (iii) being designed as a biochemical ‘Rosetta Stone’ to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at and KBase.
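To make the phrase "charged molecules and proton balancing on reactions" concrete, the following is a minimal sketch of a mass and charge balance check for a single reaction. It is not ModelSEED code: the compound identifiers, the formulas and charges (chosen to approximate neutral pH), and the function names `parse_formula` and `imbalance` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative formulas and charges near neutral pH (assumed values,
# not read from the ModelSEED files).
COMPOUNDS = {
    "atp": ("C10H12N5O13P3", -4),
    "h2o": ("H2O", 0),
    "adp": ("C10H12N5O10P2", -3),
    "pi": ("HO4P", -2),
    "h": ("H", 1),
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C10H12N5O13P3'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoichiometry, compounds=COMPOUNDS):
    """Return (element imbalance, charge imbalance) for one reaction.

    `stoichiometry` maps compound id -> coefficient, negative for reactants.
    An empty dict and a charge of 0 mean the reaction is balanced.
    """
    atoms = Counter()
    charge = 0
    for cpd, coeff in stoichiometry.items():
        formula, cpd_charge = compounds[cpd]
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * cpd_charge
    return {e: n for e, n in atoms.items() if n != 0}, charge


# ATP hydrolysis written with and without the balancing proton.
with_proton = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1, "h": 1}
without_proton = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1}
print(imbalance(with_proton))
print(imbalance(without_proton))
```

In this sketch, omitting the proton leaves the reaction one hydrogen and one unit of charge short on the product side, which is the kind of inconsistency that proton balancing is meant to remove.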
Journal Name:
Nucleic Acids Research
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Summary

    Although advances in untargeted metabolomics have made it possible to gather data on thousands of cellular metabolites in parallel, identification of novel metabolites from these datasets remains challenging. To address this need, Metabolic in silico Network Expansions (MINEs) were developed. A MINE is an expansion of known biochemistry which can be used as a list of potential structures for unannotated metabolomics peaks. Here, we present MINE 2.0, which utilizes a new set of biochemical transformation rules that covers 93% of MetaCyc reactions (compared to 25% in MINE 1.0). This results in a 17-fold increase in database size and a 40% increase in MINE database compounds matching unannotated peaks from an untargeted metabolomics dataset. MINE 2.0 is thus a significant improvement to this community resource.

    Availability and implementation

    The MINE 2.0 website can be accessed at The MINE 2.0 web API documentation can be accessed at The data and code underlying this article are available in the MINE-2.0-Paper repository at MINE 2.0 source code can be accessed at (MINE construction), (backend web API) and (web app).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  2. Abstract Background

    Microbiomes are now recognized as the main drivers of ecosystem function ranging from the oceans and soils to humans and bioreactors. However, a grand challenge in microbiome science is to characterize and quantify the chemical currencies of organic matter (i.e., metabolites) that microbes respond to and alter. Critical to this has been the development of Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), which has drastically increased molecular characterization of complex organic matter samples, but challenges users with hundreds of millions of data points where readily available, user-friendly, and customizable software tools are lacking.


    Here, we build on years of analytical experience with diverse sample types to develop MetaboDirect, an open-source, command-line-based pipeline for the analysis (e.g., chemodiversity analysis, multivariate statistics), visualization (e.g., Van Krevelen diagrams, elemental and molecular class composition plots), and presentation of direct injection high-resolution FT-ICR MS data sets after molecular formula assignment has been performed. When compared to other available FT-ICR MS software, MetaboDirect is superior in that it requires a single line of code to launch a fully automated framework for the generation and visualization of a wide range of plots, with minimal coding experience required. Among the tools evaluated, MetaboDirect is also uniquely able to automatically generate biochemical transformation networks (ab initio) based on mass differences (mass difference network-based approach) that provide an experimental assessment of metabolite connections within a given sample or a complex metabolic system, thereby providing important information about the nature of the samples and the set of microbial reactions or pathways that gave rise to them. Finally, for more experienced users, MetaboDirect allows users to customize plots, outputs, and analyses.


    Application of MetaboDirect to FT-ICR MS-based metabolomic data sets from a marine phage-bacterial infection experiment and a Sphagnum leachate microbiome incubation experiment showcases the exploration capabilities of the pipeline that will enable the research community to evaluate and interpret their data in greater depth and in less time. It will further advance our knowledge of how microbial communities influence and are influenced by the chemical makeup of the surrounding system. The source code and User’s guide of MetaboDirect are freely available through ( and (, respectively.
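
The mass difference network idea described in this abstract can be sketched in a few lines: two detected masses are connected when their difference matches the mass shift of a known transformation. This is only an illustration of the general approach, not MetaboDirect's implementation; the short transformation list, the 0.001 Da tolerance, and the function name `mass_difference_edges` are assumptions.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) for a few common transformations.
# This short list and the default tolerance are assumptions for illustration.
TRANSFORMATIONS = {
    "methylation (+CH2)": 14.01565,
    "hydrogenation (+H2)": 2.01565,
    "oxidation (+O)": 15.99491,
}


def mass_difference_edges(masses, transformations=TRANSFORMATIONS, tol=0.001):
    """Return (lighter mass, heavier mass, transformation) for matching pairs."""
    edges = []
    for low, high in combinations(sorted(masses), 2):
        diff = high - low
        for name, shift in transformations.items():
            # Connect the pair if the difference matches a known shift.
            if abs(diff - shift) <= tol:
                edges.append((low, high, name))
    return edges


# Three hypothetical neutral masses: a hexose-like mass and two derived masses.
peaks = [196.05830, 180.06339, 194.07904]
for edge in mass_difference_edges(peaks):
    print(edge)
```

The pairwise comparison is quadratic in the number of masses, which is acceptable for a sketch; a data set with many thousands of assigned formulas would call for a sorted lookup instead.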

  3. Claesen, Jan (Ed.)
    ABSTRACT Trophic interactions between microbes are postulated to determine whether a host microbiome is healthy or causes predisposition to disease. Two abundant taxa, the Gram-negative heterotrophic bacterium Bacteroides thetaiotaomicron and the methanogenic archaeon Methanobrevibacter smithii, are proposed to have a synergistic metabolic relationship. Both organisms play vital roles in human gut health; B. thetaiotaomicron assists the host by fermenting dietary polysaccharides, whereas M. smithii consumes end-stage fermentation products and is hypothesized to relieve feedback inhibition of upstream microbes such as B. thetaiotaomicron. To study their metabolic interactions, we defined and optimized a coculture system and used software testing techniques to analyze growth under a range of conditions representing the nutrient environment of the host. We verify that B. thetaiotaomicron fermentation products are sufficient for M. smithii growth and that accumulation of fermentation products alters secretion of metabolites by B. thetaiotaomicron to benefit M. smithii. Studies suggest that B. thetaiotaomicron metabolic efficiency is greater in the absence of fermentation products or in the presence of M. smithii. Under certain conditions, B. thetaiotaomicron and M. smithii form interspecies granules consistent with behavior observed for syntrophic partnerships between microbes in soil or sediment enrichments and anaerobic digesters. Furthermore, when vitamin B12, hematin, and hydrogen gas are abundant, coculture growth is greater than the sum of growth observed for monocultures, suggesting that both organisms benefit from a synergistic mutual metabolic relationship. IMPORTANCE The human gut functions through a complex system of interactions between the host human tissue and the microbes which inhabit it. These diverse interactions are difficult to model or examine under controlled laboratory conditions. We studied the interactions between two dominant human gut microbes, B. thetaiotaomicron and M. smithii, using a seven-component culturing approach that allows the systematic examination of the metabolic complexity of this binary microbial system. By combining high-throughput methods with machine learning techniques, we were able to investigate the interactions between two dominant genera of the gut microbiome in a wide variety of environmental conditions. Our approach can be broadly applied to studying microbial interactions and may be extended to evaluate and curate computational metabolic models. The software tools developed for this study are available as user-friendly tutorials in the Department of Energy KBase.
  4. Elmer Ottis Wooton (1865–1945) was one of the most important early botanists to work in the Southwestern United States, contributing a great deal of natural history knowledge and botanical research on the flora of New Mexico that shaped many naturalists and scientists for generations. The extensive Wooton legacy includes herbarium collections that he and his famous student Paul Carpenter Standley (1884–1963), prolific botanist and explorer, used for the first Flora of New Mexico (Wooton and Standley 1915), along with resources covering botany and range management strategies for the northern Chihuahuan Desert, and an extensive, yet to be digitized, historical archive of correspondence, field notes, vegetation sketches, photographs, and lantern slides, all from his travels and field work in the region. Starting in 1890, the most complete set of Wooton’s herbarium collections was deposited in the NMC herbarium at New Mexico State University (NMSU); together with his archives, now stored in a campus library, these have been underutilized, offline resources. The goals of this ongoing project are to secure, preserve, and promote Wooton’s important historical resources, by fleshing out the botanical history of the region, raising appreciation of herbarium collections within the community, and emphasizing their unique role in facilitating contemporary research aimed at addressing pressing scientific questions such as vegetation responses to global climate change. Students and the general public involved in this project are engaged through hands-on activities including cataloging, databasing and digitization of nearly 10,000 herbarium specimens and Wooton’s archives. These outputs, combined with contemporary data collection and computational biology techniques from an ecological perspective, are being used to document vegetation changes in iconic, climate-sensitive, high-elevation mountainous ecosystems present in southwestern New Mexico. In a later phase of the project, a variety of public audiences will participate through interactive online story maps and citizen science programs such as iNaturalist, Notes from Nature, and BioBlitz. Images of herbarium specimens will be shared via an online database and other relevant biodiversity portals (Symbiota, iDigBio, JStor). Community members reached through this project will be better-informed citizens, who may go on to become new stewards of natural history collections, with the potential to influence policies safeguarding the future of our planet’s biodiversity. More locally, the project will support the management of Organ Mountains Desert Peaks National Monument, which was established in 2014 to protect the area's human and environmental resources, and for which knowledge and data are currently limited.
  5. Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, measured macromolecular composition of cells, and by measuring the cell's growth rate and metabolism in different conditions. The cellular goal (the optimization problem that the cell is trying to solve) can be challenging to derive experimentally for many organisms, including human or mammalian cells, which have complex metabolic capabilities and are not well understood. Existing approaches to learning goals from data include (a) estimating a linear objective function, or (b) estimating linear constraints that model complex biochemical reactions and constrain the cell's operation. The latter approach is important because often the known reactions are not enough to explain observations; therefore, there is a need to extend automatically the model complexity by learning new reactions. However, this leads to nonconvex optimization problems, and existing tools cannot scale to realistically large metabolic models. Hence, constraint estimation is still used sparingly despite its benefits for modeling cell metabolism, which is important for developing novel antimicrobials against pathogens, discovering cancer drug targets, and producing value-added chemicals. Here, we develop the first approach to estimating constraint reactions from data that can scale to realistically large metabolic models. Previous tools were used on problems having fewer than 75 reactions and 60 metabolites, which limits real-life-size applications. We perform extensive experiments using 75 large-scale metabolic network models for different organisms (including bacteria, yeasts, and mammals) and show that our algorithm can recover cellular constraint reactions. The recovered constraints enable accurate prediction of metabolic states in hundreds of growth environments not seen in training data, and we recover useful cellular goals even when some measurements are missing.