Title: Hierarchical Harmonization of Atom-Resolved Metabolic Reactions across Metabolic Databases
Metabolic models have proven to be useful tools in systems biology and have been successfully applied to various research fields in a wide range of organisms. A relatively complete metabolic network is a prerequisite for deriving reliable metabolic models. The first step in constructing a metabolic network is to harmonize compounds and reactions across different metabolic databases. However, effectively integrating data from various sources remains a major challenge. Incomplete and inconsistent atomistic details in compound representations across databases are an important limiting factor. Here, we optimized a subgraph isomorphism detection algorithm to validate generic compound pairs. Moreover, we defined a set of harmonization relationship types between compounds to deal with inconsistent chemical details while successfully capturing atom-level characteristics, enabling a more complete compound harmonization across metabolic databases. In total, 15,704 compound pairs across the KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaCyc databases were detected. Furthermore, utilizing the classification of compound pairs and EC (Enzyme Commission) numbers of reactions, we established hierarchical relationships between metabolic reactions, enabling the harmonization of 3856 reaction pairs. In addition, we created and used atom-specific identifiers to evaluate the consistency of atom mappings within and between harmonized reactions, detecting some consistency issues between the reaction and compound descriptions in these metabolic databases.
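As a rough illustration of how a generic compound can be validated against a specific one by subgraph matching, the sketch below uses plain backtracking. It is not the optimized algorithm from the paper: the `Compound` class, the `find_embedding` function, and the convention that an "R" placeholder matches any single atom (the attachment point of the substituent) are all assumptions made for this example, and bond orders, hydrogens, and stereochemistry are ignored.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class Compound:
    """Hypothetical minimal molecular graph: atom id -> element, plus bonds."""

    def __init__(self, atoms: Dict[int, str], bonds: Iterable[Tuple[int, int]]):
        self.atoms = dict(atoms)
        self.adj: Dict[int, Set[int]] = {a: set() for a in self.atoms}
        for u, v in bonds:
            self.adj[u].add(v)
            self.adj[v].add(u)


def atoms_compatible(pattern_elem: str, target_elem: str) -> bool:
    # Assumption: "R" is a wildcard standing for the first atom of any substituent.
    return pattern_elem == "R" or pattern_elem == target_elem


def find_embedding(pattern: Compound, target: Compound) -> Optional[Dict[int, int]]:
    """Return one mapping of pattern atoms onto distinct target atoms such that
    every pattern bond is also a target bond, or None if no mapping exists."""
    order = list(pattern.atoms)
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for t in target.atoms:
            if t in used or not atoms_compatible(pattern.atoms[p], target.atoms[t]):
                continue
            # Every already-mapped neighbour of p must be bonded to t in the target.
            if all(mapping[q] in target.adj[t] for q in pattern.adj[p] if q in mapping):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


# Generic "alcohol" (R-O) against the heavy atoms of ethanol (C-C-O) and ethane (C-C).
generic_alcohol = Compound({1: "R", 2: "O"}, [(1, 2)])
ethanol = Compound({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
ethane = Compound({1: "C", 2: "C"}, [(1, 2)])

print(find_embedding(generic_alcohol, ethanol))  # {1: 2, 2: 3}
print(find_embedding(generic_alcohol, ethane))   # None
```

Backtracking of this kind is exponential in the worst case, which is presumably why an optimized algorithm is needed when comparing many thousands of compound pairs.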
Award ID(s):
2020026
NSF-PAR ID:
10273292
Metabolites
Volume:
11
Issue:
7
ISSN:
2218-1989
Page Range / eLocation ID:
431
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Metabolic flux analysis requires both a reliable metabolic model and reliable metabolic profiles to characterize metabolic reprogramming. Advances in analytical methodologies enable the production of high-quality metabolomics datasets capturing isotopic flux. However, useful metabolic models can be difficult to derive due to the lack of relatively complete atom-resolved metabolic networks for a variety of organisms, including humans. Here, we developed a neighborhood-specific graph coloring method that creates unique identifiers for each atom in a compound, facilitating construction of an atom-resolved metabolic network. Moreover, this method is guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry. Furthermore, a compound coloring identifier derived from the corresponding atom coloring identifiers can be used for compound harmonization across various metabolic network databases, which is an essential first step in network integration. With the compound coloring identifiers, 8865 correspondences between KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaCyc compounds were detected, with 5451 of them confirmed by other identifiers provided by the two databases. In addition, we found that the Enzyme Commission (EC) numbers of reactions can be used to validate possible correspondence pairs, with 1848 unconfirmed pairs validated by commonality in reaction ECs. Finally, we were able to detect various issues and errors in compound representation in the KEGG and MetaCyc databases using compound coloring identifiers, demonstrating the usefulness of this methodology for database curation.
  2. Motivation

    A factory in a metabolic network specifies how to produce target molecules from source compounds through biochemical reactions, properly accounting for reaction stoichiometry to conserve or not deplete intermediate metabolites. While finding factories is a fundamental problem in systems biology, available methods do not consider the number of reactions used, nor address negative regulation.

    Methods

    We introduce the new problem of finding optimal factories that use the fewest reactions, for the first time incorporating both first- and second-order negative regulation. We model this problem with directed hypergraphs, prove it is NP-complete, solve it via mixed-integer linear programming, and accommodate second-order negative regulation by an iterative approach that generates next-best factories.

    Results

    This optimization-based approach is remarkably fast in practice, typically finding optimal factories in a few seconds, even for metabolic networks involving tens of thousands of reactions and metabolites, as demonstrated through comprehensive experiments across all instances from standard reaction databases.

    Availability and implementation

    Source code for an implementation of our new method for optimal factories with negative regulation in a new tool called Odinn, together with all datasets, is available free for non-commercial use at http://odinn.cs.arizona.edu.

     
  3. For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by: (i) including compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) being extensible by the user community, with all data stored in GitHub; and (iii) being designed as a biochemical ‘Rosetta Stone’ to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org/biochem and KBase.
  4. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Rhodopseudomonas palustris CGA009 is a Gram-negative purple nonsulfur bacterium that grows phototrophically by fixing carbon dioxide and nitrogen or chemotrophically by fixing or catabolizing a wide array of substrates, including lignin breakdown products for its carbon and fixing nitrogen for its nitrogen requirements. It can grow aerobically or anaerobically and can use light, inorganic, and organic compounds for energy production. Due to its ability to convert different carbon sources into useful products during anaerobic growth, this study reconstructed a metabolic and expression (ME) model of R. palustris to investigate its anaerobic-photoheterotrophic growth. Unlike metabolic (M) models, ME models include transcription and translation reactions along with macromolecule synthesis and couple these reactions with growth rate. This unique feature of the ME model led to nonlinear growth curve predictions, which matched closely with experimental growth rate data. At the theoretical maximum growth rate, the ME model suggested a diminishing rate of carbon fixation and predicted malate dehydrogenase and glycerol-3 phosphate dehydrogenase as alternate electron sinks. Moreover, the ME model also identified ferredoxin as a key regulator in distributing electrons between major redox balancing pathways. Because ME models include the turnover rate for each metabolic reaction, the model was used to successfully capture experimentally observed temperature regulation of different nitrogenases. Overall, these unique features of the ME model demonstrated the influence of nitrogenases and rubiscos on R. palustris growth and predicted a key regulator in distributing electrons between major redox balancing pathways, thus establishing a platform for in silico investigation of R. palustris metabolism from a multiomics perspective. IMPORTANCE In this work, we reconstructed the first ME model for a purple nonsulfur bacterium (PNSB). Using the ME model, different aspects of R. palustris metabolism were examined. First, the ME model was used to analyze how reducing power entering the R. palustris cell through organic carbon sources gets partitioned into biomass, carbon dioxide fixation, and nitrogen fixation. Furthermore, the ME model predicted electron flux through ferredoxin as a major bottleneck in distributing electrons to nitrogenase enzymes. Next, the ME model characterized different nitrogenase enzymes and successfully recapitulated experimentally observed temperature regulation of those enzymes. Identifying the bottleneck responsible for transferring an electron to nitrogenase enzymes and recapitulating the temperature regulation of different nitrogenase enzymes can have profound implications in metabolic engineering, such as hydrogen production from R. palustris. Another interesting application of this ME model can be to take advantage of its redox balancing strategy to gain an understanding of the regulatory mechanism of biodegradable plastic production precursors, such as polyhydroxybutyrate (PHB).
  5. Ouzounis, Christos A. (Ed.)
    Microbial community members exhibit various forms of interactions. Taking advantage of the increasing availability of microbiome data, many computational approaches have been developed to infer bacterial interactions from the co-occurrence of microbes across diverse microbial communities. Additionally, the introduction of genome-scale metabolic models has also enabled the inference of cooperative and competitive metabolic interactions between bacterial species. By nature, phylogenetically similar microbial species are more likely to share common functional profiles or biological pathways due to their genomic similarity. Without properly factoring out the phylogenetic relationship, any estimation of the competition and cooperation between species based on functional/pathway profiles may bias downstream applications. To address these challenges, we developed a novel approach for estimating the competition and complementarity indices for a pair of microbial species, adjusted by their phylogenetic distance. An automated pipeline, PhyloMint, was implemented to construct competition and complementarity indices from genome-scale metabolic models derived from microbial genomes. Application of our pipeline to 2,815 human-gut associated bacteria showed high correlation between phylogenetic distance and metabolic competition/cooperation indices among bacteria. Using a discretization approach, we were able to detect pairs of bacterial species with cooperation scores significantly higher than the average pairs of bacterial species with similar phylogenetic distances. A network community analysis of high metabolic cooperation but low competition reveals distinct modules of bacterial interactions. Our results suggest that niche differentiation plays a dominant role in microbial interactions, while habitat filtering also plays a role among certain clades of bacterial species.
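
To make the competition and complementarity indices in the last item more concrete, the sketch below computes both from sets of metabolite names. The abstract does not give the formulas, so this assumes a commonly used seed-set formulation: competition is the fraction of species A's externally required metabolites (its "seed set") that species B also requires, and complementarity is the fraction of A's seed set that B can produce but does not itself require. The function names and toy metabolite sets are hypothetical, and the phylogenetic adjustment described in the abstract is not modeled.

```python
from typing import Set


def competition_index(seeds_a: Set[str], seeds_b: Set[str]) -> float:
    """Fraction of A's seed metabolites that B also needs from the environment."""
    if not seeds_a:
        return 0.0
    return len(seeds_a & seeds_b) / len(seeds_a)


def complementarity_index(
    seeds_a: Set[str], seeds_b: Set[str], metabolites_b: Set[str]
) -> float:
    """Fraction of A's seed metabolites that occur in B's network as non-seeds,
    i.e. compounds B could in principle supply to A."""
    if not seeds_a:
        return 0.0
    producible_by_b = metabolites_b - seeds_b
    return len(seeds_a & producible_by_b) / len(seeds_a)


# Toy example with made-up metabolite sets.
seeds_a = {"glucose", "ammonia", "thiamine", "oxygen"}
seeds_b = {"glucose", "lactate"}
metabolites_b = {"glucose", "lactate", "pyruvate", "thiamine"}

print(competition_index(seeds_a, seeds_b))                     # 0.25
print(complementarity_index(seeds_a, seeds_b, metabolites_b))  # 0.25
```

Both indices are asymmetric: swapping the roles of A and B generally gives different values, so a pairwise analysis would compute each direction separately.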