Title: Automating methods for estimating metabolite volatility
The volatility of metabolites can influence their biological roles and inform optimal methods for their detection. Yet, volatility information is not readily available for the large number of described metabolites, limiting the exploration of volatility as a fundamental trait of metabolites. Here, we adapted methods to estimate vapor pressure from the functional group composition of individual molecules (SIMPOL.1) to predict the gas-phase partitioning of compounds in different environments. We implemented these methods in a new open pipeline called volcalc that uses chemoinformatic tools to automate these volatility estimates for all metabolites in an extensive and continuously updated pathway database: the Kyoto Encyclopedia of Genes and Genomes (KEGG) that connects metabolites, organisms, and reactions. We first benchmark the automated pipeline against a manually curated data set and show that the same category of volatility (e.g., nonvolatile, low, moderate, high) is predicted for 93% of compounds. We then demonstrate how volcalc might be used to generate and test hypotheses about the role of volatility in biological systems and organisms. Specifically, we estimate that 3.4 and 26.6% of compounds in KEGG have high volatility depending on the environment (soil vs. clean atmosphere, respectively) and that a core set of volatiles is shared among all domains of life (30%) with the largest proportion of kingdom-specific volatiles identified in bacteria. With volcalc, we lay a foundation for uncovering the role of the volatilome using an approach that is easily integrated with other bioinformatic pipelines and can be continually refined to consider additional dimensions to volatility. The volcalc package is an accessible tool to help design and test hypotheses on volatile metabolites and their unique roles in biological systems.
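The group-contribution idea summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the volcalc implementation: the contribution values below are hypothetical placeholders rather than the published SIMPOL.1 coefficients, the category cutoffs are assumed for illustration, and all function names are invented here. The sketch sums per-group contributions to get log10 of the vapor pressure, converts that to a saturation mass concentration with C* = 10^6 · M · p / (R · T), and bins the log10 value into the four categories the abstract names.

```python
import math

# Gas constant in m^3 atm / (mol K), so that C* comes out in micrograms per m^3.
R_ATM_M3 = 8.2057e-5

# HYPOTHETICAL placeholder contributions to log10(vapor pressure / atm).
# These are NOT the published SIMPOL.1 coefficients; substitute real values.
PLACEHOLDER_CONTRIBUTIONS = {
    "zeroeth": 1.8,           # constant term applied once per molecule
    "carbon": -0.44,          # per carbon atom
    "hydroxyl": -2.2,         # per hydroxyl group
    "carboxylic_acid": -3.6,  # per carboxylic acid group
}


def log10_vapor_pressure_atm(group_counts, contributions=PLACEHOLDER_CONTRIBUTIONS):
    """Sum the constant term plus count * contribution for each functional group."""
    total = contributions["zeroeth"]
    for group, count in group_counts.items():
        total += count * contributions[group]
    return total


def log10_saturation_concentration(molar_mass, log10_p_atm, temperature_k=293.15):
    """Return log10(C*) in micrograms per m^3, with C* = 1e6 * M * p / (R * T)."""
    return math.log10(molar_mass * 1e6 / (R_ATM_M3 * temperature_k)) + log10_p_atm


def volatility_category(log10_c_star, cutoffs=(-2.0, 0.0, 2.0)):
    """Bin log10(C*) into four categories; the cutoffs are assumed, not sourced."""
    labels = ("nonvolatile", "low", "moderate", "high")
    for cutoff, label in zip(cutoffs, labels):
        if log10_c_star < cutoff:
            return label
    return labels[-1]


if __name__ == "__main__":
    # A two-carbon alcohol (ethanol-like), molar mass about 46.07 g/mol.
    log10_p = log10_vapor_pressure_atm({"carbon": 2, "hydroxyl": 1})
    log10_c = log10_saturation_concentration(46.07, log10_p)
    print(volatility_category(log10_c))
```

Passing the cutoffs as a parameter reflects the abstract's point that the fraction of compounds classed as highly volatile depends on the environment: the same vapor pressure estimate can be binned against different thresholds for soil and for a clean atmosphere.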
Award ID(s):
2034192 2045332
PAR ID:
10482592
Publisher / Repository:
Frontiers in Microbiology
Date Published:
Journal Name:
Frontiers in Microbiology
Volume:
14
ISSN:
1664-302X
Subject(s) / Keyword(s):
bioinformatics chemoinformatics metabolic database VOCs volatile metabolite volatility
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Volatility describes the tendency for a compound to partition into the gas phase, and volatile metabolites facilitate unique biological interactions that influence Earth's atmospheric physics and chemistry. Estimating which metabolites may be volatile is difficult, especially for those that do not have measured vapor pressures. Volcalc is a newly developed vapor pressure estimation tool that uses the SIMPOL.1 method, allowing users to rapidly identify plausible volatile metabolites within the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Here, we estimate the volatility of all KEGG metabolites and associate them with KEGG reactions, enzymes, orthologs (KOs) and whole genomes within the KEGG database. This information may be used to identify which genes or species may be linked to particular forms of volatile metabolism, for the purpose of hypothesis generation and integration into additional bioinformatics pipelines. This data is listed as a complement to the publication "Automating methods for estimating metabolite volatility". The column "Paper" indicates whether the listed species is one from the subset analyzed within the data for Figure 3. For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
  2. Background/Objectives: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to predict solely on the basis of metabolic pathways. However, there are many other types of pathways in cells and organisms that are of interest to biologists. Methods: While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in KEGG. From these data, we constructed a dataset where each entry contained features representing compounds combined with features representing pathways, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset. Results: The models trained on 6485 KEGG compounds and 502 pathways scored an overall mean Matthews correlation coefficient (MCC) performance of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098. Conclusions: This performance on all 502 KEGG pathways represents a roughly 6% improvement over the performance of models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in the performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.
  3. Biogenic volatile organic compounds (VOCs) constitute a significant portion of gas-phase metabolites in modern ecosystems and have unique roles in moderating atmospheric oxidative capacity, solar radiation balance, and aerosol formation. It has been theorized that VOCs may account for observed geological and evolutionary phenomena during the Archaean, but the direct contribution of biology to early non-methane VOC cycling remains unexplored. Here, we provide an assessment of all potential VOCs metabolized by the last universal common ancestor (LUCA). We identify enzyme functions linked to LUCA orthologous protein groups across eight literature sources and estimate the volatility of all associated substrates to identify ancient volatile metabolites. We home in on volatile metabolites with confirmed modern emissions that exist in conserved metabolic pathways and produce a curated list of the most likely LUCA VOCs. We introduce volatile organic metabolites associated with early life and discuss their potential influence on early carbon cycling and atmospheric chemistry.
  4. Soils harbor complex biological processes intertwined with metabolic inputs from microbes and plants. Measuring the soil metabolome can reveal active metabolic pathways, providing insight into the presence of specific organisms and ecological interactions. A subset of the metabolome is volatile; however, current soil studies rarely consider volatile organic compounds (VOCs), contributing to biases in sample processing and metabolomic analytical techniques. Therefore, we hypothesize that overall, the volatility of detected compounds measured using current metabolomic analytical techniques will be lower than undetected compounds, a reflection of missed VOCs. To illustrate this, we examined a peatland metabolomic dataset collected using three common metabolomic analytical techniques: nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC-MS), and Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS). We mapped the compounds to three metabolic pathways (monoterpenoid biosynthesis, diterpenoid biosynthesis, and polycyclic aromatic hydrocarbon degradation), chosen for their activity in peatland ecosystems and involvement of VOCs. We estimated the volatility of the compounds by calculating relative volatility indices (RVIs), and as hypothesized, the average RVI of undetected compounds within each of our focal pathways was higher than detected compounds (p < 0.001). Moreover, higher RVI compounds were absent even in sub-pathways where lower RVI compounds were observed. Our findings suggest that typical soil metabolomic analytical techniques may overlook VOCs and leave missing links in metabolic pathways. To more completely represent the volatile fraction of the soil metabolome, we suggest that environmental scientists take into consideration these biases when designing and interpreting their data and/or add direct online measurement methods that capture the integral role of VOCs in soil systems.
  5. Despite significant advances in reconstructing genome-scale metabolic networks, the understanding of cellular metabolism remains incomplete for many organisms. A promising approach for elucidating cellular metabolism is analysing the full scope of enzyme promiscuity, which exploits the capacity of enzymes to bind to non-annotated substrates and generate novel reactions. To guide time-consuming, costly experimentation, different computational methods have been proposed for exploring enzyme promiscuity. One relevant algorithm is PROXIMAL, which strongly relies on KEGG to define generic reaction rules and link specific molecular substructures with associated chemical transformations. Here, we present a completely new pipeline, PROXIMAL2, which overcomes the dependency on KEGG data. In addition, PROXIMAL2 introduces two relevant improvements with respect to the former version: i) correct treatment of multi-step reactions and ii) tracking of electric charges in the transformations. We compare PROXIMAL and PROXIMAL2 in recovering annotated products from substrates in KEGG reactions, finding a highly significant improvement in the level of accuracy. We then applied PROXIMAL2 to predict degradation reactions of phenolic compounds in the human gut microbiota. The results were compared to RetroPath RL, a different and relevant enzyme promiscuity method. We found a significant overlap between these two methods but also complementary results, which open new research directions into this relevant question in nutrition.
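The second related record above reports model quality as a Matthews correlation coefficient (MCC) and describes the gain from 0.800 to 0.847 as roughly 6%. The sketch below shows how an MCC is computed from a binary confusion matrix and how that relative improvement follows from the two reported means. The function names are illustrative and are not taken from that study's code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; returns 0.0 if any margin is empty."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when a row or column of the matrix sums to zero.
        return 0.0
    return numerator / denominator


def relative_improvement(new_score, baseline_score):
    """Fractional gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Reported mean MCC values: all 502 pathways vs. the 184 metabolic pathways.
    gain = relative_improvement(0.847, 0.800)
    print(f"relative improvement: {gain:.2%}")
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and it stays informative when positive compound-pathway pairs are much rarer than negative ones.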