Title: Propagating annotations of molecular networks using in silico fragmentation
The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. A challenge of in silico annotation is the uncertainty about which structure in the predicted candidate list is correct. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates, using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
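The re-ranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the NAP implementation: the data layout, the `rerank` function, the `weight` parameter, the linear score combination, and the use of Tanimoto similarity over sets of fingerprint bits are all assumptions made for illustration. Each node carries a list of in silico candidates, and a candidate gains support when it is structurally similar to the candidates of neighbouring nodes in the network.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical candidate record: (name, in silico score in [0, 1], fingerprint bits)
Candidate = Tuple[str, float, Set[int]]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rerank(
    node: str,
    candidates: Dict[str, List[Candidate]],
    neighbors: Dict[str, List[str]],
    weight: float = 0.5,  # assumed balance between in silico score and network support
) -> List[Tuple[str, float]]:
    """Re-rank the candidates of `node` using its network neighbours.

    For each candidate, the network support is the mean, over neighbours
    that have candidates, of the best similarity to any neighbour candidate.
    """
    rescored = []
    for name, score, fingerprint in candidates.get(node, []):
        best_per_neighbor = []
        for nb in neighbors.get(node, []):
            nb_candidates = candidates.get(nb, [])
            if nb_candidates:
                best_per_neighbor.append(
                    max(tanimoto(fingerprint, nb_fp) for _, _, nb_fp in nb_candidates)
                )
        support = (
            sum(best_per_neighbor) / len(best_per_neighbor)
            if best_per_neighbor
            else 0.0
        )
        rescored.append((name, (1.0 - weight) * score + weight * support))
    # Highest combined score first
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

In this sketch, a candidate with a slightly lower in silico score can overtake the top-ranked one when it resembles the candidates proposed for connected spectra, which is the intuition behind a network consensus.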
Award ID(s):
1656475
PAR ID:
10062093
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
14
ISSN:
1553-7358
Page Range / eLocation ID:
e1006089
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Metabolomics has started to embrace computational approaches for chemical interpretation of large data sets. Yet, metabolite annotation remains a key challenge. Recently, molecular networking and MS2LDA emerged as molecular mining tools that find molecular families and substructures in mass spectrometry fragmentation data. Moreover, in silico annotation tools obtain and rank candidate molecules for fragmentation spectra. Ideally, all structural information obtained and inferred from these computational tools could be combined to increase the resulting chemical insight one can obtain from a data set. However, integration is currently hampered as each tool has its own output format and efficient matching of data across these tools is lacking. Here, we introduce MolNetEnhancer, a workflow that combines the outputs from molecular networking, MS2LDA, in silico annotation tools (such as Network Annotation Propagation or DEREPLICATOR), and the automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data whilst at the same time illuminating structural details for each fragmentation spectrum. We present examples from four plant and bacterial case studies and show how MolNetEnhancer enables the chemical annotation, visualization, and discovery of the subtle substructural diversity within molecular families. We conclude that MolNetEnhancer is a useful tool that greatly assists the metabolomics researcher in deciphering the metabolome through combination of multiple independent in silico pipelines.
  2. Rationale: A major hurdle in identifying chemicals in mass spectrometry experiments is the availability of tandem mass spectrometry (MS/MS) reference spectra in public databases. Currently, scientists purchase databases or use public databases such as Global Natural Products Social Molecular Networking (GNPS). The MSMS‐Chooser workflow is an open‐source protocol for the creation of MS/MS reference spectra directly in the GNPS infrastructure. Methods: An MSMS‐Chooser Sample Template is provided and completed manually. The MSMS‐Chooser Submission File and Sequence Table for data acquisition were programmatically generated. Standards from the Mass Spectrometry Metabolite Library (MSMLS) suspended in a methanol–water (1:1) solution were analyzed. Flow injection on an LC/MS/MS system was used to generate negative and positive mode data using data‐dependent acquisition. The MS/MS spectra and Submission File were uploaded to the MSMS‐Chooser workflow in GNPS for automatic selection of MS/MS spectra. Results: Data acquisition and processing required ~2 h and ~2 min, respectively, per 96‐well plate using MSMS‐Chooser. Analysis of the MSMLS, over 600 small molecules, using MSMS‐Chooser added 889 spectra (including multiple adducts) to the public library in GNPS. Manual validation of one plate indicated accurate selection of MS/MS scans (true positive rate of 0.96 and a true negative rate of 0.99). The MSMS‐Chooser output includes a table formatted for inclusion in the GNPS library as well as the ability to directly launch searches via MASST. Conclusions: MSMS‐Chooser enables rapid data acquisition, data analysis (selection of MS/MS spectra), and a formatted table for inspection and upload to GNPS. Open file‐format data (.mzML or .mzXML) from most mass spectrometry platforms containing MS/MS spectra can be processed using MSMS‐Chooser. MSMS‐Chooser democratizes the creation of MS/MS reference spectra in GNPS, which will improve annotation and strengthen the tools which use the annotation information.
  3. Recent developments in molecular networking have expanded our ability to characterize the metabolome of diverse samples that contain a significant proportion of ion features with no mass spectral match to known compounds. Manual and tool-assisted natural annotation propagation is readily used to classify molecular networks; however, currently no annotation propagation tools leverage consensus confidence strategies enabled by hierarchical chemical ontologies or enable the use of new in silico tools without significant modification. Herein we present ConCISE (Consensus Classifications of In Silico Elucidations), which is the first tool to fuse molecular networking, spectral library matching and in silico class predictions to establish accurate putative classifications for entire subnetworks. By limiting annotation propagation to only structural classes which are identical for the majority of ion features within a subnetwork, ConCISE maintains a true positive rate greater than 95% across all levels of the ChemOnt hierarchical ontology used by the ClassyFire annotation software (superclass, class, subclass). The ConCISE framework expanded the proportion of reliable and consistent ion feature annotation up to 76%, allowing for improved assessment of the chemo-diversity of dissolved organic matter pools from three complex marine metabolomics datasets comprising dominant reef primary producers, five species of the diatom genus Pseudo-nitzschia, and stromatolite sediment samples.
  4. Background: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. Aim of Review: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. Key Scientific Concepts of Review: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future. 
  5. Motivation: Tandem mass spectrometry is an essential technology for characterizing chemical compounds at high sensitivity and throughput, and is commonly adopted in many fields. However, computational methods for automated compound identification from their MS/MS spectra are still limited, especially for novel compounds that have not been previously characterized. In recent years, in silico methods were proposed to predict the MS/MS spectra of compounds, which can then be used to expand the reference spectral libraries for compound identification. However, these methods did not consider the compounds’ 3D conformations, and thus neglected critical structural information. Results: We present the 3D Molecular Network for Mass Spectra Prediction (3DMolMS), a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. We evaluated the model on the experimental spectra collected in several spectral libraries. The results showed that 3DMolMS predicted the spectra with average cosine similarities of 0.691 and 0.478 with the experimental MS/MS spectra acquired in positive and negative ion modes, respectively. Furthermore, the 3DMolMS model can be generalized to the prediction of MS/MS spectra acquired by different labs on different instruments through minor fine-tuning on a small set of spectra. Finally, we demonstrate that the molecular representation learned by 3DMolMS from MS/MS spectra prediction can be adapted to enhance the prediction of chemical properties such as the elution time in liquid chromatography and the collisional cross section measured by ion mobility spectrometry, both of which are often used to improve compound identification. Availability and implementation: The code for 3DMolMS is available at https://github.com/JosieHong/3DMolMS and the web service is at https://spectrumprediction.gnps2.org.
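The cosine similarity reported for predicted versus experimental spectra in the last entry can be illustrated with a short Python sketch. This is not the 3DMolMS evaluation code: the peak-list representation, the fixed-width binning, and the `bin_width` default are assumptions chosen to keep the example self-contained.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.01) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a: List[Peak], b: List[Peak], bin_width: float = 0.01) -> float:
    """Cosine similarity between two binned spectra; 0.0 if either has no signal."""
    va = bin_spectrum(a, bin_width)
    vb = bin_spectrum(b, bin_width)
    dot = sum(intensity * vb.get(idx, 0.0) for idx, intensity in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A value of 1 means the two binned spectra have identical relative intensities, and 0 means they share no populated bins; other pipelines may apply intensity scaling or m/z tolerance matching before this step.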