- Award ID(s): 1656481
- PAR ID: 10199204
- Date Published:
- Journal Name: Metabolites
- Volume: 9
- Issue: 7
- ISSN: 2218-1989
- Page Range / eLocation ID: 144
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Recent developments in molecular networking have expanded our ability to characterize the metabolome of diverse samples that contain a significant proportion of ion features with no mass spectral match to known compounds. Manual and tool-assisted natural annotation propagation is readily used to classify molecular networks; however, currently no annotation propagation tools leverage consensus confidence strategies enabled by hierarchical chemical ontologies or enable the use of new in silico tools without significant modification. Herein we present ConCISE (Consensus Classifications of In Silico Elucidations), which is the first tool to fuse molecular networking, spectral library matching, and in silico class predictions to establish accurate putative classifications for entire subnetworks. By limiting annotation propagation to only structural classes which are identical for the majority of ion features within a subnetwork, ConCISE maintains a true positive rate greater than 95% across all levels of the ChemOnt hierarchical ontology used by the ClassyFire annotation software (superclass, class, subclass). The ConCISE framework expanded the proportion of reliable and consistent ion feature annotation up to 76%, allowing for improved assessment of the chemo-diversity of dissolved organic matter pools from three complex marine metabolomics datasets comprising dominant reef primary producers, five species of the diatom genus Pseudo-nitzschia, and stromatolite sediment samples.
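The majority rule described in this abstract can be illustrated with a short sketch. This is not the ConCISE implementation: the function names, the input layout (a dictionary of ion features, each carrying optional `superclass`, `class`, and `subclass` labels), and the choice to count unannotated features in the denominator are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional

# ChemOnt levels named in the abstract.
LEVELS = ("superclass", "class", "subclass")


def consensus_label(labels: list[Optional[str]],
                    min_fraction: float = 0.5) -> Optional[str]:
    """Return the label carried by more than `min_fraction` of ALL features.

    Assumption: features without a label (None) still count toward the
    subnetwork size, so a label must cover a majority of every feature,
    not only of the annotated ones.
    """
    if not labels:
        return None
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    label, count = counts.most_common(1)[0]
    return label if count / len(labels) > min_fraction else None


def classify_subnetwork(features: dict[str, dict[str, Optional[str]]],
                        min_fraction: float = 0.5) -> dict[str, Optional[str]]:
    """Compute a consensus label per ontology level for one subnetwork."""
    result = {}
    for level in LEVELS:
        labels = [annotation.get(level) for annotation in features.values()]
        result[level] = consensus_label(labels, min_fraction)
    return result


# Hypothetical subnetwork of three ion features.
subnetwork = {
    "f1": {"superclass": "Lipids", "class": "Fatty acyls"},
    "f2": {"superclass": "Lipids", "class": "Steroids"},
    "f3": {"superclass": "Lipids", "class": None},
}
consensus = classify_subnetwork(subnetwork)
```

In this toy subnetwork all three features agree at the superclass level, so that label would be propagated, while no class or subclass reaches a majority and those levels stay unassigned.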
- The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates, using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
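The general idea of re-ranking candidates by network consensus can be sketched as a toy example. This is not the NAP algorithm: the `Candidate` class, the use of Tanimoto similarity on sets of fingerprint bit indices, the comparison against each neighbor's top-scoring candidate, and the `alpha` weighting are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float            # in silico score, assumed in [0, 1], higher is better
    fingerprint: frozenset  # indices of the "on" bits of a structural fingerprint


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank(node: str,
           candidates: dict[str, list[Candidate]],
           neighbors: dict[str, list[str]],
           alpha: float = 0.5) -> list[Candidate]:
    """Re-rank one node's candidates using support from its network neighbors.

    Assumption: support is the mean similarity to the top-scoring candidate
    of each neighbor, and the final score is a weighted sum of the original
    score and that support.
    """
    def support(cand: Candidate) -> float:
        sims = []
        for nb in neighbors.get(node, []):
            nb_cands = candidates.get(nb, [])
            if nb_cands:
                best = max(nb_cands, key=lambda c: c.score)
                sims.append(tanimoto(cand.fingerprint, best.fingerprint))
        return sum(sims) / len(sims) if sims else 0.0

    return sorted(candidates[node],
                  key=lambda c: alpha * c.score + (1 - alpha) * support(c),
                  reverse=True)


# Hypothetical two-node network: n1 has two candidates, n2 has one.
cands = {
    "n1": [Candidate("X", 0.6, frozenset({1, 2, 3})),
           Candidate("Y", 0.5, frozenset({7, 8, 9}))],
    "n2": [Candidate("Z", 0.9, frozenset({7, 8, 9, 10}))],
}
edges = {"n1": ["n2"]}
ranked = rerank("n1", cands, edges)
```

Here candidate Y starts with the lower in silico score but is structurally close to the neighbor's best candidate, so it moves ahead of X once neighbor support is included. A node with no neighbors keeps its original ordering.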
- The wide applications of liquid chromatography-mass spectrometry (LC-MS) in untargeted metabolomics demand an easy-to-use, comprehensive computational workflow to support efficient and reproducible data analysis. However, current tools were primarily developed to perform specific tasks in LC-MS based metabolomics data analysis. Here we introduce MetaboAnalystR 4.0 as a streamlined pipeline covering raw spectra processing, compound identification, statistical analysis, and functional interpretation. The key features of MetaboAnalystR 4.0 include an auto-optimized feature detection and quantification algorithm for LC-MS1 spectra processing, efficient MS2 spectra deconvolution and compound identification for data-dependent or data-independent acquisition, and more accurate functional interpretation through integrated spectral annotation. Comprehensive validation studies using LC-MS1 and MS2 spectra obtained from standard mixtures, dilution series, and clinical metabolomics samples have shown its excellent performance across a wide range of common tasks such as peak picking, spectral deconvolution, and compound identification with good computing efficiency. Together with its existing statistical analysis utilities, MetaboAnalystR 4.0 represents a significant step toward a unified, end-to-end workflow for LC-MS based global metabolomics in the open-source R environment.
- Background: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. Aim of Review: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. Key Scientific Concepts of Review: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
- The management and analysis of large in silico molecular libraries is pivotal in many areas of modern chemistry. The adoption and success of data-oriented approaches to chemical research depend on the ease of handling large collections of in silico molecular structures in a programmatic way. Herein, we introduce the MOLecular LIbrary toolkit, “molli”, which is a Python 3 chemoinformatics module that provides a streamlined interface for manipulating large in silico libraries. Three-dimensional, combinatorial molecule libraries can be expanded directly from two-dimensional chemical structure fragments stored in CDXML files with high stereochemical fidelity. Geometry optimization, property calculation, and conformer generation are executed by interfacing with widely used computational chemistry programs such as OpenBabel, RDKit, ORCA, and xTB/CREST. Conformer-dependent grid-based feature calculators provide numerical representations suitable for diversity analysis, and interfaces to robust three-dimensional visualization tools provide comprehensive images to enhance human understanding of libraries with thousands of members. The package includes a command-line interface in addition to Python classes to streamline frequently used workflows. This work describes the development and implementation of molli 1.0 and highlights the available functionality. Parallel performance is benchmarked on various hardware platforms, and common workflows are demonstrated for different tasks ranging from optimized grid-based descriptor calculation on catalyst libraries to an NMR prediction workflow from CDXML files.