The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of the MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One alternative approach to annotating an unknown fragmentation mass spectrum is in silico prediction. A key challenge of in silico annotation is the uncertainty about which structure in the predicted candidate list is correct. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates, using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web platform at https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
                    
                            
                            The critical role that spectral libraries play in capturing the metabolomics community knowledge
                        
                    
    
            Background: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. Aim of Review: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. Key Scientific Concepts of Review: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future. 
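As a rough illustration of the spectral library searching described above, the following minimal sketch bins peak lists into fixed-length vectors and ranks library entries by cosine similarity to a query spectrum. It is not the method of any particular tool: the function names, the bin width, the score threshold, and the toy library entries are all hypothetical, and real search engines typically also apply precursor m/z filtering, intensity scaling, and peak-matching tolerances.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Convert a peak list [(mz, intensity), ...] into a fixed-length binned vector.

    bin_width and max_mz are illustrative defaults, not recommended settings.
    """
    n_bins = int(max_mz / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def search_library(query_peaks, library, min_score=0.7):
    """Return (name, score) pairs scoring at least min_score, best match first."""
    query_vec = bin_spectrum(query_peaks)
    hits = []
    for name, peaks in library.items():
        score = cosine(query_vec, bin_spectrum(peaks))
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy library with made-up compounds and peaks (hypothetical data).
toy_library = {
    "compound_A": [(100.1, 50.0), (150.2, 100.0)],
    "compound_B": [(300.5, 80.0), (410.3, 20.0)],
}
toy_query = [(100.1, 50.0), (150.2, 100.0)]
toy_hits = search_library(toy_query, toy_library)
```

In this toy case only `compound_A` shares bins with the query, so it is the only entry returned. The quality considerations raised in the abstract apply to such scores: a high similarity supports an annotation but does not by itself confirm a structure.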
        
    
- Award ID(s): 2152526
- PAR ID: 10470641
- Publisher / Repository: Metabolomics
- Date Published:
- Journal Name: Metabolomics
- Volume: 18
- Issue: 12
- ISSN: 1573-3890
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract: Plant metabolomes are structurally diverse. One of the most popular techniques for sampling this diversity is liquid chromatography–mass spectrometry (LC‐MS), which typically detects thousands of peaks from single organ extracts, many representing true metabolites. These peaks are usually annotated using in‐house retention time or spectral libraries, in silico fragmentation libraries, and increasingly through computational techniques such as machine learning. Despite these advances, over 85% of LC‐MS peaks remain unidentified, posing a major challenge for data analysis and biological interpretation. This bottleneck limits our ability to fully understand the diversity, functions, and evolution of plant metabolites. In this review, we first summarize current approaches for metabolite identification, highlighting their challenges and limitations. We further focus on alternative strategies that bypass the need for metabolite identification, allowing researchers to interpret global metabolic patterns and pinpoint key metabolite signals. These methods include molecular networking, distance‐based approaches, information theory–based metrics, and discriminant analysis. Additionally, we explore their practical applications in plant science and highlight a set of useful tools to support researchers in analyzing complex plant metabolomics data. By adopting these approaches, researchers can enhance their ability to uncover new insights into plant metabolism.
- 
            Abstract. Motivation: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotating modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. Results: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes the two stages of the existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA’s tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. Availability and implementation: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.
- 
            Abstract: Data-Independent Acquisition (DIA) is a method to improve consistent identification and precise quantitation of peptides and proteins by mass spectrometry (MS). The targeted data analysis strategy in DIA relies on spectral assay libraries that are generally derived from a priori measurements of peptides for each species. Although Escherichia coli (E. coli) is among the best studied model organisms, so far there is no spectral assay library for the bacterium publicly available. Here, we generated a spectral assay library for 4,014 of the 4,389 annotated E. coli proteins using one- and two-dimensional fractionated samples, and ion mobility separation enabling deep proteome coverage. We demonstrate the utility of this high-quality library with robustness in quantitation of the E. coli proteome and with rapid-chromatography to enhance throughput by targeted DIA-MS. The spectral assay library supports the detection and quantification of 91.5% of all E. coli proteins at high-confidence with 56,182 proteotypic peptides, making it a valuable resource for the scientific community. Data and spectral libraries are available via ProteomeXchange (PXD020761, PXD020785) and SWATHAtlas (SAL00222-28).
- 
            Tandem mass spectrometry (MS/MS) is crucial for small-molecule analysis; however, traditional computational methods are limited by incomplete reference libraries and complex data processing. Machine learning (ML) is transforming small-molecule mass spectrometry in three key directions: (a) predicting MS/MS spectra and related physicochemical properties to expand reference libraries, (b) improving spectral matching through automated pattern extraction, and (c) predicting molecular structures of compounds directly from their MS/MS spectra. We review ML approaches for molecular representations [descriptors, simplified molecular-input line-entry system (SMILES) strings, and graphs] and MS/MS spectra representations (using binned vectors and peak lists) along with recent advances in spectra prediction, retention time, collision cross sections, and spectral matching. Finally, we discuss ML-integrated workflows for chemical formula identification. By addressing the limitations of current methods for compound identification, these ML approaches can greatly enhance the understanding of biological processes and the development of diagnostic and therapeutic tools.
 An official website of the United States government
				
			 
					 
					
 
                                    