Title: Protocol for community‐created public MS/MS reference spectra within the Global Natural Products Social Molecular Networking infrastructure
Rationale: A major hurdle in identifying chemicals in mass spectrometry experiments is the availability of tandem mass spectrometry (MS/MS) reference spectra in public databases. Currently, scientists purchase databases or use public databases such as Global Natural Products Social Molecular Networking (GNPS). The MSMS‐Chooser workflow is an open‐source protocol for the creation of MS/MS reference spectra directly in the GNPS infrastructure.
Methods: An MSMS‐Chooser Sample Template is provided and completed manually. The MSMS‐Chooser Submission File and Sequence Table for data acquisition were programmatically generated. Standards from the Mass Spectrometry Metabolite Library (MSMLS) suspended in a methanol–water (1:1) solution were analyzed. Flow injection on an LC/MS/MS system was used to generate negative and positive mode data using data‐dependent acquisition. The MS/MS spectra and Submission File were uploaded to the MSMS‐Chooser workflow in GNPS for automatic selection of MS/MS spectra.
Results: Data acquisition and processing required ~2 h and ~2 min, respectively, per 96‐well plate using MSMS‐Chooser. Analysis of the MSMLS, over 600 small molecules, using MSMS‐Chooser added 889 spectra (including multiple adducts) to the public library in GNPS. Manual validation of one plate indicated accurate selection of MS/MS scans (true positive rate of 0.96 and a true negative rate of 0.99). The MSMS‐Chooser output includes a table formatted for inclusion in the GNPS library as well as the ability to directly launch searches via MASST.
Conclusions: MSMS‐Chooser enables rapid data acquisition and data analysis (selection of MS/MS spectra), and produces a formatted table for inspection and upload to GNPS. Open file‐format data (.mzML or .mzXML) from most mass spectrometry platforms containing MS/MS spectra can be processed using MSMS‐Chooser. MSMS‐Chooser democratizes the creation of MS/MS reference spectra in GNPS, which will improve annotation and strengthen the tools that use the annotation information.
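For readers unfamiliar with the validation metrics quoted in the abstract, the following is a minimal sketch of how a true positive rate and a true negative rate are computed from manually checked scan selections. The function name `selection_rates` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the MSMS‐Chooser implementation.

```python
def selection_rates(tp, fp, tn, fn):
    """Return (true positive rate, true negative rate) from confusion counts.

    tp: scans correctly selected      fn: correct scans that were missed
    tn: scans correctly rejected      fp: scans selected in error
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    tpr = tp / (tp + fn)  # sensitivity: share of true scans that were selected
    tnr = tn / (tn + fp)  # specificity: share of false scans that were rejected
    return tpr, tnr


# Hypothetical counts for illustration only; not the paper's validation data.
tpr, tnr = selection_rates(tp=9, fp=2, tn=18, fn=1)
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")
```

A rate near 1.0 for both metrics means that few correct scans are missed and few incorrect scans are selected.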
Award ID(s):
1656481
PAR ID:
10457686
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Rapid Communications in Mass Spectrometry
Volume:
34
Issue:
10
ISSN:
0951-4198
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. A challenge of in silico annotation is the uncertainty about which structure in the predicted candidate list is correct. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web platform at https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
  2. Motivation: Advances in mass spectrometry have led to the development of mass spectrometers with ion mobility spectrometry capabilities and dual-source instrumentation; however, the current software ecosystem lacks interoperability with downstream data analysis using open-source software and pipelines. Results: Here, we present TIMSCONVERT, a high-throughput data conversion workflow from timsTOF Pro/fleX mass spectrometer raw data files to mzML and imzML formats that incorporates ion mobility data while maintaining compatibility with data analysis tools. We showcase several examples using data acquired across different experiments and acquisition modalities on the timsTOF fleX MS. Availability and implementation: TIMSCONVERT and its documentation can be found at https://github.com/gtluu/timsconvert. It is available as a standalone command-line interface tool for Windows and Linux, as a NextFlow workflow, and online in the Global Natural Products Social (GNPS) platform. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Motivation: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS/MS spectra; both, however, may face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification. Results: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%–2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%–15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%–12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides than deep-learning-enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder’s potential to enhance peptide identification for proteomic data analyses. Availability and implementation: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.
  4. Motivation: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotating modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. Results: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data as hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes the two stages of the existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA’s tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides accuracy comparable to that of competing search tools. Availability and implementation: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.
  5. Fatty acid-based ignitable liquids (ILs), such as biodiesels and bio-based lighter fluids, represent a growing class of accelerants with limited forensic characterization. In this study, we applied gas chromatography–mass spectrometry (GC–MS) and direct analysis in real time mass spectrometry (DART–MS) to analyze plant oil-derived IL residues on wood and fabric substrates. ILs were prepared from ten different plant oils, subjected to burning, and extracted from fire debris using the ASTM E1412 activated charcoal method. GC–MS analysis resolved characteristic fatty acid methyl esters (FAMEs) and identified diagnostic fragment ions (m/z 55, 67, 74, 79). The fragmentation patterns of unsaturated and saturated FAMEs were systematically examined and compared against experimental data and reference spectra from online databases, demonstrating strong agreement and validating the reliability of these ion ratios as qualitative indicators of FAME saturation. DART–MS enabled rapid confirmation of major unsaturated FAMEs through the detection of protonated molecular ions, offering complementary identification without chromatographic separation. Chemometric analysis using principal component analysis (PCA) and analysis of variance-PCA revealed that FAME profiles were strongly dependent on the IL sources and remained reliable across replicate preparations and synthesis conditions, while substrate and combustion effects were mitigated using targeted ion extraction. These findings demonstrate the practical casework relevance of combining GC–MS and DART–MS for the detection and classification of fatty acid–based ILs in fire debris, providing robust chemical evidence to support arson investigations and to guide the inclusion of these emerging accelerants in forensic ignitable-liquid classification schemes. 