Title: Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
Award ID(s):
1759934
PAR ID:
10163861
Journal Name:
International Journal of Molecular Sciences
Volume:
21
Issue:
8
ISSN:
1422-0067
Page Range / eLocation ID:
2873
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Rationale: Tandem‐ion mobility spectrometry/mass spectrometry methods have recently gained traction for the structural characterization of proteins and protein complexes. However, ion activation techniques currently coupled with tandem‐ion mobility spectrometry/mass spectrometry methods are limited in their ability to characterize structures of proteins and protein complexes. Methods: Here, we describe the coupling of the separation capabilities of tandem‐trapped ion mobility spectrometry/mass spectrometry (tTIMS/MS) with the dissociation capabilities of ultraviolet photodissociation (UVPD) for protein structure analysis. Results: We establish the feasibility of dissociating intact proteins by UV irradiation at 213 nm between the two TIMS devices in tTIMS/MS and at pressure conditions compatible with ion mobility spectrometry (2–3 mbar). We validate that the fragments produced by UVPD under these conditions result from a radical‐based mechanism in accordance with prior literature on UVPD. The data suggest stabilization of fragment ions produced from UVPD by collisional cooling due to the elevated pressures used here (“UVnoD2”), which otherwise do not survive to detection. The data account for a sequence coverage for the protein ubiquitin comparable to recent reports, demonstrating the analytical utility of our instrument in mobility‐separating fragment ions produced from UVPD. Conclusions: The data demonstrate that UVPD carried out at elevated pressures of 2–3 mbar yields extensive fragment ions rich in information about the protein and that their exhaustive analysis requires IMS separation post‐UVPD. Therefore, because UVPD and tTIMS/MS each have been shown to be valuable techniques on their own merit in proteomics, our contribution here underscores the potential of combining tTIMS/MS with UVPD for structural proteomics.
  2. Multilevel proteomics aims to delineate proteins at the peptide (bottom-up proteomics), proteoform (top-down proteomics), and protein complex (native proteomics) levels. Capillary electrophoresis-mass spectrometry (CE-MS) can achieve highly efficient separation and highly sensitive detection of complex mixtures of peptides, proteoforms, and even protein complexes because of its substantial technical progress. CE-MS has become a valuable alternative to the routinely used liquid chromatography-mass spectrometry for multilevel proteomics. This review summarizes the most recent (2019-2021) advances of CE-MS for multilevel proteomics regarding technological progress and biological applications. We also provide brief perspectives on CE-MS for multilevel proteomics at the end, highlighting some future directions and potential challenges. 
  3. Protein O-GlcNAcylation refers to the covalent binding of a single N-acetylglucosamine (GlcNAc) to a serine or threonine residue. This modification primarily occurs on proteins in the nucleus and the cytosol, and plays critical roles in many cellular events, including regulation of gene expression and signal transduction. Aberrant protein O-GlcNAcylation is directly related to human diseases such as cancers, diabetes and neurodegenerative diseases. In the past decades, considerable progress has been made for global and site-specific analysis of O-GlcNAcylation in complex biological samples using mass spectrometry (MS)-based proteomics. In this review, we summarized previous efforts on comprehensive investigation of protein O-GlcNAcylation by MS. Specifically, the review is focused on methods for enriching and site-specifically mapping O-GlcNAcylated peptides, and applications for quantifying protein O-GlcNAcylation in different biological systems. As O-GlcNAcylation is an important protein modification for cell survival, effective methods are essential for advancing our understanding of glycoprotein functions and cellular events.
  4. Mass spectrometry is the dominant technology in the field of proteomics, enabling high-throughput analysis of the protein content of complex biological samples. Due to the complexity of the instrumentation and resulting data, sophisticated computational methods are required for the processing and interpretation of acquired mass spectra. Machine learning has shown great promise to improve the analysis of mass spectrometry data, with numerous purpose-built methods for improving specific steps in the data acquisition and analysis pipeline reaching widespread adoption. Here, we propose unifying various spectrum prediction tasks under a single foundation model for mass spectra. To this end, we pre-train a spectrum encoder using de novo sequencing as a pre-training task. We then show that using these pre-trained spectrum representations improves our performance on the four downstream tasks of spectrum quality prediction, chimericity prediction, phosphorylation prediction, and glycosylation status prediction. Finally, we perform multi-task fine-tuning and find that this approach improves the performance on each task individually. Overall, our work demonstrates that a foundation model for tandem mass spectrometry proteomics trained on de novo sequencing learns generalizable representations of spectra, improves performance on downstream tasks where training data is limited, and can ultimately enhance data acquisition and analysis in proteomics experiments. 
  5. Motivation: Ubiquitination is widely involved in protein homeostasis and cell signaling. Ubiquitin E3 ligases are critical regulators of ubiquitination that recognize and recruit specific ubiquitination targets for the final rate-limiting step of ubiquitin transfer reactions. Understanding ubiquitin E3 ligase activities will provide knowledge of the upstream regulators of the ubiquitination pathway and reveal potential mechanisms in biological processes and disease progression. Recent advances in mass spectrometry-based proteomics have enabled deep profiling of the ubiquitylome in a quantitative manner. Yet, functional analysis of ubiquitylome dynamics and pathway activity remains challenging. Results: Here, we developed UbE3-APA, a computational algorithm and stand-alone Python-based software for Ub E3 ligase Activity Profiling Analysis. Combining an integrated annotation database with statistical analysis, UbE3-APA identifies significantly activated or suppressed E3 ligases based on quantitative ubiquitylome proteomics datasets. Benchmarking the software with published quantitative ubiquitylome analysis confirms the genetic manipulation of SPOP enzyme activity through overexpression and mutation. Application of the algorithm to the re-analysis of a large-cohort ubiquitination proteomics study revealed the activation of PARKIN and the co-activation of other E3 ligases in the mitochondria depolarization-induced mitophagy process. We further demonstrated the application of the algorithm in DIA (data-independent acquisition)-based quantitative ubiquitylome analysis. Availability and implementation: Source code and binaries are freely available for download at URL: https://github.com/Chenlab-UMN/Ub-E3-ligase-Activity-Profiling-Analysis, implemented in Python and supported on Linux and MS Windows. Supplementary information: Supplementary data are available at Bioinformatics online.