Title: Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
Journal Name:
International Journal of Molecular Sciences
Sponsoring Org:
National Science Foundation
More Like this
  1. Motivation

    Ubiquitination is widely involved in protein homeostasis and cell signaling. Ubiquitin E3 ligases are critical regulators of ubiquitination that recognize and recruit specific ubiquitination targets for the final rate-limiting step of ubiquitin transfer reactions. Understanding the ubiquitin E3 ligase activities will provide knowledge in the upstream regulator of the ubiquitination pathway and reveal potential mechanisms in biological processes and disease progression. Recent advances in mass spectrometry-based proteomics have enabled deep profiling of ubiquitylome in a quantitative manner. Yet, functional analysis of ubiquitylome dynamics and pathway activity remains challenging.

    Results


    Here, we developed UbE3-APA, a computational algorithm and stand-alone Python-based software for Ub E3 ligase Activity Profiling Analysis. Combining an integrated annotation database with statistical analysis, UbE3-APA identifies significantly activated or suppressed E3 ligases based on quantitative ubiquitylome proteomics datasets. Benchmarking the software against published quantitative ubiquitylome analyses confirmed the genetic manipulation of SPOP enzyme activity through overexpression and mutation. Application of the algorithm in the re-analysis of a large-cohort ubiquitination proteomics study revealed the activation of PARKIN and the co-activation of other E3 ligases in the mitochondrial depolarization-induced mitophagy process. We further demonstrated the application of the algorithm in DIA (data-independent acquisition)-based quantitative ubiquitylome analysis.

    Availability and implementation

    Source code and binaries are freely available for download at URL:, implemented in Python and supported on Linux and MS Windows.

    Supplementary information

    Supplementary data are available at Bioinformatics online.
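The idea of combining a ligase-to-substrate annotation with a statistical test on quantitative ubiquitylome data can be sketched as follows. This is only an illustration under stated assumptions, not the statistics implemented in UbE3-APA: the function name `e3_activity_scores`, the identifiers, the log2 ratios, and the choice of a simple z-test against the whole-dataset background are all hypothetical.

```python
import math
from statistics import mean, pstdev


def e3_activity_scores(log2_ratios, e3_substrates, min_substrates=2):
    """Compare each ligase's annotated substrates to the dataset background.

    log2_ratios   : dict mapping substrate ID -> log2 ubiquitination ratio
    e3_substrates : dict mapping E3 ligase -> list of annotated substrate IDs
    A positive z suggests increased substrate ubiquitination (assumed here
    to indicate ligase activation); a negative z suggests suppression.
    """
    background = list(log2_ratios.values())
    mu, sigma = mean(background), pstdev(background)
    results = {}
    for e3, substrates in e3_substrates.items():
        # Keep only the substrates that were actually quantified.
        values = [log2_ratios[s] for s in substrates if s in log2_ratios]
        if len(values) < min_substrates or sigma == 0:
            continue
        z = (mean(values) - mu) / (sigma / math.sqrt(len(values)))
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[e3] = {"n": len(values), "z": z, "p": p}
    return results


# Hypothetical toy data: E3_A substrates are strongly up-regulated.
log2_ratios = {"P1": 2.0, "P2": 1.8, "P3": 2.2, "P4": 0.1,
               "P5": -0.1, "P6": 0.0, "P7": -0.2, "P8": 0.2}
e3_substrates = {"E3_A": ["P1", "P2", "P3"],
                 "E3_B": ["P4", "P5", "P6"],
                 "E3_C": ["P9"]}  # no quantified substrates, so it is skipped
results = e3_activity_scores(log2_ratios, e3_substrates)
for e3, stats in sorted(results.items()):
    print(e3, stats["n"], round(stats["z"], 2), round(stats["p"], 3))
```

A real analysis would also need multiple-testing correction across ligases and a decision on whether to score sites or proteins; both are left out of this sketch.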

  2. Rationale

    Tandem‐ion mobility spectrometry/mass spectrometry methods have recently gained traction for the structural characterization of proteins and protein complexes. However, ion activation techniques currently coupled with tandem‐ion mobility spectrometry/mass spectrometry methods are limited in their ability to characterize structures of proteins and protein complexes.

    Methods


    Here, we describe the coupling of the separation capabilities of tandem‐trapped ion mobility spectrometry/mass spectrometry (tTIMS/MS) with the dissociation capabilities of ultraviolet photodissociation (UVPD) for protein structure analysis.

    Results


    We establish the feasibility of dissociating intact proteins by UV irradiation at 213 nm between the two TIMS devices in tTIMS/MS and at pressure conditions compatible with ion mobility spectrometry (2–3 mbar). We validate that the fragments produced by UVPD under these conditions result from a radical‐based mechanism in accordance with prior literature on UVPD. The data suggest stabilization of fragment ions produced from UVPD by collisional cooling due to the elevated pressures used here (“UVnoD2”), which otherwise do not survive to detection. The data account for a sequence coverage for the protein ubiquitin comparable to recent reports, demonstrating the analytical utility of our instrument in mobility‐separating fragment ions produced from UVPD.

    Conclusions


    The data demonstrate that UVPD carried out at elevated pressures of 2–3 mbar yields extensive fragment ions rich in information about the protein and that their exhaustive analysis requires IMS separation post‐UVPD. Therefore, because UVPD and tTIMS/MS each have been shown to be valuable techniques on their own merit in proteomics, our contribution here underscores the potential of combining tTIMS/MS with UVPD for structural proteomics.
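Sequence coverage in this kind of fragmentation experiment is commonly reported as the fraction of inter-residue positions for which at least one fragment ion is observed. A minimal sketch of that bookkeeping is shown here, assuming fragments are given as (ion type, number of residues) pairs; the function name and the example fragment list are hypothetical and are not data from the study.

```python
def sequence_coverage(sequence_length, fragments):
    """Fraction of inter-residue positions covered by at least one fragment.

    fragments: iterable of (ion_type, n_residues) pairs, where a/b/c ions
    contain the N-terminus and x/y/z ions contain the C-terminus.
    Position k is the bond between residue k and residue k + 1.
    """
    n_terminal, c_terminal = set("abc"), set("xyz")
    covered = set()
    for ion_type, n_residues in fragments:
        if not 1 <= n_residues < sequence_length:
            continue  # ignore impossible fragment lengths
        if ion_type in n_terminal:
            covered.add(n_residues)
        elif ion_type in c_terminal:
            covered.add(sequence_length - n_residues)
    return len(covered) / (sequence_length - 1)


# Ubiquitin has 76 residues, hence 75 inter-residue positions.
# The fragment list below is made up for illustration.
fragments = [("a", 10), ("y", 66), ("z", 5), ("c", 20)]
coverage = sequence_coverage(76, fragments)
print(f"{coverage:.1%}")
```

In the example, the a10 and y66 ions report on the same position (between residues 10 and 11), so four fragments cover only three positions.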

  3. Rationale

    Protein studies in archaeology and paleontology have been dominated by stable isotope studies to understand diet and trophic levels, but recent applications of proteomic techniques have resulted in a more complete understanding of protein diagenesis than stable isotopes alone. In stable isotope analyses, samples are retained or discarded based on their properties. Proteomics can directly determine what proteins are present within the sample and may be able to allow previously discarded samples to be analyzed.

    Methods


    Protein samples that had been previously analyzed for stable isotopes, including those with marginal and poor sample quality, were characterized by liquid chromatography/mass spectrometry using an LTQ Orbitrap Velos mass spectrometer after separation on a Dionex Ultimate 3000 LC system. Data were analyzed using MetaMorpheus and custom R scripts.

    Results


    We found a variety of proteins in addition to collagen, although collagen I was found in the majority of the samples (most samples >80%). We also found a positive correlation between total deamidation and wt% N, suggesting that deamidation may impact the overall nitrogen signal in bulk analyses. The amino acid profiles of samples, including those of marginal or poor stable isotope quality, reflect the expected collagen I percentages, allowing their use in single amino acid stable isotope analyses.

    Conclusions


    All the samples regardless of quality were found to have high concentrations of collagen I, making interpretations of dietary routing based on collagen I reasonably valid. The amino acid profiles on the marginal and poor samples reflect an expected collagen I profile and allow these samples to be recovered for single amino acid analyses.
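The reported relationship between total deamidation and wt% N is a correlation between two per-sample quantities. The sketch below shows one way such a check could be set up; the study itself used MetaMorpheus and custom R scripts, so this Python version, its function names, the definition of total deamidation as deamidated N/Q sites over all observed N/Q sites, and every number in it are assumptions for illustration only.

```python
import math


def deamidation_fraction(deamidated_sites, total_nq_sites):
    """Share of observed Asn/Gln sites that carry a deamidation."""
    if total_nq_sites <= 0:
        raise ValueError("total_nq_sites must be positive")
    return deamidated_sites / total_nq_sites


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples: (deamidated N/Q sites, total N/Q sites, wt% N).
samples = [(12, 100, 10.1), (20, 100, 11.0), (35, 100, 12.4), (50, 100, 13.9)]
fractions = [deamidation_fraction(d, t) for d, t, _ in samples]
wt_percent_n = [w for _, _, w in samples]
r = pearson_r(fractions, wt_percent_n)
print(round(r, 3))
```

A positive `r` on real data would match the direction of the trend described above, but it would not by itself show that deamidation causes the change in the bulk nitrogen signal.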

  4. Protein O-GlcNAcylation refers to the covalent attachment of a single N-acetylglucosamine (GlcNAc) to a serine or threonine residue. This modification primarily occurs on proteins in the nucleus and the cytosol, and plays critical roles in many cellular events, including regulation of gene expression and signal transduction. Aberrant protein O-GlcNAcylation is directly related to human diseases such as cancers, diabetes and neurodegenerative diseases. In the past decades, considerable progress has been made in global and site-specific analysis of O-GlcNAcylation in complex biological samples using mass spectrometry (MS)-based proteomics. In this review, we summarize previous efforts on comprehensive investigation of protein O-GlcNAcylation by MS. Specifically, the review focuses on methods for enriching and site-specifically mapping O-GlcNAcylated peptides, and on applications for quantifying protein O-GlcNAcylation in different biological systems. As O-GlcNAcylation is an important protein modification for cell survival, effective methods are essential for advancing our understanding of glycoprotein functions and cellular events.
  5. Over the past two decades, mass spectrometry (MS)-based proteomics technologies have facilitated the study of signaling pathways throughout biology. Nowhere is this needed more than in plants, where an evolutionary history of genome duplications has resulted in large gene families involved in posttranslational modifications and regulatory pathways. For example, at least 5% of the Arabidopsis thaliana genome (ca. 1,200 genes) encodes protein kinases and protein phosphatases that regulate nearly all aspects of plant growth and development. MS-based technologies that quantify covalent changes in the side chains of amino acids are critically important, but they only address one piece of the puzzle. An even more important mechanistic question is how noncovalent interactions, which are more difficult to study, dynamically regulate the proteome’s 3D structure. Improvements in protein 3D structure technologies such as cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography have allowed considerable progress to be made at this level, but these methods are typically limited to proteins that can be expressed and purified in milligram quantities. MS-based technologies have recently been developed for studying the 3D structure of proteins. Importantly, these methods do not require protein samples to be purified and require smaller amounts of sample, opening the wider proteome for structural analysis in complex mixtures, crude lysates, and even in intact cells. These MS-based methods include covalent labeling, crosslinking, thermal proteome profiling, and limited proteolysis, all of which can be leveraged by established MS workflows, as well as newly emerging methods capable of analyzing intact macromolecules and the complexes they form. In this review, we discuss these recent innovations in MS-based “structural” proteomics to provide readers with an understanding of the opportunities they offer and the remaining challenges for understanding the molecular underpinnings of plant structure and function.