

Title: The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource
Abstract We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to address central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (only one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides of ≥9 amino acids each), designated canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS spectra.
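The peptide criterion quoted in the abstract for the highest confidence level (2 non-nested peptides of ≥9 amino acids each) can be illustrated with a minimal sketch. This is not the PeptideAtlas pipeline: the function names and example sequences are hypothetical, "nested" is approximated here as one peptide sequence being a substring of the other, and the false discovery rate filtering and any further criteria the resource applies are left out.

```python
from itertools import combinations

MIN_LEN = 9  # minimum peptide length quoted in the abstract


def is_nested(a: str, b: str) -> bool:
    """Return True if one peptide sequence lies entirely inside the other.

    Assumption: substring containment stands in for positional nesting.
    """
    return a in b or b in a


def meets_two_peptide_rule(peptides) -> bool:
    """Return True if at least two non-nested peptides of >= MIN_LEN residues exist."""
    long_enough = sorted({p for p in peptides if len(p) >= MIN_LEN})
    return any(not is_nested(a, b) for a, b in combinations(long_enough, 2))


# Hypothetical peptide sequences, for illustration only.
print(meets_two_peptide_rule(["AAAAAAAAAK", "CCCCCCCCCR"]))
print(meets_two_peptide_rule(["ACDEFGHIKLMR", "DEFGHIKLM"]))
```

In the second call the shorter peptide is contained in the longer one, so the pair counts as nested and the rule is not met.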
Award ID(s):
1922871
NSF-PAR ID:
10348226
Journal Name:
The Plant Cell
Volume:
33
Issue:
11
ISSN:
1040-4651
Page Range / eLocation ID:
3421 to 3453
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Alternatively spliced genes produce multiple spliced isoforms, called transcript variants. In differential alternative splicing, transcript variant abundance differs across sample types. Differential alternative splicing is common in animal systems and influences cellular development in many processes, but its extent and significance are not as well known in plants. To investigate differential alternative splicing in plants, we examined RNA‐Seq data from rice seedlings. The data included three biological replicates per sample type, approximately 30 million sequence alignments per replicate, and four sample types: roots and shoots treated with exogenous cytokinin delivered hydroponically or a mock treatment. Cytokinin treatment triggered expression changes in thousands of genes but had negligible effect on splicing patterns. However, many genes were differentially spliced between mock‐treated roots and shoots, indicating that our methods were sufficiently sensitive to detect differential splicing between data sets. Quantitative fragment analysis of reverse transcriptase‐PCR products made from newly prepared rice samples confirmed 9 of 10 differential splicing events between rice roots and shoots. Differential alternative splicing typically changed the relative abundance of splice variants that co‐occurred in a data set. Analysis of a similar (but less deeply sequenced) RNA‐Seq data set from Arabidopsis showed the same pattern. In both the Arabidopsis and rice RNA‐Seq data sets, most genes annotated as alternatively spliced had small minor variant frequencies. Of splicing choices with abundant support for minor forms, most alternative splicing events were located within the protein‐coding sequence and maintained the annotated reading frame. A tool for visualizing protein annotations in the context of genomic sequence (ProtAnnot) and a genome browser (Integrated Genome Browser) were used to visualize and assess the effects of differential splicing on gene function. In general, differentially spliced regions coincided with conserved protein domains, indicating that differential alternative splicing is likely to affect protein function between root and shoot tissue in rice.

     
  2. Abstract

    Protein S‐acylation, predominantly in the form of palmitoylation, is a reversible lipid post‐translational modification on cysteines that plays important roles in protein localization, trafficking, activity, and complex assembly. The functions and regulatory mechanisms of S‐acylation have been extensively studied in mammals owing to remarkable development of high‐resolution proteomics and the discovery of the S‐acylation‐related enzymes. However, the advancement of S‐acylation studies in plants lags behind that in mammals, mainly due to the lack of knowledge about proteins responsible for this process, such as protein acyltransferases and their substrates. In this article, a set of systematic protocols to study global S‐acylation in Arabidopsis seedlings is described. The procedures are presented in detail, including preparation of Arabidopsis seedlings, enrichment of plasma membrane (PM) proteins, ensuing enrichment of S‐acylated proteins/peptides based on the acyl‐biotin exchange method, and large‐scale identification of S‐acylated proteins/peptides via mass spectrometry. This approach enables researchers to study S‐acylation of PM proteins in plants in a systematic and straightforward way. © 2020 Wiley Periodicals LLC.

    Basic Protocol 1: Preparation of Arabidopsis seedling materials

    Basic Protocol 2: Isolation and enrichment of plasma membrane proteins

    Support Protocol 1: Determination of protein concentration using BCA assay

    Basic Protocol 3: Enrichment of S‐acylated proteins by acyl‐biotin exchange method

    Support Protocol 2: Protein precipitation by methanol/chloroform method

    Basic Protocol 4: Trypsin digestion and proteomic analysis

    Alternate Protocol: Pre‐resin digestion and peptide‐level enrichment

     
  3. Abstract SPINDLY (SPY) is a novel nucleocytoplasmic protein O-fucosyltransferase that regulates target protein activity or stability via O-fucosylation of specific Ser/Thr residues. Previous genetic studies indicate that AtSPY regulates plant development during vegetative and reproductive growth by modulating gibberellin and cytokinin responses. AtSPY also regulates the circadian clock and plant responses to biotic and abiotic stresses. The pleiotropic phenotypes of spy mutants point to the likely role of AtSPY in regulating key proteins functioning in diverse cellular pathways. However, very few AtSPY targets are known. Here, we identified 88 SPY targets from Arabidopsis (Arabidopsis thaliana) and Nicotiana benthamiana via the purification of O-fucosylated peptides using Aleuria aurantia lectin followed by electron transfer dissociation-MS/MS analysis. Most AtSPY targets were nuclear proteins that function in DNA repair, transcription, RNA splicing, and nucleocytoplasmic transport. Cytoplasmic AtSPY targets were involved in microtubule-mediated cell division/growth and protein folding. A comparison with the published O-linked-N-acetylglucosamine (O-GlcNAc) proteome revealed that 30% of AtSPY targets were also O-GlcNAcylated, indicating that these distinct glycosylations could co-regulate many protein functions. This study unveiled the roles of O-fucosylation in modulating many key nuclear and cytoplasmic proteins and provided a valuable resource for elucidating the regulatory mechanisms involved. 
  4. Abstract

    Understanding the molecular profile of every human cell type is essential for understanding its role in normal physiology and disease. Technological advancements in DNA sequencing, mass spectrometry, and computational methods allow us to carry out multiomics analyses, although such approaches are not yet routine. Human umbilical vein endothelial cells (HUVECs) are a widely used model system to study pathological and physiological processes associated with the cardiovascular system. In this study, next‐generation sequencing and high‐resolution mass spectrometry are employed to profile the transcriptome and proteome of primary HUVECs. Analysis of 145 million paired‐end reads from next‐generation sequencing confirmed expression of 12 186 protein‐coding genes (FPKM ≥0.1) and 439 novel long non‐coding RNAs, and revealed 6089 novel isoforms that were not annotated in GENCODE. Proteomics analysis identifies 6477 proteins, including confirmation of N‐termini for 1091 proteins, isoforms for 149 proteins, and 1034 phosphosites. A database search to specifically identify other post‐translational modifications provides evidence for a number of modification sites on 117 proteins, which include ubiquitylation, lysine acetylation, and mono‐, di‐ and tri‐methylation events. Evidence for 11 “missing proteins,” which are proteins for which there was insufficient or no protein level evidence, is provided. Peptides supporting missing protein and novel events are validated by comparison of MS/MS fragmentation patterns with synthetic peptides. Finally, 245 variant peptides derived from 207 expressed proteins, alternate translational start sites for seven proteins, and evidence for novel proteoforms for five proteins resulting from alternative splicing are identified. Overall, it is believed that the integrated approach employed in this study is widely applicable to study any primary cell type for deeper molecular characterization.

     
  5. Abstract Background: A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies, when matched metagenomic/metatranscriptomic data are unavailable. However, a greater collection of reference genomes may not necessarily result in better peptide/protein identification because the increase of search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase of computation time. Methods: Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification). Results: We tested our approach using human gut metaproteomic datasets from a previous study and compared it to the state-of-the-art reference database search method MetaPro-IQ for metaproteomic identification in studying human gut microbiota. Our results show that our two-step method not only performed significantly faster but also was able to identify more peptides. We further demonstrated the application of HAPiID to revealing protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data. Conclusions: The HAP-guided profiling approach presents a novel, effective way of constructing a target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
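The two-step idea described in the last abstract (search against high-abundance proteins first, then expand the database to all proteins of the abundant species) can be sketched roughly as follows. This is an illustration under stated assumptions, not the HAPiID implementation: the function names, the 10% abundance cutoff, the species labels, and the protein identifiers are all hypothetical, and the MS/MS database search itself is assumed to have already produced one species label per identified HAP peptide.

```python
from collections import Counter


def abundant_species(hap_hits, min_fraction=0.10):
    """Step 1 (sketch): pick species by their share of HAP peptide identifications.

    hap_hits: one species label per peptide identified in the HAP-only search.
    min_fraction: hypothetical cutoff; the abstract does not state one.
    """
    counts = Counter(hap_hits)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(s for s, n in counts.items() if n / total >= min_fraction)


def expand_database(proteomes, species):
    """Step 2 (sketch): collect all proteins of the selected species.

    proteomes: dict mapping a species label to its list of protein identifiers.
    """
    return [protein for s in species for protein in proteomes.get(s, [])]


# Hypothetical inputs, for illustration only.
hits = ["sp_A"] * 15 + ["sp_B"] * 4 + ["sp_C"] * 1
proteomes = {"sp_A": ["A1", "A2"], "sp_B": ["B1"], "sp_C": ["C1"]}

selected = abundant_species(hits)
print(selected)
print(expand_database(proteomes, selected))
```

With these made-up counts, sp_C accounts for 5% of the identifications and falls below the assumed cutoff, so its proteins stay out of the expanded search database.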