Abstract Arabidopsis (Arabidopsis thaliana) ecotype Col-0 has plastid and mitochondrial genomes encoding over 100 proteins. Public databases (e.g. Araport11) have redundancy and discrepancies in gene identifiers for these organelle-encoded proteins. RNA editing results in changes to specific amino acid residues or creation of start and stop codons for many of these proteins, but the impact of RNA editing at the protein level is largely unexplored due to the complexities of detection. Here, we assembled the nonredundant set of identifiers, their correct protein sequences, and 452 predicted nonsynonymous editing sites of which 56 are edited at lower frequency. We then determined accumulation of edited and/or unedited proteoforms by searching ∼259 million raw tandem MS spectra from ProteomeXchange, which is part of PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/). We identified all mitochondrial proteins and all except 3 plastid-encoded proteins (NdhG/Ndh6, PsbM, and Rps16), but no proteins predicted from the 4 ORFs were identified. We suggest that Rps16 and 3 of the ORFs are pseudogenes. Detection frequencies for each edit site and type of edit (e.g. S to L/F) were determined at the protein level, cross-referenced against the metadata (e.g. tissue), and evaluated for technical detection challenges. We detected 167 predicted edit sites at the proteome level. Minor frequency sites were edited at low frequency at the protein level except for cytochrome C biogenesis 382 at residue 124 (Ccb382-124). Major frequency sites (>50% editing of RNA) only accumulated in edited form (>98% to 100% edited) at the protein level, with the exception of Rpl5-22. We conclude that RNA editing for major editing sites is required for stable protein accumulation.
The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource
Abstract We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (only one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acids each), assigned as canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS data.
- Award ID(s): 1922871
- PAR ID: 10348226
- Date Published:
- Journal Name: The Plant Cell
- Volume: 33
- Issue: 11
- ISSN: 1040-4651
- Page Range / eLocation ID: 3421 to 3453
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract SPINDLY (SPY) is a novel nucleocytoplasmic protein O-fucosyltransferase that regulates target protein activity or stability via O-fucosylation of specific Ser/Thr residues. Previous genetic studies indicate that AtSPY regulates plant development during vegetative and reproductive growth by modulating gibberellin and cytokinin responses. AtSPY also regulates the circadian clock and plant responses to biotic and abiotic stresses. The pleiotropic phenotypes of spy mutants point to the likely role of AtSPY in regulating key proteins functioning in diverse cellular pathways. However, very few AtSPY targets are known. Here, we identified 88 SPY targets from Arabidopsis (Arabidopsis thaliana) and Nicotiana benthamiana via the purification of O-fucosylated peptides using Aleuria aurantia lectin followed by electron transfer dissociation-MS/MS analysis. Most AtSPY targets were nuclear proteins that function in DNA repair, transcription, RNA splicing, and nucleocytoplasmic transport. Cytoplasmic AtSPY targets were involved in microtubule-mediated cell division/growth and protein folding. A comparison with the published O-linked-N-acetylglucosamine (O-GlcNAc) proteome revealed that 30% of AtSPY targets were also O-GlcNAcylated, indicating that these distinct glycosylations could co-regulate many protein functions. This study unveiled the roles of O-fucosylation in modulating many key nuclear and cytoplasmic proteins and provided a valuable resource for elucidating the regulatory mechanisms involved.
Abstract Motivation: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS/MS spectra; both, however, may face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification. Results: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%–2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%–15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%–12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides when compared to deep-learning enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder’s potential to enhance peptide identification for proteomic data analyses. Availability and Implementation: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.
Abstract Protein activity, abundance, and stability can be regulated by post-translational modification including ubiquitination. Ubiquitination is conserved among eukaryotes and plays a central role in modulating cellular function; yet, we lack comprehensive catalogs of proteins that are modified by ubiquitin in plants. In this study, we describe an antibody-based approach to enrich ubiquitinated peptides coupled with isobaric labeling to enable quantification of up to 18-multiplexed samples. This approach identified 17,940 ubiquitinated lysine sites arising from 6,453 proteins from Arabidopsis (Arabidopsis thaliana) primary roots, seedlings, and rosette leaves. Gene ontology analysis indicated that ubiquitinated proteins are associated with numerous biological processes including hormone signaling, plant defense, protein homeostasis, and metabolism. We determined ubiquitinated lysine residues that directly regulate the stability of three transcription factors, CRYPTOCHROME-INTERACTING BASIC-HELIX-LOOP-HELIX 1 (CIB1), CIB1 LIKE PROTEIN 2 (CIL2), and SENSITIVE TO PROTON RHIZOTOXICITY1 (STOP1) using in vivo degradation assays. Furthermore, codon mutation of CIB1 to create a K166R conversion to prevent ubiquitination, via CRISPR/Cas9-derived adenosine base editing, led to an early flowering phenotype and increased expression of FLOWERING LOCUS T (FT). These comprehensive site-level ubiquitinome profiles provide a wealth of data for future functional studies related to modulation of biological processes mediated by this post-translational modification in plants.
Abstract Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.

