Title: An in silico proteomics screen to predict and prioritize protein–protein interactions dependent on post-translationally modified motifs
Abstract

Motivation

The development of proteomic methods for the characterization of domain/motif interactions has greatly expanded our understanding of signal transduction. However, proteomics-based binding screens have limitations: the queried tissue or cell type may not harbor all potential interacting partners or the post-translational modifications (PTMs) required for the interaction. Therefore, we sought a generalizable, complementary in silico approach to identify and prioritize potentially novel motif- and PTM-dependent binding partners.

Results

As an initial example, we used the interaction between the Src homology 2 (SH2) domains of the adaptor proteins CT10 regulator of kinase (CRK) and CRK-like (CRKL) and phosphorylated YXXP motifs. Employing well-curated, publicly available resources, we scored and prioritized potential CRK/CRKL–SH2 interactors possessing signature characteristics of known interacting partners. Our approach gave high priority scores to 102 of the >9000 YXXP motif-containing proteins. Among these 102 were 21 of the 25 curated CRK/CRKL–SH2-binding partners, a more than 80-fold enrichment. Several predicted interactors were validated biochemically. To demonstrate generalized applicability, we used our workflow to predict protein–protein interactions dependent upon motif-specific arginine methylation. Our data demonstrate the applicability of our approach to, conceivably, any modular binding domain that recognizes a specific post-translationally modified motif.
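As a rough illustration of the first step of such a screen, the sketch below scans a protein sequence for YXXP motifs and computes a fold enrichment for a prioritized subset. It is not the authors' workflow: the function names `find_yxxp` and `fold_enrichment` and the toy sequence are hypothetical, and a sequence match says nothing about phosphorylation state or about the priority scoring described above.

```python
import re

# Tyrosine, any two residues, then proline. The lookahead lets
# overlapping motifs (e.g. "YAYPAP") all be reported.
YXXP = re.compile(r"(?=(Y[A-Z]{2}P))")


def find_yxxp(sequence):
    """Return (1-based position of the tyrosine, motif) for each YXXP match."""
    return [(m.start() + 1, m.group(1)) for m in YXXP.finditer(sequence.upper())]


def fold_enrichment(hits_in_subset, subset_size, hits_in_population, population_size):
    """Ratio of the hit rate in a prioritized subset to the background hit rate."""
    return (hits_in_subset / subset_size) / (hits_in_population / population_size)


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MAYDAPSSYAYPAPK"
motifs = find_yxxp(toy_sequence)  # [(3, 'YDAP'), (9, 'YAYP'), (11, 'YPAP')]

# Assumption: the abstract states only ">9000" motif-containing proteins,
# so 9000 is a placeholder lower bound, not the study's actual count.
placeholder_fold = fold_enrichment(21, 102, 25, 9000)
```

Because the fold value scales linearly with the population size, the placeholder of 9000 understates the enrichment for any larger background set; the exact figure depends on the true number of YXXP motif-containing proteins, which is not given here.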

Supplementary information

Supplementary data are available at Bioinformatics online.

 
Award ID(s):
1656510
NSF-PAR ID:
10393404
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
34
Issue:
22
ISSN:
1367-4803
Page Range / eLocation ID:
p. 3898-3906
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Due to the nature of experimental annotation, most protein function prediction methods operate at the protein-level, where functions are assigned to full-length proteins based on overall similarities. However, most proteins function by interacting with other proteins or molecules, and many functional associations should be limited to specific regions rather than the entire protein length. Most domain-centric function prediction methods depend on accurate domain family assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain-family left out of functional evaluation. Given the abundance of residue-level annotations currently available, we present a function prediction methodology that automatically infers function labels of specific protein regions using protein-level annotations and multiple types of region-specific features.

    Results

    We apply this method to local features obtained from InterPro, UniProtKB and amino acid sequences and show that this method improves both the accuracy and region-specificity of protein function transfer and prediction. We compare region-level predictive performance of our method against that of a whole-protein baseline method using proteins with structurally verified binding sites and also compare protein-level temporal holdout predictive performances to expand the variety and specificity of GO terms we could evaluate. Our results can also serve as a starting point to categorize GO terms into region-specific and whole-protein terms and select prediction methods for different classes of GO terms.

    Availability and implementation

    The code and features are freely available at: https://github.com/ek1203/rsfp.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract

    Motivation

    Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.

    Results

    We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and, thus, may be useful for the annotation of metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediments and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred’s utility in analyzing microbiome metal requirements.

    Availability and implementation

    mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Human RNA‐binding motif 3 protein (RBM3) is a cold‐shock protein which functions in various aspects of global protein synthesis, cell proliferation and apoptosis by interacting with the components of the basal translational machinery. RBM3 plays important roles in tumour progression and cancer metastasis, and has also been shown to be involved in neuroprotection and the endoplasmic reticulum stress response. Here, we have solved the solution NMR structure of the N‐terminal 84‐residue RNA recognition motif (RRM) of RBM3. The remaining residues are rich in RGG and YGG motifs and are disordered. The RRM domain adopts a βαββαβ topology, which is found in many RNA‐binding proteins. NMR‐monitored titration experiments and molecular dynamics simulations show that the beta‐sheet and two loops form the RNA‐binding interface. Hydrogen bonds, pi–pi and pi–cation interactions are the key contacts between the RNA and the RRM domain. NMR, size exclusion chromatography and chemical cross‐linking experiments show that RBM3 forms oligomers in solution, which is favoured by a decrease in temperature, thus potentially linking oligomerization to its function as a cold‐shock protein. Temperature‐dependent NMR studies revealed that oligomerization of the RRM domain occurs via nonspecific interactions. Overall, this study provides a detailed structural analysis of the RRM domain of RBM3, its interaction with RNA and the molecular basis of its temperature‐dependent oligomerization.

     
  4. CRK adaptor proteins are important for signal transduction mechanisms driving cell proliferation and positioning during vertebrate central nervous system development. Zebrafish lacking both CRK family members exhibit small, disorganized retinas with 50% penetrance. The goal of this study was to determine whether another adaptor protein might functionally compensate for the loss of CRK adaptors. Expression patterns in developing zebrafish, and bioinformatic analyses of the motifs recognized by their SH2 and SH3 domains, suggest NCK adaptors are well‐positioned to compensate for loss of CRK adaptors. In support of this hypothesis, proteomic analyses found CRK and NCK adaptors share overlapping interacting partners including known regulators of cell adhesion and migration, suggesting their functional intersection in neurodevelopment.

     
  5. Abstract

    Motivation

    Accurate predictions of protein-binding residues (PBRs) enhance understanding of molecular-level rules governing protein–protein interactions, help protein–protein docking and facilitate annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, prohibiting genome-scale use.

    Results

    We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes the cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: a comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture that is tailored to reduce the cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations that are transferred from multiple protein–protein complexes. We utilize an innovative two-layer architecture in which the first layer generates predictions of protein-binding, RNA-binding, DNA-binding and small ligand-binding residues. The second layer re-predicts PBRs by reducing overlap between PBRs and the other types of binding residues produced in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69%, and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins.

    Availability and implementation

    The SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     