

Title: Computational identification of protein-protein interactions in model plant proteomes
Abstract

Protein-protein interactions (PPIs) play essential roles in many biological processes. A PPI network provides crucial information on how biological pathways are structured and coordinated from individual protein functions. In the past two decades, large-scale PPI networks of a handful of organisms were determined by experimental techniques. However, these experimental methods are time-consuming, expensive, and not easy to perform on new target organisms. Large-scale PPI data are particularly sparse for plants. Here, we developed a computational approach for detecting PPIs that was trained and tested on known PPIs of Arabidopsis thaliana and applied to three plants, Arabidopsis thaliana, Glycine max (soybean), and Zea mays (maize), to discover new PPIs on a genome scale. Our method considers a variety of features including protein sequences, gene co-expression, functional association, and phylogenetic profiles. This is the first PPI prediction method developed for and applied to benchmark datasets of Arabidopsis. The method showed a high prediction accuracy of over 90% and a very high precision of close to 1.0. We predicted 50,220 PPIs in Arabidopsis thaliana, 13,175,414 PPIs in maize, and 13,527,834 PPIs in soybean. Newly predicted PPIs were classified into three confidence levels according to the availability of existing supporting evidence and discussed. Predicted PPIs in the three plant genomes are made available for future reference.
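
To illustrate the two metrics quoted in the abstract, the sketch below computes accuracy and precision for a binary interacting/non-interacting classifier. It is only a minimal illustration: the helper names (`Confusion`, `confusion`, `accuracy`, `precision`) and the toy labels are invented for this example and are neither the authors' code nor their data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of the four outcomes of a binary classifier."""
    tp: int = 0  # interacting pair predicted as interacting
    fp: int = 0  # non-interacting pair predicted as interacting
    tn: int = 0  # non-interacting pair predicted as non-interacting
    fn: int = 0  # interacting pair predicted as non-interacting


def confusion(y_true, y_pred):
    """Tally outcomes from parallel sequences of 0/1 labels."""
    c = Confusion()
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            c.tp += 1
        elif pred:
            c.fp += 1
        elif truth:
            c.fn += 1
        else:
            c.tn += 1
    return c


def accuracy(c):
    """Fraction of all protein pairs that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c):
    """Fraction of predicted interactions that are true interactions."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


# Toy labels (hypothetical, not from the paper): 1 = interacting, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

counts = confusion(y_true, y_pred)
print(accuracy(counts))   # 0.875
print(precision(counts))  # 1.0
```

In this toy case one true interaction is missed, so accuracy falls below 1.0, while precision stays at 1.0 because no false interaction is predicted. A precision close to 1.0 therefore says that predicted interactions are rarely wrong, not that every true interaction is found.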

 
NSF-PAR ID:
10154221
Author(s) / Creator(s):
;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Reports
Volume:
9
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses.

    Results

    We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities.

    Availability and implementation

    Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract Summary

    Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms.

    Availability and implementation

    https://topsyturvy.csail.mit.edu.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    During the past two decades, glucosinolate (GLS) metabolic pathways have been studied extensively because of the importance of these specialized metabolites in plant defense against herbivores and pathogens. The studies have led to a nearly complete characterization of biosynthetic genes in the reference plant Arabidopsis thaliana. Before methionine is incorporated into the core structure of aliphatic GLS, it undergoes chain elongation through an iterative three-step process recruited from leucine biosynthesis. Although enzymes catalyzing each step of the reaction have been characterized, the regulatory mode is largely unknown. In this study, using three independent approaches, yeast two-hybrid (Y2H), coimmunoprecipitation (Co-IP) and bimolecular fluorescence complementation (BiFC), we uncovered the presence of protein complexes consisting of isopropylmalate isomerase (IPMI) and isopropylmalate dehydrogenase (IPMDH). In addition, simultaneous decreases in both IPMI and IPMDH activities in leuc:ipmdh1 double mutants resulted in aggregated changes of GLS profiles compared to either leuc or ipmdh1 single mutants. Although the biological importance of the formation of IPMI and IPMDH protein complexes has not been documented in any organism, these complexes may represent a new regulatory mechanism of substrate channeling in GLS and/or leucine biosynthesis. Since genes encoding the two enzymes are widely distributed in eukaryotic and prokaryotic genomes, such complexes may have universal significance in the regulation of leucine biosynthesis.

     
  4. Driving mechanisms of many biological functions in a cell include physical interactions of proteins. As protein-protein interactions (PPIs) are also important in disease development, PPIs have been highlighted in the pharmaceutical industry as possible therapeutic targets in recent years. To understand the variety of PPIs in a proteome, it is essential to establish a method that can identify similarity and dissimilarity between PPIs for inferring the binding of similar molecules, including drugs and other proteins. In this study, we developed a novel method, PPI-Surfer, which compares and quantifies similarity of local surface regions of PPIs. PPI-Surfer represents a PPI surface with overlapping surface patches, each of which is described with a three-dimensional Zernike descriptor (3DZD), a compact mathematical representation of a 3D function. 3DZD captures both the 3D shape and physicochemical properties of the protein surface. The performance of PPI-Surfer was benchmarked on datasets of PPIs, where we were able to show that PPI-Surfer finds similar potential drug binding regions that do not share sequence and structure similarity. PPI-Surfer is available at https://kiharalab.org/ppi-surfer.
  5. Abstract

    Protein‐only RNase P (PRORP) is an essential enzyme responsible for the 5′ maturation of precursor tRNAs (pre‐tRNAs). PRORPs are classified into three categories with unique molecular architectures, although all three classes of PRORPs share a mechanism and have similar active sites. Single subunit PRORPs, like those found in plants, have multiple isoforms with different localizations, substrate specificities, and temperature sensitivities. Most recently, Arabidopsis thaliana PRORP2 was shown to interact with TRM1A and B, highlighting a new potential role between these enzymes. Work with AtPRORPs led to the development of a ribonuclease that is being used to protect against plant viruses. The mitochondrial RNase P complex, found in metazoans, consists of PRORP, TRMT10C, and SDR5C1, and has also been shown to have substrate specificity, although the cause is unknown. Mutations in mitochondrial tRNA and mitochondrial RNase P have been linked to human disease, highlighting the need to continue understanding this complex. The last class of PRORPs, homologs of Aquifex RNase P (HARPs), is found in thermophilic archaea and bacteria. This most recently discovered type of PRORP forms a large homo‐oligomer complex. Although numerous structures of HARPs have been published, it is still unclear how HARPs bind pre‐tRNAs and in what ratio. There is also little investigation into the substrate specificity and ideal conditions for HARPs. Moving forward, further work is required to fully characterize each of the three classes of PRORP, the pre‐tRNA binding recognition mechanism, the rules of substrate specificity, and how these three distinct classes of PRORP evolved.

    This article is categorized under:

    RNA Structure and Dynamics > RNA Structure, Dynamics and Chemistry

    RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems

     