skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Title: On the Planarity of Validated Complexes of Model Organisms in Protein-Protein Interaction Networks
Leveraging protein-protein interaction networks to identify groups of proteins and their common functionality is an important problem in bioinformatics. Systems-level analysis of protein-protein interactions is made possible through network science and modeling of high-throughput data. From these analyses, small protein complexes are traditionally represented graphically as complete graphs or dense clusters of nodes. However, there are certain graph theoretic properties that have not been extensively studied in PPI networks, especially as they pertain to cluster discovery, such as planarity. Planarity of graphs have been used to reflect the physical constraints of real-world systems outside of bioinformatics, in areas such as mapping and imaging. Here, we investigate the planarity property in network models of protein complexes. We hypothesize that complexes represented as PPI subgraphs will tend to be planar, reflecting the actual physical interface and limits of components in the complex. When testing the planarity of known complex subgraphs in S. cerevisiae and selected mammalian PPIs, we find that a majority of validated complexes possess this planar property. We discuss the biological motivation of planar versus nonplanar subgraphs, observing that planar subgraphs tend to have longer protein components. Functional classification of planar versus nonplanar complex subgraphs reveals differences in annotation of these groups relating to cellular component organization, structural molecule activity, catalytic activity, and nucleic acid binding. These results provide a new quantitative and biologically motivated measure of real protein complexes in the network model, important for the development of future complex-finding algorithms in PPIs. Accounting for this property paves the way to new means for discovering new protein complexes and uncovering the functionality of unknown or novel proteins. s  more » « less
Award ID(s):
1916084
NSF-PAR ID:
10187223
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Internal Conference on Computational Science ICCS 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background: In bioinformatics, network alignment algorithms have been applied to protein-protein interaction (PPI) networks to discover evolutionary conserved substructures at the system level. However, most previous methods aim to maximize the similarity of aligned proteins in pairwise networks, while concerning little about the feature of connectivity in these substructures, such as the protein complexes. Results: In this paper, we identify the problem of finding conserved protein complexes, which requires the aligned proteins in a PPI network to form a connected subnetwork. By taking the feature of connectivity into consideration, we propose ConnectedAlign, an efficient method to find conserved protein complexes from multiple PPI networks. The proposed method improves the coverage significantly without compromising of the consistency in the aligned results. In this way, the knowledge of protein complexes in well-studied species can be extended to that of poor-studied species. Conclusions: We conducted extensive experiments on real PPI networks of four species, including human, yeast, fruit fly and worm. The experimental results demonstrate dominant benefits of the proposed method in finding protein complexes across multiple species. 
    more » « less
  2. Abstract Protein–protein interaction (PPI) networks represent complex intra-cellular protein interactions, and the presence or absence of such interactions can lead to biological changes in an organism. Recent network-based approaches have shown that a phenotype’s PPI network’s resilience to environmental perturbations is related to its placement in the tree of life; though we still do not know how or why certain intra-cellular factors can bring about this resilience. Here, we explore the influence of gene expression and network properties on PPI networks’ resilience. We use publicly available data of PPIs for E. coli , S. cerevisiae , and H. sapiens , where we compute changes in network resilience as new nodes (proteins) are added to the networks under three node addition mechanisms—random, degree-based, and gene-expression-based attachments. By calculating the resilience of the resulting networks, we estimate the effectiveness of these node addition mechanisms. We demonstrate that adding nodes with gene-expression-based preferential attachment (as opposed to random or degree-based) preserves and can increase the original resilience of PPI network in all three species, regardless of gene expression distribution or network structure. These findings introduce a general notion of prospective resilience , which highlights the key role of network structures in understanding the evolvability of phenotypic traits. 
    more » « less
  3. Abstract

    Protein-protein interactions (PPIs) play essential roles in many biological processes. A PPI network provides crucial information on how biological pathways are structured and coordinated from individual protein functions. In the past two decades, large-scale PPI networks of a handful of organisms were determined by experimental techniques. However, these experimental methods are time-consuming, expensive, and are not easy to perform on new target organisms. Large-scale PPI data is particularly sparse in plant organisms. Here, we developed a computational approach for detecting PPIs trained and tested on known PPIs ofArabidopsis thalianaand applied to three plants,Arabidopsis thaliana,Glycine max(soybean), andZea mays(maize) to discover new PPIs on a genome-scale. Our method considers a variety of features including protein sequences, gene co-expression, functional association, and phylogenetic profiles. This is the first work where a PPI prediction method was developed for is the first PPI prediction method applied on benchmark datasets ofArabidopsis. The method showed a high prediction accuracy of over 90% and very high precision of close to 1.0. We predicted 50,220 PPIs inArabidopsis thaliana, 13,175,414 PPIs in corn, and 13,527,834 PPIs in soybean. Newly predicted PPIs were classified into three confidence levels according to the availability of existing supporting evidence and discussed. Predicted PPIs in the three plant genomes are made available for future reference.

     
    more » « less
  4. null (Ed.)
    Abstract Background The current computational methods on identifying conserved protein complexes across multiple Protein-Protein Interaction (PPI) networks suffer from the lack of explicit modeling of the desired topological properties within conserved protein complexes as well as their scalability. Results To overcome those issues, we propose a scalable algorithm—ClusterM—for identifying conserved protein complexes across multiple PPI networks through the integration of network topology and protein sequence similarity information. ClusterM overcomes the computational barrier that existed in previous methods, where the complexity escalates exponentially when handling an increasing number of PPI networks; and it is able to detect conserved protein complexes with both topological separability and cohesive protein sequence conservation. On two independent compendiums of PPI networks from Saccharomyces cerevisiae ( Sce , yeast), Drosophila melanogaster ( Dme , fruit fly), Caenorhabditis elegans ( Cel , worm), and Homo sapiens ( Hsa , human), we demonstrate that ClusterM outperforms other state-of-the-art algorithms by a significant margin and is able to identify de novo conserved protein complexes across four species that are missed by existing algorithms. Conclusions ClusterM can better capture the desired topological property of a typical conserved protein complex, which is densely connected within the complex while being well-separated from the rest of the networks. Furthermore, our experiments have shown that ClusterM is highly scalable and efficient when analyzing multiple PPI networks. 
    more » « less
  5. Abstract Motivation

    Sequence-based protein–protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information.

    Results

    We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI predictions using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in the Siamese architecture, which leverages both robust local features and contextualized information, which are significant for capturing the mutual influence of proteins sequences. PIPR relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.

    Availability and implementation

    The implementation is available at https://github.com/muhaochen/seq_ppi.git.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less