skip to main content

Title: Predicting ncRNA–protein interactions based on dual graph convolutional network and pairwise learning

Noncoding RNAs (ncRNAs) have recently attracted considerable attention due to their key roles in biology. The ncRNA–proteins interaction (NPI) is often explored to reveal some biological activities that ncRNA may affect, such as biological traits, diseases, etc. Traditional experimental methods can accomplish this work but are often labor-intensive and expensive. Machine learning and deep learning methods have achieved great success by exploiting sufficient sequence or structure information. Graph Neural Network (GNN)-based methods consider the topology in ncRNA–protein graphs and perform well on tasks like NPI prediction. Based on GNN, some pairwise constraint methods have been developed to apply on homogeneous networks, but not used for NPI prediction on heterogeneous networks. In this paper, we construct a pairwise constrained NPI predictor based on dual Graph Convolutional Network (GCN) called NPI-DGCN. To our knowledge, our method is the first to train a heterogeneous graph-based model using a pairwise learning strategy. Instead of binary classification, we use a rank layer to calculate the score of an ncRNA–protein pair. Moreover, our model is the first to predict NPIs on the ncRNA–protein bipartite graph rather than the homogeneous graph. We transform the original ncRNA–protein bipartite graph into two homogenous graphs on which to explore second-order implicit relationships. At the same time, we model direct interactions between two homogenous graphs to explore explicit relationships. Experimental results on the four standard datasets indicate that our method achieves competitive performance with other state-of-the-art methods. And the model is available at

more » « less
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Briefings in Bioinformatics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Network embedding has been an effective tool to analyze heterogeneous networks (HNs) by representing nodes in a low-dimensional space. Although many recent methods have been proposed for representation learning of HNs, there is still much room for improvement. Random walks based methods are currently popular methods to learn network embedding; however, they are random and limited by the length of sampled walks, and have difculty capturing network structural information. Some recent researches proposed using meta paths to express the sample relationship in HNs. Another popular graph learning model, the graph convolutional network (GCN) is known to be capable of better exploitation of network topology, but the current design of GCN is intended for homogenous networks. This paper proposes a novel combination of meta-graph and graph convolution, the meta-graph based graph convolutional networks (MGCN). To fully capture the complex long semantic information, MGCN utilizes different meta-graphs in HNs. As different meta-graphs express different semantic relationships, MGCN learns the weights of different meta-graphs to make up for the loss of semantics when applying GCN. In addition, we improve the current convolution design by adding node self-signicance. To validate our model in learning feature representation, we present comprehensive experiments on four real-world datasets and two representation tasks: classication and link prediction. WMGCN's representations can improve accuracy scores by up to around 10% in comparison to other popular representation learning models. What's more, WMGCN'feature learning outperforms other popular baselines. The experimental results clearly show our model is superior over other state-of-the-art representation learning algorithms. 
    more » « less
  2. null (Ed.)
    Abstract Molecular interaction networks are powerful resources for molecular discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced the prediction prowess, current graph neural network (GNN) methods are mainly optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved incredibly useful in the last decade across a variety of interaction networks. Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by not only aggregating information from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph. Experiments on four interaction networks, including drug–drug, drug–target, protein–protein, and gene–disease interactions, show that SkipGNN achieves superior and robust performance. Furthermore, we show that unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks. 
    more » « less
  3. Graphs are powerful representations for relations among objects, which have attracted plenty of attention in both academia and industry. A fundamental challenge for graph learning is how to train an effective Graph Neural Network (GNN) encoder without labels, which are expensive and time consuming to obtain. Contrastive Learning (CL) is one of the most popular paradigms to address this challenge, which trains GNNs by discriminating positive and negative node pairs. Despite the success of recent CL methods, there are still two under-explored problems. Firstly, how to reduce the semantic error introduced by random topology based data augmentations. Traditional CL defines positive and negative node pairs via the node-level topological proximity, which is solely based on the graph topology regardless of the semantic information of node attributes, and thus some semantically similar nodes could be wrongly treated as negative pairs. Secondly, how to effectively model the multiplexity of the real-world graphs, where nodes are connected by various relations and each relation could form a homogeneous graph layer. To solve these problems, we propose a novel multiplex heterogeneous graph prototypical contrastive leaning (X-GOAL) framework to extract node embeddings. X-GOAL is comprised of two components: the GOAL framework, which learns node embeddings for each homogeneous graph layer, and an alignment regularization, which jointly models different layers by aligning layer-specific node embeddings. Specifically, the GOAL framework captures the node-level information by a succinct graph transformation technique, and captures the cluster-level information by pulling nodes within the same semantic cluster closer in the embedding space. The alignment regularization aligns embeddings across layers at both node level and cluster level. We evaluate the proposed X-GOAL on a variety of real-world datasets and downstream tasks to demonstrate the effectiveness of the X-GOAL framework. 
    more » « less
  4. Transparency and accountability have become major concerns for black-box machine learning (ML) models. Proper explanations for the model behavior increase model transparency and help researchers develop more accountable models. Graph neural networks (GNN) have recently shown superior performance in many graph ML problems than traditional methods, and explaining them has attracted increased interest. However, GNN explanation for link prediction (LP) is lacking in the literature. LP is an essential GNN task and corresponds to web applications like recommendation and sponsored search on web. Given existing GNN explanation methods only address node/graph-level tasks, we propose Path-based GNN Explanation for heterogeneous Link prediction (PaGE-Link) that generates explanations with connection interpretability, enjoys model scalability, and handles graph heterogeneity. Qualitatively, PaGE-Link can generate explanations as paths connecting a node pair, which naturally captures connections between the two nodes and easily transfer to human-interpretable explanations. Quantitatively, explanations generated by PaGE-Link improve AUC for recommendation on citation and user-item graphs by 9 - 35% and are chosen as better by 78.79% of responses in human evaluation. 
    more » « less
  5. Abstract

    Predicting residue‐residue distance relationships (eg, contacts) has become the key direction to advance protein structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning holds the key of solving protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

    more » « less