

Search for: All records

Award ID contains: 1815256


  1. Network embedding has been an effective tool for analyzing heterogeneous networks (HNs) by representing nodes in a low-dimensional space. Although many methods have recently been proposed for representation learning on HNs, there is still much room for improvement. Random-walk-based methods are currently popular for learning network embeddings; however, they are stochastic, limited by the length of the sampled walks, and have difficulty capturing network structural information. Some recent studies have proposed using meta-paths to express the sample relationships in HNs. Another popular graph learning model, the graph convolutional network (GCN), is known to exploit network topology better, but the current design of GCN is intended for homogeneous networks. This paper proposes a novel combination of meta-graphs and graph convolution, the meta-graph based graph convolutional network (MGCN). To fully capture the complex, long semantic information, MGCN utilizes different meta-graphs in HNs. As different meta-graphs express different semantic relationships, MGCN learns the weights of the different meta-graphs to make up for the loss of semantics when applying GCN. In addition, we improve the current convolution design by adding node self-significance. To validate our model in learning feature representations, we present comprehensive experiments on four real-world datasets and two representation tasks: classification and link prediction. WMGCN's representations can improve accuracy scores by up to around 10% in comparison to other popular representation learning models. What's more, WMGCN's feature learning outperforms other popular baselines. The experimental results clearly show that our model is superior to other state-of-the-art representation learning algorithms.
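As a rough illustration of the idea in record 1, the sketch below combines several meta-graph adjacency matrices with per-graph weights, adds a self-loop term standing in for node self-significance, and applies one normalized graph-convolution step. The abstract gives no formulas, so the softmax weighting, the symmetric normalization, the ReLU activation, and all names (`combine_meta_graphs`, `self_weight`, and so on) are assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def combine_meta_graphs(adjs, logits, self_weight):
    """Weighted sum of meta-graph adjacency matrices plus a self-loop term.

    `logits` would be learned in a real model; here they are fixed inputs.
    `self_weight` is a hypothetical stand-in for node self-significance.
    """
    weights = softmax(logits)
    n = len(adjs[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, adj in zip(weights, adjs):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * adj[i][j]
    for i in range(n):
        combined[i][i] += self_weight
    return combined

def sym_normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed, as in standard GCN)."""
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def gcn_layer(adjs, logits, self_weight, features, weight):
    """One propagation step: ReLU(A_hat @ features @ weight)."""
    a_hat = sym_normalize(combine_meta_graphs(adjs, logits, self_weight))
    z = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in z]

# Toy example: 3 nodes, 2 meta-graphs, 2-dimensional features.
adjs = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],  # meta-graph 1 links nodes 0 and 1
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],  # meta-graph 2 links nodes 0 and 2
]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]       # identity, so only propagation is visible
combined = combine_meta_graphs(adjs, [0.0, 0.0], 1.0)
hidden = gcn_layer(adjs, [0.0, 0.0], 1.0, features, weight)
```

With equal logits, both meta-graphs contribute half of their edge weight, and every node keeps a self-loop of weight `self_weight`.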
  2. Research on protein-protein interaction (PPI) network alignment plays an important role in understanding crucial underlying biological knowledge, such as functionally homologous proteins and conserved evolutionary pathways across different species. Existing PPI network alignment methods often try to improve the coverage ratio of the alignment result by aligning all proteins from different species. However, there is a fundamental biological premise that needs to be considered carefully: not every protein in a species can, nor should, find its homologous proteins in other species. In this work, we propose a novel alignment method that maps only those proteins with the highest similarity across the PPI networks of multiple species. For the similarity features of the proteins in the networks, we integrate topological features with biological characteristics to provide enhanced support for the alignment procedure. For topological features, we apply a representation learning method to the networks that generates, for each protein, a low-dimensional vector embedding capturing its surrounding structural features. The topological similarity of proteins from different PPI networks can thus be expressed as the similarity of their corresponding vector representations, which provides a new way to comprehensively quantify the topological similarities between proteins. We also propose a new measure for the topological evaluation of the alignment results, which better uncovers the structural quality of the alignment across multiple networks. Both biological and topological evaluations of the alignment results on real datasets demonstrate that our approach is promising and preferable to previous multiple alignment methods.
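To make the embedding-based similarity in record 2 concrete, the sketch below scores protein pairs from two networks by the cosine similarity of their embedding vectors and keeps only pairs above a threshold, so that weakly similar proteins stay unaligned. The greedy one-to-one matching, the threshold value, the toy embeddings, and the function names are all assumptions; the abstract does not specify the matching procedure, and the biological similarity it also uses is omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def align_most_similar(emb_a, emb_b, threshold):
    """Greedy one-to-one matching on cosine similarity.

    emb_a, emb_b: dicts mapping protein id -> embedding vector.
    Pairs scoring below `threshold` are never aligned.
    """
    scored = sorted(
        ((cosine(u, v), a, b) for a, u in emb_a.items() for b, v in emb_b.items()),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, a, b in scored:
        if score < threshold:
            break  # every remaining pair is even less similar
        if a in used_a or b in used_b:
            continue
        pairs.append((a, b, score))
        used_a.add(a)
        used_b.add(b)
    return pairs

# Hypothetical embeddings for two species' networks.
species_a = {"A1": [1.0, 0.0], "A2": [0.0, 1.0], "A3": [-1.0, -1.0]}
species_b = {"B1": [0.9, 0.1], "B2": [0.1, 0.9]}
pairs = align_most_similar(species_a, species_b, threshold=0.8)
```

In this toy input, `A3` has no sufficiently similar counterpart and is left out of the alignment, which mirrors the premise that not every protein should be mapped.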
  3. Quantifying the similarities between diseases now plays an important role in biology and medicine, as it provides reliable reference information for finding similar diseases. Most previous methods for calculating disease similarity either use single-source data or do not fully utilize multi-source data. In this study, we propose an approach that measures disease similarity using multiple heterogeneous disease information networks. First, multiple disease-related data sources are formulated as heterogeneous disease information networks, which include various types of objects such as diseases, pathways, and chemicals. Then, the corresponding subgraphs of these heterogeneous disease information networks are obtained by filtering vertices. Topological scores and semantic scores are calculated in these heterogeneous subgraphs using the Dynamic Time Warping (DTW) algorithm and the meta-path method, respectively. In this way, we transform multiple heterogeneous disease networks into a homogeneous disease network with different weights on the edges. Finally, the disease nodes are embedded according to the weights, and the similarity between diseases can then be calculated using these n-dimensional vectors. Experiments on a benchmark set demonstrate the effectiveness of our method in measuring disease similarity from multi-source data.
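Record 3 relies on Dynamic Time Warping (DTW) for its topological scores. The sketch below is the textbook dynamic-programming form of DTW over two numeric sequences with absolute difference as the local cost. Which sequences the authors actually compare is not stated in the abstract, so the example inputs are hypothetical.

```python
def dtw_distance(s, t):
    """Classic O(len(s) * len(t)) DTW distance between two numeric sequences."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # s advances, t[j-1] is reused
                cost[i][j - 1],      # t advances, s[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical per-node sequences, e.g. sorted neighbor degrees.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Because DTW allows one element to align with several, sequences of different lengths but the same shape can have distance zero, which is why it suits comparing neighborhoods of different sizes.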
  4. A subgraph matching query finds the subgraphs of a data graph G that match a given query graph Q. Traditional methods cannot handle big data graphs because of their high computational complexity. In this paper, we propose a distributed top-k subgraph search method over big graphs. The proposed method is designed at the level of a single vertex: all vertices obtain their matching state separately, without requiring global graph information. It can therefore be easily deployed on distributed platforms such as Hadoop. Evaluations of running time, number of messages, and number of supersteps show the efficiency and scalability of the proposed method.
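The vertex-level design described in record 4 can be illustrated with a local candidate filter: each data vertex decides which query vertices it could match using only its own label and the labels of its direct neighbors, with no global view of the graph. This is only a sketch of a plausible first step under assumed data structures; the abstract does not describe the authors' actual matching states, message format, or top-k ranking, and the names below are hypothetical.

```python
from collections import Counter

def local_candidates(label, neighbor_labels, query):
    """Return the query vertices this data vertex could match, using local info only.

    label: label of the data vertex.
    neighbor_labels: labels of its direct neighbors (e.g. received as messages).
    query: dict mapping query vertex id -> (label, list of neighbor labels).
    """
    have = Counter(neighbor_labels)
    result = []
    for qv, (q_label, q_neighbor_labels) in query.items():
        need = Counter(q_neighbor_labels)
        # The data vertex needs the same label and at least as many
        # neighbors of each label as the query vertex has.
        if q_label == label and all(have[l] >= c for l, c in need.items()):
            result.append(qv)
    return result

# Hypothetical query graph: q1 (label A) adjacent to a B and a C; q2 (label B) adjacent to an A.
query = {"q1": ("A", ["B", "C"]), "q2": ("B", ["A"])}
```

A vertex labeled `A` with neighbors labeled `B`, `C`, `C` is a candidate for `q1`, while one with only a `B` neighbor is pruned immediately, so it never needs to send further messages.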
  5. Protein-protein interaction (PPI) network alignment has motivated research into the crucial underlying biological knowledge, such as conserved evolutionary pathways and functionally conserved proteins across different species. Existing PPI network alignment methods have tried to improve the coverage ratio by aligning all proteins from different species. However, a fundamental biological fact needs to be acknowledged: not every protein in a species can, nor should, find homologous proteins in other species. In this paper, we propose a novel approach for multiple PPI network alignment that tries to align only those proteins with the highest similarity. To provide more comprehensive support in computing the similarity, we integrate structural features of the networks with biological characteristics during the alignment. For the structural features, we apply a representation learning method to the PPI networks, which creates a low-dimensional vector embedding from the surrounding topology of each protein in the network. This approach quantifies the structural features and provides a new way to determine the topological similarity of the networks by recasting it as a calculation of vector similarities. We also propose a new metric for topological evaluation that can better assess the topological quality of the alignment results across different networks. Both biological and topological evaluations demonstrate that our approach is promising and preferable to previous multiple alignment methods.