skip to main content


Title: HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning
In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a set of relationships specified in forms of meta-paths in an HIN, HIN2Vec carries out multiple prediction training tasks jointly based on a target set of relationships to learn latent vectors of nodes and meta-paths in the HIN. In addition to model design, several issues unique to HIN2Vec, including regularization of meta-path vectors, node type selection in negative sampling, and cycles in random walks, are examined. To validate our ideas, we learn latent vectors of nodes using four large-scale real HIN datasets, including Blogcatalog, Yelp, DBLP and U.S. Patents, and use them as features for multi-label node classification and link prediction applications on those networks. Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% ofmicro-f1 in multi-label node classification and 5% to 70.8% of MAP in link prediction.  more » « less
Award ID(s):
1717084
NSF-PAR ID:
10065397
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Page Range / eLocation ID:
1797 to 1806
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Network embedding has been an effective tool to analyze heterogeneous networks (HNs) by representing nodes in a low-dimensional space. Although many recent methods have been proposed for representation learning of HNs, there is still much room for improvement. Random walks based methods are currently popular methods to learn network embedding; however, they are random and limited by the length of sampled walks, and have difculty capturing network structural information. Some recent researches proposed using meta paths to express the sample relationship in HNs. Another popular graph learning model, the graph convolutional network (GCN) is known to be capable of better exploitation of network topology, but the current design of GCN is intended for homogenous networks. This paper proposes a novel combination of meta-graph and graph convolution, the meta-graph based graph convolutional networks (MGCN). To fully capture the complex long semantic information, MGCN utilizes different meta-graphs in HNs. As different meta-graphs express different semantic relationships, MGCN learns the weights of different meta-graphs to make up for the loss of semantics when applying GCN. In addition, we improve the current convolution design by adding node self-signicance. To validate our model in learning feature representation, we present comprehensive experiments on four real-world datasets and two representation tasks: classication and link prediction. WMGCN's representations can improve accuracy scores by up to around 10% in comparison to other popular representation learning models. What's more, WMGCN'feature learning outperforms other popular baselines. The experimental results clearly show our model is superior over other state-of-the-art representation learning algorithms. 
    more » « less
  2. In 2011, we proposed PathSim to systematically define and compute similarity between nodes in a heterogeneous information network (HIN), where nodes and links are from different types. In the PathSim paper, we for the first time introduced HIN with general network schema and proposed the concept of meta-paths to systematically define new relation types between nodes. In this paper, we summarize the impact of PathSim paper in both academia and industry. We start from the algorithms that are based on meta-path-based feature engineering, then move on to the recent development in heterogeneous network representation learning, including both shallow network embedding and heterogeneous graph neural networks. In the end, we make the connection between knowledge graphs and HINs and discuss the implication of meta-paths in the symbolic reasoning scenario. Finally, we point out several future directions. 
    more » « less
  3. Attributed network embedding aims to learn lowdimensional vector representations for nodes in a network, where each node contains rich attributes/features describing node content. Because network topology structure and node attributes often exhibit high correlation, incorporating node attribute proximity into network embedding is beneficial for learning good vector representations. In reality, large-scale networks often have incomplete/missing node content or linkages, yet existing attributed network embedding algorithms all operate under the assumption that networks are complete. Thus, their performance is vulnerable to missing data and suffers from poor scalability. In this paper, we propose a Scalable Incomplete Network Embedding (SINE) algorithm for learning node representations from incomplete graphs. SINE formulates a probabilistic learning framework that separately models pairs of node-context and node-attribute relationships. Different from existing attributed network embedding algorithms, SINE provides greater flexibility to make the best of useful information and mitigate negative effects of missing information on representation learning. A stochastic gradient descent based online algorithm is derived to learn node representations, allowing SINE to scale up to large-scale networks with high learning efficiency. We evaluate the effectiveness and efficiency of SINE through extensive experiments on real-world networks. Experimental results confirm that SINE outperforms state-of-the-art baselines in various tasks, including node classification, node clustering, and link prediction, under settings with missing links and node attributes. SINE is also shown to be scalable and efficient on large-scale networks with millions of nodes/edges and high-dimensional node features. 
    more » « less
  4. Heterogeneous Information Network (HIN), where nodes and their attributes denote real-world entities and links encode relationships between entities, are ubiquitous in many applications. The presence of multiple types of nodes and links pose significant challenges to the state-of-the-art methods for learning node embeddings from heterogeneous graphs. To address these challenges, we consider three variants of graph variational autoencoder models for heterogeneous networks that avoid the computationally expensive sampling of meta-paths. The proposed methods also maintain uncertainty estimates of node embeddings that help improve generalization performance. We report the results of experiments on link prediction using three different real-world heterogeneous network benchmark data sets that show that the proposed methods significantly outperform state-of-the-art baselines. 
    more » « less
  5. Network embedding has gained more attentions in recent years. It has been shown that the learned low-dimensional node vector representations could advance a myriad of graph mining tasks such as node classification, community detection, and link prediction. A vast majority of the existing efforts are overwhelmingly devoted to single-layered networks or homogeneous networks with a single type of nodes and node interactions. However, in many real-world applications, a variety of networks could be abstracted and presented in a multilayered fashion. Typical multi-layered networks include critical infrastructure systems, collaboration platforms, social recommender systems, to name a few. Despite the widespread use of multi-layered networks, it remains a daunting task to learn vector representations of different types of nodes due to the bewildering combination of both within-layer connections and cross-layer network dependencies. In this paper, we study a novel problem of multi-layered network embedding. In particular, we propose a principled framework – MANE to model both within-layer connections and cross-layer network dependencies simultaneously in a unified optimization framework for embedding representation learning. Experiments on real-world multi-layered networks corroborate the effectiveness of the proposed framework. 
    more » « less