

Title: Network Schema Preserving Heterogeneous Information Network Embedding

As heterogeneous networks have become increasingly ubiquitous, Heterogeneous Information Network (HIN) embedding, which aims to project nodes into a low-dimensional space while preserving the heterogeneous structure, has drawn increasing attention in recent years. Many existing HIN embedding methods adopt meta-path guided random walks to retain both the semantics and the structural correlations between different types of nodes. However, the selection of meta-paths is still an open problem: it either depends on domain knowledge or is learned from label information. As a uniform blueprint of an HIN, the network schema comprehensively embraces the high-order structure and contains rich semantics. In this paper, we make the first attempt to study network schema preserving HIN embedding, and propose a novel model named NSHE. In NSHE, a network schema sampling method is first proposed to generate sub-graphs (i.e., schema instances), and then a multi-task learning task is built to preserve the heterogeneous structure of each schema instance. Besides preserving pairwise structure information, NSHE is able to retain high-order structure (i.e., the network schema). Extensive experiments on three real-world datasets demonstrate that our proposed model NSHE significantly outperforms the state-of-the-art methods.
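
For readers unfamiliar with the meta-path guided random walk that the abstract contrasts NSHE with, the following is a minimal sketch, not code from the paper. The toy graph, the node names, and the function `metapath_walk` are all hypothetical and chosen only for illustration; the walk assumes a symmetric meta-path such as author-paper-author (A-P-A).

```python
import random
from collections import defaultdict

# Hypothetical toy HIN: authors (A), papers (P), and one conference (C).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "c1": "C"}
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("p1", "c1"), ("p2", "c1")]

# Undirected adjacency lists built from the edge list.
ADJ = defaultdict(list)
for u, v in EDGES:
    ADJ[u].append(v)
    ADJ[v].append(u)


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only steps to neighbors of the type the meta-path
    prescribes next. The meta-path must start and end with the same type
    so that it can be repeated cyclically."""
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must be symmetric at its endpoints")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")

    period = len(metapath) - 1  # e.g. A-P-A repeats every 2 steps
    walk = [start]
    while len(walk) < walk_length:
        next_type = metapath[len(walk) % period]
        candidates = [v for v in adj[walk[-1]] if node_type[v] == next_type]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


if __name__ == "__main__":
    print(metapath_walk(ADJ, NODE_TYPE, "a1", ["A", "P", "A"], 5, random.Random(0)))
```

The sketch makes the limitation raised in the abstract visible: the walk never visits the conference node because the chosen meta-path excludes type C, so which structure is captured depends entirely on the meta-path supplied.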

 
Award ID(s):
1940859 2027127 2040144 2034470
NSF-PAR ID:
10215834
Journal Name:
International Joint Conference on Artificial Intelligence (IJCAI)
Page Range / eLocation ID:
1366 to 1372
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In 2011, we proposed PathSim to systematically define and compute similarity between nodes in a heterogeneous information network (HIN), where nodes and links are of different types. In the PathSim paper, we introduced, for the first time, HINs with a general network schema and proposed the concept of meta-paths to systematically define new relation types between nodes. In this paper, we summarize the impact of the PathSim paper in both academia and industry. We start from the algorithms that are based on meta-path-based feature engineering, then move on to recent developments in heterogeneous network representation learning, including both shallow network embedding and heterogeneous graph neural networks. We then make the connection between knowledge graphs and HINs and discuss the implications of meta-paths in the symbolic reasoning scenario. Finally, we point out several future directions.
  2. Scientific literature, as one of the major knowledge resources, provides abundant textual evidence that has great potential to support high-quality scientific hypothesis validation. In this paper, we study the problem of textual evidence mining in scientific literature: given a scientific hypothesis as a query triplet, find the textual evidence sentences in scientific literature that support the input query. A critical challenge for textual evidence mining in scientific literature is to retrieve high-quality textual evidence without human supervision, because it is non-trivial to obtain a large set of human-annotated articles containing evidence sentences. To tackle this challenge, we propose EVIDENCEMINER, a high-quality textual evidence retrieval method for scientific literature that requires no human-annotated training examples. To achieve high-quality textual evidence retrieval, we leverage heterogeneous information from both existing knowledge bases and massive unstructured text. We propose to construct a large heterogeneous information network (HIN) to build connections between the user-input queries and the candidate evidence sentences. Based on the constructed HIN, we propose a novel HIN embedding method that directly embeds the nodes onto a spherical space to improve the retrieval performance. Quantitative experiments on a huge biomedical literature corpus (over 4 million sentences) demonstrate that EVIDENCEMINER significantly outperforms baseline methods for unsupervised textual evidence retrieval. Case studies also demonstrate that our HIN construction and embedding greatly benefit many downstream applications such as textual evidence interpretation and synonym meta-pattern discovery.
  3. Heterogeneous information networks (HINs) are ubiquitous in real-world applications. In the meantime, network embedding has emerged as a convenient tool to mine and learn from networked data. As a result, it is of interest to develop HIN embedding methods. However, the heterogeneity in HINs introduces not only rich information but also potentially incompatible semantics, which poses special challenges to embedding learning in HINs. With the intention to preserve the rich yet potentially incompatible information in HIN embedding, we propose to study the problem of comprehensive transcription of heterogeneous information networks. The comprehensive transcription of HINs also provides an easy-to-use approach to unleash the power of HINs, since it requires no additional supervision, expertise, or feature engineering. To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics. To corroborate the efficacy of HEER, we conducted experiments on two large-scale real-world datasets with an edge reconstruction task and multiple case studies. Experimental results demonstrate the effectiveness of the proposed HEER model and the utility of edge representations and heterogeneous metrics. The code and data are available at https://github.com/GentleZhu/HEER.
  4. Systematic reviews (SRs) are a crucial component of evidence-based clinical practice. Unfortunately, SRs are labor-intensive and do not scale with the exponential growth in literature. Automating evidence synthesis using machine learning models has been proposed, but existing approaches focus solely on the text and ignore additional features such as citation information. Recent work demonstrated that citation embeddings can outperform the text itself, suggesting that better network representation may expedite SRs. Yet, how to utilize the rich information in heterogeneous information networks (HINs) for network embeddings is understudied. Existing HIN models fail to produce a high-quality embedding compared to simply running state-of-the-art homogeneous network models. To address existing HIN model limitations, we propose SR-CoMbEr, a community-based multi-view graph convolutional network for learning better embeddings for evidence synthesis. Our model automatically discovers article communities to learn robust embeddings that simultaneously encapsulate the rich semantics in HINs. We demonstrate the effectiveness of our model in automating 15 SRs.
  5. In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a set of relationships specified in the form of meta-paths in an HIN, HIN2Vec carries out multiple prediction training tasks jointly based on a target set of relationships to learn latent vectors of nodes and meta-paths in the HIN. In addition to model design, several issues unique to HIN2Vec, including regularization of meta-path vectors, node type selection in negative sampling, and cycles in random walks, are examined. To validate our ideas, we learn latent vectors of nodes using four large-scale real HIN datasets, including Blogcatalog, Yelp, DBLP and U.S. Patents, and use them as features for multi-label node classification and link prediction applications on those networks. Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% of micro-F1 in multi-label node classification and 5% to 70.8% of MAP in link prediction.
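
As a concrete illustration of the meta-path-based similarity summarized in the first related record above, PathSim is commonly stated for a symmetric meta-path as twice the number of path instances between x and y, divided by the number of path instances from x to itself plus the number from y to itself. The sketch below assumes that definition; the author names, venue counts, and the function `pathsim` are hypothetical and not taken from any of the listed papers.

```python
def pathsim(half_counts, x, y):
    """PathSim under a symmetric meta-path made of a half path and its reverse.

    half_counts[u][m] is the number of half-path instances from node u to
    middle node m (here: how many papers author u published at venue m).
    """
    def path_instances(u, v):
        # Full path instances u -> m -> v, summed over shared middle nodes m.
        return sum(c * half_counts[v].get(m, 0) for m, c in half_counts[u].items())

    denom = path_instances(x, x) + path_instances(y, y)
    return 2 * path_instances(x, y) / denom if denom else 0.0


# Hypothetical author -> venue paper counts (meta-path: author-venue-author).
COUNTS = {
    "ann": {"KDD": 2, "VLDB": 1},
    "bob": {"KDD": 4, "VLDB": 2},
    "cat": {"ICML": 3},
}

if __name__ == "__main__":
    print(pathsim(COUNTS, "ann", "bob"))  # 2 * 10 / (5 + 20) = 0.8
    print(pathsim(COUNTS, "ann", "cat"))  # no shared venue, so 0.0
```

Normalizing by the self-path counts is what lets two authors with proportional venue profiles score highly even when one is far more prolific than the other.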