skip to main content


Title: SR-CoMbEr: Heterogeneous Network Embedding Using Community Multi-view Enhanced Graph Convolutional Network for Automating Systematic Reviews
Systematic reviews (SRs) are a crucial component of evidence-based clinical practice. Unfortunately, SRs are labor-intensive and unscalable with the exponential growth in literature. Automating evidence synthesis using machine learning models has been proposed but solely focuses on the text and ignores additional features like citation information. Recent work demonstrated that citation embeddings can outperform the text itself, suggesting that better network representation may expedite SRs. Yet, how to utilize the rich information in heterogeneous information networks (HIN) for network embeddings is understudied. Existing HIN models fail to produce a high-quality embedding compared to simply running state-of-the-art homogeneous network models. To address existing HIN model limitations, we propose SR-CoMbEr, a community-based multi-view graph convolutional network for learning better embeddings for evidence synthesis. Our model automatically discovers article communities to learn robust embeddings that simultaneously encapsulate the rich semantics in HINs. We demonstrate the effectiveness of our model to automate 15 SRs.  more » « less
Award ID(s):
2145411 1838200
NSF-PAR ID:
10419378
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Advances in Information Retrieval: 45th European Conference on Information Retrieval
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Scientific literature, as one of the major knowledge resources, provides abundant textual evidence that has great potential to support high-quality scientific hypothesis validation. In this paper, we study the problem of textual evidence mining in scientific literature: given a scientific hypothesis as a query triplet, find the textual evidence sentences in scientific literature that support the input query. A critical challenge for textual evidence mining in scientific literature is to retrieve high-quality textual evidence without human supervision. Because it is non-trivial to obtain a large set of human-annotated articles con-taining evidence sentences in scientific literature. To tackle this challenge, we propose EVIDENCEMINER, a high-quality textual evidence retrieval method for scientific literature without human-annotated training examples. To achieve high-quality textual evidence retrieval, we leverage heterogeneous information from both existing knowledge bases and massive unstructured text. We propose to construct a large heterogeneous information network (HIN) to build connections between the user-input queries and the candidate evidence sentences. Based on the constructed HIN, we propose a novel HIN embedding method that directly embeds the nodes onto a spherical space to improve the retrieval performance. Quantitative experiments on a huge biomedical literature corpus (over 4 million sentences) demonstrate that EVIDENCEMINER significantly outperforms baseline methods for unsupervised textual evidence retrieval. Case studies also demonstrate that our HIN construction and embedding greatly benefit many downstream applications such as textual evidence interpretation and synonym meta-pattern discovery. 
    more » « less
  2. null (Ed.)

    As heterogeneous networks have become increasingly ubiquitous, Heterogeneous Information Network (HIN) embedding, aiming to project nodes into a low-dimensional space while preserving the heterogeneous structure, has drawn increasing attention in recent years. Many of the existing HIN embedding methods adopt meta-path guided random walk to retain both the semantics and structural correlations between different types of nodes. However, the selection of meta-paths is still an open problem, which either depends on domain knowledge or is learned from label information. As a uniform blueprint of HIN, the network schema comprehensively embraces the high-order structure and contains rich semantics. In this paper, we make the first attempt to study network schema preserving HIN embedding, and propose a novel model named NSHE. In NSHE, a network schema sampling method is first proposed to generate sub-graphs (i.e., schema instances), and then multi-task learning task is built to preserve the heterogeneous structure of each schema instance. Besides preserving pairwise structure information, NSHE is able to retain high-order structure (i.e., network schema). Extensive experiments on three real-world datasets demonstrate that our proposed model NSHE significantly outperforms the state-of-the-art methods.

     
    more » « less
  3. Citations of scientific papers and patents reveal the knowledge flow and usually serve as the metric for evaluating their novelty and impacts in the field. Citation Forecasting thus has various applications in the real world. Existing works on citation forecasting typically exploit the sequential properties of citation events, without exploring the citation network. In this paper, we propose to explore both the citation network and the related citation event sequences which provide valuable information for future citation forecasting. We propose a novel Citation Network and Event Sequence (CINES) Model to encode signals in the citation network and related citation event sequences into various types of embeddings for decoding to the arrivals of future citations. Moreover, we propose a temporal network attention and three alternative designs of bidirectional feature propagation to aggregate the retrospective and prospective aspects of publications in the citation network, coupled with the citation event sequence embeddings learned by a two-level attention mechanism for the citation forecasting. We evaluate our models and baselines on both a U.S. patent dataset and a DBLP dataset. Experimental results show that our models outperform the state-of-the-art methods, i.e., RMTPP, CYAN-RNN, Intensity-RNN, and PC-RNN, reducing the forecasting error by 37.76% - 75.32%. 
    more » « less
  4. null (Ed.)
    Accurate prediction of scientific impact is important for scientists, academic recommender systems, and granting organizations alike. Existing approaches rely on many years of leading citation values to predict a scientific paper’s citations (a proxy for impact), even though most papers make their largest contributions in the first few years after they are published. In this paper, we tackle a new problem: predicting a new paper’s citation time series from the date of publication (i.e., without leading values). We propose HINTS, a novel end-to-end deep learning framework that converts citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a formal model that can predict citation counts immediately after publication. Empirical analysis on two real-world datasets from Computer Science and Physics show that HINTS is competitive with baseline citation prediction models. While we focus on citations, our approach generalizes to other “cold start” time series prediction tasks where relational data is available and accurate prediction in early timestamps is crucial. 
    more » « less
  5. The automated construction of topic taxonomies can benefit numerous applications, including web search, recommendation, and knowledge discovery. One of the major advantages of automatic taxonomy construction is the ability to capture corpus-specific information and adapt to different scenarios. To better reflect the characteristics of a corpus, we take the meta-data of documents into consideration and view the corpus as a text-rich network. In this paper, we propose NetTaxo, a novel automatic topic taxonomy construction framework, which goes beyond the existing paradigm and allows text data to collaborate with network structure. Specifically, we learn term embeddings from both text and network as contexts. Network motifs are adopted to capture appropriate network contexts. We conduct an instance-level selection for motifs, which further refines term embedding according to the granularity and semantics of each taxonomy node. Clustering is then applied to obtain sub-topics under a taxonomy node. Extensive experiments on two real-world datasets demonstrate the superiority of our method over the state-of-the-art, and further verify the effectiveness and importance of instance-level motif selection. 
    more » « less