Revisiting Citation Prediction with Cluster-Aware Text-Enhanced Heterogeneous Graph Neural Networks

Yang, Carl; Han, Jiawei

doi:10.1109/ICDE55515.2023.00058

Numerous papers get published all the time. However, some papers are born to be well-cited while others are not. In this work, we revisit the important problem of citation prediction, focusing on the important yet realistic task of predicting the average number of citations a paper will attract per year. The task is nonetheless challenging because many correlated factors underlie the potential impact of a paper, such as the prestige of its authors, the authority of its publishing venue, and the significance of the problems/techniques/applications it studies. To jointly model these factors, we propose to construct a heterogeneous publication network of nodes including papers, authors, venues, and terms. Moreover, we devise a novel heterogeneous graph neural network (HGN) to jointly embed all types of nodes and links, towards the modeling of research impact and its propagation. Beyond graph heterogeneity, we find it also important to consider the latent research domains, because the same nodes can have different impacts within different communities. Therefore, we further devise a novel cluster-aware (CA) module, which models all nodes and their interactions under the proper contexts of research domains. Finally, to exploit the information-rich texts associated with papers, we devise a novel text-enhancing (TE) module for automatic quality term mining. With the real-world publication data of DBLP, we construct three different networks and conduct comprehensive experiments to evaluate our proposed CATE-HGN framework against various state-of-the-art models. Rich quantitative results and qualitative case studies demonstrate the superiority of CATE-HGN in citation prediction on publication networks, and indicate its general advantages in various relevant downstream tasks on text-rich heterogeneous networks.
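As a rough illustration of the setup the abstract describes, the sketch below builds a tiny heterogeneous publication network (papers linked to authors, venues, and terms) and computes the prediction target, the average number of citations per year. It is not the CATE-HGN model; the records, field names, and the helpers `build_network` and `avg_citations_per_year` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Toy records; every field name and value here is hypothetical.
PAPERS = [
    {"id": "p1", "year": 2018, "authors": ["a1", "a2"], "venue": "v1",
     "terms": ["graph", "embedding"], "citations": 50},
    {"id": "p2", "year": 2020, "authors": ["a2"], "venue": "v1",
     "terms": ["graph", "clustering"], "citations": 9},
]


def build_network(papers):
    """Return typed edges as {(src_type, dst_type): {src_node: set(dst_nodes)}}.

    Each paper is linked to its authors, venue, and terms, and every link
    is stored in both directions so any node type can reach its papers.
    """
    edges = defaultdict(lambda: defaultdict(set))
    for p in papers:
        links = ([("author", a) for a in p["authors"]]
                 + [("venue", p["venue"])]
                 + [("term", t) for t in p["terms"]])
        for node_type, node in links:
            edges[("paper", node_type)][p["id"]].add(node)
            edges[(node_type, "paper")][node].add(p["id"])
    return edges


def avg_citations_per_year(paper, current_year):
    """Regression target: total citations divided by years since publication.

    Assumption: papers published in the current year count as one year old,
    which avoids division by zero.
    """
    years = max(current_year - paper["year"], 1)
    return paper["citations"] / years


network = build_network(PAPERS)
labels = {p["id"]: avg_citations_per_year(p, 2023) for p in PAPERS}
```

Here `network[("author", "paper")]["a2"]` contains both toy papers, which is the kind of shared-neighbor structure a graph neural network could propagate information along.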

Award ID(s): 1956151; 1741317; 1704532

PAR ID: 10467076

Author(s) / Creator(s): Yang, Carl; Han, Jiawei

Editor(s): Proc. of 2023 IEEE 39th International Conference on Data Engineering

Publisher / Repository: IEEE

Date Published: 2023-04-01

Edition / Version: 1

ISBN: 979-8-3503-2227-9

Page Range / eLocation ID: 682 to 695

Subject(s) / Keyword(s): Citation Prediction, Cluster-Aware Text-Enhanced Heterogeneous Graph Neural Networks, Graph Neural Networks, Text mining

Format(s): Medium: X

Location: Anaheim, CA, USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/ICDE55515.2023.00058
