Title: Scholar2vec: Vector Representation of Scholars for Lifetime Collaborator Prediction
While scientific collaboration is critical for a scholar, some collaborators can be more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators are more influential on a scholar’s academic performance. However, little research has investigated how to predict such special relationships in academic networks. To this end, we propose Scholar2vec, a novel neural network embedding for representing scholar profiles. First, our approach creates scholars’ research interest vectors from textual information, such as demographics, research, and influence. After bridging research interests with a collaboration network, vector representations of scholars are obtained through graph learning. Meanwhile, since scholars are characterized by various attributes, we propose to incorporate four types of scholar attributes for learning scholar vectors. Finally, the early-stage similarity sequence based on Scholar2vec is used to predict lifetime collaborators with machine learning methods. Extensive experiments on two real-world datasets show that Scholar2vec outperforms state-of-the-art methods in lifetime collaborator prediction. Our work presents a new way to measure the similarity between two scholars by vector representation, which addresses the gap between network embedding and academic relationship mining.
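As an illustration of the "early-stage similarity sequence" idea, the following minimal sketch computes one similarity value per early-career time step for a pair of scholars. It rests on assumptions not stated in the abstract: that each scholar has one embedding vector per time step, and that cosine similarity is the measure. The names `cosine_similarity`, `similarity_sequence`, `alice`, and `bob` are hypothetical, and this is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_sequence(vectors_a, vectors_b):
    """One similarity value per time step (assumed alignment: same index = same step)."""
    return [cosine_similarity(u, v) for u, v in zip(vectors_a, vectors_b)]


# Toy, made-up embeddings for two scholars over three early-career time steps.
alice = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
bob = [[0.0, 1.0], [1.0, 1.0], [0.0, 1.0]]

sequence = similarity_sequence(alice, bob)
print(sequence)
```

A sequence like this could then be passed as a feature vector to any off-the-shelf classifier that labels a pair as lifetime collaborators or not; which classifier the paper uses is not specified in the abstract.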
Award ID(s):
1651203 1947135
PAR ID:
10232374
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Knowledge Discovery from Data
Volume:
15
Issue:
3
ISSN:
1556-4681
Page Range / eLocation ID:
1 to 19
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Networks have been widely used to represent the relations between objects, such as academic networks and social networks, and learning embeddings for networks has thus garnered considerable research attention. Self-supervised network representation learning aims at extracting node embeddings without external supervision. Recently, maximizing the mutual information between the local node embedding and the global summary (e.g., Deep Graph Infomax, or DGI for short) has shown promising results on many downstream tasks such as node classification. However, DGI has two major limitations. First, DGI considers only the extrinsic supervision signal (i.e., the mutual information between node embedding and global summary) while ignoring the intrinsic signal (i.e., the mutual dependence between node embedding and node attributes). Second, nodes in a real-world network are usually connected by multiple edges with different relations, while DGI does not fully explore the various relations among nodes. To address these problems, we propose a novel framework, called High-order Deep Multiplex Infomax (HDMI), for learning node embeddings on multiplex networks in a self-supervised way. More specifically, we first design a joint supervision signal containing both extrinsic and intrinsic mutual information by high-order mutual information, and we propose a High-order Deep Infomax (HDI) to optimize the proposed supervision signal. Then we propose an attention-based fusion module to combine node embeddings from different layers of the multiplex network. Finally, we evaluate the proposed HDMI on various downstream tasks such as unsupervised clustering and supervised classification. The experimental results show that HDMI achieves state-of-the-art performance on these tasks.
  2. Collaboration is a key driver of science and innovation. Mainly motivated by the need to leverage different capacities and expertise to solve a scientific problem, collaboration is also an excellent source of information about the future behavior of scholars. In particular, it allows us to infer the likelihood that scientists choose future research directions via the intertwined mechanisms of selection and social influence. Here we thoroughly investigate the interplay between collaboration and topic switches. We find that the probability for a scholar to start working on a new topic increases with the number of previous collaborators, with a pattern showing that the effects of individual collaborators are not independent. The higher the productivity and the impact of authors, the more likely their coworkers will start working on new topics. The average number of coauthors per paper is also inversely related to the topic switch probability, suggesting a dilution of this effect as the number of collaborators increases.
  3. Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search-efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations using a stochastic gradient descent-based online learning algorithm. The learned binary encoding not only reduces the memory needed to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous-vector-based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE.
  4. Protein-protein interaction (PPI) network alignment has motivated research aimed at understanding crucial underlying biological knowledge, such as conserved evolutionary pathways and functionally conserved proteins across different species. Existing PPI network alignment methods have tried to improve the coverage ratio by aligning all proteins from different species. However, there is a fundamental biological fact that needs to be acknowledged: not every protein in a species can, nor should, find homologous proteins in other species. In this paper, we propose a novel approach for multiple PPI network alignment that tries to align only those proteins with the most similarities. To provide more comprehensive support in computing the similarity, we integrate structural features of the networks together with biological characteristics during the alignment. For the structural features, we apply a representation learning method to PPI networks, which creates a low-dimensional vector embedding from the surrounding topology of each protein in the network. This approach quantifies the structural features and provides a new way to determine the topological similarity of the networks by converting it into vector similarity calculations. We also propose a new metric for topological evaluation which can better assess the topological quality of the alignment results across different networks. Both biological and topological evaluations demonstrate that our approach is promising and preferable to previous multiple alignment methods.
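The bit-wise comparison mentioned in the binary network embedding entry (item 3 above) can be sketched as follows. This is a minimal illustration of Hamming-distance search over binary codes, not the BinaryNE implementation: the 8-bit codes are made up, and `hamming_distance` and `nearest_neighbors` are hypothetical helper names.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits, computed with a single XOR."""
    return bin(code_a ^ code_b).count("1")


def nearest_neighbors(query: int, codes: dict, k: int) -> list:
    """Return the k node names whose codes are closest to the query.

    Ties in distance are broken alphabetically by node name.
    """
    ranked = sorted(
        codes.items(),
        key=lambda item: (hamming_distance(query, item[1]), item[0]),
    )
    return [node for node, _ in ranked[:k]]


# Made-up 8-bit binary codes for four nodes.
codes = {
    "a": 0b10110010,
    "b": 0b10110011,
    "c": 0b01001101,
    "d": 0b10100010,
}

query = 0b10110010
print(nearest_neighbors(query, codes, 3))  # ['a', 'b', 'd']
```

Because each comparison is an XOR followed by a bit count, it avoids the floating-point arithmetic of Euclidean or cosine distance, which is the kind of saving the entry attributes to binary codes.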