skip to main content


Title: Representation Learning in Heterogeneous Professional Social Networks with Ambiguous Social Connections.
Network representations have been shown to improve performance within a variety of tasks, including classification, clustering, and link prediction. However, most models either focus on moderate-sized, homogeneous networks or require a significant amount of auxiliary input to be provided by the user. Moreover, few works have studied network representations in real-world heterogeneous social networks with ambiguous social connections and are often incomplete. In the present work, we investigate the problem of learning low-dimensional node representations in heterogeneous professional social networks (HPSNs), which are incomplete and have ambiguous social connections. We present a general heterogeneous network representation learning model called Star2Vec that learns entity and person embeddings jointly using a social connection strength-aware biased random walk combined with a node-structure expansion function. Experiments on LinkedIn's Economic Graph and publicly available snapshots of Facebook's network show that Star2Vec outperforms existing methods on members' industry and social circle classification, skill and title clustering, and member-entity link predictions. We also conducted large-scale case studies to demonstrate practical applications of the Star2Vec embeddings trained on LinkedIn's Economic Graph such as next career move, alternative career suggestions, and general entity similarity searches.  more » « less
Award ID(s):
1652492
NSF-PAR ID:
10136391
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Conference on Big Data
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation Graph embedding learning that aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods are evaluated on social and information networks and are not comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding methods) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. Results We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug–drug interaction (DDI) prediction, protein–protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks. Availability and implementation As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  2. null (Ed.)
    In the past decade, the amount of attributed network data has skyrocketed, and the problem of identifying their underlying group structures has received significant attention. By leveraging both attribute and link information, recent state-of-the-art network clustering methods have achieved significant improvements on relatively clean datasets. However, the noisy nature of real-world attributed networks has long been overlooked, which leads to degraded performance facing missing or inaccurate attributes and links. In this work, we overcome such weaknesses by marrying the strengths of clustering and embedding on attributed networks. Specifically, we propose GRACE (GRAph Clustering with Embedding propagation), to simultaneously learn network representations and identify network clusters in an end-to-end manner. It employs deep denoise autoencoders to generate robust network embeddings from node attributes, propagates the embeddings in the network to capture node interactions, and detects clusters based on the stable state of embedding propagation. To provide more insight, we further analyze GRACE in a theoretical manner and find its underlying connections with two canonical approaches for network modeling. Extensive experiments on six real-world attributed networks demonstrate the superiority of GRACE over various baselines from the state-of-the-art. Remarkably, GRACE improves the averaged performance of the strongest baseline from 0.43 to 0.52, yielding a 21% relative improvement. Controlled experiments and case studies further verify our intuitions and demonstrate the ability of GRACE to handle noisy information in real-world attributed networks. 
    more » « less
  3. Attributed network embedding aims to learn lowdimensional vector representations for nodes in a network, where each node contains rich attributes/features describing node content. Because network topology structure and node attributes often exhibit high correlation, incorporating node attribute proximity into network embedding is beneficial for learning good vector representations. In reality, large-scale networks often have incomplete/missing node content or linkages, yet existing attributed network embedding algorithms all operate under the assumption that networks are complete. Thus, their performance is vulnerable to missing data and suffers from poor scalability. In this paper, we propose a Scalable Incomplete Network Embedding (SINE) algorithm for learning node representations from incomplete graphs. SINE formulates a probabilistic learning framework that separately models pairs of node-context and node-attribute relationships. Different from existing attributed network embedding algorithms, SINE provides greater flexibility to make the best of useful information and mitigate negative effects of missing information on representation learning. A stochastic gradient descent based online algorithm is derived to learn node representations, allowing SINE to scale up to large-scale networks with high learning efficiency. We evaluate the effectiveness and efficiency of SINE through extensive experiments on real-world networks. Experimental results confirm that SINE outperforms state-of-the-art baselines in various tasks, including node classification, node clustering, and link prediction, under settings with missing links and node attributes. SINE is also shown to be scalable and efficient on large-scale networks with millions of nodes/edges and high-dimensional node features. 
    more » « less
  4. Demeniconi, C. ; Davidson, I (Ed.)
    Many irregular domains such as social networks, financial transactions, neuron connections, and natural language constructs are represented using graph structures. In recent years, a variety of graph neural networks (GNNs) have been successfully applied for representation learning and prediction on such graphs. In many of the real-world applications, the underlying graph changes over time, however, most of the existing GNNs are inadequate for handling such dynamic graphs. In this paper we propose a novel technique for learning embeddings of dynamic graphs using a tensor algebra framework. Our method extends the popular graph convolutional network (GCN) for learning representations of dynamic graphs using the recently proposed tensor M-product technique. Theoretical results presented establish a connection between the proposed tensor approach and spectral convolution of tensors. The proposed method TM-GCN is consistent with the Message Passing Neural Network (MPNN) framework, accounting for both spatial and temporal message passing. Numerical experiments on real-world datasets demonstrate the performance of the proposed method for edge classification and link prediction tasks on dynamic graphs. We also consider an application related to the COVID-19 pandemic, and show how our method can be used for early detection of infected individuals from contact tracing data. 
    more » « less
  5. Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (Ed.)
    Representation learning on networks aims to derive a meaningful vector representation for each node, thereby facilitating downstream tasks such as link prediction, node classification, and node clustering. In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: Some nodes are associated with rich textual information, while others are not; (2) diversity of types: Nodes and edges of multiple types form a heterogeneous network structure. As pretrained language models (PLMs) have demonstrated their effectiveness in obtaining widely generalizable text representations, a substantial amount of effort has been made to incorporate PLMs into representation learning on text-rich networks. However, few of them can jointly consider heterogeneous structure (network) information as well as rich textual semantic information of each node effectively. In this paper, we propose Heterformer, a Heterogeneous Network-Empowered Transformer that performs contextualized text encoding and heterogeneous structure encoding in a unified model. Specifically, we inject heterogeneous structure information into each Transformer layer when encoding node texts. Meanwhile, Heterformer is capable of characterizing node/edge type heterogeneity and encoding nodes with or without texts. We conduct comprehensive experiments on three tasks (i.e., link prediction, node classification, and node clustering) on three large-scale datasets from different domains, where Heterformer outperforms competitive baselines significantly and consistently. 
    more » « less