Title: GAGE: Geometry Preserving Attributed Graph Embeddings
Node embedding is the task of extracting concise and informative representations of entities that are connected in a network. Many real-world networks include information about both node connectivity and node attributes, in the form of features or time-series data. Modern representation learning techniques employ both the connectivity and attribute information of the nodes to produce embeddings in an unsupervised manner. In this context, deriving embeddings that preserve the geometry of the network and the attribute vectors would be highly desirable, as they would reflect both the topological neighborhood structure and proximity in feature space. While this is fairly straightforward to maintain when only observing the connectivity or attribute information of the network, preserving the geometry of both types of information is challenging. This paper proposes a novel tensor factorization approach for node embedding in attributed networks that preserves the distances of both the connections and the attributes. Furthermore, an effective and lightweight algorithm is developed to tackle the learning task, and judicious experiments with multiple state-of-the-art baselines suggest that the proposed algorithm offers significant performance improvements in downstream tasks.
Award ID(s):
1908070
NSF-PAR ID:
10357458
Journal Name:
15th ACM Conference on Web Search and Data Mining (WSDM)
Page Range / eLocation ID:
439 to 448
Sponsoring Org:
National Science Foundation
More Like this
  1. Attributed network embedding aims to learn low-dimensional vector representations for nodes in a network, where each node contains rich attributes/features describing node content. Because network topology structure and node attributes often exhibit high correlation, incorporating node attribute proximity into network embedding is beneficial for learning good vector representations. In reality, large-scale networks often have incomplete/missing node content or linkages, yet existing attributed network embedding algorithms all operate under the assumption that networks are complete. Thus, their performance is vulnerable to missing data and suffers from poor scalability. In this paper, we propose a Scalable Incomplete Network Embedding (SINE) algorithm for learning node representations from incomplete graphs. SINE formulates a probabilistic learning framework that separately models pairs of node-context and node-attribute relationships. Unlike existing attributed network embedding algorithms, SINE provides greater flexibility to make the best of useful information and mitigate the negative effects of missing information on representation learning. A stochastic gradient descent-based online algorithm is derived to learn node representations, allowing SINE to scale up to large-scale networks with high learning efficiency. We evaluate the effectiveness and efficiency of SINE through extensive experiments on real-world networks. Experimental results confirm that SINE outperforms state-of-the-art baselines in various tasks, including node classification, node clustering, and link prediction, under settings with missing links and node attributes. SINE is also shown to be scalable and efficient on large-scale networks with millions of nodes/edges and high-dimensional node features.
  2. In the past decade, the amount of attributed network data has skyrocketed, and the problem of identifying their underlying group structures has received significant attention. By leveraging both attribute and link information, recent state-of-the-art network clustering methods have achieved significant improvements on relatively clean datasets. However, the noisy nature of real-world attributed networks has long been overlooked, which leads to degraded performance when facing missing or inaccurate attributes and links. In this work, we overcome such weaknesses by marrying the strengths of clustering and embedding on attributed networks. Specifically, we propose GRACE (GRAph Clustering with Embedding propagation) to simultaneously learn network representations and identify network clusters in an end-to-end manner. It employs deep denoising autoencoders to generate robust network embeddings from node attributes, propagates the embeddings in the network to capture node interactions, and detects clusters based on the stable state of embedding propagation. To provide more insight, we further analyze GRACE theoretically and find its underlying connections with two canonical approaches for network modeling. Extensive experiments on six real-world attributed networks demonstrate the superiority of GRACE over various state-of-the-art baselines. Remarkably, GRACE improves the averaged performance of the strongest baseline from 0.43 to 0.52, yielding a 21% relative improvement. Controlled experiments and case studies further verify our intuitions and demonstrate the ability of GRACE to handle noisy information in real-world attributed networks.
  3. Graph embedding techniques are pivotal in real-world machine learning tasks that operate on graph-structured data, such as social recommendation and protein structure modeling. Embeddings are mostly performed on the node level for learning representations of each node. Since the formation of a graph is inevitably affected by certain sensitive node attributes, the node embeddings can inherit such sensitive information and introduce undesirable biases in downstream tasks. Most existing works impose ad-hoc constraints on the node embeddings to restrict their distributions for unbiasedness/fairness, which however compromise the utility of the resulting embeddings. In this paper, we propose a principled new way for unbiased graph embedding by learning node embeddings from an underlying bias-free graph, which is not influenced by sensitive node attributes. Motivated by this new perspective, we propose two complementary methods for uncovering such an underlying graph, with the goal of introducing minimum impact on the utility of the embeddings. Both our theoretical justification and extensive experimental comparisons against state-of-the-art solutions demonstrate the effectiveness of our proposed methods. 
  4. In this paper, we propose a supervised graph representation learning method to model the relationship between brain functional connectivity (FC) and structural connectivity (SC) through a graph encoder-decoder system. The graph convolutional network (GCN) model is leveraged in the encoder to learn lower-dimensional node representations (i.e., node embeddings) integrating information from both node attributes and network topology. In doing so, the encoder manages to capture both direct and indirect interactions between brain regions in the node embeddings, which later help reconstruct empirical FC networks. From node embeddings, graph representations are learnt to embed the entire graphs into a vector space. Our end-to-end model utilizes a multi-objective loss function to simultaneously learn node representations for FC network reconstruction and graph representations for subject classification. The experiment on a large population of non-drinkers and heavy drinkers shows that our model can provide a characterization of the population pattern in the SC-FC relationship, while also learning features that capture individual uniqueness for subject classification. The identified key brain subnetworks show significant between-group differences and support the promising prospect of GCN-based graph representation learning on brain networks to model human brain activity and function.
  5. Learning the low-dimensional representations of graphs (i.e., network embedding) plays a critical role in network analysis and facilitates many downstream tasks. Recently, graph convolutional networks (GCNs) have revolutionized the field of network embedding and led to state-of-the-art performance in network analysis tasks such as link prediction and node classification. Nevertheless, most of the existing GCN-based network embedding methods are proposed for unsigned networks. In the real world, some networks are signed, where the links are annotated with different polarities, e.g., positive vs. negative. Negative links may have different properties from positive ones and can also significantly affect the quality of network embedding. Thus, in this paper, we propose a novel network embedding framework, SNEA, to learn Signed Network Embedding via graph Attention. In particular, we propose a masked self-attentional layer, which leverages a self-attention mechanism to estimate the importance coefficient for pairs of nodes connected by different types of links during the embedding aggregation process. SNEA then utilizes the masked self-attentional layers to aggregate more important information from neighboring nodes to generate the node embeddings based on balance theory. Experimental results demonstrate the effectiveness of the proposed framework through a signed link prediction task on several real-world signed network datasets.