This content will become publicly available on September 25, 2025

Title: Learning Representations for Hierarchies with Minimal Support
When training node embedding models to represent large directed graphs (digraphs), it is impossible to observe all entries of the adjacency matrix during training. As a consequence, most methods employ sampling. For very large digraphs, however, this means many (most) entries may be unobserved during training. In general, observing every entry would be necessary to uniquely identify a graph; however, if we know the graph has a certain property, some entries can be omitted. For example, only half the entries would be required for a symmetric graph. In this work, we develop a novel framework to identify a subset of entries required to uniquely distinguish a graph among all transitively-closed DAGs. We give an explicit algorithm to compute the provably minimal set of entries, and demonstrate empirically that one can train node embedding models with greater efficiency and performance, provided the energy function has an appropriate inductive bias. We achieve robust performance on synthetic hierarchies and a larger real-world taxonomy, observing improved convergence rates in a resource-constrained setting while reducing the set of training examples by as much as 99%.
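To make the redundancy idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it only illustrates that, in a transitively-closed DAG, any edge implied by transitivity can be dropped and later recovered, and it says nothing about which zero entries of the adjacency matrix must be observed. The function names and the toy taxonomy are hypothetical, chosen for illustration.

```python
def transitive_closure(nodes, edges):
    """Return every (u, v) such that v is reachable from u by one or more edges."""
    closure = set(edges)
    # Warshall-style pass: allow each node k in turn as an intermediate stop.
    for k in nodes:
        for u in nodes:
            if (u, k) in closure:
                for v in nodes:
                    if (k, v) in closure:
                        closure.add((u, v))
    return closure


def transitive_reduction(nodes, closed_edges):
    """Keep only edges of a transitively-closed DAG that are not implied
    by a two-step path u -> w -> v through some intermediate node w."""
    return {
        (u, v)
        for (u, v) in closed_edges
        if not any((u, w) in closed_edges and (w, v) in closed_edges for w in nodes)
    }


# Hypothetical toy taxonomy (parent -> child), used only for illustration.
nodes = ["animal", "mammal", "dog", "cat"]
covering = {("animal", "mammal"), ("mammal", "dog"), ("mammal", "cat")}

closed = transitive_closure(nodes, covering)    # adds animal->dog and animal->cat
reduced = transitive_reduction(nodes, closed)   # drops the two implied edges again

# The smaller edge set determines the same transitively-closed graph.
recovered = transitive_closure(nodes, reduced)
```

In this toy case the closed graph has five edges and the reduced set has three, yet taking the closure of the reduced set recovers all five. The saving grows with the depth of the hierarchy, since every ancestor-descendant pair separated by more than one level is implied.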
Award ID(s):
2106391
PAR ID:
10559353
Publisher / Repository:
NeurIPS
Date Published:
Format(s):
Medium: X
Location:
https://openreview.net/forum?id=HFS800reZK
Sponsoring Org:
National Science Foundation
More Like this
  1. Matrix completion is a well-known approach for recommender systems. It predicts the values of the missing entries in a sparse user-item interaction matrix, based on the low-rank structure of the rating matrix. However, existing matrix completion methods do not take node polysemy and side information of social relationships into consideration, which can otherwise further improve the performance. In this paper, we propose a novel matrix completion method that employs both users’ friendships and rating entries to predict the missing values in a user-item matrix. Our approach adopts a graph-based model in which nodes are users and items, and two types of edges are considered: user friendships and user-item interactions. Polysemy-aware node features are extracted from this heterogeneous graph through a graph convolution network by considering the multifaceted factors for edge formation, which are then connected to a hybrid loss function with two heads: (1) a social-homophily head to address node polysemy, and (2) an error head for user-item rating regression. The latter is formulated on all matrix entries to combat the sensitivity of negative sampling of the vast majority of missing entries during training, with a smart technique to reduce the time complexity. Extensive experiments on real datasets verify that our model outperforms the state-of-the-art matrix completion methods by a significant margin.
  2. Graph neural networks (GNNs) have demonstrated significant success in various graph learning tasks, from graph classification to anomaly detection. A number of approaches have recently emerged that adopt a graph pooling operation within GNNs, with the goal of preserving graph attributive and structural features during graph representation learning. However, most existing graph pooling operations suffer from the limitations of relying on node-wise neighbor weighting and embedding, which leads to insufficient encoding of the rich topological structures and node attributes exhibited by real-world networks. By invoking the machinery of persistent homology and the concept of landmarks, we propose a novel topological pooling layer and a witness complex-based topological embedding mechanism that allow us to systematically integrate hidden topological information at both local and global levels. Specifically, we design new learnable local and global topological representations, Wit-TopoPool, which allow us to simultaneously extract rich discriminative topological information from graphs. Experiments on 11 diverse benchmark datasets against 18 baseline models on graph classification tasks indicate that Wit-TopoPool significantly outperforms all competitors across all datasets.
  3. Low-dimensional node embeddings play a key role in analyzing graph datasets. However, little work studies exactly what information is encoded by popular embedding methods, and how this information correlates with performance in downstream machine learning tasks. We tackle this question by studying whether embeddings can be inverted to (approximately) recover the graph used to generate them. Focusing on a variant of the popular DeepWalk method (Perozzi et al., 2014; Qiu et al., 2018), we present algorithms for accurate embedding inversion - i.e., from the low-dimensional embedding of a graph G, we can find a graph H with a very similar embedding. We perform numerous experiments on real-world networks, observing that significant information about G, such as specific edges and bulk properties like triangle density, is often lost in H. However, community structure is often preserved or even enhanced. Our findings are a step towards a more rigorous understanding of exactly what information embeddings encode about the input graph, and why this information is useful for learning tasks.
  4. In this paper, we propose a comprehensive unsupervised framework that leverages existing and novel multiview learning models to obtain a single node embedding from a collection of node embeddings, combining the best of all worlds. Through extensive experiments, we demonstrate that the proposed multiview node embedding performs on par with or better than the best of its constituents and provides reliable performance across downstream tasks, including node classification and graph reconstruction. Index Terms: multiview learning, node embedding, hybrid tensor decomposition, unsupervised learning
  5. We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. The source code is available at https://github.com/GTmac/FastRP.