The Random Dot Product Graph (RDPG) is a popular generative graph model for relational data. RDPGs postulate that each node has a latent position, and specify the edge formation probabilities via the inner product of the corresponding latent vectors. The embedding task of estimating these latent positions from observed graphs is usually posed as a non-convex matrix factorization problem. The workhorse Adjacency Spectral Embedding offers an approximate solution obtained via the eigendecomposition of the adjacency matrix; it enjoys solid statistical guarantees, but it can be computationally intensive and formally solves a surrogate problem. In this paper, we bring to bear recent non-convex optimization advances and demonstrate their impact on RDPG inference. We develop first-order gradient descent methods to better solve the original optimization problem, and to accommodate broader network embedding applications in an organic way. The effectiveness of the resulting graph representation learning framework is demonstrated on both synthetic and real data. We show the algorithms are scalable, robust to missing network data, and can track the latent positions over time when the graphs are acquired in a streaming fashion.
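As a rough illustration of the contrast drawn above, the following sketch compares a spectral estimate with a plain gradient descent refinement. It is a sketch under stated assumptions, not the authors' implementation: the masked objective ‖M ∘ (A − XXᵀ)‖_F² (with M zeroing the diagonal, and optionally any unobserved entries), the backtracking step-size rule, and all function names (`sample_rdpg`, `ase`, `masked_loss`, `gd_embed`) are choices made here for illustration.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix with P(A_ij = 1) = <x_i, x_j>."""
    P = X @ X.T
    upper = np.triu((rng.random(P.shape) < P).astype(float), k=1)  # one draw per node pair
    return upper + upper.T

def ase(A, d):
    """Spectral estimate: scale the eigenvectors of the d largest eigenvalues of A."""
    vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    top = slice(A.shape[0] - d, None)
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def masked_loss(A, X, M):
    """Squared Frobenius error restricted to the entries where M is nonzero."""
    return float(np.sum((M * (A - X @ X.T)) ** 2))

def gd_embed(A, X0, M, iters=100):
    """Gradient descent on masked_loss with Armijo backtracking (assumed step rule)."""
    X, loss = X0.copy(), masked_loss(A, X0, M)
    for _ in range(iters):
        G = 4.0 * (M * (X @ X.T - A)) @ X     # gradient for symmetric A and M
        t = 1.0
        while t > 1e-10:                      # halve the step until sufficient decrease
            X_try = X - t * G
            loss_try = masked_loss(A, X_try, M)
            if loss_try <= loss - 0.5 * t * np.sum(G * G):
                X, loss = X_try, loss_try
                break
            t *= 0.5
    return X

rng = np.random.default_rng(0)
X_true = rng.uniform(0.1, 0.7, size=(150, 2))  # inner products stay inside [0, 1]
A = sample_rdpg(X_true, rng)
M = 1.0 - np.eye(150)                          # ignore the (always zero) diagonal
X_ase = ase(A, 2)
X_gd = gd_embed(A, X_ase, M)                   # refine, starting from the spectral estimate
print(masked_loss(A, X_ase, M), masked_loss(A, X_gd, M))
```

Because every accepted step satisfies the sufficient-decrease test, the refined estimate never has a larger masked loss than its starting point. Setting further entries of `M` to zero is one simple way to exclude unobserved node pairs from the fit.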
Tracking the Adjacency Spectral Embedding for Streaming Graphs
The popular Random Dot Product Graph (RDPG) generative model postulates that each node has an associated (latent) vector, and that the probability of an edge existing between two nodes is given by the inner product of their vectors (with variants that handle directed and weighted graphs). In any case, the latent vectors may be estimated through a spectral decomposition of the adjacency matrix, the so-called Adjacency Spectral Embedding (ASE). Assume we are monitoring a stream of graphs and the objective is to track the latent vectors. Examples include recommender systems or monitoring of a wireless network. Performing the ASE of each graph separately may result in a prohibitive computational load. Furthermore, the invariance of the inner product to rotations complicates comparing the latent vectors at different time-steps. By considering the minimization problem underlying ASE, we develop an iterative algorithm that updates the estimates of the latent vectors as new graphs from the stream arrive. Unlike other proposals, our method does not accumulate errors and thus does not require periodically re-computing the spectral decomposition. Furthermore, the pragmatic setting where nodes leave or join the graph (e.g., a new product in the recommender system) can be accommodated as well. Our code is available at https://github.com/marfiori/efficient-ASE
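The rotation ambiguity mentioned above can be made concrete with a small numerical sketch. It uses orthogonal Procrustes alignment, a standard generic tool for comparing two embeddings, purely to illustrate the issue; it is not taken from the abstract, whose method instead updates the previous estimate iteratively. The helper name `procrustes_align` and the toy data are assumptions made for this example.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return X @ Q, where the orthogonal Q minimizes ||X @ Q - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return X @ (U @ Vt)

rng = np.random.default_rng(1)
X = rng.uniform(0.1, 0.7, size=(50, 2))        # hypothetical latent positions

theta = 1.0                                     # rotate every latent vector by 1 radian
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ Q

# Both sets of latent vectors induce exactly the same edge probabilities...
same_probs = np.allclose(X @ X.T, Y @ Y.T)

# ...yet a naive entrywise comparison reports a large difference,
# which disappears once the rotation is estimated and undone.
raw_gap = np.linalg.norm(X - Y)
aligned_gap = np.linalg.norm(procrustes_align(X, Y) - Y)
print(same_probs, raw_gap, aligned_gap)
```

Two embeddings computed independently at consecutive time-steps can therefore differ by an arbitrary orthogonal transformation even when the underlying graph has not changed, which is why they cannot be compared directly without some form of alignment.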
- PAR ID: 10443086
- Date Published:
- Journal Name: 2022 56th Asilomar Conference on Signals, Systems, and Computers
- Page Range / eLocation ID: 847 to 851
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Given a sequence of random graphs, we address the problem of online monitoring and detection of changes in the underlying data distribution. To this end, we adopt the Random Dot Product Graph (RDPG) model which postulates each node has an associated latent vector, and inner products between these vectors dictate the edge formation probabilities. Existing approaches for graph change-point detection (CPD) rely either on extensive computation, or they store and process the entire observed time series. In this paper we consider the cumulative sum of a judicious monitoring function, which quantifies the discrepancy between the streaming graph observations and the nominal model. This reference distribution is inferred via spectral embeddings of the first few graphs in the sequence, and the monitoring function can be updated in an efficient, online fashion. We characterize the distribution of this running statistic, allowing us to select appropriate thresholding parameters that guarantee error-rate control. The end result is a lightweight online CPD algorithm, with a proven capability to flag distribution shifts in the arriving graphs. The novel method is tested on both synthetic and real network data, corroborating its effectiveness in quickly detecting changes in the input graph sequence.
-
Given a sequence of possibly correlated randomly generated graphs, we address the problem of detecting changes in their underlying distribution. To this end, we will consider Random Dot Product Graphs (RDPGs), a simple yet rich family of random graphs that subsume Erdős-Rényi and Stochastic Block Model ensembles as particular cases. In RDPGs each node has an associated latent vector and inner products between these vectors dictate the edge existence probabilities. Previous works have mostly focused on the undirected and unweighted graph case, a gap we aim to close here. We first extend the RDPG model to accommodate directed and weighted graphs, a contribution whose interest transcends change-point detection (CPD). A statistic derived from the nodes' estimated latent vectors (i.e., embeddings) facilitates adoption of scalable geometric CPD techniques. The resulting algorithm yields interpretable results and facilitates pinpointing which nodes are acting differently, and when. Numerical tests on simulated data as well as on a real dataset of graphs stemming from a Wi-Fi network corroborate the effectiveness of the proposed CPD method.
-
We pursue the problem of modelling and analysing latent space dynamics in collections of networks. Towards this end, we pose and study latent space generative models for signed networks that are amenable to inference via spectral methods. Permitting signs, rather than restricting to unsigned networks, enables richer latent space structure and permissible dynamic mechanisms that can be provably inferred via low rank truncations of observed adjacency matrices. Our treatment of and ability to recover latent space dynamics holds across different levels of granularity, namely, at the overall graph level, for communities of nodes, and even at the individual node level. We provide synthetic and real data examples to illustrate the effectiveness of methodologies and to corroborate accompanying theory. The contributions set forth in this paper complement an emerging statistical paradigm for random graph inference encompassing random dot product graphs and generalizations thereof.
-
We propose and prove the optimality of a Bayesian approach for estimating the latent positions in random dot product graphs, which we call posterior spectral embedding. Unlike classical spectral-based adjacency, or Laplacian spectral embedding, posterior spectral embedding is a fully likelihood-based graph estimation method that takes advantage of the Bernoulli likelihood information of the observed adjacency matrix. We develop a minimax lower bound for estimating the latent positions, and show that posterior spectral embedding achieves this lower bound in the following two senses: it both results in a minimax-optimal posterior contraction rate and yields a point estimator achieving the minimax risk asymptotically. The convergence results are subsequently applied to clustering in stochastic block models with positive semidefinite block probability matrices, strengthening an existing result concerning the number of misclustered vertices. We also study a spectral-based Gaussian spectral embedding as a natural Bayesian analogue of adjacency spectral embedding, but the resulting posterior contraction rate is suboptimal by an extra logarithmic factor. The practical performance of the proposed methodology is illustrated through extensive synthetic examples and the analysis of Wikipedia graph data.

