skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Concentration and Consistency Results for Canonical and Curved Exponential-Family Models of Random Graphs
Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and M-estimators of a wide range of canonical and curved exponentialfamily models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.  more » « less
Award ID(s):
1812119 1513644
PAR ID:
10094996
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Annals of statistics
ISSN:
0090-5364
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. De Vico Fallani, Fabrizio (Ed.)
    The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGMs inference generally employ a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the maximum likelihood estimator (MLE) of ERGM parameters, and is generally feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case—in which we observe multiple networks from a common generative process—adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or cases in which large numbers of graph observations are available. Here, we exploit some basic properties of the discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. The latter can be employed either for single graph observations, or for observations from graph sets. As we show, the conjugate prior is easily specified, and is well-suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and posterior estimates under the conjugate prior are well-behaved. We demonstrate the usefulness of our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme. 
    more » « less
  2. Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. Motivated by several recent studies of local graph algorithms, we consider the following variant of this problem. Let G be a connected bounded-degree graph. Given an edge e in G we would like to decide whether e belongs to a connected subgraph math formula consisting of math formula edges (for a prespecified constant math formula), where the decision for different edges should be consistent with the same subgraph math formula. Can this task be performed by inspecting only a constant number of edges in G? Our main results are: We show that if every t-vertex subgraph of G has expansion math formula then one can (deterministically) construct a sparse spanning subgraph math formula of G using few inspections. To this end we analyze a “local” version of a famous minimum-weight spanning tree algorithm. We show that the above expansion requirement is sharp even when allowing randomization. To this end we construct a family of 3-regular graphs of high girth, in which every t-vertex subgraph has expansion math formula. We prove that for this family of graphs, any local algorithm for the sparse spanning graph problem requires inspecting a number of edges which is proportional to the girth. 
    more » « less
  3. ABSTRACT A random algebraic graph is defined by a group with a uniform distribution over it and a connection with expectation satisfying . The random graph with vertex set is formed as follows. First, independent variables are sampled uniformly from . Then, vertices are connected with probability . This model captures random geometric graphs over the sphere, torus, and hypercube; certain instances of the stochastic block model; and random subgraphs of Cayley graphs. The main question of interest to the current paper is: when is a random algebraic graph statistically and/or computationally distinguishable from ? Our results fall into two categories. (1) Geometric. We focus on the case and use Fourier‐analytic tools. We match and extend the following results from the prior literature: For hard threshold connections, we match for , and for ‐Lipschitz connections we extend the results of when to the non‐monotone setting. (2) Algebraic. We provide evidence for an exponential statistical‐computational gap. Consider any finite group and let be a set of elements formed by including each set of the form independently with probability Let be the distribution of random graphs formed by taking a uniformly random induced subgraph of size of the Cayley graph . Then, and are statistically indistinguishable with high probability over if and only if . However, low‐degree polynomial tests fail to distinguish and with high probability over when 
    more » « less
  4. Abstract We pursue the problem of modelling and analysing latent space dynamics in collections of networks. Towards this end, we pose and study latent space generative models for signed networks that are amenable to inference via spectral methods. Permitting signs, rather than restricting to unsigned networks, enables richer latent space structure and permissible dynamic mechanisms that can be provably inferred via low rank truncations of observed adjacency matrices. Our treatment of and ability to recover latent space dynamics holds across different levels of granularity, namely, at the overall graph level, for communities of nodes, and even at the individual node level. We provide synthetic and real data examples to illustrate the effectiveness of methodologies and to corroborate accompanying theory. The contributions set forth in this paper complement an emerging statistical paradigm for random graph inference encompassing random dot product graphs and generalizations thereof. 
    more » « less
  5. Nowadays, large-scale graph data is being generated in a variety of real-world applications, from social networks to co-authorship networks, from protein-protein interaction networks to road traffic networks. Many existing works on graph mining focus on the vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore the high-order network structures, which are of key importance in many high impact domains. For example, in bank customer personally identifiable information (PII) networks, the star structures often correspond to a set of synthetic identities; in financial transaction networks, the loop structures may indicate the existence of money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph which does not break many such structures by separating the subgraph from the rest. A key challenge associated with finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, inspired by the family of local graph clustering algorithms for efficiently identifying a low-conductance cut without exploring the entire graph, we propose to generalize the key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance, and define the high-order diffusion core, which is based on a high-order random walk induced by user-specified high-order network structure. Then we propose a novel High-Order Structure-Preserving LOcal Cut (HOSPLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. The experimental results on both synthetic graphs and real graphs demonstrate the effectiveness and efficiency of our proposed HOSPLOC algorithm. 
    more » « less