skip to main content


Title: Structure recovery and trend estimation for dynamic network analysis

Low‐dimensional parametric models for network dynamics have been successful as inferentially efficient and interpretable tools for modelling network evolution but have difficulty in settings with strong time inhomogeneity (particularly when sharp variation in parameters is possible and covariates are limited). Here, we propose to address this problem via a novel family of block‐structured dynamic exponential‐family random graph models (ERGMs), where the time domain is divided into consecutive blocks and the network parameters are assumed to evolve smoothly within each block. In particular, we let the latent ERGM parameters follow a piecewise polynomial model with an unknown block structure (e.g., change points). We propose an iterative estimation procedure that involves estimating the block structure using trend filtering and fitting ERGMs for networks belonging to the same time block. We demonstrate the utility of the proposed approach using simulation studies and applications to interbank transaction networks and citations among political blogs over the course of an electoral cycle.

 
more » « less
NSF-PAR ID:
10431363
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Stat
Volume:
12
Issue:
1
ISSN:
2049-1573
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Exponential random graph models, or ERGMs, are a flexible and general class of models for modeling dependent data. While the early literature has shown them to be powerful in capturing many network features of interest, recent work highlights difficulties related to the models’ ill behavior, such as most of the probability mass being concentrated on a very small subset of the parameter space. This behavior limits both the applicability of an ERGM as a model for real data and inference and parameter estimation via the usual Markov chain Monte Carlo algorithms. To address this problem, we propose a new exponential family of models for random graphs that build on the standard ERGM framework. Specifically, we solve the problem of computational intractability and “degenerate” model behavior by an interpretable support restriction. We introduce a new parameter based on the graph-theoretic notion of degeneracy, a measure of sparsity whose value is commonly low in real-world networks. The new model family is supported on the sample space of graphs with bounded degeneracy and is called degeneracy-restricted ERGMs, or DERGMs for short. Since DERGMs generalize ERGMs—the latter is obtained from the former by setting the degeneracy parameter to be maximal—they inherit good theoretical properties, while at the same time place their mass more uniformly over realistic graphs. The support restriction allows the use of new (and fast) Monte Carlo methods for inference, thus making the models scalable and computationally tractable. We study various theoretical properties of DERGMs and illustrate how the support restriction improves the model behavior. We also present a fast Monte Carlo algorithm for parameter estimation that avoids many issues faced by Markov Chain Monte Carlo algorithms used for inference in ERGMs. 
    more » « less
  2. Social network data are complex and dependent data. At the macro-level, social networks often exhibit clustering in the sense that social networks consist of communities; and at the micro-level, social networks often exhibit complex network features such as transitivity within communities. Modeling real-world social networks requires modeling both the macro- and micro-level, but many existing models focus on one of them while neglecting the other. In recent work, [28] introduced a class of Exponential Random Graph Models (ERGMs) capturing community structure as well as microlevel features within communities. While attractive, existing approaches to estimating ERGMs with community structure are not scalable. We propose here a scalable two-stage strategy to estimate an important class of ERGMs with community structure, which induces transitivity within communities. At the first stage, we use an approximate model, called working model, to estimate the community structure. At the second stage, we use ERGMs with geometrically weighted dyadwise and edgewise shared partner terms to capture refined forms of transitivity within communities. We use simulations to demonstrate the performance of the two-stage strategy in terms of the estimated community structure. In addition, we show that the estimated ERGMs with geometrically weighted dyadwise and edgewise shared partner terms within communities outperform the working model in terms of goodness-of-fit. Last, but not least, we present an application to high-resolution human contact network data. 
    more » « less
  3. We consider the problem of estimating the location of a single change point in a network generated by a dynamic stochastic block model mechanism. This model produces community structure in the network that exhibits change at a single time epoch. We propose two methods of estimating the change point, together with the model parameters, before and after its occurrence. The first employs a least-squares criterion function and takes into consideration the full structure of the stochastic block model and is evaluated at each point in time. Hence, as an intermediate step, it requires estimating the community structure based on a clustering algorithm at every time point. The second method comprises the following two steps: in the first one, a least-squares function is used and evaluated at each time point, but ignoring the community structure and only considering a random graph generating mechanism exhibiting a change point. Once the change point is identified, in the second step, all network data before and after it are used together with a clustering algorithm to obtain the corresponding community structures and subsequently estimate the generating stochastic block model parameters. The first method, since it requires knowledge of the community structure and hence clustering at every point in time, is significantly more computationally expensive than the second one. On the other hand, it requires a significantly less stringent identifiability condition for consistent estimation of the change point and the model parameters than the second method; however, it also requires a condition on the misclassification rate of misallocating network nodes to their respective communities that may fail to hold in many realistic settings. Despite the apparent stringency of the identifiability condition for the second method, we show that networks generated by a stochastic block mechanism exhibiting a change in their structure can easily satisfy this condition under a multitude of scenarios, including merging/splitting communities, nodes joining another community, etc. Further, for both methods under their respective identifiability and certain additional regularity conditions, we establish rates of convergence and derive the asymptotic distributions of the change point estimators. The results are illustrated on synthetic data. In summary, this work provides an in-depth investigation of the novel problem of change point analysis for networks generated by stochastic block models, identifies key conditions for the consistent estimation of the change point, and proposes a computationally fast algorithm that solves the problem in many settings that occur in applications. Finally, it discusses challenges posed by employing clustering algorithms in this problem, that require additional investigation for their full resolution. 
    more » « less
  4. De Vico Fallani, Fabrizio (Ed.)
    The exponential family random graph modeling (ERGM) framework provides a highly flexible approach for the statistical analysis of networks (i.e., graphs). As ERGMs with dyadic dependence involve normalizing factors that are extremely costly to compute, practical strategies for ERGMs inference generally employ a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the maximum likelihood estimator (MLE) of ERGM parameters, and is generally feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case—in which we observe multiple networks from a common generative process—adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or cases in which large numbers of graph observations are available. Here, we exploit some basic properties of the discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. The latter can be employed either for single graph observations, or for observations from graph sets. As we show, the conjugate prior is easily specified, and is well-suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and posterior estimates under the conjugate prior are well-behaved. We demonstrate the usefulness of our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme. 
    more » « less
  5. Model‐based clustering of time‐evolving networks has emerged as one of the important research topics in statistical network analysis. It is a fundamental research question to model time‐varying network parameters. However, due to difficulties in modelling functional network parameters, there is little progress in the current literature to model time‐varying network parameters effectively. In this work, we model network parameters as univariate nonparametric functions instead of constants. We effectively estimate those functional network parameters in temporal exponential‐family random graph models using a kernel regression technique and a local likelihood approach. Furthermore, we propose a semiparametric finite mixture of temporal exponential‐family random graph models by adopting finite mixture models, which simultaneously allows both modelling and detecting groups in time‐evolving networks. Also, we use a conditional likelihood to construct an effective model selection criterion and network cross‐validation to choose an optimal bandwidth. The power of our method is demonstrated in simulation studies and real‐world applications to dynamic international trade networks and dynamic arm trade networks.

     
    more » « less