
Title: Causal inference over stochastic networks

Drawing causal inferences in network settings necessitates careful consideration of the often complex dependence between outcomes for actors. Of particular importance are treatment spillover and outcome interference effects. We consider causal inference when the actors are connected via an underlying network structure. Our key contribution is a model for causality when the underlying network is endogenous, i.e., when the ties between actors and the actor covariates are statistically dependent. We develop a joint model for the relational and covariate generating process that avoids restrictive separability and fixed-network assumptions, as these rarely hold in realistic social settings. While our framework can be used with general models, we develop the highly expressive class of Exponential-family Random Network models (ERNM), of which Markov random fields and Exponential-family Random Graph models are special cases. We present potential-outcome-based inference within a Bayesian framework and propose a modification of the exchange algorithm to allow sampling from ERNM posteriors. We present results of a simulation study demonstrating the validity of the approach. Finally, we demonstrate the value of the framework in a case study of smoking in the context of adolescent friendship networks.
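The exchange algorithm mentioned in the abstract sidesteps the intractable normalizing constant of models like ERNMs by drawing an auxiliary network at each proposed parameter, so the constants cancel in the acceptance ratio. Below is a minimal sketch for a toy edge-count-only model on independent dyads, where the auxiliary draw can be made exactly; this is an illustration of the exchange idea only, not the paper's model or code, and all names are hypothetical.

```python
import math
import random

def exchange_sampler(s_obs, n_dyads, n_iter=20000, prop_sd=0.5,
                     prior_sd=10.0, seed=0):
    """Exchange-algorithm sampler for theta in the toy model
    P(y | theta) proportional to exp(theta * edges(y)) on n_dyads
    independent dyads. The normalizing constant cancels in the
    acceptance ratio because an auxiliary network is simulated
    exactly at the proposed parameter value."""
    rng = random.Random(seed)
    theta, chain = 0.0, []
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, prop_sd)
        # Exact auxiliary draw: under an edge-count-only model the dyads
        # are i.i.d. Bernoulli(sigmoid(theta_new)).
        p_new = 1.0 / (1.0 + math.exp(-theta_new))
        s_aux = sum(1 for _ in range(n_dyads) if rng.random() < p_new)
        log_alpha = ((theta_new - theta) * s_obs        # unnormalized likelihood at data
                     + (theta - theta_new) * s_aux      # auxiliary "swap" term
                     + (theta ** 2 - theta_new ** 2) / (2 * prior_sd ** 2))  # N(0, prior_sd) prior
        if math.log(rng.random()) < log_alpha:
            theta = theta_new
        chain.append(theta)
    return chain

# A network with 15 ties among 45 dyads: the posterior tie probability
# sigmoid(theta) should concentrate near 15/45.
chain = exchange_sampler(s_obs=15, n_dyads=45)
post_p = sum(1.0 / (1.0 + math.exp(-t)) for t in chain[2000:]) / len(chain[2000:])
```

With a weak prior, the posterior mean of the tie probability lands near the observed density, which gives a quick sanity check that the swap term is implemented with the correct signs.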

Publisher / Repository: Oxford University Press
Journal Name: Journal of the Royal Statistical Society Series A: Statistics in Society
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled dataset, $\mathcal{L}$, the SS setting is characterized by additional, much larger, unlabeled data, $\mathcal{U}$. The setting $|\mathcal{U}| \gg |\mathcal{L}|$ makes SS inference unique and different from standard missing data problems, owing to a natural violation of the so-called ‘positivity’ or ‘overlap’ assumption. However, most of the SS literature implicitly assumes $\mathcal{L}$ and $\mathcal{U}$ to be equally distributed, i.e., no selection bias in the labeling. Inferential challenges under missing-at-random-type labeling that allows for selection bias are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem: estimation of the response’s mean. We propose a double robust SS mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a non-standard consistency rate that depends on the smaller size $|\mathcal{L}|$. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators for the decaying PS, including a novel offset logistic model and a stratified labeling model, and we present their properties under both high- and low-dimensional settings; these may be of independent interest. Lastly, we present extensive simulations and a real data application.
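For intuition, the double robustness property described above can be seen in the classical AIPW-style mean estimator under standard overlap; the abstract's contribution is the harder decaying-PS regime, so the sketch below is only the textbook special case, with illustrative (not the paper's) model choices.

```python
import random

def dr_mean(x, y, labeled, m_hat, pi_hat):
    """AIPW-type double robust mean estimator: consistent if either the
    outcome model m_hat or the labeling propensity pi_hat is correct."""
    n = len(x)
    total = 0.0
    for i in range(n):
        total += m_hat(x[i])                              # outcome-model prediction
        if labeled[i]:
            # Inverse-propensity-weighted residual correction.
            total += (y[i] - m_hat(x[i])) / pi_hat(x[i])
    return total / n

# Simulated data with selection-biased labeling: P(labeled | x) = 0.2 + 0.5x.
rng = random.Random(1)
n = 20000
x = [rng.random() for _ in range(n)]
y = [2 * xi + rng.gauss(0.0, 0.5) for xi in x]            # true mean E[Y] = 1
labeled = [rng.random() < 0.2 + 0.5 * xi for xi in x]

# Correct outcome model, wrong propensity: still consistent.
est_a = dr_mean(x, y, labeled, m_hat=lambda t: 2 * t, pi_hat=lambda t: 0.5)
# Wrong outcome model, correct propensity: still consistent.
est_b = dr_mean(x, y, labeled, m_hat=lambda t: 0.0,
                pi_hat=lambda t: 0.2 + 0.5 * t)
```

Both estimates recover the true mean despite the biased labeling, while a naive average over only the labeled points would be biased upward here.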

  2. Abstract

    Exponential random graph models, or ERGMs, are a flexible and general class of models for dependent data. While the early literature has shown them to be powerful in capturing many network features of interest, recent work highlights difficulties related to the models’ ill behavior, such as most of the probability mass being concentrated on a very small subset of the parameter space. This behavior limits both the applicability of ERGMs to real data and the feasibility of inference and parameter estimation via the usual Markov chain Monte Carlo algorithms. To address this problem, we propose a new exponential family of models for random graphs that builds on the standard ERGM framework. Specifically, we solve the problem of computational intractability and “degenerate” model behavior via an interpretable support restriction. We introduce a new parameter based on the graph-theoretic notion of degeneracy, a measure of sparsity whose value is commonly low in real-world networks. The new model family is supported on the sample space of graphs with bounded degeneracy and is called degeneracy-restricted ERGMs, or DERGMs for short. Since DERGMs generalize ERGMs—the latter are obtained from the former by setting the degeneracy parameter to its maximum—they inherit good theoretical properties while placing their mass more uniformly over realistic graphs. The support restriction allows the use of new (and fast) Monte Carlo methods for inference, making the models scalable and computationally tractable. We study various theoretical properties of DERGMs and illustrate how the support restriction improves model behavior. We also present a fast Monte Carlo algorithm for parameter estimation that avoids many issues faced by Markov chain Monte Carlo algorithms used for inference in ERGMs.
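The degeneracy parameter referred to above is the standard graph-theoretic one: the largest k such that some subgraph has minimum degree at least k. It is computable by the usual "peeling" procedure, sketched below as an illustration (not the paper's implementation):

```python
def degeneracy(adj):
    """Compute graph degeneracy by repeatedly deleting a minimum-degree
    vertex; the answer is the largest degree observed at deletion time."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # defensive copy
    worst = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))       # current minimum-degree vertex
        worst = max(worst, len(adj[v]))
        for u in adj[v]:                              # remove v from the graph
            adj[u].discard(v)
        del adj[v]
    return worst

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}          # degeneracy 2
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}         # degeneracy 1
```

Real-world sparse networks tend to have low degeneracy, which is why the DERGM support restriction excludes only unrealistic graphs.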
  3. Summary

    Models of dynamic networks—networks that evolve over time—have manifold applications. We develop a discrete time generative model for social network evolution that inherits the richness and flexibility of the class of exponential family random-graph models. The model—a separable temporal exponential family random-graph model—facilitates separable modelling of the tie duration distributions and the structural dynamics of tie formation. We develop likelihood-based inference for the model and provide computational algorithms for maximum likelihood estimation. We illustrate the interpretability of the model in analysing a longitudinal network of friendship ties within a school.
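Separability here means that, within each time step, the formation of new ties and the dissolution of existing ties are governed by distinct models. A toy independent-dyad sketch of one such step is below; a real separable temporal ERGM replaces the per-dyad Bernoulli draws with ERGM-distributed formation and dissolution networks, and the parameter values are purely illustrative.

```python
import random

def stergm_step(ties, dyads, p_form, p_persist, rng):
    """One discrete time step of a separable dynamic network model
    (independent-dyad toy case): empty dyads gain a tie with p_form
    (formation side); existing ties survive with p_persist
    (dissolution side). Separability = the two draws never interact."""
    nxt = set()
    for d in dyads:
        if d in ties:
            if rng.random() < p_persist:
                nxt.add(d)
        elif rng.random() < p_form:
            nxt.add(d)
    return nxt

# Tie durations are geometric with mean 1/(1 - p_persist), independent of
# formation; long-run density settles near p_form / (p_form + 1 - p_persist).
rng = random.Random(2)
dyads = [(i, j) for i in range(20) for j in range(i + 1, 20)]   # 190 dyads
ties = set()
densities = []
for step in range(3000):
    ties = stergm_step(ties, dyads, p_form=0.1, p_persist=0.8, rng=rng)
    if step >= 1000:
        densities.append(len(ties) / len(dyads))
avg_density = sum(densities) / len(densities)
```

The separability makes the model interpretable: duration parameters can be read off the dissolution side without disturbing the formation dynamics, which is what makes the class attractive for longitudinal friendship data.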

  4. Context

    US states are largely responsible for the regulation of firearms within their borders. Each state has developed a different legal environment with regard to firearms based on different values and beliefs of citizens, legislators, governors, and other stakeholders. Predicting the types of firearm laws that states may adopt is therefore challenging.


    We propose a parsimonious model for this complex process and provide credible predictions of state firearm laws by estimating the likelihood they will be passed in the future. We employ a temporal exponential‐family random graph model to capture the bipartite state law–state network data over time, allowing for complex interdependencies and their temporal evolution. Using data on all state firearm laws over the period 1979–2020, we estimate these models’ parameters while controlling for factors associated with firearm law adoption, including internal and external state characteristics. Predictions of future firearm law passage are then calculated based on a number of scenarios to assess the effects of a given type of firearm law being passed in the future by a given state.


    Results show that a set of internal state factors are important predictors of firearm law adoption, but the actions of neighboring states may be just as important. Analysis of the scenarios provides insight into how adoption of laws by specific states (or groups of states) may perturb the rest of the network structure and make new laws more (or less) likely to continue diffusing to other states.


    The methods used here outperform standard approaches for policy diffusion studies and afford predictions that are superior to those of an ensemble of machine learning tools. The proposed framework could have applications for the study of policy diffusion in other domains.

  5. Abstract

    Structural nested mean models (SNMMs) are useful for causal inference of treatment effects in longitudinal observational studies. Most existing works assume that the data are collected at prefixed time points for all subjects, which, however, may be restrictive in practice. To deal with irregularly spaced observations, we assume a class of continuous‐time SNMMs and a martingale condition of no unmeasured confounding (NUC) to identify the causal parameters. We develop the semiparametric efficiency theory and locally efficient estimators for continuous‐time SNMMs. This task is nontrivial due to the restrictions from the NUC assumption imposed on the SNMM parameter. In the presence of ignorable censoring, we show that the complete‐case estimator is optimal among a class of weighting estimators including the inverse probability of censoring weighting estimator, and it achieves a double robustness feature in that it is consistent if at least one of the models for the potential outcome mean function and the treatment process is correctly specified. The new framework allows us to conduct causal analysis respecting the underlying continuous‐time nature of data processes. The simulation study shows that the proposed estimator outperforms existing approaches. We estimate the effect of time to initiate highly active antiretroviral therapy on the CD4 count at year 2 from the observational Acute Infection and Early Disease Research Program database.
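The SNMM "blip" structure underlying this abstract is easiest to state in discrete time; the continuous-time version described above replaces it with an instantaneous analogue identified via the martingale NUC condition. The notation below is generic textbook notation, not the paper's:

```latex
% Discrete-time SNMM blip function: the effect of a final treatment
% "blip" a_m at time m, followed by no further treatment, conditional
% on observed history (psi indexes the causal parameters of interest).
\gamma_m(\bar{a}_m, \bar{\ell}_m; \psi)
  = E\left[ Y^{(\bar{a}_m,\, \underline{0}_{m+1})}
          - Y^{(\bar{a}_{m-1},\, \underline{0}_m)}
    \,\middle|\, \bar{L}_m = \bar{\ell}_m,\ \bar{A}_m = \bar{a}_m \right]
```

Here $\bar{a}_m$ denotes the treatment history through time $m$ and $\underline{0}_{m+1}$ denotes no treatment thereafter, so $\gamma_m$ isolates the marginal effect of the last treatment increment.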
