Title: Causal Inference Methods and their Challenges: The Case of 311 Data
The main purpose of this paper is to illustrate the application of causal inference methods to administrative data and the challenges of such application. We illustrate by applying a Bayesian network method to 311 data from Miami-Dade County, Florida (USA). The 311 centers provide non-emergency services to residents. The 311 data are large and granular. We aim to explore the equity issues and biases that might exist in this particular type of service request. As a case study, we examine the relationship between population characteristics (independent variables) and request volume and completion time (dependent variables) to identify disparities, if any, in the observational data. The empirical analysis shows no bias in the services provided to any specific demographic, socioeconomic, or geographic group. However, administrative data pose various challenges for inferring causality because of missing or impure data, data inadequacy, and latent confounders. We discuss the precautions needed when applying causal techniques to administrative data such as 311 data.
Award ID(s):
1924154
PAR ID:
10280683
Journal Name:
DG.O2021: The 22nd Annual International Conference on Digital Government Research
Page Range / eLocation ID:
49 to 59
Sponsoring Org:
National Science Foundation
More Like this
  1. Carruthers, John; Duncan, Natasha; He, Canfei; Zhu, Shengjun (Ed.)
    This paper illustrates the application of machine learning algorithms in predictive analytics for local governments using administrative data. The developed and tested machine learning predictive algorithm overcomes known limitations of the conventional ordinary least squares method. Such limitations include, but are not limited to, imposed linearity, presumed causality with independent variables as presumed causes and dependent variables as presumed results, likely high multicollinearity among features, and spatial autocorrelation. The study applies the algorithms to 311 non-emergency service requests in the context of Miami-Dade County. The algorithms are applied to predict the volume of 311 service requests and the community characteristics affecting the volume across Census tract neighborhoods. Four common families of algorithms and an ensemble of them are applied. They are random forest, support vector machines, lasso and elastic-net regularized generalized linear models, and extreme gradient boosting. Two feature selection methods, namely Boruta and fscaret, are applied to identify the significant community characteristics. The results show that the machine learning algorithms capture spatial autocorrelation and clustering. The features generated by fscaret algorithms are parsimonious in predicting the 311 service request volume.
  2. Ecology often seeks to answer causal questions, and while ecologists have a rich history of experimental approaches, novel observational data streams and the need to apply insights across naturally occurring conditions pose opportunities and challenges. Other fields have developed causal inference approaches that can enhance and expand our ability to answer ecological causal questions using observational or experimental data. However, the lack of comprehensive resources applying causal inference to ecological settings and jargon from multiple disciplines create barriers. We introduce approaches for causal inference, discussing the main frameworks for counterfactual causal inference, how causal inference differs from other research aims and key challenges; the application of causal inference in experimental and quasi‐experimental study designs; appropriate interpretation of the results of causal inference approaches given their assumptions and biases; foundational papers; and the data requirements and trade‐offs between internal and external validity posed by different designs. We highlight that these designs generally prioritise internal validity over generalisability. Finally, we identify opportunities and considerations for ecologists to further integrate causal inference with synthesis science and meta‐analysis and expand the spatiotemporal scales at which causal inference is possible. We advocate for ecology as a field to collectively define best practices for causal inference.
  3. Ecologists seek to understand the intermediary ecological processes through which changes in one attribute in a system affect other attributes. Yet, quantifying the causal effects of these mediating processes in ecological systems is challenging. Researchers must define what they mean by a “mediated effect”, determine what assumptions are required to estimate mediation effects without bias, and assess whether these assumptions are credible for a study. To address these challenges, scholars in fields outside of ecology have made significant advances in mediation analysis over the past three decades. Here, we bring these advances to the attention of ecologists, for whom understanding mediating processes and deriving causal inferences are important for testing theory and developing resource management and conservation strategies. To illustrate both the challenges and the advances in quantifying mediation effects, we use a hypothetical ecological study. With this study, we show how common research designs used in ecology to detect and quantify mediation effects may have biases and how these biases can be addressed through alternative designs. Throughout the review, we highlight how causal claims rely on causal assumptions, and we illustrate how different designs or definitions of mediation effects can relax some of these assumptions. In contrast to statistical assumptions, causal assumptions are not verifiable from data, so we also describe procedures that researchers can use to assess the sensitivity of a study’s results to potential violations of its causal assumptions. The advances in causal mediation analyses reviewed herein will provide ecological researchers with approaches to clearly communicate the causal assumptions necessary for valid inferences and examine potential violations to these assumptions, which will enable rigorous and reproducible explanations of intermediary processes in ecology. 
  4. Identification of causal direction between a cause-effect pair from observed data has recently attracted much attention. Various methods based on functional causal models have been proposed to solve this problem, by assuming the causal process satisfies some (structural) constraints and showing that the reverse direction violates such constraints. The nonlinear additive noise model has been demonstrated to be effective for this purpose, but the model class is not transitive: even if each direct causal relation follows this model, indirect causal influences, which result from omitted intermediate causal variables and are frequently encountered in practice, do not necessarily follow the model constraints. As a consequence, the nonlinear additive noise model may fail to correctly discover causal direction. In this work, we propose a cascade nonlinear additive noise model to represent such causal influences: each direct causal relation follows the nonlinear additive noise model, but we observe only the initial cause and final effect. We further propose a method to estimate the model, including the unmeasured intermediate variables, from data, under the variational auto-encoder framework. Our theoretical results show that with our model, causal direction is identifiable under suitable technical conditions on the data generation process. Simulation results illustrate the power of the proposed method in identifying indirect causal relations across various settings, and experimental results on real data suggest that the proposed model and method greatly extend the applicability of causal discovery based on functional causal models in nonlinear cases.
  5. Missing data are ubiquitous in many domains such as healthcare. When these data entries are not missing completely at random, the (conditional) independence relations in the observed data may be different from those in the complete data generated by the underlying causal process. Consequently, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that are missing under different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by missingness graphs (m-graphs), we analyze conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Based on our analysis, we propose Missing Value PC (MVPC), which extends the PC algorithm to incorporate additional corrections. Our proposed MVPC is shown in theory to give asymptotically correct results even on data that are MAR or MNAR. Experimental results on both synthetic data and real healthcare applications illustrate that the proposed algorithm is able to find correct causal relations even in the general case of MNAR.