The potential outcomes model has been a powerful tool for the analysis of nonrandomized observational studies, allowing analysts to estimate average treatment effects. This article considers the situation in which high-dimensional covariates are present and revisits the standard assumptions made in causal inference. We show that, under a flexible Gaussian process framework, the assumption of strict overlap leads to very restrictive assumptions about the distribution of covariates, which can be characterized using classical results from Gaussian random measures as well as reproducing kernel Hilbert space theory. In addition, we propose a strategy for data-adaptive causal effect estimation that does not rely on the strict overlap assumption. These findings reveal, under a focused framework, the stringency that accompanies the use of the treatment positivity assumption in high-dimensional settings.
Choosing exogeneity assumptions in potential outcome models
There are many kinds of exogeneity assumptions. How should researchers choose among them? When exogeneity is imposed on an unobservable like a potential outcome, we argue that the form of exogeneity should be chosen based on the kind of selection on unobservables it allows. Consequently, researchers can assess the plausibility of any exogeneity assumption by studying the distributions of treatment given the unobservables that are consistent with that assumption. We use this approach to study two common exogeneity assumptions: quantile and mean independence. We show that both assumptions require a kind of nonmonotonic relationship between treatment and the potential outcomes. We discuss how to assess the plausibility of this kind of treatment selection. We also show how to define a new and weaker version of quantile independence that allows for monotonic selection on unobservables. We then show the implications of the choice of exogeneity assumption for identification. We apply these results in an empirical illustration of the effect of child soldiering on wages.
- Award ID(s):
- 1943138
- PAR ID:
- 10527491
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- The Econometrics Journal
- Volume:
- 26
- Issue:
- 3
- ISSN:
- 1368-4221
- Page Range / eLocation ID:
- 327 to 349
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Economic models often depend on quantities that are unobservable, either for privacy reasons or because they are difficult to measure. Examples of such variables include human capital (or ability), personal income, unobserved heterogeneity (such as consumer “types”), et cetera. This situation has historically been handled either by simply using observable imperfect proxies for each of the unobservables, or by assuming that such unobservables satisfy convenient conditional mean or independence assumptions that enable their elimination from the estimation problem. However, thanks to tremendous increases in both the amount of data available and computing power, it has become possible to take full advantage of recent formal methods to infer the statistical properties of unobservable variables from multiple imperfect measurements of them. The general framework used is the concept of measurement systems in which a vector of observed variables is expressed as a (possibly nonlinear or nonparametric) function of a vector of all unobserved variables (including unobserved error terms or “disturbances” that may have nonadditively separable effects). The framework emphasizes important connections with related fields, such as nonlinear panel data, limited dependent variables, game theoretic models, dynamic models, and set identification. This review reports the progress made toward the central question of whether there exist plausible assumptions under which one can identify the joint distribution of the unobservables from the knowledge of the joint distribution of the observables. It also overviews empirical efforts aimed at exploiting such identification results to deliver novel findings that formally account for the unavoidable presence of unobservables. (JEL C30, C55, C57, D12, E21, E23, J24)
- Standard Discrete Choice Models (DCMs) assume that unobserved effects that influence decision-making are independently and identically distributed among individuals. When unobserved effects are spatially correlated, the independence assumption does not hold, leading to biased standard errors and potentially biased parameter estimates. This paper proposes an interpretable Hierarchical Nearest Neighbor Gaussian Process (HNNGP) model to account for spatially correlated unobservables in discrete choice analysis. Gaussian Processes (GPs) are often regarded as lacking interpretability due to their non-parametric nature. However, we demonstrate how to incorporate GPs directly into the latent utility specification to flexibly model spatially correlated unobserved effects without sacrificing structural economic interpretation. To empirically test our proposed HNNGP models, we analyze binary and multinomial mode choices for commuting to work in New York City. For the multinomial case, we formulate and estimate HNNGPs with and without independence from irrelevant alternatives (IIA). Building on the interpretability of our modeling strategy, we provide both point estimates and credible intervals for the value of travel time savings in NYC. Finally, we compare the results from all proposed specifications with those derived from a standard logit model and a probit model with spatially autocorrelated errors (SAE) to showcase how accounting for different sources of spatial correlation in discrete choice can significantly impact inference. We also show that the HNNGP models attain better out-of-sample prediction performance when compared to the logit and probit SAE models, especially in the multinomial case.
- Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational setting. We show how, under a set of structural assumptions, we can test for RCI by reducing the task of testing for RCI on non-i.i.d. data to the problem of testing for CI on several data sets each of which consists of i.i.d. samples. We develop Kernel Relational CI test (KRCIT), a nonparametric test as a practical approach to testing for RCI by relaxing the structural assumptions used in our analysis of RCI. We describe results of experiments with synthetic relational data that show the benefits of KRCIT relative to traditional CI tests that don’t account for the non-i.i.d. nature of relational data.
- Ecologists seek to understand the intermediary ecological processes through which changes in one attribute in a system affect other attributes. Yet, quantifying the causal effects of these mediating processes in ecological systems is challenging. Researchers must define what they mean by a “mediated effect”, determine what assumptions are required to estimate mediation effects without bias, and assess whether these assumptions are credible for a study. To address these challenges, scholars in fields outside of ecology have made significant advances in mediation analysis over the past three decades. Here, we bring these advances to the attention of ecologists, for whom understanding mediating processes and deriving causal inferences are important for testing theory and developing resource management and conservation strategies. To illustrate both the challenges and the advances in quantifying mediation effects, we use a hypothetical ecological study. With this study, we show how common research designs used in ecology to detect and quantify mediation effects may have biases and how these biases can be addressed through alternative designs. Throughout the review, we highlight how causal claims rely on causal assumptions, and we illustrate how different designs or definitions of mediation effects can relax some of these assumptions. In contrast to statistical assumptions, causal assumptions are not verifiable from data, so we also describe procedures that researchers can use to assess the sensitivity of a study’s results to potential violations of its causal assumptions. The advances in causal mediation analyses reviewed herein will provide ecological researchers with approaches to clearly communicate the causal assumptions necessary for valid inferences and examine potential violations to these assumptions, which will enable rigorous and reproducible explanations of intermediary processes in ecology.

