The potential outcomes model has been a powerful tool for the analysis of nonrandomized observational studies. This framework allows analysts to estimate average treatment effects. This article considers the situation in which high-dimensional covariates are present and revisits the standard assumptions made in causal inference. Employing a flexible Gaussian process framework, we show that the assumption of strict overlap imposes very restrictive conditions on the distribution of covariates; these conditions can be characterized using classical results from Gaussian random measures as well as reproducing kernel Hilbert space theory. In addition, we propose a strategy for data-adaptive causal effect estimation that does not rely on the strict overlap assumption. These findings reveal, within a focused framework, the stringency that accompanies the use of the treatment positivity assumption in high-dimensional settings.
Choosing exogeneity assumptions in potential outcome models
There are many kinds of exogeneity assumptions. How should researchers choose among them? When exogeneity is imposed on an unobservable like a potential outcome, we argue that the form of exogeneity should be chosen based on the kind of selection on unobservables it allows. Consequently, researchers can assess the plausibility of any exogeneity assumption by studying the distributions of treatment given the unobservables that are consistent with that assumption. We use this approach to study two common exogeneity assumptions: quantile and mean independence. We show that both assumptions require a kind of nonmonotonic relationship between treatment and the potential outcomes. We discuss how to assess the plausibility of this kind of treatment selection. We also show how to define a new and weaker version of quantile independence that allows for monotonic selection on unobservables. We then show the implications of the choice of exogeneity assumption for identification. We apply these results in an empirical illustration of the effect of child soldiering on wages.
- Award ID(s): 1943138
- PAR ID: 10527491
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: The Econometrics Journal
- Volume: 26
- Issue: 3
- ISSN: 1368-4221
- Page Range / eLocation ID: 327 to 349
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Economic models often depend on quantities that are unobservable, either for privacy reasons or because they are difficult to measure. Examples of such variables include human capital (or ability), personal income, unobserved heterogeneity (such as consumer “types”), et cetera. This situation has historically been handled either by simply using observable imperfect proxies for each of the unobservables, or by assuming that such unobservables satisfy convenient conditional mean or independence assumptions that enable their elimination from the estimation problem. However, thanks to tremendous increases in both the amount of data available and computing power, it has become possible to take full advantage of recent formal methods to infer the statistical properties of unobservable variables from multiple imperfect measurements of them. The general framework used is the concept of measurement systems in which a vector of observed variables is expressed as a (possibly nonlinear or nonparametric) function of a vector of all unobserved variables (including unobserved error terms or “disturbances” that may have nonadditively separable effects). The framework emphasizes important connections with related fields, such as nonlinear panel data, limited dependent variables, game theoretic models, dynamic models, and set identification. This review reports the progress made toward the central question of whether there exist plausible assumptions under which one can identify the joint distribution of the unobservables from the knowledge of the joint distribution of the observables. It also overviews empirical efforts aimed at exploiting such identification results to deliver novel findings that formally account for the unavoidable presence of unobservables. (JEL C30, C55, C57, D12, E21, E23, J24)
- Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational setting. We show how, under a set of structural assumptions, we can test for RCI by reducing the task of testing for RCI on non-i.i.d. data to the problem of testing for CI on several data sets each of which consists of i.i.d. samples. We develop Kernel Relational CI test (KRCIT), a nonparametric test as a practical approach to testing for RCI by relaxing the structural assumptions used in our analysis of RCI. We describe results of experiments with synthetic relational data that show the benefits of KRCIT relative to traditional CI tests that don’t account for the non-i.i.d. nature of relational data.
- Ecologists seek to understand the intermediary ecological processes through which changes in one attribute in a system affect other attributes. Yet, quantifying the causal effects of these mediating processes in ecological systems is challenging. Researchers must define what they mean by a “mediated effect”, determine what assumptions are required to estimate mediation effects without bias, and assess whether these assumptions are credible for a study. To address these challenges, scholars in fields outside of ecology have made significant advances in mediation analysis over the past three decades. Here, we bring these advances to the attention of ecologists, for whom understanding mediating processes and deriving causal inferences are important for testing theory and developing resource management and conservation strategies. To illustrate both the challenges and the advances in quantifying mediation effects, we use a hypothetical ecological study. With this study, we show how common research designs used in ecology to detect and quantify mediation effects may have biases and how these biases can be addressed through alternative designs. Throughout the review, we highlight how causal claims rely on causal assumptions, and we illustrate how different designs or definitions of mediation effects can relax some of these assumptions. In contrast to statistical assumptions, causal assumptions are not verifiable from data, so we also describe procedures that researchers can use to assess the sensitivity of a study’s results to potential violations of its causal assumptions. The advances in causal mediation analyses reviewed herein will provide ecological researchers with approaches to clearly communicate the causal assumptions necessary for valid inferences and examine potential violations to these assumptions, which will enable rigorous and reproducible explanations of intermediary processes in ecology.
- Can information theory be used to understand neural processing? Yes, but assumptions have to be made about the nature of neural signaling. The traditional view is that the individual neural spike is an all-or-none phenomenon, which allows neural spikes to be viewed as discrete, binary pulses, similar in kind to the way digital computers store and transmit digital representations. Under this assumption, the tools of information theory can be used to derive results about the properties of neural signals. However, new results from neuroscience demonstrate that the precise shape of the individual spike is functionally significant, thus violating the assumption that spikes can always be treated as a binary pulse. Instead, spikes must sometimes be viewed as a continuous signal. Fortunately, information-theoretic tools exist for the study of continuous signals; unfortunately, their use in the continuous domain is very different from their use in the discrete domain, and not always well understood. Researchers interested in making precise claims about the nature of the information used, stored, and processed in neural systems must pay careful attention to these differences.

