Generalized Adjustment Under Confounding and Selection Biases
Selection and confounding biases are the two most common impediments to the applicability of causal inference methods in large-scale settings. We generalize the notion of backdoor adjustment to account for both biases and to leverage external data that may be available without selection bias (e.g., census data). We introduce the notion of an adjustment pair and present complete graphical conditions for identifying causal effects by adjustment. We further design an algorithm for listing all admissible adjustment pairs with polynomial delay, which is useful for researchers interested in evaluating certain properties of some admissible pairs but not all (common properties include cost, variance, and feasibility of measurement). Finally, we describe a statistical estimation procedure that can be performed once a set is known to be admissible, which entails distinct finite-sample challenges.
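The adjustment strategy described in the abstract can be illustrated numerically. The sketch below is not the paper's estimator; it is a minimal, hypothetical example assuming a single binary confounder Z that also drives selection S. The conditional P(y | x, z) is estimated from the selection-biased sample, while the marginal P(z) comes from an unbiased external source (here, the full simulated population stands in for, say, census data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process: Z confounds X -> Y,
# and selection S depends on Z, so P(. | S=1) is biased.
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)
s = rng.binomial(1, 0.9 - 0.6 * z)            # preferential selection on Z

sel = s == 1                                   # the biased sample we observe

def p_y1_given_xz(xv, zv, mask):
    """P(Y=1 | X=xv, Z=zv) estimated within the masked (biased) sample."""
    m = mask & (x == xv) & (z == zv)
    return y[m].mean()

# Adjustment combining both sources:
#   P(y | do(x)) = sum_z P(y | x, z, S=1) * P(z),
# with P(z) taken from the unbiased external source (the full data here).
def effect(xv):
    return sum(p_y1_given_xz(xv, zv, sel) * (z == zv).mean() for zv in (0, 1))

ate = effect(1) - effect(0)                    # true effect is 0.3 by design

# For contrast, the naive contrast on the biased sample is both
# confounded by Z and distorted by the selection mechanism:
naive = y[sel & (x == 1)].mean() - y[sel & (x == 0)].mean()
```

The key assumption baked into this toy example is that the Z-specific conditional of Y is unaffected by selection once we condition on X and Z, which is exactly the kind of property the paper's graphical conditions are designed to verify.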
Award ID(s): 1704352
NSF-PAR ID: 10060356
Journal Name: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI)
Sponsoring Org: National Science Foundation
More Like this


Generalizing causal effects from a controlled experiment to settings beyond the particular study population is arguably one of the central tasks in empirical research. While proper design and careful execution of an experiment support, under mild conditions, the validity of inferences about the population in which it was conducted, two challenges make the extrapolation step to different populations more involved, namely, transportability and sampling selection bias. The former concerns disparities in the distributions and causal mechanisms between the domain (i.e., setting, population, environment) where the experiment is conducted and where the inferences are intended; the latter concerns distortions in the sample's proportions due to preferential selection of units into the study. In this paper, we investigate the assumptions and machinery necessary for using covariate adjustment to correct for the biases generated by both of these problems and to generalize experimental data to infer causal effects in a new domain. We derive complete graphical conditions to determine whether a set of covariates is admissible for adjustment in this new setting. Building on the graphical characterization, we develop an efficient algorithm that enumerates all admissible sets with a polynomial-delay guarantee; this can be useful when some variables are preferred over others due to differences in cost or amenability to measurement.
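The covariate-adjustment idea behind this related work can also be sketched numerically. The example below is hypothetical and not taken from the paper: it assumes a single binary covariate Z whose distribution differs between the experimental (source) domain and the target domain, while the Z-specific experimental response is invariant across domains:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical setup: an experiment is run in a source domain where
# Z ~ Bernoulli(0.3); in the target domain Z ~ Bernoulli(0.7).
# The Z-specific effect of X on Y is assumed invariant across domains.
z_src = rng.binomial(1, 0.3, n)
x_src = rng.binomial(1, 0.5, n)                # randomized treatment
y_src = rng.binomial(1, 0.1 + 0.2 * x_src + 0.5 * z_src)

p_z1_target = 0.7                              # e.g., from a target-domain survey

def p_y1(xv, zv):
    """Experimental P(Y=1 | do(X=xv), Z=zv), estimated in the source domain."""
    m = (x_src == xv) & (z_src == zv)
    return y_src[m].mean()

# Transport by adjustment: P*(y | do(x)) = sum_z P(y | do(x), z) * P*(z),
# reweighting the source-domain strata by the target-domain covariate law.
def target_effect(xv):
    return p_y1(xv, 1) * p_z1_target + p_y1(xv, 0) * (1 - p_z1_target)

ate_target = target_effect(1) - target_effect(0)   # true effect is 0.2 by design
```

Whether such a reweighting is licensed in a given problem is precisely what the paper's admissibility conditions decide; here it holds by construction.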


Cause-and-effect relations are one of the most valuable types of knowledge sought throughout the data-driven sciences, since they translate into stable and generalizable explanations as well as efficient and robust decision-making capabilities. Inferring these relations from data, however, is a challenging task. Two of the most common barriers to this goal are known as confounding and selection biases. The former stems from systematic bias introduced during treatment assignment, while the latter comes from systematic bias during the collection of units into the sample. In this paper, we consider the problem of identifiability of causal effects when both confounding and selection biases are simultaneously present. We first investigate the problem of identifiability when all the available data is biased. We prove that the algorithm proposed by [Bareinboim and Tian, 2015] is, in fact, complete; namely, whenever the algorithm returns a failure condition, no identifiability claim about the causal relation can be made by any other method. We then generalize this setting to the case where, in addition to the biased data, another piece of external data is available without bias; it may be, for instance, that a subset of the covariates can be measured without bias (e.g., from the census). We examine the problem of identifiability when a combination of biased and unbiased data is available, and propose a new algorithm that subsumes the current state-of-the-art method based on the backdoor criterion.
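The graphical conditions underlying this line of work reduce to d-separation tests in a causal diagram augmented with a selection node S. The following generic, from-scratch d-separation check (via moralization of the ancestral graph) is applied to a small hypothetical diagram W → X → Y, W → Y, W → S; it illustrates the kind of test involved, not the paper's algorithm:

```python
from itertools import combinations

def ancestors(dag, nodes):
    """All nodes with a directed path into `nodes`, plus `nodes` itself."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p, children in dag.items():
            if v in children and p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, xs, ys, zs):
    """Check X _||_ Y | Z by moralizing the ancestral graph of X u Y u Z."""
    anc = ancestors(dag, set(xs) | set(ys) | set(zs))
    und = {v: set() for v in anc}
    for p, children in dag.items():           # keep directed edges (undirected)
        if p not in anc:
            continue
        for c in children:
            if c in anc:
                und[p].add(c)
                und[c].add(p)
    for c in anc:                             # "marry" co-parents of each node
        parents = [p for p in anc if c in dag.get(p, ())]
        for a, b in combinations(parents, 2):
            und[a].add(b)
            und[b].add(a)
    frontier, seen = set(xs), set(xs)         # reachability avoiding Z
    while frontier:
        nxt = set()
        for v in frontier:
            for w in und[v]:
                if w not in seen and w not in zs:
                    seen.add(w)
                    nxt.add(w)
        frontier = nxt
    return not (seen & set(ys))

# Hypothetical diagram: W confounds X and Y, and drives selection S.
dag = {'W': ['X', 'Y', 'S'], 'X': ['Y'], 'Y': [], 'S': []}

# Conditioning on {W, X} renders Y independent of the selection mechanism...
ok = d_separated(dag, {'S'}, {'Y'}, {'W', 'X'})
# ...but {X} alone leaves the path S <- W -> Y open.
bad = d_separated(dag, {'S'}, {'Y'}, {'X'})
```

Tests of this form (together with the usual backdoor condition on W) are the building blocks that criteria for recovering from selection bias compose; the complete conditions in the papers above involve additional requirements beyond this single independence.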
