skip to main content

Title: The Future Strikes Back: Using Future Treatments to Detect and Reduce Hidden Bias
Conventional advice discourages controlling for postoutcome variables in regression analysis. By contrast, we show that controlling for commonly available postoutcome (i.e., future) values of the treatment variable can help detect, reduce, and even remove omitted variable bias (unobserved confounding). The premise is that the same unobserved confounder that affects treatment also affects the future value of the treatment. Future treatments thus proxy for the unmeasured confounder, and researchers can exploit these proxy measures productively. We establish several new results: Regarding a commonly assumed data-generating process involving future treatments, we (1) introduce a simple new approach and show that it strictly reduces bias, (2) elaborate on existing approaches and show that they can increase bias, (3) assess the relative merits of alternative approaches, and (4) analyze true state dependence and selection as key challenges. (5) Importantly, we also introduce a new nonparametric test that uses future treatments to detect hidden bias even when future-treatment estimation fails to reduce bias. We illustrate these results empirically with an analysis of the effect of parental income on children’s educational attainment.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
Sociological Methods & Research
Page Range / eLocation ID:
1014 to 1051
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Unobserved confounding presents a major threat to causal inference in observational studies. Recently, several authors have suggested that this problem could be overcome in a shared confounding setting where multiple treatments are independent given a common latent confounder. It has been shown that under a linear Gaussian model for the treatments, the causal effect is not identifiable without parametric assumptions on the outcome model. In this note, we show that the causal effect is indeed identifiable if we assume a general binary choice model for the outcome with a non-probit link. Our identification approach is based on the incongruence between Gaussianity of the treatments and latent confounder and non-Gaussianity of a latent outcome variable. We further develop a two-step likelihood-based estimation procedure. 
    more » « less
  2. Abstract

    We present a simulation study and application that shows inclusion of binary proxy variables related to binary unmeasured confounders improves the estimate of a related treatment effect in binary logistic regression. The simulation study included 60,000 randomly generated parameter scenarios of sample size 10,000 across six different simulation structures. We assessed bias by comparing the probability of finding the expected treatment effect relative to the modeled treatment effect with and without the proxy variable. Inclusion of a proxy variable in the logistic regression model significantly reduced the bias of the treatment or exposure effect when compared to logistic regression without the proxy variable. Including proxy variables in the logistic regression model improves the estimation of the treatment effect at weak, moderate, and strong association with unmeasured confounders and the outcome, treatment, or proxy variables. Comparative advantages held for weakly and strongly collapsible situations, as the number of unmeasured confounders increased, and as the number of proxy variables adjusted for increased.

    more » « less
  3. Networked observational data presents new opportunities for learning individual causal effects, which plays an indispensable role in decision making. Such data poses the challenge of confounding bias. Previous work presents two desiderata to handle confounding bias. On the treatment group level, we aim to balance the distributions of confounder representations. On the individual level, it is desirable to capture patterns of hidden confounders that predict treatment assignments. Existing methods show the potential of utilizing network information to handle confounding bias, but they only try to satisfy one of the two desiderata. This is because the two desiderata seem to contradict each other. When the two distributions of confounder representations are highly overlapped, then we confront the undiscriminating problem between the treated and the controlled. In this work, we formulate the two desiderata as a minimax game. We propose IGNITE that learns representations of confounders from networked observational data, which is trained by a minimax game to achieve the two desiderata. Experiments verify the efficacy of IGNITE on two datasets under various settings.

    more » « less
  4. Recommender systems may be confounded by various types of confounding factors (also called confounders) that may lead to inaccurate recommendations and sacrificed recommendation performance. Current approaches to solving the problem usually design each specific model for each specific confounder. However, real-world systems may include a huge number of confounders and thus designing each specific model for each specific confounder could be unrealistic. More importantly, except for those “explicit confounders” that experts can manually identify and process such as item’s position in the ranking list, there are also many “latent confounders” that are beyond the imagination of experts. For example, users’ rating on a song may depend on their current mood or the current weather, and users’ preference on ice creams may depend on the air temperature. Such latent confounders may be unobservable in the recorded training data. To solve the problem, we propose Deconfounded Causal Collaborative Filtering (DCCF). We first frame user behaviors with unobserved confounders into a causal graph, and then we design a front-door adjustment model carefully fused with machine learning to deconfound the influence of unobserved confounders. Experiments on real-world datasets show that our method is able to deconfound unobserved confounders to achieve better recommendation performance. 
    more » « less
  5. null (Ed.)

    One fundamental problem in causality learning is to estimate the causal effects of one or multiple treatments (e.g., medicines in the prescription) on an important outcome (e.g., cure of a disease). One major challenge of causal effect estimation is the existence of unobserved confounders -- the unobserved variables that affect both the treatments and the outcome. Recent studies have shown that by modeling how instances are assigned with different treatments together, the patterns of unobserved confounders can be captured through their learned latent representations. However, the interpretability of the representations in these works is limited. In this paper, we focus on the multi-cause effect estimation problem from a new perspective by learning disentangled representations of confounders. The disentangled representations not only facilitate the treatment effect estimation but also strengthen the understanding of causality learning process. Experimental results on both synthetic and real-world datasets show the superiority of our proposed framework from different aspects.

    more » « less