Title: The Future Strikes Back: Using Future Treatments to Detect and Reduce Hidden Bias
Conventional advice discourages controlling for postoutcome variables in regression analysis. By contrast, we show that controlling for commonly available postoutcome (i.e., future) values of the treatment variable can help detect, reduce, and even remove omitted variable bias (unobserved confounding). The premise is that the same unobserved confounder that affects treatment also affects the future value of the treatment. Future treatments thus proxy for the unmeasured confounder, and researchers can exploit these proxy measures productively. We establish several new results: Regarding a commonly assumed data-generating process involving future treatments, we (1) introduce a simple new approach and show that it strictly reduces bias, (2) elaborate on existing approaches and show that they can increase bias, (3) assess the relative merits of alternative approaches, and (4) analyze true state dependence and selection as key challenges. (5) Importantly, we also introduce a new nonparametric test that uses future treatments to detect hidden bias even when future-treatment estimation fails to reduce bias. We illustrate these results empirically with an analysis of the effect of parental income on children’s educational attainment.
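The premise above can be illustrated with a toy simulation. This is a minimal sketch under assumed conditions, not the authors' estimator or their data-generating process: the linear model, the unit variances, the parameter values `beta` and `gamma`, and all function names are hypothetical choices made for illustration. The sketch also assumes away the challenges the abstract names, so the future treatment here is affected by neither the current treatment (no true state dependence) nor the outcome (no selection).

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def simulate(n=50_000, beta=1.0, gamma=1.0, seed=0):
    """Draw (y, t1, t2) from an assumed linear model with a hidden confounder u."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]            # unobserved confounder
    t1 = [ui + rng.gauss(0, 1) for ui in u]            # current treatment
    y = [beta * a + gamma * ui + rng.gauss(0, 1)       # outcome
         for a, ui in zip(t1, u)]
    t2 = [ui + rng.gauss(0, 1) for ui in u]            # future treatment, driven by u only
    return y, t1, t2


def naive_slope(y, t1):
    """OLS slope of y on t1 alone (confounded by u)."""
    return cov(t1, y) / cov(t1, t1)


def slope_controlling_for_future(y, t1, t2):
    """Coefficient on t1 in the OLS regression of y on t1 and t2 (with intercept)."""
    s11, s22, s12 = cov(t1, t1), cov(t2, t2), cov(t1, t2)
    s1y, s2y = cov(t1, y), cov(t2, y)
    # Solve the 2x2 normal equations for the t1 coefficient.
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


BETA = 1.0  # true effect used in the simulation
y, t1, t2 = simulate(beta=BETA)
b_naive = naive_slope(y, t1)
b_future = slope_controlling_for_future(y, t1, t2)
print(f"true effect:             {BETA:.3f}")
print(f"naive estimate:          {b_naive:.3f}")
print(f"controlling for future:  {b_future:.3f}")
```

Under these assumed settings the population value of the naive slope is beta + gamma/2 and the slope after adding the future treatment is beta + gamma/3, so the proxy shrinks the bias without removing it. Once state dependence or selection is introduced, this ordering is no longer guaranteed.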
Award ID(s):
2042875
NSF-PAR ID:
10348443
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Sociological Methods & Research
Volume:
51
Issue:
3
ISSN:
0049-1241
Page Range / eLocation ID:
1014 to 1051
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we study the optimal design of two-armed clinical trials to maximize the accuracy of parameter estimation in a statistical model, where the interaction between patient covariates and treatment are explicitly incorporated to enable precision medication decisions. Such a modeling extension leads to significant complexities for the produced optimization problems because they include optimization over design and covariates concurrently. We take a min-max optimization model and minimize (over design) the maximum (over population) variance of the estimated interaction effect between treatment and patient covariates. This results in a min-max bilevel mixed integer nonlinear programming problem, which is notably challenging to solve. To address this challenge, we introduce a surrogate optimization model by approximating the objective function, for which we propose two solution approaches. The first approach provides an exact solution based on reformulation and decomposition techniques. In the second approach, we provide a lower bound for the inner optimization problem and solve the outer optimization problem over the lower bound. We test our proposed algorithms with synthetic and real-world data sets and compare them with standard (re)randomization methods. Our numerical analysis suggests that the proposed approaches provide higher-quality solutions in terms of the variance of estimators and probability of correct selection. We also show the value of covariate information in precision medicine clinical trials by comparing our proposed approaches to an alternative optimal design approach that does not consider the interaction terms between covariates and treatment. Summary of Contribution: Precision medicine is the future of healthcare where treatment is prescribed based on each patient information. 
Designing precision medicine clinical trials, which are the cornerstone of precision medicine, is extremely challenging because sample size is limited and patient information may be multidimensional. This work proposes a novel approach to optimally estimate the treatment effect for each patient type in a two-armed clinical trial by reducing the largest variance of personalized treatment effect. We use several statistical and optimization techniques to produce efficient solution methodologies. Results have the potential to save countless lives by transforming the design and implementation of future clinical trials to ensure the right treatments for the right patients. Doing so will reduce patient risks and reduce costs in the healthcare system. 
  2. Phytophthora cinnamomi, also known as root rot, is an oomycete that is particularly damaging to the plant world. Infecting the root of plants, Phytophthora cinnamomi inhibits water uptake in plants, leading to increased rates of plant mortality. Rhododendron species are not impervious to the infestation of root rot, so, as a popular plant among gardeners, decreasing susceptibility to and identification of Phytophthora cinnamomi is beneficial to plant longevity. In this study, phosphite treatment and soil microbial communities are used to potentially prevent root rot from infecting the eight tested Rhododendron species. It is hypothesized that the phosphite treatment will directly attack the oomycete, as well as improve the defense system of the plants themselves. Rhododendrons treated with the live soil microbiota are predicted to be less susceptible to root rot due to increased resilience to disease from the presence of soil biota, potentially including mutualists such as mycorrhizal fungi. Since Phytophthora cinnamomi primarily affects the roots of plants, it is difficult to detect without uprooting those suspected of being diseased, which causes unnecessary and potentially fatal stress on the plant. This is why we used color analysis software to find a link between root rot infection and leaf color. Since Phytophthora cinnamomi decreases water uptake, plants that are infected will begin to wilt, and their leaves will begin to change color. Discovering a significant link between leaf color in Rhododendron species and Phytophthora cinnamomi infection has given a new diagnostic measure that will cause significantly less stress to the plant and will lead to better plant longevity outcomes. Our data also suggests both preventative measures and treatment options for certain Rhododendron species infected with P. cinnamomi, through the use of a combination of phosphite treatments and live soil biota presence. 
Our results differ by species, which we further analyzed using specific leaf area measurements. Using these data, we were able to link our results to current theory, such as growth-defense tradeoffs and the implications of tolerance versus resistance. 
  3. Summary: Unobserved confounding presents a major threat to causal inference in observational studies. Recently, several authors have suggested that this problem could be overcome in a shared confounding setting where multiple treatments are independent given a common latent confounder. It has been shown that under a linear Gaussian model for the treatments, the causal effect is not identifiable without parametric assumptions on the outcome model. In this note, we show that the causal effect is indeed identifiable if we assume a general binary choice model for the outcome with a non-probit link. Our identification approach is based on the incongruence between Gaussianity of the treatments and latent confounder and non-Gaussianity of a latent outcome variable. We further develop a two-step likelihood-based estimation procedure. 
  4. Summary: Ambient noise tomography is a well-established tomographic imaging technique but the effect that spatially variable noise sources have on the measurements remains challenging to account for. Full waveform ambient noise inversion has emerged recently as a promising solution but is computationally challenging since even distant noise sources can have an influence on the interstation correlation functions and therefore requires a prohibitively large numerical domain, beyond that of the tomographic region of interest. We investigate a new strategy that allows us to reduce the simulation domain while still being able to account for distant contributions. To allow nearby numerical sources to account for distant true sources, we introduce correlated sources and generate a time-dependent effective source distribution at the boundary of a small region of interest that excites the correlation wavefield of a larger domain. In a series of 2-D numerical simulations, we demonstrate that the proposed methodology with correlated sources is able to successfully represent a far-field source that is simultaneously present with nearby sources and the methodology also successfully results in a robustly estimated noise source distribution. Furthermore, we show how beamforming results can be used as prior information regarding the azimuthal variation of the ambient noise sources in helping determine the far-field noise distribution. These experiments provide insight into how to reduce the computational cost needed to perform full waveform ambient noise inversion, which is key to turning it into a viable tomographic technique. In addition, the presented experiments may help reduce source-induced bias in time-dependent monitoring applications.
  5. Recommender systems may be confounded by various types of confounding factors (also called confounders) that may lead to inaccurate recommendations and sacrificed recommendation performance. Current approaches to solving the problem usually design each specific model for each specific confounder. However, real-world systems may include a huge number of confounders and thus designing each specific model for each specific confounder could be unrealistic. More importantly, except for those “explicit confounders” that experts can manually identify and process such as item’s position in the ranking list, there are also many “latent confounders” that are beyond the imagination of experts. For example, users’ rating on a song may depend on their current mood or the current weather, and users’ preference on ice creams may depend on the air temperature. Such latent confounders may be unobservable in the recorded training data. To solve the problem, we propose Deconfounded Causal Collaborative Filtering (DCCF). We first frame user behaviors with unobserved confounders into a causal graph, and then we design a front-door adjustment model carefully fused with machine learning to deconfound the influence of unobserved confounders. Experiments on real-world datasets show that our method is able to deconfound unobserved confounders to achieve better recommendation performance. 