
This content will become publicly available on August 1, 2023

Title: The Future Strikes Back: Using Future Treatments to Detect and Reduce Hidden Bias
Conventional advice discourages controlling for postoutcome variables in regression analysis. By contrast, we show that controlling for commonly available postoutcome (i.e., future) values of the treatment variable can help detect, reduce, and even remove omitted variable bias (unobserved confounding). The premise is that the same unobserved confounder that affects treatment also affects the future value of the treatment. Future treatments thus proxy for the unmeasured confounder, and researchers can exploit these proxy measures productively. We establish several new results: Regarding a commonly assumed data-generating process involving future treatments, we (1) introduce a simple new approach and show that it strictly reduces bias, (2) elaborate on existing approaches and show that they can increase bias, (3) assess the relative merits of alternative approaches, and (4) analyze true state dependence and selection as key challenges. (5) Importantly, we also introduce a new nonparametric test that uses future treatments to detect hidden bias even when future-treatment estimation fails to reduce bias. We illustrate these results empirically with an analysis of the effect of parental income on children’s educational attainment.
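The proxy argument in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: a linear data-generating process with a single unobserved confounder u, unit-variance Gaussian noise, no true state dependence, and no effect of the outcome on the future treatment. The names simulate, t_now, and t_future are hypothetical. Under these assumptions the naive regression slope converges to effect + 1/2, while the slope that also controls for the future treatment converges to effect + 1/3, so the bias shrinks but does not vanish.

```python
import random


def cov(x, y):
    """Population-style covariance of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def simulate(n=50_000, effect=1.0, seed=0):
    """Return (naive, adjusted) OLS estimates of the treatment effect.

    Assumed DGP (illustrative only, not the paper's):
      u        ~ N(0, 1)                    unobserved confounder
      t_now    = u + noise                  current treatment
      t_future = u + noise                  future treatment (proxy for u)
      y        = effect * t_now + u + noise outcome
    """
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    t_now = [ui + rng.gauss(0, 1) for ui in u]
    t_future = [ui + rng.gauss(0, 1) for ui in u]
    y = [effect * t + ui + rng.gauss(0, 1) for t, ui in zip(t_now, u)]

    # Naive estimate: regress y on t_now only.
    naive = cov(t_now, y) / cov(t_now, t_now)

    # Adjusted estimate: regress y on t_now and t_future,
    # solving the 2x2 normal equations for the t_now coefficient.
    s11 = cov(t_now, t_now)
    s22 = cov(t_future, t_future)
    s12 = cov(t_now, t_future)
    s1y = cov(t_now, y)
    s2y = cov(t_future, y)
    adjusted = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"naive estimate:    {naive:.3f}")
    print(f"adjusted estimate: {adjusted:.3f}")
```

Adding state dependence (t_future depending on t_now) or selection (t_future depending on y) to this sketch would be the natural way to explore the challenges named in point (4) of the abstract, where the bias reduction is no longer guaranteed.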
Award ID(s):
Publication Date:
Journal Name:
Sociological Methods & Research
Page Range or eLocation-ID:
1014 to 1051
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Unobserved confounding presents a major threat to causal inference in observational studies. Recently, several authors have suggested that this problem could be overcome in a shared confounding setting where multiple treatments are independent given a common latent confounder. It has been shown that under a linear Gaussian model for the treatments, the causal effect is not identifiable without parametric assumptions on the outcome model. In this note, we show that the causal effect is indeed identifiable if we assume a general binary choice model for the outcome with a non-probit link. Our identification approach is based on the incongruence between Gaussianity of the treatments and latent confounder and non-Gaussianity of a latent outcome variable. We further develop a two-step likelihood-based estimation procedure.
  2. Abstract Motivation

    Sketching is now widely used in bioinformatics to reduce data size and increase data processing speed. Sketching approaches entice with improved scalability but also carry the danger of decreased accuracy and added bias. In this article, we investigate the minimizer sketch and its use to estimate the Jaccard similarity between two sequences.

    Results


    We show that the minimizer Jaccard estimator is biased and inconsistent, which means that the expected difference (i.e. the bias) between the estimator and the true value is not zero, even in the limit as the lengths of the sequences grow. We derive an analytical formula for the bias as a function of how the shared k-mers are laid out along the sequences. We show both theoretically and empirically that there are families of sequences where the bias can be substantial (e.g. the true Jaccard can be more than double the estimate). Finally, we demonstrate that this bias affects the accuracy of the widely used mashmap read mapping tool.

    Availability and implementation

    Scripts to reproduce our experiments are available at

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  3. One fundamental problem in causality learning is to estimate the causal effects of one or multiple treatments (e.g., medicines in a prescription) on an important outcome (e.g., cure of a disease). One major challenge of causal effect estimation is the existence of unobserved confounders -- the unobserved variables that affect both the treatments and the outcome. Recent studies have shown that by jointly modeling how instances are assigned to different treatments, the patterns of unobserved confounders can be captured through their learned latent representations. However, the interpretability of the representations in these works is limited. In this paper, we approach the multi-cause effect estimation problem from a new perspective by learning disentangled representations of confounders. The disentangled representations not only facilitate treatment effect estimation but also strengthen the understanding of the causality learning process. Experimental results on both synthetic and real-world datasets show the superiority of our proposed framework from different aspects.

  4. Abstract

    Spine flexibility creates one of the most significant challenges to proper positioning in radiation therapy of head and neck cancers. Even though existing immobilization techniques can reduce the positioning uncertainty, residual errors (2–3 mm along the cervical spine) cannot be mitigated by single translation-based approaches. Here, we introduce a fully radiotherapy-compatible electro-mechanical robotic system capable of positioning a patient’s head with submillimeter accuracy within clinically acceptable spatial constraints. Key mechanical components, designed by finite element analysis, are fabricated with 3D printing, and a cyclic loading test of the printed materials demonstrates high mechanical robustness. Measured attenuation of most printed components is lower than analytic estimations, and radiographic imaging shows no visible artifacts, implying full radio-compatibility. Positioning accuracy, evaluated with an anthropomorphic skeletal phantom and an optical tracking system, shows a minimal residual error (0.7 ± 0.3 mm). The device also offers an accurate assessment of the post-correction error of aligning individual regions when the head and body are individually positioned. Collectively, the radiotherapy-compatible robotic system enables multi-landmark setup to align the head and body independently and accurately for radiation treatment, which will significantly reduce the need for large margins in the lower neck.

  5. ABSTRACT The fecal indicator bacterial species Escherichia coli is an important measure of water quality and a leading cause of impaired surface waters. We investigated the impact of the filter-feeding metazooplankton Daphnia magna on the inactivation of E. coli. The E. coli clearance rates of these daphnids were calculated from a series of batch experiments conducted under variable environmental conditions. Batch system experiments of 24 to 48 h in duration were completed to test the impacts of bacterial concentration, organism density, temperature, and water type. The maximum clearance rate for adult D. magna organisms was 2 ml h⁻¹ organism⁻¹. Less than 5% of E. coli removed from water by daphnids was recoverable from excretions. Sorption of E. coli on daphnid carapaces was not observed. As a comparison, the clearance rates of the freshwater rotifer Brachionus calyciflorus were also calculated for select conditions. The maximum clearance rate for B. calyciflorus was 6 × 10⁻⁴ ml h⁻¹ organism⁻¹. This research furthers our understanding of the impacts of metazooplankton predation on E. coli inactivation and the effects of environmental variables on filter feeding. Based on our results, metazooplankton can play an important role in the reduction of E. coli in natural treatment systems under environmentally relevant conditions. IMPORTANCE Escherichia coli is a fecal indicator bacterial species monitored by the U.S. Environmental Protection Agency to assess microbial water quality. Due to the potential human health implications linked to high levels of E. coli, it is important to understand the inactivation or reduction mechanisms in surface waters. Our research examines the capacities of two types of widespread filter-feeding freshwater metazooplankton, Daphnia magna and Brachionus calyciflorus, to reduce E. coli concentrations. We examine the impacts of different environmentally relevant conditions on the clearance rates. Our results contribute to a better understanding of the importance of metazooplankton in controlling E. coli concentrations and what conditions will reduce or increase grazing. These results provide baseline data to support future efforts to develop a quantitative model relating zooplankton uptake rates to relevant environmental variables.