Title: Semiparametric Counterfactual Density Estimation
Causal effects are often characterized with averages, which can give an incomplete picture of the underlying counterfactual distributions. Here we consider estimating the entire counterfactual density and generic functionals thereof. We focus on two kinds of target parameters. The first is a density approximation, defined by a projection onto a finite-dimensional model using a generalized distance metric, which includes f-divergences as well as Lp norms. The second is the distance between counterfactual densities, which can be used as a more nuanced effect measure than the mean difference, and as a tool for model selection. We study nonparametric efficiency bounds for these targets, giving results for smooth but otherwise generic models and distances. Importantly, we show how these bounds connect to means of particular non-trivial functions of counterfactuals, linking the problems of density and mean estimation. We go on to propose doubly robust-style estimators for the density approximations and distances, and study their rates of convergence, showing they can be optimally efficient in large nonparametric models. We also give analogous methods for model selection and aggregation, when many models may be available and of interest. Our results all hold for generic models and distances, but throughout we highlight what happens for particular choices, such as L2 projections on linear models, and KL projections on exponential families. Finally, we illustrate by estimating the density of CD4 count among patients with HIV, had all been treated with combination therapy versus zidovudine alone, as well as a density effect. Our results suggest combination therapy may have increased CD4 count most for high-risk patients. Our methods are implemented in the freely available R package npcausal on GitHub.
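The link between density projection and mean estimation mentioned in the abstract can be illustrated in one simple special case. The following is a minimal Python sketch, not the paper's implementation and not the npcausal API. It assumes an orthonormal cosine basis b on [0, 1], so the L2 projection of the counterfactual density onto the linear span of b has coefficients equal to the counterfactual mean E[b(Y^a)]. It also assumes a single binary covariate, so both nuisance functions can be estimated by stratum averages. All function names and the simulated data are hypothetical.

```python
import numpy as np


def cosine_basis(y, k):
    """Orthonormal cosine basis on [0, 1]: b_0 = 1, b_j = sqrt(2) cos(pi j y)."""
    y = np.asarray(y, dtype=float)
    j = np.arange(k)
    b = np.sqrt(2.0) * np.cos(np.pi * j * y[:, None])
    b[:, 0] = 1.0
    return b  # shape (n, k)


def dr_l2_projection(y, a, x, arm, k):
    """Doubly robust-style estimate of beta = E[b(Y^arm)].

    Assumes x is binary and every (x, arm) cell is non-empty, so the
    propensity score and the outcome regression of b(Y) can both be
    estimated by simple stratum averages (an illustrative choice).
    """
    y, a, x = (np.asarray(v) for v in (y, a, x))
    b = cosine_basis(y, k)
    in_arm = a == arm
    pi_hat = np.empty(len(y))   # estimated P(A = arm | X)
    mu_hat = np.empty_like(b)   # estimated E[b(Y) | X, A = arm]
    for level in (0, 1):
        s = x == level
        pi_hat[s] = in_arm[s].mean()
        mu_hat[s] = b[s & in_arm].mean(axis=0)
    # Inverse-probability-weighted residual plus outcome regression.
    w = in_arm / pi_hat
    phi = w[:, None] * (b - mu_hat) + mu_hat
    return phi.mean(axis=0)


def projected_density(grid, beta):
    """Evaluate the fitted linear-model density beta^T b(y) on a grid."""
    return cosine_basis(grid, len(beta)) @ beta


def simulate(n, rng):
    """Hypothetical data: binary confounder x, treatment a, outcome y in [0, 1]."""
    x = rng.integers(0, 2, size=n)
    a = rng.binomial(1, 0.3 + 0.4 * x)
    y = rng.beta(2 + 2 * a, 2 + x)
    return y, a, x


rng = np.random.default_rng(0)
y, a, x = simulate(4000, rng)
beta_treated = dr_l2_projection(y, a, x, arm=1, k=4)
beta_control = dr_l2_projection(y, a, x, arm=0, k=4)
# With an orthonormal basis, the squared L2 distance between the two
# projected densities is the squared Euclidean distance of the coefficients.
l2_distance_sq = float(np.sum((beta_treated - beta_control) ** 2))
```

Because the basis is orthonormal, no Gram matrix needs to be inverted. With a non-orthonormal basis the coefficients would instead solve a linear system involving the integral of b b^T. A fitted linear-model density is not guaranteed to be nonnegative everywhere, which is a known limitation of L2 projections onto linear models.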
Award ID(s):
2113684 1763734
PAR ID:
10430444
Journal Name:
Biometrika
ISSN:
0006-3444
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies nonparametric identification in market-level demand models for differentiated products with heterogeneous consumers. We consider a general class of models that allows for the individual-specific coefficients to vary continuously across the population and give conditions under which the density of these coefficients, and hence also functionals such as the fractions of individuals who benefit from a counterfactual intervention, is identified.
  2. We propose a framework for analyzing the sensitivity of counterfactuals to parametric assumptions about the distribution of latent variables in structural models. In particular, we derive bounds on counterfactuals as the distribution of latent variables spans nonparametric neighborhoods of a given parametric specification while other “structural” features of the model are maintained. Our approach recasts the infinite‐dimensional problem of optimizing the counterfactual with respect to the distribution of latent variables (subject to model constraints) as a finite‐dimensional convex program. We also develop an MPEC version of our method to further simplify computation in models with endogenous parameters (e.g., value functions) defined by equilibrium constraints. We propose plug‐in estimators of the bounds and two methods for inference. We also show that our bounds converge to the sharp nonparametric bounds on counterfactuals as the neighborhood size becomes large. To illustrate the broad applicability of our procedure, we present empirical applications to matching models with transferable utility and dynamic discrete choice models. 
  3. In this paper, we derive parameterized Chernoff bounds and show their applications for simplifying the analysis of some well-known probabilistic algorithms and data structures. The parameterized Chernoff bounds we provide give probability bounds that are powers of two, with a clean formulation of the relation between the constant in the exponent and the relative distance from the mean. In addition, we provide new simplified analyses with these bounds for hash tables, randomized routing, and a simplified, non-recursive adaptation of the Floyd-Rivest selection algorithm. 
  4. Coarse Structural Nested Mean Models (SNMMs, Robins (2000)) and G-estimation can be used to estimate the causal effect of a time-varying treatment from longitudinal observational studies. However, they rely on an untestable assumption of no unmeasured confounding. In the presence of unmeasured confounders, the unobserved potential outcomes are not missing at random, and standard G-estimation leads to biased effect estimates. To remedy this, we investigate the sensitivity of G-estimators of coarse SNMMs to unmeasured confounding, assuming a nonidentifiable bias function which quantifies the impact of unmeasured confounding on the average potential outcome. We present adjusted G-estimators of coarse SNMM parameters and prove their consistency, under the bias modeling for unmeasured confounding. We apply this to a sensitivity analysis for the effect of the ART initiation time on the mean CD4 count at year 2 after infection in HIV-positive patients, based on the prospective Acute and Early Disease Research Program. 
  5. The development of point-of-care, cost-effective, and easy-to-use assays for the accurate counting of CD4+ T cells remains an important focus for HIV-1 disease management. The CD4+ T cell count provides an indication regarding the overall success of HIV-1 treatments. The CD4+ T count information is equally important for both resource-constrained regions and areas with extensive resources. Hospitals and other allied facilities may be overwhelmed by epidemics or other disasters. An assay for a physician’s office or other home-based setting is becoming increasingly popular. We have developed a technology for the rapid quantification of CD4+ T cells. A double antibody selection process, utilizing anti-CD4 and anti-CD3 antibodies, is tested and provides a high specificity. The assay utilizes a microfluidic chip coated with the anti-CD3 antibody, having an improved antibody avidity. As a result of enhanced binding, a higher flow rate can be applied that enables an improved channel washing to reduce non-specific bindings. A wide-field optical imaging system is also developed that provides the rapid quantification of cells. The designed optical setup is portable and low-cost. An ImageJ-based program is developed for the automatic counting of CD4+ T cells. We have successfully isolated and counted CD4+ T cells with high specificity and efficiency greater than 90%. 