Title: Near-Optimal A-B Testing
We consider the problem of A-B testing when the impact of the treatment is marred by a large number of covariates. Randomization can be highly inefficient in such settings, and thus we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the precision of our estimate of the treatment effect. Our main contribution is a tractable algorithm for this problem in the online setting, where subjects arrive, and must be assigned, sequentially, with covariates drawn from an elliptical distribution with finite second moment. We further characterize the gain in precision afforded by optimized allocations relative to randomized allocations, and show that this gain grows large as the number of covariates grows. Our dynamic optimization framework admits several generalizations that incorporate important operational constraints such as the consideration of selection bias, budgets on allocations, and endogenous stopping times. In a set of numerical experiments, we demonstrate that our method simultaneously offers better statistical efficiency and less selection bias than state-of-the-art competing biased coin designs.
Award ID(s):
1727239 1054034
NSF-PAR ID:
10152064
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Management Science
ISSN:
0025-1909
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Methodological advancements, including propensity score methods, have resulted in improved unbiased estimation of treatment effects from observational data. Traditionally, a “throw in the kitchen sink” approach has been used to select covariates for inclusion into the propensity score, but recent work shows that including unnecessary covariates can impact both the bias and statistical efficiency of propensity score estimators. In particular, the inclusion of covariates that impact exposure but not the outcome can inflate standard errors without improving bias, while the inclusion of covariates associated with the outcome but unrelated to exposure can improve precision. We propose the outcome-adaptive lasso for selecting appropriate covariates for inclusion in propensity score models to account for confounding bias while maintaining statistical efficiency. This proposed approach can perform variable selection in the presence of a large number of spurious covariates, that is, covariates unrelated to outcome or exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of outcome, while excluding other covariates. We illustrate covariate selection using the outcome-adaptive lasso, including comparison to alternative approaches, using simulated data and in a survey of patients using opioid therapy to manage chronic pain.

     
  2. In this work, we study the optimal design of two-armed clinical trials to maximize the accuracy of parameter estimation in a statistical model, where the interactions between patient covariates and treatment are explicitly incorporated to enable precision medication decisions. Such a modeling extension leads to significant complexities for the resulting optimization problems because they include optimization over design and covariates concurrently. We take a min-max optimization model and minimize (over design) the maximum (over population) variance of the estimated interaction effect between treatment and patient covariates. This results in a min-max bilevel mixed integer nonlinear programming problem, which is notably challenging to solve. To address this challenge, we introduce a surrogate optimization model by approximating the objective function, for which we propose two solution approaches. The first approach provides an exact solution based on reformulation and decomposition techniques. In the second approach, we provide a lower bound for the inner optimization problem and solve the outer optimization problem over the lower bound. We test our proposed algorithms with synthetic and real-world data sets and compare them with standard (re)randomization methods. Our numerical analysis suggests that the proposed approaches provide higher-quality solutions in terms of the variance of estimators and probability of correct selection. We also show the value of covariate information in precision medicine clinical trials by comparing our proposed approaches to an alternative optimal design approach that does not consider the interaction terms between covariates and treatment. Summary of Contribution: Precision medicine is the future of healthcare where treatment is prescribed based on each patient's information. 
Designing precision medicine clinical trials, which are the cornerstone of precision medicine, is extremely challenging because sample size is limited and patient information may be multidimensional. This work proposes a novel approach to optimally estimate the treatment effect for each patient type in a two-armed clinical trial by reducing the largest variance of personalized treatment effect. We use several statistical and optimization techniques to produce efficient solution methodologies. Results have the potential to save countless lives by transforming the design and implementation of future clinical trials to ensure the right treatments for the right patients. Doing so will reduce patient risks and reduce costs in the healthcare system. 
  3. Abstract

    We consider estimating average treatment effects (ATE) of a binary treatment in observational data when data‐driven variable selection is needed to select relevant covariates from a moderately large number of available covariates. To leverage covariates predictive of the outcome for efficiency gain while using regularization to fit a parametric propensity score (PS) model, we consider a dimension reduction of the covariates based on fitting both working PS and outcome models using adaptive LASSO. A novel PS estimator, the Double‐index Propensity Score (DiPS), is proposed, in which the treatment status is smoothed over the linear predictors for the covariates from both the initial working models. The ATE is estimated by using the DiPS in a normalized inverse probability weighting estimator, which is found to maintain double robustness and also local semiparametric efficiency with a fixed number of covariates p. Under misspecification of working models, the smoothing step leads to gains in efficiency and robustness over traditional doubly robust estimators. These results are extended to the case where p diverges with sample size and working models are sparse. Simulations show the benefits of the approach in finite samples. We illustrate the method by estimating the ATE of statins on colorectal cancer risk in an electronic medical record study and the effect of smoking on C‐reactive protein in the Framingham Offspring Study.

     
  4. Cause-and-effect relations are one of the most valuable types of knowledge sought after throughout the data-driven sciences since they translate into stable and generalizable explanations as well as efficient and robust decision-making capabilities. Inferring these relations from data, however, is a challenging task. Two of the most common barriers to this goal are known as confounding and selection biases. The former stems from the systematic bias introduced during the treatment assignment, while the latter comes from the systematic bias during the collection of units into the sample. In this paper, we consider the problem of identifiability of causal effects when both confounding and selection biases are simultaneously present. We first investigate the problem of identifiability when all the available data is biased. We prove that the algorithm proposed by [Bareinboim and Tian, 2015] is, in fact, complete, namely, whenever the algorithm returns a failure condition, no identifiability claim about the causal relation can be made by any other method. We then generalize this setting to when, in addition to the biased data, another piece of external data is available, without bias. It may be the case that a subset of the covariates could be measured without bias (e.g., from census). We examine the problem of identifiability when a combination of biased and unbiased data is available. We propose a new algorithm that subsumes the current state-of-the-art method based on the back-door criterion. 
  5. Abstract STUDY QUESTION Do daughters of older mothers have lower fecundability? SUMMARY ANSWER In this cohort study of North American pregnancy planners, there was virtually no association between maternal age ≥35 years and daughters’ fecundability. WHAT IS KNOWN ALREADY Despite suggestive evidence that daughters of older mothers may have lower fertility, only three retrospective studies have examined the association between maternal age and daughter’s fecundability. STUDY DESIGN, SIZE, DURATION Prospective cohort study of 6689 pregnancy planners enrolled between March 2016 and January 2020. PARTICIPANTS/MATERIALS, SETTING, METHODS Pregnancy Study Online (PRESTO) is an ongoing pre-conception cohort study of pregnancy planners (age, 21-45 years) from the USA and Canada. We estimated fecundability ratios (FR) for maternal age at the participant’s birth using multivariable proportional probabilities regression models. MAIN RESULTS AND THE ROLE OF CHANCE Daughters of mothers ≥30 years were less likely to have previous pregnancies (or pregnancy attempts) or risk factors for infertility, although they were more likely to report that their mother had experienced problems conceiving. The proportion of participants with prior unplanned pregnancies, a birth before age 21, ≥3 cycles of attempt at study entry or no follow-up was greater among daughters of mothers <25 years. Compared with maternal age 25–29 years, FRs (95% CI) for maternal age <20, 20–24, 30–34, and ≥35 were 0.72 (0.61, 0.84), 0.92 (0.85, 1.00), 1.08 (1.00, 1.17), and 1.00 (0.89, 1.12), respectively. LIMITATIONS, REASONS FOR CAUTION Although the examined covariates did not meaningfully affect the associations, we had limited information on the participants’ mother. Differences by maternal age in reproductive history, infertility risk factors and loss to follow-up suggest that selection bias may partly explain our results. 
WIDER IMPLICATIONS OF THE FINDINGS Our finding that maternal age 35 years or older was not associated with daughter’s fecundability is reassuring, considering the trend towards delayed childbirth. However, having been born to a young mother may be a marker of low fecundability among pregnancy planners. STUDY FUNDING/COMPETING INTEREST(S) PRESTO was funded by NICHD Grants (R21-HD072326 and R01-HD086742) and has received in-kind donations from Swiss Precision Diagnostics, FertilityFriend.com, Kindara.com, and Sandstone Diagnostics. Dr Wise is a fibroid consultant for AbbVie, Inc. TRIAL REGISTRATION NUMBER n/a 