


Title: Covariate Balancing Methods for Randomized Controlled Trials Are Not Adversarially Robust
The first step toward investigating the effectiveness of a treatment via a randomized trial is to split the population into control and treatment groups and then compare the average response of the treatment group, which receives the treatment, to that of the control group, which receives the placebo. To ensure that the difference between the two groups is caused only by the treatment, it is crucial that the control and treatment groups have similar statistics. Indeed, the validity and reliability of a trial are determined by the similarity of the two groups' statistics. Covariate balancing methods increase the similarity between the distributions of the two groups' covariates. However, often in practice, there are not enough samples to accurately estimate the groups' covariate distributions. In this article, we empirically show that covariate balancing with the standardized mean difference (SMD) balance measure, as well as Pocock and Simon's sequential treatment assignment method, is susceptible to worst-case treatment assignments. Worst-case treatment assignments are those admitted by the covariate balance measure but that result in the highest possible error in the estimated average treatment effect (ATE). We develop an adversarial attack to find an adversarial treatment assignment for any given trial, and we provide an index to measure how close a given trial is to the worst case. To this end, we provide an optimization-based algorithm, adversarial treatment assignment in treatment effect trials (ATASTREET), to find adversarial treatment assignments.
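For readers unfamiliar with the balance measure discussed above, the sketch below shows one common way to compute the standardized mean difference (SMD) for a single covariate under a candidate treatment assignment. It is a generic illustration only; the pooled-standard-deviation convention, variable names, and the 0.1 rule of thumb are assumptions, not the paper's implementation.

```python
import numpy as np

def standardized_mean_difference(x_treat, x_ctrl):
    """Absolute standardized mean difference for one covariate.

    Uses the pooled standard deviation of the two groups; other
    conventions (e.g., the control-group SD) are also in use.
    """
    x_treat = np.asarray(x_treat, dtype=float)
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_ctrl.var(ddof=1)) / 2.0)
    if pooled_sd == 0:
        return 0.0
    return abs(x_treat.mean() - x_ctrl.mean()) / pooled_sd

# Toy example: one covariate and one candidate treatment assignment.
rng = np.random.default_rng(0)
age = rng.normal(50, 10, size=100)          # covariate for 100 hypothetical subjects
assignment = rng.integers(0, 2, size=100)   # 1 = treatment, 0 = control
smd = standardized_mean_difference(age[assignment == 1], age[assignment == 0])
print(f"SMD = {smd:.3f}")  # assignments with SMD below ~0.1 are often deemed balanced
```

A point of the article is that many distinct assignments can pass such a threshold, and some of the admitted assignments can still yield large ATE estimation error.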
Award ID(s):
1937134 1842378 1911094 1838177 1730574
NSF-PAR ID:
10466282
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Neural Networks and Learning Systems
ISSN:
2162-237X
Page Range / eLocation ID:
1 to 13
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Interest in incorporating historical data into clinical trials has increased with the rising cost of conducting clinical trials. The intervention arm of the current trial typically requires prospective data to assess a novel treatment, which motivates borrowing historical control data that are commensurate in distribution with the current control data in order to increase the allocation ratio to the current intervention arm. Existing historical-control-borrowing adaptive designs adjust allocation ratios based on commensurability assessed through study-level summary statistics of the response, agnostic of the distributions of the trial subject characteristics in the current and historical trials. This can lead to distributional imbalance of the current trial subject characteristics across the treatment arms, as well as between the current control data and the borrowed historical control data. Such covariate imbalance may threaten the internal validity of the current trial by introducing confounding factors that affect study endpoints. In this article, we propose a Bayesian design that borrows historical controls and updates the treatment allocation ratios both covariate-adaptively and commensurately with the similarity between the current and historical control data, assessed in a covariate-dependent manner. We employ covariate-dependent discrepancy parameters that are allowed to grow with the sample size and propose a regularized local regression procedure for their estimation. The proposed design also permits the current and historical controls to be similar to varying degrees, depending on subject-level characteristics. We evaluate the proposed design extensively under settings derived from two placebo-controlled randomized trials on vertebral fracture risk in post-menopausal women.
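The design above borrows historical controls only to the extent that they resemble current controls, and shifts allocation toward the intervention arm accordingly. The following is a heavily simplified toy illustration of that borrow-then-rebalance idea; the Gaussian similarity weight, the covariate-mean distance, and the allocation rule are all stand-in assumptions, not the covariate-dependent discrepancy parameters or regularized local regression used in the actual design.

```python
import numpy as np

def toy_borrowing_weight(x_current_ctrl, x_historical_ctrl, scale=1.0):
    """Toy commensurability weight in [0, 1] from covariate-mean distance.

    Stand-in only: the real design estimates covariate-dependent
    discrepancy parameters via regularized local regression.
    """
    d = np.linalg.norm(x_current_ctrl.mean(axis=0) - x_historical_ctrl.mean(axis=0))
    return float(np.exp(-(d / scale) ** 2))

def toy_allocation_ratio(weight, n_hist, n_ctrl_current, base_ratio=1.0):
    """Shift allocation toward the intervention arm as the effective borrowed
    control information (weight * n_hist) grows relative to current controls."""
    effective_hist = weight * n_hist
    return base_ratio * (n_ctrl_current + effective_hist) / max(n_ctrl_current, 1)

rng = np.random.default_rng(1)
x_cur = rng.normal(0.0, 1.0, size=(40, 3))    # current control covariates (hypothetical)
x_hist = rng.normal(0.2, 1.0, size=(200, 3))  # historical control covariates (hypothetical)
w = toy_borrowing_weight(x_cur, x_hist)
print(f"borrowing weight = {w:.2f}, "
      f"intervention:control allocation ~ {toy_allocation_ratio(w, 200, 40):.1f}:1")
```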

     
  2. Abstract
Expert testimony varies in scientific quality, and jurors have a difficult time evaluating evidence quality (McAuliff et al., 2009). In the current study, we apply Fuzzy Trace Theory principles, examining whether visual and gist aids help jurors calibrate to the strength of scientific evidence. Additionally, we were interested in the role of jurors' individual differences in scientific reasoning skills in their understanding of case evidence. Contrary to our preregistered hypotheses, there was no effect of evidence condition or gist aid on evidence understanding. However, individual differences in jurors' numeracy skills predicted evidence understanding.

Summary: Poor-quality expert evidence is sometimes admitted into court (Smithburn, 2004). Jurors' calibration to evidence strength varies widely and is not robustly understood. For instance, previous research has established that jurors lack understanding of the role of control groups, confounds, and sample sizes in scientific research (McAuliff, Kovera, & Nunez, 2009; Mill, Gray, & Mandel, 1994). Still others have found that jurors can distinguish weak from strong evidence when the evidence is presented alone, yet not when it is simultaneously presented with case details (Smith, Bull, & Holliday, 2011). This research highlights the need to present evidence to jurors in a way they can understand. Fuzzy Trace Theory holds that people encode information in exact, verbatim representations and in "gist" representations, which capture the bottom-line meaning (Reyna & Brainerd, 1995). It is possible that presenting complex scientific evidence to people in verbatim form, or appealing to the gist of the information, may influence juror understanding of that evidence. Application of Fuzzy Trace Theory in the medical field has shown that gist representations help laypeople better understand the risks and benefits of medical treatment (Brust-Renck, Reyna, Wilhelms, & Lazar, 2016). Yet little research has applied Fuzzy Trace Theory to information comprehension and application within the context of a jury (cf. Reyna et al., 2015). Additionally, it is likely that jurors' individual characteristics, such as scientific reasoning abilities and cognitive tendencies, influence their ability to understand and apply complex scientific information (Coutinho, 2006).

Methods: The purpose of this study was to examine how jurors calibrate to the strength of scientific information, and whether individual difference variables and gist aids inspired by Fuzzy Trace Theory help jurors better understand complicated science of differing quality. We used a 2 (quality of scientific evidence: high vs. low) x 2 (decision aid to improve calibration: gist information vs. no gist information) between-subjects design. All hypotheses were preregistered on the Open Science Framework. Jury-eligible community members participated (430 jurors across 90 juries; Mage = 37.58, SD = 16.17; 58% female, 56.93% White). Each jury was randomly assigned to one of the four possible conditions. Participants individually filled out measures related to their scientific reasoning skills prior to watching a mock jury trial. The trial concerned an armed bank robbery and consisted of various pieces of testimony and evidence (e.g., an eyewitness testimony, a police lineup identification, and a sweatshirt found with the stolen bank money). The key piece of evidence was mitochondrial DNA (mtDNA) evidence collected from hair on the sweatshirt (materials from Hans et al., 2011). Two experts presented opposing opinions about the scientific evidence related to the mtDNA match estimate for the defendant's identification. The quality and content of this mtDNA evidence differed between the two conditions: the high-quality evidence condition used a larger database than the low-quality condition to compare to the mtDNA sample and could exclude a larger percentage of people. In the decision aid condition, experts in the gist information group presented gist-aid-inspired visuals and examples to help explain the proportion of people who could not be excluded as a match; those in the no gist information group were not given any aid to help them understand the mtDNA evidence. After viewing the trial, participants filled out a questionnaire on how well they understood the mtDNA evidence and their overall judgments of the case (e.g., verdict, witness credibility, scientific evidence strength). They filled out this questionnaire again after a 45-minute deliberation.

Measures: We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by the National Science Board, 2004; 2006). We used Drummond and Fischhoff's (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.'s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition. We developed a 20-item multiple-choice comprehension test for the mtDNA scientific information in the cases (modeled on Hans et al., 2011, and McAuliff et al., 2009): participants were shown 20 statements related to DNA evidence and asked whether each statement was true or false, and the test was scored out of 20 points.

Results: We measured calibration to the scientific evidence in several ways. We are building a full model with these various operationalizations to be presented at APLS, but focus only on one of the calibration DVs (objective understanding of the mtDNA evidence) in the current proposal. We conducted a general linear model with total score on the mtDNA understanding measure as the DV and quality of scientific evidence condition, decision aid condition, and the four individual difference measures (NFC, ATS, WNS, and SRS) as predictors. Contrary to our main hypotheses, neither evidence quality nor decision aid condition affected juror understanding. However, the individual difference variables did: we found significant main effects of Scientific Reasoning Skills, F(1, 427) = 16.03, p < .001, ηp² = .04, the Weller Numeracy Scale, F(1, 427) = 15.19, p < .001, ηp² = .03, and Need for Cognition, F(1, 427) = 16.80, p < .001, ηp² = .04, such that those who scored higher on these measures displayed better understanding of the scientific evidence. In addition, there was a significant interaction of evidence quality condition and scores on the Weller Numeracy Scale, F(1, 427) = 4.10, p = .04, ηp² = .01. Further results will be discussed.

Discussion: These data suggest jurors are not sensitive to differences in the quality of scientific mtDNA evidence, and that our attempt to sensitize them with Fuzzy Trace Theory-inspired aids did not improve calibration. Individual scientific reasoning abilities and general cognition styles were better predictors of understanding this scientific information. These results suggest a need for further exploration of approaches to help jurors differentiate between high- and low-quality evidence. Note: The third author was supported by an AP-LS AP Award for her role in this research. Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills help jurors understand complex scientific evidence.
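The Results paragraph above describes a general linear model with the understanding score as the outcome and condition factors plus individual-difference measures as predictors. The sketch below shows what such a model looks like in statsmodels; the synthetic data, column names, and effect sizes are illustrative assumptions, not the study's data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; variable names mirror the measures described above
# (WNS = Weller Numeracy Scale, SRS = Scientific Reasoning Scale, NFC = Need for Cognition,
#  ATS = Attitudes Toward Science).
rng = np.random.default_rng(2)
n = 430
df = pd.DataFrame({
    "evidence_quality": rng.choice(["high", "low"], n),
    "gist_aid": rng.choice(["gist", "none"], n),
    "WNS": rng.normal(0, 1, n),
    "SRS": rng.normal(0, 1, n),
    "NFC": rng.normal(0, 1, n),
    "ATS": rng.normal(0, 1, n),
})
df["understanding"] = 10 + 1.5 * df["SRS"] + 1.2 * df["WNS"] + rng.normal(0, 2, n)

# General linear model of understanding scores on condition factors and
# individual-difference covariates, including a quality-by-numeracy interaction.
model = smf.ols(
    "understanding ~ C(evidence_quality) * WNS + C(gist_aid) + SRS + NFC + ATS",
    data=df,
).fit()
print(model.summary())
```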
  3. Abstract
Background: Substance use disorders (SUDs) represent major public health concerns and are linked to an enhanced risk of legal consequences. Unresolved legal issues may prevent individuals with SUD from completing treatment, and interventions aimed at improving SUD treatment outcomes are limited. Filling that gap, this randomized controlled trial (RCT) tests the ability of a technology-assisted intervention to increase SUD treatment completion rates and improve post-treatment health, economic, justice-system, and housing outcomes.

Methods: A randomized controlled trial with a two-year administrative follow-up period will be conducted. Eight hundred Medicaid-eligible and uninsured adults receiving SUD treatment will be recruited at community-based non-profit health care clinics in Southeast Michigan, USA. Using an algorithm embedded in a community-based case management system, we randomly assign all eligible adults to one of two groups: the treatment/intervention group will receive hands-on assistance with a technology aimed at resolving unaddressed legal issues, while the control group will not. Upon enrollment into the intervention, both the treatment (n = 400) and control (n = 400) groups retain traditional options to resolve unaddressed legal issues, such as hiring an attorney, but only the treatment group is targeted with the technology and offered personalized assistance in navigating the online legal platform. To develop baseline and historical contexts for participants, we collect life course history reports from all participants and intend to link those in each group to administrative data sources. In addition to the RCT, we use an exploratory sequential mixed-methods and participatory-based design to develop, test, and administer our life course history instruments to all participants. The primary objective is to test whether targeting no-cost online legal resources to those experiencing SUD improves their long-term recovery and decreases negative health, economic, justice-system, and housing outcomes.

Discussion: Findings from this RCT will improve our understanding of the acute socio-legal needs faced by those experiencing SUD and provide recommendations to help target resources toward the areas that best support long-term recovery. The public health impact includes making publicly available a deidentified, longitudinal dataset of uninsured and Medicaid-eligible clients in treatment for SUD. The data include an overrepresentation of understudied groups, including African American and American Indian/Alaska Native persons, who are documented to experience heightened risk for SUD-related premature mortality and justice-system involvement. Within these data, several intended outcome measures can inform the health policy landscape: (1) health, including substance use, disability, mental health diagnosis, and mortality; (2) financial health, including employment, earnings, public assistance receipt, and financial obligations to the state; (3) justice-system involvement, including civil and criminal legal system encounters; and (4) housing, including homelessness, household composition, and homeownership.

Trial registration: Retrospectively registered #NCT05665179 on December 27, 2022.
  4. Abstract

    “Covariate adjustment” in the randomized trial context refers to an estimator of the average treatment effect that adjusts for chance imbalances between study arms in baseline variables (called “covariates”). The baseline variables could include, for example, age, sex, disease severity, and biomarkers. According to two surveys of clinical trial reports, there is confusion about the statistical properties of covariate adjustment. We focus on the analysis of covariance (ANCOVA) estimator, which involves fitting a linear model for the outcome given the treatment arm and baseline variables, and trials that use simple randomization with equal probability of assignment to treatment and control. We prove the following new (to the best of our knowledge) robustness property of ANCOVA to arbitrary model misspecification: Not only is the ANCOVA point estimate consistent (as proved by Yang and Tsiatis, 2001) but so is its standard error. This implies that confidence intervals and hypothesis tests conducted as if the linear model were correct are still asymptotically valid even when the linear model is arbitrarily misspecified, for example, when the baseline variables are nonlinearly related to the outcome or there is treatment effect heterogeneity. We also give a simple, robust formula for the variance reduction (equivalently, sample size reduction) from using ANCOVA. By reanalyzing completed randomized trials for mild cognitive impairment, schizophrenia, and depression, we demonstrate how ANCOVA can achieve variance reductions of 4 to 32%.
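Since the abstract above centers on the ANCOVA estimator (a linear working model for the outcome given the treatment arm and baseline variables), the minimal sketch below illustrates that estimator on simulated trial data with a deliberately nonlinear covariate-outcome relationship, the kind of misspecification the robustness result covers. The data-generating choices and variable names are assumptions for illustration, not the paper's analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated trial: simple 1:1 randomization, one baseline covariate that is
# nonlinearly related to the outcome, and a true treatment effect of 2.0.
rng = np.random.default_rng(3)
n = 500
baseline = rng.normal(0, 1, n)
treat = rng.integers(0, 2, n)
outcome = 2.0 * treat + np.sin(2 * baseline) + rng.normal(0, 1, n)
df = pd.DataFrame({"y": outcome, "A": treat, "x": baseline})

# ANCOVA: fit a linear model for the outcome given arm and baseline variable.
ancova = smf.ols("y ~ A + x", data=df).fit()
# Unadjusted comparison: difference in arm means.
unadjusted = smf.ols("y ~ A", data=df).fit()

# The coefficient on A estimates the average treatment effect; covariate
# adjustment typically shrinks its standard error relative to the
# unadjusted difference in means.
print(f"ANCOVA ATE:     {ancova.params['A']:.3f} (SE {ancova.bse['A']:.3f})")
print(f"Unadjusted ATE: {unadjusted.params['A']:.3f} (SE {unadjusted.bse['A']:.3f})")
```

The robustness property described above says that, under simple equal-probability randomization, both the point estimate and the model-based standard error from this working model remain asymptotically valid even when the linear model is wrong.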

     
  5. For large observational studies lacking a control group (unlike randomized controlled trials, RCTs), propensity scores (PS) are often the method of choice to account for pre-treatment confounding in baseline characteristics, and thereby avoid substantial bias in treatment effect estimation. The vast majority of PS techniques focus on average treatment effect estimation, without any clear consensus on how to account for confounders, especially in a multiple-treatment setting. Furthermore, for time-to-event outcomes, the analytical framework is further complicated in the presence of high censoring rates (sometimes due to non-susceptibility of study units to the disease), imbalance between treatment groups, and the clustered nature of the data (where survival outcomes appear in groups). Motivated by a right-censored kidney transplantation dataset derived from the United Network for Organ Sharing (UNOS), we investigate and compare two recent promising PS procedures, (a) the generalized boosted model (GBM) and (b) the covariate-balancing propensity score (CBPS), in an attempt to decouple the causal effects of treatments (here, study subgroups, such as hepatitis C virus (HCV) positive/negative donors and positive/negative recipients) on the time to death of kidney recipients due to kidney failure after transplantation. For estimation, we employ a two-step procedure that addresses the various complexities observed in the UNOS database within a unified paradigm. First, to adjust for the large number of confounders across the multiple subgroups, we fit multinomial PS models via procedures (a) and (b). In the next stage, the estimated PS is incorporated into the likelihood of a semi-parametric cure rate Cox proportional hazards frailty model via inverse probability of treatment weighting, adjusted for multi-center clustering and excess censoring. Our data analysis reveals a more informative and superior performance of the full model, in terms of treatment effect estimation, over sub-models that relax the various features of the event-time dataset.
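To make the two-step idea above concrete, the sketch below illustrates the first stage (multinomial propensity scores for a multi-level treatment) and the construction of inverse-probability-of-treatment weights, which would then be carried into a weighted survival model. Scikit-learn's gradient boosting is used here only as a stand-in for the GBM approach (CBPS would instead choose the PS model to directly balance covariate moments across groups); the synthetic data and all names are assumptions, not the UNOS analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the transplant data: baseline covariates plus a
# 4-level subgroup indicator (e.g., donor HCV +/- crossed with recipient HCV +/-).
rng = np.random.default_rng(4)
n = 2000
X = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"x{i}" for i in range(5)])
group = rng.integers(0, 4, n)

# Step 1: multinomial propensity scores via boosted trees (GBM-style stand-in).
ps_model = GradientBoostingClassifier().fit(X, group)
ps = ps_model.predict_proba(X)              # n x 4 matrix of subgroup probabilities
ps_received = ps[np.arange(n), group]       # probability of the subgroup actually received

# Step 2: inverse-probability-of-treatment weights; in the analysis described
# above these weights enter the likelihood of the cure rate Cox frailty model.
iptw = 1.0 / np.clip(ps_received, 1e-3, None)
print(pd.Series(iptw).describe())
```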