Title: Stable Discovery of Interpretable Subgroups via Calibration in Causal Studies
Summary: Building on Yu and Kumbier's predictability, computability and stability (PCS) framework, we introduce a novel methodology for randomised experiments, Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), which targets subgroups with large heterogeneous treatment effects. StaDISC was developed during our re‐analysis of the 1999–2000 VIGOR study, an 8076‐patient randomised controlled trial that compared the risk of adverse events from a then newly approved drug, rofecoxib (Vioxx), with that from an older drug, naproxen. On average, and in comparison with naproxen, Vioxx was found to reduce the risk of gastrointestinal events but increase the risk of thrombotic cardiovascular events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well‐calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the gastrointestinal outcome (totalling 29.4% of the study size) and the thrombotic cardiovascular outcome (totalling 11.0%). Complementary analyses of the found subgroups using the 2001–2004 APPROVe study, a separate independently conducted randomised controlled trial with 2587 patients, provide further supporting evidence for the promise of StaDISC.
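The calibration idea in the summary can be illustrated with a minimal sketch. This is not the authors' StaDISC implementation; it only assumes that, in a randomised experiment, the difference in mean outcomes between arms within a bin of units estimates that bin's average treatment effect. The function name `calibration_by_quantile`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def calibration_by_quantile(cate_pred, treated, outcome, n_bins=4):
    """Compare predicted and observed treatment effects within quantile bins.

    Hypothetical helper: units are sorted by predicted CATE, split into
    roughly equal-sized bins, and each bin's mean prediction is paired with
    the treated-minus-control difference in mean outcomes for that bin.
    """
    cate_pred = np.asarray(cate_pred, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)

    # Stable sort so that ties keep their original order.
    order = np.argsort(cate_pred, kind="stable")
    rows = []
    for idx in np.array_split(order, n_bins):
        t, y = treated[idx], outcome[idx]
        if t.sum() == 0 or (~t).sum() == 0:
            observed = float("nan")  # no contrast available in this bin
        else:
            observed = float(y[t].mean() - y[~t].mean())
        rows.append((float(cate_pred[idx].mean()), observed, len(idx)))
    return rows

# Toy data (made up for illustration only).
pred = [0, 0, 0, 0, 1, 1, 1, 1]
arm = [1, 0, 1, 0, 1, 0, 1, 0]
y = [0, 0, 0, 0, 1, 0, 1, 0]
for mean_pred, observed, n in calibration_by_quantile(pred, arm, y, n_bins=2):
    print(mean_pred, observed, n)
```

An estimator would look well calibrated in a bin when the mean prediction and the observed difference are close; large gaps across most bins would correspond to the poor global performance the summary describes.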
Award ID(s):
1953191 1741340
PAR ID:
10453883
Publisher / Repository:
Wiley-Blackwell
Journal Name:
International Statistical Review
Volume:
88
Issue:
S1
ISSN:
0306-7734
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Background: Evidence to guide type 2 diabetes treatment individualization is limited. We evaluated heterogeneous treatment effects (HTE) of intensive glycemic control in type 2 diabetes patients on major adverse cardiovascular events (MACE) in the Action to Control Cardiovascular Risk in Diabetes Study (ACCORD) and the Veterans Affairs Diabetes Trial (VADT). Methods: Causal forests machine learning analysis was performed using pooled individual data from two randomized trials (n = 12,042) to identify HTE of intensive versus standard glycemic control on MACE in patients with type 2 diabetes. We used variable prioritization from causal forests to build a summary decision tree and examined the risk difference of MACE between treatment arms in the resulting subgroups. Results: A summary decision tree used five variables (hemoglobin glycation index, estimated glomerular filtration rate, fasting glucose, age, and body mass index) to define eight subgroups in which risk differences of MACE ranged from −5.1% (95% CI −8.7, −1.5) to 3.1% (95% CI 0.2, 6.0) (negative values represent lower MACE associated with intensive glycemic control). Intensive glycemic control was associated with lower MACE in pooled study data in subgroups with low (−4.2% [95% CI −8.1, −1.0]), intermediate (−5.1% [95% CI −8.7, −1.5]), and high (−4.3% [95% CI −7.7, −1.0]) MACE rates with consistent directions of effect in ACCORD and VADT alone. Conclusions: This data-driven analysis provides evidence supporting the diabetes treatment guideline recommendation of intensive glucose lowering in diabetes patients with low cardiovascular risk and additionally suggests potential benefits of intensive glycemic control in some individuals at higher cardiovascular risk.
  2. Abstract. Overly restrictive eligibility criteria for clinical trials may limit the generalizability of the trial results to their target real-world patient populations. We developed a novel machine learning approach using large collections of real-world data (RWD) to better inform clinical trial eligibility criteria design. We extracted patients’ clinical events from electronic health records (EHRs), which include demographics, diagnoses, and drugs, and assumed that certain compositions of these clinical events within an individual’s EHRs can determine the subphenotypes, that is, homogeneous clusters of patients in which patients within each subgroup share similar clinical characteristics. We introduced an outcome-guided probabilistic model to identify those subphenotypes, such that the patients within the same subgroup not only share similar clinical characteristics but are also at similar risk levels of encountering severe adverse events (SAEs). We evaluated our algorithm on two previously conducted clinical trials with EHRs from the OneFlorida+ Clinical Research Consortium. Our model can clearly identify, as subphenotypes, the patient subgroups who are more likely to suffer or not suffer from SAEs, in a transparent and interpretable way. Our approach identified a set of clinical topics and derived novel patient representations based on them. Each clinical topic represents a certain clinical event composition pattern learned from the patient EHRs. Tested on both trials, patient subgroup (#SAE=0) and patient subgroup (#SAE>0) can be well separated by k-means clustering using the inferred topics. The inferred topics characterized as likely to align with the patient subgroup (#SAE>0) revealed meaningful combinations of clinical features and can provide data-driven recommendations for refining the exclusion criteria of clinical trials. The proposed supervised topic modeling approach can infer the clinical topics from the subphenotypes with or without SAEs. The potential rules for describing the patient subgroups with SAEs can be further derived to inform the design of clinical trial eligibility criteria.
  3. We show how entropy balancing can be used for transporting experimental treatment effects from a trial population onto a target population. This method is doubly robust in the sense that if either the outcome model or the probability of trial participation is correctly specified, then the estimate of the target population average treatment effect is consistent. Furthermore, we only require the sample moments of the effect modifiers drawn from the target population to consistently estimate the target population average treatment effect. We compared the finite‐sample performance of entropy balancing with several alternative methods for transporting treatment effects between populations. Entropy balancing techniques are efficient and robust to model misspecification. We also examine the results of our proposed method in an applied analysis of the Action to Control Cardiovascular Risk in Diabetes Blood Pressure trial transported to a sample of US adults with diabetes taken from the National Health and Nutrition Examination Survey cohort.
  4. Abstract. A central issue in drug risk-benefit assessment is identifying frequencies of side effects in humans. Currently, frequencies are experimentally determined in randomised controlled clinical trials. We present a machine learning framework for computationally predicting frequencies of drug side effects. Our matrix decomposition algorithm learns latent signatures of drugs and side effects that are both reproducible and biologically interpretable. We show the usefulness of our approach on 759 structurally and therapeutically diverse drugs and 994 side effects from all human physiological systems. Our approach can be applied to any drug for which a small number of side effect frequencies have been identified, in order to predict the frequencies of further, yet unidentified, side effects. We show that our model is informative of the biology underlying drug activity: individual components of the drug signatures are related to the distinct anatomical categories of the drugs and to the specific drug routes of administration.
  5. Abstract. Aims: The association of glycemic variability with microvascular disease complications in type 2 diabetes (T2D) has been under-studied and remains unclear. We investigated this relationship using both the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial and the Veterans Affairs Diabetes Trial (VADT). Methods: In ACCORD, fasting plasma glucose (FPG) was measured 1 to 3 times/year for up to 84 months in 10 251 individuals. In the VADT, FPG was measured every 3 months for up to 87 months in 1791 individuals. Variability measures included coefficient of variation (CV) and average real variability (ARV) for fasting glucose. The primary composite outcome was time to either severe nephropathy or retinopathy event, and secondary outcomes included each outcome individually. To assess the association, we considered variability measures as time-dependent covariates in Cox proportional hazard models. We conducted a meta-analysis across the 2 trials to estimate the risk of fasting glucose variability as well as to assess the heterogeneous effects of FPG variability across treatment arms. Results: In both ACCORD and the VADT, the CV and ARV of FPG were associated with development of future microvascular outcomes even after adjusting for other risk factors, including measures of average glycemic control (ie, cumulative average of HbA1c). Meta-analyses of these 2 trials confirmed these findings and indicated FPG variation may be more harmful in those with less intensive glucose control. Conclusions: This post hoc analysis indicates that variability of FPG plays a role in, and/or is an independent and readily available marker of, development of microvascular complications in T2D.
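The two variability measures named in item 5 can be sketched as follows. This is an illustration under common textbook definitions, not the trials' analysis code: CV is taken as the sample standard deviation divided by the mean, and ARV as the mean absolute difference between consecutive measurements. The function names and the glucose readings are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    x = np.asarray(values, dtype=float)
    return float(x.std(ddof=1) / x.mean())

def average_real_variability(values):
    """Mean absolute difference between consecutive visits (assumed definition)."""
    x = np.asarray(values, dtype=float)
    return float(np.abs(np.diff(x)).mean())

# Made-up fasting glucose readings (mg/dL) in visit order.
fpg = [100, 120, 110, 130]
print(coefficient_of_variation(fpg))
print(average_real_variability(fpg))
```

Unlike CV, ARV depends on the order of the visits, so two patients with the same set of readings can have different ARV values.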