Title: Bayesian approaches to include real-world data in clinical studies
Randomized clinical trials have been the mainstay of clinical research, but they are prohibitively expensive and face increasingly difficult patient recruitment. Recently, there has been a movement to use real-world data (RWD) from electronic health records, patient registries, claims data and other sources in lieu of, or to supplement, controlled clinical trials. Combining information from such diverse sources calls for inference under a Bayesian paradigm. We review some of the currently used methods and a novel Bayesian nonparametric (BNP) method. The desired adjustment for differences in patient populations is naturally carried out with BNP priors, which facilitate understanding of and adjustment for population heterogeneity across different data sources. We discuss the particular problem of using RWD to create a synthetic control arm to supplement single-arm, treatment-only studies. At the core of the proposed approach is a model-based adjustment that makes the patient populations in the current study and the (adjusted) RWD equivalent. This is implemented using common atoms mixture models, whose structure greatly simplifies inference: the adjustment for differences between the populations reduces to ratios of weights in these mixtures. This article is part of the theme issue ‘Bayesian inference: challenges, perspectives, and prospects’.
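The statement that the population adjustment reduces to ratios of mixture weights can be made concrete with a minimal sketch. It assumes a finite common atoms mixture in which trial and RWD patients share the same K atoms (clusters) and differ only in their weights, that each RWD patient has already been assigned to an atom, that the trial weights are given, and that the RWD weights are estimated by cluster frequencies. In the BNP approach these quantities would be posterior draws rather than fixed inputs, so this is a single plug-in illustration, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np


def population_adjustment_weights(rwd_labels, trial_weights):
    """Normalized per-patient weights that give the RWD the trial's cluster mix.

    rwd_labels    -- index of the shared atom (cluster) of each RWD patient
    trial_weights -- mixture weights of the same atoms in the trial population
    """
    rwd_labels = np.asarray(rwd_labels)
    trial_weights = np.asarray(trial_weights, dtype=float)
    # Empirical RWD mixture weights (cluster frequencies).
    counts = np.bincount(rwd_labels, minlength=trial_weights.size)
    rwd_weights = counts / counts.sum()
    # The adjustment is the ratio of weights, evaluated at each patient's atom.
    # Every indexed atom has at least one RWD patient, so no division by zero.
    ratio = trial_weights[rwd_labels] / rwd_weights[rwd_labels]
    return ratio / ratio.sum()


def synthetic_control_mean(rwd_outcomes, rwd_labels, trial_weights):
    """Population-adjusted mean outcome of the RWD synthetic control arm."""
    w = population_adjustment_weights(rwd_labels, trial_weights)
    return float(np.dot(w, np.asarray(rwd_outcomes, dtype=float)))


# Toy example (hypothetical data): the RWD over-represents atom 0
# relative to the trial (75% versus 50%).
labels = [0, 0, 0, 1]
outcomes = [1.0, 1.0, 1.0, 5.0]
print(np.mean(outcomes))
print(synthetic_control_mean(outcomes, labels, [0.5, 0.5]))
```

In the toy example the unadjusted RWD mean is 2.0, while reweighting each patient by the trial-to-RWD weight ratio of their atom gives 3.0 (up to floating-point rounding), because atom 1 is up-weighted to its 50% share in the trial.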
Journal Name: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Overly restrictive eligibility criteria for clinical trials may limit the generalizability of trial results to their target real-world patient populations. We developed a novel machine learning approach that uses large collections of real-world data (RWD) to better inform the design of clinical trial eligibility criteria. We extracted patients’ clinical events, including demographics, diagnoses, and drugs, from electronic health records (EHRs) and assumed that certain compositions of these clinical events within an individual’s EHRs can determine subphenotypes: homogeneous clusters of patients in which patients within each subgroup share similar clinical characteristics. We introduced an outcome-guided probabilistic model to identify those subphenotypes, such that patients within the same subgroup not only share similar clinical characteristics but are also at similar risk of encountering severe adverse events (SAEs). We evaluated our algorithm on two previously conducted clinical trials with EHRs from the OneFlorida+ Clinical Research Consortium. Our model clearly identifies, in a transparent and interpretable way, the patient subgroups that are more or less likely to suffer SAEs. Our approach identified a set of clinical topics and derived novel patient representations based on them; each clinical topic represents a clinical event composition pattern learned from the patient EHRs. In both trials, the patient subgroups without SAEs (#SAE = 0) and with SAEs (#SAE > 0) could be well separated by k-means clustering on the inferred topics. The inferred topics characterized as likely to align with the subgroup with SAEs (#SAE > 0) revealed meaningful combinations of clinical features and can provide data-driven recommendations for refining the exclusion criteria of clinical trials. The proposed supervised topic modeling approach can infer the clinical topics from subphenotypes with or without SAEs, and potential rules describing the patient subgroups with SAEs can be further derived to inform the design of clinical trial eligibility criteria.
  2. A key and challenging step toward personalized/precision medicine is the ability to redesign dose-finding clinical trials. This work studies the problem of fully response-adaptive Bayesian design of phase II dose-finding clinical trials with patient information, in which the decision maker seeks to identify the right dose for each patient type (often defined as an effective target dose for each group of patients) by minimizing the expected (over patient types) variance of the right dose. We formulate this problem as a stochastic dynamic program and exploit several properties of this class of learning problems. Because the optimal solution is intractable, we propose an approximate policy that adapts a one-step look-ahead framework. We show that the proposed policy is optimal in a setting with homogeneous patients and two doses, and we derive its asymptotic sampling rate. We adapt a number of allocation policies commonly applied in dose-finding clinical trials, such as posterior adaptive sampling, and test their performance against our proposed policy via extensive simulations with synthetic and real data. Our numerical analyses provide insights into the connection between the structure of each patient type's dose-response curve and the performance of allocation policies. This paper provides a practical framework for the Food and Drug Administration and pharmaceutical companies to transition from current phase II procedures to the era of personalized dose-finding clinical trials. Funding: This research is supported by the National Science Foundation [Grant 1651912]. Supplemental Material: The online appendices are available at .
  3. Heterogeneity among Alzheimer’s disease (AD) patients confounds clinical trial patient selection and the evaluation of therapeutic efficacy. This work defines separable AD clinical sub-populations using unsupervised machine learning. Clustering (t-SNE followed by k-means) of patient features and association rule mining (ARM) were performed on the ADNIMERGE dataset from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Patient sociodemographics, brain imaging, biomarkers, cognitive tests, and medication usage were included in the analysis. Four AD clinical sub-populations were identified using between-cluster mean fold changes [cognitive performance, brain volume]: cluster-1 represented the least severe disease [+17.3, +13.3]; cluster-0 [−4.6, +3.8] and cluster-3 [+10.8, −4.9] represented mid-severity sub-populations; and cluster-2 represented the most severe disease [−18.4, −8.4]. ARM assessed frequently occurring pharmacologic substances within the four sub-populations. No drug class was associated with the least severe AD (cluster-1), likely because of less antecedent disease. Anti-hyperlipidemia drugs were associated with cluster-0 (mid-severity, higher volume). Interestingly, the antioxidants vitamin C and vitamin E were associated with cluster-3 (mid-severity, higher cognition). Antidepressants such as Zoloft were associated with the most severe disease (cluster-2). Vitamin D is protective for AD, but ARM identified significant underutilization across all AD sub-populations. Identifying and characterizing the features of four distinct AD sub-population “clusters” using standard clinical features enhances future clinical trial selection criteria and cross-study comparative analysis.
  4. Motivation

    Analysis of time series transcriptomics data from clinical trials is challenging. Such studies usually profile very few time points from several individuals with varying response patterns and dynamics. Current methods for these datasets are mainly based on linear, global orderings using visit times, which do not account for the varying response rates and subgroups within a patient cohort.

    Results

    We developed a new method that uses multi-commodity flow algorithms for trajectory inference in large-scale clinical studies. Recovered trajectories satisfy individual-based timing restrictions while integrating data from multiple patients. Testing the method on multiple drug datasets demonstrated improved performance compared with prior approaches suggested for this task, while identifying novel disease subtypes that correspond to heterogeneous patient response patterns.

    Availability and implementation

    The source code and instructions to download the data have been deposited on GitHub at

  5. We develop a Bayesian nonparametric (BNP) approach to evaluate the causal effect of treatment in a randomized trial in which a nonterminal event may be censored by a terminal event, but not vice versa (i.e., semi-competing risks). Based on the idea of principal stratification, we define a novel estimand for the causal effect of treatment on the nonterminal event. We introduce identification assumptions, indexed by a sensitivity parameter, and show how to draw inference using our BNP approach. We conduct simulation studies and illustrate our methodology using data from a brain cancer trial. The R code implementing our model and algorithm is available for download at
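Item 1 evaluates its learned patient representations by checking whether k-means clustering on patients' inferred topic proportions separates the #SAE = 0 and #SAE > 0 subgroups. A minimal NumPy sketch of that evaluation step follows: plain Lloyd's algorithm with a deterministic farthest-point initialization (our choice, not stated in the abstract). The topic-proportion matrix is assumed to come from the supervised topic model, which is not reproduced here, and the toy rows are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    # Deterministic farthest-point initialization starting from the first row.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each patient to the nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical topic proportions (rows: patients, columns: two topics).
theta = [[0.90, 0.10], [0.85, 0.15], [0.10, 0.90], [0.20, 0.80]]
labels, _ = kmeans(theta, k=2)
print(labels)
```

Separation would then be judged by comparing these cluster labels with each patient's observed SAE status, for example through cluster purity.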
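Item 2's allocation policy adapts a one-step look-ahead framework. A generic one-step look-ahead for a single patient type can be sketched as follows, under assumptions that are ours, not the paper's: independent normal priors on the mean response at each dose, known observation noise `sigma2`, the "right dose" defined as the dose level whose mean response is closest to a hypothetical `target`, and Monte Carlo approximations of both the predictive outcomes and the right-dose variance. All function and parameter names are hypothetical.

```python
import numpy as np


def right_dose_variance(mu, tau2, doses, target, rng, n_draws=2000):
    """Monte Carlo variance of the right dose under independent normal posteriors.

    Assumption: the right dose is the dose level whose mean response is
    closest to `target`.
    """
    draws = rng.normal(mu, np.sqrt(tau2), size=(n_draws, mu.size))
    right = doses[np.abs(draws - target).argmin(axis=1)]
    return float(right.var())


def one_step_lookahead_dose(mu, tau2, sigma2, doses, target, rng, n_outcomes=30):
    """Choose the dose whose next observation minimizes the expected
    posterior variance of the right dose (single patient type)."""
    mu = np.asarray(mu, dtype=float)
    tau2 = np.asarray(tau2, dtype=float)
    doses = np.asarray(doses, dtype=float)
    scores = []
    for d in range(mu.size):
        # Conjugate normal update: the posterior variance does not depend on y.
        post_tau2 = tau2.copy()
        post_tau2[d] = 1.0 / (1.0 / tau2[d] + 1.0 / sigma2)
        # Average over outcomes drawn from the prior predictive at dose d.
        ys = rng.normal(mu[d], np.sqrt(tau2[d] + sigma2), size=n_outcomes)
        vals = []
        for y in ys:
            post_mu = mu.copy()
            post_mu[d] = post_tau2[d] * (mu[d] / tau2[d] + y / sigma2)
            vals.append(right_dose_variance(post_mu, post_tau2, doses, target, rng))
        scores.append(float(np.mean(vals)))
    return int(np.argmin(scores)), scores


rng = np.random.default_rng(0)
best, scores = one_step_lookahead_dose(
    mu=[0.2, 0.5, 0.8], tau2=[0.05, 0.05, 0.05], sigma2=0.1,
    doses=[10, 20, 40], target=0.5, rng=rng)
print(best, scores)
```

The scores are Monte Carlo estimates, so they vary with the seed and sample sizes. With several patient types and priors that are independent across types, the expected (over types) objective would be a probability-weighted sum of per-type variances, and a new observation changes only the arriving type's term; that decomposition is also an assumption of this sketch.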
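Item 3 uses association rule mining (ARM) to relate medications to the AD sub-populations. The standard rule metrics (support, confidence, and lift) for a rule such as {drug class} → {cluster} can be computed as below. The item names and toy transactions are hypothetical, not ADNI data, and the abstract does not say which ARM algorithm or thresholds were used.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions -- list of sets, one per patient (drug classes + cluster label)
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)   # antecedent present
    n_c = sum(consequent <= t for t in transactions)   # consequent present
    n_ac = sum((antecedent | consequent) <= t for t in transactions)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


patients = [
    {"anti_hyperlipidemia", "cluster_0"},
    {"anti_hyperlipidemia", "cluster_0"},
    {"vitamin_c", "cluster_3"},
    {"anti_hyperlipidemia", "cluster_3"},
]
# support 0.5, confidence 2/3, lift 4/3
print(rule_metrics(patients, {"anti_hyperlipidemia"}, {"cluster_0"}))
```

A lift above 1 indicates that the drug class co-occurs with the cluster more often than expected under independence.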
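The "individual-based timing restrictions" in item 4 can be read (our interpretation, not the paper's definition) as requiring that a recovered trajectory never place a patient's later visit before an earlier one. The sketch below checks only that constraint for a single linear path; it does not implement the multi-commodity flow algorithm itself, and the function and sample names are hypothetical.

```python
def respects_individual_timing(trajectory, patient_of, visit_time_of):
    """Return True if, walking along `trajectory` (sample IDs in pseudotime
    order), each patient's samples appear in non-decreasing visit-time order."""
    last_seen = {}  # patient -> visit time of that patient's latest sample so far
    for sample in trajectory:
        patient, t = patient_of[sample], visit_time_of[sample]
        if patient in last_seen and t < last_seen[patient]:
            return False
        last_seen[patient] = t
    return True


patient_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
visit_day = {"a1": 0, "a2": 7, "b1": 0, "b2": 14}
print(respects_individual_timing(["a1", "b1", "a2", "b2"], patient_of, visit_day))  # True
print(respects_individual_timing(["a2", "b1", "a1", "b2"], patient_of, visit_day))  # False
```

Samples from different patients may interleave freely along the path; only the within-patient order is constrained, which is what allows data from several patients with different response rates to be integrated into one trajectory.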