Title: Predictive modeling of clinical trial terminations using feature engineering and embedding learning
Abstract In this study, we propose to use machine learning to understand terminated clinical trials. Our goal is to answer two fundamental questions: (1) what are the common factors/markers associated with terminated clinical trials? and (2) how can we accurately predict whether a clinical trial will be terminated? The answer to the first question provides effective ways to understand the characteristics of terminated trials, so that stakeholders can better plan their trials; the answer to the second question can directly estimate the chance of success of a clinical trial in order to minimize costs. Using 311,260 trials to build a testbed with 68,999 samples, we use feature engineering to create 640 features reflecting clinical trial administration, eligibility, study information, criteria, etc. Using feature ranking, a handful of features, such as trial eligibility, trial inclusion/exclusion criteria, and sponsor types, are found to be related to clinical trial termination. By using sampling and ensemble learning, we achieve over 67% Balanced Accuracy and over 0.73 AUC (Area Under the Curve) scores in predicting clinical trial termination, indicating that machine learning can help achieve satisfactory prediction results for clinical trial studies.
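The two scores quoted in the abstract, Balanced Accuracy and AUC, are standard choices when one class (here, terminated trials) is much rarer than the other. The sketch below shows how each is computed from their textbook definitions. It is an illustration only and not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = terminated trial, 0 = completed trial.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]

# Assumed decision threshold of 0.5 on the predicted termination score.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

plain_accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

print("plain accuracy:   ", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("AUC:              ", roc_auc(y_true, scores))
```

On imbalanced data such as this toy set, plain accuracy is inflated by the majority class, whereas balanced accuracy weights both classes equally and AUC does not depend on the choice of threshold.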
Award ID(s):
2027339 1763452 1828181
NSF-PAR ID:
10230296
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Scientific Reports
Volume:
11
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gadekallu, Thippa Reddy (Ed.)
    As of March 30, 2021, over 5,193 COVID-19 clinical trials had been registered through ClinicalTrials.gov. Among them, 191 trials were terminated, suspended, or withdrawn (indicating the cessation of the study), while 909 trials had been completed (indicating the completion of the study). In this study, we propose to study the underlying factors of COVID-19 trial completion vs. cessation, and to design predictive models that accurately predict whether a COVID-19 trial may complete or cease in the future. We collect 4,441 COVID-19 trials from ClinicalTrials.gov to build a testbed, and design four types of features to characterize clinical trial administration, eligibility, study information, criteria, drug types, and study keywords, as well as embedding features commonly used in state-of-the-art machine learning. Our study shows that drug features and study keywords are the most informative features, but all four types of features are essential for accurate trial prediction. By using predictive models, our approach achieves more than 0.87 AUC (Area Under the Curve) and 0.81 balanced accuracy in predicting COVID-19 clinical trial completion vs. cessation. Our research shows that computational methods can deliver effective features for understanding the differences between completed and ceased COVID-19 trials. In addition, such models can predict COVID-19 trial status with satisfactory accuracy, and help stakeholders better plan trials and minimize costs.
  2. Abstract Overly restrictive eligibility criteria for clinical trials may limit the generalizability of the trial results to their target real-world patient populations. We developed a novel machine learning approach using large collections of real-world data (RWD) to better inform clinical trial eligibility criteria design. We extracted patients’ clinical events from electronic health records (EHRs), which include demographics, diagnoses, and drugs, and assumed that certain compositions of these clinical events within an individual’s EHRs can determine the subphenotypes: homogeneous clusters of patients, where patients within each subgroup share similar clinical characteristics. We introduced an outcome-guided probabilistic model to identify those subphenotypes, such that the patients within the same subgroup not only share similar clinical characteristics but are also at similar risk levels of encountering severe adverse events (SAEs). We evaluated our algorithm on two previously conducted clinical trials with EHRs from the OneFlorida+ Clinical Research Consortium. Our model can clearly identify the patient subgroups who are more likely to suffer or not suffer from SAEs as subphenotypes in a transparent and interpretable way. Our approach identified a set of clinical topics and derived novel patient representations based on them. Each clinical topic represents a certain clinical event composition pattern learned from the patient EHRs. Tested on both trials, patient subgroup (#SAE=0) and patient subgroup (#SAE>0) can be well separated by k-means clustering using the inferred topics. The inferred topics characterized as likely to align with the patient subgroup (#SAE>0) revealed meaningful combinations of clinical features and can provide data-driven recommendations for refining the exclusion criteria of clinical trials. The proposed supervised topic modeling approach can infer the clinical topics from the subphenotypes with or without SAEs. The potential rules for describing the patient subgroups with SAEs can be further derived to inform the design of clinical trial eligibility criteria.
  3. Clinical trials are crucial for the advancement of treatment and knowledge within the medical community. Since 2007, the US federal government has required organizations sponsoring clinical trials with at least one site in the United States to submit information on these trials to the ClinicalTrials.gov database, resulting in a rich source of information for clinical trial research. Nevertheless, only a handful of analytic studies have been carried out to understand this valuable data source. In this study, we propose to use network analysis to understand infectious disease clinical trial research. Our goal is to answer two important questions: (1) what are the concentrations and characteristics of infectious disease clinical trial research? and (2) how can we accurately predict what type of clinical trials a sponsor (or an investigator) is interested in? The answers to the first question provide effective ways to summarize clinical trial research related to particular disease(s), and the answers to the second question help match clinical trial sponsors and investigators for information recommendation. Using 4,228 clinical trials as the test bed, our study involves 4,864 sponsors and 1,879 research areas characterized by Medical Subject Headings (MeSH) keywords. We extract a set of network measures to show patterns of infectious disease clinical trials, and design a new community-based link prediction approach to predict sponsors' interests, with significant improvement compared to baselines. This transformative study concludes that network analysis can greatly help the understanding of clinical trial research for effective summarization, characterization, and prediction.
  4. Clinical trials are indispensable in developing new treatments, but they face obstacles in patient recruitment and retention, hindering the enrollment of necessary participants. To tackle these challenges, deep learning frameworks have been created to match patients to trials. These frameworks calculate the similarity between patients and clinical trial eligibility criteria, considering the discrepancy between inclusion and exclusion criteria. Recent studies have shown that these frameworks outperform earlier approaches. However, deep learning models may raise fairness issues in patient-trial matching when certain sensitive groups of individuals are underrepresented in clinical trials, leading to incomplete or inaccurate data and potential harm. To address this fairness issue, this work proposes a fair patient-trial matching framework that generates a patient-criterion level fairness constraint. The proposed framework considers the inconsistency between the embeddings of inclusion and exclusion criteria among patients of different sensitive groups. Experimental results on real-world patient-trial and patient-criterion matching tasks demonstrate that the proposed framework can successfully mitigate predictions that tend to be biased.
  5. Background/Aims:

    Design of clinical trials requires careful decision-making across several dimensions, including endpoints, eligibility criteria, and subgroup enrichment. Clinical trial simulation can be an informative tool in trial design, providing empirical evidence by which to evaluate and compare the results of hypothetical trials with varying designs. We introduce a novel simulation-based approach using observational data to inform the design of a future pragmatic trial.

    Methods:

    We utilize propensity score-adjusted models to simulate hypothetical trials under alternative endpoints and enrollment criteria. We apply our approach to the design of pragmatic trials in psoriatic arthritis, using observational data embedded within the Tight Control of Inflammation in Early Psoriatic Arthritis study to simulate hypothetical open-label trials comparing treatment with tumor necrosis factor-α inhibitors to methotrexate. We first validate our simulations of a trial with traditional enrollment criteria and endpoints against a recently published trial. Next, we compare simulated treatment effects in patient populations defined by traditional and broadened enrollment criteria, where the latter is consistent with a future pragmatic trial. In each trial, we also consider five candidate primary endpoints.

    Results:

    Our results highlight how changes in the enrolled population and primary endpoints may qualitatively alter study findings and the ability to detect heterogeneous treatment effects between clinical subgroups. For treatments of interest in the study of psoriatic arthritis, broadened enrollment criteria led to diluted estimated treatment effects. Endpoints with greater responsiveness to treatment compared with a traditionally used endpoint were identified. These considerations, among others, are important for designing a future pragmatic trial aimed at having high external validity with relevance for real-world clinical practice.

    Conclusion:

    Observational data may be leveraged to inform design decisions in pragmatic trials. Our approach may be generalized to the study of other conditions where existing trial data are limited or do not generalize well to real-world clinical practice, but where observational data are available.

     