

Title: Addressing patient heterogeneity in disease predictive model development
Abstract

This paper addresses patient heterogeneity associated with prediction problems in biomedical applications. We propose a systematic hypothesis testing approach to determine whether patient subgroup structure exists and, if it does, the number of subgroups in the patient population. A mixture of generalized linear models is used to model the relationship between the disease outcome and patient characteristics and clinical factors, including targeted biomarker profiles. We construct a test statistic based on the expectation-maximization (EM) algorithm and derive its asymptotic distribution under the null hypothesis. An important computational advantage of the test is that the parameter estimates required under the complex alternative hypothesis can be obtained through a small number of EM iterations, rather than by optimizing the objective function. We demonstrate the finite sample performance of the proposed test in terms of type‐I error rate and power, using extensive simulation studies. The applicability of the proposed method is illustrated through an application to a multicenter prostate cancer study.
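
The computational idea in the abstract (a few EM iterations under the alternative instead of a full optimization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: it uses a two-component mixture of Gaussian linear regressions with a single covariate as the simplest generalized linear model, the function names (`em_statistic`, `simulate`, and so on) are hypothetical, and the statistic is a plain likelihood-ratio-type quantity. It is not the paper's test statistic, and it does not implement the asymptotic null distribution derived there.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def weighted_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x, plus the weighted residual variance."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    var = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return intercept, slope, max(var, 1e-8)


def loglik(x, y, props, comps):
    """Log-likelihood of a mixture of linear regressions."""
    total = 0.0
    for xi, yi in zip(x, y):
        dens = sum(p * normal_pdf(yi, a + b * xi, v) for p, (a, b, v) in zip(props, comps))
        total += math.log(max(dens, 1e-300))
    return total


def em_step(x, y, props, comps):
    """One EM iteration: posterior subgroup weights, then weighted refits."""
    resp = []
    for xi, yi in zip(x, y):
        d = [p * normal_pdf(yi, a + b * xi, v) for p, (a, b, v) in zip(props, comps)]
        tot = max(sum(d), 1e-300)
        resp.append([dk / tot for dk in d])
    k = len(props)
    new_props = [sum(r[j] for r in resp) / len(x) for j in range(k)]
    new_comps = [weighted_fit(x, y, [r[j] for r in resp]) for j in range(k)]
    return new_props, new_comps


def em_statistic(x, y, n_iter=5, delta=1.0):
    """Illustrative statistic: 2 * (log-likelihood after n_iter EM steps - null log-likelihood)."""
    a0, b0, v0 = weighted_fit(x, y, [1.0] * len(x))  # null model: one subgroup
    null_ll = loglik(x, y, [1.0], [(a0, b0, v0)])
    # Assumed starting point: the null fit with slopes perturbed by +/- delta.
    props = [0.5, 0.5]
    comps = [(a0, b0 + delta, v0), (a0, b0 - delta, v0)]
    trace = [loglik(x, y, props, comps)]
    for _ in range(n_iter):
        props, comps = em_step(x, y, props, comps)
        trace.append(loglik(x, y, props, comps))
    return 2.0 * (trace[-1] - null_ll), trace


def simulate(n, slopes, seed):
    """Synthetic data: each observation draws its slope from `slopes` (hypothetical values)."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [rng.choice(slopes) * xi + rng.gauss(0.0, 0.5) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate(200, [2.0, -2.0], seed=1)
    stat, trace = em_statistic(x, y)
    print("statistic after 5 EM iterations:", round(stat, 2))
```

Because each EM iteration cannot decrease the log-likelihood, the trace returned by `em_statistic` shows how much of the gain over the single-subgroup fit is captured after only a few steps. Calibrating such a statistic requires the null distribution, which this sketch leaves out.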

 
NSF-PAR ID:
10397029
Publisher / Repository:
Oxford University Press
Journal Name:
Biometrics
Volume:
78
Issue:
3
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 1045-1055
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Cluster analysis is an unsupervised learning strategy that is exceptionally useful for identifying homogeneous subgroups of observations in data sets of unknown structure. However, it is challenging to determine if the identified clusters represent truly distinct subgroups rather than noise. Existing approaches for addressing this problem tend to define clusters based on distributional assumptions, ignore the inherent correlation structure in the data, or are not suited for high‐dimension low‐sample size (HDLSS) settings. In this paper, we propose a novel method to evaluate the significance of identified clusters by comparing the explained variation due to the clustering from the original data to that produced by clustering a unimodal reference distribution that preserves the covariance structure in the data. The reference distribution is generated using kernel density estimation, and thus, does not require that the data follow a particular distribution. By utilizing sparse covariance estimation, the method is adapted for the HDLSS setting. The approach can be used to test the null hypothesis that the data cannot be partitioned into clusters and to determine the optimal number of clusters. Simulation examples, theoretical evaluations, and applications to temporomandibular disorder research and cancer microarray data illustrate the utility of the proposed method.

     
  2. Summary

    In many biomedical studies, we are interested in comparing treatment effects with an inherent ordering. We propose a quadratic score test (QST) based on a quadratic inference function for detecting an order in treatment effects for correlated data. The quadratic inference function is similar to the negative of a log-likelihood, and it provides test statistics in the spirit of a χ²-test for testing nested hypotheses as well as for assessing the goodness of fit of model assumptions. Under the null hypothesis of no order restriction, it is shown that the QST statistic has a Wald-type asymptotic representation and that the asymptotic distribution of the QST statistic is a weighted χ²-distribution. Furthermore, an asymptotic distribution of the QST statistic under an arbitrary convex cone alternative is provided. The performance of the QST is investigated through Monte Carlo simulation experiments. Analysis of the polyposis data demonstrates that the QST outperforms the Wald test when data are highly correlated with a small sample size and there is a significant amount of missing data with a small number of clusters. The proposed test statistic accommodates both time-dependent and time-independent covariates in a model.

     
  3. Abstract

    In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treatment effects via parametric modeling and can thus be susceptible to model mis-specifications. In this paper, we take a model-free semiparametric perspective and aim to efficiently evaluate the heterogeneous treatment effects of multiple subgroups simultaneously under the one-step targeted maximum-likelihood estimation (TMLE) framework. When the number of subgroups is large, we further expand this path of research by looking at a variation of the one-step TMLE that is robust to the presence of small estimated propensity scores in finite samples. From our simulations, our method demonstrates substantial finite sample improvements compared to conventional methods. In a case study, our method unveils the potential treatment effect heterogeneity of rs12916-T allele (a proxy for statin usage) in decreasing Alzheimer's disease risk.

     
  4. Abstract

    The recurrence of cancer following chemotherapy treatment is a major cause of death across solid and hematologic cancers. In B-cell acute lymphoblastic leukemia (B-ALL), relapse after initial chemotherapy treatment leads to poor patient outcomes. Here we test the hypothesis that chemotherapy-treated versus control B-ALL cells can be characterized based on cellular physical phenotypes. To quantify physical phenotypes of chemotherapy-treated leukemia cells, we use cells derived from B-ALL patients that are treated for 7 days with a standard multidrug chemotherapy regimen of vincristine, dexamethasone, and L-asparaginase (VDL). We conduct physical phenotyping of VDL-treated versus control cells by tracking the sequential deformations of single cells as they flow through a series of micron-scale constrictions in a microfluidic device; we call this method Quantitative Cyclical Deformability Cytometry. Using automated image analysis, we extract time-dependent features of deforming cells including cell size and transit time (TT) with single-cell resolution. Our findings show that VDL-treated B-ALL cells have faster TTs and transit velocity than control cells, indicating that VDL-treated cells are more deformable. We then test how effectively physical phenotypes can predict the presence of VDL-treated cells in mixed populations of VDL-treated and control cells using machine learning approaches. We find that TT measurements across a series of sequential constrictions can enhance the classification accuracy of VDL-treated cells in mixed populations using a variety of classifiers. Our findings suggest the predictive power of cell physical phenotyping as a complementary prognostic tool to detect the presence of cells that survive chemotherapy treatment. Ultimately such complementary physical phenotyping approaches could guide treatment strategies and therapeutic interventions.

    Insight box

    Cancer cells that survive chemotherapy treatment are major contributors to patient relapse, but the ability to predict recurrence remains a challenge. Here we investigate the physical properties of leukemia cells that survive treatment with chemotherapy drugs by deforming individual cells through a series of micron-scale constrictions in a microfluidic channel. Our findings reveal that leukemia cells that survive chemotherapy treatment are more deformable than control cells. We further show that machine learning algorithms applied to physical phenotyping data can predict the presence of cells that survive chemotherapy treatment in a mixed population. Such an integrated approach using physical phenotyping and machine learning could be valuable to guide patient treatments.

     
  5. Abstract

    Objective

    To characterize a cohort of patients with SCN8A‐related epilepsy and to perform analyses to identify correlations involving the acquisition of neurodevelopmental skills.

    Methods

    We analyzed patient data (n = 91) submitted to an online registry tailored to characteristics of children with SCN8A variants. Participants provided information on the history of their child's seizures, medications, comorbidities, and developmental skills based on the Denver II items. Spearman rank tests were utilized to test for correlations among a variety of aspects of seizures, medications, and neurodevelopmental progression.

    Results

    The 91 participants carried 71 missense variants (41 newly reported) and three truncating variants. Ages at seizure onset ranged from birth to >12 months of age (mean ± SD = 5 months 21 days ± 7 months 14 days). Multiple seizure types with multimodal onset times and developmental delay were observed as general features of this cohort. We found a positive correlation between a developmental score based upon percentage of acquired skills and the age at seizure onset, current seizure freedom, and initial febrile seizures. Analyses of cohort subgroups revealed clear distinctions between patients who had a single reported variant in SCN8A and those with an additional variant reported in a gene other than SCN8A, as well as between patients with different patterns of regression before and at seizure onset.

    Significance

    This is the first study of an SCN8A patient cohort of this size and for which correlations between age at seizure onset and neurodevelopment were investigated. Our correlation studies suggest that variants of uncertain significance should be considered in assessing children with SCN8A‐related disorders. This study substantially improves the characterization of this patient population and our understanding of the neurodevelopmental effects associated with seizures for SCN8A patients, and provides a clinical context at initial presentation that may be prognostic for developmental outcome.
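
The Spearman rank correlation named in the Methods of this abstract can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, the numbers in the usage example are invented and are not registry data, and only the correlation coefficient is computed (no p-value).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented values for illustration: age at seizure onset (months)
    # and percentage of developmental skills acquired.
    onset_months = [0.5, 2, 3, 6, 9, 12]
    skills_pct = [20, 35, 30, 60, 70, 85]
    print("Spearman rho:", round(spearman_rho(onset_months, skills_pct), 3))
```

Working on ranks rather than raw values means the coefficient measures any monotone association, which suits ordinal or skewed measures such as developmental scores.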

     
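
The cluster-significance idea in the first related abstract above (compare the variation explained by clustering the data with the variation explained by clustering a unimodal reference) can be sketched in one dimension. Several simplifications are assumptions made for this sketch and are not the authors' method: the reference is a fitted Gaussian rather than a kernel density estimate, the data are univariate so no covariance structure or sparse estimation is involved, only two clusters are considered, and the function names are hypothetical.

```python
import random
import statistics


def explained_variation_two_clusters(values):
    """Share of total variation explained by the best split into two clusters.

    In one dimension the optimal two-cluster partition is a cut of the sorted
    values, so every cut point is scanned using prefix sums.
    """
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    tss = sum((v - mean) ** 2 for v in xs)
    prefix, prefix_sq = [0.0], [0.0]
    for v in xs:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    best_wss = tss
    for cut in range(1, n):
        left_ss = prefix_sq[cut] - prefix[cut] ** 2 / cut
        right_n = n - cut
        right_sum = prefix[n] - prefix[cut]
        right_ss = (prefix_sq[n] - prefix_sq[cut]) - right_sum ** 2 / right_n
        best_wss = min(best_wss, left_ss + right_ss)
    return 1.0 - best_wss / tss


def cluster_p_value(values, n_ref=199, seed=0):
    """Monte Carlo p-value against a Gaussian reference (assumed unimodal null)."""
    rng = random.Random(seed)
    observed = explained_variation_two_clusters(values)
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    exceed = 0
    for _ in range(n_ref):
        ref = [rng.gauss(mu, sd) for _ in values]
        if explained_variation_two_clusters(ref) >= observed:
            exceed += 1
    return (1 + exceed) / (n_ref + 1)


if __name__ == "__main__":
    rng = random.Random(3)
    # Synthetic bimodal sample, invented for illustration.
    data = [rng.gauss(-3.0, 1.0) for _ in range(50)] + [rng.gauss(3.0, 1.0) for _ in range(50)]
    print("explained variation:", round(explained_variation_two_clusters(data), 3))
    print("p-value:", cluster_p_value(data, n_ref=99))
```

The reference samples matter because clustering always explains some variation, even in unimodal data, so the observed value is only meaningful relative to what a single-cluster population would produce.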