Title: Novel classification of axial spondyloarthritis to predict radiographic progression using machine learning
OBJECTIVES: Prediction and determination of drug efficacy for radiographic progression are limited by the heterogeneity inherent in axial spondyloarthritis (axSpA). We investigated whether unbiased clustering analysis of phenotypic data can lead to coherent subgroups of axSpA patients with a distinct risk of radiographic progression. METHODS: A group of 412 patients with axSpA was clustered in an unbiased way using an agglomerative hierarchical clustering method, based on their phenotype mapping. We used a generalised linear model, naïve Bayes, Decision Trees, K-Nearest-Neighbors, and Support Vector Machines to construct a consensus classification method. Radiographic progression over 2 years was assessed using the modified Stoke Ankylosing Spondylitis Spine Score (mSASSS). RESULTS: axSpA patients were classified into three distinct subgroups with distinct clinical characteristics. Sex, smoking, HLA-B27, baseline mSASSS, uveitis, and peripheral arthritis were the key features that were found to stratify the phenogroups. The three phenogroups showed distinct differences in radiographic progression rate (p<0.05) and the proportion of progressors (p<0.001). Phenogroup 2, consisting of male smokers, had the worst radiographic progression, while phenogroup 3, exclusively suffering from uveitis, showed the least radiographic progression. The axSpA phenogroup classification, including its ability to stratify risk, was successfully replicated in an independent validation group. CONCLUSIONS: Phenotype mapping results in a clinically relevant classification of axSpA that is applicable for risk stratification. Novel coupling between phenotypic features and radiographic progression can provide a glimpse into the mechanisms underlying divergent and shared features of axSpA.
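The two methodological steps named in the abstract, agglomerative hierarchical clustering followed by a consensus of several classifiers, can be illustrated with a minimal sketch. The abstract does not state the linkage criterion or how the classifiers' outputs were combined, so average linkage and a simple majority vote are assumptions here. The feature rows and votes are hypothetical toy values, not data from the study, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import dist


def agglomerative_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering on a list of numeric tuples.

    Returns a list of clusters, each a list of row indices into `points`.
    Average linkage is an assumption; the abstract does not name the linkage.
    """
    clusters = [[i] for i in range(len(points))]  # start with one cluster per row

    def average_linkage(a, b):
        # Mean pairwise Euclidean distance between members of clusters a and b.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def consensus_label(votes):
    """Majority vote over the labels proposed by several classifiers (assumed rule)."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical, already-scaled feature rows (not data from the study).
toy_patients = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (20.0, 1.0)]
phenogroups = agglomerative_clusters(toy_patients, n_clusters=3)

# Hypothetical phenogroup predictions for one new patient from five classifiers.
votes = [2, 2, 1, 2, 3]
assigned = consensus_label(votes)
```

In a setup like this, the clustering would define the phenogroups on a derivation cohort, and the vote would assign new patients (for example, a validation cohort) to one of those groups.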
Award ID(s):
1934568
NSF-PAR ID:
10281958
Journal Name:
Clinical and Experimental Rheumatology
Volume:
39
Issue:
3
ISSN:
1593-098X
Page Range / eLocation ID:
508-518
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. BACKGROUND:

    Classification of perioperative risk is important for patient care, resource allocation, and guiding shared decision-making. Using discriminative features from the electronic health record (EHR), machine-learning algorithms can create digital phenotypes among heterogeneous populations, representing distinct patient subpopulations grouped by shared characteristics, from which we can personalize care, anticipate clinical care trajectories, and explore therapies. We hypothesized that digital phenotypes in preoperative settings are associated with postoperative adverse events including in-hospital and 30-day mortality, 30-day surgical redo, intensive care unit (ICU) admission, and hospital length of stay (LOS).

    METHODS:

    We identified all laminectomies, colectomies, and thoracic surgeries performed over a 9-year period from a large hospital system. Seventy-seven readily extractable preoperative features were first selected from clinical consensus, including demographics, medical history, and lab results. Three surgery-specific datasets were built and split into derivation and validation cohorts using chronological occurrence. Consensus k-means clustering was performed independently on each derivation cohort, from which phenotypes’ characteristics were explored. Cluster assignments were used to train a random forest model to assign patient phenotypes in validation cohorts. We reconducted descriptive analyses on validation cohorts to confirm the similarity of patient characteristics with derivation cohorts, and quantified the association of each phenotype with postoperative adverse events by using the area under the receiver operating characteristic curve (AUROC). We compared our approach to the American Society of Anesthesiologists (ASA) score alone and investigated a combination of our phenotypes with the ASA score.

    RESULTS:

    A total of 7251 patients met inclusion criteria, of which 2770 were held out in a validation dataset based on chronological occurrence. Using segmentation metrics and clinical consensus, 3 distinct phenotypes were created for each surgery. The main features used for segmentation included urgency of the procedure, preoperative LOS, age, and comorbidities. The most relevant characteristics varied for each of the 3 surgeries. Low-risk phenotype alpha was the most common (2039 of 2770, 74%), while high-risk phenotype gamma was the rarest (302 of 2770, 11%). Adverse outcomes progressively increased from phenotypes alpha to gamma, including 30-day mortality (0.3%, 2.1%, and 6.0%, respectively), in-hospital mortality (0.2%, 2.3%, and 7.3%), and prolonged hospital LOS (3.4%, 22.1%, and 25.8%). When combined with the ASA score, digital phenotypes achieved higher AUROC than the ASA score alone (hospital mortality: 0.91 vs 0.84; prolonged hospitalization: 0.80 vs 0.71).

    CONCLUSIONS:

    For 3 frequently performed surgeries, we identified 3 digital phenotypes. The typical profiles of each phenotype were described and could be used to anticipate adverse postoperative events.

     
  2. Summary Objective

    Copy number variations (CNVs) represent a significant genetic risk for several neurodevelopmental disorders including epilepsy. As knowledge increases, reanalysis of existing data is essential. Reliable estimates of the contribution of CNVs to epilepsies from sizeable populations are not available.

    Methods

    We assembled a cohort of 1255 patients with preexisting array comparative genomic hybridization or single nucleotide polymorphism array based CNV data. All patients had “epilepsy plus,” defined as epilepsy with comorbid features, including intellectual disability, psychiatric symptoms, and other neurological and nonneurological features. CNV classification was conducted using a systematic filtering workflow adapted to epilepsy.

    Results

    Of 1097 patients remaining after genetic data quality control, 120 individuals (10.9%) carried at least one autosomal CNV classified as pathogenic; 19 individuals (1.7%) carried at least one autosomal CNV classified as possibly pathogenic. Eleven patients (1%) carried more than one (possibly) pathogenic CNV. We identified CNVs covering recently reported (HNRNPU) or emerging (RORB) epilepsy genes, and further delineated the phenotype associated with mutations of these genes. Additional novel epilepsy candidate genes emerge from our study. Comparing phenotypic features of pathogenic CNV carriers to those of noncarriers of pathogenic CNVs, we show that patients with nonneurological comorbidities, especially dysmorphism, were more likely to carry pathogenic CNVs (odds ratio = 4.09, confidence interval = 2.51‐6.68; P = 2.34 × 10⁻⁹). Meta‐analysis including data from published control groups showed that the presence or absence of epilepsy did not affect the detected frequency of CNVs.

    Significance

    The use of a specifically adapted workflow enabled identification of pathogenic autosomal CNVs in 10.9% of patients with epilepsy plus, which rose to 12.7% when we also considered possibly pathogenic CNVs. Our data indicate that epilepsy with comorbid features should be considered an indication for patients to be selected for a diagnostic algorithm including CNV detection. Collaborative large‐scale CNV reanalysis leads to novel declaration of pathogenicity in unexplained cases and can promote discovery of promising candidate epilepsy genes.

     
  3. Abstract Objective

    The early stages of chronic disease typically progress slowly, so symptoms are usually not noticed until the disease is advanced. Slow progression and heterogeneous manifestations make it challenging to model the transition from normal to disease status. As patient conditions are only observed at discrete timestamps with varying intervals, an incomplete understanding of disease progression and heterogeneity affects clinical practice and drug development.

    Materials and Methods

    We developed the Gaussian Process for Stage Inference (GPSI) approach to uncover chronic disease progression patterns and assess the dynamic contribution of clinical features. We tested the ability of the GPSI to reliably stratify synthetic and real-world data for osteoarthritis (OA) in the Osteoarthritis Initiative (OAI), bipolar disorder (BP) in the Adolescent Brain Cognitive Development Study (ABCD), and hepatocellular carcinoma (HCC) in the UTHealth and The Cancer Genome Atlas (TCGA).

    Results

    First, GPSI identified two subgroups of OA based on image features, where these subgroups corresponded to different genotypes, indicating the bone-remodeling and overweight-related pathways. Second, GPSI differentiated BP into two distinct developmental patterns and defined the contribution of specific brain region atrophy from early to advanced disease stages, demonstrating the ability of the GPSI to identify diagnostic subgroups. Third, HCC progression patterns were well reproduced in the two independent UTHealth and TCGA datasets.

    Conclusion

    Our study demonstrated that an unsupervised approach can disentangle temporal and phenotypic heterogeneity and identify population subgroups with common patterns of disease progression. Based on the differences in these features across stages, physicians can better tailor treatment plans and medications to individual patients.

     
  4. Shaharudin, Shazlin (Ed.)
    Objective

    To apply biclustering, a methodology originally developed for analysis of gene expression data, to simultaneously cluster observations and clinical features to explore candidate phenotypes of knee osteoarthritis (KOA) for the first time.

    Methods

    Data from the baseline Osteoarthritis Initiative (OAI) visit were cleaned, transformed, and standardized as indicated (leaving 6461 knees with 86 features). Biclustering produced submatrices of the overall data matrix, representing similar observations across a subset of variables. Statistical validation was determined using the novel SigClust procedure. After identifying biclusters, relationships with key outcome measures were assessed, including progression of radiographic KOA, total knee arthroplasty, loss of joint space width, and worsening Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) scores, over 96 months of follow-up.

    Results

    The final analytic set included 6461 knees from 3330 individuals (mean age 61 years, mean body mass index 28 kg/m², 57% women and 86% White). We identified 6 mutually exclusive biclusters characterized by different feature profiles at baseline, particularly related to symptoms and function. Biclusters represented overall better (#1), similar (#2, 3, 6), and poorer (#4, 5) prognosis compared to the overall cohort of knees, respectively. In general, knees in biclusters #4 and 5 had more structural progression (based on Kellgren-Lawrence grade, total knee arthroplasty, and loss of joint space width) but tended to have an improvement in WOMAC pain scores over time. In contrast, knees in bicluster #1 had less incident and progressive KOA, fewer total knee arthroplasties, less loss of joint space width, and stable pain scores compared with the overall cohort.

    Significance

    We identified six biclusters within the baseline OAI dataset which have varying relationships with key outcomes in KOA. Such biclusters represent potential phenotypes within the larger cohort and may suggest subgroups at greater or lesser risk of progression over time.
  5. Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust methods to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells’ activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large scale scRNA-seq data. Application of CIBS on two scRNA-seq datasets collected from the tumor microenvironment successfully identified subgroups of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which were not revealed by the existing cell clustering analysis tools. The identified cell groups were significantly associated with the clinically confirmed lymph-node invasion and metastasis events across different patients. Index Terms: Cell clustering analysis, Data denoising, Boolean matrix factorization, Cancer microenvironment, Metastasis.
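The last abstract above applies Boolean matrix factorization to discretized expression status. As a rough illustration of what such a factorization approximates, the sketch below computes a Boolean matrix product and counts how many entries of a binarized matrix it fails to reproduce. This is not the CIBS or PFAST algorithm: the fixed-threshold discretization, the toy expression values, the two factor matrices, and all function names are assumptions made for illustration only.

```python
def discretize(expression, threshold):
    """Binarize expression values with a fixed threshold (an assumed, simplistic rule)."""
    return [[int(value > threshold) for value in row] for row in expression]


def boolean_product(a, b):
    """Boolean matrix product: out[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    inner = len(b)
    cols = len(b[0])
    return [
        [int(any(row[k] and b[k][j] for k in range(inner))) for j in range(cols)]
        for row in a
    ]


def reconstruction_error(x, a, b):
    """Number of entries where x differs from the Boolean product of a and b."""
    approx = boolean_product(a, b)
    return sum(
        x_ij != p_ij
        for x_row, p_row in zip(x, approx)
        for x_ij, p_ij in zip(x_row, p_row)
    )


# Hypothetical expression matrix: 4 cells (rows) by 4 genes (columns).
expression = [
    [5.0, 4.2, 0.1, 0.0],
    [3.9, 6.1, 2.8, 3.3],
    [0.2, 0.0, 4.4, 5.0],
    [0.0, 0.3, 0.1, 2.5],
]
x = discretize(expression, threshold=1.0)

# Hypothetical rank-2 factors: cell-to-module and module-to-gene memberships.
cell_modules = [[1, 0], [1, 1], [0, 1], [0, 0]]
module_genes = [[1, 1, 0, 0], [0, 0, 1, 1]]

error = reconstruction_error(x, cell_modules, module_genes)
```

A factorization method searches for factor matrices that keep this error small, and each column of the cell-to-module factor can then be read as a candidate group of cells sharing one functional module.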