Abstract Background: Alzheimer’s Disease (AD) is a widespread neurodegenerative disease, with Mild Cognitive Impairment (MCI) acting as an interim phase between the normal cognitive state and AD. The irreversible nature of AD and the difficulty of early prediction present significant challenges for patients, caregivers, and the healthcare sector. Deep learning (DL) methods such as Recurrent Neural Networks (RNN) have been used to analyze Electronic Health Records (EHR) to model disease progression and predict diagnosis. However, these models do not address some inherent irregularities in EHR data, such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. To address these issues, we developed a novel DL architecture called Time‐Aware RNN (TA‐RNN) to predict MCI to AD conversion at the next clinical visit. Method: TA‐RNN comprises a time embedding layer, an attention‐based RNN, and a prediction layer based on a multi‐layer perceptron (MLP) (Figure 1). For interpretability, a dual‐level attention mechanism within the RNN identifies the visits and features that most affect predictions. TA‐RNN addresses irregular time intervals by incorporating time embedding into longitudinal cognitive and neuroimaging data based on attention weights to create a patient embedding. The MLP, trained on demographic data and the patient embedding, predicts AD conversion. TA‐RNN was evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and National Alzheimer’s Coordinating Center (NACC) datasets based on F2 score and sensitivity. Result: Multiple TA‐RNN models were trained with two, three, five, or six visits to predict the diagnosis at the next visit. In one setup, the models were trained and tested on ADNI. In another setup, the models were trained on the entire ADNI dataset and evaluated on the entire NACC dataset. The results indicated superior performance of TA‐RNN compared to state‐of‐the‐art (SOTA) and baseline approaches for both setups (Figure 2A and 2B). Based on attention weights, we also highlighted significant visits (Figure 3A) and features (Figure 3B) and observed that the CDRSB and FAQ features and the most recent visit had the highest influence on predictions. Conclusion: We propose TA‐RNN, an interpretable model to predict MCI to AD conversion while handling irregular time intervals. TA‐RNN outperformed SOTA and baseline methods in multiple experiments.
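The abstract above reports F2 score and sensitivity as its evaluation metrics. As a minimal sketch of how these two quantities are conventionally defined (the function names and the confusion counts below are illustrative assumptions, not values or code from the study), the F-beta score with beta = 2 weights recall more heavily than precision:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Recall for the positive class: the fraction of true converters that are detected."""
    total = tp + fn
    return tp / total if total else 0.0


def fbeta_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta > 1 favours recall over precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts chosen only to illustrate the arithmetic.
tp, fp, fn = 8, 4, 2
recall = sensitivity(tp, fn)
f2 = fbeta_score(tp, fp, fn, beta=2.0)
f1 = fbeta_score(tp, fp, fn, beta=1.0)
print(f"sensitivity={recall:.3f}  F2={f2:.3f}  F1={f1:.3f}")
```

With these made-up counts, F2 is higher than F1 because recall exceeds precision, which is the usual motivation for preferring F2 when a missed conversion is costlier than a false alarm.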
Integrating patients in time series clinical transcriptomics data
Abstract Motivation: Analysis of time series transcriptomics data from clinical trials is challenging. Such studies usually profile very few time points from several individuals with varying response patterns and dynamics. Current methods for these datasets are mainly based on linear, global orderings using visit times, which do not account for the varying response rates and subgroups within a patient cohort. Results: We developed a new method that uses multi-commodity flow algorithms for trajectory inference in large-scale clinical studies. Recovered trajectories satisfy individual-based timing restrictions while integrating data from multiple patients. Testing the method on multiple drug datasets demonstrated improved performance compared to prior approaches suggested for this task, while identifying novel disease subtypes that correspond to heterogeneous patient response patterns. Availability and implementation: The source code and instructions to download the data have been deposited on GitHub at https://github.com/euxhenh/Truffle.
- Award ID(s):
- 2134998
- PAR ID:
- 10518296
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 40
- Issue:
- Supplement_1
- ISSN:
- 1367-4803
- Format(s):
- Medium: X Size: p. i151-i159
- Size(s):
- p. i151-i159
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Background: The research gap addressed in this study is the applicability of deep neural network (NN) models on wearable sensor data to recognize different activities performed by patients with Parkinson’s Disease (PwPD) and the generalizability of these models to PwPD using labeled healthy data. Methods: The experiments were carried out using three datasets containing wearable motion sensor readings on common activities of daily living. The collected readings were from two accelerometer sensors. PAMAP2 and MHEALTH are publicly available datasets collected from 10 and 9 healthy, young subjects, respectively. A private dataset of a similar nature collected from 14 PwPD was used as well. Deep NN models were implemented with varying levels of complexity to investigate the impact of data augmentation, manual axis reorientation, model complexity, and domain adaptation on activity recognition performance. Results: A moderately complex model trained on the augmented PAMAP2 dataset and adapted to the Parkinson domain using domain adaptation achieved the best activity recognition performance with an accuracy of 73.02%, which was significantly higher than the accuracy of 63% reported in previous studies. The model’s F1 score of 49.79% was a significant improvement over the best cross-testing F1 score of 33.66% with only data augmentation and the F1 score of 2.88% without data augmentation or domain adaptation. Conclusion: These findings suggest that deep NN models trained on healthy data have the potential to recognize activities performed by PwPD accurately and that data augmentation and domain adaptation can improve the generalizability of models in the healthy-to-PwPD transfer scenario. The simple and moderately complex architectures tested in this study could generalize better to the PwPD domain when trained on a healthy dataset than the most complex architectures used. The findings of this study could contribute to the development of accurate wearable-based activity monitoring solutions for PwPD, improving clinical decision-making and patient outcomes based on patient activity levels.
-
Abstract Background: Securing adequate data privacy is critical for the productive utilization of data. De-identification, which involves masking or replacing specific values in a dataset, could damage the dataset’s utility, and finding a reasonable balance between data privacy and utility is not straightforward. Nonetheless, few studies have investigated how data de-identification efforts affect data analysis results. This study aimed to demonstrate the effect of different de-identification methods on a dataset’s utility with a clinical analytic use case and to assess the feasibility of finding a workable tradeoff between data privacy and utility. Methods: Predictive modeling of emergency department length of stay was used as a data analysis use case. A logistic regression model was developed with 1155 patient cases extracted from a clinical data warehouse of an academic medical center located in Seoul, South Korea. Nineteen de-identified datasets were generated based on various de-identification configurations using ARX, an open-source software for anonymizing sensitive personal data. The variable distributions and prediction results were compared between the de-identified datasets and the original dataset. We examined the association between data privacy and utility to determine whether it is feasible to identify a viable tradeoff between the two. Results: All 19 de-identification scenarios significantly decreased re-identification risk. Nevertheless, the de-identification processes resulted in record suppression and complete masking of variables used as predictors, thereby compromising dataset utility. A significant correlation was observed only between the re-identification reduction rates and the ARX utility scores. Conclusions: As the importance of health data analysis increases, so does the need for effective privacy protection methods. While existing guidelines provide a basis for de-identifying datasets, achieving a balance between high privacy and utility is a complex task that requires understanding the data’s intended use and involving input from data users. This approach could help find a suitable compromise between data privacy and utility.
-
Abstract Introduction: Non‐routine events (NREs) are atypical or unusual occurrences in a pre‐defined process. Although some NREs in high‐risk clinical settings have no adverse effects on patient care, others can potentially cause serious patient harm. A unified strategy for identifying and describing NREs in these domains will facilitate the comparison of results between studies. Methods: We conducted a literature search in PubMed, CINAHL, and EMBASE to identify studies related to NREs in high‐risk domains and evaluated the methods used for event observation and description. We applied the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) taxonomy (cause, impact, domain, type, prevention, and mitigation) to the descriptions of NREs from the literature. Results: We selected 25 articles that met inclusion criteria for review. Real‐time documentation of NREs was more common than retrospective video review. Thirteen studies used domain experts as observers, and seven studies validated observations with interrater reliability. Using the JCAHO taxonomy, “cause” was the most frequently applied classification method, followed by “impact,” “type,” “domain,” and “prevention and mitigation.” Conclusions: NREs are frequent in high‐risk medical settings. Strengths identified in several studies included the use of multiple observers with domain expertise and validation of the event ascertainment approach using interrater reliability. By applying the JCAHO taxonomy to the current literature, we provide an example of a structured approach that can be used for future analyses of NREs.
-
Background: Transcriptomics can reveal much about cellular activity, and cancer transcriptomics has been useful in investigating tumor cell behaviors. Patterns in transcriptome-wide gene expression can be used to investigate biological mechanisms and pathways that can explain the variability in patient response to cancer therapies. Methods: We identified gene expression patterns related to patient drug response by clustering tumor gene expression data and selecting from the resulting gene clusters those where expression of cluster genes was related to patient survival on specific drugs. We then investigated these gene clusters for biological meaning using several approaches, including identifying common genomic locations and transcription factors whose targets were enriched in these clusters and performing survival analyses to support these candidate transcription factor-drug relationships. Results: We identified gene clusters related to drug-specific survival, and through these, we were able to associate observed variations in patient drug response with specific known biological phenomena. Specifically, our analysis implicated 2 stem cell-related transcription factors, HOXB4 and SALL4, in poor response to temozolomide in brain cancers. In addition, expression of SNRNP70 and its targets was implicated in cetuximab response by 3 different analyses, although the mechanism remains unclear. We also found evidence that 2 cancer-related chromosomal structural changes may impact drug efficacy. Conclusion: In this study, we present the gene clusters identified and the results of our systematic analysis linking drug efficacy to specific transcription factors, which are rich sources of potential mechanistic relationships impacting patient outcomes. We also highlight the most promising of these results, which were supported by multiple analyses and by previous research. We report these findings as promising avenues for independent validation and further research into cancer treatments and patient response.
