Title: Mapping of Alzheimer’s disease related data elements and the NIH Common Data Elements
Abstract
Background: Alzheimer’s Disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer’s Coordinating Center (NACC) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Additionally, the National Institutes of Health (NIH) Common Data Elements (CDE) Repository has been developed to facilitate data sharing and improve the interoperability among data sets in various disease research areas.
Method: To better understand how AD-related data elements in these resources are interoperable with each other, we leverage different representation models to map data elements from different resources: NACC to ADNI, NACC to NIH CDE, and ADNI to NIH CDE. We explore bag-of-words based and word embeddings based models (Word2Vec and BioWordVec) to perform the data element mappings in these resources.
Results: The data dictionaries downloaded on November 23, 2021 contain 1,195 data elements in NACC, 13,918 in ADNI, and 27,213 in the NIH CDE Repository. Data element preprocessing reduced the numbers of NACC and ADNI data elements for mapping to 1,099 and 7,584, respectively. Manual evaluation of the mapping results showed that the bag-of-words based approach achieved the best precision, while the BioWordVec based approach attained the best recall. In total, the three approaches mapped 175 out of 1,099 (15.92%) NACC data elements to ADNI; 107 out of 1,099 (9.74%) NACC data elements to NIH CDE; and 171 out of 7,584 (2.25%) ADNI data elements to NIH CDE.
Conclusions: The bag-of-words based and word embeddings based approaches showed promise in mapping AD-related data elements between different resources. Although the mapping approaches need further improvement, our results indicate that there is a critical need to standardize CDEs across these valuable AD research resources in order to maximize the discoveries regarding AD pathophysiology, diagnosis, and treatment that can be gleaned from them.
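To make the bag-of-words idea concrete, the following is a minimal sketch of mapping data elements by the cosine similarity of their token counts. It is not the authors' pipeline: the abstract does not state the preprocessing, similarity measure, or cutoff that were used, so the tokenizer, the `threshold` value, the function names, and the element names and descriptions below are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_elements(source, target, threshold=0.8):
    """Map each source element to its most similar target element.

    A mapping is kept only if the best similarity reaches `threshold`
    (an assumed cutoff, not one taken from the abstract).
    """
    target_vecs = {name: bag_of_words(desc) for name, desc in target.items()}
    mappings = {}
    for name, desc in source.items():
        vec = bag_of_words(desc)
        best_name, best_score = None, 0.0
        for t_name, t_vec in target_vecs.items():
            score = cosine(vec, t_vec)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_name is not None and best_score >= threshold:
            mappings[name] = (best_name, best_score)
    return mappings


# Hypothetical data elements, invented purely for illustration.
source = {
    "SRC_AGE": "Subject age at visit",
    "SRC_SMOKE": "Smoked cigarettes in last 30 days",
}
target = {
    "TGT_AGE": "Age of subject at visit",
    "TGT_EDUC": "Years of education completed",
}
mappings = map_elements(source, target, threshold=0.8)
```

Here `SRC_AGE` maps to `TGT_AGE` because the two descriptions share four of their tokens, while `SRC_SMOKE` shares no tokens with either target and is left unmapped. A word-embedding variant would replace `bag_of_words` with averaged pretrained vectors, which is one plausible reason such models trade precision for recall.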
Award ID(s):
1931134
PAR ID:
10524222
Springer Nature
Journal Name:
BMC Medical Informatics and Decision Making
Volume:
24
Issue:
S3
ISSN:
1472-6947
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Background: Alzheimer’s Disease (AD) is a widespread neurodegenerative disease, with Mild Cognitive Impairment (MCI) acting as an interim phase between a normal cognitive state and AD. The irreversible nature of AD and the difficulty of early prediction present significant challenges for patients, caregivers, and the healthcare sector. Deep learning (DL) methods such as Recurrent Neural Networks (RNN) have been utilized to analyze Electronic Health Records (EHR) to model disease progression and predict diagnosis. However, these models do not address some inherent irregularities in EHR data, such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. To address these issues, we developed a novel DL architecture called Time‐Aware RNN (TA‐RNN) to predict MCI to AD conversion at the next clinical visit. Method: TA‐RNN comprises a time embedding layer, an attention‐based RNN, and a prediction layer based on a multi‐layer perceptron (MLP) (Figure 1). For interpretability, a dual‐level attention mechanism within the RNN identifies significant visits and features impacting predictions. TA‐RNN addresses irregular time intervals by incorporating time embedding into longitudinal cognitive and neuroimaging data based on attention weights to create a patient embedding. The MLP, trained on demographic data and the patient embedding, predicts AD conversion. TA‐RNN was evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and National Alzheimer’s Coordinating Center (NACC) datasets based on F2 score and sensitivity. Result: Multiple TA‐RNN models were trained with two, three, five, or six visits to predict the diagnosis at the next visit. In one setup, the models were trained and tested on ADNI. In another setup, the models were trained on the entire ADNI dataset and evaluated on the entire NACC dataset. The results indicated superior performance of TA‐RNN compared to state‐of‐the‐art (SOTA) and baseline approaches for both setups (Figure 2A and 2B). Based on attention weights, we also highlighted significant visits (Figure 3A) and features (Figure 3B) and observed that the CDRSB and FAQ features and the most recent visit had the highest influence on predictions. Conclusion: We propose TA‐RNN, an interpretable model to predict MCI to AD conversion while handling irregular time intervals. TA‐RNN outperformed SOTA and baseline methods in multiple experiments.
  2. Abstract. Background: In Alzheimer’s Disease (AD) research, multimodal imaging analysis can unveil complementary information from multiple imaging modalities and further our understanding of the disease. One application is to discover disease subtypes using unsupervised clustering. However, existing clustering methods are often applied to input features directly, and could suffer from the curse of dimensionality with high-dimensional multimodal data. The purpose of our study is to identify multimodal imaging-driven subtypes in Mild Cognitive Impairment (MCI) participants using a multiview learning framework based on Deep Generalized Canonical Correlation Analysis (DGCCA), to learn a shared low-dimensional latent representation from 3 neuroimaging modalities. Results: DGCCA applies non-linear transformations to input views using neural networks and is able to learn correlated low-dimensional embeddings that capture more variance than its linear counterpart, generalized CCA (GCCA). We designed experiments to compare DGCCA embeddings with single modality features and GCCA embeddings by generating 2 subtypes from each feature set using unsupervised clustering. In our validation studies, we found that amyloid PET imaging has the most discriminative features compared with structural MRI and FDG PET, which DGCCA learns from but GCCA does not. DGCCA subtypes show differential measures in 5 cognitive assessments, 6 brain volume measures, and conversion to AD patterns. In addition, DGCCA MCI subtypes confirmed AD genetic markers with strong signals that the existing late MCI group did not identify. Conclusion: Overall, DGCCA is able to learn effective low-dimensional embeddings from multimodal data by learning non-linear projections. MCI subtypes generated from DGCCA embeddings are different from the existing early and late MCI groups and show the most similarity with those identified by amyloid PET features. In our validation studies, DGCCA subtypes show distinct patterns in cognitive measures and brain volumes, and are able to identify AD genetic markers. These findings indicate the promise of the imaging-driven subtypes and their power in revealing disease structures beyond early and late stage MCI.
  3. Alzheimer’s disease (AD) is a degenerative brain disease that affects millions of people around the world. As populations in the United States and worldwide age, the prevalence of Alzheimer’s disease will only increase. In turn, the social and financial costs of AD will create a difficult environment for many families and caregivers across the globe. By combining genetic information, brain scans, and clinical data, gathered over time through the Alzheimer’s Disease Neuroimaging Initiative (ADNI), we propose a new Joint High-Order Multi-Modal Multi-Task Feature Learning method to predict the cognitive performance and diagnosis of patients with and without AD.
  4. Abstract. Purpose: The objective of this study was to develop a novel AI-ensembled network based on the most important features and affected brain regions to accurately classify and exhibit the pattern of progression of the stages of Cognitive Impairment (CI). Methods: We proposed a novel ensembled architecture, 3D ResNet-18 - RF (Random Forest), and used this network to categorize the stages of Alzheimer’s disease (AD). The residual unit (blocks of ResNet) was introduced to the 3D Convolutional Neural Network (CNN) to solve the degradation problem. It was considered an innovative strategy since the combination with fine-tuning resulted in higher accuracy. This network was trained on selected features and affected brain regions. The structured magnetic resonance images (MRI) were collected from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and the random forest was used to determine the importance of the features and affected regions from the 170 parcellated regions of interest (ROIs) of the automated anatomical labeling 3 (AAL-3) atlas. This framework classified five categories of AD and detected the progression pattern. Results: The proposed network showed promising results with a 66% F-1 score, 76% sensitivity, and 93.5% specificity, outperforming conventional methods for five-category classification. The Ventral Posterolateral and Pulvinar lateral regions were the regions most affected, indicating the progression from early MCI to AD. The five-fold validation accuracy for the developed model was 60.02%. Conclusion: The results showed that the gray matter to white matter ratio was the most significant feature, which also accurately predicted the progression pattern. The performance metrics fluctuated with different hyperparameters, but the fluctuations never exceeded 0.05% of the estimated results, indicating the validity and originality of the suggested methodology.
  5. Abstract. Background: Imaging, cognitive and fluid data have been widely studied to identify quantitative biomarkers that can help predict the status and progression of Alzheimer’s disease (AD). However, it is still an underexplored question whether there exist subpopulations with different genetic profiles across which the biomarker‐based prediction models may vary. We propose to use the Chow test (Chow 1960 Econometrica 28(3)) to perform genetically stratified analyses for identifying SNP‐based subpopulations coupled with precision AD biomarkers with varying effects on future diagnosis in these subpopulations. The investigation of such SNPs and precision biomarkers may eventually pave the way for increased customization of AD care. Method: Participants included 1,324 subjects from the ADNI cohort with both AD biomarker and genotyping data available (http://www.pi4cs.org/qt‐pad‐challenge). 30 significant (P < 1.5E‐278) AD SNPs were sourced from (Jansen 2019 NatGen). Chow tests were performed to determine whether each of the baseline visit measures of 16 AD biomarkers predicted AD diagnosis at the three‐year visit with varying slopes when stratifying upon the allelic dosage of each of the 30 chosen SNPs. Bonferroni correction (P < 1.04E‐4) was employed to correct for multiple comparisons. Result: Multiple SNP‐biomarker pairs showed significant genetically driven deviations in the regression coefficients when predicting diagnosis in three years using baseline biomarkers (Figure 1). Top SNP hits involved rs769449 (Chr 19, APOE) and rs7561528 (Chr 2, LOC105373605), and almost all 16 studied biomarkers demonstrated differential slopes in different genotype groups to predict diagnosis in three years. To examine the details of these top findings, the regression coefficients calculated for each of the five most significant biomarkers of both SNPs were bootstrapped and plotted in Figure 2. Conclusion: Genetic analysis of AD candidate SNPs in conjunction with AD biomarker data via the Chow test identified several SNPs coupled with precision AD biomarkers with varying prognosis effects in the corresponding genotype groups. These findings provide valuable information to reveal disease heterogeneity and help facilitate precision medicine.
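The Bonferroni threshold quoted in item 5 (P < 1.04E‐4) can be checked with simple arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state, and assumes one test per SNP‐biomarker pair (16 biomarkers times 30 SNPs); the function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_biomarkers = 16  # from the abstract
n_snps = 30        # from the abstract
alpha = 0.05       # assumed family-wise level; not stated in the abstract

threshold = bonferroni_threshold(alpha, n_biomarkers * n_snps)
print(f"{threshold:.2e}")  # prints 1.04e-04
```

Under these assumptions the 480 pairwise tests give a per-test threshold of about 1.04E‐4, which matches the value reported in the abstract.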