Title: Accurate diagnosis of atopic dermatitis by combining transcriptome and microbiota data with supervised machine learning
Abstract

Atopic dermatitis (AD) is a common skin disease in childhood whose diagnosis requires expertise in dermatology. Recent studies have indicated that host gene–microbe interactions in the gut contribute to human diseases including AD. We sought to develop an accurate and automated pipeline for AD diagnosis based on transcriptome and microbiota data. Using these data from 161 subjects, including AD patients and healthy controls, we trained a machine learning classifier to predict the risk of AD. We found that the classifier could accurately distinguish subjects with AD from healthy individuals based on the omics data, with an average F1-score of 0.84. With this classifier, we also identified a set of 35 genes and 50 microbiota features that are predictive of AD. Among the selected features, we discovered at least three genes and three microorganisms directly or indirectly associated with AD. Although further replication in other cohorts is needed, our findings suggest that these genes and microbiota features may provide novel biological insights and may be developed into useful biomarkers for AD prediction.
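
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score is computed for a binary classifier. It is not the authors' pipeline: the function name `f1_score`, the label encoding (1 = AD, 0 = healthy control), and the toy labels are assumptions made purely for illustration, and the abstract does not state how the reported average was taken.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the `positive` class.

    Hypothetical helper for illustration; labels are assumed to be
    1 (AD) and 0 (healthy control).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels, not study data).
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_f1 = f1_score(toy_true, toy_pred)
```

An average F1-score such as the one quoted would typically be obtained by computing this value on each held-out split and averaging, although the exact evaluation protocol is not given in the abstract.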

 
PAR ID:
10361352
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Volume:
12
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Imaging, cognitive and fluid data have been widely studied to identify quantitative biomarkers that can help predict the status and progression of Alzheimer’s disease (AD). However, it is still an underexplored topic whether there exist subpopulations with different genetic profiles across which the biomarker‐based prediction models may vary. We propose to use the Chow test (Chow 1960 Econometrica 28(3)) to perform genetically stratified analyses for identifying SNP‐based subpopulations coupled with precision AD biomarkers with varying effects on future diagnosis in these subpopulations. The investigation of such SNPs and precision biomarkers may eventually pave the way for increased customization of AD care.

    Method

    Participants included 1,324 subjects from the ADNI cohort with both AD biomarker and genotyping data available (http://www.pi4cs.org/qt‐pad‐challenge). 30 significant (P < 1.5E‐278) AD SNPs were sourced from (Jansen 2019 NatGen). Chow tests were performed to determine whether each of baseline visit measures of 16 AD biomarkers predicted AD diagnosis at the three‐year visit with varying slopes when stratifying upon the allelic dosage of each of 30 chosen SNPs. Bonferroni correction (P < 1.04E‐4) was employed to correct for multiple comparisons.

    Result

    Multiple SNP‐biomarker pairs showed significant genetically driven deviations in the regression coefficients when predicting diagnosis in three years using baseline biomarkers (Figure 1). Top SNP hits involved rs769449 (Chr 19, APOE) and rs7561528 (Chr 2, LOC105373605), and almost all 16 studied biomarkers demonstrated differential slopes in different genotype groups to predict diagnosis in three years. To examine the details of these top findings, the regression coefficients calculated for each of the five most significant biomarkers of both SNPs were bootstrapped and plotted in Figure 2.

    Conclusion

    Genetic analysis of AD candidate SNPs in conjunction with AD biomarker data via the Chow test identified several SNPs coupled with precision AD biomarkers with varying prognosis effects in the corresponding genotype groups. These findings provide valuable information to reveal disease heterogeneity and help facilitate precision medicine.
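
The Chow test and the Bonferroni threshold described in this abstract can be sketched as follows. This is an illustrative simplification, not the study's code: it assumes a single-predictor linear regression and only two genotype groups (the abstract stratifies on allelic dosage, which may define more), and the function names and toy data are hypothetical. The significance level of 0.05 is also an assumption, although 0.05 divided by the 30 × 16 SNP-biomarker pairs does reproduce the stated threshold of about 1.04E-4.

```python
def rss_simple_ols(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for two groups; k = parameters per regression
    (intercept and slope here)."""
    rss_pooled = rss_simple_ols(list(x1) + list(x2), list(y1) + list(y2))
    rss_split = rss_simple_ols(x1, y1) + rss_simple_ols(x2, y2)
    dof = len(x1) + len(x2) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 30 SNPs x 16 biomarkers = 480 tests; alpha = 0.05 is an assumed level.
threshold = bonferroni_threshold(0.05, 30 * 16)

# Toy data (made up): the biomarker-outcome slope is positive in one
# genotype group and negative in the other, so the statistic is large.
group_a_x, group_a_y = [0, 1, 2, 3], [0.1, 0.9, 2.1, 2.9]
group_b_x, group_b_y = [0, 1, 2, 3], [3.1, 1.9, 1.1, -0.1]
f_stat = chow_f(group_a_x, group_a_y, group_b_x, group_b_y)
```

A large F statistic indicates that fitting separate regressions per genotype group explains the outcome much better than one pooled regression, which is the sense in which a biomarker has "varying slopes" across groups. Converting the statistic to a p-value requires the F distribution with (k, n1 + n2 - 2k) degrees of freedom, which is omitted here.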

     
  2. Gibbons, Sean M. (Ed.)
    ABSTRACT Microbiota studies have reported changes in the microbial composition of the breast upon cancer development. However, results are inconsistent and limited to the later phases of cancer development (after diagnosis). We analyzed and compared the resident bacterial taxa of histologically normal breast tissue (healthy, H, n = 49) with those of tissues donated prior to (prediagnostic, PD, n = 15) and after (adjacent normal, AN, n = 49, and tumor, T, n = 46) breast cancer diagnosis (n total = 159). DNA was isolated from tissue samples and submitted for Illumina MiSeq paired-end sequencing of the V3-V4 region of the 16S gene. To infer bacterial function in breast cancer, we predicted the functional bacteriome from the 16S sequencing data using PICRUSt2. Bacterial compositional analysis revealed an intermediary taxonomic signature in the PD tissue relative to that of the H tissue, represented by shifts in Bacillaceae, Burkholderiaceae, Corynebacteriaceae, Streptococcaceae, and Staphylococcaceae. This compositional signature was enhanced in the AN and T tissues. We also identified significant metabolic reprogramming of the microbiota of the PD, AN, and T tissue compared with the H tissue. Further, preliminary correlation analysis between host transcriptome profiling and microbial taxa and genes in H and PD tissues identified altered associations between the human host and mammary microbiota in PD tissue compared with H tissue. These findings suggest that compositional shifts in bacterial abundance and metabolic reprogramming of the breast tissue microbiota are early events in breast cancer development that are potentially linked with cancer susceptibility.

    IMPORTANCE The goal of this study was to determine the role of resident breast tissue bacteria in breast cancer development. We analyzed breast tissue bacteria in healthy breast tissue and breast tissue donated prior to (precancerous) and after (postcancerous) breast cancer diagnosis. Compared to healthy tissue, the precancerous and postcancerous breast tissues demonstrated differences in the amounts of breast tissue bacteria. In addition, breast tissue bacteria exhibit different functions in pre-cancerous and post-cancerous breast tissues relative to healthy tissue. These differences in function are further emphasized by altered associations of the breast tissue bacteria with gene expression in the human host prior to cancer development. Collectively, these analyses identified shifts in bacterial abundance and metabolic function (dysbiosis) prior to breast tumor diagnosis. This dysbiosis may serve as a therapeutic target in breast cancer prevention.
  3. Metabolites are critical products and mediators of cellular and tissue function, and key signals in cell-to-cell, organ-to-organ and cross-organism communication. Many of these interactions are spatially segregated. Thus, spatial metabolomics can provide valuable insight into healthy tissue function and disease pathogenesis. Here, we review major mass spectrometry-based spatial metabolomics techniques and the biological insights they have enabled, with a focus on brain and microbiota function and on cancer, neurological diseases and infectious diseases. These techniques also present significant translational utility, for example in cancer diagnosis and in drug development. However, spatial mass spectrometry techniques still encounter significant challenges, including artifactual features, metabolite annotation, open data, and ethical considerations. Addressing these issues represents the future challenge in this field.
  4. Introduction: Alzheimer’s disease (AD) causes progressive irreversible cognitive decline and is the leading cause of dementia. Therefore, a timely diagnosis is imperative to maximize neurological preservation. However, current treatments are either too costly or limited in availability. In this project, we explored using retinal vasculature as a potential biomarker for early AD diagnosis. This project focuses on stage 3 of a three-stage modular machine learning pipeline consisting of image quality selection, vessel map generation, and classification [1]. The previous model used only a support vector machine (SVM) to classify AD labels, which limited its accuracy to 82%. In this project, random forest and gradient boosting were added and, along with SVM, combined into an ensemble classifier, raising the classification accuracy to 89%.

    Materials and Methods: Subjects classified as AD were those who were diagnosed with dementia in “Dementia Outcome: Alzheimer’s disease” from the UK Biobank Electronic Health Records. Five control groups were chosen with a 5:1 ratio of control to AD patients where the control patients had the same age, gender, and eye side image as the AD patient. In total, 122 vessel images from each group (AD and control) were used. The vessel maps were then segmented from fundus images through U-net. A t-test feature selection with a p-value threshold of 0.01 was first performed on the training folds, and the selected features were fed into the classifiers. Next, 20 repetitions of 5-fold cross validation were performed, with the hyperparameters tuned solely on the training data. An ensemble classifier consisting of SVM, gradient boosting tree, and random forests was built, and the final prediction was made through majority voting and evaluated on the test set.

    Results and Discussion: Through ensemble classification, accuracy increased by 4-12% relative to the individual classifiers, precision by 9-15%, sensitivity by 2-9%, specificity by at least 9-16%, and F1 score by 7-12%.

    Conclusions: Overall, a relatively high classification accuracy was achieved using machine learning ensemble classification with SVM, random forest, and gradient boosting. Although the results are very promising, a limitation of this study is that the requirement for images of sufficient quality decreased the number of control parameters that could be implemented. However, through retinal vasculature analysis, this project shows machine learning’s high potential to be an efficient, more cost-effective alternative for diagnosing Alzheimer’s disease.

    Clinical Application: Using machine learning for AD diagnosis through retinal images will make screening available to a broader population by being more accessible and cost-efficient. Mobile-device-based screening can also be enabled at primary screening in resource-deprived regions. It can provide a pathway for future understanding of the association between biomarkers in the eye and brain.
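
The majority-voting step of such an ensemble can be sketched as below. This is only an illustration of the voting rule, not the study's implementation: the function name `majority_vote`, the label encoding (1 = AD, 0 = control), and the per-model prediction lists are hypothetical, and the trained SVM, random forest, and gradient boosting models themselves are not reproduced.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    `predictions_per_model` holds one equal-length list of predicted
    labels per classifier. With three binary classifiers a tie cannot occur.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


# Made-up predictions from three hypothetical models on four samples.
svm_pred = [1, 0, 1, 0]
forest_pred = [1, 1, 0, 0]
boosting_pred = [0, 1, 1, 0]
ensemble_pred = majority_vote([svm_pred, forest_pred, boosting_pred])
```

Each sample receives the label that at least two of the three models agree on, so a single model's error on a sample is outvoted when the other two are correct.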
  5. Abstract Background

    Joint acoustic emissions from knees have been evaluated as a convenient, non-invasive digital biomarker of inflammatory knee involvement in a small cohort of children with Juvenile Idiopathic Arthritis (JIA). The objective of the present study was to validate this in a larger cohort.

    Findings

    A total of 116 subjects (86 JIA and 30 healthy controls) participated in this study. Of the 86 subjects with JIA, 43 had active knee involvement at the time of the study. Joint acoustic emissions were recorded bilaterally, and the corresponding signal features were used to train a machine learning algorithm (XGBoost) to classify JIA and healthy knees. All active JIA knees and 80% of the controls were used as the training data set, while the remaining knees were used as the testing data set. Leave-one-leg-out cross-validation was used for validation on the training data set. Validation of the classifier on the training and testing sets resulted in accuracies of 81.1% and 87.7%, respectively. Sensitivity / specificity for the training and testing validation was 88.6% / 72.3% and 88.1% / 83.3%, respectively. The area under the receiver operating characteristic curve was 0.81 for the developed classifier. The distributions of the joint scores of the active and inactive knees were significantly different.

    Conclusion

    Joint acoustic emissions can serve as an inexpensive and easy-to-use digital biomarker to distinguish JIA from healthy controls. Utilizing serial joint acoustic emission recordings can potentially help monitor disease activity in JIA affected joints to enable timely changes in therapy.

     