

Title: Development and Validation of a Parsimonious Tuberculosis Gene Signature Using the digital NanoString nCounter Platform
Abstract Background

Blood-based biomarkers for diagnosing active tuberculosis (TB), monitoring treatment response, and predicting risk of progression to TB disease have been reported. However, validation of the biomarkers across multiple independent cohorts is scarce. A robust platform to validate TB biomarkers in different populations with clinical end points is essential to the development of a point-of-care clinical test. NanoString nCounter technology is an amplification-free digital detection platform that directly measures mRNA transcripts with high specificity. Here, we determined whether NanoString could serve as a platform for extensive validation of candidate TB biomarkers.

Methods

The NanoString platform was used for performance evaluation of existing TB gene signatures in a cohort in which signatures were previously evaluated on an RNA-seq dataset. A NanoString codeset that probes 107 genes comprising 12 TB signatures and 6 housekeeping genes (NS-TB107) was developed and applied to total RNA derived from whole blood samples of TB patients and individuals with latent TB infection (LTBI) from South India. The TBSignatureProfiler tool was used to score samples for each signature. An ensemble of machine learning algorithms was used to derive a parsimonious biomarker.
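To make "scoring samples for each signature" concrete, here is a minimal sketch of one simple scoring scheme: the mean of per-gene z-scores across the genes in a signature. This is an illustrative assumption, not the TBSignatureProfiler algorithm, which offers its own scoring methods. The function name, gene names, and expression values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_signature_scores(expression, signature_genes):
    """Score each sample as the mean z-score over the signature's genes.

    expression: dict mapping gene name -> list of per-sample values
                (hypothetical normalized counts, same sample order per gene).
    signature_genes: gene names that make up one signature.
    Returns one score per sample.
    """
    # Use only the signature genes that were actually measured.
    genes = [g for g in signature_genes if g in expression]
    if not genes:
        raise ValueError("no signature genes found in expression data")

    n_samples = len(expression[genes[0]])
    scores = [0.0] * n_samples
    for gene in genes:
        values = expression[gene]
        mu = mean(values)
        sd = pstdev(values)  # population standard deviation across samples
        for i, value in enumerate(values):
            # A gene with no variation contributes 0 to every sample.
            z = (value - mu) / sd if sd > 0 else 0.0
            scores[i] += z / len(genes)
    return scores


# Hypothetical data: two genes measured in three samples.
example_expression = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [2.0, 4.0, 6.0]}
example_scores = zscore_signature_scores(example_expression, ["GENE_A", "GENE_B", "GENE_C"])
```

Genes missing from the data (here `GENE_C`) are skipped rather than treated as zero, so a partially measured signature still yields a score.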

Results

Gene signatures present in NS-TB107 had statistically significant discriminative power for segregating TB from LTBI. Further analysis of the data yielded a NanoString 6-gene set (NANO6) that, when tested on 10 published datasets, was highly diagnostic for active TB.
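Discriminative power of this kind is commonly summarized by the area under the ROC curve (AUC). The abstract does not state which metric was used, so the following is only a sketch of how an AUC could be computed from signature scores, using the rank-based (Mann-Whitney) formulation. The function name, scores, and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: per-sample signature scores (higher = more TB-like).
    labels: 1 for active TB, 0 for LTBI (hypothetical labels).
    The AUC equals the probability that a randomly chosen positive
    scores higher than a randomly chosen negative, counting ties as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: perfect separation, then one misordered pair.
auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
auc_partial = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of samples, which is adequate for cohort-sized data; a sort-based rank computation would be the usual choice for larger inputs.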

Conclusions

The NanoString nCounter system provides a robust platform for validating existing TB biomarkers and deriving a parsimonious gene signature with enhanced diagnostic performance.

 
NSF-PAR ID:
10372395
Publisher / Repository:
Oxford University Press
Journal Name:
Clinical Infectious Diseases
Volume:
75
Issue:
6
ISSN:
1058-4838
Page Range / eLocation ID:
p. 1022-1030
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods.

    Methods

    Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction.

    Results

    Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance.

    Conclusions

    We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.

     
  2. Abstract Background

    Circulating miRNAs (c-miRNAs) are found in most, if not all, biological fluids and are becoming well-established non-invasive biomarkers of many human pathologies. However, their features in non-pathological contexts and whether their expression profiles reflect normal life history events have received little attention, especially in non-mammalian species. The aim of the present study was to investigate the potential of c-miRNAs to serve as biomarkers of reproductive and metabolic states in fish.

    Results

    The blood plasma was sampled throughout the reproductive cycle of female rainbow trout subjected to two different feeding regimes that triggered contrasting metabolic states. In addition, ovarian fluid was sampled at ovulation, and all samples were subjected to small RNA-seq analysis, leading to the establishment of a comprehensive miRNA repertoire (i.e., miRNAome) and enabling subsequent comparative analyses to a panel of RNA-seq libraries from a wide variety of tissues and organs. We showed that biological fluid miRNAomes are complex and encompass a high proportion of the overall rainbow trout miRNAome. While sharing a high proportion of common miRNAs, the blood plasma and ovarian fluid miRNAomes exhibited strong fluid-specific signatures. We further revealed that the blood plasma miRNAome significantly changed depending on metabolic and reproductive states. We subsequently identified three evolutionarily conserved muscle-specific miRNAs or myomiRs (miR-1-1/2-3p, miR-133a-1/2-3p, and miR-206-3p) that accumulated in the blood plasma in response to high feeding rates, making these myomiRs strong candidate biomarkers of active myogenesis. We also identified miR-202-5p as a candidate biomarker for reproductive success that could be used to predict ovulation and/or egg quality.

    Conclusions

    Together, these promising results reveal the high potential of c-miRNAs, including evolutionarily conserved myomiRs, as physiologically relevant biomarker candidates and pave the way for the use of c-miRNAs for non-invasive phenotyping in various fish species.

     
  3. Abstract Motivation

    Recent advances in biomedical research have made massive amounts of transcriptomic data from different sources available in public repositories. Due to the heterogeneity present in the individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher’s method, Stouffer’s method, minP and maxP, have at least two major limitations: (i) they are sensitive to outliers, and (ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.

    Results

    Here, we propose a gene-level meta-analysis framework that overcomes these limitations and identifies a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer’s disease using nine datasets including 1108 individuals. These signatures are then validated on 12 independent datasets including 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity as well as specificity. The proposed signatures could be further used in diagnosis, prognosis and identification of therapeutic targets.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  4. Abstract Background

    Sepsis is a highly heterogeneous syndrome, which has hindered the development of effective therapies. This has prompted investigators to develop a precision medicine approach aimed at identifying biologically homogenous subgroups of patients with septic shock and critical illnesses. Transcriptomic analysis can identify subclasses derived from differences in underlying pathophysiological processes that may provide the basis for new targeted therapies. The goal of this study was to elucidate pathophysiological pathways and identify pediatric septic shock subclasses based on whole blood RNA expression profiles.

    Methods

    The subjects were critically ill children with cardiopulmonary failure who were a part of a prospective randomized insulin titration trial to treat hyperglycemia. Genome-wide expression profiling was conducted using RNA sequencing from whole blood samples obtained from 46 children with septic shock and 52 mechanically ventilated noninfected controls without shock. Patients with septic shock were allocated to subclasses based on hierarchical clustering of gene expression profiles, and we then compared clinical characteristics, plasma inflammatory markers, cell compositions using GEDIT, and immune repertoires using Imrep between the two subclasses.

    Results

    Patients with septic shock exhibited alterations in innate and adaptive immune pathways. Among patients with septic shock, we identified two subclasses based on gene expression patterns. Compared with Subclass 2, Subclass 1 was characterized by upregulation of innate immunity pathways and downregulation of adaptive immunity pathways. Subclass 1 had significantly worse clinical outcomes despite the two subclasses having similar illness severity on initial clinical presentation. Subclass 1 had elevated levels of plasma inflammatory cytokines and endothelial injury biomarkers and demonstrated decreased percentages of CD4 T cells and B cells and less diverse T cell receptor repertoires.

    Conclusions

    Two subclasses of pediatric septic shock patients were discovered through genome-wide expression profiling based on whole blood RNA sequencing with major biological and clinical differences.

    Trial Registration

    This is a secondary analysis of data generated as part of the observational CAF-PINT ancillary of the HALF-PINT study (NCT01565941). Registered March 29, 2012.

     
  5. Abstract

    Tuberculosis (TB) remains a significant cause of mortality worldwide. Metagenomic next-generation sequencing has the potential to reveal biomarkers of active disease, identify coinfection, and improve detection for sputum-scarce or culture-negative cases. We conducted a large-scale comparative study of 428 plasma, urine, and oral swab samples from 334 individuals from TB endemic and non-endemic regions to evaluate the utility of a shotgun metagenomic DNA sequencing assay for tuberculosis diagnosis. We found that the composition of the control population had a strong impact on the measured performance of the diagnostic test: the use of a control population composed of individuals from a TB non-endemic region led to a test with nearly 100% specificity and sensitivity, whereas a control group composed of individuals from TB endemic regions exhibited a high background of nontuberculous mycobacterial DNA, limiting the diagnostic performance of the test. Using mathematical modeling and quantitative comparisons to matched qPCR data, we found that the burden of Mycobacterium tuberculosis DNA constitutes a very small fraction (0.04 or less) of the total abundance of DNA originating from mycobacteria in samples from TB endemic regions. Our findings suggest that the utility of a minimally invasive metagenomic sequencing assay for pulmonary tuberculosis diagnostics is limited by the low burden of M. tuberculosis and an overwhelming biological background of nontuberculous mycobacterial DNA.

     