

Title: Minos: variant adjudication and joint genotyping of cohorts of bacterial genomes
Abstract

There are many short-read variant-calling tools, with different strengths and weaknesses. We present a tool, Minos, which combines outputs from arbitrary variant callers, increasing recall without loss of precision. We benchmark on 62 samples from three bacterial species and an outbreak of 385 Mycobacterium tuberculosis samples. Minos also enables joint genotyping; we demonstrate on a large (N=13k) M. tuberculosis cohort, building a map of non-synonymous SNPs and indels in a region where all such variants are assumed to cause rifampicin resistance. We quantify the correlation with phenotypic resistance and then replicate in a second cohort (N=10k).
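
To make the recall and precision trade-off concrete, the following is a minimal sketch, not Minos's algorithm. It assumes variants can be represented as hypothetical `(position, ref, alt)` tuples and that a truth set is available, and it shows why simply taking the union of two callers' outputs is not enough: recall rises, but the false positives of both callers accumulate and precision falls. All data and the helper name `precision_recall` are invented for illustration.

```python
def precision_recall(calls, truth):
    """Return (precision, recall) of a set of calls against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical truth set and caller outputs, as (position, ref, alt) tuples.
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA"), (900, "T", "C")}
caller_a = {(100, "A", "G"), (250, "C", "T"), (610, "A", "T")}
caller_b = {(250, "C", "T"), (400, "G", "GA"), (775, "C", "G")}

# Naive merge: keep every variant reported by either caller.
union = caller_a | caller_b

for name, calls in [("caller A", caller_a), ("caller B", caller_b), ("union", union)]:
    precision, recall = precision_recall(calls, truth)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data each caller alone has recall 0.50, while the union reaches 0.75 at a lower precision (0.60 versus 0.67). An adjudication step is meant to keep the recall gain while discarding the extra false positives.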

 
NSF-PAR ID:
10368530
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Genome Biology
Volume:
23
Issue:
1
ISSN:
1474-760X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Objective

    To characterize a cohort of patients with SCN8A‐related epilepsy and to perform analyses to identify correlations involving the acquisition of neurodevelopmental skills.

    Methods

    We analyzed patient data (n = 91) submitted to an online registry tailored to characteristics of children with SCN8A variants. Participants provided information on the history of their child's seizures, medications, comorbidities, and developmental skills based on the Denver II items. Spearman rank tests were utilized to test for correlations among a variety of aspects of seizures, medications, and neurodevelopmental progression.

    Results

    The 91 participants carried 71 missense variants (41 newly reported) and three truncating variants. Ages at seizure onset ranged from birth to >12 months of age (mean ± SD = 5 months 21 days ± 7 months 14 days). Multiple seizure types with multimodal onset times and developmental delay were observed as general features of this cohort. We found a positive correlation between a developmental score based upon percentage of acquired skills and the age at seizure onset, current seizure freedom, and initial febrile seizures. Analyses of cohort subgroups revealed clear distinctions between patients who had a single reported variant in SCN8A and those with an additional variant reported in a gene other than SCN8A, as well as between patients with different patterns of regression before and at seizure onset.

    Significance

    This is the first study of an SCN8A patient cohort of this size and for which correlations between age at seizure onset and neurodevelopment were investigated. Our correlation studies suggest that variants of uncertain significance should be considered in assessing children with SCN8A‐related disorders. This study substantially improves the characterization of this patient population and our understanding of the neurodevelopmental effects associated with seizures for SCN8A patients, and provides a clinical context at initial presentation that may be prognostic for developmental outcome.

     
  2. Abstract Background

    Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods.

    Methods

    Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction.

    Results

    Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance.

    Conclusions

    We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.

     
  3. Abstract Background

    Blood-based biomarkers for diagnosing active tuberculosis (TB), monitoring treatment response, and predicting risk of progression to TB disease have been reported. However, validation of the biomarkers across multiple independent cohorts is scarce. A robust platform to validate TB biomarkers in different populations with clinical end points is essential to the development of a point-of-care clinical test. NanoString nCounter technology is an amplification-free digital detection platform that directly measures mRNA transcripts with high specificity. Here, we determined whether NanoString could serve as a platform for extensive validation of candidate TB biomarkers.

    Methods

    The NanoString platform was used for performance evaluation of existing TB gene signatures in a cohort in which signatures were previously evaluated on an RNA-seq dataset. A NanoString codeset that probes 107 genes comprising 12 TB signatures and 6 housekeeping genes (NS-TB107) was developed and applied to total RNA derived from whole blood samples of TB patients and individuals with latent TB infection (LTBI) from South India. The TBSignatureProfiler tool was used to score samples for each signature. An ensemble of machine learning algorithms was used to derive a parsimonious biomarker.

    Results

    Gene signatures present in NS-TB107 had statistically significant discriminative power for segregating TB from LTBI. Further analysis of the data yielded a NanoString 6-gene set (NANO6) that when tested on 10 published datasets was highly diagnostic for active TB.

    Conclusions

    The NanoString nCounter system provides a robust platform for validating existing TB biomarkers and deriving a parsimonious gene signature with enhanced diagnostic performance.

     
  4. Abstract Background

    Acinetobacter baumannii is a Gram-negative bacterium which causes opportunistic infections in immunocompromised hosts. Genome plasticity has given rise to a wide range of strain variation with respect to antimicrobial resistance profiles and expression of virulence factors which lead to altered phenotypes associated with pathogenesis. The purpose of this study was to analyze clinical strains of A. baumannii for phenotypic variation that might correlate with virulence phenotypes, antimicrobial resistance patterns, or strain isolation source. We hypothesized that individual strain virulence phenotypes might be associated with anatomical site of isolation or alterations in susceptibility to antimicrobial interventions.

    Methodology

    A cohort of 17 clinical isolates of A. baumannii isolated from diverse anatomical sites were evaluated to ascertain phenotypic patterns including biofilm formation, hemolysis, motility, and antimicrobial resistance. Antibiotic susceptibility/resistance to ampicillin-sulbactam, amikacin, ceftriaxone, ceftazidime, cefotaxime, ciprofloxacin, cefepime, gentamicin, levofloxacin, meropenem, piperacillin, trimethoprim-sulfamethoxazole, ticarcillin-K clavulanate, tetracycline, and tobramycin was determined.

    Results

    Antibiotic resistance was prevalent in many strains including resistance to ampicillin-sulbactam, amikacin, ceftriaxone, ceftazidime, cefotaxime, ciprofloxacin, cefepime, gentamicin, levofloxacin, meropenem, piperacillin, trimethoprim-sulfamethoxazole, ticarcillin-K clavulanate, tetracycline, and tobramycin. All strains tested induced hemolysis on agar plate detection assays. Wound-isolated strains of A. baumannii exhibited higher motility than strains isolated from blood, urine or Foley catheter, or sputum/bronchial wash. A. baumannii strains isolated from patient blood samples formed significantly more biofilm than isolates from wounds, sputum or bronchial wash samples. An inverse relationship between motility and biofilm formation was observed in the cohort of 17 clinical isolates of A. baumannii tested in this study. Motility was also inversely correlated with induction of hemolysis. An inverse correlation was observed between hemolysis and resistance to ticarcillin-K clavulanate, meropenem, and piperacillin. An inverse correlation was also observed between motility and resistance to ampicillin-sulbactam, ceftriaxone, cefotaxime, ceftazidime, ciprofloxacin, or levofloxacin.

    Conclusions

    Strain-dependent variations in biofilm and motility are associated with anatomical site of isolation. Biofilm and hemolysis production both have an inverse association with motility in the cohort of strains utilized in this study, and motility and hemolysis were inversely correlated with resistance to numerous antibiotics.
  5. Abstract Introduction

    Exposure to community violence (ECV) continues to be a major public health problem among urban adolescents in the United States. We sought to identify subgroups of adolescents' ECV and examine how after‐school activities are related to exposure subgroups across two samples.

    Methods

    In Study 1 there were 1432 adolescents (Cohort 9: n = 717, mean age = 11; Cohort 12: n = 715, mean age = 14; 52% boys) from the Project on Human Development in Chicago Neighborhoods (1994–2002). Study 2 had a more recent sample of 537 adolescents (mean age = 16 years; 54% girls) from the After‐School Activity Study (ASAS; 2015–2017) in Chicago and Detroit.

    Results

    Exploratory latent class analyses yielded a three‐class solution for Study 1: a “No ECV” class (44%); a “Low ECV” class (36%); and a “High Exposure” class (14%). In Study 2, a four‐class solution was the best fit with a “No ECV” class (33%), a “Moderate Witness/Low Victim” class (36%), a “High Witness/Moderate Victim” class (19%), and a “High ECV” class (11%). Home‐based activities appeared to be protective against high ECV for adolescents in Study 2. School‐based activities were associated with higher ECV across both samples, but community‐based activities were only associated with greater violence exposure in Study 1. Adolescents' unstructured socializing in both studies was associated with higher odds of ECV.

    Conclusions

    Results indicate that subgroups of adolescents can be identified based on ECV and highlight the complexity of after‐school activities as risk and protective factors in both past and more recent contexts.

     