
This content will become publicly available on December 1, 2024

Title: Evaluation of the application of sequence data to the identification of outbreaks of disease using anomaly detection methods

Anomaly detection methods have great potential to assist in detecting diseases in animal production systems. We used sequence data of Porcine Reproductive and Respiratory Syndrome (PRRS) to define the emergence of new strains at the farm level. We evaluated the performance of 24 anomaly detection methods based on machine learning, regression, time series techniques and control charts to identify outbreaks in time series of new strains, and compared the best-performing methods across other time series: PCR positives, PCR requests and laboratory requests. We introduced synthetic outbreaks of different sizes and calculated the probability of detection of outbreaks (POD), sensitivity (Se), probability of detection of outbreaks in the first week of appearance (POD1w) and background alarm rate (BAR). Time series of new strains derived from sequence data outperformed the other types of data, but POD, Se and POD1w were high only when outbreaks were large. Methods based on Long Short-Term Memory (LSTM) and Bayesian approaches performed best. Using anomaly detection methods with sequence data may help to identify the emergence of cases in multiple farms, but more work is required to improve detection in highly variable time series. Our results suggest a promising application of sequence data for early detection of diseases at a production system level. This may provide a simple way to extract additional value from routine laboratory analysis. Next steps should include validation of this approach in different settings and with different diseases.
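
The evaluation design described above (inject synthetic outbreaks into a baseline series, run a detector, then score it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the weekly counts are invented, the detector is a simple Shewhart-style control chart standing in for the 24 methods of the study, and the formulas used for POD, Se, POD1w and BAR are one plausible reading of those terms, not the definitions used by the authors.

```python
from statistics import mean, pstdev


def inject_outbreak(series, start, extra):
    """Return a copy of `series` with `extra` cases added from week `start` on."""
    out = list(series)
    for offset, cases in enumerate(extra):
        out[start + offset] += cases
    return out


def control_chart_alarms(series, window=8, k=3.0):
    """Flag week t when its count exceeds mean + k*sd of the previous `window` weeks."""
    alarms = [False] * len(series)
    for t in range(window, len(series)):
        ref = series[t - window:t]
        # Floor the sd at 1.0 so a flat reference window cannot give a zero-width limit.
        threshold = mean(ref) + k * max(pstdev(ref), 1.0)
        alarms[t] = series[t] > threshold
    return alarms


def evaluate(baseline, outbreaks, window=8, k=3.0):
    """Score the detector; `outbreaks` is a list of (start_week, extra_cases_per_week)."""
    detected = detected_first_week = 0
    outbreak_weeks = flagged_outbreak_weeks = 0
    for start, extra in outbreaks:
        series = inject_outbreak(baseline, start, extra)
        hits = control_chart_alarms(series, window, k)[start:start + len(extra)]
        detected += any(hits)            # assumed POD: any alarm during the outbreak
        detected_first_week += hits[0]   # assumed POD1w: alarm in the first outbreak week
        outbreak_weeks += len(hits)
        flagged_outbreak_weeks += sum(hits)
    # Assumed BAR: share of outbreak-free weeks (after warm-up) that raise an alarm.
    background = control_chart_alarms(baseline, window, k)[window:]
    return {
        "POD": detected / len(outbreaks),
        "Se": flagged_outbreak_weeks / outbreak_weeks,
        "POD1w": detected_first_week / len(outbreaks),
        "BAR": sum(background) / len(background),
    }


# Hypothetical weekly counts of new strains (not data from the study).
baseline = [2, 3, 1, 2, 4, 3, 2, 1, 3, 2, 2, 3, 1, 2, 4, 2, 3, 2, 1, 3]
# One large and one small synthetic outbreak, each starting in week 12.
outbreaks = [(12, [10, 12, 8]), (12, [1, 1, 1])]
metrics = evaluate(baseline, outbreaks)
print(metrics)
```

On this toy series the large outbreak is flagged in its first week and the small one is missed entirely, which mirrors the pattern reported in the abstract that detection metrics were high only for large outbreaks.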

Publisher / Repository:
BioMed Central
Journal Name:
Veterinary Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Björkroth, Johanna (Ed.)
    ABSTRACT Giardia duodenalis (syn. Giardia lamblia, Giardia intestinalis) is the causative agent of giardiasis, one of the most common diarrheal infections in humans. Evolutionary relationships among G. duodenalis genotypes (or subtypes) of assemblage B, one of two genetic assemblages causing the majority of human infections, remain unclear due to poor phylogenetic resolution of current typing methods. In this study, we devised a methodology to identify new markers for a streamlined multilocus sequence typing (MLST) scheme based on comparisons of all core genes against the phylogeny of whole-genome sequences (WGS). Our analysis identified three markers with resolution comparable to that of WGS data. Using newly designed PCR primers for our novel MLST loci, we typed an additional 68 strains of assemblage B. Analyses of these strains and previously determined genome sequences showed that genomes of this assemblage can be assigned to 16 clonal complexes, each with unique gene content that is apparently tuned to differential virulence and ecology. Obtaining new genomes of Giardia spp. and other eukaryotic microbial pathogens remains challenging due to difficulties in culturing the parasites in the laboratory. Hence, the methods described here are expected to be widely applicable to other pathogens of interest and advance our understanding of their ecology and evolution. IMPORTANCE Giardia duodenalis assemblage B is a major waterborne pathogen and the most commonly identified genotype causing human giardiasis worldwide. The lack of morphological characters for classification requires the use of molecular techniques for strain differentiation; however, the absence of scalable and affordable next-generation sequencing (NGS)-based typing methods has prevented meaningful advancements in high-resolution molecular typing for further understanding of the evolution and epidemiology of assemblage B.
Prior studies have reported high sequence diversity but low phylogenetic resolution at standard loci in assemblage B, highlighting the necessity of identifying new markers for accurate and robust molecular typing. Data from comparative analyses of available genomes in this study identified three loci that together form a novel high-resolution typing scheme with high concordance to whole-genome-based phylogenomics and which should aid in future public health endeavors related to this parasite. In addition, data from newly characterized strains suggest evidence of biogeographic and ecologic endemism. 
  2. Rapid, simple, inexpensive, accurate, and sensitive point-of-care (POC) detection of viral pathogens in bodily fluids is a vital component of controlling the spread of infectious diseases. The predominant laboratory-based methods for sample processing and nucleic acid detection face limitations that prevent them from gaining wide adoption for POC applications in low-resource settings and self-testing scenarios. Here, we report the design and characterization of an integrated system for rapid sample-to-answer detection of a viral pathogen in a droplet of whole blood comprised of a 2-stage microfluidic cartridge for sample processing and nucleic acid amplification, and a clip-on detection instrument that interfaces with the image sensor of a smartphone. The cartridge is designed to release viral RNA from Zika virus in whole blood using chemical lysis, followed by mixing with the assay buffer for performing reverse-transcriptase loop-mediated isothermal amplification (RT-LAMP) reactions in six parallel microfluidic compartments. The battery-powered handheld detection instrument uniformly heats the compartments from below, and an array of LEDs illuminates from above, while the generation of fluorescent reporters in the compartments is kinetically monitored by collecting a series of smartphone images. We characterize the assay time and detection limits for detecting Zika RNA and gamma ray-deactivated Zika virus spiked into buffer and whole blood and compare the performance of the same assay when conducted in conventional PCR tubes. Our approach for kinetic monitoring of the fluorescence-generating process in the microfluidic compartments enables spatial analysis of early fluorescent “bloom” events for positive samples, in an approach called “Spatial LAMP” (S-LAMP). We show that S-LAMP image analysis reduces the time required to designate an assay as a positive test, compared to conventional analysis of the average fluorescent intensity of the entire compartment. 
S-LAMP enables the RT-LAMP process to be as short as 22 minutes, resulting in a total sample-to-answer time in the range of 17–32 minutes to distinguish positive from negative samples, while demonstrating a viral RNA detection as low as 2.70 × 10² copies per μl, and a gamma-irradiated virus of 10³ virus particles in a single 12.5 μl droplet blood sample.
  3. Early detection of the COVID-19 virus, SARS-CoV-2, is key to mitigating the spread of new outbreaks. Data from individual testing is increasingly difficult to obtain as people conduct non-reported home tests, defer tests due to logistics or attitudes, or ignore testing altogether. Wastewater based epidemiology is an alternative method for surveilling a community while maintaining individual anonymity; however, a problem is that SARS-CoV-2 markers in wastewater vary throughout the day. Collecting grab samples at a single time may miss marker presence, while autosampling throughout a day is technically challenging and expensive. This study investigates a passive sampling method that would be expected to accumulate greater amounts of viral material from sewers over a period of time. Tampons were tested as passive swab sampling devices from which viral markers could be eluted with a Tween-20 surfactant wash. Six sewersheds in Detroit were sampled 16–22 times by paired swab (4 h immersion before retrieval) and grab methods over a five-month period and enumerated for N1 and N2 SARS-CoV-2 markers using ddPCR. Swabs detected SARS-CoV-2 markers significantly more frequently (P < 0.001) than grab samples, averaging two to three-fold more copies of SARS-CoV-2 markers than their paired grab samples (p < 0.0001) in the assayed volume (10 mL) of wastewater or swab eluate. No significant difference was observed in the recovery of a spiked-in control (Phi6), indicating that the improved sensitivity is not due to improvements in nucleic acid recovery or reduction of PCR inhibition. The outcomes of swab-based sampling varied significantly between sites, with swab samples providing the greatest improvements in counts for smaller sewersheds that otherwise tend to have greater variation in grab sample counts. 
Swab-sampling with tampons provides significant advantages in detection of SARS-CoV-2 wastewater markers and is expected to provide earlier detection of new outbreaks than grab samples, with consequent public health benefits.
  4. Abstract

    Deep generative learning can be used not only for generating new data with statistical characteristics derived from input data but also for anomaly detection, by separating nominal and anomalous instances based on their reconstruction quality. In this paper, we explore the performance of three unsupervised deep generative models, namely variational autoencoders (VAEs) with Gaussian, Bernoulli, and Boltzmann priors, in detecting anomalies in multivariate time series of commercial-flight operations. We created two VAE models with discrete latent variables (DVAEs), one with a factorized Bernoulli prior and one with a restricted Boltzmann machine (RBM) with novel positive-phase architecture as prior, because of the demand for discrete-variable models in machine-learning applications and because the integration of quantum devices based on two-level quantum systems requires such models. To the best of our knowledge, our work is the first that applies DVAE models to anomaly-detection tasks in the aerospace field. The DVAE with RBM prior, using a relatively simple (and classically or quantum-mechanically enhanceable) sampling technique for the evolution of the RBM’s negative phase, performed better in detecting anomalies than the Bernoulli DVAE and on par with the Gaussian model, which has a continuous latent space. The transfer of a model to an unseen dataset with the same anomaly but without re-tuning of hyperparameters or re-training noticeably impaired anomaly-detection performance, but performance could be improved by post-training on the new dataset. The RBM model was robust to change of anomaly type and phase of flight during which the anomaly occurred. Our studies demonstrate the competitiveness of a discrete deep generative model with its Gaussian counterpart on anomaly-detection problems.
Moreover, the DVAE model with RBM prior can be easily integrated with quantum sampling by outsourcing its generative process to measurements of quantum states obtained from a quantum annealer or gate-model device.

  5. Hug, Laura A. (Ed.)
    ABSTRACT Natural microbial communities consist of closely related taxa that may exhibit phenotypic differences and inhabit distinct niches. However, connecting genetic diversity to ecological properties remains a challenge in microbial ecology due to the lack of pure cultures across the microbial tree of life. “Candidatus Accumulibacter phosphatis” (Accumulibacter) is a polyphosphate-accumulating organism that contributes to the enhanced biological phosphorus removal (EBPR) biotechnological process for removing excess phosphorus from wastewater and preventing eutrophication from downstream receiving waters. Distinct Accumulibacter clades often coexist in full-scale wastewater treatment plants and laboratory-scale enrichment bioreactors and have been hypothesized to inhabit distinct ecological niches. However, since individual strains of the Accumulibacter lineage have not been isolated in pure culture to date, these predictions have been made solely on genome-based comparisons and enrichments with varying strain compositions. Here, we used genome-resolved metagenomics and metatranscriptomics to explore the activity of coexisting Accumulibacter strains in an engineered bioreactor environment. We obtained four high-quality genomes of Accumulibacter strains that were present in the bioreactor ecosystem, one of which is a completely contiguous draft genome scaffolded with long Nanopore reads. We identified core and accessory genes to investigate how gene expression patterns differed among the dominating strains. Using this approach, we were able to identify putative pathways and functions that may confer distinct functions to Accumulibacter strains and provide key functional insights into this biotechnologically significant microbial lineage.
IMPORTANCE “Candidatus Accumulibacter phosphatis” is a model polyphosphate-accumulating organism that has been studied using genome-resolved metagenomics, metatranscriptomics, and metaproteomics to understand the EBPR process. Within the Accumulibacter lineage, several similar but diverging clades are defined by the shared sequence identity of the polyphosphate kinase (ppk1) locus. These clades are predicted to have key functional differences in acetate uptake rates, phage defense mechanisms, and nitrogen-cycling capabilities. However, such hypotheses have largely been made based on gene content comparisons of sequenced Accumulibacter genomes, some of which were obtained from different systems. Here, we performed time series genome-resolved metatranscriptomics to explore gene expression patterns of coexisting Accumulibacter clades in the same bioreactor ecosystem. Our work provides an approach for elucidating ecologically relevant functions based on gene expression patterns between closely related microbial populations.