

This content will become publicly available on December 1, 2024

Title: Evaluation of the application of sequence data to the identification of outbreaks of disease using anomaly detection methods
Abstract

Anomaly detection methods have great potential to assist in the detection of diseases in animal production systems. We used sequence data of Porcine Reproductive and Respiratory Syndrome (PRRS) virus to define the emergence of new strains at the farm level. We evaluated how well 24 anomaly detection methods based on machine learning, regression, time series techniques and control charts identified outbreaks in time series of new strains, and compared the best methods across different time series: PCR positives, PCR requests and laboratory requests. We introduced synthetic outbreaks of different sizes and calculated the probability of detection of outbreaks (POD), sensitivity (Se), probability of detection of outbreaks in the first week of appearance (POD1w) and background alarm rate (BAR). Time series of new strains derived from sequence data outperformed the other types of data, but POD, Se and POD1w were high only when outbreaks were large. The methods based on Long Short-Term Memory (LSTM) and Bayesian approaches performed best. Using anomaly detection methods with sequence data may help to identify the emergence of cases in multiple farms, but more work is required to improve detection in time series with high variability. Our results suggest a promising application of sequence data for early detection of diseases at the production system level. This may provide a simple way to extract additional value from routine laboratory analysis. Next steps should include validating this approach in different settings and with different diseases.
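To make the evaluation metrics concrete, here is a minimal sketch of the general workflow: inject synthetic outbreaks into a weekly count series, run one detector, and score it. It assumes a simple Shewhart-type control chart (one of the method families named above, not the best-performing LSTM or Bayesian methods) and assumes the usual surveillance definitions: POD is the share of outbreaks with at least one alarm, Se the share of outbreak weeks that alarm, POD1w the share of outbreaks flagged in their first week, and BAR the share of non-outbreak weeks that alarm. The function names, window length, threshold multiplier `k` and the Poisson baseline are hypothetical illustrations, not the study's code or data.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the code used in the study.


def inject_outbreaks(baseline, starts, duration, size):
    """Add `size` extra cases per week for `duration` weeks at each start index.

    Returns the modified series and a boolean mask marking outbreak weeks.
    """
    series = np.asarray(baseline, dtype=float).copy()
    mask = np.zeros(len(series), dtype=bool)
    for s in starts:
        series[s:s + duration] += size
        mask[s:s + duration] = True
    return series, mask


def shewhart_alarms(series, window=12, k=2.0):
    """Alarm at week t if the count exceeds mean + k * sd of the previous
    `window` weeks (a simple Shewhart-type control chart).
    Weeks before the first full window never alarm."""
    alarms = np.zeros(len(series), dtype=bool)
    for t in range(window, len(series)):
        past = series[t - window:t]
        threshold = past.mean() + k * past.std(ddof=1)
        alarms[t] = series[t] > threshold
    return alarms


def evaluate(alarms, mask, starts, duration):
    """Compute POD, Se, POD1w and BAR under the assumed definitions."""
    detected = [alarms[s:s + duration].any() for s in starts]
    first_week = [alarms[s] for s in starts]
    return {
        "POD": float(np.mean(detected)),      # outbreaks with >= 1 alarm
        "Se": float(alarms[mask].mean()),     # outbreak weeks with an alarm
        "POD1w": float(np.mean(first_week)),  # outbreaks flagged in week 1
        "BAR": float(alarms[~mask].mean()),   # non-outbreak weeks with an alarm
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.poisson(lam=5, size=104)  # two years of synthetic weekly counts
    starts, duration = [30, 60, 90], 3
    series, mask = inject_outbreaks(baseline, starts, duration, size=10)
    alarms = shewhart_alarms(series)
    print(evaluate(alarms, mask, starts, duration))
```

Varying `size` in this sketch shows why larger outbreaks are easier to detect. The trailing window also absorbs earlier outbreak weeks and inflates later thresholds; a real surveillance system would usually handle this, for example with a guard band between the baseline window and the current week.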

 
Award ID(s): 1838207
NSF-PAR ID: 10494903
Publisher / Repository: BioMed Central
Journal Name: Veterinary Research
Volume: 54
Issue: 1
ISSN: 1297-9716
Sponsoring Org: National Science Foundation
More Like this
  1. Anomaly detection methods abound and are used extensively in streaming settings in a wide variety of domains. But a strength can also be a weakness; given the vast number of methods, how can one select the best method for their application? Unfortunately, there is no one best way for all domains. Existing literature is focused on creating new anomaly detection methods or creating large frameworks for experimenting with multiple methods at the same time. As the literature continues to grow, extensive evaluation of every available anomaly detection method is not feasible. To reduce this evaluation burden, in this paper we present a framework to intelligently choose the optimal anomaly detection methods based on the characteristics the time series displays. We provide a comprehensive experimental validation of multiple anomaly detection methods over different time series characteristics to form guidelines. Applying our framework can save time and effort by surfacing the most promising anomaly detection methods instead of experimenting extensively with a rapidly expanding library of anomaly detection methods. 
  2. Sayan Mukherjee (Ed.)
    An innovations sequence of a time series is a sequence of independent and identically distributed random variables with which the original time series has a causal representation. The innovation at a time is statistically independent of the history of the time series. As such, it represents the new information contained at present but not in the past. Because of its simple probability structure, the innovations sequence is the most efficient signature of the original. Unlike the principal or independent component representations, an innovations sequence preserves not only the complete statistical properties but also the temporal order of the original time series. A long-standing open problem is to find a computationally tractable way to extract an innovations sequence of non-Gaussian processes. This paper presents a deep learning approach, referred to as Innovations Autoencoder (IAE), that extracts innovations sequences using a causal convolutional neural network. An application of IAE to the one-class anomalous sequence detection problem with unknown anomaly and anomaly-free models is also presented.
  3. With the booming of online service systems, anomaly detection on multivariate time series, such as a combination of CPU utilization, average response time, and requests per second, is important for system reliability. Although many learning-based approaches have been designed for this purpose, our empirical study shows that these approaches suffer from long initialization times because they must first collect sufficient training data. In this paper, we introduce the Compressed Sensing technique to multivariate time series anomaly detection for rapid initialization. To build a jump-starting anomaly detector, we propose an approach named JumpStarter. Based on domain-specific insights, we design a shape-based clustering algorithm as well as an outlier-resistant sampling algorithm for JumpStarter. With real-world multivariate time series datasets collected from two Internet companies, our results show that JumpStarter achieves an average F1 score of 94.12%, significantly outperforming state-of-the-art anomaly detection algorithms, with a much shorter initialization time of twenty minutes. We have applied JumpStarter in online service systems and gained useful lessons in real-world scenarios.
  4. The proliferation of web platforms has created incentives for online abuse. Many graph-based anomaly detection techniques have been proposed to identify suspicious accounts and behaviors. However, most of them detect anomalies only after users have performed many such actions. Their performance degrades substantially when little user data has been observed at an early stage, and this must be improved to minimize financial loss. In this work, we propose Eland, a novel framework that uses action sequence augmentation for early anomaly detection. Eland uses a sequence predictor to predict the next actions of every user and exploits the mutual enhancement between action sequence augmentation and user-action graph anomaly detection. Experiments on three real-world datasets show that Eland improves the performance of a variety of graph-based anomaly detection methods. With Eland, anomaly detection at an earlier stage outperforms non-augmented methods that need significantly more observed data, by up to 15% in area under the ROC curve.
  5. Access to safe and nutritious food is critical for maintaining life and supporting good health. Eating food contaminated with pathogens leads to serious diseases ranging from diarrhea to cancer. Many foodborne infections can cause long-term impairment or even death. Hence, early detection of foodborne pathogens such as pathogenic Escherichia coli strains is essential for public safety. Conventional methods for detecting these bacteria are based on culturing on selective media followed by standard biochemical identification. Despite their accuracy, these methods are time-consuming. PCR-based detection of pathogens relies on sophisticated equipment and specialized technicians, which are difficult to find in areas with limited resources. CRISPR technology, by contrast, is more specific and sensitive for identifying pathogenic bacteria because it employs programmable CRISPR-Cas systems that target particular DNA sequences, minimizing non-specific binding and cross-reactivity. In this project, a rapid, sensitive and specific detection method based on CRISPR-Cas12a sensing was developed for pathogenic E. coli isolates collected from fecal samples of adult goats on 17 farms in Tennessee. The detection reaction contained amplified PCR products of the pathogenic regions, a reporter probe, the Cas12a enzyme, and crRNA specific to three pathogenic genes: stx1, stx2, and hlyA. The CRISPR reaction with the pathogenic bacteria emitted fluorescence when excited under UV light. To evaluate the sensitivity and specificity of this assay, its results were compared with those of a PCR-based detection assay. Both methods gave similar results for the same samples. This technique is precise, highly sensitive, quick, cost-effective and easy to use, and it can overcome the limitations of current detection methods.
    This project can result in a versatile detection method that is easily adaptable for rapid response in the detection and surveillance of diseases that pose large-scale biosecurity threats to human health and to plant and animal production.

     
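The Innovations Autoencoder abstract (item 2 in the list above) defines an innovation as the part of the present value that is independent of the past. For Gaussian processes, innovations can be obtained by classical linear prediction. The abstract's open problem concerns the non-Gaussian case. As a point of comparison only, here is a minimal sketch for a Gaussian AR(1) process, where the innovation at time t is the one-step prediction error x_t - phi * x_{t-1}. The AR(1) model, the least-squares fit and the function names are illustrative assumptions. They are not part of IAE, which uses a causal convolutional network instead.

```python
import numpy as np

# Illustrative Gaussian AR(1) example; not the IAE method.


def simulate_ar1(phi, n, rng):
    """Simulate x_t = phi * x_{t-1} + e_t with i.i.d. standard normal e_t."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


def ar1_innovations(x):
    """Fit AR(1) by least squares and return (phi_hat, innovations).

    Each innovation is the one-step prediction error x_t - phi_hat * x_{t-1},
    i.e. the part of x_t that the previous value cannot predict.
    """
    prev, curr = x[:-1], x[1:]
    phi_hat = np.dot(prev, curr) / np.dot(prev, prev)
    return phi_hat, curr - phi_hat * prev


def lag1_autocorr(z):
    """Sample lag-1 autocorrelation of a 1-D array."""
    z = z - z.mean()
    return np.dot(z[:-1], z[1:]) / np.dot(z, z)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = simulate_ar1(0.8, 5000, rng)
    phi_hat, innov = ar1_innovations(x)
    # The raw series is strongly autocorrelated; its innovations are close to white noise.
    print(round(phi_hat, 3), round(lag1_autocorr(x), 3), round(lag1_autocorr(innov), 3))
```

Standardized innovations that fall far from zero could serve as a crude anomaly score in this linear setting. Extending that idea to non-Gaussian processes is what the IAE abstract addresses.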