Title: Experimental Comparison of Online Anomaly Detection Algorithms
Anomaly detection methods abound and are used extensively in streaming settings in a wide variety of domains. But a strength can also be a weakness; given the vast number of methods, how can one select the best method for their application? Unfortunately, there is no one best way for all domains. Existing literature is focused on creating new anomaly detection methods or creating large frameworks for experimenting with multiple methods at the same time. As the literature continues to grow, extensive evaluation of every available anomaly detection method is not feasible. To reduce this evaluation burden, in this paper we present a framework to intelligently choose the optimal anomaly detection methods based on the characteristics the time series displays. We provide a comprehensive experimental validation of multiple anomaly detection methods over different time series characteristics to form guidelines. Applying our framework can save time and effort by surfacing the most promising anomaly detection methods instead of experimenting extensively with a rapidly expanding library of anomaly detection methods.
Award ID(s):
1757207
PAR ID:
10155950
Journal Name:
The Thirty-Second International Flairs Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Time series anomaly detection remains a perennially important research topic. If anything, it is a task that has become increasingly important in the burgeoning age of IoT. While there are hundreds of anomaly detection methods in the literature, one definition, time series discords, has emerged as a competitive and popular choice for practitioners. Time series discords are subsequences of a time series that are maximally far away from their nearest neighbors. Perhaps the most attractive feature of discords is their simplicity. Unlike many parameter-laden methods, discords require only a single parameter to be set by the user: the subsequence length. In this work we argue that the utility of discords is reduced by sensitivity to this single user choice. The obvious solution to this problem, computing discords of all lengths then selecting the best anomalies (under some measure), seems to be computationally untenable. However, in this work we introduce MERLIN, an algorithm that can efficiently and exactly find discords of all lengths in massive time series archives.
  2. Anomaly detection methods have great potential to assist in the detection of diseases in animal production systems. We used sequence data of Porcine Reproductive and Respiratory Syndrome (PRRS) to define the emergence of new strains at the farm level. We evaluated the performance of 24 anomaly detection methods based on machine learning, regression, time series techniques and control charts to identify outbreaks in time series of new strains and compared the best methods using different time series: PCR positives, PCR requests and laboratory requests. We introduced synthetic outbreaks of different sizes and calculated the probability of detection of outbreaks (POD), sensitivity (Se), probability of detection of outbreaks in the first week of appearance (POD1w) and background alarm rate (BAR). The use of time series of new strains from sequence data outperformed the other types of data, but POD, Se and POD1w were only high when outbreaks were large. The methods based on Long Short-Term Memory (LSTM) and Bayesian approaches presented the best performance. Using anomaly detection methods with sequence data may help to identify the emergence of cases in multiple farms, but more work is required to improve detection in time series of high variability. Our results suggest a promising application of sequence data for early detection of diseases at a production system level. This may provide a simple way to extract additional value from routine laboratory analysis. Next steps should include validation of this approach in different settings and with different diseases.
  3. The burgeoning age of IoT has reinforced the need for robust time series anomaly detection. While there are hundreds of anomaly detection methods in the literature, one definition, time series discords, has emerged as a competitive and popular choice for practitioners. Time series discords are subsequences of a time series that are maximally far away from their nearest neighbors. Perhaps the most attractive feature of discords is their simplicity. Unlike many of the parameter-laden methods proposed, discords require only a single parameter to be set by the user: the subsequence length. We believe that the utility of discords is reduced by sensitivity to even this single user choice. The obvious solution to this problem, computing discords of all lengths then selecting the best anomalies (under some measure), appears at first glance to be computationally untenable. However, in this work we discuss MERLIN, a recently introduced algorithm that can efficiently and exactly find discords of all lengths in massive time series archives. By exploiting computational redundancies, MERLIN is two orders of magnitude faster than comparable algorithms. Moreover, we show that by exploiting a little-known indexing technique called Orchard’s algorithm, we can create a new algorithm called MERLIN++, which is an order of magnitude faster than MERLIN, yet produces identical results. We demonstrate the utility of our ideas on a large and diverse set of experiments and show that MERLIN++ can discover subtle anomalies that defy existing algorithms or even careful human inspection. We further compare to five state-of-the-art rival methods, on the largest benchmark dataset for this task, and show that MERLIN++ is superior in terms of accuracy and speed. 
  4. Modern smart vehicles have a Controller Area Network (CAN) that supports intra-vehicle communication between intelligent Electronic Control Units (ECUs). The CAN is known to be vulnerable to various cyber attacks. In this paper, we propose a unified framework that can detect multiple types of cyber attacks (viz., Denial of Service, Fuzzy, Impersonation) affecting the CAN. Specifically, we build a feature from the timing information of CAN packets exchanged over the CAN bus network in partitioned time windows, yielding a low-dimensional representation of the entire CAN network as a time series latent space. Then, we apply a two-tier anomaly-based intrusion detection model that keeps track of short-term and long-term memory of deviations in the initial time series latent space, to create a 'stateful latent space'. Then, we learn the boundaries of the benign stateful latent space that specify the attack detection criterion. To find hyper-parameters of our proposed model, we formulate a preference-based multi-objective optimization problem that optimizes security objectives tailored for a network-wide time series anomaly-based intrusion detector by balancing trade-offs between false alarm count, time to detection, and missed detection rate. We use real benign and attack datasets collected from a Kia Soul vehicle to validate our framework and show that it outperforms existing work.
  5. Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim: the majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusory. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Archive. We believe that this resource will play a role similar to that of the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.
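
The discord definition quoted in items 1 and 3 above (a subsequence that is maximally far from its nearest neighbor, with the subsequence length as the only parameter) can be illustrated with a small brute-force sketch. This is not MERLIN or MERLIN++, whose internals the abstracts do not describe; it is a quadratic-time illustration only. The function names `znorm` and `top_discord` are hypothetical, and two details are assumptions that the abstracts do not state: distances are z-normalized Euclidean, and subsequences that overlap the candidate are excluded as trivial matches.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)


def top_discord(series, m):
    """Brute-force search for the top discord of length m.

    Returns (start_index, distance_to_nearest_neighbor).
    Assumptions (not taken from the abstracts): z-normalized Euclidean
    distance, and neighbors that overlap the candidate are ignored.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - m + 1  # number of subsequences of length m
    if m < 2 or n < 1:
        raise ValueError("need 2 <= m <= len(series)")

    subs = np.array([znorm(series[i:i + m]) for i in range(n)])

    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        # Mask out the candidate itself and every overlapping subsequence.
        lo, hi = max(0, i - m + 1), min(n, i + m)
        dists[lo:hi] = np.inf
        nn = dists.min()
        if not np.isfinite(nn):
            continue  # no non-overlapping neighbor exists for this candidate
        if nn > best_dist:
            best_idx, best_dist = i, nn

    if best_idx < 0:
        raise ValueError("series too short for non-overlapping neighbors")
    return best_idx, best_dist
```

Calling `top_discord(series, m)` for several values of `m` shows the sensitivity to subsequence length that the abstracts mention: the reported location and distance can change with `m`, which is the motivation for searching over all lengths.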