skip to main content


Title: Real-time distance-based outlier detection in data streams
Real-time outlier detection in data streams has drawn much attention recently as many applications need to be able to detect abnormal behaviors as soon as they occur. The arrival and departure of streaming data on edge devices impose new challenges to process the data quickly in real-time due to memory and CPU limitations of these devices. Existing methods are slow and not memory efficient as they mostly focus on quick detection of inliers and pay less attention to expediting neighbor searches for outlier candidates. In this study, we propose a new algorithm, CPOD, to improve the efficiency of outlier detections while reducing its memory requirements. CPOD uses a unique data structure called "core point" with multi-distance indexing to both quickly identify inliers and reduce neighbor search spaces for outlier candidates. We show that with six real-world and one synthetic dataset, CPOD is, on average, 10, 19, and 73 times faster than M_MCOD, NETS, and MCOD, respectively, while consuming low memory.  more » « less
Award ID(s):
2027794 1910950
NSF-PAR ID:
10225518
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
14
Issue:
2
ISSN:
2150-8097
Page Range / eLocation ID:
141 to 153
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    The detection of abnormal moving objects over high-volume trajectory streams is critical for real-time applications ranging from military surveillance to transportation management. Yet this outlier detection problem, especially along both the spatial and temporal dimensions, remains largely unexplored. In this work, we propose a rich taxonomy of novel classes of neighbor-based trajectory outlier definitions that model the anomalous behavior of moving objects for a large range of real-time applications. Our theoretical analysis and empirical study on two real-world datasets—the Beijing Taxi trajectory data and the Ground Moving Target Indicator data stream—and one generated Moving Objects dataset demonstrate the effectiveness of our taxonomy in effectively capturing different types of abnormal moving objects. Furthermore, we propose a general strategy for efficiently detecting these new outlier classes called the minimal examination (MEX) framework. The MEX framework features three core optimization principles, which leverage spatiotemporal as well as the predictability properties of the neighbor evidence to minimize the detection costs. Based on this foundation, we design algorithms that detect the outliers based on these classes of new outlier semantics that successfully leverage our optimization principles. Our comprehensive experimental study demonstrates that our proposed MEX strategy drives the detection costs 100-fold down into the practical realm for applications that analyze high-volume trajectory streams in near real time. 
    more » « less
  2. Outlier detection is critical in real world. Due to the existence of many outlier detection techniques which often return different results for the same data set, the users have to address the problem of determining which among these techniques is the best suited for their task and tune its parameters. This is particularly challenging in the unsupervised setting, where no labels are available for cross-validation needed for such method and parameter optimization. In this work, we propose AutoOD which uses the existing unsupervised detection techniques to automatically produce high quality outliers without any human tuning. AutoOD's fundamentally new strategy unifies the merits of unsupervised outlier detection and supervised classification within one integrated solution. It automatically tests a diverse set of unsupervised outlier detectors on a target data set, extracts useful signals from their combined detection results to reliably capture key differences between outliers and inliers. It then uses these signals to produce a "custom outlier classifier" to classify outliers, with its accuracy comparable to supervised outlier classification models trained with ground truth labels - without having access to the much needed labels. On a diverse set of benchmark outlier detection datasets, AutoOD consistently outperforms the best unsupervised outlier detector selected from hundreds of detectors. It also outperforms other tuning-free approaches from 12 to 97 points (out of 100) in the F-1 score. 
    more » « less
  3. Similarity search is the basis for many data analytics techniques, including k-nearest neighbor classification and outlier detection. Similarity search over large data sets relies on i) a distance metric learned from input examples and ii) an index to speed up search based on the learned distance metric. In interactive systems, input to guide the learning of the distance metric may be provided over time. As this new input changes the learned distance metric, a naive approach would adopt the costly process of re-indexing all items after each metric change. In this paper, we propose the first solution, called OASIS, to instantaneously adapt the index to conform to a changing distance metric without this prohibitive re-indexing process. To achieve this, we prove that locality-sensitive hashing (LSH) provides an invariance property, meaning that an LSH index built on the original distance metric is equally effective at supporting similarity search using an updated distance metric as long as the transform matrix learned for the new distance metric satisfies certain properties. This observation allows OASIS to avoid recomputing the index from scratch in most cases. Further, for the rare cases when an adaption of the LSH index is shown to be necessary, we design an efficient incremental LSH update strategy that re-hashes only a small subset of the items in the index. In addition, we develop an efficient distance metric learning strategy that incrementally learns the new metric as inputs are received. Our experimental study using real world public datasets confirms the effectiveness of OASIS at improving the accuracy of various similarity search-based data analytics tasks by instantaneously adapting the distance metric and its associated index in tandem, while achieving an up to 3 orders of magnitude speedup over the state-of-art techniques. 
    more » « less
  4. Outlier detection (OD) is a key machine learning task for finding rare and deviant data samples, with many time-critical applications such as fraud detection and intrusion detection. In this work, we propose TOD, the first tensor-based system for efficient and scalable outlier detection on distributed multi-GPU machines. A key idea behind TOD is decomposing complex OD applications into a small collection of basic tensor algebra operators. This decomposition enables TOD to accelerate OD computations by leveraging recent advances in deep learning infrastructure in both hardware and software. Moreover, to deploy memory-intensive OD applications on modern GPUs with limited on-device memory, we introduce two key techniques. First, provable quantization speeds up OD computations and reduces its memory footprint by automatically performing specific floating-point operations in lower precision while provably guaranteeing no accuracy loss. Second, to exploit the aggregated compute resources and memory capacity of multiple GPUs, we introduce automatic batching , which decomposes OD computations into small batches for both sequential execution on a single GPU and parallel execution across multiple GPUs. TOD supports a diverse set of OD algorithms. Evaluation on 11 real-world and 3 synthetic OD datasets shows that TOD is on average 10.9X faster than the leading CPU-based OD system PyOD (with a maximum speedup of 38.9X), and can handle much larger datasets than existing GPU-based OD systems. In addition, TOD allows easy integration of new OD operators, enabling fast prototyping of emerging and yet-to-discovered OD algorithms. 
    more » « less
  5. Sensors are used to monitor various parameters in many real-world applications. Sudden changes in the underlying patterns of the sensors readings may represent events of interest. Therefore, event detection, an important temporal version of outlier detection, is one of the primary motivating applications in sensor networks. This work describes the implementation of a real-time outlier detection that uses an Autoencoder-LSTM neural-network accelerator implemented on the Xilinx PYNQ-Z1 development board. The implemented accelerator consists of a fine-tuned Autoencoder to extract the latent features in sensor data followed by a Long short-term memory (LSTM) network to predict the next step and detect outliers in real-time. The implemented design achieves 2.06 ms minimum latency and 85.9 GOp/s maximum throughput. The low latency and 0.25 W power consumption of the Autoencoder-LSTM outlier detector makes it suitable for resource-constrained computing platforms. 
    more » « less