Title: Anomaly Detection in Catalog Streams
Detecting valuable anomalies with high accuracy and low latency from large amounts of streaming data is a challenge. This article focuses on a special kind of stream, the catalog stream, whose high-level structure allows the stream to be analyzed effectively. We first formulate anomaly detection in catalog streams as a constrained optimization problem based on a catalog stream matrix. Then, a novel filtering-identifying based anomaly detection algorithm (FIAD) is proposed, which includes two complementary strategies: true event identifying and false alarm filtering. Different kinds of attention windows are developed to provide corresponding data for the various algorithm components. The identifying strategy captures true events in a much smaller candidate set, while the filtering strategy removes a significant share of false positives. A scalable catalog stream processing framework, CSPF, is designed to support the proposed method efficiently. Extensive experiments are conducted on catalog stream data sets from an astronomy observation. The experimental results show that the proposed method achieves a false-positive rate as low as 0.04%, reduces false alarms by 98.6% compared with existing methods, and handles each catalog with a latency of 2.1 seconds. Furthermore, a total of 36 transient candidates are detected from one observation season.
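The two complementary strategies named in the abstract can be illustrated with a minimal sketch. This is not the FIAD algorithm: the abstract does not give its details, so everything below is an assumption made for illustration. The matrix layout (one row per source, one column per past catalog), the robust-deviation test, the "too many sources fire at once" filter, and the names `identify_candidates`, `filter_false_alarms`, `k`, and `max_fraction` are all hypothetical.

```python
import numpy as np

def identify_candidates(window, latest, k=5.0):
    """Identifying step (assumed rule): flag sources whose latest value
    deviates from their own window median by more than k robust sigmas."""
    med = np.median(window, axis=1)
    mad = np.median(np.abs(window - med[:, None]), axis=1)
    sigma = 1.4826 * mad + 1e-9  # MAD scaled to a Gaussian sigma; epsilon avoids zero
    return np.abs(latest - med) > k * sigma

def filter_false_alarms(candidates, max_fraction=0.2):
    """Filtering step (assumed rule): if too many sources fire in the same
    catalog, treat it as a frame-wide artifact and drop every candidate."""
    if candidates.mean() > max_fraction:
        return np.zeros_like(candidates)
    return candidates

# Synthetic data only: 100 sources observed in 30 past catalogs.
rng = np.random.default_rng(0)
window = rng.normal(10.0, 0.05, size=(100, 30))
latest = rng.normal(10.0, 0.05, size=100)
latest[7] += 2.0  # inject one synthetic anomaly

flags = filter_false_alarms(identify_candidates(window, latest))
print(np.flatnonzero(flags))
```

In this toy setup the identifying step narrows the stream to a small candidate set, and the filtering step discards alarms that are better explained by a shift affecting the whole catalog, for example when every source brightens at once.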
Award ID(s):
2109988
PAR ID:
10385351
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Transactions on Big Data
ISSN:
2372-2096
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This article presents anomaly detection algorithms for marine robots based on their trajectories under the influence of unknown ocean flow. A learning algorithm identifies the flow field and estimates the through-water speed of a marine robot. By comparing the through-water speed with a nominal speed range, the algorithm is able to detect anomalies causing unusual speed changes. The identified ocean flow field is used to eliminate false alarms, where an abnormal trajectory may be caused by unexpected flow. The convergence of the algorithms is justified through the theory of adaptive control. The proposed strategy is robust to speed constraints and inaccurate flow modeling. Experimental results are collected on an indoor testbed formed by the Georgia Tech Miniature Autonomous Blimp and the Georgia Tech Wind Measuring Robot, while a simulation study is performed for the ocean flow field. Data collected in both studies confirm the effectiveness of the algorithms in identifying the through-water speed and detecting speed anomalies while avoiding false alarms.
  2. Spurious power consumption data reported from compromised meters controlled by organized adversaries in the Advanced Metering Infrastructure (AMI) may have drastic consequences for a smart grid’s operations. While existing research on data falsification in smart grids mostly defends against isolated electricity theft, we introduce a taxonomy of data falsification attack types that arise when smart meters are compromised by organized or strategic rivals. To counter these attacks, we first propose a coarse-grained and a fine-grained anomaly-based security event detection technique that uses indicators such as deviation and directional change in the time series of the proposed anomaly detection metrics to indicate: (i) occurrence, (ii) type of attack, and (iii) attack strategy used, collectively known as the attack context. Leveraging the attack context information, we propose three attack response metrics for the inferred attack context: (a) an unbiased mean indicating a robust location parameter; (b) a median absolute deviation indicating a robust scale parameter; and (c) an attack probability time ratio metric indicating the active time horizon of attacks. Subsequently, we propose a trust scoring model based on Kullback-Leibler (KL) divergence that embeds the appropriate unbiased mean, the median absolute deviation, and the attack probability time ratio metric at runtime to produce trust scores for each smart meter. These trust scores help distinguish compromised smart meters from non-compromised ones. Embedding the attack context into the trust scoring model facilitates accurate and rapid classification of compromised meters, even when a large fraction of meters is compromised, and generalizes across various attack strategies and margins of false data. Using real datasets collected from two different AMI micro-grids, experimental results show that our proposed framework has a high true positive detection rate, while the average false alarm and missed detection rates are much lower than 10% for most attack combinations. Finally, we also establish fundamental theoretical limits of the proposed method, which will help assess the applicability of our method to other domains.
  3. In this paper, we address the problem of detecting and learning anomalies in high-dimensional data streams in real time. Following a data-driven approach, we propose an online and multivariate anomaly detection method that is suitable for the timely and accurate detection of anomalies. We propose our method for both semi-supervised and supervised settings. By combining the semi-supervised and supervised algorithms, we present a self-supervised online learning algorithm in which the semi-supervised algorithm trains the supervised algorithm to improve its detection performance over time. The methods are comprehensively analyzed in terms of computational complexity, asymptotic optimality, and false alarm rate. The performance of the proposed algorithms is also evaluated using real-world cybersecurity datasets, showing a significant improvement over state-of-the-art results.
  4. We present an interpretable implementation of the autoencoding algorithm, used as an anomaly detector, built with a forest of deep decision trees on field-programmable gate arrays (FPGAs). Scenarios at the Large Hadron Collider at CERN are considered, for which the autoencoder is trained using known physical processes of the Standard Model. The design is then deployed in real-time trigger systems for anomaly detection of unknown physical processes, such as the detection of rare exotic decays of the Higgs boson. The inference is made with a latency of 30 ns at percent-level resource usage using the Xilinx Virtex UltraScale+ VU9P FPGA. Our method offers low-latency anomaly detection for edge AI users with resource constraints.
  5. Identifying anomalies, especially weak anomalies, in constantly changing targets is more difficult than in stable targets. In this article, we borrow dynamics metrics and propose the concept of a dynamics signature (DS) in multi-dimensional feature space to efficiently distinguish abnormal events from the normal behavior of a variable star. A corresponding dynamics criterion is proposed to check whether a star's current state is an anomaly. Based on the proposed concept of DS, we develop a highly optimized DS algorithm that can automatically detect anomalies in high-cadence sky survey data from millions of stars in real time. Microlensing, a typical anomaly in astronomical observation, is used to evaluate the proposed DS algorithm. Two datasets, a parameterized sinusoidal dataset containing 262,440 light curves and a dataset based on real variable stars containing 462,996 light curves, are used to evaluate the practical performance of the proposed DS algorithm. Experimental results show that our DS algorithm is highly accurate, sensitive enough to detect weak microlensing events at very early stages, and fast enough to process 176,000 stars in less than 1 s on a commodity computer.