
Anomaly Detection in Scientific Datasets using Sparse Representation

https://doi.org/10.1145/3588982.3603610

Moon, Aekyeung; Kim, Minjun; Chen, Jiaxi; Son, Seung Woo (August 2023, Proceedings of the First Workshop on AI for Systems)

As high-performance computing (HPC) systems grow in size and complexity, scientists' ability to trust the data they produce is paramount, because data can be corrupted for various reasons and the corruption may go undetected. Machine learning-based anomaly detection could relieve scientists of this concern, but it is practically infeasible because it needs labels for large volumes of scientific data and adds unwanted overhead. In this paper, we exploit the spatial sparsity profiles exhibited by scientific datasets and propose an approach to detect anomalies effectively. Our method first extracts block-level sparse representations of the original datasets in the transformed domain. It then learns from the extracted sparse representations and builds a boundary threshold between normal and abnormal data without relying on labeled data. Experiments on real-world scientific datasets show that the proposed approach requires 13% of the entire dataset on average (less than 10% in most cases and as low as 0.3%) to achieve detection accuracy (70.74%-100.0%) competitive with two state-of-the-art unsupervised techniques.
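The two steps described in the abstract (block-level sparse representation in a transformed domain, then a label-free boundary threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the transform, the sparsity measure, or the learning rule, so the DCT, the 99% energy criterion, the median/MAD threshold, and every function and parameter name below are assumptions chosen for illustration. The sketch also scans all blocks instead of sampling a fraction of the dataset.

```python
import math
import statistics

def dct2_ortho(block):
    """Orthonormal DCT-II of a 1-D block (assumed transform, not from the paper)."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def block_sparsity(data, block_size=8, keep=0.99):
    """Per block: how many of the largest coefficients hold `keep` of the energy."""
    features = []
    for start in range(0, len(data) - block_size + 1, block_size):
        coeffs = dct2_ortho(data[start:start + block_size])
        energies = sorted((c * c for c in coeffs), reverse=True)
        total = sum(energies)
        acc, count = 0.0, 0
        for e in energies:
            if acc >= keep * total:
                break
            acc += e
            count += 1
        features.append(count)
    return features

def detect(features, k=3.0):
    """Flag blocks whose feature exceeds median + k * MAD (MAD floored at 1)."""
    med = statistics.median(features)
    mad = statistics.median([abs(f - med) for f in features])
    threshold = med + k * max(mad, 1.0)  # the floor is a hypothetical safeguard
    return [i for i, f in enumerate(features) if f > threshold]

# Synthetic smooth field with one corrupted value (a stand-in for a bit flip).
data = [10.0 + math.sin(2.0 * math.pi * i / 64.0) for i in range(256)]
data[100] += 50.0
flagged = detect(block_sparsity(data))
print(flagged)
```

Smooth blocks concentrate nearly all of their energy in a single coefficient, while the corrupted block spreads energy across many coefficients, so its sparsity count stands out from the rest without any labels being supplied.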
