Anomaly-based attack detection methods that rely on learning a benign profile of operation are commonly used for identifying data falsification attacks and faults in cyber-physical systems. However, most works do not assume the presence of attacks while training the anomaly detectors, nor do they account for the impact of such training-time attacks on eventual detection performance over the test set. Conversely, some robust learning methods overcompensate in their mitigation, which leads to increased false positives when no attacks or threats are present during training. To achieve this balance, this paper proposes a framework that enhances the robustness of previous anomaly detection frameworks in smart living applications by introducing three key design changes in the threshold learning of time series anomaly detectors: (1) replacing the square loss with the Tukey bi-weight loss function; (2) adding quantile weights to the Tukey regression errors; and (3) redefining the empirical cost function from the MSE to the harmonic mean of the quantile-weighted Tukey losses. We show that these changes mitigate the performance degradation in anomaly detectors caused by untargeted poisoning attacks during training, while simultaneously preventing false alarms in the absence of such training set attacks. We evaluate our work with a proof of concept that uses state-of-the-art anomaly detection for smart living CPS to detect false data injection in smart metering.
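As a quick illustration of the three design changes named in the abstract, here is a minimal Python sketch. The tuning constant c, the quantile level tau, and all function names are our assumptions for exposition, not the authors' code.

```python
import numpy as np

def tukey_biweight(r, c=4.685):
    """Tukey bi-weight loss: bounded above by c**2 / 6, so extreme (possibly
    poisoned) residuals cannot dominate the cost the way they do under
    square loss."""
    r = np.asarray(r, dtype=float)
    out = np.full(r.shape, c**2 / 6.0)
    inside = np.abs(r) <= c
    out[inside] = (c**2 / 6.0) * (1.0 - (1.0 - (r[inside] / c) ** 2) ** 3)
    return out

def quantile_weights(r, tau=0.9):
    """Pinball-style asymmetric weights on the regression errors:
    positive residuals get weight tau, negative ones 1 - tau."""
    r = np.asarray(r, dtype=float)
    return np.where(r >= 0.0, tau, 1.0 - tau)

def empirical_cost(r, tau=0.9, c=4.685, eps=1e-12):
    """Harmonic mean of the quantile-weighted Tukey losses (replaces MSE).
    The harmonic mean is dominated by the smaller losses, further muting
    the influence of a poisoned minority of samples."""
    losses = quantile_weights(r, tau) * tukey_biweight(r, c)
    return len(losses) / np.sum(1.0 / (losses + eps))
```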
This content will become publicly available on June 1, 2026
Mitigating Impact of Data Poisoning Attacks on CPS Anomaly Detection with Provable Guarantees
Anomaly-based attack detection methods depend on some form of machine learning to detect data falsification attacks in smart living cyber–physical systems. However, there is a lack of studies that consider the presence of attacks during the training phase and their effect on detection and false alarm performance. To improve the robustness of time series learning for anomaly detection, we propose a framework that modifies design choices such as the regression error type and the loss function type used to learn the thresholds of an anomaly detection framework during the training phase. Specifically, we offer theoretical proofs on the relationship between poisoning attack strength and the choice of loss function used to learn the detection thresholds. This, in turn, explains why and when our framework mitigates data poisoning and the trade-offs associated with such design changes. The theoretical results are backed by experimental results that demonstrate attack mitigation performance with NIST-specified metrics for CPS, using real data collected from a smart metering infrastructure as a proof of concept. Thus, the contribution is a framework that simultaneously guarantees security of ML and ML for security.
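To connect the robust loss above to threshold learning, the following hedged sketch fits a robust location with a Tukey M-estimator via iteratively reweighted least squares and places the threshold a few robust scale units above it. The IRLS scheme, the constant k, and the 5% poisoning model are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def tukey_weight(r, c=4.685):
    """Psi(r)/r weights for the Tukey bi-weight: exactly zero beyond |r| > c,
    so far-out (poisoned) points are discarded from the estimate."""
    w = np.zeros_like(r)
    inside = np.abs(r) <= c
    w[inside] = (1.0 - (r[inside] / c) ** 2) ** 2
    return w

def robust_threshold(scores, k=3.0, c=4.685, iters=20):
    mu = np.median(scores)
    scale = 1.4826 * np.median(np.abs(scores - mu))  # MAD estimate of sigma
    for _ in range(iters):                           # IRLS for the location
        w = tukey_weight((scores - mu) / scale, c)
        mu = np.sum(w * scores) / np.sum(w)
    return mu + k * scale

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 500)     # benign anomaly scores
train[:25] += 8.0                     # untargeted poisoning of 5% of samples

thr = robust_threshold(train)         # poisoned points receive zero weight
test = rng.normal(0.0, 1.0, 200)      # attack-free test window
print(f"threshold={thr:.2f}, false-alarm rate={np.mean(test > thr):.1%}")
```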
- Award ID(s): 2320951
- PAR ID: 10630122
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: Information
- Volume: 16
- Issue: 6
- ISSN: 2078-2489
- Page Range / eLocation ID: 428
- Subject(s) / Keyword(s): anomaly detection; data poisoning attacks; cyber–physical systems (CPS); machine learning robustness; ML for security; resilient learning-based CPS
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Smart grids integrate advanced information and communication technologies (ICTs) into traditional power grids for more efficient and resilient power delivery and management, but they also introduce new security vulnerabilities that adversaries can exploit to launch cyber attacks, causing severe consequences such as massive blackouts and infrastructure damage. Existing machine learning-based methods for detecting cyber attacks in smart grids are mostly based on supervised learning, which needs instances of both normal and attack events for training. In addition, supervised learning requires that the training dataset include representative instances of various types of attack events to train a good model, which is sometimes hard if not impossible. This paper presents a new method for detecting cyber attacks in smart grids using PMU data, based on semi-supervised anomaly detection and deep representation learning. Semi-supervised anomaly detection employs only the instances of normal events to train detection models, making it suitable for finding unknown attack events. A number of popular semi-supervised anomaly detection algorithms were investigated in our study using publicly available power system cyber attack datasets to identify the best-performing ones. The performance comparison with popular supervised algorithms demonstrates that semi-supervised algorithms are more capable of finding attack events than supervised algorithms. Our results also show that the performance of semi-supervised anomaly detection algorithms can be further improved by augmenting with deep representation learning.
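As a rough illustration of the semi-supervised setup in the record above (training on normal events only), here is a minimal one-class sketch. IsolationForest, the feature dimensions, and the synthetic data are stand-ins, not the study's actual algorithms or PMU datasets.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_normal = rng.normal(0, 1, size=(1000, 8))      # stand-in for normal PMU features
X_test = np.vstack([rng.normal(0, 1, (90, 8)),   # mostly normal test windows
                    rng.normal(4, 1, (10, 8))])  # 10% synthetic attack events

scaler = StandardScaler().fit(X_normal)          # preprocessing fit on normal data only
model = IsolationForest(random_state=1).fit(scaler.transform(X_normal))

pred = model.predict(scaler.transform(X_test))   # +1 = normal, -1 = anomaly
print(f"flagged {np.sum(pred == -1)} of {len(X_test)} test windows")
```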
- Falsified data from compromised Phasor Measurement Units (PMUs) in a smart grid induce Energy Management Systems (EMS) to form an inaccurate estimate of the state of the grid, disrupting various operations of the power grid. Moreover, the PMUs deployed at the distribution layer of a smart grid show dynamic fluctuations in their data streams, which make it extremely challenging to design effective learning frameworks for anomaly-based attack detection. In this paper, we propose a noise-resilient learning framework for anomaly-based attack detection, specifically for distribution-layer PMU infrastructure, that shows real-time indicators of data falsification attacks while offsetting the false alarms caused by the noise. Specifically, we propose a feature extraction framework that uses Pythagorean means of the active power from a cluster of PMUs, reducing the multi-dimensional PMU data streams via quick big-data summarization. We also propose a robust, noise-resilient methodology for learning thresholds, based on generalized robust estimation theory applied to our invariant feature. We experimentally validate our approach and demonstrate improved reliability performance using two completely different datasets collected from real distribution-level PMU infrastructures.
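A hedged sketch of the feature idea in the record above: Pythagorean means of active power across a PMU cluster as a low-dimensional summary. The specific invariant used here (the GM/AM ratio) and the sample values are our assumptions.

```python
import numpy as np

def pythagorean_means(p):
    """p: positive active-power readings from a PMU cluster at one time step."""
    p = np.asarray(p, dtype=float)
    am = p.mean()                          # arithmetic mean
    gm = np.exp(np.mean(np.log(p)))        # geometric mean (AM >= GM >= HM)
    hm = len(p) / np.sum(1.0 / p)          # harmonic mean
    return am, gm, hm

# Under benign operation a ratio of these means stays near a stable baseline;
# coordinated falsification tends to perturb it (assumed invariant for the demo).
am, gm, hm = pythagorean_means([3.2, 3.0, 3.4, 3.1])
print(f"AM={am:.3f} GM={gm:.3f} HM={hm:.3f} GM/AM={gm / am:.4f}")
```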
- A backdoor data poisoning attack is an adversarial attack wherein the attacker injects several watermarked, mislabeled training examples into a training set. The watermark does not impact the test-time performance of the model on typical data; however, the model reliably errs on watermarked examples. To gain a better foundational understanding of backdoor data poisoning attacks, we present a formal theoretical framework within which one can discuss backdoor data poisoning attacks for classification problems. We then use this to analyze important statistical and computational issues surrounding these attacks. On the statistical front, we identify a parameter we call the memorization capacity that captures the intrinsic vulnerability of a learning problem to a backdoor attack. This allows us to argue about the robustness of several natural learning problems to backdoor attacks. Our results favoring the attacker involve presenting explicit constructions of backdoor attacks, and our robustness results show that some natural problem settings cannot yield successful backdoor attacks. From a computational standpoint, we show that under certain assumptions, adversarial training can detect the presence of backdoors in a training set. We then show that under similar assumptions, two closely related problems we call backdoor filtering and robust generalization are nearly equivalent. This implies that it is both asymptotically necessary and sufficient to design algorithms that can identify watermarked examples in the training set in order to obtain a learning algorithm that both generalizes well to unseen data and is robust to backdoors.
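A toy sketch of the injection model in the record above: a fixed watermark planted in a few mislabeled training examples. The watermark coordinate, magnitude, and labels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(0, 1, size=(500, 16))
y = (X[:, 0] > 0).astype(int)               # clean binary task

def add_watermark(x, coord=-1, value=6.0):
    """Watermark = an out-of-range value planted in one fixed coordinate."""
    x = x.copy()
    x[:, coord] = value
    return x

n_poison = 25                               # a few injected examples suffice
X_bd = add_watermark(rng.normal(0, 1, (n_poison, 16)))
y_bd = np.ones(n_poison, dtype=int)         # mislabeled toward the target class

X_train = np.vstack([X, X_bd])              # poisoned training set
y_train = np.concatenate([y, y_bd])
# A model that memorizes the watermark behaves normally on clean inputs but
# reliably errs on watermarked ones (the "memorization capacity" issue).
print(X_train.shape, f"poison fraction={n_poison / len(X_train):.1%}")
```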
- VehiGAN: Generative Adversarial Networks for Adversarially Robust V2X Misbehavior Detection Systems. Vehicle-to-Everything (V2X) communication enables vehicles to communicate with other vehicles and roadside infrastructure, enhancing traffic management and improving road safety. However, the open and decentralized nature of V2X networks exposes them to various security threats, especially misbehaviors, necessitating a robust Misbehavior Detection System (MBDS). While Machine Learning (ML) has proved effective in different anomaly detection applications, existing ML-based MBDSs have shown limitations in generalizing due to the dynamic nature of V2X and insufficient, imbalanced training data. Moreover, they are known to be vulnerable to adversarial ML attacks. On the other hand, Generative Adversarial Networks (GAN) possess the potential to mitigate the aforementioned issues and improve detection performance by synthesizing unseen samples of minority classes and utilizing them during model training. Therefore, we propose the first application of GAN to design an MBDS that detects any misbehavior and ensures robustness against adversarial perturbation. In this article, we present several key contributions. First, we propose an advanced threat model for stealthy V2X misbehavior where the attacker can transmit malicious data and mask it using adversarial attacks to avoid detection by ML-based MBDS. We formulate two categories of adversarial attacks against the anomaly-based MBDS. Later, in the pursuit of a generalized and robust GAN-based MBDS, we train and evaluate a diverse set of Wasserstein GAN (WGAN) models and present VehicularGAN (VehiGAN), an ensemble of multiple top-performing WGANs, which transcends the limitations of individual models and improves detection performance. We present a physics-guided data preprocessing technique that generates effective features for ML-based MBDS. In the evaluation, we leverage the state-of-the-art V2X attack simulation tool VASP to create a comprehensive dataset of V2X messages with diverse misbehaviors. Evaluation results show that in 20 out of 35 misbehaviors, VehiGAN outperforms the baseline and exhibits comparable detection performance in other scenarios. Particularly, VehiGAN excels in detecting advanced misbehaviors that manipulate multiple fields in V2X messages simultaneously, replicating unique maneuvers. Moreover, VehiGAN provides approximately 92% improvement in false positive rate under powerful adaptive adversarial attacks, and possesses intrinsic robustness against other adversarial attacks that target the false negative rate. Finally, we make the data and code available for reproducibility and future benchmarking at https://github.com/shahriar0651/VehiGAN.
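A loose sketch of the ensemble idea behind VehiGAN as described above: averaging the critics of several trained WGANs into one misbehavior score. The Critic stand-in, the linear scoring, and the threshold are placeholders, not the published architecture.

```python
import numpy as np

class Critic:
    """Stand-in for a trained WGAN critic; higher output = more 'real'."""
    def __init__(self, w):
        self.w = w
    def score(self, x):
        return x @ self.w                   # placeholder for a real network

def ensemble_score(critics, x):
    """Average critic scores; low values suggest misbehavior (fake-looking)."""
    return np.mean([c.score(x) for c in critics], axis=0)

rng = np.random.default_rng(3)
critics = [Critic(rng.normal(0, 1, 8)) for _ in range(5)]  # top-k WGANs
msgs = rng.normal(0, 1, size=(10, 8))        # preprocessed V2X message features
flags = ensemble_score(critics, msgs) < -1.0  # assumed detection threshold
print(flags)
```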