

Title: Anomaly detection in the presence of irrelevant features
Abstract: Experiments at particle colliders are the primary source of insight into physics at microscopic scales. Searches at these facilities often rely on analyses optimized for specific models of new physics. Increasingly, however, data-driven, model-agnostic approaches based on machine learning are also being explored. A major challenge is that such methods can be highly sensitive to the presence of many irrelevant features in the data. This paper presents techniques based on Boosted Decision Trees (BDTs) that improve anomaly detection in the presence of many irrelevant features. First, a BDT classifier is shown to be more robust than neural networks for the Classification Without Labels (CWoLa) approach to finding resonant excesses, which assumes that the resonant and non-resonant observables are independent. Next, a tree-based probability density estimator using copula transformations is shown to be significantly more stable than normalizing flows, and to outperform them, as irrelevant features are added. The results make a compelling case for further development of tree-based algorithms for more robust resonant anomaly detection in high energy physics.
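The abstract names copula transformations without spelling them out. A minimal sketch of one standard variant, the rank-based Gaussian copula transform, follows; it is an assumption (the abstract does not say) that the paper's transform is of this marginal-Gaussianizing kind, and the function names are hypothetical. Each feature is mapped through its empirical CDF and then through the inverse standard-normal CDF, so every column ends up with the same standard-normal marginal while the rank dependence between columns is preserved.

```python
from statistics import NormalDist


def empirical_cdf_ranks(column):
    """Map each value to (rank + 0.5) / n, which lies strictly inside (0, 1).

    Tied values receive the average of their ranks, so equal inputs map to
    equal outputs. (Hypothetical helper for illustration.)
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values starting at position i.
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            u[order[k]] = (avg_rank + 0.5) / n
        i = j + 1
    return u


def gaussian_copula_transform(rows):
    """Give every feature column a standard-normal marginal.

    rows: list of equal-length feature vectors (events x features).
    Returns a new list of rows with the same shape. (Hypothetical helper;
    assumes the copula step is a marginal Gaussianization.)
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    n_features = len(rows[0])
    columns = [[row[f] for row in rows] for f in range(n_features)]
    z_columns = [
        [std_normal.inv_cdf(u) for u in empirical_cdf_ranks(col)]
        for col in columns
    ]
    # Transpose back from columns to per-event rows.
    return [list(z) for z in zip(*z_columns)]
```

For example, `gaussian_copula_transform([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]])` maps each column onto the same four values, roughly ±0.32 and ±1.15, ordered by rank within that column. A density estimator (tree-based or otherwise) would then be fit to the transformed features; that step is not shown.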
Award ID(s):
2309456
PAR ID:
10521439
Author(s) / Creator(s):
; ;
Publisher / Repository:
10.1007/JHEP02(2024)220
Date Published:
Journal Name:
Journal of High Energy Physics
Volume:
2024
Issue:
2
ISSN:
1029-8479
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: In this study, we investigate the application of the New Physics Learning Machine (NPLM) algorithm as an alternative to the standard CWoLa method with Boosted Decision Trees (BDTs), particularly for scenarios with rare signal events. NPLM offers an end-to-end approach to anomaly detection and hypothesis testing by using an in-sample evaluation of a binary classifier to estimate a log-density ratio, which can improve detection performance without prior assumptions about the signal model. We examine two approaches: (1) an end-to-end NPLM application in cases with reliable background modelling, and (2) an NPLM-based classifier used for signal selection when accurate background modelling is unavailable, with performance subsequently enhanced by a hyper-test over multiple values of the selection threshold. Our findings show that NPLM-based methods outperform BDT-based approaches in detection performance, particularly in low signal injection scenarios, while significantly reducing the epistemic variance due to hyperparameter choices. This work highlights the potential of NPLM for robust resonant anomaly detection in particle physics, setting a foundation for future methods that enhance sensitivity and consistency under signal variability.
  2. Abstract: In this paper, we present a method of embedding physics data manifolds with metric structure into lower-dimensional spaces with simpler metrics, such as Euclidean and hyperbolic spaces. We then demonstrate that this embedding can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that this embedding approach learns the underlying latent structure. Using the notion of volume in Euclidean spaces, we provide, for the first time, a viable way to quantify the true search capability of model-agnostic search algorithms in collider physics (i.e., anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require extracting physically meaningful representations from complex, high-dimensional datasets.
  3. Vehicle-to-Everything (V2X) communication enables vehicles to communicate with other vehicles and roadside infrastructure, enhancing traffic management and improving road safety. However, the open and decentralized nature of V2X networks exposes them to various security threats, especially misbehavior, necessitating a robust Misbehavior Detection System (MBDS). While Machine Learning (ML) has proved effective in many anomaly detection applications, existing ML-based MBDSs have shown a limited ability to generalize because of the dynamic nature of V2X and insufficient, imbalanced training data. Moreover, they are known to be vulnerable to adversarial ML attacks. Generative Adversarial Networks (GANs), on the other hand, have the potential to mitigate these issues and improve detection performance by synthesizing unseen samples of minority classes and using them during training. We therefore propose the first application of GANs to the design of an MBDS that detects any misbehavior and is robust against adversarial perturbation. This article makes several key contributions. First, we propose an advanced threat model for stealthy V2X misbehavior in which the attacker can transmit malicious data and mask it using adversarial attacks to avoid detection by ML-based MBDSs. We formulate two categories of adversarial attacks against anomaly-based MBDSs. Next, in pursuit of a generalized and robust GAN-based MBDS, we train and evaluate a diverse set of Wasserstein GAN (WGAN) models and present VehicularGAN (VehiGAN), an ensemble of multiple top-performing WGANs that overcomes the limitations of individual models and improves detection performance. We also present a physics-guided data preprocessing technique that generates effective features for ML-based MBDSs. For the evaluation, we use the state-of-the-art V2X attack simulation tool VASP to create a comprehensive dataset of V2X messages with diverse misbehaviors.
Evaluation results show that VehiGAN outperforms the baseline in 20 of 35 misbehaviors and exhibits comparable detection performance in the other scenarios. In particular, VehiGAN excels at detecting advanced misbehaviors that manipulate multiple fields in V2X messages simultaneously, replicating unique maneuvers. Moreover, VehiGAN provides an approximately 92% improvement in false positive rate under powerful adaptive adversarial attacks and is intrinsically robust against other adversarial attacks that target the false negative rate. Finally, we make the data and code available for reproducibility and future benchmarking at https://github.com/shahriar0651/VehiGAN.
  4. Abstract: Anomaly (or out-of-distribution) detection is a promising tool for aiding discoveries of new particles or processes in particle physics. In this work, we identify and address two overlooked opportunities to improve anomaly detection (AD) for high-energy physics. First, rather than training a generative model on the single most dominant background process, we build detection algorithms using representation learning from multiple background types, thus taking advantage of more information to improve the estimate of what is relevant for detection. Second, we generalize decorrelation to the multi-background setting, thus directly enforcing a more complete definition of robustness for AD. We demonstrate the benefit of the proposed robust multi-background AD algorithms on a high-dimensional dataset of particle decays at the Large Hadron Collider.
  5. To maximize the discovery potential of high-energy colliders, experimental searches should be sensitive to unforeseen new-physics scenarios. This goal has motivated the use of machine learning for unsupervised anomaly detection. In this paper, we introduce a new anomaly detection strategy built on factorized observables for regressing conditional expectations. Our approach is based on the inductive bias of factorization: the idea that the physics governing different energy scales can be treated as approximately independent. Assuming factorization holds separately for signal and background processes, the appearance of nontrivial correlations between low- and high-energy observables is a robust indicator of new physics. Under the most restrictive form of factorization, a machine-learned model trained to identify such correlations will in fact converge to the optimal new-physics classifier. We test this strategy on a benchmark anomaly detection task for the Large Hadron Collider involving collimated sprays of particles called jets. By teasing out correlations between the kinematics and substructure of jets, our method can reliably extract percent-level signal fractions. This strategy for uncovering new physics adds to the growing toolbox of anomaly detection methods for collider physics with a complementary set of assumptions. Published by the American Physical Society 2024
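Both the main abstract and the first related entry build on Classification Without Labels (CWoLa): a classifier is trained to separate two mixed samples, such as a signal region and its sidebands, using only the sample labels. If the background is distributed identically in both samples and the signal fractions differ, the optimal mixed-sample classifier is a monotonic function of the signal-to-background likelihood ratio. The toy sketch below illustrates that idea with a one-dimensional histogram standing in for the BDT; the Gaussian signal and background shapes, the bin width, and all function names are illustrative assumptions, not the setup of any of the papers listed here.

```python
import random
from collections import Counter


def sample_mixture(n_bkg, n_sig, rng):
    """Draw one mixed sample (toy assumption: background ~ N(0, 1), signal ~ N(2, 0.3))."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_bkg)]
    xs += [rng.gauss(2.0, 0.3) for _ in range(n_sig)]
    return xs


def bin_index(x, width=0.5):
    """Histogram bin of x; floor division also handles negative x correctly."""
    return int(x // width)


def train_binned_cwola(mix1, mix2, width=0.5):
    """Return a per-bin score estimating P(sample 1 | bin) from counts.

    Only the *sample* labels (mixture 1 vs mixture 2) are used, never the
    signal/background truth. This histogram is a hypothetical stand-in for
    the BDT or neural-network classifier used in practice.
    """
    c1 = Counter(bin_index(x, width) for x in mix1)
    c2 = Counter(bin_index(x, width) for x in mix2)
    n1, n2 = len(mix1), len(mix2)
    scores = {}
    for b in set(c1) | set(c2):
        # Normalize by sample size so unequal sample sizes do not bias the score.
        f1, f2 = c1[b] / n1, c2[b] / n2
        scores[b] = f1 / (f1 + f2)
    return scores


rng = random.Random(7)
signal_region = sample_mixture(1000, 100, rng)  # background plus a small signal
sideband = sample_mixture(1000, 0, rng)         # background only
scores = train_binned_cwola(signal_region, sideband)
print(scores[bin_index(2.0)], scores[bin_index(0.0)])
```

The bin near the toy signal peak scores well above the background-dominated bin near zero, even though no event was ever labeled as signal. A histogram classifier becomes sparse as features are added, which is why practical CWoLa searches use flexible classifiers such as BDTs or neural networks instead.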