We describe the outcome of a data challenge conducted as part of the Dark Machines (https://www.darkmachines.org) initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims to detect signals of new physics at the Large Hadron Collider (LHC) using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of more than 1 billion simulated LHC events corresponding to 10 fb⁻¹ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.
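To make the idea of an anomaly-score-based signal region concrete, the following is a minimal sketch, not the challenge's actual implementation: it assumes a simple reconstruction-error score (here from a PCA model trained on background-like events) and an illustrative 0.1% quantile threshold; the feature arrays and all names are placeholders.

```python
# Minimal sketch of an anomaly score defining a model-independent signal region.
# Illustrative only: feature choice, model, and threshold are assumptions,
# not the method used in the Dark Machines challenge.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
background = rng.normal(size=(10000, 8))       # stand-in for per-event kinematic features
candidates = rng.normal(size=(1000, 8)) + 0.5  # stand-in for events to be scored

scaler = StandardScaler().fit(background)
pca = PCA(n_components=4).fit(scaler.transform(background))

def anomaly_score(x):
    """Reconstruction error under a background-only model: larger = more anomalous."""
    z = scaler.transform(x)
    recon = pca.inverse_transform(pca.transform(z))
    return np.mean((z - recon) ** 2, axis=1)

# Define the signal region as the 0.1% most anomalous events under the background model.
threshold = np.quantile(anomaly_score(background), 0.999)
in_signal_region = anomaly_score(candidates) > threshold
print(f"events in signal region: {in_signal_region.sum()}")
```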
Network Intrusion Detection and Machine Learning
This document analyzes how machine learning principles can be combined with computer network analysis to detect network anomalies, such as network attacks. It highlights the key elements of network analysis and provides an overview of common network analysis techniques. In particular, it discusses a study conducted by the University of Luxembourg and an attempt to recreate that study with a slightly different set of parameters on a different dataset, using NetFlow data for network anomaly detection. Alongside network analysis, the document investigates common machine learning techniques and implements a support vector machine algorithm to detect anomalies and intrusions within the network. MATLAB was used as the machine learning tool for exploring how to combine network analysis data with support vector machines. The resulting graphs represent tests conducted with support vector machines in a manner similar to the University of Luxembourg study; the difference between the tests lies in the metrics used for anomaly detection. The University of Luxembourg used the IP addresses and traffic volume of a specific NetFlow dataset, whereas the graphs presented here use metrics based on the duration of the byte transmission and the ratio of incoming to outgoing bytes during the transmission. The algorithm and the metrics defined here proved less effective than planned against the NetFlow dataset, and the tests did not yield a clear classification of anomalies. However, many other factors contributing to network anomalies were highlighted.
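As a rough illustration of the approach described above, the sketch below trains a one-class support vector machine on two NetFlow-style features (flow duration and the incoming/outgoing byte ratio) using scikit-learn rather than MATLAB; the synthetic data, feature construction, and parameter choices are assumptions for illustration only.

```python
# Minimal sketch of SVM-based anomaly detection on NetFlow-style features.
# Illustrative assumptions: synthetic flows, the two features (duration and
# in/out byte ratio), and a one-class SVM standing in for the MATLAB workflow.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Columns: [flow duration in seconds, incoming/outgoing byte ratio]
normal_flows = np.column_stack([rng.gamma(2.0, 1.0, 5000),
                                rng.normal(1.0, 0.2, 5000)])
test_flows = np.vstack([normal_flows[:100],
                        [[300.0, 40.0], [0.01, 0.001]]])  # two suspicious flows

scaler = StandardScaler().fit(normal_flows)
clf = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
clf.fit(scaler.transform(normal_flows))

labels = clf.predict(scaler.transform(test_flows))  # +1 = normal, -1 = anomalous
print("flagged flows:", np.where(labels == -1)[0])
```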
- Award ID(s):
- 1754054
- PAR ID:
- 10284704
- Date Published:
- Journal Name:
- ADMI 2021: The Symposium of Computing at Minority Institutions
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Although the connectivity offered by the industrial internet of things (IIoT) enables enhanced operational capabilities, the exposure of systems to significant cybersecurity risks poses critical challenges. Recently, machine learning (ML) algorithms such as feature-based support vector machines and logistic regression, together with end-to-end deep neural networks, have been implemented to detect intrusions, including command injection, denial of service, reconnaissance, and backdoor attacks, by capturing anomalous patterns. However, ML algorithms not only fall short in quickly identifying intrusions from few samples, but also fail to adapt to new data or environments. This paper introduces hyperdimensional computing (HDC) as a new cognitive computing paradigm that mimics brain functionality to detect intrusions in IIoT systems. HDC encodes real-time data into a high-dimensional representation, allowing for ultra-efficient learning and analysis with limited samples and a few passes. Additionally, we incorporate the concept of regenerating brain cells into hyperdimensional computing to further improve learning capability and reduce the required memory. Experimental results on the WUSTL-IIOT-2021 dataset show that HDC detects intrusions with an accuracy of 92.6%, which is superior to a multi-layer perceptron (40.2%), support vector machine (72.9%), logistic regression (84.2%), and Gaussian process classification (89.1%), while requiring only 300 training samples and 5 iterations.
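The sketch below illustrates one common hyperdimensional-computing recipe: random-projection encoding into bipolar hypervectors, class prototypes built by bundling (summing) encoded samples, and classification by cosine similarity. It is illustrative only and is not the encoder or the brain-cell-regeneration mechanism described in the paper; the dimensionality, toy data, and feature count are assumptions.

```python
# Minimal sketch of a hyperdimensional-computing (HDC) classifier.
import numpy as np

DIM = 10_000                       # hypervector dimensionality (assumed)
rng = np.random.default_rng(2)

def encode(x, projection):
    """Map a real-valued feature vector to a bipolar hypervector."""
    return np.sign(projection @ x)

def train(X, y, projection):
    """Bundle (sum) encoded samples into one prototype hypervector per class."""
    prototypes = {}
    for label in np.unique(y):
        prototypes[label] = np.sign(sum(encode(x, projection) for x in X[y == label]))
    return prototypes

def classify(x, prototypes, projection):
    """Predict the class whose prototype has the highest cosine similarity."""
    h = encode(x, projection)
    return max(prototypes, key=lambda c: np.dot(h, prototypes[c]) /
               (np.linalg.norm(h) * np.linalg.norm(prototypes[c]) + 1e-9))

# Toy stand-in for IIoT flow features (a few hundred labeled samples).
n_features = 20
projection = rng.normal(size=(DIM, n_features))
X = rng.normal(size=(300, n_features))
y = (X[:, 0] > 0).astype(int)
X[y == 1] += 1.0                   # make the two classes separable for the demo
prototypes = train(X, y, projection)
print("prediction:", classify(X[0], prototypes, projection), "true label:", y[0])
```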
-
Recent self-propagating malware (SPM) campaigns have compromised hundreds of thousands of victim machines on the Internet. It is challenging to detect these attacks in their early stages, as adversaries utilize common network services, use novel techniques, and can evade existing detection mechanisms. We propose PORTFILER (PORT-Level Network Traffic ProFILER), a new machine learning system applied to network traffic for detecting SPM attacks. PORTFILER extracts port-level features from the Zeek connection logs collected at the border of a monitored network, applies anomaly detection techniques to identify suspicious events, and ranks the alerts across ports for investigation by the Security Operations Center (SOC). We propose a novel ensemble methodology for aggregating individual models in PORTFILER that increases resilience against several evasion strategies compared to standard ML baselines. We extensively evaluate PORTFILER on traffic collected from two university networks and show that it can detect SPM attacks with different patterns, such as WannaCry and Mirai, and that it performs well under evasion. Ranking across ports achieves precision over 0.94 and false positive rates below 8 × 10⁻⁴ in the top 100 highly ranked alerts. When deployed on the university networks, PORTFILER detected anomalous SPM-like activity on one of the campus networks, confirmed by the university SOC as malicious. PORTFILER also detected a Mirai attack recreated on the two university networks with higher precision and recall than deep learning based autoencoder methods.
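In the spirit of the port-level profiling described above, the following sketch aggregates connection-log records into per-port, per-window features, scores each window with an off-the-shelf anomaly detector (IsolationForest), and ranks alerts across ports; the feature set, the detector, and the toy data are assumptions and not the PORTFILER implementation.

```python
# Minimal sketch of port-level traffic profiling with per-port anomaly scoring
# and cross-port alert ranking. Illustrative data and features only.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# Toy stand-in for Zeek conn.log records: (time window, destination port, bytes, source id)
logs = pd.DataFrame({
    "window": rng.integers(0, 100, 20_000),
    "dst_port": rng.choice([22, 80, 443, 445], 20_000),
    "bytes": rng.exponential(500, 20_000),
    "src": rng.integers(0, 2_000, 20_000),
})

# Port-level features per time window: connection count, byte volume, distinct sources.
feats = logs.groupby(["dst_port", "window"]).agg(
    n_conn=("bytes", "size"), total_bytes=("bytes", "sum"), n_src=("src", "nunique")
).reset_index()

alerts = []
for port, grp in feats.groupby("dst_port"):
    cols = grp[["n_conn", "total_bytes", "n_src"]]
    model = IsolationForest(random_state=0).fit(cols)
    scores = -model.score_samples(cols)  # higher = more anomalous
    for w, s in zip(grp["window"], scores):
        alerts.append((s, port, w))

# Rank alerts across all ports for analyst triage.
for score, port, window in sorted(alerts, reverse=True)[:5]:
    print(f"port {port}, window {window}: anomaly score {score:.3f}")
```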
-
Manual surgical resection of soft tissue sarcoma can involve many challenges, including the critical need to precisely determine the boundary between tumor and normal tissue and the limitations of current surgical instrumentation, in addition to the standard risks of infection or impaired tissue healing. Substantial research has been conducted in the biomedical sensing landscape on the development of non-contact sensing devices. One such point-of-care platform, previously devised by our group, utilizes autofluorescence-based spectroscopic signatures to highlight important physiological differences between tumorous and healthy tissue. The following study builds on this work, implementing classification algorithms, including Artificial Neural Networks, Support Vector Machines, Logistic Regression, and K-Nearest Neighbors, to diagnose freshly resected murine tissue as sarcoma or healthy. Classification accuracies of over 93% are achieved with Logistic Regression, and Area Under the Curve scores over 94% are achieved with Support Vector Machines, delineating a clear way to automate photonic diagnosis of ambiguous tissue in assistance of surgeons. These interpretable algorithms can also be linked to important physiological diagnostic indicators, unlike the black-box ANN architecture. This is the first known study to use machine learning to interpret data from a non-contact autofluorescence sensing device on sarcoma tissue, and it has direct applications in rapid intraoperative sensing.
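The sketch below shows the kind of classifier comparison reported above: logistic regression and a support vector machine trained on spectral features to label tissue as sarcoma or healthy, with accuracy and ROC AUC reported. The synthetic spectra and class structure are purely illustrative stand-ins for the autofluorescence data.

```python
# Minimal sketch of a logistic regression vs. SVM comparison on spectral features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(4)
n, n_wavelengths = 400, 50
X = rng.normal(size=(n, n_wavelengths))   # stand-in spectral intensities
y = rng.integers(0, 2, n)                 # 0 = healthy, 1 = sarcoma
X[y == 1, :10] += 0.8                     # give the classes a separable signature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("SVM", SVC(probability=True))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name}: accuracy={accuracy_score(y_te, model.predict(X_te)):.3f}, "
          f"AUC={roc_auc_score(y_te, proba):.3f}")
```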
-
GPS spoofing attacks are a severe threat to unmanned aerial vehicles. These attacks manipulate the perceived state of the unmanned aerial vehicle, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. Most previous studies applied machine learning models without identifying the best hyperparameters, without using feature selection and feature importance techniques, and without ensuring that the dataset used is unbiased and balanced. Moreover, no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models; this paper therefore fills that gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six of the most commonly used machine learning techniques: Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used to perform this investigation. The evaluation was performed in terms of four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model's performance. The results also show that the Classification and Regression Decision Tree classifier achieves an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005% after removing correlated features and using tuned parameters on a balanced dataset. Random Forest achieves an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01% under similar conditions.
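The following sketch illustrates the evaluation steps discussed above: dropping highly correlated features, tuning hyperparameters on a balanced dataset, and reporting accuracy together with the probabilities of detection, misdetection, and false alarm. The synthetic data, the 0.9 correlation threshold, and the parameter grid are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: correlated-feature removal, hyperparameter tuning, and
# detection/misdetection/false-alarm metrics on synthetic GPS-like features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(2000, 13)), columns=[f"f{i}" for i in range(13)])
X["f12"] = X["f0"] * 0.99 + rng.normal(0, 0.01, 2000)  # deliberately correlated feature
y = (X["f1"] + X["f2"] > 0).astype(int)                # 1 = spoofed, 0 = legitimate

# Drop one feature from any pair with |correlation| > 0.9.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
X = X.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [100, 300], "max_depth": [None, 10]}, cv=3)
grid.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, grid.predict(X_te)).ravel()
print(f"accuracy        = {accuracy_score(y_te, grid.predict(X_te)):.4f}")
print(f"P(detection)    = {tp / (tp + fn):.4f}")
print(f"P(misdetection) = {fn / (tp + fn):.4f}")
print(f"P(false alarm)  = {fp / (fp + tn):.4f}")
```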

