Title: Network Intrusion Detection and Machine Learning
This document analyzes how machine learning principles can be combined with computer network analysis to detect network anomalies such as attacks. It highlights the key elements of network analysis and provides an overview of common network analysis techniques. Specifically, it examines a study conducted by the University of Luxembourg and attempts to recreate that study with a slightly different set of parameters against a different NetFlow dataset for network anomaly detection. Alongside network analysis, the document investigates common machine learning techniques and implements a support vector machine algorithm to detect anomalies and intrusions within the network. MATLAB was used as the machine learning tool for coordinating network analysis data with support vector machines. The resulting graphs represent tests conducted with support vector machines in a manner similar to the University of Luxembourg study; the difference lies in the metrics used for anomaly detection. The University of Luxembourg used the IP addresses and traffic volume of a specific NetFlow dataset, whereas the tests here use metrics based on the duration of transmitted bytes and the ratio of incoming to outgoing bytes during transmission. The algorithm and its defined metrics proved less effective than planned against the NetFlow dataset, and the tests did not yield a clear classification of an anomaly. However, many other factors contributing to network anomalies were highlighted.
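For readers unfamiliar with this kind of workflow, the sketch below shows how the two metrics described above could be derived from a NetFlow dataset and fed to a support vector machine. It is a minimal Python/scikit-learn sketch rather than the MATLAB implementation used in the study, and the column names ("duration", "in_bytes", "out_bytes", "label") and file name are hypothetical placeholders for fields of a labeled NetFlow export.

```python
# Minimal sketch, assuming a labeled NetFlow export in CSV form; not the
# study's MATLAB code. Column and file names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

flows = pd.read_csv("netflow_sample.csv")  # assumed file name

# Metric 1: bytes transferred per unit of flow duration.
flows["bytes_per_sec"] = (flows["in_bytes"] + flows["out_bytes"]) / flows["duration"].clip(lower=1e-3)
# Metric 2: ratio of incoming to outgoing bytes for the flow.
flows["in_out_ratio"] = flows["in_bytes"] / flows["out_bytes"].clip(lower=1)

X = flows[["bytes_per_sec", "in_out_ratio"]].to_numpy()
y = flows["label"].to_numpy()  # assumed labels: 1 = anomalous flow, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Scale the features, then fit an RBF-kernel SVM classifier.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print(classification_report(y_test, clf.predict(scaler.transform(X_test))))
```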
Award ID(s):
1754054
PAR ID:
10284704
Author(s) / Creator(s):
Date Published:
Journal Name:
ADMI 2021: The Symposium of Computing at Minority Institutions
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Recent self-propagating malware (SPM) campaigns compromised hundreds of thousands of victim machines on the Internet. It is challenging to detect these attacks in their early stages, as adversaries utilize common network services, use novel techniques, and can evade existing detection mechanisms. We propose PORTFILER (PORT-Level Network Traffic ProFILER), a new machine learning system applied to network traffic for detecting SPM attacks. PORTFILER extracts port-level features from the Zeek connection logs collected at a border of a monitored network, applies anomaly detection techniques to identify suspicious events, and ranks the alerts across ports for investigation by the Security Operations Center (SOC). We propose a novel ensemble methodology for aggregating individual models in PORTFILER that increases resilience against several evasion strategies compared to standard ML baselines. We extensively evaluate PORTFILER on traffic collected from two university networks, and show that it can detect SPM attacks with different patterns, such as WannaCry and Mirai, and performs well under evasion. Ranking across ports achieves precision over 0.94 and false positive rates below 8 × 10⁻⁴ in the top 100 highly ranked alerts. When deployed on the university networks, PORTFILER detected anomalous SPM-like activity on one of the campus networks, confirmed by the university SOC as malicious. PORTFILER also detected a Mirai attack recreated on the two university networks with higher precision and recall than deep-learning-based autoencoder methods.
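As a rough illustration of this kind of pipeline (not the PORTFILER implementation itself), the sketch below aggregates Zeek-style connection records into per-port, per-window features, scores them with an off-the-shelf anomaly detector, and ranks the top alerts. The log path and the choice of IsolationForest are assumptions; the field names mirror common Zeek conn.log columns.

```python
# Illustrative sketch only, not the PORTFILER implementation: build per-port,
# per-window features from Zeek-style connection records and rank the most
# anomalous (port, window) pairs for SOC triage.
import pandas as pd
from sklearn.ensemble import IsolationForest

conn = pd.read_json("conn.log.json", lines=True)   # assumed Zeek conn.log JSON export
conn["window"] = pd.to_datetime(conn["ts"], unit="s").dt.floor("1min")

# Port-level features per time window: connection count, distinct destination
# hosts, and total bytes sent by originating hosts.
feats = conn.groupby(["id.resp_p", "window"]).agg(
    n_conns=("uid", "count"),
    n_dests=("id.resp_h", "nunique"),
    orig_bytes=("orig_bytes", "sum"),
).reset_index()

cols = ["n_conns", "n_dests", "orig_bytes"]
detector = IsolationForest(n_estimators=100, random_state=0).fit(feats[cols])
feats["anomaly_score"] = -detector.score_samples(feats[cols])  # higher = more anomalous

# Rank alerts across ports and hand the top 100 to the SOC for investigation.
print(feats.sort_values("anomaly_score", ascending=False).head(100))
```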
  2. We describe the outcome of a data challenge conducted as part of the Dark Machines (https://www.darkmachines.org) initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims to detect signals of new physics at the Large Hadron Collider (LHC) using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to 10 fb⁻¹ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.
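A toy illustration of the anomaly-score idea, under stated assumptions: a density estimate is fit to simulated background events, and events with unusually low density are assigned to a model-independent signal region. The Gaussian toy data, the kernel density estimator, and the percentile threshold below are illustrative only and not the challenge's actual setup.

```python
# Toy anomaly-score sketch: score events by how unlikely they are under a
# density estimate fit to background-only (Standard Model) simulation.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
background = rng.normal(size=(5000, 4))           # stand-in for SM-only event features
candidates = rng.normal(1.5, 1.0, size=(200, 4))  # stand-in for events to be scored

kde = KernelDensity(bandwidth=0.5).fit(background)
score = -kde.score_samples(candidates)            # low density -> high anomaly score

# Define the "signal region" as scores above the 99.9th percentile of background.
threshold = np.quantile(-kde.score_samples(background), 0.999)
selected = score > threshold
print(f"{selected.sum()} of {len(candidates)} events fall in the anomaly signal region")
```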
  3. Although the connectivity offered by the industrial internet of things (IIoT) enables enhanced operational capabilities, the exposure of systems to significant cybersecurity risks poses critical challenges. Recently, machine learning (ML) algorithms such as feature-based support vector machines and logistic regression, together with end-to-end deep neural networks, have been implemented to detect intrusions, including command injection, denial of service, reconnaissance, and backdoor attacks, by capturing anomalous patterns. However, ML algorithms not only fall short in agile identification of intrusions from few samples, but also fail to adapt to new data or environments. This paper introduces hyperdimensional computing (HDC) as a new cognitive computing paradigm that mimics brain functionality to detect intrusions in IIoT systems. HDC encodes real-time data into a high-dimensional representation, allowing for ultra-efficient learning and analysis with limited samples and a few passes. Additionally, we incorporate the concept of regenerating brain cells into hyperdimensional computing to further improve learning capability and reduce the required memory. Experimental results on the WUSTL-IIOT-2021 dataset show that HDC detects intrusions with an accuracy of 92.6%, which is superior to a multi-layer perceptron (40.2%), support vector machine (72.9%), logistic regression (84.2%), and Gaussian process classification (89.1%), while requiring only 300 data samples and 5 iterations for training.
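The sketch below illustrates the basic HDC encode-bundle-compare pattern the abstract refers to, not the paper's method or its cell-regeneration mechanism: features are projected into high-dimensional bipolar vectors, bundled per class, and classified by similarity. The dimensionality, the random projection encoder, and the synthetic data are assumptions.

```python
# Minimal hyperdimensional-computing classifier sketch on synthetic data.
import numpy as np

D = 10_000                      # hypervector dimensionality (assumed)
rng = np.random.default_rng(0)

def encode(x, proj):
    """Project a real-valued feature vector into a bipolar hypervector."""
    return np.sign(proj @ x)

def train(X, y, proj):
    """Bundle (sum and binarize) the encoded samples of each class."""
    return {c: np.sign(sum(encode(x, proj) for x in X[y == c])) for c in np.unique(y)}

def classify(x, memory, proj):
    """Return the class whose hypervector is most similar to the query."""
    h = encode(x, proj)
    return max(memory, key=lambda c: np.dot(memory[c], h))

n_features = 8                                   # e.g. per-flow statistics (assumed)
proj = rng.standard_normal((D, n_features))      # random projection matrix

# Synthetic stand-in for labeled IIoT traffic: 1 = intrusion, 0 = benign.
X = rng.standard_normal((200, n_features))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

memory = train(X, y, proj)
preds = np.array([classify(x, memory, proj) for x in X])
print("training accuracy:", (preds == y).mean())
```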
  4. Meyendorf, Norbert G.; Farhangdoust, Saman (Ed.)
    Network intrusion detection systems (NIDS) for Internet-of-Things (IoT) infrastructure are among the most critical tools to ensure the protection and security of networks against malicious cyberattacks. This paper employs four machine learning algorithms and evaluates their performance in NIDS considering the accuracy, precision, recall, and F-score. The comparative analysis conducted using the CICIDS2017 dataset reveals that boosted machine learning techniques perform better than the other algorithms, reaching a prediction accuracy above 99% in detecting cyberattacks. These ML-based attack detectors also achieve the highest weighted F1-score, precision, and recall. The results assist network engineers in choosing the most effective machine learning-based NIDS to ensure network security for today's growing IoT network traffic.
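A hedged sketch of such an evaluation workflow is shown below: several classifiers are trained on labeled flow features and compared on accuracy, precision, recall, and weighted F1. The "cicids2017.csv" path and the "Label"/"BENIGN" column conventions are assumptions about how CICIDS2017 has been exported, and the model list is illustrative rather than the paper's exact set of four algorithms.

```python
# Sketch of a NIDS model comparison on labeled flow data (assumed export format).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

df = pd.read_csv("cicids2017.csv")
y = (df["Label"] != "BENIGN").astype(int)   # 1 = attack, 0 = benign (assumed labels)
X = (df.drop(columns=["Label"]).select_dtypes("number")
       .replace([np.inf, -np.inf], np.nan).fillna(0))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "boosted trees": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    p, r, f1, _ = precision_recall_fscore_support(y_te, pred, average="weighted")
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.4f} "
          f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```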
  5. GPS spoofing attacks are a severe threat to unmanned aerial vehicles. These attacks manipulate the true state of the unmanned aerial vehicles, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. Most previous studies applied machine learning models without identifying the best hyperparameters, applying feature selection and importance techniques, or ensuring that the dataset used is unbiased and balanced. Moreover, no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models; therefore, this paper fills this gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six of the most commonly used machine learning techniques: Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used to perform this investigation. The evaluation was performed in terms of four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model's performance. The results also show that the Classification and Regression Decision Tree classifier achieves an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005% after removing correlated features and using tuned parameters on a balanced dataset. Random Forest can achieve an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01% under similar conditions.
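For clarity, the snippet below computes the four reported metrics from a binary confusion matrix under the usual definitions (these definitions are assumed, not quoted from the paper), with spoofed signals as the positive class and purely illustrative toy labels.

```python
# Assumed metric definitions computed from a binary confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # 1 = spoofed, 0 = authentic (toy labels)
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])   # toy classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy          = (tp + tn) / (tp + tn + fp + fn)
prob_detection    = tp / (tp + fn)    # spoofed signals correctly flagged
prob_misdetection = fn / (tp + fn)    # spoofed signals missed
prob_false_alarm  = fp / (fp + tn)    # authentic signals wrongly flagged
print(accuracy, prob_detection, prob_misdetection, prob_false_alarm)
```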