skip to main content


Title: Denial of Service (DoS) Attack Detection: Performance Comparison of Supervised Machine Learning Algorithms
Denial of Service (DoS) is one of the common attempts in security hacking for making computation resources unavailable or to impair geographical networks. In this paper, we detect Denial of Service (DoS) attack from publicly available datasets using Logistic regression, Naive Bayes algorithm and artificial neural networks. The results from our experiments indicate that the accuracy, ROC curve and balanced accuracy of artificial neural network were higher than Naive Bayes algorithm and logistic regression for slightly imbalanced distribution dataset.  more » « less
Award ID(s):
1723578
NSF-PAR ID:
10273177
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proc. of 18th IEEE International Conference on Dependable, Autonomic and Secure Computing
Page Range / eLocation ID:
469 to 474
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Denial of Service (DoS) is one of the common attempts in security hacking for making computation resources unavailable or to impair geographical networks. In this paper, we detect Denial of Service (DoS) attack from publicly available datasets using Logistic regression, Naive Bayes algorithm and artificial neural networks. The results from our experiments indicate that the accuracy, ROC curve and balanced accuracy of artificial neural network were higher than Naive Bayes algorithm and logistic regression for slightly imbalanced distribution dataset. 
    more » « less
  2. Although much of the work in behaviorally detecting malware lies in collecting the best explanatory data and using the most efficacious machine learning models, the processing of the data can sometimes prove to be the most important step in the data pipeline. In this work, we collect kernel-level system calls on a resource-constrained Internet of Things (IoT) device, apply lightweight Natural Language Processing (NLP) techniques to the data, and feed this processed data to two simple machine learning classification models: Logistic Regression (LR) and a Neural Network (NN). For the data processing, we group the system calls into n-grams that are sorted by the timestamp in which they are recorded. To demonstrate the effectiveness, or lack thereof, of using n-grams, we deploy two types of malware onto the IoT device: a Denial-of-Service (DoS) attack, and an Advanced Persistent Threat (APT) malware. We examine the effects of using lightweight NLP on malware like the DoS and the stealthy APT malware. For stealthier malware, such as the APT, using more advanced, but far more resource-intensive, NLP techniques will likely increase detection capability, which is saved for future work. 
    more » « less
  3. Abstract

    Previous work using logistic regression suggests that cognitive control‐related frontoparietal activation in early psychosis can predict symptomatic improvement after 1 year of coordinated specialty care with 66% accuracy. Here, we evaluated the ability of six machine learning (ML) algorithms and deep learning (DL) to predict “Improver” status (>20% improvement on Brief Psychiatric Rating Scale [BPRS] total score at 1‐year follow‐up vs. baseline) and continuous change in BPRS score using the same functional magnetic resonance imaging‐based features (frontoparietal activations during the AX‐continuous performance task) in the same sample (individuals with either schizophrenia (n =65, 49M/16F, mean age 20.8 years) or Type I bipolar disorder (n= 17, 9M/8F, mean age 21.6 years)). 138 healthy controls were included as a reference group. “Shallow” ML methods included Naive Bayes, support vector machine, K Star, AdaBoost, J48 decision tree, and random forest. DL included an explainable artificial intelligence (XAI) procedure for understanding results. The best overall performances (70% accuracy for the binary outcome and root mean square error = 9.47 for the continuous outcome) were achieved using DL. XAI revealed left DLPFC activation was the strongest feature used to make binary classification decisions, with a classification activation threshold (adjusted beta = .017) intermediate to the healthy control mean (adjusted beta = .15, 95% CI = −0.02 to 0.31) and patient mean (adjusted beta = −.13, 95% CI = −0.37 to 0.11). Our results suggest DL is more powerful than shallow ML methods for predicting symptomatic improvement. The left DLPFC may be a functional target for future biomarker development as its activation was particularly important for predicting improvement.

     
    more » « less
  4. GPS spoofing attacks are a severe threat to unmanned aerial vehicles. These attacks manipulate the true state of the unmanned aerial vehicles, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. Most of the studies applied machine learning models without identifying the best hyperparameters, using feature selection and importance techniques, and ensuring that the used dataset is unbiased and balanced. However, no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models; therefore, this paper fills this gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six most commonly known machine learning techniques. These models are Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used to perform this investigation. The evaluation was performed in terms of four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model’s performance. The results also show that the Classification and Regression Decision Tree classifier has an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005%, after removing correlated features and using tuned parameters in a balanced dataset. Random Forest can achieve an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01% in similar conditions. 
    more » « less
  5. GPS spoofing attacks are a severe threat to unmanned aerial vehicles. These attacks manipulate the true state of the unmanned aerial vehicles, potentially misleading the system without raising alarms. Several techniques, including machine learning, have been proposed to detect these attacks. Most of the studies applied machine learning models without identifying the best hyperparameters, using feature selection and importance techniques, and ensuring that the used dataset is unbiased and balanced. However, no current studies have discussed the impact of model parameters and dataset characteristics on the performance of machine learning models; therefore, this paper fills this gap by evaluating the impact of hyperparameters, regularization parameters, dataset size, correlated features, and imbalanced datasets on the performance of six most commonly known machine learning techniques. These models are Classification and Regression Decision Tree, Artificial Neural Network, Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine. Thirteen features extracted from legitimate and simulated GPS attack signals are used to perform this investigation. The evaluation was performed in terms of four metrics: accuracy, probability of misdetection, probability of false alarm, and probability of detection. The results indicate that hyperparameters, regularization parameters, correlated features, dataset size, and imbalanced datasets adversely affect a machine learning model’s performance. The results also show that the Classification and Regression Decision Tree classifier has an accuracy of 99.99%, a probability of detection of 99.98%, a probability of misdetection of 0.2%, and a probability of false alarm of 1.005%, after removing correlated features and using tuned parameters in a balanced dataset. Random Forest can achieve an accuracy of 99.94%, a probability of detection of 99.6%, a probability of misdetection of 0.4%, and a probability of false alarm of 1.01% in similar conditions. 
    more » « less