Title: Feature Reduction Method Comparison Towards Explainability and Efficiency in Cybersecurity Intrusion Detection Systems
In the realm of cybersecurity, intrusion detection systems (IDS) detect and prevent attacks based on collected computer and network data. In recent research, IDS models have been constructed using machine learning (ML) and deep learning (DL) methods such as Random Forest (RF) and deep neural networks (DNN). Feature selection (FS) can be used to construct faster, more interpretable, and more accurate models. We look at three different FS techniques: RF information gain (RF-IG), correlation feature selection using the Bat Algorithm (CFS-BA), and CFS using the Aquila Optimizer (CFS-AO). Our results show CFS-BA to be the most efficient of the FS methods, building in 55% of the time of the best RF-IG model while achieving 99.99% of its accuracy. This reinforces prior contributions attesting to CFS-BA’s accuracy while building upon the relationship between subset size, CFS score, and RF-IG score in final results.
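As a rough illustration of the CFS score mentioned in the abstract, the sketch below computes the standard correlation-based feature selection merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-class correlation and r_ff is the mean feature-to-feature correlation within a subset of k features. It is not the paper's implementation: the use of absolute Pearson correlation, the function names `pearson` and `cfs_merit`, and the toy data are all assumptions made for illustration. The Bat Algorithm and Aquila Optimizer searches that would propose candidate subsets are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def cfs_merit(columns, target, subset):
    """CFS merit of the feature indices in `subset` (hypothetical helper).

    Assumes absolute Pearson correlation as the correlation measure.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-to-class correlation.
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-to-feature correlation over all pairs.
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Toy data (made up): feature 1 duplicates feature 0, feature 2 is
# equally predictive of the label but uncorrelated with feature 0.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 1],
]

for subset in ([0], [0, 1], [0, 2]):
    print(subset, round(cfs_merit(columns, target, subset), 4))
```

On this toy data the redundant pair [0, 1] scores no higher than feature 0 alone, while the complementary pair [0, 2] scores higher, which is the behavior that lets a CFS-driven search favor small, non-redundant subsets.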
Award ID(s):
2100729
NSF-PAR ID:
10436879
Journal Name:
21st IEEE ICMLA (International Conference on Machine Learning and Applications)
Page Range / eLocation ID:
1326-1333
Sponsoring Org:
National Science Foundation
More Like this
  1. Previous literature shows that deep learning is an effective tool for decoding motor intent from neural signals obtained from different parts of the nervous system. However, deep neural networks are often computationally complex and not feasible for real-time use. Here we investigate the advantages and disadvantages of different approaches to enhancing the efficiency of the deep learning-based motor decoding paradigm and informing its future real-time implementation. Our data are recorded from an amputee's residual peripheral nerves. While the primary analysis is offline, the nerve data is cut using a sliding window to create a “pseudo-online” dataset that resembles the conditions of a real-time paradigm. First, a comprehensive collection of feature extraction techniques is applied to reduce the input data dimensionality, which substantially lowers the motor decoder's complexity and makes it feasible to translate to a real-time paradigm. Next, we investigate two strategies for deploying deep learning models: a one-step (1S) approach for when large input data are available and a two-step (2S) approach for when input data are limited. This research predicts five individual finger movements and four combinations of the fingers. The 1S approach, which uses a recurrent neural network (RNN) to concurrently predict all fingers' trajectories, generally gives better prediction results than all the machine learning algorithms that perform the same task. This result reaffirms that deep learning is more advantageous than classic machine learning methods for handling a large dataset. However, when training on a smaller input data set in the 2S approach, which includes a classification stage to identify active fingers before predicting their trajectories, machine learning techniques offer a simpler implementation while ensuring decoding outcomes comparable to those of deep learning. In the classification step, both machine learning and deep learning models achieve an accuracy and F1 score of 0.99. Thanks to the classification step, in the regression step both types of models yield mean squared error (MSE) and variance accounted for (VAF) scores comparable to those of the 1S approach. Our study outlines the trade-offs to inform the future implementation of real-time, low-latency, and high-accuracy deep learning-based motor decoders for clinical applications.
  2. Deaf spaces are unique indoor environments designed to optimize visual communication and Deaf cultural expression. However, much of the technological research geared towards the deaf involves the use of video or wearables for American Sign Language (ASL) translation, with little consideration for Deaf perspectives on the privacy and usability of the technology. In contrast to video, RF sensors offer an avenue for ambient ASL recognition while also preserving privacy for Deaf signers. Methods: This paper investigates the RF transmit waveform parameters required for effective measurement of ASL signs and their effect on the word-level classification accuracy attained with transfer learning and convolutional autoencoders (CAE). A multi-frequency fusion network is proposed to exploit data from all sensors in an RF sensor network and improve the recognition accuracy of fluent ASL signing. Results: For fluent signers, CAEs yield a 20-sign classification accuracy of 76% at 77 GHz and 73% at 24 GHz, while at X-band (10 GHz) accuracy drops to 67%. For hearing imitation signers, signs are more separable, resulting in a 96% accuracy with CAEs. Further, fluent ASL recognition accuracy is significantly increased with use of the multi-frequency fusion network, which boosts the 20-sign fluent ASL recognition accuracy to 95%, surpassing conventional feature-level fusion by 12%. Implications: Signing involves finer spatiotemporal dynamics than typical hand gestures, and thus requires interrogation with a transmit waveform that has a rapid succession of pulses and high bandwidth. Millimeter-wave RF frequencies also yield greater accuracy due to the increased Doppler spread of the radar backscatter. Comparative analysis of articulation dynamics also shows that imitation signing is not representative of fluent signing, and is not effective in pre-training networks for fluent ASL classification. Deep neural networks employing multi-frequency fusion capture both shared and sensor-specific features and thus offer significant performance gains in comparison to using a single sensor or feature-level fusion.
  3. Traditional machine learning approaches for recognizing modes of transportation rely heavily on hand-crafted feature extraction methods, which require domain knowledge. We therefore propose a hybrid deep learning model, Deep Convolutional Bidirectional-LSTM (DCBL), which combines convolutional and bidirectional LSTM layers and is trained directly on raw sensor data to predict transportation modes. We compare our model to the traditional machine learning approaches of training Support Vector Machines and Multilayer Perceptron models on extracted features. In our experiments, DCBL performs better than the feature selection methods in terms of accuracy and simplifies the data processing pipeline. The models are trained on the Sussex-Huawei Locomotion-Transportation (SHL) dataset. The submission of our team, Vahan, to the SHL recognition challenge uses an ensemble of DCBL models trained on raw data using different combinations of sensors and window sizes, and achieved an F1-score of 0.96 on our test data.
  4. Limestone calcined clay cement (LC3) is a sustainable alternative to ordinary Portland cement, capable of reducing the binder’s carbon footprint by 40% while satisfying all key performance metrics. The inherent compositional heterogeneity in select components of LC3, combined with their convoluted chemical interactions, poses challenges to conventional analytical models when predicting mechanical properties. Although some studies have employed machine learning (ML) to predict the mechanical properties of LC3, many have overlooked the pivotal role of feature selection. Proper feature selection not only refines and simplifies the structure of ML models but also enhances these models’ prediction performance and interpretability. This research harnesses the power of the random forest (RF) model to predict the compressive strength of LC3. Three feature reduction methods—Pearson correlation, SHapley Additive exPlanations, and variable importance—are employed to analyze the influence of LC3 components and mixture design on compressive strength. Practical guidelines for utilizing these methods on cementitious materials are elucidated. Through the rigorous screening of insignificant variables from the database, the RF model conserves computational resources while also producing high-fidelity predictions. Additionally, a feature enhancement method is utilized, consolidating numerous input variables into a singular feature while feeding the RF model with richer information, resulting in a substantial improvement in prediction accuracy. Overall, this study provides a novel pathway to apply ML to LC3, emphasizing the need to tailor ML models to cement chemistry rather than employing them generically.
  5. A distributed denial-of-service (DDoS) attack is a malicious cybersecurity attack that has become a global threat. Machine learning (ML) has proven to be an effective defense against DDoS attacks. Feature selection is a crucial step in ML, and researchers have put considerable effort into mitigating the “Curse of Dimensionality”. However, feature selection can also cause problems for ML models, such as a decrease in prediction accuracy. Four supervised classification techniques, namely Decision Tree (DT), k-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF), are tested using mutual information score ranking to study the necessity of feature selection in DDoS detection.