skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Shining New Light on Useful Features for Network Intrusion Detection Algorithms
Network intrusion detection systems (NIDS) today must quickly provide visibility into anomalous behavior on a growing amount of data. Meanwhile different data models have evolved over time, each providing a different set of features to classify attacks. Defenders have limited time to retrain classifiers, while the scale of data and feature mismatch between data models can affect the ability to periodically retrain. Much work has focused on classification accuracy yet feature selection is a key part of machine learning that, when optimized, reduces the training time and can increase accuracy by removing poorly performing features that introduce noise. With a larger feature space, the pursuit of more features is not as valuable as selecting better features. In this paper, we use an ensemble approach of filter methods to rank features followed by a voting technique to select a subset of features. We evaluate our approach using three datasets to show that, across datasets and network topologies, similar features have a trivial effect on classifier accuracy after removal. Our approach identifies poorly performing features to remove in a classifier-agnostic manner that can significantly save time for periodic retraining of production NIDS.  more » « less
Award ID(s):
2046705 2001789 2115134 1920462
PAR ID:
10322197
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2022 IEEE 19th Annual Consumer Communications Networking Conference (CCNC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ayahiko Niimi, Future University-Hakodate (Ed.)
    Traditional Network Intrusion Detection Systems (NIDS) encounter difficulties due to the exponential growth of network traffic data and modern attacks' requirements. This paper presents a novel network intrusion classification framework using transfer learning from the VGG-16 pre-trained model. The framework extracts feature leveraging pre-trained weights trained on the ImageNet dataset in the initial step, and finally, applies a deep neural network to the extracted features for intrusion classification. We applied the presented framework on NSL-KDD, a benchmark dataset for network intrusion, to evaluate the proposed framework's performance. We also implemented other pre-trained models such as VGG19, MobileNet, ResNet-50, and Inception V3 to evaluate and compare performance. This paper also displays both binary classification (normal vs. attack) and multi-class classification (classifying types of attacks) for network intrusion detection. The experimental results show that feature extraction using VGG-16 outperforms other pre-trained models producing better accuracy, precision, recall, and false alarm rates. 
    more » « less
  2. Cappellato, Linda; Eickhoff, Carsten; Ferro, Nicola; Névéol, Aurélie (Ed.)
    This paper describes the approach we took to create a machine learning model for the PAN 2020 Authorship Verification Task. For each document pair, we extracted stylometric features from the documents and used the absolute difference between the feature vectors as input to our classifier. We created two models: a Logistic Regression Model trained on a small dataset, and a Neural Network based model trained on the large dataset. These models achieved AUCs of 0.939 and 0.953 on the small and large datasets, making them the second-best models on both datasets submitted to the shared task. 
    more » « less
  3. Abstract Photospheric magnetic field parameters are frequently used to analyze and predict solar events. Observation of these parameters over time, i.e., representing solar events by multivariate time-series (MVTS) data, can determine relationships between magnetic field states in active regions and extreme solar events, e.g., solar flares. We can improve our understanding of these events by selecting the most relevant parameters that give the highest predictive performance. In this study, we propose a two-step incremental feature selection method for MVTS data using a deep-learning model based on long short-term memory (LSTM) networks. First, each MVTS feature (magnetic field parameter) is evaluated individually by a univariate sequence classifier utilizing an LSTM network. Then, the top performing features are combined to produce input for an LSTM-based multivariate sequence classifier. Finally, we tested the discrimination ability of the selected features by training downstream classifiers, e.g., Minimally Random Convolutional Kernel Transform and support vector machine. We performed our experiments using a benchmark data set for flare prediction known as Space Weather Analytics for Solar Flares. We compared our proposed method with three other baseline feature selection methods and demonstrated that our method selects more discriminatory features compared to other methods. Due to the imbalanced nature of the data, primarily caused by the rarity of minority flare classes (e.g., the X and M classes), we used the true skill statistic as the evaluation metric. Finally, we reported the set of photospheric magnetic field parameters that give the highest discrimination performance in predicting flare classes. 
    more » « less
  4. Quantitative analysis of brain disorders such as Autism Spectrum Disorder (ASD) is an ongoing field of research. Machine learning and deep learning techniques have been playing an important role in automating the diagnosis of brain disorders by extracting discriminative features from the brain data. In this study, we propose a model called Auto-ASD-Network in order to classify subjects with Autism disorder from healthy subjects using only fMRI data. Our model consists of a multilayer perceptron (MLP) with two hidden layers. We use an algorithm called SMOTE for performing data augmentation in order to generate artificial data and avoid overfitting, which helps increase the classification accuracy. We further investigate the discriminative power of features extracted using MLP by feeding them to an SVM classifier. In order to optimize the hyperparameters of SVM, we use a technique called Auto Tune Models (ATM) which searches over the hyperparameter space to find the best values of SVM hyperparameters. Our model achieves more than 70% classification accuracy for 4 fMRI datasets with the highest accuracy of 80%. It improves the performance of SVM by 26%, the stand-alone MLP by 16% and the state of the art method in ASD classification by 14%. The implemented code will be available as GPL license on GitHub portal of our lab (https://github.com/PCDS). 
    more » « less
  5. Exploring features associated with the clinical outcome of interest is a rapidly advancing area of research. However, with contemporary sequencing technologies capable of identifying over thousands of genes per sample, there is a challenge in constructing efficient prediction models that balance accuracy and resource utilization. To address this challenge, researchers have developed feature selection methods to enhance performance, reduce overfitting, and ensure resource efficiency. However, applying feature selection models to survival analysis, particularly in clinical datasets characterized by substantial censoring and limited sample sizes, introduces unique challenges. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features and evaluate its performance in predicting survival outcomes. Our approach consistently outperforms established models across various criteria through extensive simulations, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score based on the selected genes to correctly distinguish different subtypes of the patients. In summary, our proposed approach excels in selecting impactful features from high-dimensional data, yielding better outcomes compared to contemporary state-of-the-art models. 
    more » « less