Network intrusion detection systems (NIDS) today must quickly provide visibility into anomalous behavior on a growing volume of data. Meanwhile, different data models have evolved over time, each providing a different set of features for classifying attacks. Defenders have limited time to retrain classifiers, and both the scale of the data and feature mismatches between data models can hinder periodic retraining. Much work has focused on classification accuracy, yet feature selection is a key part of machine learning that, when optimized, reduces training time and can increase accuracy by removing poorly performing features that introduce noise. With a larger feature space, pursuing more features is less valuable than selecting better ones. In this paper, we use an ensemble of filter methods to rank features, followed by a voting technique to select a subset of them. We evaluate our approach on three datasets and show that, across datasets and network topologies, it identifies similar features whose removal has only a trivial effect on classifier accuracy. Our approach identifies poorly performing features to remove in a classifier-agnostic manner and can significantly reduce the time needed for periodic retraining of production NIDS.
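A minimal sketch of the general idea: several filter methods each score and rank the features independently of any classifier, and a vote over their top-k lists selects the subset. The specific filters, the value of k, and the majority-vote threshold below are illustrative assumptions, not the paper's configuration.

```python
# Ensemble of filter methods + voting for feature selection (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, f_classif

def rank_features(scores):
    """Return feature indices ordered from best to worst score."""
    return np.argsort(scores)[::-1]

def ensemble_select(X, y, k=10):
    # Each filter scores every feature without training a classifier.
    filters = {
        "mutual_info": mutual_info_classif(X, y, random_state=0),
        "anova_f": f_classif(X, y)[0],
        "abs_corr": np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]),
    }
    # Vote: keep a feature if it lands in the top-k of a majority of filters.
    votes = np.zeros(X.shape[1], dtype=int)
    for scores in filters.values():
        votes[rank_features(scores)[:k]] += 1
    return np.where(votes >= (len(filters) // 2 + 1))[0]

X, y = make_classification(n_samples=500, n_features=40, n_informative=8, random_state=0)
print("selected features:", ensemble_select(X, y, k=10))
```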
Feature-based data assimilation in geophysics
Abstract. Many applications in science require that computational models and data be combined. In a Bayesian framework, this is usually done by defining likelihoods based on the mismatch of model outputs and data. However, matching model outputs and data in this way can be unnecessary or impossible. For example, using large amounts of steady state data is unnecessary because these data are redundant. It is numerically difficult to assimilate data in chaotic systems. It is often impossible to assimilate data of a complex system into a low-dimensional model. As a specific example, consider a low-dimensional stochastic model for the dipole of the Earth's magnetic field, while other field components are ignored in the model. The above issues can be addressed by selecting features of the data, and defining likelihoods based on the features, rather than by the usual mismatch of model output and data. Our goal is to contribute to a fundamental understanding of such a feature-based approach that allows us to assimilate selected aspects of data into models. We also explain how the feature-based approach can be interpreted as a method for reducing an effective dimension and derive new noise models, based on perturbed observations, that lead to computationally efficient solutions. Numerical implementations of our ideas are illustrated in four examples.
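As a rough illustration of the idea, the sketch below reduces both a model run and the data to a low-dimensional feature (a coarse log power spectrum) and defines a Gaussian likelihood on the feature mismatch rather than on raw model output versus data. The OU-type toy model, the choice of spectral feature, and the noise level sigma are illustrative assumptions, not the paper's specific construction.

```python
# Feature-based likelihood: compare features of model output and data, not raw outputs.
import numpy as np

def simulate_ou(theta, n=4096, dt=0.01, seed=0):
    """Simulate a scalar Ornstein-Uhlenbeck process dx = -theta*x dt + dW."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = x[i - 1] - theta * x[i - 1] * dt + np.sqrt(dt) * rng.standard_normal()
    return x

def spectral_feature(x, n_bins=8):
    """Coarse log power spectrum: mean periodogram power in a few frequency bins."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    bins = np.array_split(power[1:], n_bins)
    return np.log(np.array([b.mean() for b in bins]))

def feature_log_likelihood(theta, data_feature, sigma=0.2):
    resid = spectral_feature(simulate_ou(theta)) - data_feature
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

# Synthetic "data" generated with theta = 1.0; evaluate the likelihood on a grid.
data_feature = spectral_feature(simulate_ou(1.0, seed=42))
for theta in (0.5, 1.0, 2.0):
    print(theta, feature_log_likelihood(theta, data_feature))
```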
- Award ID(s): 1740858
- PAR ID: 10111644
- Date Published:
- Journal Name: Nonlinear Processes in Geophysics
- Volume: 25
- Issue: 2
- ISSN: 1607-7946
- Page Range / eLocation ID: 355 to 374
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We consider a stochastic differential equation model for Earth's axial magnetic dipole field. The model's parameters are estimated using diverse and independent data sources that had previously been treated separately. The result is a numerical model that is informed by the full paleomagnetic record on kyr to Myr time scales and whose outputs match data of Earth's dipole in a precisely defined feature-based sense. Specifically, we compute model parameters and associated uncertainties that lead to model outputs matching spectral data of Earth's axial magnetic dipole field, but our approach also reveals difficulties with simultaneously matching spectral data and reversal rates. This could be due to model deficiencies or to inaccuracies in the limited amount of data. More generally, the approach we describe can be seen as an example of an effective strategy for combining diverse data sets, one that is particularly useful when the amount of data is limited.
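The sketch below illustrates the flavor of scoring a dipole-like toy model against two feature types at once: a coarse spectrum and a reversal rate (sign changes per unit time). The bistable SDE, the feature definitions, the misfit weights, and the target values are assumptions made for illustration; they are not the paper's model or its paleomagnetic data.

```python
# Combined spectral + reversal-rate misfit for a toy dipole-like SDE.
import numpy as np

def simulate_dipole(sigma, n=20000, dt=0.01, seed=1):
    """Toy bistable SDE dx = x(1 - x^2) dt + sigma dW; sign flips mimic reversals."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = 1.0
    for i in range(1, n):
        drift = x[i - 1] * (1.0 - x[i - 1] ** 2)
        x[i] = x[i - 1] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

def features(x, dt=0.01, n_bins=6):
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    spec = np.log([b.mean() for b in np.array_split(power[1:], n_bins)])
    reversal_rate = np.count_nonzero(np.diff(np.sign(x)) != 0) / (len(x) * dt)
    return spec, reversal_rate

target_spec, target_rate = features(simulate_dipole(0.8, seed=7))  # stand-in "data"

for sigma in (0.5, 0.8, 1.2):
    spec, rate = features(simulate_dipole(sigma))
    misfit = np.sum((spec - target_spec) ** 2) + 5.0 * (rate - target_rate) ** 2
    print(f"sigma={sigma}: spectral + reversal misfit = {misfit:.3f}")
```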
The sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great success in many real-world applications. However, for large-scale problems involving a huge number of samples and extremely high-dimensional features, solving sparse SVMs remains challenging. Noting that sparse SVMs induce sparsity in both the feature and sample spaces, we propose a novel approach, based on accurate estimates of the primal and dual optima of sparse SVMs, to simultaneously identify the features and samples that are guaranteed to be irrelevant to the outputs. We can thus remove the identified inactive samples and features from the training phase, leading to substantial savings in both memory usage and computational cost without sacrificing accuracy. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs. Experiments on both synthetic and real datasets (e.g., the kddb dataset with about 20 million samples and 30 million features) demonstrate that our approach significantly outperforms state-of-the-art methods and that the resulting speedup can reach orders of magnitude.
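The screening idea rests on the two kinds of sparsity a sparse SVM induces. The sketch below only makes those sparsities visible after an ordinary L1-penalized fit (zero-weight features and margin-active samples); it is not the paper's static screening rule, whose point is to discard features and samples before training. The dataset, penalty strength, and thresholds are illustrative assumptions.

```python
# Inspecting feature- and sample-space sparsity of an L1-penalized linear SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)

clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.05, max_iter=10000)
clf.fit(X, y)

# Feature-space sparsity: weights driven exactly to zero are irrelevant features.
active_features = np.flatnonzero(np.abs(clf.coef_.ravel()) > 1e-8)

# Sample-space sparsity: only samples on or inside the margin contribute to the loss.
y_pm = 2 * y - 1                                   # map {0,1} labels to {-1,+1}
margins = y_pm * (X @ clf.coef_.ravel() + clf.intercept_[0])
active_samples = np.flatnonzero(margins < 1.0)

print("nonzero features:", active_features.size, "of", X.shape[1])
print("margin-active samples:", active_samples.size, "of", X.shape[0])
```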
Embedding is widely used in recommendation models to learn feature representations. However, the traditional embedding technique, which assigns a fixed size to all categorical features, may be suboptimal for the following reasons. In the recommendation domain, the embeddings of most categorical features can be trained with less capacity without impacting model performance, so storing embeddings of equal length may incur unnecessary memory usage. Existing work that tries to allocate customized sizes for each feature usually either simply scales the embedding size with the feature's popularity or formulates the size allocation problem as an architecture selection problem. Unfortunately, most of these methods either suffer a large performance drop or incur significant extra time cost to search for proper embedding sizes. In this article, instead of formulating the size allocation problem as an architecture selection problem, we approach it from a pruning perspective and propose the Pruning-based Multi-size Embedding (PME) framework. During the search phase, we prune the dimensions that have the least impact on model performance in each embedding to reduce its capacity. We then show that the customized size of each token can be obtained by transferring the capacity of its pruned embedding, with significantly less search cost. Experimental results validate that PME can efficiently find proper sizes and hence achieve strong performance while significantly reducing the number of parameters in the embedding layer.
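As a hedged illustration of the pruning view, the sketch below derives a per-token embedding size by zeroing out low-magnitude dimensions and counting what survives. PME instead prunes by impact on model performance during a search phase, so the magnitude criterion, the threshold, and the random embedding table here are illustrative assumptions only.

```python
# Per-token embedding sizes via magnitude pruning (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 6, 8
# Toy embedding table whose later dimensions carry progressively less signal.
emb = rng.normal(scale=[2.0, 1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01], size=(vocab, dim))

threshold = 0.3
mask = np.abs(emb) > threshold          # dimensions worth keeping, per token
pruned = emb * mask                     # pruned table: small dimensions zeroed out
sizes = mask.sum(axis=1)                # customized embedding size per token

print("per-token sizes:", sizes)
print("parameters kept: %d of %d" % (mask.sum(), emb.size))
```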
Abstract: Deep Learning (DL) has brought significant changes to a large number of research areas in recent decades. For example, researchers have built several astonishing Convolutional Neural Network (CNN) models that successfully fulfill image classification needs using large-scale visual datasets. Transfer Learning (TL) makes use of those pre-trained models to ease the feature learning process for other target domains that contain a smaller amount of training data. Currently, there are numerous ways to utilize features generated by transfer learning: pre-trained CNN models provide mid-/high-level features that can serve different target problem domains. In this paper, a DL feature and model selection framework based on evolutionary programming is proposed to solve the challenges in visual data classification. It automates the process of discovering and obtaining the most representative features generated by the pre-trained DL models for different classification tasks.
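A minimal sketch of an evolutionary feature-selection loop of this general flavor: candidates are binary masks over feature columns (here synthetic stand-ins for pre-extracted DL features), fitness is cross-validated accuracy of a simple classifier on the masked features, and each generation keeps the fittest masks and mutates them. The population size, mutation rate, and logistic-regression fitness model are assumptions, not the proposed framework's actual operators.

```python
# Evolutionary search over binary feature masks (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=64, n_informative=10, random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated accuracy of a simple classifier on the selected columns."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((12, X.shape[1])) < 0.5            # initial random masks
for generation in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-4:]]          # keep the 4 fittest masks
    children = parents[rng.integers(0, 4, size=8)].copy()
    flips = rng.random(children.shape) < 0.05       # mutate by flipping a few bits
    children ^= flips
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected %d of %d features" % (best.sum(), best.size))
```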