Title: Optimal Feature Selection for Decision Robustness in Bayesian Networks
In many applications, one can define a large set of features to support the classification task at hand. At test time, however, these become prohibitively expensive to evaluate, and only a small subset of features is used, often selected for their information-theoretic value. For threshold-based Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability that it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes, as well as more general networks, show the efficacy and distinct behavior of this decision-making approach.
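To make the expected same-decision probability concrete, the sketch below computes it by brute-force enumeration for a small Naive Bayes classifier with a binary class and binary features. This is an illustration only, not the circuit-based algorithm the abstract describes: the function names (`posterior`, `evidence_probability`, `expected_sdp`, `best_subset`), the parameter layout, and the example numbers are all assumptions made here, and the enumeration is exponential in the number of features.

```python
from itertools import combinations, product


def _class_weights(prior, likelihoods, observed):
    """Unnormalized weights (Pr(D=1, observed), Pr(D=0, observed)).

    likelihoods[i] = (Pr(F_i=1 | D=0), Pr(F_i=1 | D=1)); observed maps a
    feature index to its observed value 0 or 1. Layout is hypothetical.
    """
    w1, w0 = prior, 1.0 - prior
    for i, value in observed.items():
        p0, p1 = likelihoods[i]
        w1 *= p1 if value else 1.0 - p1
        w0 *= p0 if value else 1.0 - p0
    return w1, w0


def posterior(prior, likelihoods, observed):
    """Pr(D=1 | observed) under the Naive Bayes independence assumption."""
    w1, w0 = _class_weights(prior, likelihoods, observed)
    return w1 / (w1 + w0)


def evidence_probability(prior, likelihoods, observed):
    """Pr(observed), obtained by summing out the class variable."""
    w1, w0 = _class_weights(prior, likelihoods, observed)
    return w1 + w0


def expected_sdp(prior, likelihoods, subset, threshold):
    """Probability that the thresholded decision made from `subset` alone
    matches the decision made after observing every feature."""
    total = 0.0
    for values in product((0, 1), repeat=len(likelihoods)):
        full = dict(enumerate(values))
        partial = {i: values[i] for i in subset}
        decides_early = posterior(prior, likelihoods, partial) >= threshold
        decides_full = posterior(prior, likelihoods, full) >= threshold
        if decides_early == decides_full:
            total += evidence_probability(prior, likelihoods, full)
    return total


def best_subset(prior, likelihoods, k, threshold):
    """Exhaustively pick the size-k subset with the highest expected SDP."""
    candidates = combinations(range(len(likelihoods)), k)
    return max(candidates, key=lambda s: expected_sdp(prior, likelihoods, s, threshold))


# Hypothetical example: feature 0 is informative, feature 1 is not.
example_likelihoods = [(0.1, 0.9), (0.5, 0.5)]
chosen = best_subset(0.5, example_likelihoods, k=1, threshold=0.5)
```

In this toy setting, observing only the informative feature already fixes the final decision, so the exhaustive search prefers it over the uninformative one.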
Award ID(s):
1657613
PAR ID:
10053970
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI)
Page Range / eLocation ID:
1554 to 1560
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human-machine interfaces that can track head motion will enable advances in physical rehabilitation, improve augmented reality/virtual reality systems, and aid in the study of human behavior. This paper presents a head position monitoring and classification system using thin, flexible strain-sensing threads placed on the neck of an individual. A wireless circuit module consisting of impedance readout circuitry and a Bluetooth module records and transmits strain information to a computer. A data processing algorithm for motion recognition provides near real-time quantification of head position. Incoming data is filtered, normalized, and divided into data segments. A set of features is extracted from each data segment and used as input to nine classifiers, including Support Vector Machine, Naive Bayes, and KNN, for position prediction. A testing accuracy of around 92% was achieved for a set of nine head orientations. Results indicate that this human-machine interface platform is accurate, flexible, easy to use, and cost-effective.
  2. This work-in-progress research paper describes a study of different categorical data coding procedures for machine learning (ML) in engineering education. Often left out of methodology sections, preprocessing steps in data analysis can have important ramifications for project outcomes. In this study, we applied three different coding schemes (i.e., scalar conversion, one-hot encoding, and binary) to the categorical variable of Race across three different ML models (i.e., Neural Network, Random Forest, and Naive Bayes classifiers), looking at the four standard measures of ML classification models (i.e., accuracy, precision, recall, and F1-score). Results showed that, in general, the coding scheme did not affect predictive outcomes as much as the ML model type did. However, one-hot encoding, the strategy of transforming a categorical variable with k possible values into k binary nodes and a common practice in educational research, does not work well with a Naive Bayes classifier model. Our results indicate that such sensitivity studies at the beginning of ML modeling projects are necessary. Future work includes performing a full range of sensitivity studies on our complete, grant-funded project dataset that has been collected, and publishing our findings.
  3. This paper establishes the asymptotic consistency of the loss-calibrated variational Bayes (LCVB) method. LCVB is a method for computing approximate Bayesian posteriors in a “loss aware” manner. This methodology is also highly relevant in general data-driven decision-making contexts. Here, we establish the asymptotic consistency of both the loss-calibrated approximate posterior and the resulting decision rules. We also establish the asymptotic consistency of decision rules obtained from a “naive” two-stage procedure that first computes a standard variational Bayes approximation and then uses this in the decision-making procedure.
  4. Malicious attacks, malware, and ransomware families pose critical cybersecurity threats and may cause catastrophic damage to computer systems, data centers, and web and mobile applications across various industries and businesses. Traditional anti-ransomware systems struggle to fight newly created, sophisticated attacks. Therefore, state-of-the-art techniques, including traditional and neural network-based architectures, can be used extensively in the development of innovative ransomware solutions. In this paper, we present a feature selection-based framework that adopts different machine learning algorithms, including neural network-based architectures, to classify the security level for ransomware detection and prevention. We applied multiple machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), and Logistic Regression (LR), as well as Neural Network (NN)-based classifiers, to a selected number of features for ransomware classification. We performed all the experiments on one ransomware dataset to evaluate our proposed framework. The experimental results demonstrate that RF classifiers outperform other methods in terms of accuracy, F-beta, and precision scores.
  5. The COVID-19 pandemic was a catalyst for many different trends in daily life worldwide. While there has been an overall rise in cybercrime during this time, there has been relatively little research on malicious COVID-19 themed AndroidOS applications. With the rise in reports of users falling victim to malicious COVID-19 themed AndroidOS applications, there is a need to learn about the detection of malware in pandemic-themed mobile apps. In this project, we extracted the permission requests from 1959 APK files in a dataset containing benign and malware COVID-19 themed apps. We then created and compared eight unique models of four varying classifiers to determine their ability to identify potentially malicious APK files based on the permissions the APK file requests: support vector machine, neural network, decision trees, and categorical naive Bayes. These classifiers were then trained using the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset, because malware samples were scarce compared to non-malware APKs. Finally, we evaluated the models using K-Fold Cross-Validation and found the decision tree classifier to be the best-performing classifier.