skip to main content


Title: On–The–Fly Feature Selection and Classification with Application to Civic Engagement Platforms
Online feature selection and classification is crucial for time sensitive decision making. Existing work however either assumes that features are independent or produces a fixed number of features for classification. Instead, we propose an optimal framework to perform joint feature selection and classification on-the-fly while relaxing the assumption on feature independence. The effectiveness of the proposed approach is showed by classifying urban issue reports on the SeeClickFix civic engagement platform. A significant reduction in the average number of features used is observed without a drop in the classification accuracy.  more » « less
Award ID(s):
1737443
NSF-PAR ID:
10196234
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
3762 to 3766
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract: Deep Learning (DL) has made significant changes to a large number of research areas in recent decades. For example, several astonishing Convolutional Neural Network (CNN) models have been built by researchers to fulfill image classification needs using large-scale visual datasets successfully. Transfer Learning (TL) makes use of those pre-trained models to ease the feature learning process for other target domains that contain a smaller amount of training data. Currently, there are numerous ways to utilize features generated by transfer learning. Pre-trained CNN models prepare mid-/high-level features to work for different targeting problem domains. In this paper, a DL feature and model selection framework based on evolutionary programming is proposed to solve the challenges in visual data classification. It automates the process of discovering and obtaining the most representative features generated by the pre-trained DL models for different classification tasks. 
    more » « less
  2. Cite: Abulfaz Hajizada and Sharmin Jahan. 2023. Feature Selections for Phishing URLs Detection Using Combination of Multiple Feature Selection Methods. In 2023 15th International Conference on Machine Learning and Computing (ICMLC 2023), February 17–20, 2023, Zhuhai, China. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3587716.3587790 ABSTRACT In this internet era, we are very prone to fall under phishing attacks where attackers apply social engineering to persuade and manipulate the user. The core attack target is to steal users’ sensitive information or install malicious software to get control over users’ devices. Attackers use different approaches to persuade the user. However, one of the common approaches is sending a phishing URL to the user that looks legitimate and difficult to distinguish. Machine learning is a prominent approach used for phishing URLs detection. There are already some established machine learning models available for this purpose. However, the model’s performance depends on the appropriate selection of features during model building. In this paper, we combine multiple filter methods for feature selections in a procedural way that allows us to reduce a large number of feature list into a reduced number of the feature list. Then we finally apply the wrapper method to select the features for building our phishing detection model. The result shows that combining multiple feature selection methods improves the model’s detection accuracy. Moreover, since we apply the backward feature selection method as our wrapper method on the data set with a reduced number of features, the computational time for backward feature selection gets faster. 
    more » « less
  3. Abstract

    Sensitivity analysis is a popular feature selection approach employed to identify the important features in a dataset. In sensitivity analysis, each input feature is perturbed one-at-a-time and the response of the machine learning model is examined to determine the feature's rank. Note that the existing perturbation techniques may lead to inaccurate feature ranking due to their sensitivity to perturbation parameters. This study proposes a novel approach that involves the perturbation of input features using a complex-step. The implementation of complex-step perturbation in the framework of deep neural networks as a feature selection method is provided in this paper, and its efficacy in determining important features for real-world datasets is demonstrated. Furthermore, the filter-based feature selection methods are employed, and the results obtained from the proposed method are compared. While the results obtained for the classification task indicated that the proposed method outperformed other feature ranking methods, in the case of the regression task, it was found to perform more or less similar to that of other feature ranking methods.

     
    more » « less
  4. Multimodal data fusion is one of the current primary neuroimaging research directions to overcome the fundamental limitations of individual modalities by exploiting complementary information from different modalities. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) are especially compelling modalities due to their potentially complementary features reflecting the electro-hemodynamic characteristics of neural responses. However, the current multimodal studies lack a comprehensive systematic approach to properly merge the complementary features from their multimodal data. Identifying a systematic approach to properly fuse EEG-fNIRS data and exploit their complementary potential is crucial in improving performance. This paper proposes a framework for classifying fused EEG-fNIRS data at the feature level, relying on a mutual information-based feature selection approach with respect to the complementarity between features. The goal is to optimize the complementarity, redundancy and relevance between multimodal features with respect to the class labels as belonging to a pathological condition or healthy control. Nine amyotrophic lateral sclerosis (ALS) patients and nine controls underwent multimodal data recording during a visuo-mental task. Multiple spectral and temporal features were extracted and fed to a feature selection algorithm followed by a classifier, which selected the optimized subset of features through a cross-validation process. The results demonstrated considerably improved hybrid classification performance compared to the individual modalities and compared to conventional classification without feature selection, suggesting a potential efficacy of our proposed framework for wider neuro-clinical applications.

     
    more » « less
  5. Abstract

    Machine learning methods have helped to advance wide range of scientific and technological field in recent years, including computational chemistry. As the chemical systems could become complex with high dimension, feature selection could be critical but challenging to develop reliable machine learning based prediction models, especially for proteins as bio‐macromolecules. In this study, we applied sparse group lasso (SGL) method as a general feature selection method to develop classification model for an allosteric protein in different functional states. This results into a much improved model with comparable accuracy (Acc) and only 28 selected features comparing to 289 selected features from a previous study. The Acc achieves 91.50% with 1936 selected feature, which is far higher than that of baseline methods. In addition, grouping protein amino acids into secondary structures provides additional interpretability of the selected features. The selected features are verified as associated with key allosteric residues through comparison with both experimental and computational works about the model protein, and demonstrate the effectiveness and necessity of applying rigorous feature selection and evaluation methods on complex chemical systems.

     
    more » « less