
Title: Adaptive Feature Redundancy Minimization
Most existing feature selection methods select the top-ranked features according to a certain criterion. However, because they do not account for redundancy among features, the selected features are frequently highly correlated with each other, which is detrimental to performance. To tackle this problem, we propose an adaptive redundancy minimization (ARM) framework for feature selection. Unlike other feature selection methods, the proposed model has the following merits: (1) The redundancy matrix is constructed adaptively rather than preset as prior information. (2) The proposed model can pick out discriminative and nonredundant features by minimizing the global redundancy of the features. (3) ARM can reduce the redundancy of the features from both supervised and unsupervised perspectives.
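The contrast between top-ranked selection and redundancy-aware selection can be illustrated with a minimal sketch. This is not the ARM model itself: the abstract does not give its objective or how the redundancy matrix is adapted, so the sketch assumes a fixed redundancy matrix (absolute Pearson correlation) and a simple greedy rule. The function name `redundancy_aware_select` and the `penalty` parameter are hypothetical.

```python
import numpy as np

def redundancy_aware_select(X, scores, k, penalty=1.0):
    """Greedily pick k features, trading relevance against redundancy.

    X       : (n_samples, n_features) data matrix.
    scores  : (n_features,) relevance score per feature (higher is better).
    penalty : weight on the mean redundancy with already-selected features.

    Assumption: redundancy is a fixed absolute-correlation matrix, a
    stand-in for the adaptively learned matrix the abstract describes.
    """
    scores = np.asarray(scores, dtype=float)
    # Redundancy matrix: absolute Pearson correlation between feature columns.
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)

    # Start from the single most relevant feature.
    selected = [int(np.argmax(scores))]
    while len(selected) < k:
        # Mean redundancy of every candidate with the current selection.
        redundancy = R[:, selected].mean(axis=1)
        gain = scores - penalty * redundancy
        gain[selected] = -np.inf  # never pick the same feature twice
        selected.append(int(np.argmax(gain)))
    return selected
```

With two nearly duplicate high-scoring features and one lower-scoring independent feature, plain top-2 ranking keeps both duplicates, whereas this rule keeps one duplicate and the independent feature.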
Award ID(s):
1947135 1651203 1715385 2003924
Publication Date:
Journal Name:
Page Range or eLocation-ID:
2417 to 2420
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Sensitivity analysis is a popular feature selection approach employed to identify the important features in a dataset. In sensitivity analysis, each input feature is perturbed one at a time and the response of the machine learning model is examined to determine the feature's rank. Existing perturbation techniques may lead to inaccurate feature ranking because they are sensitive to the perturbation parameters. This study proposes a novel approach that perturbs the input features using a complex step. The implementation of complex-step perturbation in the framework of deep neural networks as a feature selection method is provided in this paper, and its efficacy in determining important features for real-world datasets is demonstrated. Furthermore, filter-based feature selection methods are employed, and their results are compared with those of the proposed method. For the classification task, the proposed method outperformed the other feature ranking methods; for the regression task, it performed comparably to them.

  2. Abstract Background: Drug sensitivity prediction and drug-responsive biomarker selection on high-throughput genomic data are critical steps in drug discovery. Many computational methods have been developed to serve this purpose, including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network in drug sensitivity prediction is investigated in this study. Methods: In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed; both integrate gene network information directly into the neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at . Results: In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance than the deep neural network model; (4) the prediction performance is drug dependent and may relate to the drug’s mechanism of action. Conclusions: The network-based feature selection method and prediction models improve the performance of drug response prediction. The relations between the genomic features are more robust and stable than the correlation between each individual genomic feature and the drug response in high-dimension, low-sample-size genomic datasets.
  3. Multimodal data fusion is one of the current primary neuroimaging research directions to overcome the fundamental limitations of individual modalities by exploiting complementary information from different modalities. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) are especially compelling modalities due to their potentially complementary features reflecting the electro-hemodynamic characteristics of neural responses. However, the current multimodal studies lack a comprehensive systematic approach to properly merge the complementary features from their multimodal data. Identifying a systematic approach to properly fuse EEG-fNIRS data and exploit their complementary potential is crucial in improving performance. This paper proposes a framework for classifying fused EEG-fNIRS data at the feature level, relying on a mutual information-based feature selection approach with respect to the complementarity between features. The goal is to optimize the complementarity, redundancy and relevance between multimodal features with respect to the class labels as belonging to a pathological condition or healthy control. Nine amyotrophic lateral sclerosis (ALS) patients and nine controls underwent multimodal data recording during a visuo-mental task. Multiple spectral and temporal features were extracted and fed to a feature selection algorithm followed by a classifier, which selected the optimized subset of features through a cross-validation process. The results demonstrated considerably improved hybrid classification performance compared to the individual modalities and compared to conventional classification without feature selection, suggesting a potential efficacy of our proposed framework for wider neuro-clinical applications.
  4. Background: Machine learning is a promising tool for biomarker-based diagnosis of Alzheimer’s disease (AD). Performing multimodal feature selection and studying the interaction between biological and clinical AD can help to improve the performance of the diagnosis models. Objective: This study aims to formulate a feature ranking metric based on the mutual information index to assess the relevance and redundancy of regional biomarkers and improve the AD classification accuracy. Methods: From the Alzheimer’s Disease Neuroimaging Initiative (ADNI), 722 participants with three modalities, including florbetapir-PET, flortaucipir-PET, and MRI, were studied. The multivariate mutual information metric was utilized to capture the redundancy and complementarity of the predictors and develop a feature ranking approach. This was followed by evaluating the capability of single-modal and multimodal biomarkers in predicting the cognitive stage. Results: Although amyloid-β deposition is an earlier event in the disease trajectory, tau PET with feature selection yielded a higher early-stage classification F1-score (65.4%) compared to amyloid-β PET (63.3%) and MRI (63.2%). The SVC multimodal scenario with feature selection improved the F1-score to 70.0% and 71.8% for the early and late-stage, respectively. When age and risk factors were included, the scores improved by 2 to 4%. The Amyloid-Tau-Neurodegeneration [AT(N)] framework helped to interpret the classification results for different biomarker categories. Conclusion: The results underscore the utility of a novel feature selection approach to reduce the dimensionality of multimodal datasets and enhance model performance. The AT(N) biomarker framework can help to explore the misclassified cases by revealing the relationship between neuropathological biomarkers and cognition.
  5. Machine learning algorithms can learn mechanisms of antimicrobial resistance (AMR) from DNA sequence data without any a priori information. Interpreting a trained machine learning model can be used to validate the model and to obtain new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, there are trade-offs between interpretability, computational complexity and accuracy among these feature extraction methods. In this study, we propose a new feature extraction method, counting amino acid k-mers (oligopeptides), which provides easier model interpretation than counting nucleotide k-mers and reaches the same or even better accuracy than the other methods. Additionally, we have trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We have built a new feature selection pipeline for extracting important features so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that use only a small number of features and can predict resistance accurately.
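
The amino acid k-mer counting mentioned in item 5 can be sketched roughly as follows. The abstract does not describe the authors' pipeline, so this is only an illustration under stated assumptions: the DNA is translated in a single forward reading frame with the standard genetic code, unknown codons become `X`, and k-mers are counted over the whole translated string, stop codons (`*`) included. The helper names `translate` and `count_aa_kmers` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(dna):
    """Translate DNA in reading frame 0 (assumption: a single forward frame).

    Codons containing characters outside T/C/A/G map to 'X'; a trailing
    partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def count_aa_kmers(protein, k):
    """Count every overlapping length-k substring of a protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))
```

Calling `count_aa_kmers(translate(dna), k)` on each genome or gene yields a sparse count vector that could serve as a model's input features; a fuller pipeline would also have to decide how to handle the other reading frames and the reverse strand.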