skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Second-Order Unsupervised Feature Selection via Knowledge Contrastive Distillation
Unsupervised feature selection aims to select a subset from the original features that are most useful for the downstream tasks without external guidance information. While most unsupervised feature selection methods focus on ranking features based on the intrinsic properties of data, most of them do not pay much attention to the relationships between features, which often leads to redundancy among the selected features. In this paper, we propose a two-stage Second-Order unsupervised Feature selection via knowledge contrastive disTillation (SOFT) model that incorporates the second-order covariance matrix with the first-order data matrix for unsupervised feature selection. In the first stage, we learn a sparse attention matrix that can represent second-order relations between features by contrastively distilling the intrinsic structure. In the second stage, we build a relational graph based on the learned attention matrix and perform graph segmentation. To this end, we conduct feature selection by only selecting one feature from each cluster to decrease the feature redundancy. Experimental results on 12 public datasets show that SOFT outperforms classical and recent state-of-the-art methods, which demonstrates the effectiveness of our proposed method. Moreover, we also provide rich in-depth experiments to further explore several key factors of SOFT.  more » « less
Award ID(s):
2223769 2006844 2144209
PAR ID:
10547053
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume:
45
Issue:
12
ISSN:
0162-8828
Page Range / eLocation ID:
15577 to 15587
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Most existing feature selection methods select the top-ranked features according to certain criterion. However, without considering the redundancy among the features, the selected ones are frequently highly correlated with each other, which is detrimental to the performance. To tackle this problem, we propose a framework regarding adaptive redundancy minimization (ARM) for the feature selection. Unlike other feature selection methods, the proposed model has the following merits: (1) The redundancy matrix is adaptively constructed instead of presetting it as the priori information. (2) The proposed model could pick out the discriminative and nonredundant features via minimizing the global redundancy of the features. (3) ARM can reduce the redundancy of the features from both supervised and unsupervised perspectives. 
    more » « less
  2. Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent over-fitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of LASSO for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features. 
    more » « less
  3. Hyperspectral imaging systems are becoming widely used due to their increasing accessibility and their ability to provide detailed spectral responses based on hundreds of spectral bands. However, the resulting hyperspectral images (HSIs) come at the cost of increased storage requirements, increased computational time to process, and highly redundant data. Thus, dimensionality reduction techniques are necessary to decrease the number of spectral bands while retaining the most useful information. Our contribution is two-fold: First, we propose a filter-based method called interband redundancy analysis (IBRA) based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. Second, we apply a wrapper-based approach called greedy spectral selection (GSS) to the results of IBRA to select bands based on their information entropy values and train a compact convolutional neural network to evaluate the performance of the current selection. We also propose a feature extraction framework that consists of two main steps: first, it reduces the total number of bands using IBRA; then, it can use any feature extraction method to obtain the desired number of feature channels. We present classification results obtained from our methods and compare them to other dimensionality reduction methods on three hyperspectral image datasets. Additionally, we used the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager. 
    more » « less
  4. Noise and inconsistency commonly exist in real-world information networks, due to the inherent error-prone nature of human or user privacy concerns. To date, tremendous efforts have been made to advance feature learning from networks, including the most recent graph convolutional networks (GCNs) or attention GCN, by integrating node content and topology structures. However, all existing methods consider networks as error-free sources and treat feature content in each node as independent and equally important to model node relations. Noisy node content, combined with sparse features, provides essential challenges for existing methods to be used in real-world noisy networks. In this article, we propose feature-based attention GCN (FA-GCN), a feature-attention graph convolution learning framework, to handle networks with noisy and sparse node content. To tackle noise and sparse content in each node, FA-GCN first employs a long short-term memory (LSTM) network to learn dense representation for each node feature. To model interactions between neighboring nodes, a feature-attention mechanism is introduced to allow neighboring nodes to learn and vary feature importance, with respect to their connections. By using a spectral-based graph convolution aggregation process, each node is allowed to concentrate more on the most determining neighborhood features aligned with the corresponding learning task. Experiments and validations, w.r.t. different noise levels, demonstrate that FA-GCN achieves better performance than the state-of-the-art methods in both noise-free and noisy network environments. 
    more » « less
  5. Facial micro-expressions (MEs) refer to subtle, transient, and involuntary muscle movements expressing a per-son’s true feelings. This paper presents a novel two-stream relational edge-node graph attention network-based approach to classify MEs in a video by selecting the high-intensity frames and edge-node features that can provide valuable information about the relationship between nodes and structural information in a graph structure. The pa-per examines the impact of different edge-node features and their relationships on the graphs. The first step involves extracting high-intensity-emotion frames from the video using optical flow. Second, node feature embeddings are calculated using the node location coordinate features and the patch size information of the optical flow across each node location. Additionally, we obtain the global and local structural similarity score using the jaccard’s similarity score and radial basis function as the edge features. Third, a self-attention graph pooling layer helps to remove the nodes with lower attention scores based on the top-k selection. As the final step, the network employs a two-stream edge-node graph attention network that focuses on finding correlations among the edge and node features, such as landmark coordinates, optical flow, and global and local edge features. A three-frame graph structure is designed to obtain spatio-temporal information. For 3 and 5 expression classes, the results are compared for SMIC and CASME II databases. 
    more » « less