Title: Sparse feature selection in kernel discriminant analysis via optimal scoring
We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also allow for feature selection by imposing structured sparsity using weighted kernels. We propose fully automated methods for selection of all tuning parameters, and in particular adapt kernel shrinkage ideas for ridge parameter selection. Numerical studies demonstrate the superior classification performance of the proposed approach compared to existing nonparametric classifiers.
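As a rough illustration of the optimal scoring idea (not the paper's exact estimator), the sketch below fits a two-group kernel discriminant via a kernel ridge regression of optimally scored class labels and classifies new points by the nearer fitted class mean. The Gaussian kernel, the fixed ridge parameter, and all function names are assumptions for illustration; the weighted kernels for sparsity and the automated tuning described in the abstract are omitted.

```python
# Minimal numpy sketch of two-group kernel discriminant analysis via optimal
# scoring.  The Gaussian kernel, fixed ridge parameter, and nearest-fitted-mean
# classification rule are illustrative assumptions, not the paper's estimator.
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kernel_optimal_scoring(X, y, lam=1e-2, sigma=1.0):
    """y is a 0/1 array.  Returns dual coefficients and per-class fitted means."""
    n = len(y)
    n1, n0 = y.sum(), n - y.sum()
    # Standard optimal scores for two groups: mean zero, unit variance.
    theta = np.where(y == 1, np.sqrt(n0 / n1), -np.sqrt(n1 / n0))
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), theta)   # kernel ridge step
    fitted = K @ alpha
    class_means = np.array([fitted[y == 0].mean(), fitted[y == 1].mean()])
    return alpha, class_means

def predict(X_train, alpha, class_means, X_new, sigma=1.0):
    """Assign each new point to the class whose fitted-score mean is closer."""
    scores = gaussian_kernel(X_new, X_train, sigma) @ alpha
    return np.abs(scores[:, None] - class_means[None, :]).argmin(axis=1)
```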
Award ID(s):
1712943
NSF-PAR ID:
10098032
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
89
ISSN:
2640-3498
Page Range / eLocation ID:
1704-1713
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider a general formulation of the multiple change-point problem, in which the data are assumed to belong to a set equipped with a positive semidefinite kernel. We propose a model-selection penalty that allows the number of change points in Harchaoui and Cappe's kernel-based change-point detection method to be selected. The penalty generalizes non-asymptotic model-selection penalties for the change-in-mean problem with univariate data. We prove a non-asymptotic oracle inequality for the resulting kernel-based change-point detection method, regardless of the unknown number of change points, thanks to a concentration result for Hilbert-space-valued random variables that may be of independent interest. Experiments on synthetic and real data illustrate the proposed method, demonstrating its ability to detect subtle changes in the distribution of the data.
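For a concrete feel of penalized kernel change-point detection, here is a hypothetical sketch using the open-source ruptures package as a generic stand-in; the penalty value is an arbitrary placeholder rather than the calibrated model-selection penalty the paper derives, and the synthetic signal is illustrative.

```python
# Penalized kernel change-point detection on synthetic data with a mean shift
# and a variance change; the penalty below is a placeholder, not the paper's
# model-selection penalty.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(0.0, 1.0, size=(200, 3)),   # segment 1
    rng.normal(0.7, 1.0, size=(200, 3)),   # segment 2: mean shift
    rng.normal(0.7, 2.0, size=(200, 3)),   # segment 3: variance change
])

algo = rpt.KernelCPD(kernel="rbf").fit(signal)
# A larger penalty yields fewer estimated change points.
estimated_breakpoints = algo.predict(pen=10.0)
print(estimated_breakpoints)   # indices of estimated segment ends
```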
  2. Motivated by mobile devices that record data at high frequency, we propose a new methodological framework for analyzing a semi-parametric regression model that allows us to study a nonlinear relationship between a scalar response and multiple functional predictors in the presence of scalar covariates. Utilizing functional principal component analysis (FPCA) and the least-squares kernel machine (LSKM) method, we substantially extend the framework of semi-parametric regression of scalar responses on scalar predictors by allowing multiple functional predictors to enter the nonlinear model. Regularization is established for feature selection in the setting of reproducing kernel Hilbert spaces. Our method performs model fitting and variable selection simultaneously on functional features. For the implementation, we propose an effective algorithm in which iterations alternate between fitting linear mixed-effects models and a variable selection step (e.g., sparse group lasso). We show algorithmic convergence results and theoretical guarantees for the proposed methodology, and we illustrate its performance through simulation experiments and an analysis of accelerometer data.
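A heavily simplified sketch of the FPCA-plus-kernel-machine pipeline, assuming densely observed curves: each functional predictor is reduced to a few principal component scores, and the pooled scores together with scalar covariates enter a kernel ridge regression. The group-sparse selection step (e.g., sparse group lasso over functional features) and the mixed-effects iterations are omitted; all names, data, and settings are illustrative.

```python
# FPCA feature extraction feeding a kernel machine (kernel ridge regression).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n, n_grid = 200, 50
curves1 = rng.normal(size=(n, n_grid)).cumsum(axis=1)   # functional predictor 1
curves2 = rng.normal(size=(n, n_grid)).cumsum(axis=1)   # functional predictor 2
z = rng.normal(size=(n, 2))                             # scalar covariates

# FPCA approximated by ordinary PCA on the discretized curves.
scores1 = PCA(n_components=3).fit_transform(curves1)
scores2 = PCA(n_components=3).fit_transform(curves2)
features = np.hstack([scores1, scores2, z])

# Toy response depending nonlinearly on the functional scores.
y = np.sin(scores1[:, 0]) + 0.5 * scores2[:, 1] + rng.normal(scale=0.1, size=n)

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(features, y)
print("in-sample R^2:", model.score(features, y))
```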
  3. Online feature selection and classification is crucial for time-sensitive decision making. Existing work, however, either assumes that features are independent or produces a fixed number of features for classification. Instead, we propose an optimal framework to perform joint feature selection and classification on the fly while relaxing the assumption of feature independence. The effectiveness of the proposed approach is shown by classifying urban issue reports on the SeeClickFix civic engagement platform. A significant reduction in the average number of features used is observed without a drop in classification accuracy.
  4. Summary

    Variable selection for recovering sparsity in nonadditive and nonparametric models with high-dimensional variables has been challenging. The problem becomes even more difficult because of complications in modeling unknown interaction terms among high-dimensional variables. There is currently no variable selection method that overcomes these limitations. Hence, in this article we propose a variable selection approach developed by connecting a kernel machine with the nonparametric regression model. The advantages of our approach are that it can (i) recover sparsity, (ii) automatically model unknown and complicated interactions, (iii) connect with several existing approaches including the linear nonnegative garrote and multiple kernel learning, and (iv) provide flexibility for both additive and nonadditive nonparametric models. Our approach can be viewed as a nonlinear version of the nonnegative garrote method. We model the smoothing function by a Least Squares Kernel Machine (LSKM) and construct the nonnegative garrote objective function as a function of the sparse scale parameters of the kernel machine, recovering sparsity in the input variables whose relevance to the response is measured by these scale parameters. We also provide the asymptotic properties of our approach and show that sparsistency is satisfied with consistent initial kernel function coefficients under certain conditions. An efficient coordinate descent/backfitting algorithm is developed, and a resampling procedure for our variable selection methodology is also proposed to improve power.
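To make the garrote-on-scale-parameters idea concrete, the rough sketch below holds the kernel-machine coefficients fixed at an initial kernel ridge fit and then optimizes nonnegative per-variable scales inside a Gaussian kernel with a linear penalty (the l1 penalty restricted to nonnegative scales). The objective form, penalty level, and off-the-shelf optimizer are assumptions for illustration; the paper instead develops a coordinate descent/backfitting algorithm.

```python
# Nonnegative-garrote-style selection of kernel scale parameters: alpha is held
# fixed at an initial kernel ridge fit; near-zero scales flag irrelevant inputs.
import numpy as np
from scipy.optimize import minimize

def scaled_kernel(X, Z, delta):
    """Gaussian kernel with nonnegative per-variable scales delta_j."""
    d2 = (((X[:, None, :] - Z[None, :, :]) ** 2) * delta).sum(axis=-1)
    return np.exp(-d2)

def initial_alpha(X, y, ridge=1e-2):
    """Plain kernel ridge fit with all scales equal to one (initial fit)."""
    K = scaled_kernel(X, X, np.ones(X.shape[1]))
    return np.linalg.solve(K + ridge * np.eye(len(y)), y)

def garrote_objective(delta, X, y, alpha, lam):
    """Squared-error loss with alpha held fixed, plus a linear penalty on delta."""
    K = scaled_kernel(X, X, delta)
    resid = y - K @ alpha
    return np.mean(resid ** 2) + lam * delta.sum()

def fit_scales(X, y, lam=0.1):
    """Optimize the nonnegative scales starting from all ones."""
    alpha = initial_alpha(X, y)
    p = X.shape[1]
    res = minimize(garrote_objective, np.ones(p), args=(X, y, alpha, lam),
                   method="L-BFGS-B", bounds=[(0.0, None)] * p)
    return res.x
```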
  5. Advances in algorithms and low-power computing hardware imply that machine learning is of potential use in off-grid medical data classification and diagnosis applications such as electrocardiogram interpretation. However, although support vector machine algorithms for electrocardiogram classification show high classification accuracy, hardware implementations for edge applications are impractical due to the complexity and substantial power consumption needed for kernel optimization when using conventional complementary metal–oxide–semiconductor circuits. Here we report reconfigurable mixed-kernel transistors based on dual-gated van der Waals heterojunctions that can generate fully tunable individual and mixed Gaussian and sigmoid functions for analogue support vector machine kernel applications. We show that the heterojunction-generated kernels can be used for arrhythmia detection from electrocardiogram signals with high classification accuracy compared with standard radial basis function kernels. The reconfigurable nature of mixed-kernel heterojunction transistors also allows for personalized detection using Bayesian optimization. A single mixed-kernel heterojunction device can generate the equivalent transfer function of a complementary metal–oxide–semiconductor circuit comprising dozens of transistors and thus provides a low-power approach for support vector machine classification applications. 
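In software terms, the mixed-kernel idea can be mimicked with a convex combination of Gaussian and sigmoid kernels inside a standard support vector machine; the sketch below does this with scikit-learn on synthetic stand-in data. The mixing weight, kernel parameters, and data are placeholders, and the paper's point is that the heterojunction device generates such kernels physically in analogue rather than digitally.

```python
# Support vector machine with a mixed Gaussian + sigmoid kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, sigmoid_kernel

def mixed_kernel(X, Z, w=0.5, gamma=0.1, coef0=0.0):
    """Convex combination of Gaussian (RBF) and sigmoid Gram matrices."""
    return (w * rbf_kernel(X, Z, gamma=gamma)
            + (1.0 - w) * sigmoid_kernel(X, Z, gamma=gamma, coef0=coef0))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))                    # stand-in for ECG feature vectors
y = (X[:, 0] + np.sin(X[:, 1]) > 0).astype(int)   # synthetic binary labels

clf = SVC(kernel=lambda A, B: mixed_kernel(A, B, w=0.7))
clf.fit(X[:200], y[:200])
print("held-out accuracy:", clf.score(X[200:], y[200:]))
```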