skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Sparse feature selection in kernel discriminant analysis via optimal scoring
We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also allow for feature selection by imposing structured sparsity using weighted kernels. We propose fully-automated methods for selection of all tuning parameters, and in particular adapt kernel shrinkage ideas for ridge parameter selection. Numerical studies demonstrate the superior classification performance of the proposed approach compared to existing nonparametric classifiers.  more » « less
Award ID(s):
1712943
PAR ID:
10098032
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
89
ISSN:
2640-3498
Page Range / eLocation ID:
1704-1713
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider a general formulation of the multiple change-point problem, in which the data is assumed to belong to a set equipped with a positive semidefinite kernel. We propose a model-selection penalty allowing to select the number of change points in Harchaoui and Cappe's kernel-based change-point detection method. The model-selection penalty generalizes non-asymptotic model-selection penalties for the change-in-mean problem with univariate data. We prove a non-asymptotic oracle inequality for the resulting kernel-based change-point detection method, whatever the unknown number of change points, thanks to a concentration result for Hilbert-space valued random variables which may be of independent interest. Experiments on synthetic and real data illustrate the proposed method, demonstrating its ability to detect subtle changes in the distribution of data. 
    more » « less
  2. Motivated by mobile devices that record data at a high frequency, we propose a new methodological framework for analyzing a semi-parametric regression model that allow us to study a nonlinear relationship between a scalar response and multiple functional predictors in the presence of scalar covariates. Utilizing functional principal component analysis (FPCA) and the least-squares kernel machine method (LSKM), we are able to substantially extend the framework of semi-parametric regression models of scalar responses on scalar predictors by allowing multiple functional predictors to enter the nonlinear model. Regularization is established for feature selection in the setting of reproducing kernel Hilbert spaces. Our method performs simultaneously model fitting and variable selection on functional features. For the implementation, we propose an effective algorithm to solve related optimization problems in that iterations take place between both linear mixed-effects models and a variable selection method (e.g., sparse group lasso). We show algorithmic convergence results and theoretical guarantees for the proposed methodology. We illustrate its performance through simulation experiments and an analysis of accelerometer data. 
    more » « less
  3. Online feature selection and classification is crucial for time sensitive decision making. Existing work however either assumes that features are independent or produces a fixed number of features for classification. Instead, we propose an optimal framework to perform joint feature selection and classification on-the-fly while relaxing the assumption on feature independence. The effectiveness of the proposed approach is showed by classifying urban issue reports on the SeeClickFix civic engagement platform. A significant reduction in the average number of features used is observed without a drop in the classification accuracy. 
    more » « less
  4. Advances in algorithms and low-power computing hardware imply that machine learning is of potential use in off-grid medical data classification and diagnosis applications such as electrocardiogram interpretation. However, although support vector machine algorithms for electrocardiogram classification show high classification accuracy, hardware implementations for edge applications are impractical due to the complexity and substantial power consumption needed for kernel optimization when using conventional complementary metal–oxide–semiconductor circuits. Here we report reconfigurable mixed-kernel transistors based on dual-gated van der Waals heterojunctions that can generate fully tunable individual and mixed Gaussian and sigmoid functions for analogue support vector machine kernel applications. We show that the heterojunction-generated kernels can be used for arrhythmia detection from electrocardiogram signals with high classification accuracy compared with standard radial basis function kernels. The reconfigurable nature of mixed-kernel heterojunction transistors also allows for personalized detection using Bayesian optimization. A single mixed-kernel heterojunction device can generate the equivalent transfer function of a complementary metal–oxide–semiconductor circuit comprising dozens of transistors and thus provides a low-power approach for support vector machine classification applications. 
    more » « less
  5. Abstract Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple models that are competitive on a variety of tasks, it has been unclear how to develop scalable kernel-based transfer learning methods across general source and target tasks with possibly differing label dimensions. In this work, we propose a transfer learning framework for kernel methods by projecting and translating the source model to the target task. We demonstrate the effectiveness of our framework in applications to image classification and virtual drug screening. For both applications, we identify simple scaling laws that characterize the performance of transfer-learned kernels as a function of the number of target examples. We explain this phenomenon in a simplified linear setting, where we are able to derive the exact scaling laws. 
    more » « less