Abstract

Accurate classification of high-dimensional data is important in many scientific applications. We propose a family of high-dimensional classification methods based on comparing the component-wise distances of a sample's feature vector to the within-class population quantiles. These methods are motivated by the fact that quantile classifiers based on these component-wise distances are the most powerful univariate classifiers for an optimal choice of the quantile level. We propose a simple aggregation approach for constructing a multivariate classifier from these component-wise distances to the within-class quantiles, and we show that the resulting classifier is consistent with the asymptotically optimal classifier as the sample size increases. The proposed classifiers yield simple piecewise-linear decision boundaries that can be trained efficiently. Numerical results demonstrate competitive performance on both simulated data and a benchmark email spam application.
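As a concrete illustration of the component-wise construction described above, here is a minimal sketch of a quantile-distance classifier: per-feature within-class quantiles are estimated on training data, and a new sample is assigned to the class minimizing the aggregated asymmetric distance. The fixed quantile level `theta` and all function names are illustrative, not the paper's implementation.

```python
import numpy as np

def fit_quantiles(X, y, theta=0.5):
    """Estimate the component-wise theta-quantiles within each class."""
    classes = np.unique(y)
    quantiles = {k: np.quantile(X[y == k], theta, axis=0) for k in classes}
    return classes, quantiles

def predict(X, classes, quantiles, theta=0.5):
    """Assign each row of X to the class with the smallest aggregated
    component-wise quantile distance (the asymmetric 'check' distance)."""
    def rho(u):
        return u * (theta - (u < 0))  # check function at level theta
    scores = np.stack([rho(X - quantiles[k]).sum(axis=1) for k in classes],
                      axis=1)
    return classes[np.argmin(scores, axis=1)]
```

With theta = 0.5 this reduces to aggregating absolute distances to the within-class medians; the abstract's point is that an optimally chosen quantile level yields the most powerful univariate rules.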
Sparsifying the Fisher Linear Discriminant by Rotation
Summary

Many high-dimensional classification techniques have been proposed in the literature based on sparse linear discriminant analysis. To use them efficiently, sparsity of the linear classifier is a prerequisite. However, this might not be readily available in many applications, and rotations of the data are required to create the needed sparsity. We propose a family of rotations to create the required sparsity. The basic idea is to use the principal components of the sample covariance matrix of the pooled samples, and its variants, to rotate the data first and then to apply an existing high-dimensional classifier. This rotate-and-solve procedure can be combined with any existing classifier and is robust against the level of sparsity of the true model. We show that these rotations do create the sparsity that is needed for high-dimensional classification, and we provide theoretical understanding of why such a rotation works empirically. The effectiveness of the proposed method is demonstrated on several simulated and real data examples, and the improvements of our method over some popular high-dimensional classification rules are clearly shown.
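A minimal sketch of the rotate-and-solve idea, assuming the rotation comes from the eigenvectors of the pooled (within-class centered) sample covariance; the l1-penalized logistic regression below is only a stand-in for whichever sparse high-dimensional classifier one prefers, not the paper's specific rule:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # stand-in sparse classifier

def rotate_and_solve(X_train, y_train, X_test):
    # Center each class at its own mean, then pool to estimate covariance.
    Xc = X_train.astype(float).copy()
    for k in np.unique(y_train):
        Xc[y_train == k] -= Xc[y_train == k].mean(axis=0)
    # Columns of R are the principal directions of the pooled covariance.
    _, R = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # Rotate the data, then apply an existing sparse classifier.
    clf = LogisticRegression(penalty="l1", solver="liblinear")
    clf.fit(X_train @ R, y_train)
    return clf.predict(X_test @ R)
```

Because the rotation is an orthogonal change of basis, it loses no information; the point of the procedure is that the rotated coefficients can be sparse even when the original ones are not.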
- Award ID(s): 1309507
- PAR ID: 10397392
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
- Volume: 77
- Issue: 4
- ISSN: 1369-7412
- Page Range / eLocation ID: p. 827-851
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also allow for feature selection by imposing structured sparsity through weighted kernels. We propose fully automated methods for selecting all tuning parameters, and in particular adapt kernel shrinkage ideas for ridge parameter selection. Numerical studies demonstrate the superior classification performance of the proposed approach compared with existing nonparametric classifiers.
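For intuition, here is a rough sketch of two-group optimal scoring with a kernel: in the two-class case optimal scoring reduces to a kernel ridge regression onto centered, unit-variance class scores. The RBF kernel and the fixed `ridge` and `gamma` values are illustrative placeholders for the automated selection the abstract describes, and the weighted-kernel sparsity is omitted:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kos_fit(X, y, ridge=1e-2, gamma=1.0):
    """Two-group optimal scoring sketch: regress the optimal class
    scores (centered, unit-variance coding) on the kernel matrix."""
    n1, n0 = (y == 1).sum(), (y == 0).sum()
    s = np.where(y == 1, np.sqrt(n0 / n1), -np.sqrt(n1 / n0))  # optimal scores
    K = rbf_kernel(X, X, gamma=gamma)
    alpha = np.linalg.solve(K + ridge * len(y) * np.eye(len(y)), s)
    return alpha

def kos_predict(alpha, X_train, X_new, gamma=1.0):
    f = rbf_kernel(X_new, X_train, gamma=gamma) @ alpha
    return (f > 0).astype(int)  # scores are centered, so threshold at zero
```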
Linear encoding of sparse vectors is widely popular, but is commonly data-independent, missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used l1 decoder. The convex l1 decoder prevents gradient propagation as needed in standard gradient-based training. Our method is based on the insight that unrolling the convex decoder into T projected subgradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing measurement matrix. We compare the empirical performance of 10 algorithms over 6 sparse datasets (3 synthetic and 3 real). Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods. We illustrate an application of our method in learning label embeddings for extreme multi-label classification, and empirically show that our method is able to match or outperform the precision scores of SLEEC, one of the state-of-the-art embedding-based approaches.
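A minimal sketch of the unrolling idea in PyTorch (my rendering, with illustrative step size and T, not the paper's code): each projected subgradient step of the constrained problem min ||z||_1 s.t. Az = b is an explicit differentiable operation, so a reconstruction loss can backpropagate to the encoder A:

```python
import torch

def unrolled_l1_decoder(A, b, T=20, step=0.1):
    """T projected subgradient steps for min ||z||_1 s.t. Az = b."""
    pinv = A.T @ torch.linalg.inv(A @ A.T)   # ingredient of the projector
    z = pinv @ b                             # minimum-norm feasible start
    for _ in range(T):
        z = z - step * torch.sign(z)         # subgradient of the l1 norm
        z = z - pinv @ (A @ z - b)           # project back onto Az = b
    return z

# One illustrative training step: adapt A so that decoding A @ x recovers x.
d, m = 100, 20
A = torch.randn(m, d, requires_grad=True)
x = torch.zeros(d); x[:5] = torch.randn(5)   # a sparse test signal
loss = ((unrolled_l1_decoder(A, A @ x) - x) ** 2).mean()
loss.backward()                              # gradient reaches the encoder A
```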
Tetrapods use their neck to move the head three-dimensionally, relative to the body and limbs. Fish lack this anatomical neck, yet during feeding many species elevate (dorsally rotate) the head relative to the body. Cranial elevation is hypothesized to result from the craniovertebral and cranial-most intervertebral joints acting as a neck, by dorsally rotating (extending). However, this has never been tested due to the difficulty of visualizing and measuring vertebral motion in vivo. I used X-ray reconstruction of moving morphology to measure three-dimensional vertebral kinematics in rainbow trout (Oncorhynchus mykiss) and Commerson's frogfish (Antennarius commerson) during feeding. Despite dramatically different morphologies, in both species dorsoventral rotations extended far beyond the craniovertebral and cranial intervertebral joints. Trout combine small (mostly less than 3°) dorsal rotations over up to a third of their intervertebral joints to elevate the neurocranium. Frogfish use extremely large (often 20–30°) rotations of the craniovertebral and first intervertebral joint, but smaller rotations occurred across two-thirds of the vertebral column during cranial elevation. Unlike tetrapods, fish rotate large regions of the vertebral column to rotate the head. This suggests both cranial and more caudal vertebrae should be considered to understand how non-tetrapods control motion at the head–body interface.
High-dimensional statistical learning (HDSL) has wide applications in data analysis, operations research, and decision making. Despite the availability of multiple theoretical frameworks, most existing HDSL schemes stipulate two conditions: (a) sparsity and (b) restricted strong convexity (RSC). This paper generalizes both conditions via the use of the folded concave penalty (FCP). More specifically, we consider an M-estimation problem where (i) (conventional) sparsity is relaxed into approximate sparsity and (ii) RSC is completely absent. We show that FCP-based regularization leads to poly-logarithmic sample complexity: the training data size is only required to be poly-logarithmic in the problem dimensionality. This finding facilitates the analysis of two important classes of models that are currently less understood: high-dimensional nonsmooth learning and (deep) neural networks (NNs). For both problems, we show that poly-logarithmic sample complexity can be maintained. In particular, our results indicate that the generalizability of NNs under overparameterization can be theoretically ensured with the aid of regularization.
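For reference, SCAD is a standard member of the folded concave penalty family; the closed form below is from the general literature, not quoted from the paper, and the parameter values are the conventional defaults:

```python
import numpy as np

def scad_penalty(t, lam=1.0, a=3.7):
    """SCAD, a standard folded concave penalty: linear (lam * |t|) near
    zero, a concave quadratic in between, and constant beyond a * lam."""
    t = np.abs(t)
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    flat = lam**2 * (a + 1) / 2
    return np.where(t <= lam, lam * t, np.where(t <= a * lam, quad, flat))
```

Unlike the l1 penalty, the flat tail leaves large coefficients essentially unpenalized, the bias-reduction property folded concave penalties are known for.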