Title: Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R
Analyzing "large p, small n" data is increasingly important in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis (PDA) index, built upon the Linear Discriminant Analysis (LDA) index, was devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet little information is available about its performance relative to the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies comparing the performance of the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of the individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.
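The classPP implementation is in R; as an illustrative sketch in Python (NumPy), an LDA-type projection pursuit index with a PDA-style ridge penalty can be written as below. The penalty form `(1 - lam) * W + lam * n * I` and the helper names `scatter_matrices` and `pda_index` are assumptions for illustration, not the exact formulation of Lee and Cook (2010).

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-group (W) and between-group (B) sum-of-squares matrices."""
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        D = Xc - mc
        W += D.T @ D                                  # within-class spread
        d = (mc - grand_mean).reshape(-1, 1)
        B += Xc.shape[0] * (d @ d.T)                  # between-class spread
    return W, B

def pda_index(A, X, y, lam=0.0):
    """Projection index 1 - |A'W_lam A| / |A'(W_lam + B) A| for a p x k
    projection matrix A; lam > 0 shrinks W toward the identity so the
    index stays stable when p greatly exceeds n (assumed penalty form)."""
    W, B = scatter_matrices(X, y)
    n, p = X.shape
    W_lam = (1 - lam) * W + lam * n * np.eye(p)
    num = np.linalg.det(A.T @ W_lam @ A)
    den = np.linalg.det(A.T @ (W_lam + B) @ A)
    return 1.0 - num / den
```

A projection that separates the classes well drives the index toward 1, while a projection along a direction with no class separation drives it toward 0; projection pursuit then searches over A to maximize the index.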
Award ID(s): 2013486; 1712418
PAR ID: 10454058
Author(s) / Creator(s):
Date Published:
Journal Name: Journal of Data Science
Volume: 21
Issue: 2
ISSN: 1680-743X
Page Range / eLocation ID: 310 to 332
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. While linear discriminant analysis (LDA) is a widely used classification method, it is highly affected by outliers, which commonly occur in real datasets. Several robust LDA methods have therefore been proposed. However, they either rely on robust estimation of the sample means and covariance matrix, which may have noninvertible Hessians, or can only handle binary classes or low-dimensional cases. The proposed robust discriminant analysis is a multi-directional projection-pursuit approach that can classify multiple classes without estimating the covariance or Hessian matrix and works in high-dimensional cases. The weight function effectively gives smaller weights to points more deviant from the class center. The discriminant vectors and scoring vectors are solved by the proposed iterative algorithm. It inherits the good properties of the weight function and multi-directional projection pursuit, reducing the influence of outliers on the estimated discriminant directions and producing robust classification that is less sensitive to outliers. We show that when the weight function is appropriately chosen, the influence function is bounded and the discriminant vectors and scoring vectors are both consistent as the percentage of outliers goes to zero. The experimental results show that the robust optimal scoring discriminant analysis is effective and efficient. 
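The abstract above does not specify the weight function; a Huber-style taper based on distance to a robust class center is one common choice, sketched here as an assumption (the function name `robust_class_weights` and the cutoff `c` are illustrative).

```python
import numpy as np

def robust_class_weights(X, y, c=2.0):
    """Downweight points far from their class center (Huber-style taper).
    Distances are scaled by the class-wise median distance, so weights are
    1 for typical points and shrink toward 0 for deviant ones."""
    w = np.ones(len(X))
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        center = np.median(X[idx], axis=0)            # robust class center
        d = np.linalg.norm(X[idx] - center, axis=1)
        scale = np.median(d) + 1e-12                  # robust scale
        r = d / scale
        w[idx] = np.where(r <= c, 1.0, c / r)         # taper beyond cutoff c
    return w
```

Plugging such weights into the projection-pursuit objective bounds each point's influence, which is what yields the bounded influence function claimed in the abstract.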
  2. The problem of classifying multiple categorical responses is fundamental in modern machine learning and statistics, with diverse applications in fields such as bioinformatics and imaging. This manuscript investigates linear discriminant analysis (LDA) with high-dimensional predictors and multiple multi-class responses. Specifically, we first examine two different classification scenarios under the bivariate LDA model: joint classification of the two responses and conditional classification of one response while observing the other. To achieve optimal classification rules for both scenarios, we introduce two novel tensor formulations of the discriminant coefficients and corresponding regularization strategies. For joint classification, we propose an overlapping group lasso penalty and a blockwise coordinate descent algorithm to efficiently compute the joint discriminant coefficient tensors. For conditional classification, we utilize an alternating direction method of multipliers (ADMM) algorithm to compute the discriminant coefficient tensors under new constraints. We then extend our method and algorithms to general multivariate responses. Finally, we validate the effectiveness of our approach through simulation studies and applications to benchmark datasets. 
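In group-lasso-penalized problems like the one above, each blockwise coordinate-descent or ADMM subproblem typically reduces to the standard group-lasso proximal operator (blockwise soft-thresholding). This is that generic operator, not the paper's exact tensor update:

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2: shrinks the whole group v
    toward zero and zeroes it out entirely when ||v||_2 <= lam, which
    is how the group lasso selects or drops groups as a unit."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v
```

The all-or-nothing behavior at the group level is what lets the penalty zero out entire slices of the discriminant coefficient tensor.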
  3. Low-dimensional discriminative representations enhance machine learning methods in both performance and complexity. This has motivated supervised dimensionality reduction (DR), which transforms high-dimensional data into a discriminative subspace. Most DR methods require data to be i.i.d. However, in some domains, data naturally appear in sequences, where the observations are temporally correlated. We propose a DR method, namely, latent temporal linear discriminant analysis (LT-LDA), to learn low-dimensional temporal representations. We construct the separability among sequence classes by lifting the holistic temporal structures, which are established based on temporal alignments and may change in different subspaces. We jointly learn the subspace and the associated latent alignments by optimizing an objective that favors easily separable temporal structures. We show that this objective is connected to the inference of alignments and thus allows for an iterative solution. We provide both theoretical insight and empirical evaluations on several real-world sequence datasets to show the applicability of our method. 
  4. Multilinear discriminant analysis (MLDA), a novel approach based upon recent developments in tensor-tensor decomposition, was recently proposed and has shown better performance than traditional matrix linear discriminant analysis (LDA). The current paper presents a nonlinear generalization of MLDA (referred to as KMLDA) by extending the well-known "kernel trick" to multilinear data. The approach proceeds by defining a new dot product based on new tensor operators for third-order tensors. Experimental results on the ORL, extended Yale B, and COIL-100 datasets demonstrate that performing MLDA in feature space provides more class separability. It is also shown that the proposed KMLDA approach performs better than the Tucker-based discriminant analysis methods in terms of image classification. 
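The paper's novelty is a tensor dot product; the kernel trick itself is easiest to see in the ordinary vector case, sketched below: a Gram matrix gives feature-space inner products without ever forming the feature map, and centering in feature space is done directly on the Gram matrix.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2): inner
    products in an implicit feature space, computed without an
    explicit feature map."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_gram(K):
    """Center the implicit features: K <- HKH with H = I - 11'/n,
    the feature-space analogue of subtracting the mean."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H
```

Kernelized discriminant methods then optimize over coefficients of the training points, touching the data only through K.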
  5. We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper corresponds to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower-dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive–regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high-dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set. 
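RR-PSO is a specific variant; the underlying particle swarm mechanics are the standard velocity update below, where each velocity blends inertia, attraction to the particle's own best position, and attraction to the swarm's best. This plain global-best PSO sketch (function name and parameter defaults are illustrative) is not the paper's RR-PSO:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO: v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # initial positions
    v = np.zeros_like(x)
    pbest = x.copy()                                  # per-particle bests
    pbest_f = np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                # swarm best
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better] = x[better]
        pbest_f[better] = fx[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()
```

In the pipeline above, f would be a scoring function over coordinates in the LDA- or SVD-reduced subspace, which keeps dim small enough for the swarm to explore effectively.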