We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p≫n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule that is obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher’s discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher’s discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal.
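As a rough illustration of the idea of sparsifying a discriminant vector, the sketch below computes an ordinary two-class Fisher direction and soft-thresholds it. This is a minimal sketch of the general idea, not the paper's minorization–maximization algorithm; the function names, the penalty weight `lam`, and the ridge term `eps` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_fisher_direction(X, y, lam=0.3, eps=1e-3):
    """Illustrative sparse discriminant vector for classes coded 0/1.

    Computes the ordinary Fisher direction from a ridge-regularized
    within-class covariance, then soft-thresholds it (lam is relative
    to the largest coefficient) and rescales to unit norm.
    """
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    Sw = Xc.T @ Xc / len(X) + eps * np.eye(X.shape[1])  # regularized within-class cov
    beta = np.linalg.solve(Sw, mu1 - mu0)               # ordinary Fisher direction
    beta = soft_threshold(beta, lam * np.max(np.abs(beta)))
    return beta / np.linalg.norm(beta)

rng = np.random.default_rng(0)
n, p = 40, 10
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :2] += 2.0        # only the first two features are informative
w = sparse_fisher_direction(X, y)
print(np.nonzero(w)[0])     # surviving (nonzero) coordinates
```

The soft-thresholding step typically zeroes out most of the noise coordinates while keeping the informative ones, which is the interpretability gain the abstract refers to.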
- NSF-PAR ID: 10401174
- Publisher / Repository: Oxford University Press
- Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
- Volume: 73
- Issue: 5
- ISSN: 1369-7412
- Pages: 753-772
- Sponsoring Org: National Science Foundation
More Like this
-
While linear discriminant analysis (LDA) is a widely used classification method, it is highly affected by outliers, which commonly occur in real datasets. Several robust LDA methods have therefore been proposed, but they either rely on robust estimates of the sample means and covariance matrix, which may have noninvertible Hessians, or can handle only binary classes or low dimensional cases. The proposed robust discriminant analysis is a multi-directional projection-pursuit approach that classifies multiple classes without estimating the covariance or Hessian matrix and works in high dimensional cases. The weight function gives smaller weights to points that deviate more from their class center. The discriminant vectors and scoring vectors are solved by the proposed iterative algorithm, which inherits the good properties of the weight function and multi-directional projection pursuit: the influence of outliers on the estimated discriminant directions is reduced, producing classification that is less sensitive to outliers. We show that when the weight function is appropriately chosen, the influence function is bounded, and the discriminant vectors and scoring vectors are both consistent as the percentage of outliers goes to zero. Experimental results show that the robust optimal scoring discriminant analysis is effective and efficient.
-
Fisher’s Linear Discriminant Analysis (FLDA) is a statistical analysis method that linearly embeds data points to a lower dimensional space to maximize a discrimination criterion such that the variance between classes is maximized while the variance within classes is minimized. We introduce a natural extension of FLDA that employs neural networks, called Neural Fisher Discriminant Analysis (NFDA). This method finds the optimal two-layer neural network that embeds data points to optimize the same discrimination criterion. We use tools from convex optimization to transform the optimal neural network embedding problem into a convex problem. The resulting problem is easy to interpret and solve to global optimality. We evaluate the method’s performance on synthetic and real datasets.
-
For the pulping process in a pulp and paper plant that uses wood as a raw material, it is important to have real-time knowledge of the moisture content of the woodchips so that the process can be optimized and/or controlled to achieve satisfactory product quality while minimizing the consumption of energy and chemicals. Both destructive and non-destructive methods have been developed for estimating moisture content in woodchips, but these methods are often lab-based and cannot be implemented online, or are too fragile to withstand the harsh manufacturing environment. To address these limitations, we propose a non-destructive and economical approach based on 5 GHz Wi-Fi that uses channel state information (CSI) to estimate the moisture content of woodchips. In addition, we propose statistics pattern analysis (SPA) to extract features from raw CSI data of amplitude and phase difference. The extracted features are then used to build classification models using linear discriminant analysis (LDA) and subspace discriminant (SD) classification. The woodchip moisture classification results are validated using the oven drying method.
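The pipeline shape described above (summary-statistics features extracted from raw signal windows, followed by an LDA classifier) can be sketched roughly as follows. The CSI measurements are not public, so synthetic amplitude windows stand in for real data, and the feature set, window sizes, and class labels are invented for illustration.

```python
import numpy as np

def spa_features(window):
    """Simple statistics-pattern features (mean, std, skewness) for one
    amplitude window; a stand-in for the paper's SPA feature extraction."""
    m = window.mean()
    s = window.std()
    skew = ((window - m) ** 3).mean() / (s ** 3 + 1e-12)
    return np.array([m, s, skew])

def fit_lda(F, y):
    """Two-class LDA: pooled within-class covariance and class means
    give a linear rule w @ x + b > 0 for class 1 (equal priors)."""
    mu0, mu1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    Fc = np.vstack([F[y == 0] - mu0, F[y == 1] - mu1])
    Sw = Fc.T @ Fc / len(F) + 1e-6 * np.eye(F.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

rng = np.random.default_rng(1)
# Hypothetical classes: "dry" chips vs "wet" chips, where higher moisture
# shifts both the mean and the spread of the synthetic amplitudes.
windows = [rng.normal(1.0, 0.2, 256) for _ in range(50)]   # "dry"
windows += [rng.normal(1.5, 0.4, 256) for _ in range(50)]  # "wet"
y = np.repeat([0, 1], 50)
F = np.array([spa_features(win) for win in windows])
w, b = fit_lda(F, y)
pred = (F @ w + b > 0).astype(int)
print("training accuracy:", (pred == y).mean())
```

On this cleanly separated synthetic data the features are strongly discriminative; real CSI data would of course require a held-out validation set, as the abstract's oven-drying validation suggests.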
-
For high dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra and accumulation of noise. Therefore, researchers proposed independence rules to circumvent the diverging spectra, and sparse independence rules to mitigate the issue of accumulation of noise. However, in biological applications, often a group of correlated genes is responsible for clinical outcomes, and the use of the covariance information can significantly reduce misclassification rates. In theory, the extent of such error rate reductions is unveiled by comparing the misclassification rates of the Fisher discriminant rule and the independence rule. To materialize the gain on the basis of finite samples, a regularized optimal affine discriminant (ROAD) is proposed. The ROAD selects an increasing number of features as the regularization relaxes. Further benefits can be achieved when a screening method is employed to narrow the feature pool before applying the ROAD method. An efficient constrained co-ordinate descent algorithm is also developed to solve the associated optimization problems. Sampling properties of oracle type are established. Simulation studies and real data analysis support our theoretical results and demonstrate the advantages of the new classification procedure under a variety of correlation structures. A delicate result on continuous piecewise linear solution paths for the ROAD optimization problem at the population level justifies the linear interpolation of the constrained co-ordinate descent algorithm.
-
Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis (PDA) index, built upon the Linear Discriminant Analysis (LDA) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies to compare the performance of the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.