-
Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a $$p \times k$$ matrix) is approximately sparse. We propose a method that presumes the $$p \times k$$ matrix becomes approximately sparse after a $$k \times k$$ rotation. The simplest version of the algorithm initializes with the leading $$k$$ principal components. Then, the principal components are rotated with a $$k \times k$$ orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not be a leading eigenvector, but can instead be a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids "deflation" and the multiple tuning parameters it requires. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence showing that, for the same level of sparsity, the proposed sparse PCA method is more stable and explains more variance than alternative methods. Through three applications---sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks---we demonstrate the modern usefulness of sparse PCA in exploring multivariate data.
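As a concrete illustration, here is a minimal numpy sketch of the three-step pipeline described above: leading principal components, then a $$k \times k$$ orthogonal rotation toward sparsity, then soft-thresholding. The choice of varimax as the sparsity-seeking rotation and the single fixed `threshold` parameter are assumptions of this sketch; the paper's exact rotation criterion and threshold selection may differ.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax: find an orthogonal R so that Phi @ R is near-sparse."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        u, s, vt = np.linalg.svd(
            Phi.T @ (Lam**3 - (gamma / p) * Lam @ np.diag((Lam**2).sum(axis=0)))
        )
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return Phi @ R

def sparse_pca(X, k, threshold):
    """Rotate-then-threshold sparse PCA: PCA -> orthogonal rotation -> soft-threshold."""
    # Step 1: leading k principal components of the centered data (a p x k basis).
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    V = Vt[:k].T
    # Step 2: k x k orthogonal rotation to make the basis approximately sparse.
    V_rot = varimax(V)
    # Step 3: soft-thresholding makes the rotated components exactly sparse.
    return np.sign(V_rot) * np.maximum(np.abs(V_rot) - threshold, 0.0)
```

For example, `sparse_pca(X, k=5, threshold=0.05)` returns a $$p \times 5$$ matrix whose columns are sparse mixtures of the leading principal components; because all $$k$$ components are rotated and thresholded jointly, no deflation step or per-component tuning parameter is needed.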
-
In the 1930s, psychologists began developing Multiple-Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors. In this form of factor analysis, the Varimax factor rotation redraws the axes through the multi-dimensional factors to make them sparse and thus more interpretable. Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant. Despite the controversy, factor rotations have remained widely popular among data analysts. Reversing nearly a century of statistical thinking on the topic, we show that the rotation makes the factors easier to interpret because Varimax performs statistical inference; in particular, principal component analysis (PCA) with a Varimax rotation provides a unified spectral estimation strategy for a broad class of semi-parametric factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation. In addition, we show that Thurstone's widely employed sparsity diagnostics implicitly assess a key leptokurtic condition that makes the axes statistically identifiable in these models. PCA with Varimax is fast, stable, and practical. Combined with Thurstone's straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
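To make the estimation strategy concrete, the following sketch applies PCA with a Varimax rotation to the adjacency matrix of a simulated Stochastic Blockmodel and reads block memberships off the rotated components. It reuses the `varimax` routine from the sparse PCA sketch above; the simulation sizes and edge probabilities are illustrative assumptions, not values from the paper.

```python
import numpy as np
# assumes varimax() from the sparse PCA sketch above is in scope

rng = np.random.default_rng(0)

# Simulate a 3-block Stochastic Blockmodel: within-block edges are more likely.
n, k = 300, 3
z = rng.integers(0, k, size=n)                  # latent block labels
B = np.full((k, k), 0.05) + 0.25 * np.eye(k)    # block connection probabilities
P = B[z][:, z]                                  # n x n edge probabilities
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric, no self-loops

# PCA step: leading k eigenvectors of the adjacency matrix.
vals, vecs = np.linalg.eigh(A)                  # eigenvalues in ascending order
U = vecs[:, -k:]

# Varimax step: rotate so each row loads mainly on one component.
U_rot = varimax(U)

# Under the blockmodel, rows of U_rot are approximately one-hot, so the
# largest entry (in absolute value) of each row recovers the block label.
z_hat = np.abs(U_rot).argmax(axis=1)
```

Up to a permutation of labels, `z_hat` should largely agree with the latent labels `z`, illustrating the sense in which the rotated components are statistically meaningful estimates rather than arbitrary axes.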