We perform a rigorous study of private matrix analysis when only the last 𝑊 updates to matrices are considered useful for analysis. We show that the existing framework for the non-private setting is not robust to the noise required for privacy. We then propose a framework that is robust to this noise and use it to give the first efficient 𝑜(𝑊)-space differentially private algorithms for spectral approximation, principal component analysis (PCA), multi-response linear regression, sparse PCA, and non-negative PCA. Prior to our work, no such result was known for sparse and non-negative differentially private PCA even in the static data setting. We also give a lower bound demonstrating the cost of privacy in the sliding window model.
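For context, the snippet below is a minimal sketch of a standard Gaussian-mechanism baseline for differentially private PCA in the static (non-sliding-window) setting, in the spirit of perturbing the Gram matrix as in the "Analyze Gauss" approach. It is not the sliding-window algorithm of the abstract, and it assumes neighboring datasets differ by adding or removing one row of ℓ2 norm at most 1.

```python
import numpy as np

def dp_pca(A, k, eps, delta, seed=None):
    """Top-k PCA of A under (eps, delta)-differential privacy via Gaussian noise
    added to the Gram matrix A^T A.

    Assumes neighboring datasets differ by adding/removing one row of l2-norm
    at most 1, so the Gram matrix has Frobenius sensitivity at most 1.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # Gaussian-mechanism noise scale
    E = np.triu(rng.normal(scale=sigma, size=(d, d)))   # noise on and above the diagonal
    E = E + np.triu(E, 1).T                              # mirror to keep the matrix symmetric
    noisy_gram = A.T @ A + E
    eigvals, eigvecs = np.linalg.eigh(noisy_gram)        # ascending eigenvalues
    return eigvecs[:, -k:][:, ::-1]                      # top-k approximate principal directions
```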
Sub-exponential time Sum-of-Squares lower bounds for Principal Components Analysis
Principal Components Analysis (PCA) is a dimension-reduction technique widely
used in machine learning and statistics. However, due to the dependence of the
principal components on all the dimensions, the components are notoriously hard to
interpret. Therefore, a variant known as sparse PCA is often preferred. Sparse PCA
learns principal components of the data but enforces that such components must
be sparse. This has applications in diverse fields such as computational biology
and image processing. It is well known that standard PCA will not recover sparse principal components, especially in high dimensions, so algorithms for sparse PCA are studied as a separate endeavor. Various
algorithms have been proposed for Sparse PCA over the years, but given how
fundamental it is for applications in science, the limits of efficient algorithms
are only partially understood. In this work, we study the limits of the powerful
Sum of Squares (SoS) family of algorithms for Sparse PCA. SoS algorithms have
recently revolutionized robust statistics, leading to breakthrough algorithms for
long-standing open problems in machine learning, such as optimally learning mixtures of Gaussians, robust clustering, and robust regression. Moreover, SoS is believed to be the optimal robust algorithm for many statistical problems. It is therefore plausible that, for sparse PCA, SoS can beat simpler algorithms such as diagonal
thresholding that have been traditionally used. In this work, we show that this is not
the case: we exhibit strong tradeoffs between the number of samples, the sparsity, and the ambient dimension under which SoS algorithms, even when allowed sub-exponential time, fail to optimally recover the component. Our results
are complemented by known algorithms in the literature, thereby painting an almost
complete picture of the behavior of efficient algorithms for sparse PCA. Since SoS
algorithms encapsulate many algorithmic techniques such as spectral or statistical
query algorithms, this solidifies the message that known algorithms are optimal for
sparse PCA. Moreover, our techniques are strong enough to obtain similar tradeoffs for Tensor PCA, another important higher-order variant of PCA with applications in topic modeling, video processing, and related areas.
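For context, diagonal thresholding, the simple baseline mentioned in the abstract, fits in a few lines. The sketch below assumes an n × p data matrix X drawn from a spiked covariance model and a known target sparsity k; it illustrates the baseline only, not the SoS lower-bound machinery.

```python
import numpy as np

def diagonal_thresholding_spca(X, k):
    """Estimate a k-sparse leading principal component by diagonal thresholding.

    Select the k coordinates with the largest sample variances, run PCA on the
    restricted k x k covariance submatrix, and embed the result back into R^p.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    support = np.argsort(np.diag(cov))[-k:]          # k coordinates with largest variance
    sub_cov = cov[np.ix_(support, support)]
    eigvals, eigvecs = np.linalg.eigh(sub_cov)
    v = np.zeros(p)
    v[support] = eigvecs[:, -1]                      # leading eigenvector on the support
    return v
```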
- Award ID(s): 2008920
- PAR ID: 10483244
- Publisher / Repository: MIT Press
- Date Published:
- Journal Name: Advances in Neural Information Processing Systems
- ISSN: 1049-5258
- Format(s): Medium: X
- Location: New Orleans, LA
- Sponsoring Org: National Science Foundation
More Like this
-
For the tensor PCA (principal component analysis) problem, we propose a new hierarchy of increasingly powerful algorithms with increasing runtime. Our hierarchy is analogous to the sum-of-squares (SOS) hierarchy but is instead inspired by statistical physics and related algorithms such as belief propagation and AMP (approximate message passing). Our level-ℓ algorithm can be thought of as a linearized message-passing algorithm that keeps track of ℓ-wise dependencies among the hidden variables. Specifically, our algorithms are spectral methods based on the Kikuchi Hessian, which generalizes the well-studied Bethe Hessian to the higher-order Kikuchi free energies. It is known that AMP, the flagship algorithm of statistical physics, has substantially worse performance than SOS for tensor PCA. In this work we ‘redeem’ the statistical physics approach by showing that our hierarchy gives a polynomial-time algorithm matching the performance of SOS. Our hierarchy also yields a continuum of subexponential-time algorithms, and we prove that these achieve the same (conjecturally optimal) tradeoff between runtime and statistical power as SOS. Our proofs are much simpler than prior work, and also apply to the related problem of refuting random k-XOR formulas. The results we present here apply to tensor PCA for tensors of all orders, and to k-XOR when k is even. Our methods suggest a new avenue for systematically obtaining optimal algorithms for Bayesian inference problems, and our results constitute a step toward unifying the statistical physics and sum-of-squares approaches to algorithm design.
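The Kikuchi Hessian itself is beyond a short snippet, but the simplest spectral baseline for the order-3 spiked tensor model (recovering v from T = λ·v⊗v⊗v + noise) illustrates what a spectral method for tensor PCA looks like: unfold the tensor into a matrix and take a top singular vector. The sketch below is only that baseline, not the level-ℓ Kikuchi method of the abstract.

```python
import numpy as np

def tensor_pca_unfolding(T):
    """Baseline spectral estimate of the planted vector in an order-3 spiked tensor:
    unfold the n x n x n tensor into an n x n^2 matrix and return its top left
    singular vector (defined up to sign)."""
    n = T.shape[0]
    M = T.reshape(n, n * n)                          # mode-1 unfolding
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, 0]

# Toy usage: planted spike plus Gaussian noise, with signal strength well above
# the ~n^(3/4) scale at which this unfolding baseline starts to succeed.
rng = np.random.default_rng(0)
n = 30
lam = 4.0 * n ** 0.75
v = rng.normal(size=n)
v /= np.linalg.norm(v)
T = lam * np.einsum("i,j,k->ijk", v, v, v) + rng.normal(size=(n, n, n))
v_hat = tensor_pca_unfolding(T)
print(abs(v @ v_hat))                                # correlation with the planted vector
```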
-
The idea of summarizing the information contained in a large number of variables by a small number of “factors” or “principal components” has been broadly adopted in statistics. This article introduces a generalization of the widely used principal component analysis (PCA) to nonlinear settings, thus providing a new tool for dimension reduction and exploratory data analysis or representation. The distinguishing features of the method include (i) the ability to always deliver truly independent (instead of merely uncorrelated) factors; (ii) the use of optimal transport theory and Brenier maps to obtain a robust and efficient computational algorithm; (iii) the use of a new multivariate additive entropy decomposition to determine the most informative principal nonlinear components; and (iv) formally nesting PCA as a special case for linear Gaussian factor models. We illustrate the method’s effectiveness in an application to excess bond returns prediction from a large number of macro factors. Supplementary materials for this article are available online.
-
Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a p × k matrix) is approximately sparse. We propose a method that presumes the p × k matrix becomes approximately sparse after a k × k rotation. The simplest version of the algorithm initializes with the leading k principal components. Then, the principal components are rotated with a k × k orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not be a leading eigenvector, but rather a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids “deflation” and the multiple tuning parameters it requires. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence showing that, for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods. Through three applications (sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks), we demonstrate the modern usefulness of sparse PCA in exploring multivariate data.
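A minimal sketch of the pipeline described above: compute the leading k principal components, apply an orthogonal rotation chosen to promote sparsity, then soft-threshold. Varimax is used here as one natural sparsifying rotation for concreteness; the abstract does not pin down the rotation criterion, so treat that choice and the fixed threshold as assumptions.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate the columns of Phi toward a sparse pattern (varimax)."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        u, s, vt = np.linalg.svd(
            Phi.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        )
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:   # stop when the criterion stabilizes
            break
    return Phi @ R

def rotated_sparse_pca(X, k, threshold):
    """Leading-k PCA, sparsifying orthogonal rotation, then soft-thresholding."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k].T                        # p x k matrix of leading PCs
    rotated = varimax(components)                # orthogonal rotation toward sparsity
    return np.sign(rotated) * np.maximum(np.abs(rotated) - threshold, 0.0)
```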
-
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA. We develop novel and simple optimization formulations for these problems such that any approximate stationary point of the associated optimization problem yields a near-optimal solution for the underlying robust estimation task. As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks. The obtained algorithms are simple, practical, and succeed under broader distributional assumptions compared to prior work.
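As an illustration of the "first-order method plus sparsity constraint" recipe (not the paper's specific formulation), the sketch below takes gradient steps on a Huber-style robust loss and hard-thresholds the iterate back to k-sparsity; the loss, step size, and thresholding rule are chosen only for concreteness.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Elementwise gradient of the Huber loss with respect to the residual r."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def robust_sparse_mean(X, k, delta=1.0, lr=0.5, iters=300):
    """k-sparse robust mean estimate via first-order steps plus hard thresholding."""
    n, p = X.shape
    mu = np.zeros(p)
    for _ in range(iters):
        grad = huber_grad(mu[None, :] - X, delta).mean(axis=0)
        mu = mu - lr * grad
        # project onto the set of k-sparse vectors (keep the k largest magnitudes)
        small = np.argsort(np.abs(mu))[: p - k]
        mu[small] = 0.0
    return mu
```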