Principal Components Analysis (PCA) is a dimension-reduction technique widely used in machine learning and statistics. However, because each principal component depends on all of the input dimensions, the components are notoriously hard to interpret. Therefore, a variant known as sparse PCA is often preferred. Sparse PCA learns principal components of the data but enforces that such components must be sparse. This has applications in diverse fields such as computational biology and image processing. To learn sparse principal components, it is well known that standard PCA will not work, especially in high dimensions, so algorithms for sparse PCA are often studied as a separate endeavor. Various algorithms have been proposed for sparse PCA over the years, but given how fundamental it is for applications in science, the limits of efficient algorithms are only partially understood. In this work, we study the limits of the powerful Sum of Squares (SoS) family of algorithms for sparse PCA. SoS algorithms have recently revolutionized robust statistics, leading to breakthrough algorithms for long-standing open problems in machine learning, such as optimally learning mixtures of Gaussians, robust clustering, and robust regression. Moreover, SoS is believed to yield optimal robust algorithms for many statistical problems. Therefore, for sparse PCA, it is plausible that SoS could beat simpler algorithms such as diagonal thresholding that have traditionally been used. In this work, we show that this is not the case, by exhibiting strong tradeoffs between the number of samples, the sparsity, and the ambient dimension under which SoS algorithms, even if allowed sub-exponential time, fail to optimally recover the component. Our results are complemented by known algorithms in the literature, thereby painting an almost complete picture of the behavior of efficient algorithms for sparse PCA.
Since SoS algorithms encapsulate many algorithmic techniques such as spectral or statistical query algorithms, this solidifies the message that known algorithms are optimal for sparse PCA. Moreover, our techniques are strong enough to obtain similar tradeoffs for Tensor PCA, another important higher order variant of PCA with applications in topic modeling, video processing, etc.
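For concreteness, the diagonal-thresholding baseline mentioned in the abstract can be sketched in a few lines: keep the k coordinates with the largest sample variance, then run ordinary PCA on the induced submatrix. This is a hedged illustration of the classical idea, not the paper's construction; the function name and spiked-data setup are our own illustrative choices.

```python
import numpy as np

def diagonal_thresholding(X, k):
    """Sketch of diagonal thresholding for sparse PCA.

    X: (n, d) data matrix, rows are samples (assumed roughly centered).
    k: assumed sparsity of the hidden component.
    Returns a unit vector supported on the k coordinates with the
    largest per-coordinate second moment.
    """
    n, d = X.shape
    variances = (X ** 2).mean(axis=0)      # per-coordinate second moments
    support = np.argsort(variances)[-k:]   # keep the k largest
    # Ordinary PCA restricted to the selected coordinates.
    cov = X[:, support].T @ X[:, support] / n
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = np.zeros(d)
    v[support] = eigvecs[:, -1]            # top eigenvector of the submatrix
    return v
```

On a planted spiked-covariance instance with a k-sparse spike, the recovered vector aligns closely with the planted component once the sample size is large enough.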
Compressing Latent Space via Least Volume
This paper introduces Least Volume, a simple yet effective regularization inspired by geometric intuition that can reduce the number of latent dimensions an autoencoder needs without requiring any prior knowledge of the dataset's intrinsic dimensionality. We show that the Lipschitz continuity of the decoder is the key to making it work, provide a proof that PCA is a linear special case of it, and reveal that it has a similar PCA-like importance-ordering effect when applied to nonlinear models. We demonstrate the intuition behind the regularization on pedagogical toy problems, and its effectiveness on several benchmark problems, including MNIST, CIFAR-10 and CelebA.
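As a rough illustration of the geometric idea (not the paper's exact objective), a volume-like penalty on a batch of latent codes can be taken as the geometric mean of the per-dimension standard deviations; minimizing it alongside the reconstruction loss encourages unneeded latent dimensions to collapse toward a constant. The `eps` smoothing and the function name below are illustrative assumptions.

```python
import numpy as np

def least_volume_penalty(Z, eps=1e-2):
    """Volume-like penalty on a batch of latent codes Z of shape
    (batch, latent_dim): the geometric mean of per-dimension standard
    deviations. A collapsed dimension (std ~ 0) shrinks the penalty,
    so the optimizer is rewarded for flattening unused dimensions.
    `eps` keeps the product well defined when a dimension has collapsed.
    """
    std = Z.std(axis=0)                          # spread of each latent dim
    return float(np.exp(np.log(std + eps).mean()))
```

In a training loop this term would be added, with a small weight, to the reconstruction loss, with the decoder kept Lipschitz (e.g. via spectral normalization) as the abstract emphasizes.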
- Award ID(s):
- 1943699
- PAR ID:
- 10651205
- Publisher / Repository:
- International Conference on Learning Representations
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
ℓ1 regularization is used to preserve edges or enforce sparsity in a solution to an inverse problem. We investigate the split Bregman and majorization-minimization iterative methods, which turn this nonsmooth minimization problem into a sequence of steps that each include solving an ℓ2-regularized minimization problem. We consider selecting the regularization parameter in the inner generalized Tikhonov regularization problems that occur at each iteration of these iterative methods. The generalized cross validation method and the χ² degrees-of-freedom test are extended to these inner problems. In particular, for the χ² test this includes extending the result to problems in which the regularization operator has more rows than columns, and showing how to use a weighted generalized inverse to estimate prior information at each inner iteration. Numerical experiments for image deblurring problems demonstrate that it is more effective to select the regularization parameter automatically within the iterative schemes than to keep it fixed for all iterations. Moreover, an appropriate regularization parameter can be estimated in the early iterations and then kept fixed until convergence.
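The generalized cross validation (GCV) step used for the inner Tikhonov problems can be sketched for the standard-form case via the SVD. This is a minimal illustration under our own simplifying assumptions (identity regularization operator, a fixed grid of candidate parameters), not the paper's extended inner-iteration version.

```python
import numpy as np

def gcv_lambda(A, b, lambdas):
    """Sketch of GCV for standard-form Tikhonov regularization,
    min ||Ax - b||^2 + lam^2 ||x||^2, using the SVD of A.
    Returns the lam from `lambdas` minimizing the GCV functional
    G(lam) = m ||r(lam)||^2 / trace(I - A A_lam)^2.
    """
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    best_lam, best_g = None, np.inf
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)                 # Tikhonov filter factors
        resid2 = np.sum(((1 - f) * beta)**2) + (b @ b - beta @ beta)
        g = m * resid2 / (m - f.sum())**2
        if g < best_g:
            best_lam, best_g = lam, g
    return best_lam
```

On an ill-conditioned test problem with small additive noise, the GCV choice gives a far more accurate reconstruction than an essentially unregularized solve.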
-
Abstract Tikhonov regularization is commonly used in the solution of linear discrete ill-posed problems. It is known that iterated Tikhonov regularization often produces approximate solutions of higher quality than (standard) Tikhonov regularization. This paper discusses iterated Tikhonov regularization for large-scale problems with a general regularization matrix. Specifically, the original problem is reduced to small size by application of a fairly small number of steps of the Arnoldi or Golub-Kahan processes, and iterated Tikhonov is applied to the reduced problem. The regularization parameter is determined by using an extension of a technique first described by Donatelli and Hanke for quite special coefficient matrices. Convergence of the method is established and computed examples illustrate its performance.
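The basic (stationary) iterated Tikhonov iteration that this abstract builds on can be written in a few lines for a small dense standard-form problem. This is a hedged sketch only; the paper itself works with a general regularization matrix and an Arnoldi or Golub-Kahan reduction, neither of which is shown here.

```python
import numpy as np

def iterated_tikhonov(A, b, lam, iters=5, x0=None):
    """Stationary iterated Tikhonov in standard form:
        x_{k+1} = x_k + (A^T A + lam I)^{-1} A^T (b - A x_k).
    Repeating the step reduces the over-smoothing bias that a single
    Tikhonov solve with the same lam would introduce.
    """
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else x0.copy()
    M = A.T @ A + lam * np.eye(n)
    for _ in range(iters):
        x += np.linalg.solve(M, A.T @ (b - A @ x))
    return x
```

With a deliberately large regularization parameter, a few iterations typically give a smaller reconstruction error than a single Tikhonov step, illustrating the "higher quality" claim in the abstract.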
-
Abstract In this article, we exploit the similarities between Tikhonov regularization and Bayesian hierarchical models to propose a regularization scheme that acts like a distributed Tikhonov regularization where the amount of regularization varies from component to component, and a computationally efficient numerical scheme that is suitable for large-scale problems. In the standard formulation, Tikhonov regularization compensates for the inherent ill-conditioning of linear inverse problems by augmenting the data fidelity term measuring the mismatch between the data and the model output with a scaled penalty functional. The selection of the scaling of the penalty functional is the core problem in Tikhonov regularization. If an estimate of the amount of noise in the data is available, a popular way is to use the Morozov discrepancy principle, stating that the scaling parameter should be chosen so as to guarantee that the norm of the data fitting error is approximately equal to the norm of the noise in the data. A too small value of the regularization parameter would yield a solution that fits to the noise (too weak regularization) while a too large value would lead to an excessive penalization of the solution (too strong regularization). In many applications, it would be preferable to apply distributed regularization, replacing the regularization scalar by a vector valued parameter, so as to allow different regularization for different components of the unknown, or for groups of them. Distributed Tikhonov-inspired regularization is particularly well suited when the data have significantly different sensitivity to different components, or to promote sparsity of the solution. The numerical scheme that we propose, while exploiting the Bayesian interpretation of the inverse problem and identifying the Tikhonov regularization with the maximum a posteriori estimation, requires no statistical tools. 
A clever combination of numerical linear algebra and numerical optimization tools makes the scheme computationally efficient and suitable for problems where the matrix is not explicitly available. Moreover, in the case of underdetermined problems, passing through the adjoint formulation in data space may lead to substantial reduction in computational complexity.
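The scalar Morozov discrepancy principle described in the abstract can be sketched for standard-form Tikhonov by bisecting on the regularization parameter, using the fact that the residual norm increases monotonically with it. This is only the classical single-parameter version, not the paper's distributed, component-wise scheme; the bracketing interval and iteration count below are illustrative choices.

```python
import numpy as np

def discrepancy_lambda(A, b, noise_norm, lo=1e-12, hi=1e2, iters=60):
    """Morozov's discrepancy principle for standard-form Tikhonov:
    bisect on log(lam) for the lam with ||A x(lam) - b|| ~ noise_norm,
    exploiting monotonicity of the residual norm in lam.
    """
    def resid(lam):
        x = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
        return np.linalg.norm(A @ x - b)

    for _ in range(iters):
        mid = np.sqrt(lo * hi)          # geometric midpoint on the log scale
        if resid(mid) < noise_norm:
            lo = mid                    # residual too small: under-regularized
        else:
            hi = mid
    return np.sqrt(lo * hi)
```

The returned parameter yields a residual norm matching the noise level, which is exactly the balance the abstract describes between fitting the noise and over-penalizing the solution.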
-
Abstract Discrete ill-posed inverse problems arise in various areas of science and engineering. The presence of noise in the data often makes it difficult to compute an accurate approximate solution. To reduce the sensitivity of the computed solution to the noise, one replaces the original problem by a nearby well-posed minimization problem, whose solution is less sensitive to the noise in the data than the solution of the original problem. This replacement is known as regularization. We consider the situation when the minimization problem consists of a fidelity term, defined in terms of a p-norm, and a regularization term, defined in terms of a q-norm. We allow 0 < p, q ≤ 2. The relative importance of the fidelity and regularization terms is determined by a regularization parameter. This paper develops an automatic strategy for determining the regularization parameter for these minimization problems. The proposed approach is based on a new application of generalized cross validation. Computed examples illustrate the performance of the method proposed.
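A standard way to minimize such p-norm fidelity plus q-norm penalty objectives is iteratively reweighted least squares (IRLS), which replaces the nonsmooth terms by weighted quadratics and re-solves. The sketch below is a generic IRLS illustration under our own assumptions (fixed regularization parameter, small smoothing constant); the paper's contribution, choosing the parameter by generalized cross validation, is not reproduced here.

```python
import numpy as np

def lp_lq_irls(A, b, mu, p=1.0, q=1.0, iters=30, eps=1e-6):
    """IRLS sketch for min ||Ax - b||_p^p + mu ||x||_q^q, 0 < p, q <= 2.
    Each step solves a weighted least-squares problem with weights
    |r|^(p-2) on the residual and |x|^(q-2) on the solution, smoothed
    by eps to keep the weights bounded near zero.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]     # least-squares start
    for _ in range(iters):
        r = A @ x - b
        wr = (r**2 + eps) ** (p / 2 - 1)         # fidelity weights
        wx = (x**2 + eps) ** (q / 2 - 1)         # penalty weights
        M = A.T @ (wr[:, None] * A) + mu * np.diag(wx)
        x = np.linalg.solve(M, A.T @ (wr * b))
    return x
```

With p = 2 and q = 1 this reduces to a smoothed ℓ2-ℓ1 scheme, and on an underdetermined problem with a sparse ground truth it both lowers the penalized objective relative to the least-squares start and recovers the sparse solution.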

