Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a $$p \times k$$ matrix) is approximately sparse. We propose a method that presumes the $$p \times k$$ matrix becomes approximately sparse after a $$k \times k$$ rotation. The simplest version of the algorithm initializes with the leading $$k$$ principal components. Then, the principal components are rotated with an $$k \times k$$ orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not to be a leading eigenvector, but rather a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids ``deflation'' and multiple tuning parameters required for that. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence showing that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods. Through three applications---sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks, we demonstrate the modern usefulness of sparse PCA in exploring multivariate data. 
                        more » 
                        « less   
                    
                            
                            Black-Box k-to-1-PCA Reductions: Theory and Applications
                        
                    
    
            The k-principal component analysis (k-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of k-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing k-PCA algorithms, where we model access to the unknown target matrix via a black-box 1-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to k-PCA algorithm design, such black-box methods, which recursively call a 1-PCA oracle k times, were previously poorly-understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for k-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, k-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant k. We apply our framework to obtain state-of-the-art k-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a 𝗉𝗈𝗅𝗒(k) factor. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2505865
- PAR ID:
- 10631872
- Publisher / Repository:
- https://doi.org/10.48550/arXiv.2403.03905
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            In this paper, we address clustering problems in scenarios where accurate direct access to the full dataset is impractical or impossible. Instead, we leverage oracle-based methods, which are particularly valuable in real-world applications where the data may be noisy, restricted due to privacy concerns or sheer volume. We utilize two oracles, the quadruplet and the distance oracle. The quadruplet oracle is a weaker oracle that only approximately compares the distances of two pairs of vertices. In practice, these oracles can be implemented using crowdsourcing or training classifiers or other predictive models. On the other hand, the distance oracle returns exactly the distance of two vertices, so it is a stronger and more expensive oracle to implement. We consider two noise models for the quadruplet oracle. In the adversarial noise model, if two pairs have similar distances, the response is chosen by an adversary. In the probabilistic noise model, the pair with the smaller distance is returned with a constant probability. We consider a set V of n vertices in a metric space that supports the quadruplet and the distance oracle. For each of the k-center, k-median, and k-means clustering problem on V, we design constant approximation algorithms that perform roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle in both noise models. When the dataset has low intrinsic dimension, we significantly improve the approximation factors of our algorithms by performing a few additional calls to the distance oracle. We also show that for k-median and k-means clustering there is no hope to return any sublinear approximation using only the quadruplet oracle. Finally, we give constant approximation algorithms for estimating the clustering cost induced by any set of k vertices, performing roughly O(nk) calls to the quadruplet oracle and O(k^2) calls to the distance oracle.more » « less
- 
            We study algorithms for approximating the spectral density (i.e., the eigenvalue distribution) of a symmetric matrix A ∈ ℝn×n that is accessed through matrix-vector product queries. Recent work has analyzed popular Krylov subspace methods for this problem, showing that they output an ∈ · || A||2 error approximation to the spectral density in the Wasserstein-1 metric using O (1/∈ ) matrix-vector products. By combining a previously studied Chebyshev polynomial moment matching method with a deflation step that approximately projects off the largest magnitude eigendirections of A before estimating the spectral density, we give an improved error bound of ∈ · σℓ (A) using O (ℓ log n + 1/∈ ) matrix-vector products, where σℓ (A) is the ℓth largest singular value of A. In the common case when A exhibits fast singular value decay and so σℓ (A) « ||A||2, our bound can be much stronger than prior work. We also show that it is nearly tight: any algorithm giving error ∈ · σℓ (A) must use Ω(ℓ + 1/∈ ) matrix-vector products. We further show that the popular Stochastic Lanczos Quadrature (SLQ) method essentially matches the above bound for any choice of parameter ℓ, even though SLQ itself is parameter-free and performs no explicit deflation. Our bound helps to explain the strong practical performance and observed ‘spectrum adaptive’ nature of SLQ, and motivates a simple variant of the method that achieves an even tighter error bound. Technically, our results require a careful analysis of how eigenvalues and eigenvectors are approximated by (block) Krylov subspace methods, which may be of independent interest. Our error bound for SLQ leverages an analysis of the method that views it as an implicit polynomial moment matching method, along with recent results on low-rank approximation with single-vector Krylov methods. We use these results to show that the method can perform ‘implicit deflation’ as part of moment matching.more » « less
- 
            Ta-Shma, Amnon (Ed.)We study the problem of finding a Tarski fixed point over the k-dimensional grid [n]^k. We give a black-box reduction from the Tarski problem to the same problem with an additional promise that the input function has a unique fixed point. It implies that the Tarski problem and the unique Tarski problem have exactly the same query complexity. Our reduction is based on a novel notion of partial-information functions which we use to fool algorithms for the unique Tarski problem as if they were working on a monotone function with a unique fixed point.more » « less
- 
            Megow, Nicole; Smith, Adam (Ed.)We study the question of when an approximate search optimization problem is harder than the associated decision problem. Specifically, we study a natural and quite general model of black-box search-to-decision reductions, which we call branch-and-bound reductions (in analogy with branch-and-bound algorithms). In this model, an algorithm attempts to minimize (or maximize) a function f: D → ℝ_{≥ 0} by making oracle queries to h_f : T → ℝ_{≥ 0} satisfying min_{x ∈ S} f(x) ≤ h_f(S) ≤ γ ⋅ min_{x ∈ S} f(x) (*) for some γ ≥ 1 and any subset S in some allowed class of subsets T; of the domain D. (When the goal is to maximize f, h_f instead yields an approximation to the maximal value of f over S.) We show tight upper and lower bounds on the number of queries q needed to find even a \gamma'-approximate minimizer (or maximizer) for quite large \gamma'; in a number of interesting settings, as follows. - For arbitrary functions f : {0,1}ⁿ → ℝ_{≥ 0}, where T; contains all subsets of the domain, we show that no branch-and-bound reduction can achieve γ' ≲ γ^{n/log q}, while a simple greedy approach achieves essentially γ^{n/log q}. - For a large class of MAX-CSPs, where T = {S_w} contains each set of assignments to the variables induced by a partial assignment w, we show that no branch-and-bound reduction can do significantly better than essentially a random guess, even when the oracle h_f guarantees an approximation factor of γ ≈ 1+√{log(q)/n}. - For the Traveling Salesperson Problem (TSP), where T = {S_p} contains each set of tours extending a path p, we show that no branch-and-bound reduction can achieve γ' ≲ (γ-1) n/log q. We also prove a nearly matching upper bound in our model. These results show an oracle model in which approximate search and decision are strongly separated. (In particular, our result for TSP can be viewed as a negative answer to a question posed by Bellare and Goldwasser (SIAM J. Comput. 1994), though only in an oracle model.) We also note two alternative interpretations of our results. First, if we view h_f as a data structure, then our results unconditionally rule out black-box search-to-decision reductions for certain data structure problems. Second, if we view h_f as an efficiently computable heuristic, then our results show that any reasonably efficient branch-and-bound algorithm requires more guarantees from its heuristic than simply Eq. (*). Behind our results is a ``useless oracle lemma'' which allows us to argue that under certain conditions the oracle h_f is ``useless'' and which might be of independent interest.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    