

Title: Ensemble-based estimates of eigenvector error for empirical covariance matrices
Abstract Covariance matrices are fundamental to the analysis and forecast of economic, physical and biological systems. Although the eigenvalues $\{\lambda _i\}$ and eigenvectors $\{\boldsymbol{u}_i\}$ of a covariance matrix are central to such endeavours, in practice one must inevitably approximate the covariance matrix from data with finite sample size $n$ to obtain empirical eigenvalues $\{\tilde{\lambda }_i\}$ and eigenvectors $\{\tilde{\boldsymbol{u}}_i\}$, so understanding the error thereby introduced is of central importance. We analyse the eigenvector error $\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2$ under the assumption that the true covariance matrix, of size $p$, is drawn from a matrix ensemble with known spectral properties; in particular, we assume that the distribution of population eigenvalues converges weakly, as $p\to \infty $, to a spectral density $\rho (\lambda )$ and that the spacing between population eigenvalues is similar to that of the Gaussian orthogonal ensemble. Our approach complements previous analyses of eigenvector error that require the full set of eigenvalues to be known, which can be computationally infeasible when $p$ is large. To provide a scalable approach to uncertainty quantification of eigenvector error, we consider a fixed eigenvalue $\lambda $ and approximate the distribution of the expected square error $r= \mathbb{E}\left [\| \boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2\right ]$ across the matrix ensemble for all $\boldsymbol{u}_i$ associated with $\lambda _i=\lambda $. We find, for example, that for sufficiently large matrix size $p$ and sample size $n> p$, the probability density of $r$ scales as $1/(nr^2)$. This power-law scaling implies that the eigenvector error is extremely heterogeneous: even if $r$ is very small for most eigenvectors, it can be large for others with non-negligible probability. We support these and further results with numerical experiments.
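The heterogeneity described above is straightforward to probe numerically. Below is a minimal Monte Carlo sketch (an illustration, not the paper's code): it fixes a population covariance with an assumed log-normal spectrum, estimates $r_i = \mathbb{E}\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i\|^2$ by averaging over repeated samples, and reports how widely $r_i$ varies across eigenvectors. The dimensions, sample size, trial count and spectrum are all illustrative assumptions.

```python
# Minimal Monte Carlo sketch of eigenvector error (illustrative assumptions
# throughout: p, n, the trial count and the log-normal population spectrum).
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 40, 160, 300                       # matrix size p, sample size n > p

lam = np.sort(np.exp(rng.standard_normal(p)))[::-1]  # assumed population eigenvalues
U = np.linalg.qr(rng.standard_normal((p, p)))[0]     # random orthonormal eigenvectors
Sigma = (U * lam) @ U.T                              # population covariance

r = np.zeros(p)                                   # Monte Carlo estimate of E||u_i - u~_i||^2
for _ in range(trials):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    _, V = np.linalg.eigh(np.cov(X, rowvar=False))
    V = V[:, ::-1]                                # descending order, matching lam
    signs = np.sign(np.einsum("ij,ij->j", U, V))  # resolve each eigenvector's sign
    r += np.sum((U - V * signs) ** 2, axis=0) / trials

print(f"min r = {r.min():.2e}, median r = {np.median(r):.2e}, max r = {r.max():.2e}")
```

The gap between the median and the maximum of $r$ gives a direct sense of how heterogeneous the error can be across eigenvectors, particularly where population eigenvalues are closely spaced.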
Award ID(s):
1815971
NSF-PAR ID:
10148219
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
8
Issue:
2
ISSN:
2049-8772
Page Range / eLocation ID:
289 to 312
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Throughout many scientific and engineering fields, including control theory, quantum mechanics, advanced dynamics, and network theory, a great many important applications rely on the spectral decomposition of matrices. Traditional methods such as the power iteration method, the Jacobi eigenvalue method, and QR decomposition are commonly used to compute the eigenvalues and eigenvectors of a square, symmetric matrix. However, these methods suffer from certain drawbacks: in particular, the power iteration method can find only the leading eigenpair (i.e., the largest eigenvalue and its corresponding eigenvector), while the Jacobi and QR decomposition methods face significant performance limitations when applied to large-scale matrices. Typically, even producing approximate eigenpairs of a general square matrix requires at least O(N^3) time, where N is the number of rows of the matrix. In this work, we exploit newly developed memristor technology to propose a low-complexity, scalable memristor-based method for deriving a set of dominant eigenvalues and eigenvectors of real symmetric non-negative matrices. The time complexity of our proposed algorithm is O(N^2/Δ), where Δ governs the accuracy. We present experimental studies simulating the memristor-based algorithm, with results demonstrating that the average error of our method is within 4%, while its performance is up to 1.78X better than that of traditional methods.
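For contrast with the proposed memristor method, here is a sketch of the conventional software route: power iteration extended with Hotelling deflation so that it returns the top-k dominant eigenpairs rather than only the leading one. The tolerance, iteration cap and example matrix are illustrative choices, not values from the paper.

```python
# Power iteration + Hotelling deflation: a conventional baseline for a few
# dominant eigenpairs of a real symmetric matrix. Tolerances are examples.
import numpy as np

def dominant_eigenpairs(A, k, tol=1e-10, max_iter=10_000):
    """Top-k largest-magnitude eigenpairs of symmetric A."""
    A = np.array(A, dtype=float)           # work on a copy; it gets deflated
    rng = np.random.default_rng(0)
    pairs = []
    for _ in range(k):
        v = rng.standard_normal(A.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(max_iter):
            w = A @ v
            norm_w = np.linalg.norm(w)
            if norm_w == 0.0:              # v lies in the kernel of deflated A
                break
            w /= norm_w
            done = min(np.linalg.norm(w - v), np.linalg.norm(w + v)) < tol
            v = w
            if done:
                break
        lam = v @ A @ v                    # Rayleigh quotient
        pairs.append((lam, v))
        A = A - lam * np.outer(v, v)       # deflate the converged eigenpair
    return pairs

M = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
for lam, v in dominant_eigenpairs(M, 2):
    print(f"lambda = {lam:.4f}, residual = {np.linalg.norm(M @ v - lam * v):.2e}")
```

Each iteration is dominated by an O(N^2) matrix-vector product, which is the operation memristor crossbars are typically used to accelerate in analogue hardware.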
  2. Abstract

    We study the problem of estimating a $k$-sparse signal ${\boldsymbol \beta }_{0}\in{\mathbb{R}}^{p}$ from a set of noisy observations $\mathbf{y}\in{\mathbb{R}}^{n}$ under the model $\mathbf{y}=\mathbf{X}{\boldsymbol \beta }_{0}+\mathbf{w}$, where $\mathbf{X}\in{\mathbb{R}}^{n\times p}$ is the measurement matrix whose rows are drawn i.i.d. from the distribution $N(0,{\boldsymbol \varSigma })$. We consider the class of $L_{q}$-regularized least squares (LQLS) estimators given by the formulation $\hat{{\boldsymbol \beta }}(\lambda )=\text{argmin}_{{\boldsymbol \beta }\in{\mathbb{R}}^{p}}\frac{1}{2}\|\mathbf{y}-\mathbf{X}{\boldsymbol \beta }\|^{2}_{2}+\lambda \|{\boldsymbol \beta }\|_{q}^{q}$, where $\|\cdot \|_{q}$ $(0\le q\le 2)$ denotes the $L_{q}$-norm. In the setting $p,n,k\rightarrow \infty $ with fixed $k/p=\epsilon $ and $n/p=\delta $, we derive the asymptotic risk of $\hat{{\boldsymbol \beta }}(\lambda )$ for an arbitrary covariance matrix ${\boldsymbol \varSigma }$, generalizing existing results for the standard Gaussian design, i.e. $X_{ij}\overset{i.i.d}{\sim }N(0,1)$; the results are derived using the (non-rigorous) replica method. We perform a higher-order analysis of LQLS in the small-error regime, in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of ${\boldsymbol \varSigma }$ in the cases $0\le q\lt 1$ and $1\lt q\le 2$, which indicates that the correlations among predictors affect the phase transition curve only in the case $q=1$, i.e. the LASSO. To study the influence of the covariance structure of ${\boldsymbol \varSigma }$ on the performance of LQLS in the cases $0\le q\lt 1$ and $1\lt q\le 2$, we derive explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.
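As a concrete anchor for the LQLS formulation, here is a small sketch of the $q=1$ special case (the LASSO), the one case where the covariance structure is found to affect the phase transition. It minimizes the stated objective with the standard ISTA proximal-gradient iteration; the nonconvex cases $0\le q<1$ need different machinery and are not attempted. The problem sizes, noise level and $\lambda$ are illustrative assumptions.

```python
# ISTA for the q = 1 instance of the LQLS objective from the abstract.
# Sizes, noise level and lambda are arbitrary illustrative choices.
import numpy as np

def lqls_objective(beta, y, X, lam, q):
    """0.5 ||y - X beta||_2^2 + lam ||beta||_q^q  (for 0 < q <= 2)."""
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta) ** q)

def lasso_ista(y, X, lam, n_iter=500):
    """Proximal gradient (soft-thresholding) for the convex case q = 1."""
    L = np.linalg.norm(X, 2) ** 2                  # gradient Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - X.T @ (X @ beta - y) / L        # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox step
    return beta

rng = np.random.default_rng(1)
n, p, k = 100, 50, 5
beta0 = np.zeros(p)
beta0[:k] = rng.standard_normal(k)                 # k-sparse signal
X = rng.standard_normal((n, p))                    # standard Gaussian design
y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(y, X, lam=0.5)
print(f"objective = {lqls_objective(beta_hat, y, X, 0.5, 1):.3f}, "
      f"estimation error = {np.linalg.norm(beta_hat - beta0):.3f}")
```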

     
  3. Summary

    Although the covariance matrices corresponding to different populations are unlikely to be exactly equal, they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, whereas the correlation between other pairs may be consistently negative. In such cases much of the similarity across covariance matrices can be described by similarities in their principal axes, which are the axes defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. For example, eigenvector matrices can be pooled to form a central set of principal axes and, to the extent that the axes are similar, covariance estimates for populations having small sample sizes can be stabilized by shrinking their principal axes towards the across-population centre. To this end, the paper develops a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of ‘centre’ and ‘spread’ for a population of orthogonal matrices.
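The hierarchical Bingham machinery is beyond a short snippet, but the basic pooling idea can be illustrated simply. The sketch below is a naive baseline on assumed simulated data, not the paper's estimation procedure: it forms central principal axes from the eigenvectors of the averaged group covariances, then stabilizes a small group's estimate by combining its own eigenvalues with the pooled axes, i.e. the extreme case of shrinking principal axes fully towards the across-population centre.

```python
# Naive pooled-axes baseline (NOT the paper's hierarchical Bingham model).
# All sizes and the axis-perturbation scale are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
p, n_small, n_groups = 5, 12, 6
U0 = np.linalg.qr(rng.standard_normal((p, p)))[0]      # shared "centre" axes
eigvals = np.diag([5.0, 3.0, 2.0, 1.0, 0.5])

covs = []
for _ in range(n_groups):
    Q = np.linalg.qr(np.eye(p) + 0.15 * rng.standard_normal((p, p)))[0]
    Uk = Q @ U0                                        # similar, not equal, axes
    X = rng.multivariate_normal(np.zeros(p), Uk @ eigvals @ Uk.T, size=n_small)
    covs.append(np.cov(X, rowvar=False))

_, V_pool = np.linalg.eigh(sum(covs) / n_groups)       # central principal axes

w0, _ = np.linalg.eigh(covs[0])                        # one small group's spectrum
stabilized = (V_pool * w0) @ V_pool.T                  # pooled axes + own eigenvalues
print(np.round(stabilized, 2))
```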

     
  4. In the context of principal components analysis (PCA), the bootstrap is commonly applied to solve a variety of inference problems, such as constructing confidence intervals for the eigenvalues of the population covariance matrix $\Sigma$. However, when the data are high-dimensional, there are relatively few theoretical guarantees that quantify the performance of the bootstrap. Our aim in this paper is to analyze how well the bootstrap can approximate the joint distribution of the leading eigenvalues of the sample covariance matrix $\hat{\Sigma}$, and we establish non-asymptotic rates of approximation with respect to the multivariate Kolmogorov metric. Under certain assumptions, we show that the bootstrap can achieve a dimension-free rate of $r(\Sigma)/\sqrt{n}$ up to logarithmic factors, where $r(\Sigma)$ is the effective rank of $\Sigma$ and $n$ is the sample size. From a methodological standpoint, we show that applying a transformation to the eigenvalues of $\hat{\Sigma}$ before bootstrapping is an important consideration in high-dimensional settings.
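The following sketch illustrates the kind of procedure the paper analyzes: bootstrap confidence intervals for the leading sample-covariance eigenvalues, with a log transformation applied before the bootstrap inference. The spiked covariance, sizes and confidence level are illustrative assumptions, and the log transformation is a common choice rather than necessarily the one studied in the paper.

```python
# Bootstrap CIs for leading eigenvalues of a sample covariance matrix,
# computed on the log scale. All sizes and spike values are examples.
import numpy as np

rng = np.random.default_rng(3)
n, p, k, B = 300, 100, 3, 1000            # samples, dimension, top-k, bootstrap reps

Sigma = np.eye(p)                         # assumed spiked population covariance
Sigma[0, 0], Sigma[1, 1], Sigma[2, 2] = 25.0, 16.0, 9.0
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

def top_log_eigs(data, k):
    vals = np.linalg.eigvalsh(np.cov(data, rowvar=False))
    return np.log(vals[::-1][:k])         # log-transformed top-k eigenvalues

theta_hat = top_log_eigs(X, k)
boot = np.empty((B, k))
for b in range(B):
    idx = rng.integers(0, n, size=n)      # resample rows with replacement
    boot[b] = top_log_eigs(X[idx], k)

lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
ci = np.exp(np.stack([2 * theta_hat - hi, 2 * theta_hat - lo]))  # basic bootstrap CI
for i in range(k):
    print(f"lambda_{i+1}: 95% CI [{ci[0, i]:.2f}, {ci[1, i]:.2f}]")
```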
  5. Abstract

    We analyse the eigenvectors of the adjacency matrix of the Erdős–Rényi graph ${\mathbb{G}}(N,d/N)$ for $\sqrt{\log N} \ll d \lesssim \log N$. We show the existence of a localized phase, where each eigenvector is exponentially localized around a single vertex of the graph. This complements the completely delocalized phase previously established in Alt et al. (Commun Math Phys 388(1):507–579, 2021). For large enough $d$, we establish a mobility edge by showing that the localized phase extends up to the boundary of the delocalized phase. We derive explicit asymptotics for the localization length up to the mobility edge and characterize its divergence near the phase boundary. The proof is based on a rigorous verification of Mott’s criterion for localization, comparing the tunnelling amplitude between localization centres with the eigenvalue spacing. The first main ingredient is a new family of global approximate eigenvectors, for which sharp enough estimates on the tunnelling amplitude can be established. The second main ingredient is a lower bound on the spacing of approximate eigenvalues. It follows from an anticoncentration result for these discrete random variables, obtained by a recursive application of a self-improving anticoncentration estimate due to Kesten.
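A quick numerical companion to the localization claim (an illustration only; the paper's regime is asymptotic and its argument proof-based): diagonalize the adjacency matrix of a sampled ${\mathbb{G}}(N,d/N)$ and compare the inverse participation ratio $\mathrm{IPR}(\boldsymbol{v}) = \sum_i v_i^4$ of eigenvectors at the spectral edge with those in the bulk. IPR near 1 indicates mass concentrated on a few vertices; IPR of order $1/N$ indicates delocalization. The values of $N$ and $d$ below are small illustrative choices, far from the asymptotic regime.

```python
# Localization probe for Erdos-Renyi adjacency eigenvectors via the inverse
# participation ratio. N and d are small illustrative choices.
import numpy as np

rng = np.random.default_rng(4)
N, d = 2000, 5
A = (rng.random((N, N)) < d / N).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # symmetric adjacency, no self-loops

vals, vecs = np.linalg.eigh(A)
ipr = np.sum(vecs ** 4, axis=0)              # one IPR value per eigenvector

edge = ipr[-10:]                             # top of the spectrum
bulk = ipr[N // 2 - 5 : N // 2 + 5]          # middle of the spectrum
print(f"mean IPR at spectral edge: {edge.mean():.3e}")
print(f"mean IPR in spectral bulk: {bulk.mean():.3e}  (1/N = {1 / N:.1e})")
```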

     