

Title: A Hierarchical Eigenmodel for Pooled Covariance Estimation
Summary

Although the covariance matrices corresponding to different populations are unlikely to be exactly equal, they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, whereas the correlation between other pairs may be consistently negative. In such cases, much of the similarity across covariance matrices can be described by similarities in their principal axes, the axes defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. For example, eigenvector matrices can be pooled to form a central set of principal axes and, to the extent that the axes are similar, covariance estimates for populations having small sample sizes can be stabilized by shrinking their principal axes towards the across-population centre. To this end, the paper develops a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of ‘centre’ and ‘spread’ for a population of orthogonal matrices.
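
The idea of comparing each group's principal axes with a pooled centre can be illustrated with a small numerical sketch. This is not the paper's hierarchical Bingham model or its estimation procedure: as a crude stand-in, the centre here is taken to be the eigenvectors of the average sample covariance matrix, and similarity is measured by the absolute cosine between matching axes, so that an eigenvector and its negation count as the same axis. The function names `principal_axes`, `axis_similarity`, `simulate_covs` and `similarity_to_centre`, and all simulation settings, are assumptions made for illustration only.

```python
import numpy as np


def principal_axes(cov):
    """Eigenvectors of a symmetric matrix, columns sorted by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vecs[:, order]


def axis_similarity(U, V):
    """Absolute cosine between matching columns of U and V.

    The absolute value makes the measure invariant to the sign of each
    eigenvector, reflecting the antipodal symmetry of principal axes.
    """
    return np.abs(np.sum(U * V, axis=0))


def simulate_covs(seed=0, n=500, p=3, scales=(1.0, 1.5, 2.0)):
    """Hypothetical data: groups share principal axes but differ in overall scale."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # shared orthogonal axes
    base = np.linspace(9.0, 1.0, p)                   # well-separated eigenvalues
    covs = []
    for s in scales:
        sigma = Q @ np.diag(s * base) @ Q.T
        X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
        covs.append(np.cov(X, rowvar=False))          # sample covariance per group
    return covs


def similarity_to_centre(covs):
    """Rows: groups; columns: principal axes; entries: |cosine| to the pooled axis."""
    centre = principal_axes(np.mean(covs, axis=0))    # crude pooled centre (assumption)
    return np.array([axis_similarity(centre, principal_axes(S)) for S in covs])


if __name__ == "__main__":
    sims = similarity_to_centre(simulate_covs())
    print(np.round(sims, 3))
```

Values near 1 indicate that a group's axis is close to the pooled axis, which is the situation in which shrinking towards a common centre is most likely to help. Values well below 1 would suggest substantial eigenvector heterogeneity.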

 
NSF-PAR ID:
10403987
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
71
Issue:
5
ISSN:
1369-7412
Format(s):
Medium: X Size: p. 971-992
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Covariance matrices are fundamental to the analysis and forecast of economic, physical and biological systems. Although the eigenvalues $\{\lambda _i\}$ and eigenvectors $\{\boldsymbol{u}_i\}$ of a covariance matrix are central to such endeavours, in practice one must inevitably approximate the covariance matrix based on data with finite sample size $n$ to obtain empirical eigenvalues $\{\tilde{\lambda }_i\}$ and eigenvectors $\{\tilde{\boldsymbol{u}}_i\}$, and therefore understanding the error so introduced is of central importance. We analyse eigenvector error $\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2$ while leveraging the assumption that the true covariance matrix having size $p$ is drawn from a matrix ensemble with known spectral properties—particularly, we assume the distribution of population eigenvalues weakly converges as $p\to \infty $ to a spectral density $\rho (\lambda )$ and that the spacing between population eigenvalues is similar to that for the Gaussian orthogonal ensemble. Our approach complements previous analyses of eigenvector error that require the full set of eigenvalues to be known, which can be computationally infeasible when $p$ is large. To provide a scalable approach for uncertainty quantification of eigenvector error, we consider a fixed eigenvalue $\lambda $ and approximate the distribution of the expected square error $r= \mathbb{E}\left [\| \boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2\right ]$ across the matrix ensemble for all $\boldsymbol{u}_i$ associated with $\lambda _i=\lambda $. We find, for example, that for sufficiently large matrix size $p$ and sample size $n> p$, the probability density of $r$ scales as $1/nr^2$. This power-law scaling implies that the eigenvector error is extremely heterogeneous—even if $r$ is very small for most eigenvectors, it can be large for others with non-negligible probability. We support this and further results with numerical experiments. 
  2. Abstract

    Fluctuations in population abundances are often correlated through time across multiple locations, a phenomenon known as spatial synchrony. Spatial synchrony can exhibit complex spatial structures, termed ‘geographies of synchrony’, that can reveal mechanisms underlying population fluctuations. However, most studies have focused on spatial extents of 10s to 100s of kilometres, making it unclear how synchrony concepts and approaches should apply to dynamics at finer spatial scales.

    We used network analyses, multiple regression on similarity matrices, and wavelet coherence analyses to examine micro‐scale synchrony and geographies of synchrony, over distances up to 30 m, in a serpentine grassland plant community.

    We found that species' populations exhibited a geography of synchrony even over such short distances. Often, well‐synchronized populations were geographically separate, a spatial structure that was shaped mainly by gopher disturbance and dispersal limitation, and to a lesser extent by relationships with other plant species. Precipitation was a significant driver of site‐ and community‐wide temporal dynamics. Gopher disturbance appeared to drive synchrony on 2‐ to 6‐year timescales, and we detected coherent fluctuations among pairs of focal plant taxa.

    Synthesis. Micro‐geographies of synchrony are an intriguing phenomenon that may also help us better understand community dynamics. Additionally, the related geographies of synchrony and coherent temporal dynamics among some species pairs indicate that incorporating interspecific interactions can improve understanding of population spatial synchrony.

     
  3. Abstract

    A separable covariance model can describe the among-row and among-column correlations of a random matrix and permits likelihood-based inference with a very small sample size. However, if the assumption of separability is not met, data analysis with a separable model may misrepresent important dependence patterns in the data. As a compromise between separable and unstructured covariance estimation, we decompose a covariance matrix into a separable component and a complementary ‘core’ covariance matrix. This defines a new covariance matrix decomposition that makes use of the parsimony and interpretability of a separable covariance model, yet fully describes covariance matrices that are non-separable. The decomposition motivates a new type of shrinkage estimator, obtained by appropriately shrinking the core of the sample covariance matrix, that adapts to the degree of separability of the population covariance matrix.

     
  4. Abstract

    Motivated by brain connectivity analysis and many other network data applications, we study the problem of estimating covariance and precision matrices and their differences across multiple populations. We propose a common reducing subspace model that leads to substantial dimension reduction and efficient parameter estimation. We explicitly quantify the efficiency gain through an asymptotic analysis. Our method is built upon and further extends a nascent technique, the envelope model, which adopts a generalized sparsity principle. This distinguishes our proposal from most existing covariance and precision estimation methods that assume element-wise sparsity. Moreover, unlike most existing solutions, our method can naturally handle both covariance and precision matrices in a unified way, and work with matrix-valued data. We demonstrate the efficacy of our method through intensive simulations, and illustrate the method with an autism spectrum disorder data analysis.

     
  5. Abstract

    Crenate broomrape (Orobanche crenata Forsk.) is a serious long‐standing parasitic weed problem in Algeria, mainly affecting legumes but also vegetable crops. Unresolved questions for parasitic weeds revolve around the extent to which these plants undergo local adaptation, especially with respect to host specialization, which would be expected to be a strong selective factor for obligate parasitic plants. In the present study, the genotyping‐by‐sequencing (GBS) approach was used to analyze genetic diversity and population structure of 10 Northern Algerian O. crenata populations with different geographical origins and host species (faba bean, pea, chickpea, carrot, and tomato). In total, 8004 high‐quality single‐nucleotide polymorphisms (5% missingness) were obtained and used across the study. Genetic diversity and relationships of 95 individuals from 10 populations were studied using model‐based ancestry analysis, principal components analysis, discriminant analysis of principal components, and phylogeny approaches. The genetic differentiation (FST) between pairs of populations was lower between adjacent populations and higher between geographically separated ones, but no support was found for isolation by distance. Further analyses identified four genetic clusters and revealed evidence of structuring among populations and, although confounded with location, among hosts. In the clearest example, O. crenata growing on pea had a SNP profile that was distinct from other host/location combinations. These results illustrate the importance and potential of GBS to reveal the dynamics of parasitic weed dispersal and population structure.

     