Title: On Information Rank Deficiency in Phenotypic Covariance Matrices
Abstract: This article investigates a form of rank deficiency in phenotypic covariance matrices derived from geometric morphometric data and its impact on measures of phenotypic integration. We first define a type of rank deficiency based on information theory, then demonstrate that this deficiency impairs the performance of phenotypic integration metrics in a model system. Lastly, we propose methods to correct for this information rank deficiency. Our first goal is to establish how the rank of a typical geometric morphometric covariance matrix relates to the information entropy of its eigenvalue spectrum. This requires clear definitions of matrix rank, of which we define three: the full matrix rank (equal to the number of input variables), the mathematical rank (the number of nonzero eigenvalues), and the information rank or “effective rank” (the number of nonredundant eigenvalues). We demonstrate that effective rank deficiency arises from a combination of methodological factors (Generalized Procrustes analysis, use of the correlation matrix, and insufficient sample size) as well as from phenotypic covariance. Second, we use dire wolf jaws to document how differences in effective rank deficiency bias two metrics used to measure phenotypic integration: the eigenvalue variance characterizes the integration change incorrectly, and the standardized generalized variance lacks the sensitivity needed to detect subtle changes in integration. Both metrics are affected by the inclusion of many small but nonzero eigenvalues arising from a lack of information in the covariance matrix, a problem that usually becomes more pronounced as the number of landmarks increases. We propose a new metric for phenotypic integration that combines the standardized generalized variance with information entropy. This metric is equivalent to the standardized generalized variance but is calculated only from those eigenvalues that carry nonredundant information; it is the standardized generalized variance scaled to the effective rank of the eigenvalue spectrum. We demonstrate that this metric successfully detects the shift of integration in our dire wolf sample. Our third goal is to generalize the new metric to compare data sets with different sample sizes and numbers of variables. We develop a standardization for matrix information based on data permutation, then demonstrate that Smilodon jaws are more integrated than dire wolf jaws. Finally, we describe how our information entropy-based measure allows phenotypic integration to be compared in dense semilandmark data sets without bias, allowing characterization of the information content of any given shape, a quantity we term “latent dispersion”. [Canis dirus; Dire wolf; effective dispersion; effective rank; geometric morphometrics; information entropy; latent dispersion; modularity and integration; phenotypic integration; relative dispersion.]
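To make the three rank definitions and the entropy-scaled metric concrete, the sketch below computes a mathematical rank, an entropy-based effective rank, and a standardized-generalized-variance-style score restricted to the leading eigenvalues of a toy covariance matrix. It is an illustrative sketch only, not the authors' implementation: the exponential-of-entropy form of the effective rank, the rounding to a whole number of eigenvalues, and the helper name `rank_summaries` are assumptions made for this example.

```python
import numpy as np

def rank_summaries(cov, tol=1e-12):
    """Illustrative rank and integration summaries for a covariance matrix.

    Assumptions for this sketch (not taken from the paper): effective rank is
    exp(Shannon entropy) of the normalized eigenvalue spectrum, and the
    integration score is the geometric mean (a standardized generalized
    variance) of only the leading ceil(effective rank) eigenvalues.
    """
    eigvals = np.linalg.eigvalsh(cov)[::-1]                  # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)                    # guard tiny negative round-off

    full_rank = cov.shape[0]                                 # number of input variables
    math_rank = int(np.sum(eigvals > tol * eigvals.max()))   # count of nonzero eigenvalues

    p = eigvals[eigvals > 0] / eigvals.sum()                 # eigenvalue "probabilities"
    entropy = -np.sum(p * np.log(p))                         # Shannon entropy of the spectrum
    eff_rank = float(np.exp(entropy))                        # entropy-based effective rank

    k = max(1, int(np.ceil(eff_rank)))                       # eigenvalues treated as nonredundant
    sgv_eff = float(np.exp(np.mean(np.log(eigvals[:k]))))    # geometric mean of leading eigenvalues

    return full_rank, math_rank, eff_rank, sgv_eff

# Toy data: 20 variables driven by only three underlying factors plus slight noise
rng = np.random.default_rng(1)
scores = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))
cov = np.cov(scores + 0.01 * rng.standard_normal(scores.shape), rowvar=False)
print(rank_summaries(cov))
```

In this toy data the mathematical rank stays at 20 because every eigenvalue is technically nonzero, while the effective rank collapses to roughly three, reflecting the few eigenvalues that carry nonredundant information.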
Award ID(s): 1758108
PAR ID: 10318475
Author(s) / Creator(s):
Editor(s): Esposito, Lauren
Date Published:
Journal Name: Systematic Biology
ISSN: 1063-5157
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: The field of comparative morphology has entered a new phase with the rapid generation of high-resolution three-dimensional (3D) data. With freely available 3D data of thousands of species, methods for quantifying morphology that harness this rich phenotypic information are quickly emerging. Among these techniques, high-density geometric morphometric approaches provide a powerful and versatile framework to robustly characterize shape and phenotypic integration, the covariances among morphological traits. These methods are particularly useful for analyses of complex structures and across disparate taxa, which may share few landmarks of unambiguous homology. However, high-density geometric morphometrics also brings challenges, for example, with statistical, but not biological, covariances imposed by placement and sliding of semilandmarks and registration methods such as Procrustes superimposition. Here, we present simulations and case studies of high-density datasets for squamates, birds, and caecilians that exemplify the promise and challenges of high-dimensional analyses of phenotypic integration and modularity. We assess: (1) the relative merits of “big” high-density geometric morphometrics data over traditional shape data; (2) the impact of Procrustes superimposition on analyses of integration and modularity; and (3) differences in patterns of integration between analyses using high-density geometric morphometrics and those using discrete landmarks. We demonstrate that for many skull regions, 20–30 landmarks and/or semilandmarks are needed to accurately characterize their shape variation, and landmark-only analyses do a particularly poor job of capturing shape variation in vault and rostrum bones. Procrustes superimposition can mask modularity, especially when landmarks covary in parallel directions, but this effect decreases with more biologically complex covariance patterns. The directional effect of landmark variation on the position of the centroid affects recovery of covariance patterns more than landmark number does. Landmark-only and landmark-plus-sliding-semilandmark analyses of integration are generally congruent in overall pattern of integration, but landmark-only analyses tend to show higher integration between adjacent bones, especially when landmarks placed on the sutures between bones introduce a boundary bias. Allometry may be a stronger influence on patterns of integration in landmark-only analyses, which show stronger integration prior to removal of allometric effects compared to analyses including semilandmarks. High-density geometric morphometrics has its challenges and drawbacks, but our analyses of simulated and empirical datasets demonstrate that these potential issues are unlikely to obscure genuine biological signal. Rather, high-density geometric morphometric data exceed traditional landmark-based methods in characterization of morphology and allow more nuanced comparisons across disparate taxa. Combined with the rapid increases in 3D data availability, high-density morphometric approaches have immense potential to propel a new class of studies of comparative morphology and phenotypic integration.
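Because Procrustes superimposition is central to the comparisons above, the following is a minimal generalized Procrustes alignment sketch in Python. It is illustrative only: it is not the registration pipeline used in these analyses, it omits semilandmark sliding, and the toy landmark data are invented.

```python
import numpy as np

def procrustes_align(shapes, n_iter=10):
    """Minimal generalized Procrustes superimposition sketch.

    shapes: array of shape (n_specimens, n_landmarks, 2 or 3).
    Each configuration is centered, scaled to unit centroid size, and rotated
    onto the current mean shape; the mean is then updated and the loop repeats.
    """
    aligned = np.asarray(shapes, dtype=float).copy()
    aligned -= aligned.mean(axis=1, keepdims=True)                  # remove translation
    aligned /= np.linalg.norm(aligned, axis=(1, 2), keepdims=True)  # unit centroid size

    mean = aligned[0]
    for _ in range(n_iter):
        for i, conf in enumerate(aligned):
            # Optimal rotation of conf onto the mean (orthogonal Procrustes via SVD)
            u, _, vt = np.linalg.svd(conf.T @ mean)
            u[:, -1] *= np.sign(np.linalg.det(u @ vt))              # disallow reflections
            aligned[i] = conf @ (u @ vt)
        mean = aligned.mean(axis=0)
        mean /= np.linalg.norm(mean)                                # keep mean at unit size
    return aligned

# Toy usage: 30 specimens, 10 two-dimensional landmarks with random rotation,
# translation, and scale applied to a common base shape plus noise
rng = np.random.default_rng(2)
base = rng.standard_normal((10, 2))
specimens = []
for _ in range(30):
    theta = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    noisy = base + 0.05 * rng.standard_normal(base.shape)
    specimens.append(noisy @ rot * rng.uniform(0.5, 2.0) + rng.uniform(-5, 5, size=2))
print(procrustes_align(np.array(specimens)).shape)
```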
  2. Abstract: We study the volume growth of metric balls as a function of the radius in discrete spaces and focus on the relationship between volume growth and discrete curvature. We improve volume growth bounds under a lower bound on the so-called Ollivier curvature and discuss similar results under other types of discrete Ricci curvature. Following recent work in the continuous setting of Riemannian manifolds (by the first author), we then bound the eigenvalues of the Laplacian of a graph under bounds on the volume growth. In particular, $\lambda_2$ of the graph can be bounded using a weighted discrete Hardy inequality, and the higher eigenvalues of the graph can be bounded by the eigenvalues of a tridiagonal matrix times a multiplicative factor, both of which depend only on the volume growth of the graph. As a direct application, we relate the eigenvalues to the Cheeger isoperimetric constant. Using these methods, we describe classes of graphs for which the Cheeger inequality is tight on the second eigenvalue (i.e., the first nonzero eigenvalue). We also describe a method for proving Buser's inequality in graphs, particularly under a lower bound assumption on curvature.
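For orientation, the tightness claim refers to the discrete Cheeger inequality; a standard statement for the normalized Laplacian of a finite graph (background material, not a new result of this work) is:

```latex
% Discrete Cheeger inequality for the normalized Laplacian of a finite graph G:
% h(G) is the Cheeger (edge-isoperimetric) constant and \lambda_2 is the first
% nonzero eigenvalue. The work above identifies graph classes for which the
% lower bound is tight.
\[
  \frac{h(G)^2}{2} \;\le\; \lambda_2 \;\le\; 2\,h(G).
\]
```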
  3. Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the “spike” eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data. 
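As a rough numerical illustration of the setting (the dimensions, spike strength, and tanh nonlinearity are arbitrary choices, not the paper's model), the sketch below draws data from a rank-one spiked covariance model, passes it through a one-layer random-feature map, and inspects the leading eigenvalues of the resulting Conjugate Kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width, spike = 400, 300, 500, 10.0   # samples, input dim, hidden width, signal strength

# Rank-one spiked covariance input: each row is sqrt(spike) * s_i * u + Gaussian noise
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
X = np.sqrt(spike) * rng.standard_normal((n, 1)) * u[None, :] + rng.standard_normal((n, d))

# One-layer random-feature map and its Conjugate Kernel (Gram matrix of the features)
W = rng.standard_normal((d, width)) / np.sqrt(d)
features = np.tanh(X @ W)
ck = features @ features.T / width         # n x n Conjugate Kernel on the sample

eigs = np.linalg.eigvalsh(ck)[::-1]
print("leading CK eigenvalues:", np.round(eigs[:5], 2))
# A leading eigenvalue that separates from the bulk reflects the spike carried
# through the nonlinear feature map.
```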
  4. In an extremal eigenvalue problem, one considers a family of eigenvalue problems, each with discrete spectra, and extremizes a chosen eigenvalue over the family. In this chapter, we consider eigenvalue problems defined on Riemannian manifolds and extremize over the metric structure. For example, we consider the problem of maximizing the principal Laplace–Beltrami eigenvalue over a family of closed surfaces of fixed volume. Computational approaches to such extremal geometric eigenvalue problems present new computational challenges and require novel numerical tools, such as the parameterization of conformal classes and the development of accurate and efficient methods to solve eigenvalue problems on domains with nontrivial genus and boundary. We highlight recent progress on computational approaches for extremal geometric eigenvalue problems, including (i) maximizing Laplace–Beltrami eigenvalues on closed surfaces and (ii) maximizing Steklov eigenvalues on surfaces with boundary. 
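As one concrete formulation of the kind of problem described, the closed-surface case is commonly posed in the following scale-invariant form (a standard normalization, stated here for orientation rather than quoted from the chapter):

```latex
% Maximize the first nonzero Laplace--Beltrami eigenvalue over Riemannian
% metrics g on a closed surface M, normalized by area so the quantity is
% scale-invariant (equivalently, fix the volume and maximize \lambda_1):
\[
  \Lambda_1(M) \;=\; \sup_{g}\; \lambda_1(M, g)\,\operatorname{Area}(M, g).
\]
```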
  5. Motivated by problems in algebraic complexity theory (e.g., matrix multiplication) and extremal combinatorics (e.g., the cap set problem and the sunflower problem), we introduce the geometric rank as a new tool in the study of tensors and hypergraphs. We prove that the geometric rank is an upper bound on the subrank of tensors and the independence number of hypergraphs. We prove that the geometric rank is smaller than the slice rank of Tao, and relate geometric rank to the analytic rank of Gowers and Wolf in an asymptotic fashion. As a first application, we use geometric rank to prove a tight upper bound on the (border) subrank of the matrix multiplication tensors, matching Strassen's well-known lower bound from 1987. 
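For orientation, the geometric rank of a tensor $T \in F^{n_1 \times n_2 \times n_3}$, viewed as a bilinear map, is usually defined as the codimension of its bilinear zero set over the algebraic closure $\bar{F}$ (restated here for context under that convention):

```latex
% Geometric rank of T viewed as a bilinear map T : F^{n_1} x F^{n_2} -> F^{n_3}:
\[
  \operatorname{GR}(T) \;=\; \operatorname{codim}\,
  \bigl\{\, (x, y) \in \bar{F}^{\,n_1} \times \bar{F}^{\,n_2} \;:\; T(x, y, \cdot) = 0 \,\bigr\}.
\]
```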