Title: Principal Component Analysis for Extremes and Application to U.S. Precipitation
Abstract: We propose a method for analyzing extremal behavior through the lens of a most efficient basis of vectors. The method is analogous to principal component analysis, but is based on methods from extreme value analysis. Specifically, rather than decomposing a covariance or correlation matrix, we obtain our basis vectors by performing an eigendecomposition of a matrix that describes pairwise extremal dependence. We apply the method to precipitation observations over the contiguous United States. We find that the time series of large coefficients associated with the leading eigenvector shows very strong evidence of a positive trend, and there is evidence that large coefficients of other eigenvectors have relationships with El Niño–Southern Oscillation.
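As a rough illustration of the idea in the abstract, the Python sketch below builds a matrix of pairwise extremal dependence and eigendecomposes it in place of a covariance matrix. The abstract does not specify the estimator or the preprocessing, so everything here is an assumption made for illustration: the rank transform to Fréchet margins with tail index 2, the estimator that averages outer products of angular components over observations with a large norm, the 0.95 threshold quantile, the function names, and the synthetic data.

```python
import numpy as np


def to_frechet2(data):
    """Rank-transform each column to approximately Frechet(alpha=2) margins (assumed preprocessing)."""
    n = data.shape[0]
    ranks = data.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n within each column
    u = ranks / (n + 1.0)                             # empirical CDF values in (0, 1)
    return (-np.log(u)) ** -0.5


def extremal_dependence_matrix(x, quantile=0.95):
    """Estimate a pairwise extremal dependence matrix from the largest observations (assumed estimator)."""
    r = np.linalg.norm(x, axis=1)        # radial component of each observation
    w = x / r[:, None]                   # angular component, unit Euclidean norm
    keep = r > np.quantile(r, quantile)  # retain only observations with a large norm
    d = x.shape[1]
    return d * (w[keep].T @ w[keep]) / keep.sum()


def extremal_pca(data, quantile=0.95):
    """Eigendecompose the extremal dependence matrix; eigenvalues are returned in decreasing order."""
    sigma = extremal_dependence_matrix(to_frechet2(data), quantile)
    eigvals, eigvecs = np.linalg.eigh(sigma)  # sigma is symmetric
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.standard_normal((500, 4))  # placeholder data, not precipitation
    vals, vecs = extremal_pca(synthetic)
    print(vals)
```

Under this estimator the matrix is a scaled sum of outer products of unit vectors, so it is positive semidefinite and its eigenvalues sum to the number of columns. The eigenvectors then play the role that principal component directions play in ordinary PCA.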
Award ID(s):
1811657
NSF-PAR ID:
10172045
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Climate
Volume:
33
Issue:
15
ISSN:
0894-8755
Page Range / eLocation ID:
6441 to 6451
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    There are many ways of measuring and modeling tail-dependence in random vectors: from the general framework of multivariate regular variation and the flexible class of max-stable vectors down to simple and concise summary measures like the matrix of bivariate tail-dependence coefficients. This paper starts by providing a review of existing results from a unifying perspective, which highlights connections between extreme value theory and the theory of cuts and metrics. Our approach leads to some new findings in both areas with some applications to current topics in risk management.

    We begin by using the framework of multivariate regular variation to show that extremal coefficients, or equivalently, the higher-order tail-dependence coefficients of a random vector can simply be understood in terms of random exceedance sets, which allows us to extend the notion of Bernoulli compatibility. In the special but important case of bivariate tail-dependence, we establish a correspondence between tail-dependence matrices and $L^1$- and $\ell_1$-embeddable finite metric spaces via the spectral distance, which is a metric on the space of jointly 1-Fréchet random variables. Namely, the coefficients of the cut-decomposition of the spectral distance and of the Tawn-Molchanov max-stable model realizing the corresponding bivariate extremal dependence coincide. We show that line metrics are rigid and if the spectral distance corresponds to a line metric, the higher order tail-dependence is determined by the bivariate tail-dependence matrix.

    Finally, the correspondence between $\ell_1$-embeddable metric spaces and tail-dependence matrices allows us to revisit the realizability problem, i.e. checking whether a given matrix is a valid tail-dependence matrix. We confirm a conjecture of Shyamalkumar and Tao (2020) that this problem is NP-complete.
  2. Abstract

    Motivated by the widespread use of large gridded data sets in the atmospheric sciences, we propose a new model for extremes of areal data that is inspired by the simultaneous autoregressive (SAR) model in classical spatial statistics. Our extreme SAR model extends recent work on transformed‐linear operations applied to regularly varying random vectors, and is unique among extremes models in being directly analogous to a classical linear model. An additional appeal is its simplicity; given a proximity matrix W, spatial dependence is described by a single parameter. We develop an estimation method that minimizes the discrepancy between the tail pairwise dependence matrix (TPDM) for the fitted model and the estimated TPDM. Applying this method to simulated data demonstrates that it is able to produce good estimates of extremal spatial dependence even in the case of model misspecification, and additionally produces reasonable estimates of uncertainty. We also apply the method to gridded precipitation observations for a study region over northeast Colorado, and find that a single‐parameter extreme SAR model paired with a neighborhood structure which accounts for longer range dependence effectively models spatial dependence in these data.
  3. Merge trees are a type of topological descriptor that records the connectivity among the sublevel sets of scalar fields. They are among the most widely used topological tools in visualization. In this paper, we are interested in sketching a set of merge trees using techniques from matrix sketching. That is, given a large set T of merge trees, we would like to find a much smaller set of basis trees S such that each tree in T can be approximately reconstructed from a linear combination of merge trees in S. A set of high-dimensional vectors can be approximated via matrix sketching techniques such as principal component analysis and column subset selection. However, until now, there has not been any work on sketching a set of merge trees. We develop a framework for sketching a set of merge trees that combines matrix sketching with tools from optimal transport. In particular, we vectorize a set of merge trees into high-dimensional vectors while preserving their structures and structural relations. We demonstrate the applications of our framework in sketching merge trees that arise from time-varying scientific simulations. Specifically, our framework obtains a set of basis trees as representatives that capture the “modes” of physical phenomena for downstream analysis and visualization.
  4. Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse. We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation. The simplest version of the algorithm initializes with the leading $k$ principal components. Then, the principal components are rotated with a $k \times k$ orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not be a leading eigenvector, but rather a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids “deflation” and the multiple tuning parameters it requires. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence showing that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods. Through three applications (sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks), we demonstrate the modern usefulness of sparse PCA in exploring multivariate data.
  5. Abstract

    The reduction of a large‐scale symmetric linear discrete ill‐posed problem with multiple right‐hand sides to a smaller problem with a symmetric block tridiagonal matrix can easily be carried out by the application of a small number of steps of the symmetric block Lanczos method. We show that the subdiagonal blocks of the reduced problem converge to zero fairly rapidly with increasing block number. This quick convergence indicates that there is little advantage in expressing the solutions of discrete ill‐posed problems in terms of eigenvectors of the coefficient matrix when compared with using a basis of block Lanczos vectors, which are simpler and cheaper to compute. Similarly, for nonsymmetric linear discrete ill‐posed problems with multiple right‐hand sides, we show that the solution subspace defined by a few steps of the block Golub–Kahan bidiagonalization method usually can be applied instead of the solution subspace determined by the singular value decomposition of the coefficient matrix without significant, if any, reduction of the quality of the computed solution.

     