Merge trees are a type of topological descriptor that records the connectivity among the sublevel sets of scalar fields, and they are among the most widely used topological tools in visualization. In this paper, we are interested in sketching a set of merge trees using techniques from matrix sketching: given a large set T of merge trees, we would like to find a much smaller set S of basis trees such that each tree in T can be approximately reconstructed from a linear combination of the merge trees in S. A set of high-dimensional vectors can be approximated via matrix sketching techniques such as principal component analysis and column subset selection; until now, however, there has been no work on sketching a set of merge trees. We develop a framework for sketching a set of merge trees that combines matrix sketching with tools from optimal transport. In particular, we vectorize a set of merge trees into high-dimensional vectors while preserving their structures and structural relations. We demonstrate the applications of our framework in sketching merge trees that arise from time-varying scientific simulations. Specifically, our framework obtains a set of basis trees as representatives that capture the “modes” of physical phenomena for downstream analysis and visualization.
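The pipeline described above has two stages: vectorization (via optimal transport, not shown here) and matrix sketching. The sketch below illustrates only the second stage, assuming the trees have already been vectorized into the columns of a matrix `T`; pivoted-QR column subset selection is one standard sketching choice, not necessarily the paper's exact procedure, and `sketch_with_column_subset` is our own name.

```python
# Minimal sketch: pick k "basis trees" by column subset selection, then
# reconstruct every vectorized tree as a linear combination of them.
import numpy as np
from scipy.linalg import qr, lstsq

def sketch_with_column_subset(T, k):
    """Select k basis columns of T via pivoted QR, then reconstruct T."""
    # Pivoted QR ranks columns by how much new range each contributes.
    _, _, piv = qr(T, pivoting=True)
    basis_idx = piv[:k]          # indices of the k basis trees
    S = T[:, basis_idx]          # basis matrix, one vectorized tree per column
    # Least-squares coefficients expressing every tree over the basis S.
    W, *_ = lstsq(S, T)
    return S, W, basis_idx

rng = np.random.default_rng(0)
T = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 40))  # rank-5 toy data
S, W, idx = sketch_with_column_subset(T, k=5)
print("relative error:", np.linalg.norm(T - S @ W) / np.linalg.norm(T))
```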
Principal Component Analysis for Extremes and Application to U.S. Precipitation
Abstract: We propose a method for analyzing extremal behavior through the lens of a most efficient basis of vectors. The method is analogous to principal component analysis, but is based on methods from extreme value analysis. Specifically, rather than decomposing a covariance or correlation matrix, we obtain our basis vectors by performing an eigendecomposition of a matrix that describes pairwise extremal dependence. We apply the method to precipitation observations over the contiguous United States. We find that the time series of large coefficients associated with the leading eigenvector shows very strong evidence of a positive trend, and there is evidence that large coefficients of other eigenvectors have relationships with El Niño–Southern Oscillation.
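As a hedged illustration of this construction, the sketch below eigendecomposes an empirical pairwise extremal-dependence matrix in place of a covariance matrix. The chi statistic used to fill the matrix is a generic stand-in for the paper's tail pairwise dependence measure, which may be defined differently; `empirical_chi` and `extremal_pca` are our own names.

```python
import numpy as np

def empirical_chi(x, y, q=0.95):
    """Crude estimate of chi = P(Y exceeds its q-quantile | X exceeds its q-quantile)."""
    exc = x > np.quantile(x, q)
    return float((y[exc] > np.quantile(y, q)).mean()) if exc.any() else 0.0

def extremal_pca(X, q=0.95):
    """Eigendecompose a symmetrized pairwise extremal-dependence matrix."""
    d = X.shape[1]
    chi = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            c = 0.5 * (empirical_chi(X[:, i], X[:, j], q)
                       + empirical_chi(X[:, j], X[:, i], q))  # symmetrize
            chi[i, j] = chi[j, i] = c
    vals, vecs = np.linalg.eigh(chi)        # ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]      # leading basis vectors first

rng = np.random.default_rng(1)
X = rng.pareto(3.0, size=(5000, 6))         # heavy-tailed toy data
vals, vecs = extremal_pca(X)
# Project observations onto the leading eigenvector to get its coefficients.
coeffs = X @ vecs[:, 0]
print("leading eigenvalue:", round(float(vals[0]), 3))
```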
- Award ID(s): 1811657
- PAR ID: 10172045
- Date Published:
- Journal Name: Journal of Climate
- Volume: 33
- Issue: 15
- ISSN: 0894-8755
- Page Range / eLocation ID: 6441 to 6451
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a $$p \times k$$ matrix) is approximately sparse. We propose a method that presumes the $$p \times k$$ matrix becomes approximately sparse after a $$k \times k$$ rotation. The simplest version of the algorithm initializes with the leading $$k$$ principal components. Then, the principal components are rotated by a $$k \times k$$ orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not be a leading eigenvector, but can instead be a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids the "deflation" scheme and the multiple tuning parameters it requires. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence that, for the same level of sparsity, the proposed sparse PCA method is more stable and explains more variance than alternative methods. Through three applications---sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks---we demonstrate the modern usefulness of sparse PCA in exploring multivariate data.
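One possible reading of the three steps just described (leading-k PCA, orthogonal rotation toward sparsity, soft-thresholding) is sketched below; the varimax criterion and the fixed threshold are assumptions standing in for the paper's exact rotation and tuning.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate the columns of Phi toward an approximately sparse pattern."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        u, s, vt = np.linalg.svd(
            Phi.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        R = u @ vt                          # best orthogonal rotation this step
        d_new = s.sum()
        if d_new < d * (1 + tol):           # criterion stopped improving
            break
        d = d_new
    return Phi @ R

def rotated_sparse_pca(X, k, threshold=0.05):
    """Leading-k PCA, then k x k orthogonal rotation, then soft-thresholding."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                            # p x k eigen-basis
    V_rot = varimax(V)                      # rotated toward sparsity
    return np.sign(V_rot) * np.maximum(np.abs(V_rot) - threshold, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
B = rotated_sparse_pca(X, k=4)
print("nonzeros per component:", (B != 0).sum(axis=0))
```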
Abstract: The reduction of a large-scale symmetric linear discrete ill-posed problem with multiple right-hand sides to a smaller problem with a symmetric block tridiagonal matrix can easily be carried out by applying a small number of steps of the symmetric block Lanczos method. We show that the subdiagonal blocks of the reduced problem converge to zero fairly rapidly with increasing block number. This quick convergence indicates that there is little advantage in expressing the solutions of discrete ill-posed problems in terms of eigenvectors of the coefficient matrix when compared with using a basis of block Lanczos vectors, which are simpler and cheaper to compute. Similarly, for nonsymmetric linear discrete ill-posed problems with multiple right-hand sides, we show that the solution subspace defined by a few steps of the block Golub–Kahan bidiagonalization method can usually be used instead of the solution subspace determined by the singular value decomposition of the coefficient matrix without significant, if any, reduction in the quality of the computed solution.
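To make the reduction concrete, here is a bare-bones symmetric block Lanczos iteration. Reorthogonalization and breakdown handling are omitted, the rapidly decaying spectrum of the toy matrix mimics a discrete ill-posed problem, and none of this is the authors' implementation.

```python
import numpy as np

def block_lanczos(A, B, steps):
    """Project symmetric A onto the block Krylov space of B, returning the
    diagonal blocks M_j and subdiagonal blocks R_j of the reduced
    block tridiagonal matrix."""
    n, b = B.shape
    Q, _ = np.linalg.qr(B)                  # orthonormal starting block
    Q_prev = np.zeros((n, b))
    R_prev = np.zeros((b, b))
    Ms, Rs = [], []
    for _ in range(steps):
        W = A @ Q - Q_prev @ R_prev.T       # three-term block recurrence
        M = Q.T @ W                         # diagonal block
        W = W - Q @ M
        Q_next, R = np.linalg.qr(W)         # next block; R is the subdiagonal block
        Ms.append(M)
        Rs.append(R)
        Q_prev, R_prev, Q = Q, R, Q_next
    return Ms, Rs

rng = np.random.default_rng(2)
V, _ = np.linalg.qr(rng.standard_normal((100, 100)))
A = V @ np.diag(0.5 ** np.arange(100)) @ V.T  # decaying spectrum, mimicking
B = rng.standard_normal((100, 3))             # a discrete ill-posed problem
Ms, Rs = block_lanczos(A, B, steps=6)
# The abstract's observation: the subdiagonal blocks shrink quickly.
print([round(float(np.linalg.norm(R)), 6) for R in Rs])
```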
We present two algorithms to compute system-specific polarizabilities and dispersion coefficients such that the required memory and computational time scale linearly with the number of atoms in the unit cell for large systems. The first algorithm computes the atom-in-material (AIM) static polarizability tensors, force-field polarizabilities, and C6, C8, C9, and C10 dispersion coefficients using the MCLF method. The second algorithm computes the AIM polarizability tensors and C6 coefficients using the TS-SCS method. Linear-scaling computational cost is achieved using a dipole interaction cutoff length function combined with iterative methods that avoid large dense matrix multiplications and large matrix inversions. For MCLF, Richardson extrapolation of the screening increments is used. For TS-SCS, a failproof conjugate residual (FCR) algorithm is introduced that solves any linear equation system having a Hermitian coefficient matrix. These algorithms have mathematically provable, stable convergence that resists round-off errors. We parallelized these methods to provide rapid computation on multi-core computers. Excellent parallelization efficiencies were obtained, and adding parallel processors does not significantly increase memory requirements. This enables system-specific polarizabilities and dispersion coefficients to be readily computed for materials containing millions of atoms in the unit cell. The largest example studied herein is an ice crystal containing more than 2 million atoms in the unit cell. For this material, the FCR algorithm solved a linear equation system containing more than 6 million rows, 7.57 billion interacting atom pairs, 45.4 billion stored non-negligible matrix components used in each large matrix-vector multiplication, and roughly 19 million unknowns per frequency point (more than 300 million total unknowns).
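The failproof variant (FCR) itself is not reproduced in the abstract, but it extends the textbook conjugate residual method for Hermitian systems, sketched below; the breakdown handling that makes FCR "failproof" (e.g., when an indefinite matrix yields a zero curvature term) is omitted here.

```python
import numpy as np

def conjugate_residual(A, b, tol=1e-10, max_iter=500):
    """Solve A x = b for a Hermitian matrix A by the conjugate residual method."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    Ar = A @ r
    Ap = Ar.copy()
    rAr = np.vdot(r, Ar)                    # <r, A r>, real for Hermitian A
    for _ in range(max_iter):
        alpha = rAr / np.vdot(Ap, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        Ar = A @ r
        rAr_new = np.vdot(r, Ar)
        beta = rAr_new / rAr
        p = r + beta * p
        Ap = Ar + beta * Ap                 # update A p without an extra multiply
        rAr = rAr_new
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((50, 50)) + 1j * rng.standard_normal((50, 50))
A = M @ M.conj().T + 50 * np.eye(50)        # Hermitian positive definite test matrix
b = rng.standard_normal(50) + 1j * rng.standard_normal(50)
x = conjugate_residual(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```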
Compressed sensing (CS) is a data acquisition technique that has been applied to monitor manufacturing processes. With a few measurements, sparse coefficient vectors can be recovered by solving an inverse problem, and the original signals can be reconstructed. Dictionary learning methods have been developed and applied in combination with CS to improve the sparsity of the recovered coefficient vectors. In this work, a physics-constrained dictionary learning approach is proposed to solve both reconstruction and classification problems by optimizing the measurement, basis, and classification matrices simultaneously, with consideration of application-specific restrictions. It is applied to image acquisition in selective laser melting (SLM). The proposed approach optimizes in two stages. In the first stage, with the basis matrix fixed, the measurement matrix is optimized by determining the pixel locations sampled in each image; the optimized measurement matrix includes exactly one non-zero entry in each row. The optimization of pixel locations is solved with a constrained FrameSense algorithm. In the second stage, with the measurement matrix fixed, the basis and classification matrices are optimized using the K-SVD algorithm. With the optimized basis matrix, the coefficient vector can be recovered with CS, and the original signal can be reconstructed as a linear combination of the basis matrix and the recovered coefficient vector. The original signal can also be classified, to identify different machine states, by applying the classification matrix to the coefficient vector.
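A hedged sketch of the recovery-and-classification pipeline described above follows. The pixel-sampling measurement matrix has one non-zero entry per row as stated; the dictionary `D`, the classifier `W`, and the ISTA solver are generic stand-ins, since the paper's optimized matrices and its solver are not given here.

```python
import numpy as np

def ista(M, y, lam=0.05, n_iter=300):
    """Solve min_a 0.5*||y - M a||^2 + lam*||a||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(M, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(M.shape[1])
    for _ in range(n_iter):
        g = a + M.T @ (y - M @ a) / L       # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return a

rng = np.random.default_rng(4)
p, k, m = 64, 32, 16                        # pixels, dictionary atoms, measurements
D = rng.standard_normal((p, k))             # stand-in for the learned basis matrix
Phi = np.zeros((m, p))                      # measurement matrix: one pixel per row
Phi[np.arange(m), rng.choice(p, m, replace=False)] = 1.0
a_true = np.zeros(k)
a_true[rng.choice(k, 3, replace=False)] = 1.0
x_true = D @ a_true                         # original image, as a vector
y = Phi @ x_true                            # compressed measurements
a_hat = ista(Phi @ D, y)                    # recover sparse coefficients via CS
x_hat = D @ a_hat                           # reconstruction: basis x coefficients
W = rng.standard_normal((2, k))             # stand-in classification matrix
print("predicted machine state:", np.argmax(W @ a_hat))
```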