Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) are fundamental dimensionality-reduction methods in machine learning. PCA finds a low-dimensional linear approximation of the data in a finite-dimensional space, while KPCA does so in an often infinite-dimensional Reproducing Kernel Hilbert Space (RKHS). In this paper, we present a geometric framework for computing the principal linear subspaces in both situations, as well as in the robust PCA case, that amounts to computing the intrinsic average on the space of all subspaces: the Grassmann manifold. Points on this manifold are defined as the subspaces spanned by K-tuples of observations. The intrinsic Grassmann average of these subspaces is shown to coincide with the principal components of the observations when they are drawn from a Gaussian distribution. We show similar results in the RKHS case and provide an efficient algorithm for computing the projection onto this average subspace. The result is a method akin to KPCA that is substantially faster. Further, we present a novel online version of KPCA using our geometric framework. Competitive performance of all our algorithms is demonstrated on a variety of real and synthetic data sets.
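As a rough numerical illustration of the central claim (a sketch assuming numpy, and using the simpler extrinsic/chordal average of one-dimensional observation subspaces as a stand-in for the intrinsic Grassmann average described in the abstract), the averaged subspace of anisotropic Gaussian data aligns with the first principal component:

```python
import numpy as np

rng = np.random.default_rng(0)
# Anisotropic Gaussian sample: the dominant axis is e1.
X = rng.normal(size=(2000, 2)) * np.array([10.0, 1.0])

# Each observation spans a 1-D subspace, i.e., a point on Gr(1, 2).
U = X / np.linalg.norm(X, axis=1, keepdims=True)

# Extrinsic (chordal) Grassmann average: leading eigenvector of the
# mean projection matrix (1/n) * sum_i u_i u_i^T.
M = (U.T @ U) / len(U)
_, V = np.linalg.eigh(M)
grassmann_avg = V[:, -1]          # eigenvector of the largest eigenvalue

# Ordinary PCA first principal component, for comparison.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pc1 = Vt[0]

print(abs(grassmann_avg @ pc1))   # close to 1: the average subspace matches PC1
```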
This content will become publicly available on October 14, 2026
Grassmann extrapolation via direct inversion in the iterative subspace
We present a Grassmann extrapolation method (G-Ext) that combines the mathematical framework of the Grassmann manifold with the direct inversion in the iterative subspace (DIIS) technique to accurately and efficiently extrapolate density matrices in electronic structure calculations. By overcoming the challenges of direct extrapolation on the Grassmann manifold, this indirect G-Ext-DIIS approach successfully preserves the geometric structure and physical constraints of the density matrices. Unlike Tikhonov-regularized G-Ext, G-Ext-DIIS requires no tuning of regularization parameters. Its DIIS subspace is compact, numerically stable, and independent of descriptor dimensionality, system size, and basis set, ensuring both robustness and computational efficiency. We evaluate G-Ext-DIIS using alanine dipeptide and its zwitterionic form along ϕ and ψ torsional scans, employing Coulomb, overlap, and core Hamiltonian matrix descriptors with the diffuse 6-311++G(d,p) and aug-cc-pVTZ basis sets. When using overlap or core Hamiltonian descriptors, G-Ext-DIIS achieves sub-millihartree accuracy across angular extrapolation ranges that exceed typical geometry optimization step sizes. This indicates its potential for generating high-quality initial density matrices in each optimization cycle. Compared to direct extrapolation methods with or without McWeeny purification, as well as the Löwdin extrapolation from nearby geometries, G-Ext-DIIS demonstrates superior accuracy, variational consistency, and reliability across basis sets. We also explore Fock matrix extrapolation using the same DIIS coefficients, although this strategy proves less reliable for distant geometries. Overall, G-Ext-DIIS offers a robust, efficient, and transferable framework for constructing accurate density matrices, with promising applications in geometry optimization and ab initio molecular dynamics simulations.
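The abstract does not spell out the DIIS step; as a generic sketch (standard Pulay-style DIIS coefficient solve, not the authors' G-Ext variant; assuming numpy), the extrapolation coefficients minimize the norm of the combined error vector subject to summing to one, via a bordered linear system of error-vector overlaps:

```python
import numpy as np

def diis_coefficients(errors):
    """Standard DIIS least-squares problem: minimize ||sum_i c_i e_i||
    subject to sum_i c_i = 1, solved via the usual bordered B-matrix
    system with a Lagrange multiplier in the last row/column."""
    m = len(errors)
    B = np.empty((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            B[i, j] = np.vdot(errors[i], errors[j])
    B[-1, :-1] = B[:-1, -1] = -1.0
    B[-1, -1] = 0.0
    rhs = np.zeros(m + 1)
    rhs[-1] = -1.0
    return np.linalg.solve(B, rhs)[:m]   # extrapolation coefficients c_i

# Toy residual vectors shrinking toward zero; coefficients sum to 1.
errs = [np.array([1.0, 0.5, 0.0]),
        np.array([0.3, -0.2, 0.1]),
        np.array([0.05, 0.02, -0.03])]
c = diis_coefficients(errs)
print(c.sum())   # 1 up to round-off
```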
- Award ID(s): 2441101
- PAR ID: 10653061
- Publisher / Repository: American Institute of Physics
- Date Published:
- Journal Name: The Journal of Chemical Physics
- Volume: 163
- Issue: 14
- ISSN: 0021-9606
- Page Range / eLocation ID: 144114
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Low-dimensional and computationally less-expensive reduced-order models (ROMs) have been widely used to capture the dominant behaviors of high-dimensional systems. An ROM can be obtained, using the well-known proper orthogonal decomposition (POD), by projecting the full-order model onto a subspace spanned by modal basis modes that are learned from experimental, simulated, or observational data, i.e., training data. However, the optimal basis can change with the parameter settings. When an ROM constructed using the POD basis obtained from training data is applied to new parameter settings, the model often lacks robustness against the change of parameters in design, control, and other real-time operation problems. This paper proposes to use regression trees on the Grassmann manifold to learn the mapping between parameters and the POD bases that span the low-dimensional subspaces onto which full-order models are projected. Motivated by the observation that a subspace spanned by a POD basis can be viewed as a point on the Grassmann manifold, we propose to grow a tree by repeatedly splitting the tree nodes to maximize the Riemannian distance between the two subspaces spanned by the predicted POD bases on the left and right daughter nodes. Five numerical examples are presented to comprehensively demonstrate the performance of the proposed method and to compare the proposed tree-based method with the existing interpolation method for POD bases and with the use of a global POD basis. The results show that the proposed tree-based method is capable of establishing the mapping between parameters and POD bases, and thus adapts ROMs to new parameters.
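The splitting criterion above relies on the Riemannian distance between subspaces; a minimal sketch of that distance (assuming numpy; computed from the principal angles between the two column spans):

```python
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance on the Grassmann manifold between the column
    spans of A and B: the 2-norm of the principal-angle vector."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # principal angles
    return np.linalg.norm(theta)

# Identical subspaces are at distance 0; orthogonal planes in R^4
# are at the maximal distance sqrt(2) * pi / 2.
P = np.eye(4)[:, :2]             # span(e1, e2)
Q = np.eye(4)[:, 2:]             # span(e3, e4)
print(grassmann_distance(P, P))  # 0.0
print(grassmann_distance(P, Q))  # ~2.2214 == sqrt(2) * pi / 2
```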
-
We extend the K-means and LBG algorithms to the framework of the Grassmann manifold to perform subspace quantization. For K-means it is possible to move a subspace in the direction of another using Grassmannian geodesics. For LBG the centroid computation is now done using a flag mean algorithm for averaging points on the Grassmannian. The resulting unsupervised algorithms are applied to the MNIST digit data set and the AVIRIS Indian Pines hyperspectral data set.
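The flag-mean averaging step mentioned above can be sketched as follows (assuming numpy; one common formulation of the flag mean takes the leading left singular vectors of the horizontally stacked orthonormal bases):

```python
import numpy as np

def flag_mean(bases, k):
    """Flag mean of subspaces on the Grassmannian: the leading k left
    singular vectors of the horizontally stacked orthonormal bases."""
    U, _, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    return U[:, :k]

# Two nearby lines in R^3; their flag mean lies between them,
# staying close to the shared dominant direction e1.
u1 = np.array([[1.0], [0.0], [0.0]])
u2 = np.array([[1.0], [0.1], [0.0]])
u2 /= np.linalg.norm(u2)
m = flag_mean([u1, u2], k=1)
print(abs(m[0, 0]))   # close to 1
```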
-
We introduce nested gausslet bases, an improvement on previous gausslet bases that can treat systems containing atoms with much larger atomic numbers. We also introduce pure Gaussian distorted gausslet bases, which allow the Hamiltonian integrals to be performed analytically, as well as hybrid bases in which the gausslets are combined with standard Gaussian-type bases. All these bases feature the diagonal approximation for the electron–electron interactions, so that the Hamiltonian is completely defined by two N_b × N_b matrices, where N_b ≈ 10⁴ is small enough to permit fast calculations at the Hartree–Fock level. In constructing these bases, we have gained new mathematical insight into the construction of one-dimensional diagonal bases. In particular, we have proved an important theorem relating four key basis set properties: completeness, orthogonality, zero-moment conditions, and diagonalization of the coordinate operator matrix. We test our basis sets on small systems with a focus on high accuracy, obtaining, for example, an accuracy of 2 × 10⁻⁵ Ha for the total Hartree–Fock energy of the neon atom in the complete basis set limit.
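The fourth property in the theorem above, diagonalization of the coordinate operator matrix, has a classical illustration (a sketch assuming numpy, using normalized Legendre polynomials; this is standard quadrature/DVR machinery, not the paper's gausslet construction): diagonalizing x in a truncated orthogonal basis yields grid-localized functions, with eigenvalues at the Gauss-Legendre quadrature nodes.

```python
import numpy as np

# Coordinate-operator (Jacobi) matrix for normalized Legendre polynomials
# on [-1, 1]: tridiagonal with off-diagonals b_n = n / sqrt(4 n^2 - 1).
n = 3
b = np.array([k / np.sqrt(4 * k**2 - 1) for k in range(1, n)])
X = np.diag(b, 1) + np.diag(b, -1)

# Diagonalizing the coordinate operator gives grid-localized functions;
# its eigenvalues are the 3-point Gauss-Legendre nodes ±sqrt(3/5) and 0.
nodes = np.linalg.eigvalsh(X)
print(nodes)   # [-0.77459667, 0., 0.77459667]
```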
