We demonstrate the use of non-linear manifold learning methods to map the connectivity and degree of similarity among diverse metal-organic framework (MOF) structures in terms of their surface areas, taking into account both crystallographic and electronic structure information. Geometric and chemical bonding information are fused by using the 3-dimensional Hirshfeld surfaces of MOF structures, which encode both chemical bonding and molecular geometry. A comparative analysis of the geometry of these Hirshfeld surfaces is mapped into a low-dimensional manifold through a graph network in which each node corresponds to a different compound. By examining nearest-neighbor connections, we discover structural and chemical correlations among MOF structures that would not otherwise have been discernible. We give examples of the types of information that this approach can uncover.
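The abstract does not say how compounds are compared or which manifold-learning method builds the graph. A minimal sketch of the common first step, under the assumption that a symmetric pairwise dissimilarity matrix `dist` between compounds is already available (for example, from comparing Hirshfeld-surface descriptors), is a k-nearest-neighbor graph. The function name `knn_graph` and the union symmetrization are illustrative choices, not the authors' method.

```python
from typing import Dict, List, Sequence, Set


def knn_graph(dist: Sequence[Sequence[float]], k: int) -> Dict[int, Set[int]]:
    """Build a symmetric k-nearest-neighbor graph from a pairwise distance matrix.

    Each node (one compound) is linked to its k closest other nodes. An edge is
    kept if either endpoint selects the other ("union" symmetrization), so some
    nodes may end up with more than k neighbors.
    """
    n = len(dist)
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        # Rank every other node by its distance from node i; ties break by index.
        others: List[int] = sorted(
            (j for j in range(n) if j != i), key=lambda j: (dist[i][j], j)
        )
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical dissimilarities between four compounds (not data from the study).
D = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]
print(knn_graph(D, 1))
print(knn_graph(D, 2))
```

With `k=1`, the toy matrix splits into two clusters, {0, 1} and {2, 3}. With `k=2`, the clusters are bridged, which shows how the choice of k controls which nearest-neighbor connections become visible.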
Quantifying and Understanding Errors in Molecular Geometries
Electronic structure calculations are ubiquitous in most branches of chemistry, but all such calculations have errors in both energies and equilibrium geometries. Quantifying errors in possibly dozens of bond angles and bond lengths is a Herculean task. We introduce a single natural measure of geometric error, the geometry energy offset (GEO). GEO links many disparate aspects of geometry errors: it yields a new ranking of different methods, quantitative insight into errors in specific geometric parameters, and insight into trends across methods. GEO can also reduce the cost of high-level geometry optimizations, and it shows when geometric errors distort the overall error of a method. We give results, including some surprises, for both covalent and weak interactions.
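The abstract does not define GEO. The sketch below assumes GEO is the energy of the accurate (reference) method evaluated at the approximate method's geometry, minus the reference energy at the reference equilibrium geometry, so that GEO is never negative. As a stand-in for the reference energy curve of a diatomic, it uses a Morse potential with hypothetical parameters. It also compares GEO with the second-order estimate ½kΔr², which is how a geometry error maps onto an energy scale near the minimum.

```python
import math


def morse(r: float, d_e: float, a: float, r_e: float) -> float:
    """Morse potential, standing in for the reference method's energy curve."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def geo(r_approx: float, d_e: float, a: float, r_e: float) -> float:
    """Assumed GEO: reference energy at the approximate geometry minus the
    reference energy at the reference minimum r_e (zero only when r_approx == r_e)."""
    return morse(r_approx, d_e, a, r_e) - morse(r_e, d_e, a, r_e)


def geo_harmonic(r_approx: float, d_e: float, a: float, r_e: float) -> float:
    """Second-order estimate 0.5 * k * dr**2, with Morse force constant k = 2 * d_e * a**2."""
    k = 2.0 * d_e * a ** 2
    return 0.5 * k * (r_approx - r_e) ** 2


# Hypothetical reference curve (hartree, bohr) and two approximate bond lengths
# that are each off by 0.02 bohr, one too long and one too short.
D_E, A, R_E = 0.17, 1.0, 1.40
for r in (1.42, 1.38):
    print(r, geo(r, D_E, A, R_E), geo_harmonic(r, D_E, A, R_E))
```

In this toy curve, the same 0.02 bohr error gives a larger GEO when the bond is too short than when it is too long. A pure bond-length error statistic cannot express that asymmetry, but an energy-based measure can.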
- Award ID(s): 1856165
- PAR ID: 10220448
- Journal Name: The Journal of Physical Chemistry Letters
- ISSN: 1948-7185
- Page Range / eLocation ID: 9957–9964
- Sponsoring Org: National Science Foundation
More Like this
Optimal transport maps and plans between two absolutely continuous measures $\mu$ and $\nu$ can be approximated by solving semidiscrete or fully discrete optimal transport problems. These two problems arise from approximating $\mu$, or both $\mu$ and $\nu$, by Dirac measures. Extending an idea from Gigli (2011, On Hölder continuity-in-time of the optimal transport map towards measures along a curve. Proc. Edinb. Math. Soc. (2), 54, 401–409), we characterize how transport plans change under perturbation of both $\mu$ and $\nu$. We apply this insight to prove error estimates for semidiscrete and fully discrete algorithms in terms of errors arising solely from approximating the measures. We obtain weighted $L^2$ error estimates for both types of algorithms with a convergence rate of $O(h^{1/2})$. This coincides with the rate in Theorem 5.4 of Berman (2018, Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport. Preprint available at arXiv:1803.00785) for semidiscrete methods, but the error notion is different.
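To make the "fully discrete" setting concrete, here is a toy instance in which both measures are replaced by uniform Dirac measures with the same number of atoms. On the real line, with a convex cost such as $|x-y|^p$ for $p \ge 1$, pairing sorted atoms (the monotone rearrangement) is an optimal plan. This is a one-dimensional illustration only. It is not the algorithm analyzed in the abstract, which concerns error estimates in general dimension, and the helper names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def monotone_plan(xs: Sequence[float], ys: Sequence[float]) -> List[Tuple[float, float]]:
    """Optimal coupling between two uniform empirical measures on the real line:
    the i-th smallest x is matched with the i-th smallest y."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms with equal weights")
    return list(zip(sorted(xs), sorted(ys)))


def transport_cost(plan: Sequence[Tuple[float, float]], p: float = 2.0) -> float:
    """Average cost |x - y|**p of a plan that puts mass 1/n on each matched pair."""
    return sum(abs(x - y) ** p for x, y in plan) / len(plan)


def brute_force_cost(xs: Sequence[float], ys: Sequence[float], p: float = 2.0) -> float:
    """Exhaustive search over all one-to-one matchings; only for checking tiny cases."""
    return min(transport_cost(list(zip(xs, perm)), p) for perm in permutations(ys))


xs = [0.3, -1.0, 2.0]   # atoms approximating mu (hypothetical)
ys = [1.5, 0.0, -0.5]   # atoms approximating nu (hypothetical)
plan = monotone_plan(xs, ys)
print(plan, transport_cost(plan), brute_force_cost(xs, ys))
```

The brute-force check agrees with the sorted matching. In higher dimensions, no such ordering exists, and the discrete problem has to be solved as a linear program or an assignment problem.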
Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a reference database of geo-tagged aerial images. The task is extremely challenging because of the drastic change in viewpoint and the different capture times of the two views. Despite these difficulties, recent works have made outstanding progress on cross-view geo-localization benchmarks. However, existing methods still perform poorly on cross-area benchmarks, in which the training and testing data are captured in two different regions. We attribute this deficiency to the models' inability to extract the spatial configuration of visual feature layouts and to their overfitting to low-level details of the training set. In this paper, we propose GeoDTR, which explicitly disentangles geometric information from raw features and learns the spatial correlations among visual features from aerial and ground pairs with a novel geometric layout extractor module. This module generates a set of geometric layout descriptors, which modulate the raw features and produce high-quality latent representations. In addition, we elaborate on two categories of data augmentation: (i) layout simulation, which varies the spatial configuration while keeping the low-level details intact; and (ii) semantic augmentation, which alters the low-level details and encourages the model to capture spatial configurations. These augmentations help to improve the performance of cross-view geo-localization models, especially on the cross-area benchmarks. Moreover, we propose a counterfactual-based learning process that helps the geometric layout extractor explore spatial information. Extensive experiments show that GeoDTR not only achieves state-of-the-art results but also significantly boosts performance on same-area and cross-area benchmarks. Our code can be found at https://gitlab.com/vail-uvm/geodtr.
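The abstract does not describe how layout simulation is implemented. One plausible geometry-preserving variant, offered here as a hypothetical sketch rather than GeoDTR's code, rotates the aerial image and shifts the ground panorama so that the two still depict the same place. Pixel values are untouched, so the low-level details stay intact while the layout changes. The sketch assumes a north-up aerial image centered on the camera and a 360-degree panorama whose columns sweep azimuth clockwise from north. It uses nested lists in place of image tensors.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]


def rotate_ccw(grid: Sequence[Sequence[int]]) -> Grid:
    """Rotate a square aerial image (a list of pixel rows) 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def roll_left(pano: Sequence[Sequence[int]], shift: int) -> Grid:
    """Circularly shift every row of a panorama left by `shift` columns."""
    return [list(row[shift:]) + list(row[:shift]) for row in pano]


def simulate_layout(aerial: Sequence[Sequence[int]],
                    pano: Sequence[Sequence[int]]) -> Tuple[Grid, Grid]:
    """Rotate the aerial view 90 degrees CCW and shift the panorama a quarter turn.

    After a CCW rotation, what lay to the east now lies to the north, so the new
    panorama column for azimuth 0 must show what the old column for azimuth 90
    showed. That is a left roll by width / 4 (under the assumed conventions).
    """
    width = len(pano[0])
    if width % 4:
        raise ValueError("panorama width must be divisible by 4")
    return rotate_ccw(aerial), roll_left(pano, width // 4)


aerial = [[1, 2], [3, 4]]
pano = [[0, 1, 2, 3, 4, 5, 6, 7]]
print(simulate_layout(aerial, pano))
```

Applying the transform four times returns the original pair, which is a quick sanity check that the rotation and the shift describe the same quarter turn.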
Over the last three decades, many growth and yield systems developed for the southeast USA have incorporated methods to create compatible basal area (BA) prediction and projection equations. This technique allows practitioners to calibrate BA models using both measurements at a given arbitrary age and the increment in BA when time-series panel data are available. As a result, the model parameters for the prediction and projection alternatives are compatible. One caveat of this methodology is that pairs of observations used to project forward receive the same weight as observations from a single measurement age, regardless of the projection time interval. To address this problem, we introduce a variance–covariance structure that gives different weights to predictions with variable intervals. To test this approach, prediction and projection equations were fitted simultaneously using an ad hoc matrix structure. We tested three different error structures, fitting models with (i) homoscedastic errors described by a single parameter (Method 1); (ii) heteroscedastic errors described with a weighting factor $w_t$ (Method 2); and (iii) errors including both prediction ($\overset{\smile}{\varepsilon}$) and projection ($\tilde{\varepsilon}$) errors in the weighting factor $w_t$ (Method 3). A rotation-age dataset covering nine sites, each including four blocks with four silvicultural treatments per block, was used for model calibration and validation, with explicit terms for each treatment. Fitting with the error structure that incorporated the combined error terms ($\overset{\smile}{\varepsilon}$ and $\tilde{\varepsilon}$) into the weighting factor $w_t$ (Method 3) yielded a lower root mean square error (RMSE) than the other two methods.
Also, the system of equations that incorporated silvicultural treatments as dummy variables yielded lower RMSE and Akaike information criterion (AIC) values in all methods. Our results show a substantial improvement over the current prediction–projection approach, resulting in consistent estimators for BA.
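The abstract's BA models are nonlinear and are fitted as a simultaneous system. The sketch below illustrates only the underlying idea, which is how interval-dependent weights $w_t$ enter a least-squares fit. It uses a straight line and a hypothetical variance model, assuming that error variance grows as `1 + lam * interval`, so each observation is weighted by the inverse of that variance. Single-age observations get interval 0. This is not the paper's Method 2 or Method 3, and `lam` is an illustrative parameter.

```python
import math
from typing import List, Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares for y = b0 + b1 * x, minimizing sum(w_i * residual_i**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def interval_weights(intervals: Sequence[float], lam: float = 0.5) -> List[float]:
    """Hypothetical inverse-variance weights: variance assumed proportional to
    1 + lam * interval, so longer projections count for less."""
    return [1.0 / (1.0 + lam * dt) for dt in intervals]


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean square error between observations and fitted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


# Toy data: the last observation comes from a long projection interval.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 10.0]
w = interval_weights([0.0, 0.0, 0.0, 18.0])
print(weighted_linear_fit(x, y, [1.0] * 4), weighted_linear_fit(x, y, w))
```

Down-weighting the long-interval observation pulls the fitted slope toward the trend of the single-age data. This is the kind of interval-sensitive weighting that equal weights cannot provide.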
Predicting a molecule's 3D conformer ensemble from its molecular graph plays a key role in cheminformatics and drug discovery. Existing generative models have several drawbacks, including failure to model important elements of molecular geometry (e.g., torsion angles), separate optimization stages prone to error accumulation, and the need for structure fine-tuning based on approximate classical force fields or computationally expensive methods such as metadynamics with approximate quantum mechanics calculations at each geometry. We propose GeoMol, an end-to-end, non-autoregressive, SE(3)-invariant machine learning approach to generate distributions of low-energy molecular 3D conformers. Leveraging the power of message passing neural networks (MPNNs) to capture local and global graph information, we predict local atomic 3D structures and torsion angles, avoiding unnecessary over-parameterization of the geometric degrees of freedom (e.g., one angle per non-terminal bond). Such local predictions suffice both for computing the training loss and for the full deterministic conformer assembly at test time. We devise a non-adversarial loss function based on optimal transport to promote diverse conformer generation. GeoMol predominantly outperforms popular open-source, commercial, and state-of-the-art machine learning (ML) models, while achieving significant speed-ups. We expect such differentiable 3D structure generators to significantly impact molecular modeling and related applications.
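Torsion angles are central to this abstract, so a short reference helper may be useful. It computes the signed dihedral angle about the p1–p2 bond from four consecutive atom positions, using the standard atan2 formula and the IUPAC sign convention: 0° is cis, ±180° is trans, and a clockwise rotation viewed along p1→p2 is positive. This is a generic geometric helper, not GeoMol's code, and GeoMol's internal parametrization of torsions may differ.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees, in [-180, 180], about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two bond planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Central bond along z from (0, 0, 0) to (0, 0, 1); first atom on the +x side.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
for p3 in [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (-1.0, 0.0, 1.0)]:
    print(p3, torsion_deg(p0, p1, p2, p3))
```

The three placements of the fourth atom give the cis, +90°, and trans cases. Because atan2 uses both the sine-like and cosine-like terms, this formula avoids the sign ambiguity that an arccos of the normal vectors' dot product would leave.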

