Title: Quantifying and Understanding Errors in Molecular Geometries
Electronic structure calculations are ubiquitous in most branches of chemistry, but all have errors in both energies and equilibrium geometries. Quantifying errors in possibly dozens of bond angles and bond lengths is a Herculean task. A single natural measure of geometric error is introduced, the geometry energy offset (GEO). GEO links many disparate aspects of geometry errors: a new ranking of different methods, quantitative insight into errors in specific geometric parameters, and insight into trends with different methods. GEO can also reduce the cost of high-level geometry optimizations and shows when geometric errors distort the overall error of a method. Results, including some surprises, are given for both covalent and weak interactions.
Award ID(s): 1856165
PAR ID: 10220448
Journal Name: The Journal of Physical Chemistry Letters
Issue: 0
ISSN: 1948-7185
Page Range / eLocation ID: 9957–9964
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Many data assimilation methods require knowledge of the first two moments of the background and observation errors to function optimally. To ensure the effective performance of such methods, it is often advantageous to estimate the second moment of the observation errors directly. We examine three different strategies for doing so, focusing specifically on the case of a single scalar observation error variance parameter r. The first method is the well-known Desroziers et al. “diagnostic check” iteration (DBCP). The second method, described in Karspeck, adapts the “spread–error” diagnostic (used for assessing ensemble reliability) to observations and generates a point estimate of r by taking the expectation of various observation-space statistics and using an ensemble to model background error statistics explicitly. The third method is an approximate Bayesian scheme that uses an inverse-gamma prior and a modified Gaussian likelihood. All three methods can recover the correct observation error variance when both the background and observation errors are Gaussian and the background error variance is well specified. We also demonstrate that it is often possible to estimate r even when the observation error is not Gaussian or when the forward operator mapping model states into observation space is nonlinear. The DBCP method is found to be most robust to these complications; however, the other two methods perform similarly well in most cases and have the added benefit that they can be used to estimate r before data assimilation. We conclude that further investigation is warranted into the latter two methods, specifically into how they perform when extended to the multivariate case. Significance Statement: Observations of the Earth system (e.g., from satellites, radiosondes, and aircraft) each have some associated uncertainty. To use observations to improve model forecasts, it is important to understand the size of that uncertainty. This study compares three statistical methods for estimating observation errors, all of which can be continuously implemented whenever new observations are used to correct a model. Our results suggest that all three methods can improve forecast outcomes, but that, if observations are believed to have highly biased or skewed errors, care should be taken in choosing which to use and interpreting its results. Future studies should investigate robust methods for estimating more complicated types of errors.
  2. We demonstrate the use of non-linear manifold learning methods to map the connectivity and extent of similarity between diverse metal-organic framework (MOF) structures in terms of their surface areas by taking into account both crystallographic and electronic structure information. The fusing of geometric and chemical bonding information is accomplished by using 3-dimensional Hirshfeld surfaces of MOF structures, which encode both chemical bonding and molecular geometry information. A comparative analysis of the geometry of Hirshfeld surfaces is mapped into a low dimensional manifold through a graph network where each node corresponds to a different compound. By examining nearest neighbor connections, we discover structural and chemical correlations among MOF structures that would not have been discernible otherwise. Examples of the types of information that can be uncovered using this approach are given. 
  3. Abstract: Optimal transport maps and plans between two absolutely continuous measures $$\mu$$ and $$\nu$$ can be approximated by solving semidiscrete or fully discrete optimal transport problems. These two problems ensue from approximating $$\mu$$ or both $$\mu$$ and $$\nu$$ by Dirac measures. Extending an idea from Gigli (2011, On Hölder continuity-in-time of the optimal transport map towards measures along a curve. Proc. Edinb. Math. Soc. (2), 54, 401–409), we characterize how transport plans change under the perturbation of both $$\mu$$ and $$\nu$$. We apply this insight to prove error estimates for semidiscrete and fully discrete algorithms in terms of errors solely arising from approximating measures. We obtain weighted $$L^2$$ error estimates for both types of algorithms with a convergence rate $$O(h^{1/2})$$. This coincides with the rate in Theorem 5.4 in Berman (2018, Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport. Preprint available at arXiv:1803.00785) for semidiscrete methods, but the error notion is different.
  4. Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a reference database of geo-tagged aerial images. The task is extremely challenging because of the drastic change in viewpoint and the different capture times of the two views. Despite these difficulties, recent works achieve outstanding progress on cross-view geo-localization benchmarks. However, existing methods still suffer from poor performance on the cross-area benchmarks, in which the training and testing data are captured from two different regions. We attribute this deficiency to an inability to extract the spatial configuration of visual feature layouts and to the models' overfitting to low-level details from the training set. In this paper, we propose GeoDTR, which explicitly disentangles geometric information from raw features and learns the spatial correlations among visual features from aerial and ground pairs with a novel geometric layout extractor module. This module generates a set of geometric layout descriptors, modulating the raw features and producing high-quality latent representations. In addition, we elaborate on two categories of data augmentations: (i) layout simulation, which varies the spatial configuration while keeping the low-level details intact; and (ii) semantic augmentation, which alters the low-level details and encourages the model to capture spatial configurations. These augmentations help to improve the performance of the cross-view geo-localization models, especially on the cross-area benchmarks. Moreover, we propose a counterfactual-based learning process to benefit the geometric layout extractor in exploring spatial information. Extensive experiments show that GeoDTR not only achieves state-of-the-art results but also significantly boosts the performance on same-area and cross-area benchmarks. Our code can be found at https://gitlab.com/vail-uvm/geodtr.
  5. Abstract: Over the last three decades, many growth and yield systems developed for the southeast USA have incorporated methods to create a compatible basal area (BA) prediction and projection equation. This technique allows practitioners to calibrate BA models using both measurements at a given arbitrary age and the increment in BA when time series panel data are available. As a result, model parameters for either prediction or projection alternatives are compatible. One caveat of this methodology is that pairs of observations used to project forward have the same weight as observations from a single measurement age, regardless of the projection time interval. To address this problem, we introduce a variance–covariance structure giving different weights to predictions with variable intervals. To test this approach, prediction and projection equations were fitted simultaneously using an ad hoc matrix structure. We tested three different error structures in fitting models with (i) homoscedastic errors described by a single parameter (Method 1); (ii) heteroscedastic errors described with a weighting factor $${w}_t$$ (Method 2); and (iii) errors including both prediction ($$\overset{\smile }{\varepsilon }$$) and projection errors ($$\tilde{\varepsilon}$$) in the weighting factor $${w}_t$$ (Method 3). A rotation-age dataset covering nine sites, each including four blocks with four silvicultural treatments per block, was used for model calibration and validation, including explicit terms for each treatment. Fitting with the error structure that incorporated the combined error term ($$\overset{\smile }{\varepsilon }$$ and $$\tilde{\varepsilon}$$) into the weighting factor $${w}_t$$ (Method 3) yielded a lower root mean square error (RMSE) than the other two methods evaluated. Also, the system of equations that incorporated silvicultural treatments as dummy variables generated lower RMSE and Akaike’s index (AIC) values in all methods. Our results show a substantial improvement over the current prediction-projection approach, resulting in consistent estimators for BA.