We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time. We first provide a central limit theorem for the Adaptive Augmented Inverse-Probability Weighted estimator, which is semiparametric efficient, under weaker assumptions than those previously made in the literature. This central limit theorem enables efficient inference at fixed sample sizes. We then consider a sequential inference setting, deriving both asymptotic and nonasymptotic confidence sequences that are considerably tighter than previous methods. These anytime-valid methods enable inference under data-dependent stopping times (sample sizes). Additionally, we use propensity score truncation techniques from the recent off-policy estimation literature to reduce the finite sample variance of our estimator without affecting the asymptotic variance. Empirical results demonstrate that our methods yield narrower confidence sequences than those previously developed in the literature while maintaining time-uniform error control.
Understanding the sources of error in MBAR through asymptotic analysis
Many sampling strategies commonly used in molecular dynamics, such as umbrella sampling and alchemical free energy methods, involve sampling from multiple states. The Multistate Bennett Acceptance Ratio (MBAR) formalism is a widely used way of recombining the resulting data. However, the error of the MBAR estimator is not well understood: previous error analyses of MBAR assumed independent samples. In this work, we derive a central limit theorem for MBAR estimates in the presence of correlated data, further justifying the use of MBAR in practical applications. Moreover, our central limit theorem yields an estimate of the error that can be decomposed into contributions from the individual Markov chains used to sample the states. This gives additional insight into how sampling in each state affects the overall error. We demonstrate our error estimator on an umbrella sampling calculation of the free energy of isomerization of the alanine dipeptide and an alchemical calculation of the hydration free energy of methane. Our numerical results demonstrate that the time required for the Markov chain to decorrelate in individual states can contribute considerably to the total MBAR error, highlighting the importance of accurately addressing the effect of sample correlation.
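To illustrate why sample correlation matters for error estimates, the sketch below compares the naive standard error of a sample mean with one inflated by an estimated statistical inefficiency g = 1 + 2 Σ ρ(t). This is a generic single-chain illustration, not the MBAR error estimator derived in the paper; the function names, the AR(1) test chain, and the rule of truncating the sum at the first non-positive autocorrelation are all assumptions made for the example.

```python
import numpy as np

def statistical_inefficiency(x):
    """Estimate g = 1 + 2 * sum_t rho(t) for a 1-D time series.

    The sum is truncated at the first non-positive autocorrelation,
    a common heuristic (an assumption here, not the paper's method).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, n // 2):
        rho = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if rho <= 0.0:
            break
        g += 2.0 * rho
    return max(g, 1.0)

def naive_standard_error(x):
    """Standard error of the mean assuming independent samples."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x.var(ddof=1) / x.size)

def correlated_standard_error(x):
    """Standard error of the mean with the variance inflated by g."""
    return np.sqrt(statistical_inefficiency(x)) * naive_standard_error(x)

def ar1_chain(n, phi, rng):
    """Hypothetical test data: an AR(1) chain with lag-1 coefficient phi."""
    noise = rng.normal(size=n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + noise[i]
    return x

rng = np.random.default_rng(0)
iid = rng.normal(size=20000)
chain = ar1_chain(20000, 0.9, rng)

g_iid = statistical_inefficiency(iid)
g_chain = statistical_inefficiency(chain)
print(f"g (independent samples): {g_iid:.2f}")
print(f"g (AR(1), phi = 0.9):    {g_chain:.2f}")
print(f"naive SE:     {naive_standard_error(chain):.4f}")
print(f"corrected SE: {correlated_standard_error(chain):.4f}")
```

For an ideal AR(1) process the exact value is g = (1 + phi) / (1 - phi), so a slowly decorrelating chain can have far fewer effective samples than its raw length suggests, and the naive standard error understates the uncertainty accordingly.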
- PAR ID: 10444884
- Date Published:
- Journal Name: The Journal of Chemical Physics
- Volume: 158
- Issue: 21
- ISSN: 0021-9606
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Performing alchemical transformations, in which one molecular system is nonphysically changed to another system, is a popular approach adopted in performing free energy calculations associated with various biophysical processes, such as protein–ligand binding or the transfer of a molecule between environments. While the sampling of alchemical intermediate states in either parallel (e.g., Hamiltonian replica exchange) or serial manner (e.g., expanded ensemble) can bridge the high-probability regions in the configurational space between two end states of interest, alchemical methods can fail in scenarios where the most important slow degrees of freedom in the configurational space are, in large part, orthogonal to the alchemical variable, or if the system gets trapped in a deep basin extending in both the configurational and alchemical space. To alleviate these issues, we propose to use alchemical variables as an additional dimension in metadynamics, making it possible to both sample collective variables and to enhance sampling in free energy calculations simultaneously. In this study, we validate our implementation of “alchemical metadynamics” in PLUMED with test systems and alchemical processes with varying complexities and dimensionalities of collective variable space, including the interconversion between the torsional metastable states of a toy system and the methylation of a nucleoside both in the isolated form and in a duplex. We show that multidimensional alchemical metadynamics can address the challenges mentioned above and further accelerate sampling by introducing configurational collective variables. The method can trivially be combined with other metadynamics-based algorithms implemented in PLUMED. The necessary PLUMED code changes have already been released for general use in PLUMED 2.8.
- This paper presents finite‐sample efficiency bounds for the core econometric problem of estimation of linear regression coefficients. We show that the classical Gauss–Markov theorem can be restated omitting the unnatural restriction to linear estimators, without adding any extra conditions. Our results are lower bounds on the variances of unbiased estimators. These lower bounds correspond to the variances of the least squares estimator and the generalized least squares estimator, depending on the assumption on the error covariances. These results show that we can drop the label “linear estimator” from the pedagogy of the Gauss–Markov theorem. Instead of referring to these estimators as BLUE, they can legitimately be called BUE (best unbiased estimators).
- https://youtu.be/79Py8KU4_k0 (Ed.) We consider statistical methods that invoke a min-max distributionally robust formulation to extract good out-of-sample performance in data-driven optimization and learning problems. Acknowledging the distributional uncertainty in learning from limited samples, the min-max formulations introduce an adversarial inner player to explore unseen covariate data. The resulting distributionally robust optimization (DRO) formulations, which include Wasserstein DRO formulations (our main focus), are specified using optimal transportation phenomena. Upon describing how these infinite-dimensional min-max problems can be approached via a finite-dimensional dual reformulation, this tutorial moves into its main component, namely, explaining a generic recipe for optimally selecting the size of the adversary’s budget. This is achieved by studying the limit behavior of an optimal transport projection formulation arising from an inquiry on the smallest confidence region that includes the unknown population risk minimizer. Incidentally, this systematic prescription coincides with those in specific examples in high-dimensional statistics and results in error bounds that are free from the curse of dimensions. Equipped with this prescription, we present a central limit theorem for the DRO estimator and provide a recipe for constructing compatible confidence regions that are useful for uncertainty quantification. The rest of the tutorial is devoted to insights into the nature of the optimizers selected by the min-max formulations and additional applications of optimal transport projections.
- We present a family of alchemical perturbation potentials that allow the calculation of hydration free energy of small to medium-sized molecules in a single perturbation step. We also present a general framework to optimize the parameters of the alchemical perturbation potentials based on avoiding first order pseudo phase transitions along the alchemical path. We illustrate the method for two compounds of increasing size and complexity: ethanol and 1-naphthol. In each case we show that convergence of the hydration free energy is achieved rapidly when conventional approaches fail.