Directly simulating rare events with atomistic molecular dynamics is a significant challenge in computational biophysics. Well-established enhanced-sampling techniques exist for obtaining the thermodynamic functions of such systems, but methods for extracting the kinetics of long-timescale processes from simulations at atomic detail are comparatively less developed. Milestoning and the weighted ensemble (WE) method are two different stratification strategies; both have shown promise for computing the long timescales of complex biomolecular processes, yet both require a significant investment of computational resources. We have combined WE and milestoning to calculate observables in orders-of-magnitude less CPU and wall-clock time. Our weighted ensemble milestoning method (WEM) uses WE simulation to converge the transition probabilities and first passage times between milestones, and then applies the theoretical framework of milestoning to extract the thermodynamic and kinetic properties of the entire process. We tested the method on a simple one-dimensional double-well potential, on an eleven-dimensional potential energy surface with an energy barrier, and on the biomolecular model system alanine dipeptide. We recovered the free energy profiles, time correlation functions, and mean first passage times for barrier-crossing events at a significantly reduced computational cost. WEM promises to extend the applicability of molecular dynamics simulation to slow dynamics of large systems that are well beyond the scope of present-day brute-force computation.
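As a rough illustration of the milestoning stage described above, the sketch below computes a mean first passage time and a milestone free energy profile from a milestone-to-milestone transition matrix and mean milestone lifetimes; in WEM these inputs would come from the converged WE runs between milestones. The function names, the toy three-milestone example, and the specific linear-algebra route are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Hypothetical post-processing helpers for the milestoning stage: K[i, j] is the
# probability that a trajectory leaving milestone i next hits milestone j, and
# t_bar[i] is the mean lifetime of milestone i before the next milestone is hit.
# In WEM these would be estimated from the converged WE runs (not shown here).

def milestoning_mfpt(K, t_bar, source=0, sink=-1):
    """Mean first passage time from `source` to an absorbing `sink` milestone.

    Uses first-step analysis on the milestone network:
        tau = t_bar + K tau,  with tau[sink] = 0.
    """
    n = K.shape[0]
    sink = sink % n
    keep = [i for i in range(n) if i != sink]        # non-absorbing milestones
    A = np.eye(len(keep)) - K[np.ix_(keep, keep)]    # (I - K) restricted to them
    tau = np.zeros(n)
    tau[keep] = np.linalg.solve(A, t_bar[keep])      # tau[sink] stays 0
    return tau[source]

def milestoning_free_energy(K, t_bar, kT=1.0):
    """Free energy over milestones: p_i is proportional to q_i * t_bar[i],
    where the stationary flux q solves q K = q."""
    evals, evecs = np.linalg.eig(K.T)
    q = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))  # eigenvalue-1 vector
    p = q * t_bar
    p /= p.sum()
    return -kT * np.log(p)

# Toy example: three milestones; milestoning_mfpt treats milestone 2 as absorbing.
K = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
t_bar = np.array([1.0, 2.0, 1.0])
print(milestoning_mfpt(K, t_bar, source=0, sink=2))  # 6.0 for this toy input
```

The first-step relation tau = t_bar + K tau used in `milestoning_mfpt` is the standard milestoning expression for the expected time to reach an absorbing milestone; everything else here is a simplified stand-in for the quantities the WE runs would supply.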
An ergodic theorem for the weighted ensemble method
Abstract: We study weighted ensemble, an interacting particle method for sampling distributions of Markov chains that has been used in computational chemistry since the 1990s. Many important applications of weighted ensemble require the computation of long time averages. We establish the consistency of weighted ensemble in this setting by proving an ergodic theorem for time averages. As part of the proof, we derive explicit variance formulas that could be useful for optimizing the method.
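To make the setting concrete, here is a minimal, self-contained sketch of a weighted ensemble computation of a long time average: walkers carry weights, are propagated by a Markov kernel, and are split and merged within bins at every step while a running weighted average of an observable is accumulated. The double-well diffusion, the uniform bins, the resampling rule, and the observable are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, dt=1e-3, beta=3.0):
    """One Euler-Maruyama step of dX = -V'(X) dt + sqrt(2/beta) dW, V(x) = (x^2 - 1)^2."""
    grad = 4.0 * x * (x**2 - 1.0)
    return x - grad * dt + np.sqrt(2.0 * dt / beta) * rng.standard_normal(x.shape)

def resample(x, w, edges, copies_per_bin=10):
    """Split/merge within bins: each occupied bin keeps `copies_per_bin` walkers
    of equal weight whose total equals the bin's weight (unbiased on average)."""
    new_x, new_w = [], []
    bins = np.digitize(x, edges)
    for b in np.unique(bins):
        idx = np.where(bins == b)[0]
        W = w[idx].sum()
        parents = rng.choice(idx, size=copies_per_bin, p=w[idx] / W)
        new_x.append(x[parents])
        new_w.append(np.full(copies_per_bin, W / copies_per_bin))
    return np.concatenate(new_x), np.concatenate(new_w)

# Long time average of an observable f under the weighted ensemble dynamics.
f = lambda x: (x > 0).astype(float)           # e.g. occupation of the right well
edges = np.linspace(-2.0, 2.0, 21)            # bin boundaries in x
x = np.full(100, -1.0)                        # start all walkers in the left well
w = np.full(100, 1.0 / 100)                   # weights sum to one
running_sum, T = 0.0, 20_000
for t in range(T):
    x = step(x)
    x, w = resample(x, w, edges)
    running_sum += np.sum(w * f(x))           # weighted ensemble average at time t
print("time average of f:", running_sum / T)
```

The quantity accumulated in `running_sum / T` is the kind of long time average whose consistency the ergodic theorem in the abstract addresses.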
- Award ID(s): 2111277
- PAR ID: 10343832
- Date Published:
- Journal Name: Journal of Applied Probability
- Volume: 59
- Issue: 1
- ISSN: 0021-9002
- Page Range / eLocation ID: 152–166
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract: Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as postprocessing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble whose members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and 2-m temperature 2 weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider linear models, random forests, convolutional neural networks, and stacked models (a multimodel approach based on the predictions of the individual ML models). Unlike previous ML approaches that often use the ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability. Significance Statement: Accurately forecasting temperature and precipitation on subseasonal time scales (2 weeks to 2 months in advance) is extremely challenging. These forecasts would have immense value in agriculture, insurance, and economics. Our paper describes an application of machine learning techniques to improve forecasts of monthly average precipitation and 2-m temperature 2 weeks in advance for the entire continental United States, using lagged physics-based predictions and observational data. For lagged ensembles, the proposed models outperform standard benchmarks such as historical averages and averages of physics-based predictions. Our findings suggest that utilizing the full set of physics-based predictions instead of the average enhances the accuracy of the final forecast.
-
Abstract: Background: Ensemble modeling aims to boost forecasting performance by systematically integrating the predictive accuracy across individual models. Here we introduce a simple yet powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations, with applications to infectious disease spread. Methods: We propose and assess the performance of two ensemble modeling schemes with different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, the performance of the method is demonstrated using a diversity of epidemic datasets, including scenario outbreak data from the Ebola Forecasting Challenge and real-world epidemic data including outbreaks of influenza, plague, Zika, and COVID-19. Results: We found that the ensemble method that randomly selects a model from the set of individual models for each time point of the epidemic trajectory frequently outcompeted the individual models, as well as an alternative ensemble method based on a weighted combination of the individual models, and yields broader and more realistic uncertainty bounds for the trajectory envelope, achieving not only a better coverage rate of the 95% prediction interval but also improved mean interval scores across a diversity of epidemic datasets. Conclusion: Our new ensemble forecasting methodology outcompetes the component models and an alternative ensemble model that differs in how the variance is evaluated for the generation of the prediction intervals of the forecasts.
-
Recent research in the theory of overparametrized learning has sought to establish generalization guarantees in the interpolating regime. Such results have been established for a few common classes of methods, but so far not for ensemble methods. We devise an ensemble classification method that simultaneously interpolates the training data, and is consistent for a broad class of data distributions. To this end, we define the manifold-Hilbert kernel for data distributed on a Riemannian manifold. We prove that kernel smoothing regression and classification using the manifold-Hilbert kernel are weakly consistent in the setting of Devroye et al. [19]. For the sphere, we show that the manifold-Hilbert kernel can be realized as a weighted random partition kernel, which arises as an infinite ensemble of partition-based classifiers.
-
Abstract: Due to their limited resolution, numerical ocean models need to be interpreted as representing filtered or averaged equations. How to interpret models in terms of formally averaged equations, however, is not always clear, particularly in the case of hybrid or generalized vertical coordinate models, which limits our ability to interpret the model results and to develop parameterizations for the unresolved eddy contributions. We here derive the averaged hydrostatic Boussinesq equations in generalized vertical coordinates for an arbitrary thickness-weighted average. We then consider various special cases and discuss the extent to which the averaged equations are consistent with existing ocean model formulations. As previously discussed, the momentum equations in existing depth-coordinate models are best interpreted as representing Eulerian averages (i.e., averages taken at fixed depth), while the tracer equations can be interpreted as either Eulerian or thickness-weighted isopycnal averages. Instead we find that no averaging is fully consistent with existing formulations of the parameterizations in semi-Lagrangian discretizations of generalized vertical coordinate ocean models such as MOM6. A coordinate-following average would require "coordinate-aware" parameterizations that can account for the changing nature of the eddy terms as the coordinate changes. Alternatively, the model variables can be interpreted as representing either Eulerian or (thickness-weighted) isopycnal averages, independent of the model coordinate that is being used for the numerical discretization. Existing parameterizations in generalized vertical coordinate models, however, are not always consistent with either of these interpretations, which, respectively, would require a three-dimensional divergence-free eddy tracer advection or a form-stress parameterization in the momentum equations.