Finite-temperature lattice free energy differences between polymorphs of molecular crystals are fundamental to understanding and predicting the relative stabilities that underpin polymorphism, yet they are computationally expensive to obtain. Here, we implement and critically assess machine-learning-enabled targeted free energy calculations derived from flow-based generative models to compute the free energy difference between two ice crystal polymorphs (Ice XI and Ic), modeled with a fully flexible empirical classical force field. We demonstrate that even when remapping from an analytical reference distribution, such methods enable a cost-effective and accurate calculation of free energy differences between disconnected metastable ensembles when trained on locally ergodic data sampled exclusively from the ensembles of interest. Unlike classical free energy perturbation methods, such as the Einstein crystal method, the targeted approach analyzed in this work requires no additional sampling of intermediate perturbed Hamiltonians, offering significant computational savings. To systematically assess the accuracy of the method, we monitored the convergence of free energy estimates during training by implementing an overfitting-aware weighted averaging strategy. By comparing our results with ground-truth free energy differences computed with the Einstein crystal method, we assess the accuracy and efficiency of two different model architectures, employing two different representations of the supercell degrees of freedom (Cartesian vs quaternion-based). We conduct our assessment by comparing free energy differences between crystal supercells of different sizes and temperatures and assessing the accuracy in extrapolating lattice free energies to the thermodynamic limit. While the models perform with similar accuracy at low temperatures and in small system sizes, we note that for larger systems and high temperatures, the choice of representation is key to obtaining generalizable results of quality comparable to that obtained from the Einstein crystal method. We believe this work to be a stepping stone toward efficient free energy calculations in larger, more complex molecular crystals.
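The targeted estimator described in this abstract can be illustrated on a toy problem. The sketch below is not the authors' flow-based model: it assumes two one-dimensional harmonic wells in place of the ice polymorphs and uses a hand-written affine map where the paper trains a generative model. All names (`targeted_fep`, `k_a`, `k_b`, `center_b`) are hypothetical. It evaluates ΔF = -kT ln⟨exp(-βΦ)⟩ over samples from ensemble A, where Φ = U_B(M(x)) - U_A(x) - kT ln|J| and J is the Jacobian of the map M.

```python
import numpy as np


def targeted_fep(k_a=1.0, k_b=9.0, center_b=3.0, kT=1.0, n_samples=5000, seed=0):
    """Targeted free energy perturbation between two 1D harmonic wells.

    Toy stand-in for a learned flow: the map M(x) = center_b + scale * x is
    chosen by hand so that it carries ensemble A exactly onto ensemble B.
    Returns (estimate, exact) for F_B - F_A.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / kT

    # Exact Boltzmann samples from well A (Gaussian with variance kT / k_a).
    x = rng.normal(0.0, np.sqrt(kT / k_a), size=n_samples)

    # Hand-written affine map and its log-Jacobian (assumption: no training).
    scale = np.sqrt(k_a / k_b)
    y = center_b + scale * x
    log_jac = np.log(scale)

    u_a = 0.5 * k_a * x**2
    u_b = 0.5 * k_b * (y - center_b) ** 2

    # Generalized work of the mapped perturbation.
    phi = u_b - u_a - kT * log_jac

    # Exponential average, evaluated with a max-shift for numerical stability.
    a = -beta * phi
    a_max = a.max()
    delta_f = -kT * (a_max + np.log(np.mean(np.exp(a - a_max))))

    # Analytic reference for harmonic wells: 0.5 * kT * ln(k_b / k_a).
    exact = 0.5 * kT * np.log(k_b / k_a)
    return float(delta_f), float(exact)


if __name__ == "__main__":
    estimate, exact = targeted_fep()
    print(f"targeted FEP: {estimate:.6f}, exact: {exact:.6f}")
```

Because the map in this toy case is perfect, Φ is the same for every sample and the estimate matches the analytic value without sampling any intermediate Hamiltonian. A learned map is only approximate, so Φ fluctuates from sample to sample and the estimate must be checked for convergence, which is the role of the convergence monitoring described in the abstract.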
Implementation of adaptive integration method for free energy calculations in molecular systems
Estimating free energy differences by computer simulation is useful for a wide variety of applications, such as virtual screening for drug design and understanding how amino acid mutations modify protein interactions. However, calculating free energy differences remains challenging and often requires extensive trial and error and very long simulation times to achieve converged results. Here, we present an implementation of the adaptive integration method (AIM). We tested our implementation on two molecular systems and compared results from AIM to those from a suite of other methods. The two test cases are the solvation free energy of methane and the free energy of mutating the peptide GAG to GVG. We show that AIM is more efficient than the other tested methods for these systems; that is, AIM results converge to a higher level of accuracy and precision for a given simulation time.
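For orientation, the sketch below shows plain thermodynamic integration, the identity ΔF = ∫⟨∂U/∂λ⟩ dλ that, as we understand it, adaptive integration builds on. It is not the AIM implementation described here: λ sits on a fixed grid instead of being sampled adaptively, and the system is a toy harmonic oscillator chosen because ΔF is known analytically. The function name `ti_harmonic` and all parameters are hypothetical.

```python
import numpy as np


def ti_harmonic(k0=1.0, k1=4.0, kT=1.0, n_lambda=21, n_samples=20000, seed=0):
    """Thermodynamic integration for U(x; lam) = 0.5 * k(lam) * x**2.

    k(lam) interpolates linearly from k0 to k1. Returns (estimate, exact)
    for F(lam=1) - F(lam=0).
    """
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_lambda)
    mean_du_dlam = np.empty(n_lambda)

    for i, lam in enumerate(lambdas):
        k = (1.0 - lam) * k0 + lam * k1
        # Exact Boltzmann samples at this lambda (Gaussian, variance kT / k).
        x = rng.normal(0.0, np.sqrt(kT / k), size=n_samples)
        # dU/dlam = 0.5 * (k1 - k0) * x**2 for the linear interpolation.
        mean_du_dlam[i] = np.mean(0.5 * (k1 - k0) * x**2)

    # Trapezoidal rule over the lambda grid.
    widths = np.diff(lambdas)
    delta_f = np.sum(0.5 * (mean_du_dlam[1:] + mean_du_dlam[:-1]) * widths)

    # Analytic result for the harmonic oscillator: 0.5 * kT * ln(k1 / k0).
    exact = 0.5 * kT * np.log(k1 / k0)
    return float(delta_f), float(exact)


if __name__ == "__main__":
    estimate, exact = ti_harmonic()
    print(f"TI estimate: {estimate:.4f}, exact: {exact:.4f}")
```

In a molecular system the exact Gaussian sampling would be replaced by a simulation at each λ, and the per-λ averages converge far more slowly, which is where the efficiency of a given method for choosing and sampling λ matters.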
- Award ID(s): 1736253
- PAR ID: 10291357
- Date Published:
- Journal Name: PeerJ
- ISSN: 2167-8359
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This paper identifies and explains particular differences and properties of adjoint-free iterative ensemble methods initially developed for parameter estimation in petroleum models. The aim is to demonstrate the methods’ potential for sequential data assimilation in coupled and multiscale unstable dynamical systems. For this study, we have introduced a new nonlinear and coupled multiscale model based on two Kuramoto–Sivashinsky equations operating on different scales where a coupling term relaxes the two model variables toward each other. This model provides a convenient testbed for studying data assimilation in highly nonlinear and coupled multiscale systems. We show that the model coupling leads to cross covariance between the two models’ variables, allowing for a combined update of both models. The measurements of one model’s variable will also influence the other and contribute to a more consistent estimate. Second, the new model allows us to examine the properties of iterative ensemble smoothers and assimilation updates over finite-length assimilation windows. We discuss the impact of varying the assimilation windows’ length relative to the model’s predictability time scale. Furthermore, we show that iterative ensemble smoothers significantly improve the solution’s accuracy compared to the standard ensemble Kalman filter update. Results and discussion provide an enhanced understanding of the ensemble methods’ potential implementation and use in operational weather- and climate-prediction systems.
-
Molecular simulations are an important tool for research in physics, chemistry, and biology. The capabilities of simulations can be greatly expanded by providing access to advanced sampling methods and techniques that permit calculation of the relevant underlying free energy landscapes. In this sense, software that can be seamlessly adapted to a broad range of complex systems is essential. Building on past efforts to provide open-source community-supported software for advanced sampling, we introduce PySAGES, a Python implementation of the Software Suite for Advanced General Ensemble Simulations (SSAGES) that provides full GPU support for massively parallel applications of enhanced sampling methods such as adaptive biasing forces, harmonic bias, or forward flux sampling in the context of molecular dynamics simulations. By providing an intuitive interface that facilitates the management of a system’s configuration, the inclusion of new collective variables, and the implementation of sophisticated free energy-based sampling methods, the PySAGES library serves as a general platform for the development and implementation of emerging simulation techniques. The capabilities, core features, and computational performance of this tool are demonstrated with clear and concise examples pertaining to different classes of molecular systems. We anticipate that PySAGES will provide the scientific community with a robust and easily accessible platform to accelerate simulations, improve sampling, and enable facile estimation of free energies for a wide range of materials and processes.
-
Context. Ambipolar diffusion is a physical mechanism related to the drift between charged and neutral particles in a partially ionized plasma that is key to many different astrophysical systems. However, understanding its effects is challenging due to basic uncertainties concerning relevant microphysical aspects and the strong constraints it imposes on the numerical modeling. Aims. Our aim is to introduce a numerical tool that allows us to address complex problems involving ambipolar diffusion in which, additionally, departures from ionization equilibrium are important or high resolution is needed. The primary application of this tool is for solar atmosphere calculations, but the methods and results presented here may also have a potential impact on other astrophysical systems. Methods. We have developed a new module for the stellar atmosphere Bifrost code that improves its computational treatment of the ambipolar diffusion term in the generalized Ohm’s law. This module includes, among other things, collision terms adequate to processes in the coolest regions in the solar chromosphere. As the main feature of the module, we have implemented the super time stepping (STS) technique, which allows an important acceleration of the calculations. We have also introduced hyperdiffusion terms to guarantee the stability of the code. Results. We show that to have an accurate value for the ambipolar diffusion coefficient in the solar atmosphere it is necessary to include as atomic elements in the equation of state not only hydrogen and helium, but also the main electron donors like sodium, silicon, and potassium. In addition, we establish a range of criteria to set up an automatic selection of the free parameters of the STS method that guarantees the best performance, optimizing the stability and speed for the ambipolar diffusion calculations. We validate the STS implementation by comparison with a self-similar analytical solution.
-
Alkan, Can (Ed.) Summary: Genome-centric analysis of metagenomic samples is a powerful method for understanding the function of microbial communities. Calculating read coverage is a central part of analysis, enabling differential coverage binning for recovery of genomes and estimation of microbial community composition. Coverage is determined by processing read alignments to reference sequences of either contigs or genomes. Per-reference coverage is typically calculated in an ad-hoc manner, with each software package providing its own implementation and specific definition of coverage. Here we present a unified software package CoverM which calculates several coverage statistics for contigs and genomes in an ergonomic and flexible manner. It uses “Mosdepth arrays” for computational efficiency and avoids unnecessary I/O overhead by calculating coverage statistics from streamed read alignment results. Availability and implementation: CoverM is free software available at https://github.com/wwood/coverm. CoverM is implemented in Rust, with Python (https://github.com/apcamargo/pycoverm) and Julia (https://github.com/JuliaBinaryWrappers/CoverM_jll.jl) interfaces.