Search for: All records

Creators/Authors contains: "Tuckerman, Mark E."

  1. We present MXtalTools, a flexible Python package for the data-driven modeling of molecular crystals, facilitating machine learning studies of the molecular solid state. MXtalTools comprises several classes of utilities: (1) synthesis, collation, and curation of molecule and crystal data sets, (2) integrated workflows for model training and inference, (3) crystal parametrization and representation, (4) crystal structure sampling and optimization, and (5) end-to-end differentiable crystal sampling, construction, and analysis. Our modular functions can be integrated into existing workflows or combined and used to build novel modeling pipelines. MXtalTools leverages CUDA acceleration to enable high-throughput crystal modeling. The Python code is available open-source on our GitHub page, with detailed documentation on ReadTheDocs.
  2. Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet, developing reliable MLIPs remains difficult because it requires generating high-quality data sets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from data set creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite (AMLP-Analysis), based on ASE, supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where straightforward fine-tuning of a foundation model achieves mean absolute errors of 1.7 meV/atom in energies and 7.0 meV/Å in forces. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and demonstrates stability during molecular dynamics simulations in the microcanonical and canonical ensembles.
  3. Representations are a foundational component of any modeling protocol, including those for molecules and molecular solids. For tasks that depend on knowledge of both molecular conformation and 3D orientation, such as the modeling of molecular dimers, clusters, or condensed phases, we desire a rotatable representation that is provably complete in the types and positions of atomic nuclei and roto-inversion equivariant with respect to the input point cloud. In this paper, we develop, train, and evaluate a new type of autoencoder, molecular O(3) encoding net (Mo3ENet), for multi-type point clouds, for which we propose a new reconstruction loss, capitalizing on a Gaussian mixture representation of the input and output point clouds. Mo3ENet is end-to-end equivariant, meaning the learned representation can be manipulated on O(3), a practical bonus. An appropriately trained Mo3ENet latent space comprises a universal embedding for scalar, vector, and tensorial molecule property prediction tasks, as well as other downstream tasks incorporating the 3D molecular pose, and we demonstrate its fitness on several such tasks.
  4. Hydrogen-bonded electrolytes that exhibit accelerated proton transport via sequential reactive hops have drawn interest for their promise in clean energy applications. Molecular dynamics simulations of these electrolytes offer the opportunity to uncover microscopic mechanistic details that could be used to design and tune the properties of candidate electrolyte technologies. However, accurately modeling the proton transfer reactions and transport properties that give rise to high charge conductivities in these electrolytes proves computationally challenging because of the need to perform lengthy condensed phase simulations, treating both the electronic and nuclear degrees of freedom quantum mechanically. In this paper, we demonstrate that such a modeling task can be efficiently achieved with the use of density functional theory (DFT)-trained machine learning potentials (MLP) to accelerate path integral molecular dynamics (PIMD) simulations. We highlight the practical utility of this approach by using it to benchmark how closely PIMD simulations employing different DFT exchange–correlation functionals reproduce the composition-dependent densities, diffusion coefficients, and electrical conductivities of mixtures consisting of imidazole and levulinic acid. Even with the speedup afforded by our MLPs, PIMD simulations remain quite expensive. In order to render PIMD more computationally tractable, we introduce and benchmark the accuracy of a ring polymer contraction approach that leverages a computationally efficient short-range MLP to accelerate our PIMD simulations by an additional factor of four.
  5. Organic molecular crystals constitute a class of materials of critical importance in numerous industries. Despite the ubiquity of these systems, our ability to predict molecular crystal structures starting only from a two-dimensional diagram of the constituent compound(s) remains a significant challenge. Most structure-prediction protocols require a customized interatomic interaction model on which the quality of the results can depend sensitively. To overcome this problem, we introduce a new topological approach to molecular crystal structure prediction. The approach posits that in a stable structure, molecules are oriented such that principal axes and normal ring plane vectors are aligned with specific crystallographic directions and that heavy atoms occupy positions that correspond to minima of a set of geometric order parameters. By minimizing an objective function that encodes these orientations and atomic positions, and filtering based on the vdW free volume and intermolecular close contact distributions derived from the Cambridge Structural Database, stable structures and polymorphs for a given crystal can be predicted entirely mathematically without reliance on an interaction model.
  6. A novel approach to computationally enhance the sampling of molecular crystal structures is proposed and tested. This method is based on the use of extended variables coupled to a Monte Carlo based crystal polymorph generator. Inspired by the established technique of quasi-random sampling of polymorphs using the rigid molecule constraint, this approach represents molecular clusters as extended variables within a thermal reservoir. Polymorph unit-cell variables are generated using pseudo-random sampling. Within this framework, a harmonic coupling between the extended variables and polymorph configurations is established. The extended variables remain fixed during the inner loop dedicated to polymorph sampling, enforcing a stepwise propagation of the extended variables to maintain system exploration. The final processing step results in a polymorph energy landscape, where the raw structures sampled to create the extended variable trajectory are re-optimized without the thermal coupling term. The foundational principles of this approach are described and its effectiveness using both a Metropolis Monte Carlo type algorithm and modifications that incorporate replica exchange is demonstrated. A comparison is provided with pseudo-random sampling of polymorphs for the molecule coumarin. The choice to test a design of this algorithm as relevant for enhanced sampling of crystal structures was due to the obvious relation between molecular structure variables and corresponding crystal polymorphs as representative of the inherent vapor to crystal transitions that exist in nature. Additionally, it is shown that the trajectories of extended variables can be harnessed to extract fluctuation properties that can lead to valuable insights. A novel thermodynamic variable is introduced: the free energy difference between ensembles of Z′ = 1 and Z′ = 2 crystal polymorphs.
  7. The theorems of density functional theory (DFT) establish bijective maps between the local external potential of a many-body system and its electron density, wavefunction and, therefore, one-particle reduced density matrix. Building on this foundation, we show that machine learning models based on the one-electron reduced density matrix can be used to generate surrogate electronic structure methods. We generate surrogates of local and hybrid DFT, Hartree-Fock and full configuration interaction theories for systems ranging from small molecules such as water to more complex compounds like benzene and propanol. The surrogate models use the one-electron reduced density matrix as the central quantity to be learned. From the predicted density matrices, we show that either standard quantum chemistry or a second machine-learning model can be used to compute molecular observables, energies, and atomic forces. The surrogate models can generate essentially anything that a standard electronic structure method can, ranging from band gaps and Kohn-Sham orbitals to energy-conserving ab-initio molecular dynamics simulations and infrared spectra, which account for anharmonicity and thermal effects, without the need to employ computationally expensive algorithms such as self-consistent field theory. The algorithms are packaged in an efficient and easy-to-use Python code, QMLearn, accessible on popular platforms.
  8. Collective variable (CV)‐based enhanced sampling techniques are widely used today for accelerating barrier‐crossing events in molecular simulations. A class of these methods, which includes temperature accelerated molecular dynamics (TAMD)/driven‐adiabatic free energy dynamics (d‐AFED), unified free energy dynamics (UFED), and temperature accelerated sliced sampling (TASS), uses an extended variable formalism to achieve quick exploration of conformational space. These techniques are powerful, as they enhance the sampling of a large number of CVs simultaneously compared to other techniques. Extended variables are kept at a much higher temperature than the physical temperature by ensuring adiabatic separation between the extended and physical subsystems and employing rigorous thermostatting. In this work, we present a computational platform to perform extended phase space enhanced sampling simulations using the open‐source molecular dynamics engine OpenMM. The implementation allows users to have interoperability of sampling techniques, as well as employ state‐of‐the‐art thermostats and multiple time‐stepping. This work also presents protocols for determining the critical parameters and procedures for reconstructing high‐dimensional free energy surfaces. As a demonstration, we present simulation results on the high dimensional conformational landscapes of the alanine tripeptide in vacuo, tetra‐N‐methylglycine (tetra‐sarcosine) peptoid in implicit solvent, and the Trp‐cage mini protein in explicit water. 
  9. Determining collective variables (CVs) for conformational transitions is crucial to understanding their dynamics and targeting them in enhanced sampling simulations. Often, CVs are proposed based on intuition or prior knowledge of a system. However, the problem of systematically determining a proper reaction coordinate (RC) for a specific process in terms of a set of putative CVs can be achieved using committor analysis (CA). Identifying essential degrees of freedom that govern such transitions using CA remains elusive because of the high dimensionality of the conformational space. Various schemes exist to leverage the power of machine learning (ML) to extract an RC from CA. Here, we extend these studies and compare the ability of 17 different ML schemes to identify accurate RCs associated with conformational transitions. We tested these methods on an alanine dipeptide in vacuum and on a sarcosine dipeptoid in an implicit solvent. Our comparison revealed that the light gradient boosting machine method outperforms other methods. In order to extract key features from the models, we employed Shapley Additive exPlanations analysis and compared its interpretation with the “feature importance” approach. For the alanine dipeptide, our methodology identifies ϕ and θ dihedrals as essential degrees of freedom in the C7ax to C7eq transition. For the sarcosine dipeptoid system, the dihedrals ψ and ω are the most important for the cisαD to transαD transition. We further argue that analysis of the full dynamical pathway, and not just endpoint states, is essential for identifying key degrees of freedom governing transitions. 
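
Entry 4 above mentions ring polymer contraction as a way to reduce the cost of PIMD. As an illustration only, the bead-contraction step can be sketched in the commonly described Fourier form: transform the bead coordinates along the ring, keep the lowest-frequency modes, and transform back onto a smaller ring. This is a minimal sketch and not the implementation used in that work. The function name `contract_ring_polymer`, the restriction to an odd number of contracted beads (which avoids special handling of the highest retained mode), and the one-coordinate-at-a-time interface are all assumptions made for this example.

```python
import cmath
import math


def contract_ring_polymer(beads, n_contracted):
    """Contract a closed path of P bead coordinates onto n_contracted beads.

    beads: list of floats, one Cartesian component of one atom per bead.
    n_contracted: odd bead count to keep, 1 <= n_contracted <= len(beads).
    (Hypothetical helper for illustration; the odd-count rule is a
    simplifying assumption of this sketch.)
    """
    n_beads = len(beads)
    if n_contracted % 2 == 0 or not 1 <= n_contracted <= n_beads:
        raise ValueError("n_contracted must be odd and between 1 and len(beads)")

    # Forward discrete Fourier transform along the bead index, computed
    # only for the n_contracted lowest-frequency modes k = -half..half.
    half = (n_contracted - 1) // 2
    modes = {}
    for k in range(-half, half + 1):
        modes[k] = sum(
            x * cmath.exp(-2j * math.pi * k * b / n_beads)
            for b, x in enumerate(beads)
        ) / n_beads

    # Inverse transform of the retained modes onto the smaller ring.
    contracted = []
    for b in range(n_contracted):
        value = sum(
            c * cmath.exp(2j * math.pi * k * b / n_contracted)
            for k, c in modes.items()
        )
        contracted.append(value.real)
    return contracted
```

With `n_contracted = 1` the result is the ring-polymer centroid, and with `n_contracted` equal to the (odd) original bead count the original beads are recovered. In a contraction scheme, the expensive part of the interaction would be evaluated on the contracted beads and the cheaper part on the full ring; how the potential is split is specific to the method and is not shown here.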