

Title: A data-driven and topological mapping approach for the a priori prediction of stable molecular crystalline hydrates
Predictions of the structures of stoichiometric, fractional, or nonstoichiometric hydrates of organic molecular crystals are immensely challenging due to the extensive search space of different water contents, host molecular placements throughout the crystal, and internal molecular conformations. However, the dry frameworks of these hydrates, especially for nonstoichiometric or isostructural dehydrates, can often be predicted from a standard anhydrous crystal structure prediction (CSP) protocol. Inspired by developments in the field of drug binding, we introduce an efficient data-driven and topologically aware approach for predicting organic molecular crystal hydrate structures through a mapping of water positions within the crystal structure. The method does not require a priori specification of water content and can, therefore, predict stoichiometric, fractional, and nonstoichiometric hydrate structures. This approach, which we term a mapping approach for crystal hydrates (MACH), establishes a set of rules for systematic determination of favorable positions for water insertion within predicted or experimental crystal structures based on considerations of the chemical features of local environments and void regions. The proposed approach is tested on hydrates of three pharmaceutically relevant compounds that exhibit diverse crystal packing motifs and void environments characteristic of hydrate structures. Overall, we show that our mapping approach introduces an advance in the efficient performance of hydrate CSP through generation of stable hydrate stoichiometries at low cost and should be considered an integral component for CSP workflows.
Award ID(s):
1955381
NSF-PAR ID:
10401653
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
119
Issue:
43
ISSN:
0027-8424
Sponsoring Org:
National Science Foundation
More Like this
  1. Hydrate formation is often unavoidable during crystallization, leading to performance degradation of pharmaceuticals and energetics. In some cases, water molecules trapped within crystal lattices can be replaced by hydrogen peroxide, improving the solubility of drugs and the detonation performance of explosives. The present work compares hydrates and hydrogen peroxide solvates in two ways: (1) analyzing structural motifs present in crystal structures accessed from the Cambridge Structural Database and (2) developing potential energy surfaces for water and hydrogen peroxide interacting with functional groups of interest at geometries relevant to the solid state. By elucidating fundamental differences in local interactions that can be formed with molecules of hydrogen peroxide and/or water, the analyses presented here provide a foundation for the design and selection of candidate molecules for the formation of hydrogen peroxide solvates.
  2. The syntheses and crystal structures of two bimetallic molecular compounds, namely, bis[bis(6,6′-dimethyl-2,2′-bipyridine)copper(I)] hexafluoridozirconate(IV) 1.134-hydrate, [Cu(dmbpy)2]2[ZrF6]·1.134H2O (dmbpy = 6,6′-dimethyl-2,2′-bipyridyl, C12H12N2), (I), and bis[bis(6,6′-dimethyl-2,2′-bipyridine)copper(I)] hexafluoridohafnate(IV) 0.671-hydrate, [Cu(dmbpy)2]2[HfF6]·0.671H2O, (II), are reported. Apart from a slight site-occupancy difference for the water molecule of crystallization, compounds (I) and (II) are isostructural, featuring isolated tetrahedral cations of copper(I) ions coordinated by two dmbpy ligands and centrosymmetric, octahedral anions of fluorinated early transition metals. The tetrahedral environments of the copper complexes are distorted owing to the steric effects of the dmbpy ligands. The extended structures are built up through Coulombic interactions between cations and anions and π–π stacking interactions between heterochiral Δ- and Λ-[Cu(dmbpy)2]+ complexes. A comparison between the title compounds and other [Cu(dmbpy)2]+ compounds with monovalent and bivalent anions reveals a significant influence of the cation-to-anion ratio on the resulting crystal packing architectures, providing insights for future crystal design of distorted tetrahedral copper compounds.
  3. Identifying local structure in molecular simulations is of utmost importance. The most common existing approach to identify local structure is to calculate some geometrical quantity referred to as an order parameter. In simple cases, order parameters are physically intuitive and trivial to develop (e.g., ion-pair distance); however, in most cases, order parameter development becomes a much more difficult endeavor (e.g., crystal structure identification). Using ideas from computer vision, we adapt a specific type of neural network called a PointNet to identify local structural environments in molecular simulations. A primary challenge in applying machine learning techniques to simulation is selecting the appropriate input features. This challenge is system-specific and requires significant human input and intuition. In contrast, our approach is a generic framework that requires no system-specific feature engineering and operates on the raw output of the simulations, i.e., atomic positions. We demonstrate the method on crystal structure identification in Lennard-Jones (four different phases), water (eight different phases), and mesophase (six different phases) systems. The method achieves as high as 99.5% accuracy in crystal structure identification. The method is applicable to heterogeneous nucleation and it can even predict the crystal phases of atoms near external interfaces. We demonstrate the versatility of our approach by using our method to identify surface hydrophobicity based solely upon positions and orientations of surrounding water molecules. Our results suggest the approach will be broadly applicable to many types of local structure in simulations.
  4. An inexpensive and reliable method for molecular crystal structure predictions (CSPs) has been developed. The new CSP protocol starts from a two-dimensional graph of the crystal's monomer(s) and utilizes no experimental information. Using results of quantum mechanical calculations for molecular dimers, an accurate two-body, rigid-monomer ab initio-based force field (aiFF) for the crystal is developed. Since CSPs with aiFFs are essentially as expensive as with empirical FFs, tens of thousands of plausible polymorphs generated by the crystal packing procedures can be optimized. Here we show the robustness of this protocol, which found the experimental crystal within the 20 most stable predicted polymorphs for each of the 15 investigated molecules. The ranking was further refined by performing periodic density-functional theory (DFT) plus dispersion correction (pDFT+D) calculations for these 20 top-ranked polymorphs, resulting in the experimental crystal ranked as number one for all the systems studied (and the second polymorph, if known, ranked in the top few). Alternatively, the polymorphs generated can be used to improve aiFFs, which also leads to rank one predictions. The proposed CSP protocol should result in aiFFs replacing empirical FFs in CSP research.
  5. Water and ligand binding play critical roles in the structure and function of proteins, yet their binding sites and significance are difficult to predict a priori. Multiple solvent crystal structures (MSCS) is a method where several X-ray crystal structures are solved, each in a unique solvent environment, with organic molecules that serve as probes of the protein surface for sites evolved to bind ligands, while the first hydration shell is essentially maintained. When superimposed, these structures contain a vast amount of information regarding hot spots of protein-protein or protein-ligand interactions, as well as conserved water-binding sites retained with the change in solvent properties. Optimized mining of this information requires reliable structural data and a consistent, objective analysis tool. Detection of related solvent positions (DRoP) was developed to automatically organize and rank the water or small organic molecule binding sites within a given set of structures. It is a flexible tool that can also be used in conserved water analysis given multiple structures of any protein independent of the MSCS method. The DRoP output is an HTML format list of the solvent sites ordered by conservation rank in its population within the set of structures, along with renumbered and recolored PDB files for visualization and facile analysis. Here, we present a previously unpublished set of MSCS structures of bovine pancreatic ribonuclease A (RNase A) and use it together with published structures to illustrate the capabilities of DRoP.

     