Title: Progressive alignment of crystals: reproducible and efficient assessment of crystal structure similarity
During in silico crystal structure prediction of organic molecules, millions of candidate structures are often generated. These candidates must be compared to remove duplicates prior to further analysis (e.g. optimization with electronic structure methods) and ultimately compared with structures determined experimentally. The agreement of predicted and experimental structures forms the basis of evaluating the results from the Cambridge Crystallographic Data Centre (CCDC) blind assessment of crystal structure prediction, which further motivates the pursuit of rigorous alignments. Evaluating crystal structure packings using coordinate root-mean-square deviation (RMSD) for N molecules (or N asymmetric units) in a reproducible manner requires metrics that describe the shape of the compared molecular clusters, to account for the different approaches used to prioritize which molecules are selected. Described here is a flexible algorithm called Progressive Alignment of Crystals (PAC) that evaluates crystal packing similarity using coordinate RMSD and introduces the radius of gyration (Rg) as a metric to quantify the shape of the superimposed clusters. It is shown that the absence of metrics describing cluster shape adds ambiguity to the results of the CCDC blind assessments, because it is not possible to determine whether the superposition algorithm has prioritized tightly packed molecular clusters (i.e. to minimize Rg) or prioritized reduced RMSD (i.e. via possibly elongated clusters with relatively larger Rg). For example, it is shown that when the PAC algorithm described here uses single linkage to prioritize molecules for inclusion in the superimposed clusters, the results are nearly identical to those calculated by the widely used program COMPACK. However, the lower Rg values obtained by the use of average linkage are favored for molecule prioritization because the resulting RMSDs more equally reflect the importance of packing along each dimension.
It is also shown that the PAC algorithm is faster than COMPACK when run as a single process, and its utility for biomolecular crystals is demonstrated. Finally, parallel scaling up to 64 processes in the open-source code Force Field X is presented.
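The abstract combines three ingredients: coordinate RMSD after superposition, Rg as a measure of cluster shape, and a linkage rule (single or average) that decides which molecules join the cluster. The sketch below illustrates them under simplifying assumptions that are not taken from the paper: each molecule is reduced to its centroid, the cluster is grown greedily from molecule 0, Rg is unweighted, and superposition uses the standard Kabsch algorithm. The function names `kabsch_rmsd`, `radius_of_gyration`, and `select_cluster` are hypothetical and are not part of the Force Field X API.

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray) -> float:
    """Unweighted radius of gyration of an (n, 3) array (assumption: no mass weighting)."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Coordinate RMSD between matched (n, 3) arrays after optimal rigid superposition (Kabsch)."""
    a0 = a - a.mean(axis=0)                  # remove translation
    b0 = b - b.mean(axis=0)
    h = a0.T @ b0                            # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # d = -1 signals a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a0 @ rot.T - b0                   # rotate a onto b, then compare
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def select_cluster(centroids: np.ndarray, n: int, linkage: str = "average") -> list[int]:
    """Hypothetical helper: greedily grow an n-molecule cluster starting from molecule 0.

    single linkage:  distance to cluster = distance to the closest selected molecule
    average linkage: distance to cluster = mean distance to all selected molecules
    """
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    chosen = [0]
    while len(chosen) < n:
        candidates = [i for i in range(len(centroids)) if i not in chosen]
        if linkage == "single":
            scores = [dist[i, chosen].min() for i in candidates]
        elif linkage == "average":
            scores = [dist[i, chosen].mean() for i in candidates]
        else:
            raise ValueError(f"unknown linkage: {linkage}")
        chosen.append(candidates[int(np.argmin(scores))])
    return chosen


if __name__ == "__main__":
    # Toy centroids: four molecules in a row along x, plus two offset along y.
    c = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0],
                  [0, 1.1, 0], [1, 1.1, 0]], dtype=float)
    for link in ("single", "average"):
        idx = select_cluster(c, 4, link)
        print(link, idx, round(radius_of_gyration(c[idx]), 3))
```

In this toy layout, single linkage grows a straight chain along x, while average linkage picks a compact 2 × 2 block with a smaller Rg. That is the kind of shape difference the abstract argues RMSD alone cannot reveal; the real PAC algorithm works on atomic coordinates of whole molecules, which this sketch does not attempt.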
Award ID(s):
1751688
PAR ID:
10403343
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Journal of Applied Crystallography
Volume:
55
Issue:
6
ISSN:
1600-5767
Page Range / eLocation ID:
1528 to 1537
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. New additions to quasiracemic materials have been developed by cocrystallizing a ternary component – hydrogen oxalate – with pairs of amino acid quasienantiomers in which at least one of the side-chain R groups contains a sulfur atom. Of the eight quasiracemates investigated, six exhibit crystal packing that drastically deviates from the expected centrosymmetric alignment present in the racemic counterparts and in the extant database of quasiracemic materials. These structures were quantitatively assessed for conformational similarity (CCDC-Mercury structure overlay) and for the degree of inversion symmetry (Avnir's Continuous Symmetry Measures) of each quasienantiomeric pair. Despite the variation among the quasienantiomeric components, these structures exhibit a high degree of isostructurality, with the principal components assembling through a complex blend of common N⁺–H⋯O and O–H⋯O⁻ interactions. These charge-assisted hydrogen-bonded networks form thermodynamically favored crystal packing that promotes cocrystallization of a structurally diverse set of quasienantiomeric components.
  2. To date, X-ray protein crystallography is the most successful technique available for the determination of high-resolution 3D structures of biological molecules and their complexes. In X-ray protein crystallography the structure of a protein is refined against the set of observed Bragg reflections from a protein crystal. The resolution of the refined protein structure is limited by the highest angle at which Bragg reflections can be observed. In addition, the Bragg reflections alone are typically insufficient (by a factor of two) to determine the structure ab initio, and so prior information is required. Crystals formed from an imperfect packing of the protein molecules may also exhibit continuous diffraction between and beyond these Bragg reflections. When this is due to random displacements of the molecules from each crystal lattice site, the continuous diffraction provides the information needed to determine the protein structure without prior knowledge, to a resolution that is limited not by the angular extent of the observed Bragg reflections but by that of the diffraction as a whole. This article presents an iterative projection algorithm that simultaneously uses the continuous diffraction and the Bragg reflections to determine protein structures. The viability of this method is demonstrated on simulated crystal diffraction.
  3. Identifying thermodynamically stable crystal structures remains a key challenge in materials chemistry. Computational crystal structure prediction (CSP) workflows typically rank candidate structures by lattice energy to assess relative stability. Approaches using self-consistent first-principles calculations become prohibitively expensive, especially when millions of energy evaluations are required for complex molecular systems with many atoms per unit cell. Here, we provide a detailed analysis of our methodology and results from the seventh blind test of crystal structure prediction organized by the Cambridge Crystallographic Data Centre (CCDC). We present an approach that significantly accelerates CSP by training target-specific machine learned interatomic potentials (MLIPs). AIMNet2 MLIPs are trained on density functional theory (DFT) calculations of molecular clusters, herein referred to as n-mers. We demonstrate that potentials trained on gas phase dispersion-corrected DFT reference data of n-mers successfully extend to crystalline environments, accurately characterizing the CSP landscape and correctly ranking structures by relative stability. Our methodology effectively captures the underlying physics of thermodynamic crystal stability using only molecular cluster data, avoiding the need for expensive periodic calculations. The performance of target-specific AIMNet2 interatomic potentials is illustrated across diverse chemical systems relevant to pharmaceutical, optoelectronic, and agrochemical applications, demonstrating their promise as efficient alternatives to full DFT calculations for routine CSP tasks. 
  4. Accuracy in the prediction of protein structures is key to understanding the biological functions of different proteins. Numerous similarity measures for protein structures have been developed over the years, including Root Mean Square Deviation (RMSD) and the Template Modeling Score (TM-score). RMSD is influenced by the length of the protein, so the similarity between superimposed models can be affected by divergent loops; TM-score is a more robust and more accurate measure. TM-score, however, is much slower than RMSD to compute because of the search for the optimal superimposed model. Here, we present initial optimization work on GPU-TM-score, a GPU-accelerated Template Modeling Score for fast and accurate measurement of similarity between protein structures. Our optimization is based on OpenACC parallelization and on performance analysis of bottleneck regions, including the Kabsch algorithm used to calculate the optimal superposition on parallel architectures. Our initial results indicate an average 3.14× speedup compared with the original TM-score on a benchmark set of 20 protein structures. This speedup was measured on an NVIDIA Volta V100 GPU relative to an AMD EPYC 7742 64-core processor.
  5. A molecular crystal structure prediction (CSP) protocol used in the seventh blind test is presented. The seventh blind test was divided into two stages and included seven targets, with crystals containing from one to three molecules in the asymmetric unit, monomers of up to 100 atoms, and all targets containing monomers with flexible degrees of freedom. Some targets were cocrystals and one target was a salt. These diverse targets were treated using a CSP protocol that starts by finding the global- and local-minimum conformations of the target molecule. Subsequently, an ab initio two-body rigid-monomer six-dimensional force field (aiFF) was developed for the global-minimum conformer. These aiFFs were then used in CSPs consisting of packing and lattice-energy minimization stages. Flexible-monomer CSPs were used for some targets. To describe the intramonomer FF, either generic empirical FFs or reparametrized FFs of this type were used, with some parameters fitted to ab initio energies of monomers in the latter case. A novel packing procedure was applied for two targets in stage 1. The success rate in the structure generation stage was 15% in the submission phase and 54% in the post-submission phase, while the corresponding values in the structure rating stage were 33% and 89%. We conclude that the inexpensive conformer-based approach with rigid-monomer CSPs can be recommended for investigations of crystals with flexible monomers. An advantage of this protocol is that it is fully based on first-principles quantum mechanics and generates tailor-made FFs suitable for use in subsequent molecular dynamics simulations investigating temperature-dependent effects. However, empirical intramonomer FFs reparametrized using ab initio data are not yet adequate for CSPs.
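Item 4 contrasts RMSD, which a few divergent residues can inflate, with TM-score, which down-weights large deviations. The sketch below makes that contrast concrete using the published TM-score form, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24(L − 15)^(1/3) − 1.8. It assumes the two structures are already superimposed and their residues already matched, so it omits the superposition search (including the Kabsch step) that TM-score and GPU-TM-score actually perform. The 0.5 Å floor on d0 for short chains and the function names `tm_d0`, `rmsd`, and `tm_score_fixed` are assumptions for illustration.

```python
import math


def tm_d0(length: int) -> float:
    """TM-score distance scale d0; the 0.5 Å floor for short chains is an assumption here."""
    if length <= 15:
        return 0.5  # (L - 15) ** (1/3) is undefined for real numbers when L < 15
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def rmsd(model, target) -> float:
    """RMSD of matched coordinates that are already superimposed (no fitting here)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(model, target)) / len(target))


def tm_score_fixed(model, target) -> float:
    """TM-score for a fixed, given superposition (simplification: no optimization over superpositions)."""
    length = len(target)
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
               for p, q in zip(model, target)) / length


if __name__ == "__main__":
    target = [(3.8 * i, 0.0, 0.0) for i in range(50)]  # idealized 50-residue trace
    model = list(target)
    for i in range(20, 25):                            # one divergent 5-residue "loop"
        x, y, z = target[i]
        model[i] = (x, y + 10.0, z)
    print(f"RMSD = {rmsd(model, target):.2f} Å, TM-score = {tm_score_fixed(model, target):.3f}")
```

Here the five displaced residues raise the RMSD to about 3.2 Å even though 45 of 50 residues match exactly, while the TM-score stays around 0.9 because each distant residue contributes almost nothing to the sum rather than dominating it.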