Title: Progressive alignment of crystals: reproducible and efficient assessment of crystal structure similarity
During in silico crystal structure prediction of organic molecules, millions of candidate structures are often generated. These candidates must be compared to remove duplicates prior to further analysis (e.g. optimization with electronic structure methods) and ultimately compared with structures determined experimentally. The agreement of predicted and experimental structures forms the basis of evaluating the results from the Cambridge Crystallographic Data Centre (CCDC) blind assessment of crystal structure prediction, which further motivates the pursuit of rigorous alignments. Evaluating crystal structure packings using coordinate root-mean-square deviation (RMSD) for N molecules (or N asymmetric units) in a reproducible manner requires metrics to describe the shape of the compared molecular clusters to account for alternative approaches used to prioritize selection of molecules. Described here is a flexible algorithm called Progressive Alignment of Crystals (PAC) to evaluate crystal packing similarity using coordinate RMSD and introducing the radius of gyration (Rg) as a metric to quantify the shape of the superimposed clusters. It is shown that the absence of metrics to describe cluster shape adds ambiguity to the results of the CCDC blind assessments because it is not possible to determine whether the superposition algorithm has prioritized tightly packed molecular clusters (i.e. to minimize Rg) or prioritized reduced RMSD (i.e. via possibly elongated clusters with relatively larger Rg). For example, it is shown that when the PAC algorithm described here uses single linkage to prioritize molecules for inclusion in the superimposed clusters, the results are nearly identical to those calculated by the widely used program COMPACK. However, the lower Rg values obtained by the use of average linkage are favored for molecule prioritization because the resulting RMSDs more equally reflect the importance of packing along each dimension.
It is shown that the PAC algorithm is faster than COMPACK when using a single process, and its utility for biomolecular crystals is demonstrated. Finally, parallel scaling up to 64 processes in the open-source code Force Field X is presented.
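The two quantities the abstract relies on, coordinate RMSD and the radius of gyration of a cluster, can be illustrated with a minimal sketch. This is not the PAC implementation from Force Field X: the function names and toy coordinates are hypothetical, mass weighting is omitted, and the two point sets passed to the RMSD function are assumed to be already superimposed with matched ordering.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of 3D points.

    Assumption: every point has equal weight (no atomic masses).
    """
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    squared = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords
    )
    return math.sqrt(squared / n)


def coordinate_rmsd(a, b):
    """RMSD between two equally sized point sets.

    Assumption: the sets are already superimposed and point k in `a`
    corresponds to point k in `b`; no alignment is performed here.
    """
    if len(a) != len(b):
        raise ValueError("point sets must have the same length")
    squared = sum(
        sum((p[i] - q[i]) ** 2 for i in range(3)) for p, q in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Hypothetical toy clusters: four centers packed in a square vs. in a line.
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
elongated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# The same compact cluster displaced by 0.1 along x.
shifted = [(x + 0.1, y, z) for x, y, z in compact]

rg_compact = radius_of_gyration(compact)
rg_elongated = radius_of_gyration(elongated)
rmsd_shift = coordinate_rmsd(compact, shifted)
```

In this toy case the square cluster has the smaller radius of gyration, which is the sense in which a lower Rg indicates a more tightly packed set of molecules for the same cluster size.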
Award ID(s):
1751688
NSF-PAR ID:
10403343
Journal Name:
Journal of Applied Crystallography
Volume:
55
Issue:
6
ISSN:
1600-5767
Page Range / eLocation ID:
1528 to 1537
Sponsoring Org:
National Science Foundation
More Like this
  1. New additions to quasiracemic materials have been developed by cocrystallizing a ternary component – hydrogen oxalate – with pairs of amino acid quasienantiomers where at least one of the side-chain R groups contains a sulfur atom. Of the eight quasiracemates investigated, six exhibit crystal packing that drastically deviates from the expected centrosymmetric alignment present in the racemic counterparts and the extant database of quasiracemic materials. These structures were quantitatively assessed for conformational similarity (CCDC-Mercury structure overlay) and the degree of inversion symmetry (Avnir's Continuous Symmetry Measures) for each quasienantiomeric pair. Despite the variance in quasienantiomeric components, these structures exhibit a high degree of isostructurality where the principal components assemble by a complex blend of common N+–H⋯O and O–H⋯O− interactions. These charge-assisted hydrogen-bonded networks form thermodynamically favored crystal packing that promotes cocrystallization of a structurally diverse set of quasienantiomeric components.
  2. Accuracy in the prediction of protein structures is key to understanding the biological functions of different proteins. Numerous similarity measures for protein structures have been developed over the years, including root-mean-square deviation (RMSD) and the Template Modeling score (TM-score). While RMSD is influenced by the length of the protein, so that divergent loops can distort the measured similarity between superimposed models, TM-score is a more robust and accurate method. TM-score, however, is much slower than RMSD to compute for the optimal superimposed model. Here, we present initial optimization work on GPU-TM-score, a GPU-accelerated Template Modeling Score for fast and accurate measurement of similarity between protein structures. Our optimization is based on OpenACC parallelization and performance analysis of bottleneck regions and the Kabsch algorithm for the calculation of optimal superimposition within parallel architectures. Our initial results indicate an average 3.14× speedup compared to the original TM-score on a benchmark set of 20 protein structures. This speedup is recorded on an Nvidia Volta V100 GPU compared to an AMD EPYC 7742 64-core processor.
  3. All-nitrogen solids, if successfully synthesized, are ideal high-energy-density materials because they store a great amount of energy and produce only harmless N2 gas upon decomposition. Currently, the only method to obtain all-nitrogen solids is to apply high pressure to N2 crystals. However, products such as cg-N tend to decompose upon releasing the pressure. Compared to covalent solids, molecular crystals are more likely to remain stable during decompression because they can relax the strain by increasing the intermolecular distances. The challenge of such a route is to find a molecular crystal that can attain a favorable phase under elevated pressure. In this work, we show, by designing a novel N16 molecule (tripentazolylamine) and examining its crystal structures under a series of pressures, that the aromatic units and high molecular symmetry are the key factors to achieving an all-nitrogen molecular crystal. Density functional calculations and structural studies reveal that this new all-nitrogen molecular crystal exhibits a particularly slow enthalpy increase with pressure due to the highly efficient crystal packing of its highly symmetric molecules. Vibration mode calculations and molecular dynamics (MD) simulations show that N16 crystals are metastable at ambient pressure and could remain inactive up to 400 K. The initial reaction steps of the decomposition are calculated by following the pathway of the concerted excision of N2 from the N5 group as revealed by the MD simulations.
  4. Estrada, Ernesto (Ed.)
    A direct way to spot structural features that are universally shared among proteins is to find analogues from simpler condensed matter systems. In the current study, the feasibility of creating ensembles of artificial structures that can automatically reproduce a large number of geometrical and topological descriptors of globular proteins is investigated. Towards this aim, a simple cubic (SC) arrangement is shown to provide the best background lattice after a careful analysis of the residue packing trends from 210 globular proteins. It is shown that a minimalistic set of rules imposed on this lattice is sufficient to generate structures that can mimic real proteins. In the proposed method, 210 such structures are generated by randomly removing residues (beads) from clusters that have a SC lattice arrangement such that all the generated structures have single connected components. Two additional sets are prepared from the initial structures via random relaxation and a reverse Monte Carlo simulated annealing algorithm, which targets the average radial distribution function (RDF) of 210 globular proteins. The initial and relaxed structures are compared to real proteins via RDF, bond orientational order parameters and several descriptors of network topology. Based on these features, results indicate that the structures generated with 40% occupancy closely resemble real residue networks. The structure generation mechanism automatically produces networks that are in the same topological class as globular proteins and reproduce small-world characteristics of high clustering and small shortest path lengths. Most notably, the established correspondence rules out icosahedral order as a relevant structural feature for residue networks in contrast to other amorphous systems where it is an inherent characteristic. The close correspondence is also observed in the vibrational characteristics as computed from the Anisotropic Network Model, therefore hinting at a non-superficial link between the proteins and the defect-laden cubic crystalline order.
  5. Hydrogen bonding (HB) interactions are well known to impact the properties of water in the bulk and within hydrated materials. A series of Ni(II) complexes based on chelates containing N-(2-aminoethyl)-1-methylimidazole-2-carboxamide have been synthesized and fully characterized by single crystal X-ray diffraction, spectroscopic methods, and thermal analysis. The complexes reveal a variety of water cluster motifs dependent on the packing arrangement in the solid state. A key feature is the orientation of the carboxamide moiety, which leads to the formation of void spaces that accommodate water through HB interactions. The water motifs contain 1D water chains (streams), 2D tapes of infused rings (cascades), and isolated water dimers (pools). The HB motifs in the hydrated structures vary as a function of the crystal packing of the host molecules. Thermal analyses show a correlation between the HB motif in the hydrated crystals and the temperature range of the dehydration process. The conductivity of the hydrated crystals varies as a function of the crystal packing interactions between metal complexes.