Title: A generalized deep learning approach for local structure identification in molecular simulations
Identifying local structure in molecular simulations is of utmost importance. The most common existing approach is to calculate a geometrical quantity referred to as an order parameter. In simple cases, order parameters are physically intuitive and trivial to develop (e.g., ion-pair distance); in most cases, however, order parameter development is a much more difficult endeavor (e.g., crystal structure identification). Using ideas from computer vision, we adapt a specific type of neural network called a PointNet to identify local structural environments in molecular simulations. A primary challenge in applying machine learning techniques to simulation is selecting the appropriate input features. This challenge is system-specific and requires significant human input and intuition. In contrast, our approach is a generic framework that requires no system-specific feature engineering and operates on the raw output of the simulations, i.e., atomic positions. We demonstrate the method on crystal structure identification in Lennard-Jones (four different phases), water (eight different phases), and mesophase (six different phases) systems. The method achieves accuracy as high as 99.5% in crystal structure identification. The method is applicable to heterogeneous nucleation, and it can even predict the crystal phases of atoms near external interfaces. We demonstrate the versatility of our approach by using it to identify surface hydrophobicity based solely upon the positions and orientations of surrounding water molecules. Our results suggest the approach will be broadly applicable to many types of local structure in simulations.
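As a rough illustration of why a PointNet-style network can consume raw atomic positions without hand-built features, the sketch below applies the same small network to every neighbor position and then max-pools over neighbors, so the result does not depend on the order in which atoms are listed. It is not the authors' implementation: the layer sizes, the random (untrained) weights, the four-class head, and all function names are assumptions made for illustration, and the toy point cloud is random rather than simulation output.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights for one dense layer (untrained; illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Affine map: weights @ x + biases."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def pointnet_features(neighbors, layers):
    """Apply the same MLP to every point, then max-pool over the points."""
    per_point = []
    for point in neighbors:
        h = list(point)
        for layer in layers:
            h = relu(dense(layer, h))
        per_point.append(h)
    # Elementwise max over points is a symmetric function, so the pooled
    # feature vector is unchanged when the points are reordered.
    return [max(column) for column in zip(*per_point)]


def classify(neighbors, layers, head):
    """Return (predicted class index, pooled features) for one neighborhood."""
    features = pointnet_features(neighbors, layers)
    scores = dense(head, features)
    return scores.index(max(scores)), features


# Toy demo: 12 random neighbor positions relative to a central atom (assumed input).
rng = random.Random(0)
layers = [make_layer(3, 16, rng), make_layer(16, 32, rng)]
head = make_layer(32, 4, rng)  # 4 hypothetical structure classes
cloud = [tuple(rng.uniform(-1.5, 1.5) for _ in range(3)) for _ in range(12)]
shuffled = cloud[:]
rng.shuffle(shuffled)
same_label = classify(cloud, layers, head)[0] == classify(shuffled, layers, head)[0]
print("prediction unchanged after reordering atoms:", same_label)
```

In a trained model the weights would be fit on labeled neighborhoods (for example, the crystal phases mentioned above); max pooling is one symmetric function among several that would give the same order independence.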
Award ID(s):
1725573
NSF-PAR ID:
10109492
Journal Name:
Chemical Science
ISSN:
2041-6520
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Electron Backscatter Diffraction (EBSD) is a widely used approach for characterising the microstructure of various materials. However, it is difficult to accurately distinguish similar (body centred cubic and body centred tetragonal, with small tetragonality) phases in steels using standard EBSD software. One method to tackle the problem of phase distinction is to measure the tetragonality of the phases, which can be done using simulated patterns and cross‐correlation techniques to detect distortion away from a perfectly cubic crystal lattice. However, small errors in the determination of microscope geometry (the so‐called pattern or projection centre) can cause significant errors in tetragonality measurement and lead to erroneous results. This paper utilises a new approach for accurate pattern centre determination via a strain minimisation routine across a large number of grains in dual phase steels. Tetragonality maps are then produced and used to identify phase and estimate local carbon content. The technique is implemented using both kinematically simulated and dynamically simulated patterns to determine their relative accuracy. Tetragonality maps, and subsequent phase maps, based on dynamically simulated patterns in a point‐by‐point and grain average comparison are found to consistently produce more precise and accurate results, with close to 90% accuracy for grain phase identification, when compared with an image‐quality identification method. The error in tetragonality measurements appears to be of the order of 1%, thus producing a commensurate ∼0.2% error in carbon content estimation. Such an error makes the technique unsuitable for estimation of total carbon content of most commercial steels, which often have carbon levels below 0.1%. However, even in the DP steel for this study (0.1 wt.% carbon) it can be used to map carbon in regions with higher accumulation (such as in martensite with nonhomogeneous carbon content).

    Lay Description

    Electron Backscatter Diffraction (EBSD) is a widely used approach for characterising the microstructure of various materials. However, it is difficult to accurately distinguish similar (BCC and BCT) phases in steels using standard EBSD software due to the small difference in crystal structure. One method to tackle the problem of phase distinction is to measure the tetragonality, or apparent ‘strain’ in the crystal lattice, of the phases. This can be done by comparing experimental EBSD patterns with simulated patterns via cross‐correlation techniques, to detect distortion away from a perfectly cubic crystal lattice. However, small errors in the determination of microscope geometry (the so‐called pattern or projection centre) can cause significant errors in tetragonality measurement and lead to erroneous results. This paper utilises a new approach for accurate pattern centre determination via a strain minimisation routine across a large number of grains in dual phase steels. Tetragonality maps are then produced and used to identify phase and estimate local carbon content. The technique is implemented using both simple kinematically simulated and more complex dynamically simulated patterns to determine their relative accuracy. Tetragonality maps, and subsequent phase maps, based on dynamically simulated patterns in a point‐by‐point and grain average comparison are found to consistently produce more precise and accurate results, with close to 90% accuracy for grain phase identification, when compared with an image‐quality identification method. The error in tetragonality measurements appears to be of the order of 1%, thus producing a commensurate error in carbon content estimation. Such an error makes an estimate of total carbon content particularly unsuitable for low carbon steels, although maps of local carbon content may still be revealing.

    Application of the method developed in this paper will lead to better understanding of the complex microstructures of steels, and the potential to design microstructures that deliver higher strength and ductility for common applications, such as vehicle components.
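
The link between the roughly 1% tetragonality error and the roughly 0.2% carbon error quoted above can be checked with a short calculation. The sketch assumes the commonly cited linear relation for Fe-C martensite, c/a ≈ 1 + 0.045 × (wt.% C); that relation and its coefficient are not given in the abstract, and the function names are hypothetical.

```python
# Assumed empirical relation for Fe-C martensite (not stated in the abstract):
#   c/a = 1 + K * (wt.% C), with K of about 0.045 per wt.% C.
K_PER_WT_PCT = 0.045


def carbon_from_tetragonality(c_over_a, k=K_PER_WT_PCT):
    """Estimate carbon content (wt.%) from a measured c/a ratio."""
    return (c_over_a - 1.0) / k


def carbon_error(tetragonality_error, k=K_PER_WT_PCT):
    """Propagate an absolute error in c/a to an error in wt.% C."""
    return tetragonality_error / k


# A 1% (0.01) error in c/a maps to an error in carbon content of:
print(f"carbon error for 0.01 error in c/a: {carbon_error(0.01):.2f} wt.%")

# Tetragonality implied by the 0.1 wt.% C steel mentioned in the abstract:
print(f"c/a - 1 at 0.1 wt.% C: {K_PER_WT_PCT * 0.1:.4f}")
```

Under this assumption, a 0.1 wt.% carbon steel corresponds to a tetragonality of about 0.0045, which is smaller than the 0.01 measurement error. That is consistent with the statement that the technique suits mapping of locally carbon-rich regions rather than estimating the bulk carbon content of low-carbon steels.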

     
  2. Molecular simulations with atomistic or coarse-grained force fields are a powerful approach for understanding and predicting the self-assembly phase behavior of complex molecules. Amphiphiles, block oligomers, and block polymers can form mesophases with different ordered morphologies describing the spatial distribution of the blocks, but entirely amorphous nature for local packing and chain conformation. Screening block oligomer chemistry and architecture through molecular simulations to find promising candidates for functional materials is aided by effective and straightforward morphology identification techniques. Capturing 3-dimensional periodic structures, such as ordered network morphologies, is hampered by the requirement that the number of molecules in the simulated system and the shape of the periodic simulation box need to be commensurate with those of the resulting network phase. Common strategies for structure identification include structure factors and order parameters, but these fail to identify imperfect structures in simulations with incorrect system sizes. Building upon pioneering work by DeFever et al. [Chem. Sci. 2019, 10, 7503−7515] who implemented a PointNet (i.e., a neural network designed for computer vision applications using point clouds) to detect local structure in simulations of single-bead particles and water molecules, we present a PointNet for detection of nonlocal ordered morphologies of complex block oligomers. Our PointNet was trained using atomic coordinates from molecular dynamics simulation trajectories and synthetic point clouds for ordered network morphologies that were absent from previous simulations. In contrast to prior work on simple molecules, we observe that large point clouds with 1000 or more points are needed for the more complex block oligomers. The trained PointNet model achieves an accuracy as high as 0.99 for globally ordered morphologies formed by linear diblock, linear triblock, and 3-arm and 4-arm star-block oligomers, and it also allows for the discovery of emerging ordered patterns from nonequilibrium systems.
  3. Predictions of the structures of stoichiometric, fractional, or nonstoichiometric hydrates of organic molecular crystals are immensely challenging due to the extensive search space of different water contents, host molecular placements throughout the crystal, and internal molecular conformations. However, the dry frameworks of these hydrates, especially for nonstoichiometric or isostructural dehydrates, can often be predicted from a standard anhydrous crystal structure prediction (CSP) protocol. Inspired by developments in the field of drug binding, we introduce an efficient data-driven and topologically aware approach for predicting organic molecular crystal hydrate structures through a mapping of water positions within the crystal structure. The method does not require a priori specification of water content and can, therefore, predict stoichiometric, fractional, and nonstoichiometric hydrate structures. This approach, which we term a mapping approach for crystal hydrates (MACH), establishes a set of rules for systematic determination of favorable positions for water insertion within predicted or experimental crystal structures based on considerations of the chemical features of local environments and void regions. The proposed approach is tested on hydrates of three pharmaceutically relevant compounds that exhibit diverse crystal packing motifs and void environments characteristic of hydrate structures. Overall, we show that our mapping approach introduces an advance in the efficient performance of hydrate CSP through generation of stable hydrate stoichiometries at low cost and should be considered an integral component for CSP workflows. 
  4. Context. The excitation of the filamentary gas structures surrounding giant elliptical galaxies at the center of cool-core clusters, also known as brightest cluster galaxies (BCGs), is key to our understanding of active galactic nucleus (AGN) feedback, and of the impact of environmental and local effects on star formation. Aims. We investigate the contribution of thermal radiation from the cooling flow surrounding BCGs to the excitation of the filaments. We explore the effects of small levels of extra heating (turbulence), and of metallicity, on the optical and infrared lines. Methods. Using the Cloudy code, we modeled the photoionization and photodissociation of a slab of gas of optical depth A_V ≤ 30 mag at constant pressure in order to calculate self-consistently all of the gas phases, from ionized gas to molecular gas. The ionizing source is the extreme ultraviolet (EUV) and soft X-ray radiation emitted by the cooling gas. We tested these models comparing their predictions to the rich multi-wavelength observations from optical to submillimeter, now achieved in cool core clusters. Results. Such models of self-irradiated clouds, when reaching sufficiently large A_V, lead to a cloud structure with ionized, atomic, and molecular gas phases. These models reproduce most of the multi-wavelength spectra observed in the nebulae surrounding the BCGs, not only the low-ionization nuclear emission region-like optical diagnostics, [O III] λ5007 Å/Hβ, [N II] λ6583 Å/Hα, and ([S II] λ6716 Å + [S II] λ6731 Å)/Hα, but also the infrared emission lines from the atomic gas. [O I] λ6300 Å/Hα, instead, is overestimated across the full parameter space, except for very low A_V. The modeled ro-vibrational H2 lines also match observations, which indicates that near- and mid-infrared H2 lines are mostly excited by collisions between H2 molecules and secondary electrons produced naturally inside the cloud by the interaction between the X-rays and the cold gas in the filament. However, there is still some tension between ionized and molecular line tracers (i.e., CO), which requires optimization of the cloud structure and the density of the molecular zone. The limited range of parameters over which predictions match observations allows us to constrain, in spite of degeneracies in the parameter space, the intensity of X-ray radiation bathing filaments, as well as some of their physical properties like A_V or the level of turbulent heating rate. Conclusions. The reprocessing of the EUV and X-ray radiation from the plasma cooling is an important powering source of line emission from filaments surrounding BCGs. Cloudy self-irradiated X-ray excitation models coupled with a small level of turbulent heating manage to simultaneously reproduce a large number of optical-to-infrared line ratios when all the gas phases (from ionized to molecular) are modeled self-consistently. Releasing some of the simplifications of our model, like the constant pressure, or adding the radiation fields from the AGN and stars, as well as a combination of matter- and radiation-bounded cloud distribution, should improve the predictions of line emission from the different gas phases.
  5. Objective and Impact Statement. Identifying benign mimics of prostatic adenocarcinoma remains a significant diagnostic challenge. In this work, we developed an approach based on label-free, high-resolution molecular imaging with multispectral deep ultraviolet (UV) microscopy which identifies important prostate tissue components, including basal cells. This work has significant implications towards improving the pathologic assessment and diagnosis of prostate cancer. Introduction. One of the most important indicators of prostate cancer is the absence of basal cells in glands and ducts. However, identifying basal cells using hematoxylin and eosin (H&E) stains, which is the standard of care, can be difficult in a subset of cases. In such situations, pathologists often resort to immunohistochemical (IHC) stains for a definitive diagnosis. However, IHC is expensive and time-consuming and requires more tissue sections which may not be available. In addition, IHC is subject to false-negative or false-positive stains which can potentially lead to an incorrect diagnosis. Methods. We leverage the rich molecular information of label-free multispectral deep UV microscopy to uniquely identify basal cells, luminal cells, and inflammatory cells. The method applies an unsupervised geometrical representation of principal component analysis to separate the various components of prostate tissue leading to multiple image representations of the molecular information. Results. Our results show that this method accurately and efficiently identifies benign and malignant glands with high fidelity, free of any staining procedures, based on the presence or absence of basal cells. We further use the molecular information to directly generate a high-resolution virtual IHC stain that clearly identifies basal cells, even in cases where IHC stains fail. Conclusion. Our simple, low-cost, and label-free deep UV method has the potential to improve and facilitate prostate cancer diagnosis by enabling robust identification of basal cells and other important prostate tissue components.