Title: Characterising the atomic structure of mono-metallic nanoparticles from x-ray scattering data using conditional generative models
The development of new nanomaterials for energy technologies depends on understanding the intricate relation between material properties and atomic structure. It is therefore crucial to be able to routinely characterise the atomic structure of nanomaterials, and a promising method for this task is Pair Distribution Function (PDF) analysis. The PDF can be obtained through Fourier transformation of x-ray total scattering data, and represents a histogram of all interatomic distances in the sample. Going from the distance information in the PDF to a chemical structure is an unassigned distance geometry problem (uDGP), and solving this is often the bottleneck in nanostructure analysis. In this work, we propose to use a Conditional Variational Autoencoder (CVAE) to automatically solve the uDGP and obtain valid chemical structures from PDFs. As a proof of concept, we use a simple model system of hypothetical mono-metallic nanoparticles containing up to 100 atoms in the face-centered cubic (FCC) structure. The model is trained to predict the assigned distance matrix (aDM) of a structure, with a simulated PDF of that structure as the conditional input. We introduce a novel representation of structures by projecting them inside a unit sphere and adding anchor points, or satellites, to help in the reconstruction of the chemical structure. The performance of the CVAE model is compared to that of a Deterministic Autoencoder (DAE), showing that both models are able to solve the uDGP reasonably well. We further show that the CVAE learns a structured and meaningful latent embedding space which can be used to predict new chemical structures.
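To make the two central objects in the abstract concrete, the following is a minimal sketch (not the authors' code) of an assigned distance matrix and of the unassigned distance histogram that a PDF approximates. The lattice constant, cluster size, bin width, and all function names are assumptions chosen for illustration. A real PDF also involves scattering-power weighting, peak broadening, and normalisation, none of which are modelled here.

```python
import numpy as np


def fcc_cluster(a=4.0, n_cells=2):
    """Atom positions for a block of n_cells^3 conventional FCC cells.

    The lattice constant a (in angstrom) is a hypothetical value.
    """
    basis = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0],
                      [0.5, 0.0, 0.5],
                      [0.0, 0.5, 0.5]])
    cells = np.array([[i, j, k]
                      for i in range(n_cells)
                      for j in range(n_cells)
                      for k in range(n_cells)], dtype=float)
    # Every cell origin plus every basis position, scaled by the lattice constant.
    return a * (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3)


def distance_matrix(xyz):
    """Assigned distance matrix: entry (i, j) is the distance between atoms i and j."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_histogram(dm, r_max=10.0, dr=0.1):
    """Unassigned view: histogram of all unique pair distances, atom labels discarded."""
    upper = np.triu_indices(dm.shape[0], k=1)  # count each pair once
    bins = np.arange(0.0, r_max + dr, dr)
    counts, edges = np.histogram(dm[upper], bins=bins)
    return counts, edges


xyz = fcc_cluster()
dm = distance_matrix(xyz)
counts, edges = distance_histogram(dm)
```

The histogram keeps only how often each distance occurs, whereas the distance matrix records which pair of atoms each distance belongs to. Recovering the second from the first is the uDGP described above.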
Award ID(s):
1922234
NSF-PAR ID:
10300745
Journal Name:
ChemRxiv
ISSN:
2573-2293
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Identifying point defects and other structural anomalies using scanning transmission electron microscopy (STEM) is important for understanding how disruptions of the regular crystal lattice affect a material's properties. Due to improvements in instrumentation stability and electron optics, atomic‐resolution images with a field of view of several hundred nanometers can now be routinely acquired at 1–10 Hz frame rates, and such data, which often contain thousands of atomic columns, need to be analyzed. To date, image analysis is performed largely manually, but recent developments in computer vision (CV) and machine learning (ML) now enable automated analysis of atomic structures and associated defects. Here, the authors report on how a Convolutional Variational Autoencoder (CVAE) can be utilized to detect structural anomalies in atomic‐resolution STEM images. Specifically, the training set is limited to perfect crystal images, and the performance of a CVAE in differentiating between single‐crystal bulk data and point defects is demonstrated. It is found that the CVAE can reproduce the perfect crystal data but not the defect input data. The disagreements between the CVAE‐predicted data and the defect inputs allow for a clear and automatic distinction between several point defect types.

     
  2. ABSTRACT

    We develop a machine-learning (ML) algorithm that generates high-resolution thermal Sunyaev–Zeldovich (SZ) maps of novel galaxy clusters given only halo mass and mass accretion rate (MAR). The algorithm uses a conditional variational autoencoder (CVAE) in the form of a convolutional neural network and is trained with SZ maps generated from the IllustrisTNG simulation. Our method can reproduce many of the details of galaxy clusters that analytical models usually lack, such as internal structure and aspherical distribution of gas created by mergers, while achieving the same computational feasibility, allowing us to generate mock SZ maps for over 10^5 clusters in 30 s on a laptop. We show that the model is capable of generating novel clusters (i.e. not found in the training set) and that the model accurately reproduces the effects of mass and MAR on the SZ images, such as scatter, asymmetry, and concentration, in addition to modelling merging sub-clusters. This work demonstrates the viability of ML-based methods for producing the number of realistic, high-resolution maps of galaxy clusters necessary to achieve statistical constraints from future SZ surveys.

     
  3. We present a deep learning algorithm, DeepStruc, that can solve a simple nanoparticle structure directly from an experimental Pair Distribution Function (PDF) by using a conditional variational autoencoder.

     
  4. Inferring molecular structure from Nuclear Magnetic Resonance (NMR) measurements requires an accurate forward model that can predict chemical shifts from 3D structure. Current forward models are limited to specific molecules like proteins, and state-of-the-art models are not differentiable. Thus, they cannot be used with gradient methods like biased molecular dynamics. Here we use graph neural networks (GNNs) for NMR chemical shift prediction. Our GNN can model chemical shifts accurately, capture important phenomena like hydrogen-bonding-induced downfield shifts between multiple proteins and secondary structure effects, and predict shifts of organic molecules. Previous empirical models of protein NMR have relied on careful feature engineering with domain expertise. These GNNs are trained from data alone with no feature engineering, yet are as accurate and can work on arbitrary molecular structures. The models are also efficient, able to compute one million chemical shifts in about 5 seconds. This work enables a new category of NMR models that have multiple interacting types of macromolecules.
  5. Abstract

    Characterization of material structure with X-ray or neutron scattering using, e.g., Pair Distribution Function (PDF) analysis most often relies on refining a structure model against an experimental dataset. However, identifying a suitable model is often a bottleneck. Recently, automated approaches have made it possible to test thousands of models for each dataset, but these methods are computationally expensive, and analysing the output, i.e. extracting structural information from the resulting fits in a meaningful way, is challenging. Our Machine Learning based Motif Extractor (ML-MotEx) trains an ML algorithm on thousands of fits and uses SHAP (SHapley Additive exPlanation) values to identify which model features are important for the fit quality. We use the method for 4 different chemical systems, including disordered nanomaterials and clusters. ML-MotEx opens up a type of modelling where each feature in a model is assigned an importance value for the fit quality based on explainable ML.

     