Title: Using a machine learning approach to determine the space group of a structure from the atomic pair distribution function
A method is presented for predicting the space group of a structure given a calculated or measured atomic pair distribution function (PDF) from that structure. The method utilizes machine learning models trained on more than 100 000 PDFs calculated from structures in the 45 most heavily represented space groups. In particular, a convolutional neural network (CNN) model is presented which yields a promising result: it correctly identifies the space group among its top-6 estimates 91.9% of the time. The CNN model also successfully identifies space groups for 12 out of 15 experimental PDFs. Interesting aspects of the failed estimates are discussed, which indicate that the CNN fails in ways similar to conventional indexing algorithms applied to conventional powder diffraction data. This preliminary success of the CNN model shows the possibility of model-independent assessment of PDF data on a wide class of materials.
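The top-6 figure quoted above is a top-k accuracy: a prediction counts as correct when the true class is among the k highest-scoring classes. A minimal sketch of that metric follows, assuming the model emits one score per candidate space group; the function name `top_k_accuracy` and the toy scores are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 6,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices sorted by descending score; only the first k count.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: 3 samples scored over 4 hypothetical classes.
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 2 is ranked third
    [0.70, 0.15, 0.10, 0.05],  # true class 3 is ranked last
]
toy_labels = [1, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
```

With 45 candidate space groups, a top-6 criterion accepts a prediction whenever the correct group appears in a short list of six, which is why it can be much higher than the top-1 figure.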
Award ID(s):
1740833
NSF-PAR ID:
10112800
Journal Name:
Acta Crystallographica Section A Foundations and Advances
Volume:
75
Issue:
4
ISSN:
2053-2733
Page Range / eLocation ID:
633 to 643
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift, i.e., that the fraction of times the true redshift falls between two limits z1 and z2 should equal the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs. Though we focus on an example from astrophysics, our method can produce PDFs that are calibrated at all locations in feature space for any use case.
  2. Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift, i.e., that the fraction of times the true redshift falls between two limits z1 and z2 should equal the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs, resulting in calibrated predictive distributions. Though we focus on an example from astrophysics, our method can produce predictive distributions that are calibrated at all locations in feature space for any use case.
  3. The development of new nanomaterials for energy technologies is dependent on understanding the intricate relation between material properties and atomic structure. It is, therefore, crucial to be able to routinely characterise the atomic structure in nanomaterials, and a promising method for this task is Pair Distribution Function (PDF) analysis. The PDF can be obtained through Fourier transformation of x-ray total scattering data, and represents a histogram of all interatomic distances in the sample. Going from the distance information in the PDF to a chemical structure is an unassigned distance geometry problem (uDGP), and solving this is often the bottleneck in nanostructure analysis. In this work, we propose to use a Conditional Variational Autoencoder (CVAE) to automatically solve the uDGP to obtain valid chemical structures from PDFs. We use a simple model system of hypothetical mono-metallic nanoparticles containing up to 100 atoms in the face-centered cubic (FCC) structure as a proof of concept. The model is trained to predict the assigned distance matrix (aDM) from a simulated PDF of the structure as the conditional input. We introduce a novel representation of structures by projecting them inside a unit sphere and adding anchor points, or satellites, to help in the reconstruction of the chemical structure. The performance of the CVAE model is compared to that of a Deterministic Autoencoder (DAE), showing that both models are able to solve the uDGP reasonably well. We further show that the CVAE learns a structured and meaningful latent embedding space which can be used to predict new chemical structures.
  4. Machine learning models based on convolutional neural networks have been used for predicting space groups of crystal structures from their atomic pair distribution function (PDF). However, the PDFs used to train the model are calculated using a fixed set of parameters that reflect specific experimental conditions, and the accuracy of the model when given PDFs generated with different choices of these parameters is unknown. In this work, the top-1 and top-6 accuracies are found to be robust when the model is applied to PDFs generated with different choices of the experimental parameters r_max, Q_max, Q_damp and atomic displacement parameters.
  5. Coarse-gridded atmospheric models often account for subgrid-scale variability by specifying probability distribution functions (PDFs) of process rate inputs such as cloud and rainwater mixing ratios (q_c and q_r, respectively). PDF parameters can be obtained from numerous sources: in situ observations, ground- or space-based remote sensing, or fine-scale modeling such as large-eddy simulation (LES). LES is appealing to constrain PDFs because it generates large sample sizes, can simulate a variety of cloud regimes/case studies, and is not subject to the ambiguities of observations. However, despite the appeal of using model output for parameterization development, it has not been demonstrated that LES satisfactorily reproduces the observed spatial structure of microphysical fields. In this study, the structures of observed and modeled microphysical fields are compared by applying bifractal analysis, an approach that quantifies variability across spatial scales, to simulations of a drizzling stratocumulus field that span a range of domain sizes, drop concentrations (a proxy for mesoscale organization), and microphysics schemes (bulk and bin). Simulated q_c closely matches observed estimates of bifractal parameters that measure smoothness and intermittency. There are major discrepancies between observed and simulated q_r properties, though, with bulk simulated q_r consistently displaying the bifractal properties of observed clouds (smooth, minimally intermittent) rather than rain, while bin simulations produce q_r that is appropriately intermittent but too smooth. These results suggest fundamental limitations of bulk and bin schemes to realistically represent higher-order statistics of the observed rain structure.
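The frequentist calibration condition described in items 1 and 2 above can be checked numerically with Probability Integral Transform (PIT) values: the PIT of an object is its predicted cumulative distribution evaluated at the true value, and for calibrated predictions a central PIT interval of width w should contain a fraction w of the objects. The sketch below illustrates only this global check, not the local re-calibration method of those abstracts. It assumes Gaussian predictive distributions, and the helper names `pit_values` and `central_coverage` and the toy data are hypothetical.

```python
from statistics import NormalDist


def pit_values(true_values, predicted):
    """PIT of each object: the predicted CDF evaluated at the true value."""
    return [dist.cdf(z) for z, dist in zip(true_values, predicted)]


def central_coverage(pits, level):
    """Fraction of PIT values inside the central interval of the given width."""
    lo, hi = 0.5 - level / 2.0, 0.5 + level / 2.0
    return sum(lo <= p <= hi for p in pits) / len(pits)


# Deterministic toy "truth": evenly spaced quantiles of a standard normal.
n = 1000
truth = NormalDist(0.0, 1.0)
true_z = [truth.inv_cdf((i + 0.5) / n) for i in range(n)]

# Two hypothetical sets of predictive distributions for the same objects.
calibrated = [NormalDist(0.0, 1.0)] * n      # matches the truth
overconfident = [NormalDist(0.0, 0.5)] * n   # predicted width is too narrow

cov_good = central_coverage(pit_values(true_z, calibrated), 0.68)
cov_bad = central_coverage(pit_values(true_z, overconfident), 0.68)
```

In this toy setup the calibrated predictions cover close to 68% of the true values in the central 68% interval, while the overconfident ones cover far fewer. A global check of this kind can still pass when errors in different regions of feature space cancel, which is the limitation that motivates the local approach in those abstracts.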