skip to main content

Title: Topological representations of crystalline compounds for the machine-learning prediction of materials properties
Abstract Accurate theoretical predictions of desired properties of materials play an important role in materials research and development. Machine learning (ML) can accelerate the materials design by building a model from input data. For complex datasets, such as those of crystalline compounds, a vital issue is how to construct low-dimensional representations for input crystal structures with chemical insights. In this work, we introduce an algebraic topology-based method, called atom-specific persistent homology (ASPH), as a unique representation of crystal structures. The ASPH can capture both pairwise and many-body interactions and reveal the topology-property relationship of a group of atoms at various scales. Combined with composition-based attributes, ASPH-based ML model provides a highly accurate prediction of the formation energy calculated by density functional theory (DFT). After training with more than 30,000 different structure types and compositions, our model achieves a mean absolute error of 61 meV/atom in cross-validation, which outperforms previous work such as Voronoi tessellations and Coulomb matrix method using the same ML algorithm and datasets. Our results indicate that the proposed topology-based method provides a powerful computational tool for predicting materials properties compared to previous works.
; ; ; ; ;
Award ID(s):
1761320 1900473
Publication Date:
Journal Name:
npj Computational Materials
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Designing a new heterostructure electrode has many challenges associated with interface engineering. Demanding simulation resources and lack of heterostructure databases continue to be a barrier to understanding the chemistry and mechanics of complex interfaces using simulations. Mixed-dimensional heterostructures composed of two-dimensional (2D) and three-dimensional (3D) materials are undisputed next-generation materials for engineered devices due to their changeable properties. The present work computationally investigates the interface between 2D graphene and 3D tin (Sn) systems with density functional theory (DFT) method. This computationally demanding simulation data is further used to develop machine learning (ML)-based potential energy surfaces (PES). The approach to developing PES for complex interface systems in the light of limited data and the transferability of such models has been discussed. To develop PES for graphene-tin interface systems, high-dimensional neural networks (HDNN) are used that rely on atom-centered symmetry function to represent structural information. HDNN are modified to train on the total energies of the interface system rather than atomic energies. The performance of modified HDNN trained on 5789 interface structures of graphene|Sn is tested on new interfaces of the same material pair with varying levels of structural deviations from the training dataset. Root-mean-squared error (RMSE) for test interfaces fallmore »in the range of 0.01–0.45 eV/atom, depending on the structural deviations from the reference training dataset. By avoiding incorrect decomposition of total energy into atomic energies, modified HDNN model is shown to obtain higher accuracy and transferability despite a limited dataset. Improved accuracy in the ML-based modeling approach promises cost-effective means of designing interfaces in heterostructure energy storage systems with higher cycle life and stability.« less
  2. Abstract

    A catalytic surface should be stable under reaction conditions to be effective. However, it takes significant effort to screen many surfaces for their stability, as this requires intensive quantum chemical calculations. To more efficiently estimate stability, we provide a general and data-efficient machine learning (ML) approach to accurately and efficiently predict the surface energies of metal alloy surfaces. Our ML approach introduces an element-centered fingerprint (ECFP) which was used as a vector representation for fitting models for predicting surface formation energies. The ECFP is significantly more accurate than several existing feature sets when applied to dilute alloy surfaces and is competitive with existing feature sets when applied to bulk alloy surfaces or gas-phase molecules. Models using the ECFP as input can be quite general, as we created models with good accuracy over a broad set of bimetallic surfaces including most d-block metals, even with relatively small datasets. For example, using the ECFP, we developed a kernel ridge regression ML model which is able to predict the surface energies of alloys of diverse metal combinations with a mean absolute error of 0.017 eV atom−1. Combining this model with an existing model for predicting adsorption energies, we estimated segregation trends ofmore »596 single-atom alloys (SAAs)with and without CO adsorbed on these surfaces. As a simple test of the approach, we identify specific cases where CO does not induce segregation in these SAAs.

    « less
  3. Abstract—Materials Genomics initiative has the goal of rapidly synthesizing materials with a given set of desired properties using data science techniques. An important step in this direction is the ability to predict the outcomes of complex chemical reactions. Some graph-based feature learning algorithms have been proposed recently. However, the comprehensive relationship between atoms or structures is not learned properly and not explainable, and multiple graphs cannot be handled. In this paper, chemical reaction processes are formulated as translation processes. Both atoms and edges are mapped to vectors represent- ing the structural information. We employ the graph convolution layers to learn meaningful information of atom graphs, and further employ its variations, message passing networks (MPNN) and edge attention graph convolution network (EAGCN) to learn edge representations. Particularly, multi-view EAGCN groups and maps edges to a set of representations for the properties of the chemical bond between atoms from multiple views. Each bond is viewed from its atom type, bond type, distance and neighbor environment. The final node and edge representations are mapped to a sequence defined by the SMILES of the molecule and then fed to a decoder model with attention. To make full usage of multi-view information, we propose multi-viewmore »attention model to handle self correlation inside each atom or edge, and mutual correlation between edges and atoms, both of which are important in chemical reaction processes. We have evaluated our method on the standard benchmark datasets (that have been used by all the prior works), and the results show that edge embedding with multi-view attention achieves superior accuracy compared to existing techniques.« less
  4. A host of important performance properties for metal–organic frameworks (MOFs) and other complex materials can be calculated by modeling statistical ensembles. The principle challenge is to develop accurate and computationally efficient interaction models for these simulations. Two major approaches are (i) ab initio molecular dynamics in which the interaction model is provided by an exchange–correlation theory ( e.g. , DFT + dispersion functional) and (ii) molecular mechanics in which the interaction model is a parameterized classical force field. The first approach requires further development to improve computational speed. The second approach requires further development to automate accurate forcefield parameterization. Because of the extreme chemical diversity across thousands of MOF structures, this problem is still mostly unsolved today. For example, here we show structures in the 2014 CoRE MOF database contain more than 8 thousand different atom types based on first and second neighbors. Our results showed that atom types based on both first and second neighbors adequately capture the chemical environment, but atom types based on only first neighbors do not. For 3056 MOFs, we used density functional theory (DFT) followed by DDEC6 atomic population analysis to extract a host of important forcefield precursors: partial atomic charges; atom-in-material (AIM) Cmore »6 , C 8 , and C 10 dispersion coefficients; AIM dipole and quadrupole moments; various AIM polarizabilities; quantum Drude oscillator parameters; AIM electron cloud parameters; etc. Electrostatic parameters were validated through comparisons to the DFT-computed electrostatic potential. These forcefield precursors should find widespread applications to developing MOF force fields.« less
  5. As machine learning becomes more widely adopted across domains, it is critical that researchers and ML engineers think about the inherent biases in the data that may be perpetuated by the model. Recently, many studies have shown that such biases are also imbibed in Graph Neural Network (GNN) models if the input graph is biased, potentially to the disadvantage of underserved and underrepresented communities. In this work, we aim to mitigate the bias learned by GNNs by jointly optimizing two different loss functions: one for the task of link prediction and one for the task of demographic parity. We further implement three different techniques inspired by graph modification approaches: the Global Fairness Optimization (GFO), Constrained Fairness Optimization (CFO), and Fair Edge Weighting (FEW) models. These techniques mimic the effects of changing underlying graph structures within the GNN and offer a greater degree of interpretability over more integrated neural network methods. Our proposed models emulate microscopic or macroscopic edits to the input graph while training GNNs and learn node embeddings that are both accurate and fair under the context of link recommendations. We demonstrate the effectiveness of our approach on four real world datasets and show that we can improve themore »recommendation fairness by several factors at negligible cost to link prediction accuracy.« less