Abstract High‐throughput screening has become one of the major strategies for the discovery of novel functional materials. However, its effectiveness is severely limited by the lack of sufficient and diverse materials in current materials repositories such as the open quantum materials database (OQMD). Recent progress in deep learning have enabled generative strategies that learn implicit chemical rules for creating hypothetical materials with new compositions and structures. However, current materials generative models have difficulty in generating structurally diverse, chemically valid, and stable materials. Here we propose CubicGAN, a generative adversarial network (GAN) based deep neural network model for large scale generative design of novel cubic materials. When trained on 375 749 ternary materials from the OQMD database, the authors show that the model is able to not only rediscover most of the currently known cubic materials but also generate hypothetical materials of new structure prototypes. A total of 506 such materials have been verified by phonon dispersion calculation. Considering the importance of cubic materials in wide applications such as solar panels, the GAN model provides a promising approach to significantly expand existing materials repositories, enabling the discovery of new functional materials via screening. The new crystal structures discovered are freely accessible atwww.carolinamatdb.org.
more »
« less
Crystal Composition Transformer: Self‐Learning Neural Language Model for Generative and Tinkering Design of Materials
Abstract Self‐supervised neural language models have recently achieved unprecedented success from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in the generation, structure classification, and functional predictions for proteins and molecules with learned representations. However, most of the masking‐based pre‐trained language models are not designed for generative design, and their black‐box nature makes it difficult to interpret their design logic. Here a Blank‐filling Language Model for Materials (BLMM) Crystal Transformer is proposed, a neural network‐based probabilistic generative model for generative and tinkering design of inorganic materials. The model is built on the blank‐filling language model for text generation and has demonstrated unique advantages in learning the “materials grammars” together with high‐quality generation, interpretability, and data efficiency. It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity, which are more than four and eight times higher compared to a pseudo‐random sampling baseline. The probabilistic generation process of BLMM allows it to recommend materials tinkering operations based on learned materials chemistry, which makes it useful for materials doping. The model is applied to discover a set of new materials as validated using the Density Functional Theory (DFT) calculations. This work thus brings the unsupervised transformer language models based generative artificial intelligence to inorganic materials. A user‐friendly web app for tinkering materials design has been developed and can be accessed freely atwww.materialsatlas.org/blmtinker.
more »
« less
- Award ID(s):
- 1832809
- PAR ID:
- 10640957
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Advanced Science
- Volume:
- 11
- Issue:
- 36
- ISSN:
- 2198-3844
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract The availability and easy access of large-scale experimental and computational materials data have enabled the emergence of accelerated development of algorithms and models for materials property prediction, structure prediction, and generative design of materials. However, the lack of user-friendly materials informatics web servers has severely constrained the wide adoption of such tools in the daily practice of materials screening, tinkering, and design space exploration by materials scientists. Herein we first survey current materials informatics web apps and then propose and develop MaterialsAtlas.org, a web-based materials informatics toolbox for materials discovery, which includes a variety of routinely needed tools for exploratory materials discovery, including material’s composition and structure validity check (e.g. charge neutrality, electronegativity balance, dynamic stability, Pauling rules), materials property prediction (e.g. band gap, elastic moduli, hardness, and thermal conductivity), search for hypothetical materials, and utility tools. These user-friendly tools can be freely accessed athttp://www.materialsatlas.org. We argue that such materials informatics apps should be widely developed by the community to speed up materials discovery processes.more » « less
-
Abstract Generative deep learning methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a denoising diffusion framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively. Importantly, we demonstrate that GCDM’s generative denoising process enables the model to generate a significant proportion of valid and energetically-stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Code and data are freely available onGitHub.more » « less
-
Abstract Pre-trained transformer language models (LMs) on large unlabeled corpus have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for the generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Projects databases. Six different datasets with/out non-charge-neutral or EB samples are used to benchmark the generative design performances and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal LMs can generate chemically valid material compositions with as high as 97.61% to be charge neutral and 91.22% to be electronegativity balanced, which has more than six times higher enrichment compared to the baseline pseudo-random sampling algorithm. Our LMs also demonstrate high generation novelty and their potential in new materials discovery is proved by their capability to recover the leave-out materials. We also find that the properties of the generated compositions can be tailored by training the models with selected training sets such as high-bandgap samples. Our experiments also show that different models each have their own preference in terms of the properties of the generated samples and their running time complexity varies a lot. We have applied our materials transformers to discover a set of new materials as validated using density functional theory calculations. All our trained materials transformer models and code can be accessed freely at http://www.github.com/usccolumbia/MTransformer .more » « less
-
Abstract Discovering new materials is a challenging task in materials science crucial to the progress of human society. Conventional approaches based on experiments and simulations are labor-intensive or costly with success heavily depending on experts’ heuristic knowledge. Here, we propose a deep learning based Physics Guided Crystal Generative Model (PGCGM) for efficient crystal material design with high structural diversity and symmetry. Our model increases the generation validity by more than 700% compared to FTCP, one of the latest structure generators and by more than 45% compared to our previous CubicGAN model. Density Functional Theory (DFT) calculations are used to validate the generated structures with 1869 materials out of 2000 are successfully optimized and deposited into the Carolina Materials Databasewww.carolinamatdb.org, of which 39.6% have negative formation energy and 5.3% have energy-above-hull less than 0.25 eV/atom, indicating their thermodynamic stability and potential synthesizability.more » « less
An official website of the United States government
