skip to main content


Title: High‐Throughput Discovery of Novel Cubic Crystal Materials Using Deep Generative Neural Networks
Abstract

High‐throughput screening has become one of the major strategies for the discovery of novel functional materials. However, its effectiveness is severely limited by the lack of sufficient and diverse materials in current materials repositories such as the open quantum materials database (OQMD). Recent progress in deep learning have enabled generative strategies that learn implicit chemical rules for creating hypothetical materials with new compositions and structures. However, current materials generative models have difficulty in generating structurally diverse, chemically valid, and stable materials. Here we propose CubicGAN, a generative adversarial network (GAN) based deep neural network model for large scale generative design of novel cubic materials. When trained on 375 749 ternary materials from the OQMD database, the authors show that the model is able to not only rediscover most of the currently known cubic materials but also generate hypothetical materials of new structure prototypes. A total of 506 such materials have been verified by phonon dispersion calculation. Considering the importance of cubic materials in wide applications such as solar panels, the GAN model provides a promising approach to significantly expand existing materials repositories, enabling the discovery of new functional materials via screening. The new crystal structures discovered are freely accessible atwww.carolinamatdb.org.

 
more » « less
Award ID(s):
1940099 1905775
NSF-PAR ID:
10366840
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Advanced Science
Volume:
8
Issue:
20
ISSN:
2198-3844
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data driven generative deep learning models have recently emerged as one of the most promising approaches for new materials discovery. While generator models can generate millions of candidates, it is critical to train fast and accurate machine learning models to filter out stable, synthesizable materials with the desired properties. However, such efforts to build supervised regression or classification screening models have been severely hindered by the lack of unstable or unsynthesizable samples, which usually are not collected and deposited in materials databases such as ICSD and Materials Project (MP). At the same time, there is a significant amount of unlabelled data available in these databases. Here we propose a semi-supervised deep neural network (TSDNN) model for high-performance formation energy and synthesizability prediction, which is achieved via its unique teacher-student dual network architecture and its effective exploitation of the large amount of unlabeled data. For formation energy based stability screening, our semi-supervised classifier achieves an absolute 10.3% accuracy improvement compared to the baseline CGCNN regression model. For synthesizability prediction, our model significantly increases the baseline PU learning's true positive rate from 87.9% to 92.9% using 1/49 model parameters. To further prove the effectiveness of our models, we combined our TSDNN-energy and TSDNN-synthesizability models with our CubicGAN generator to discover novel stable cubic structures. Out of the 1000 recommended candidate samples by our models, 512 of them have negative formation energies as validated by our DFT formation energy calculations. Our experimental results show that our semi-supervised deep neural networks can significantly improve the screening accuracy in large-scale generative materials design. Our source code can be accessed at https://git/hub.com/usccolumbia/tsdnn. 
    more » « less
  2. Abstract

    The availability and easy access of large-scale experimental and computational materials data have enabled the emergence of accelerated development of algorithms and models for materials property prediction, structure prediction, and generative design of materials. However, the lack of user-friendly materials informatics web servers has severely constrained the wide adoption of such tools in the daily practice of materials screening, tinkering, and design space exploration by materials scientists. Herein we first survey current materials informatics web apps and then propose and develop MaterialsAtlas.org, a web-based materials informatics toolbox for materials discovery, which includes a variety of routinely needed tools for exploratory materials discovery, including material’s composition and structure validity check (e.g. charge neutrality, electronegativity balance, dynamic stability, Pauling rules), materials property prediction (e.g. band gap, elastic moduli, hardness, and thermal conductivity), search for hypothetical materials, and utility tools. These user-friendly tools can be freely accessed athttp://www.materialsatlas.org. We argue that such materials informatics apps should be widely developed by the community to speed up materials discovery processes.

     
    more » « less
  3. Abstract

    Discovering new materials is a challenging task in materials science crucial to the progress of human society. Conventional approaches based on experiments and simulations are labor-intensive or costly with success heavily depending on experts’ heuristic knowledge. Here, we propose a deep learning based Physics Guided Crystal Generative Model (PGCGM) for efficient crystal material design with high structural diversity and symmetry. Our model increases the generation validity by more than 700% compared to FTCP, one of the latest structure generators and by more than 45% compared to our previous CubicGAN model. Density Functional Theory (DFT) calculations are used to validate the generated structures with 1869 materials out of 2000 are successfully optimized and deposited into the Carolina Materials Databasewww.carolinamatdb.org, of which 39.6% have negative formation energy and 5.3% have energy-above-hull less than 0.25 eV/atom, indicating their thermodynamic stability and potential synthesizability.

     
    more » « less
  4. Properties in material composition and crystal structures have been explored by density functional theory (DFT) calculations, using databases such as the Open Quantum Materials Database (OQMD). Databases like these have been used currently for the training of advanced machine learning and deep neural network models, the latter providing higher performance when predicting properties of materials. However, current alternatives have shown a deterioration in accuracy when increasing the number of layers in their architecture (over-fitting problem). As an alternative method to address this problem, we have implemented residual neural network architectures based on Merge and Run Networks, IRNet and UNet to improve performance while relaxing the observed network depth limitation. The evaluation of the proposed architectures include a 9:1 ratio to train and test as well as 10 fold cross validation. In the experiments we found that our proposed architectures based on IRNet and UNet are able to obtain a lower Mean Absolute Error (MAE) than current strategies. The full implementation (Python, Tensorflow and Keras) and the trained networks will be available online for community validation and advancing the state of the art from our findings. 
    more » « less
  5. Abstract

    Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The functional cancer genome and proteome provide rich sources of information to identify patient-specific variations in signaling pathways and activities within and across tumors; however, current analytic methods lack the ability to exploit the diverse and multi-layered architecture of these complex biological networks. We assessed pan-cancer pathway activities for >7700 patients across 32 tumor types from The Cancer Proteome Atlas by developing a personalized cancer-specific integrated network estimation (PRECISE) model. PRECISE is a general Bayesian framework for integrating existing interaction databases, data-drivende novocausal structures, and upstream molecular profiling data to estimate cancer-specific integrated networks, infer patient-specific networks and elicit interpretable pathway-level signatures. PRECISE-based pathway signatures, can delineate pan-cancer commonalities and differences in proteomic network biology within and across tumors, demonstrates robust tumor stratification that is both biologically and clinically informative and superior prognostic power compared to existing approaches. Towards establishing the translational relevance of the functional proteome in research and clinical settings, we provide an online, publicly available, comprehensive database and visualization repository of our findings (https://mjha.shinyapps.io/PRECISE/).

     
    more » « less