Title: Melting temperature prediction using a graph neural network model: From ancient minerals to new materials
The melting point is a fundamental property that is time-consuming to measure or compute, thus hindering high-throughput analyses of melting relations and phase diagrams over large sets of candidate compounds. To address this, we build a machine learning model, trained on a database of ∼10,000 compounds, that can predict the melting temperature in a fraction of a second. The model, made publicly available online, features graph neural network and residual neural network architectures. We demonstrate the model’s usefulness in diverse applications. For the purpose of materials design and discovery, we show that it can quickly discover novel multicomponent materials with high melting points. These predictions are confirmed by density functional theory calculations and experimentally validated. In an application to planetary science and geology, we employ the model to analyze the melting temperatures of ∼4,800 minerals to uncover correlations relevant to the study of mineral evolution.
Award ID(s):
2015852 2209026 2209027
NSF-PAR ID:
10356782
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
119
Issue:
36
ISSN:
0027-8424
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Ultrahigh temperature ceramics (UHTCs) have melting points above 3000°C and outstanding strength at high temperatures, thus making them apposite structural materials for high‐temperature applications. Diboride, nitride, and carbide compounds, processed via various techniques, have been extensively studied and used in the manufacture of UHTCs. Current analytical models, based on our current but incomplete understanding of the theory, are unable to produce a priori predictions of mechanical properties of UHTCs based on their mixture designs and processing parameters. As a result, researchers have to rely on experiments, which are often costly and time‐consuming, to understand composition–structure–performance links in UHTCs. This study employs machine learning (ML) models (i.e., random forest and artificial neural network models) to predict Young's modulus, flexural strength, and fracture toughness of UHTCs in relation to a wide range of mixture designs, processing parameters, and testing conditions. Outcomes demonstrate that adequately trained ML models can yield reliable predictions, a priori, of the three aforesaid mechanical properties. The prediction performance on Young's modulus is superior to that on flexural strength and fracture toughness. Next, the ML model with the best prediction performance is utilized to evaluate and rank the impacts of input variables on Young's modulus. Finally, on the basis of such classification of consequential and inconsequential input variables, this study develops an easy‐to‐use, closed‐form analytical model to predict Young's modulus of UHTCs. Overall, this study highlights the ability of data‐driven numerical models to complement, or even replace, time‐consuming experiments, thereby accelerating the development of UHTCs.

     
  2. Gas-particle partitioning of secondary organic aerosols (SOA) is impacted by particle phase state and viscosity, which can be inferred from the glass transition temperature (T_g) of the constituting organic compounds. Several parametrizations were developed to predict T_g of organic compounds based on molecular properties and elemental composition, but they are subject to relatively large uncertainties as they do not account for molecular structure and functionality. Here we develop a new T_g prediction method powered by machine learning and “molecular embeddings”, which are unique numerical representations of chemical compounds that retain information on their structure, interatomic connectivity and functionality. We have trained multiple state-of-the-art machine learning models on databases of experimental T_g of organic compounds and their corresponding molecular embeddings. The best prediction model is the tgBoost model built with an Extreme Gradient Boosting (XGBoost) regressor trained via a nested cross-validation method, reproducing experimental data very well with a mean absolute error of 18.3 K. It can also quantify the influence of number and location of functional groups on the T_g of organic molecules, while accounting for atom connectivity and predicting different T_g for compositional isomers. The tgBoost model suggests the following trend for sensitivity of T_g to functional group addition: –COOH (carboxylic acid) > –C(O)OR (ester) ≈ –OH (alcohol) > –C(O)R (ketone) ≈ –COR (ether) ≈ –C(O)H (aldehyde). We also developed a model to predict the melting point (T_m) of organic compounds by training a deep neural network on a large dataset of experimental T_m. The model performs reasonably well against the available dataset with a mean absolute error of 31.0 K. These new machine-learning-powered models can be applied to field and laboratory measurements as well as atmospheric aerosol models to predict the T_g and T_m of SOA compounds for evaluation of the phase state and viscosity of SOA.
  3. The process of developing new compounds and materials is increasingly driven by computational modeling and simulation, which allow us to characterize candidates before pursuing them in the laboratory. One of the non-trivial properties of interest for organic materials is their packing in the bulk, which is highly dependent on their molecular structure. By controlling the latter, we can realize materials with a desired density (as well as other target properties). Molecular dynamics simulations are a popular and reasonably accurate way to compute the bulk density of molecules; however, because these calculations are computationally intensive, they are not a practically viable option for high-throughput screening studies that assess material candidates on a massive scale. In this work, we employ machine learning to develop a data-derived prediction model that is an alternative to physics-based simulations, and we utilize it for the hyperscreening of 1.5 million small organic molecules as well as to gain insights into the relationship between structural makeup and packing density. We also use this study to analyze the learning curve of the employed neural network approach and gain empirical data on the dependence of model performance on training data size, which will inform future investigations.
  4. In the present paper, we introduce a new neural network-based tool for the prediction of formation energies of atomic structures based on elemental and structural features of Voronoi-tessellated materials. We provide a concise overview of the connection between machine learning and the true material–property relationship, how to improve the generalization accuracy by reducing overfitting, how new data can be incorporated into the model to tune it to a specific material system, and preliminary results on using models to perform local structure relaxations. The present work resulted in three final models optimized for (1) highest test accuracy on the Open Quantum Materials Database (OQMD), (2) performance in the discovery of new materials, and (3) performance at a low computational cost. On a test set of 21,800 compounds randomly selected from OQMD, they achieve a mean absolute error (MAE) of 28, 40, and 42 meV/atom, respectively. The second model provides better predictions in a test case of interest not present in the OQMD, while the third reduces the computational cost by a factor of 8. We collect our results in a new open-source tool called SIPFENN (Structure-Informed Prediction of Formation Energy using Neural Networks). SIPFENN not only improves the accuracy beyond existing models but also ships in a ready-to-use form with pre-trained neural networks and a GUI. By virtue of this, it can be included in DFT calculation routines at nearly no cost.
  5. Abstract

    Motivation

    Tandem mass spectrometry is an essential technology for characterizing chemical compounds at high sensitivity and throughput, and is commonly adopted in many fields. However, computational methods for automated compound identification from their MS/MS spectra are still limited, especially for novel compounds that have not been previously characterized. In recent years, in silico methods were proposed to predict the MS/MS spectra of compounds, which can then be used to expand the reference spectral libraries for compound identification. However, these methods did not consider the compounds’ 3D conformations, and thus neglected critical structural information.

    Results

    We present the 3D Molecular Network for Mass Spectra Prediction (3DMolMS), a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. We evaluated the model on the experimental spectra collected in several spectral libraries. The results showed that 3DMolMS predicted the spectra with average cosine similarities of 0.691 and 0.478 with the experimental MS/MS spectra acquired in positive and negative ion modes, respectively. Furthermore, the 3DMolMS model can be generalized to the prediction of MS/MS spectra acquired by different labs on different instruments through minor fine-tuning on a small set of spectra. Finally, we demonstrate that the molecular representation learned by 3DMolMS from MS/MS spectra prediction can be adapted to enhance the prediction of chemical properties such as the elution time in liquid chromatography and the collisional cross section measured by ion mobility spectrometry, both of which are often used to improve compound identification.

    Availability and implementation

    The code for 3DMolMS is available at https://github.com/JosieHong/3DMolMS and the web service is at https://spectrumprediction.gnps2.org.

     
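The abstracts listed above report model quality with two metrics: mean absolute error (items 2 and 4) and cosine similarity between predicted and experimental spectra (item 5). As a rough illustration only, the sketch below shows how these two quantities are conventionally computed. The function names, the zero-vector convention, and the idea of comparing spectra as equal-length intensity vectors on a shared grid are assumptions made here for illustration; none of this is taken from the code of the cited works.

```python
import math


def mean_absolute_error(predicted, observed):
    """Average of |predicted - observed| over paired values (same units as the inputs)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical values, not data from any of the cited studies.
    print(mean_absolute_error([300.0, 410.0], [310.0, 400.0]))
    print(cosine_similarity([0.0, 1.0, 0.5], [0.1, 0.9, 0.4]))
```

A mean absolute error carries the units of the predicted quantity (K or meV/atom in the abstracts above), whereas cosine similarity is dimensionless and, for non-negative intensities, lies between 0 and 1.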