Title: Representations and strategies for transferable machine learning improve model performance in chemical discovery
Strategies for machine-learning (ML)-accelerated discovery that are general across material composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (∼1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the Periodic Table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to the graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the group number alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the Periodic Table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the Periodic Table, a property we expect to be broadly useful for other material domains.
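The claim that a nuclear-charge heuristic overestimates the dissimilarity of isovalent species, while group number does not, can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes the standard product form of a graph autocorrelation (sum of P_i * P_j over atom pairs at bond-path distance d), uses a hypothetical `ELEMENTS` lookup table, and compares two small 2p/3p isovalent molecules (H2O and H2S) rather than full transition-metal complexes.

```python
from collections import deque

# Hypothetical lookup for the few elements used here: (nuclear charge, group number).
ELEMENTS = {"H": (1, 1), "O": (8, 16), "S": (16, 16)}


def graph_distances(adjacency, start):
    """Breadth-first search: bond-path distance from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def autocorrelation(symbols, adjacency, prop_index, max_depth):
    """Sum P_i * P_j over ordered atom pairs at each bond-path distance 0..max_depth."""
    props = [ELEMENTS[s][prop_index] for s in symbols]
    result = [0] * (max_depth + 1)
    for i in range(len(symbols)):
        for j, d in graph_distances(adjacency, i).items():
            if d <= max_depth:
                result[d] += props[i] * props[j]
    return result


# H2O and H2S share one graph: central atom 0 bonded to hydrogens 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
water = ["O", "H", "H"]
h2s = ["S", "H", "H"]

z_water = autocorrelation(water, adjacency, 0, 2)  # nuclear-charge features
z_h2s = autocorrelation(h2s, adjacency, 0, 2)
g_water = autocorrelation(water, adjacency, 1, 2)  # group-number features
g_h2s = autocorrelation(h2s, adjacency, 1, 2)

print("nuclear charge:", z_water, z_h2s)
print("group number:  ", g_water, g_h2s)
```

In this toy case the nuclear-charge features of the two molecules differ substantially, whereas the group-number features coincide, which is the kind of reordering of distances toward Periodic Table relationships that the abstract describes.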
Award ID(s):
1846426 1704266
PAR ID:
10593492
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
American Institute of Physics
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
156
Issue:
7
ISSN:
0021-9606
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As machine learning (ML) has matured, it has opened a new frontier in theoretical and computational chemistry by offering the promise of simultaneous paradigm shifts in accuracy and efficiency. Nowhere is this advance more needed, but also more challenging to achieve, than in the discovery of open‐shell transition metal complexes. Here, localized d or f electrons exhibit variable bonding that is challenging to capture even with the most computationally demanding methods. Thus, despite great promise, clear obstacles remain in constructing ML models that can supplement or even replace explicit electronic structure calculations. In this article, I outline the recent advances in building ML models in transition metal chemistry, including the ability to approach sub‐kcal/mol accuracy on a range of properties with tailored representations, to discover and enumerate complexes in large chemical spaces, and to reveal opportunities for design through analysis of feature importance. I discuss unique considerations that have been essential to enabling ML in open‐shell transition metal chemistry, including (a) the relationship of data set size/diversity, model complexity, and representation choice, (b) the importance of quantitative assessments of both theory and model domain of applicability, and (c) the need to enable autonomous generation of reliable, large data sets both for ML model training and in active learning or discovery contexts. Finally, I summarize the next steps toward making ML a mainstream tool in the accelerated discovery of transition metal complexes. This article is categorized under: Electronic Structure Theory > Density Functional Theory; Software > Molecular Modeling; Computer and Information Science > Chemoinformatics.
  2. Density functional theory (DFT) is widely used in transition-metal chemistry, yet essential properties such as spin-state energetics in transition-metal complexes (TMCs) are well known to be sensitive to the choice of the exchange-correlation functional. Increasing the amount of exchange in a functional typically shifts the preferred ground state in first-row TMCs from low-spin to high-spin by penalizing delocalization error, but the effect on properties of second-row complexes is less well known. We compare the exchange sensitivity of adiabatic spin-splitting energies in pairs of mononuclear 3d and 4d mid-row octahedral transition metal complexes. We analyze hundreds of complexes assembled from four metals in two oxidation states with ten small monodentate ligands that span a wide range of field strengths expected to favor a variety of ground states. We observe consistently lower but proportional sensitivity to exchange fraction among 4d TMCs with respect to their isovalent 3d TMC counterparts, leading to the largest difference in sensitivities for the strongest field ligands. The combined effect of reduced exchange sensitivities and the greater low-spin bias of most 4d TMCs means that while over one-third of 3d TMCs change ground states over a modest variation (ca. 0.0–0.3) in exchange fraction, almost no 4d TMCs do. Differences in delocalization, as judged through changes in the metal–ligand bond lengths of spin states, do not explain the distinct behavior of 4d TMCs. Instead, evaluation of potential energy curves in 3d and 4d TMCs reveals that higher exchange sensitivities in 3d TMCs are likely due to the opposing effect of exchange on the low-spin and high-spin states, whereas the effect on both spin states is more comparable in 4d TMCs. 
  3. This review encompasses guided ion beam tandem mass spectrometry studies of hydrated metal dication complexes. Metals include the Group 2 alkaline earths (Mg, Ca, Sr, and Ba), late first‐row transition metals (Mn, Fe, Co, Ni, Cu, and Zn), along with Cd. In all cases, threshold collision‐induced dissociation experiments are used to quantitatively determine the sequential hydration energies for M²⁺(H₂O)ₓ complexes ranging in size from one to 11 water molecules. Periodic trends in these bond dissociation energies are examined and discussed. Values are compared to other experimental results when available. In addition to dissociation by simple water ligand loss, complexes at a select size (which differs from metal to metal) are also observed to undergo charge separation to yield a hydrated metal hydroxide cation and a hydrated proton. This leads to the concept of a critical size, x_crit, and the periodic trends in this value are also discussed.
  4. Representation learning is popular for its power of learning latent feature vectors (i.e., embeddings) to represent data units from a complex type of data (e.g., languages, networks, behaviors). The embeddings preserve specific structure and thus improve the performance of predictive models. In this work, we develop a new representation learning method in the chemistry domain. Given a large set of inorganic crystal compounds, the method learns the embeddings of atoms so that the predictive models can place them into the periodic table correctly. Our method preserves not only the compounds' compositions but also their structures such as crystal system, point group, and space group. Experiments demonstrate the effectiveness of the proposed method, compared to the state-of-the-art method (in PNAS 2018). One interesting result is that given 20 atoms with known positions in the periodic table, our method achieves an accuracy of 0.70, while the baseline reaches only 0.54, on filling the remaining 14 hidden atoms into the table. This shows that the atomic embeddings we generated preserve useful information and can be extended for scientific exploration.
  5. Modern machine learning (ML) and deep learning (DL) techniques using high-dimensional data representations have helped accelerate the materials discovery process by efficiently detecting hidden patterns in existing datasets and linking input representations to output properties for a better understanding of the scientific phenomenon. While deep neural networks composed of fully connected layers have been widely used for materials property prediction, simply creating a deeper model with a large number of layers often runs into the vanishing gradient problem, which degrades performance and limits the usefulness of such models. In this paper, we study and propose architectural principles to address the question of improving the performance of model training and inference under fixed parametric constraints. Here, we present a general deep-learning framework based on branched residual learning (BRNet) with fully connected layers that can work with any numerical vector-based representation as input to build accurate models to predict materials properties. We perform model training for materials properties using numerical vectors representing different composition-based attributes of the respective materials and compare the performance of the proposed models against traditional ML and existing DL architectures. We find that the proposed models are significantly more accurate than the ML/DL models for all data sizes when using different composition-based attributes as input. Further, branched learning requires fewer parameters and results in faster model training due to better convergence during the training phase than existing neural networks, thereby efficiently building accurate models for predicting materials properties.