This content will become publicly available on January 14, 2026

Title: Large property models: a new generative machine-learning formulation for molecules
We have built the first transformers trained on the property-to-molecular-graph task, which we dub “large property models”. A key ingredient is supplementing these models during training with relatively basic but abundant chemical property data.
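As a rough illustration of the property-to-molecule formulation, the Python sketch below builds one training pair in which a vector of basic properties is the model input and a molecule string (SMILES) is the target. The property set (heavy-atom count and molecular weight computed from the molecular formula), the token format, and the names `basic_properties` and `make_pair` are hypothetical choices for illustration only, not the featurization or tokenization used in the paper.

```python
from dataclasses import dataclass

# Standard atomic weights (g/mol), rounded; only the elements used in this sketch.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass
class TrainingPair:
    source: str  # serialized property vector (model input)
    target: str  # molecule string, e.g. SMILES (model output)


def basic_properties(formula: dict[str, int]) -> dict[str, float]:
    """Cheap, formula-derived properties (a hypothetical feature choice)."""
    heavy_atoms = sum(n for element, n in formula.items() if element != "H")
    weight = sum(ATOMIC_WEIGHTS[element] * n for element, n in formula.items())
    return {"heavy_atoms": heavy_atoms, "mol_weight": round(weight, 3)}


def make_pair(smiles: str, formula: dict[str, int]) -> TrainingPair:
    """Property -> molecule: properties are the input, the molecule is the target."""
    props = basic_properties(formula)
    # Hypothetical token format: one "<name>value" token per property.
    source = " ".join(f"<{name}>{value}" for name, value in props.items())
    return TrainingPair(source=source, target=smiles)


pair = make_pair("CCO", {"C": 2, "H": 6, "O": 1})  # ethanol
print(pair.source)  # <heavy_atoms>3 <mol_weight>46.069
print(pair.target)  # CCO
```

Because formula-derived properties like these can be computed for any molecule, they are one example of cheap, abundant data; this record does not state which properties the paper actually uses.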
Award ID(s): 2045887
PAR ID: 10621793
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Royal Society of Chemistry
Date Published:
Journal Name: Faraday Discussions
Volume: 256
ISSN: 1359-6640
Page Range / eLocation ID: 104 to 119
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. Abstract In real-world materials research, machine learning (ML) models are usually expected to predict and discover novel, exceptional materials that deviate from known materials. It is therefore pressing to evaluate objectively how ML models perform when predicting properties of out-of-distribution (OOD) materials that differ from the training set. Traditional evaluation of materials property prediction models, which splits the dataset randomly, frequently yields artificially high performance estimates because typical materials datasets are inherently redundant. Here we present a comprehensive benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction. We formulate five different categories of OOD ML problems for three benchmark datasets from the MatBench study. Our extensive experiments show that, on average, current state-of-the-art GNN algorithms significantly underperform on the OOD property prediction tasks compared with their baselines in the MatBench study, demonstrating a crucial generalization gap in realistic materials prediction tasks. We further examine the latent physical spaces of these GNN models and, as a case study on the perovskites dataset, identify the sources of the significantly more robust OOD performance of CGCNN, ALIGNN, and DeeperGATGNN relative to the current best models in the MatBench study (coGN and coNGN); we also provide insights to improve their performance.
  2. Abstract While machine learning has emerged in recent years as a useful tool for the rapid prediction of materials properties, generating sufficient data to reliably train models without overfitting is often impractical. Towards overcoming this limitation, we present a general framework for leveraging complementary information across different models and datasets for accurate prediction of data-scarce materials properties. Our approach, based on a machine learning paradigm called mixture of experts, outperforms pairwise transfer learning on 14 of 19 materials property regression tasks, performing comparably on four of the remaining five. The approach is interpretable, model-agnostic, and scalable to combining an arbitrary number of pre-trained models and datasets to any downstream property prediction task. We anticipate the performance of our framework will further improve as better model architectures, new pre-training tasks, and larger materials datasets are developed by the community. 
  3. Abstract We continue the study of the Galvin property from Benhamou, Garti, and Shelah (2023, Proceedings of the American Mathematical Society 151, 1301–1309) and Benhamou (2023, Saturation properties in canonical inner models, submitted). In particular, we deepen the connection between certain diamond-like principles and non-Galvin ultrafilters. We also show that any Dodd sound non-p-point ultrafilter is non-Galvin. We use these ideas to formulate what appears to be the optimal large cardinal hypothesis implying the existence of a non-Galvin ultrafilter, improving on a result from Benhamou and Dobrinen (2023, Journal of Symbolic Logic, 1–34). Finally, we use a strengthening of the Ultrapower Axiom to prove that in all the known canonical inner models, a $\kappa$-complete ultrafilter has the Galvin property if and only if it is an iterated sum of p-points.
  4. Abstract Modern data mining methods have demonstrated effectiveness in comprehending and predicting materials properties. An essential component in the process of materials discovery is knowing which material(s) will possess desirable properties. For many materials properties, experiments and density functional theory (DFT) computations are costly and time-consuming. Hence, it is challenging to build accurate predictive models for such properties using conventional data mining methods, because little data is available. Here we present a framework for materials property prediction tasks using structure information that combines a graph neural network-based architecture with deep transfer learning techniques to drastically improve the model’s predictive ability on diverse materials data (3D/2D, inorganic/organic, computational/experimental). We evaluated the proposed framework in cross-property and cross-materials-class scenarios using 115 datasets and found that transfer learning models outperform models trained from scratch in 104 cases, i.e., ≈90%, with additional performance benefits for extrapolation problems. We believe the proposed framework can be widely useful in accelerating materials discovery in materials science.
  5. Abstract Anthropogenic climate change is projected to drive increases in climate extremes and in climate-sensitive ecosystem disturbances such as wildfire, with enormous economic impacts. Understanding spatial and temporal patterns of risk to property values from multiple climate-sensitive disturbances at national and regional scales is urgently needed to inform risk management and policy efforts. Here, we combine models for three major climate-sensitive disturbances (i.e., wildfire, climate stress-driven tree mortality, and insect-driven tree mortality), future climate projections of these disturbances, and high-resolution property-value data to quantify the spatiotemporal exposure of property values to disturbance across the contiguous United States (US). We find that the property values exposed to these climate-sensitive disturbances increase sharply in future climate scenarios, particularly in existing high-risk regions of the western US, and that novel exposure risks emerge in some currently lower-risk regions, such as the Southeast and the Great Lakes. Climate policy that drives emissions towards low-to-moderate climate futures avoids large increases in disturbance risk exposure compared with high-emissions scenarios. Our results provide an important large-scale assessment of climate-sensitive disturbance risk to property values to help inform land management and climate adaptation efforts.
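For the first related abstract (OOD benchmarking of GNNs), the difference between a random split and an extrapolative split can be sketched in a few lines of Python. The property-value hold-out shown here is one generic kind of OOD split and is an assumption; the record does not say which five OOD categories that study defines. The function names and data are hypothetical.

```python
import random


def random_split(samples, test_frac=0.2, seed=0):
    """Conventional i.i.d. split: test points tend to resemble training points."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def target_extrapolation_split(samples, test_frac=0.2):
    """Hypothetical OOD split: hold out the samples with the largest property
    values, so a model must extrapolate beyond the training range."""
    ordered = sorted(samples, key=lambda s: s[1])
    n_train = len(ordered) - int(len(ordered) * test_frac)
    return ordered[:n_train], ordered[n_train:]


# Hypothetical (material_id, property_value) pairs.
data = [(f"m{i}", float(i)) for i in range(10)]
train, test = target_extrapolation_split(data)
print(max(v for _, v in train), min(v for _, v in test))  # 7.0 8.0
```

With redundant datasets, a random split leaves near-duplicates of many test materials in the training set, whereas the extrapolative split guarantees that every test value lies outside the training range.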
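The mixture-of-experts idea in the second related abstract can be sketched as a softmax-gated weighted sum of frozen expert models. This is a simplified assumption: here the experts' scalar predictions are combined directly and the gate logits are fixed placeholders, whereas a trained framework would learn the gating parameters and may combine learned features rather than final predictions. The expert functions are hypothetical stand-ins for pre-trained models.

```python
import math


def softmax(logits):
    """Convert gate logits into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate_logits):
    """Combine frozen experts' outputs using softmax gating weights."""
    weights = softmax(gate_logits)
    outputs = [expert(x) for expert in experts]
    return sum(w * y for w, y in zip(weights, outputs)), weights


# Hypothetical "experts" standing in for models pre-trained on other datasets.
experts = [lambda x: 2.0 * x, lambda x: x + 1.0]
# Placeholder gate logits; in a trained model these would be learned.
pred, weights = mixture_of_experts(3.0, experts, gate_logits=[0.0, 0.0])
print(pred, weights)  # 5.0 [0.5, 0.5]
```

Because only the gate (and, in practice, a task head) is trained, the same frozen experts can be reused for any downstream property, which is one way to read the abstract's claim that the approach scales to an arbitrary number of pre-trained models.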