Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Gruver, N; Sriram, A; Madotto, A; Wilson, AG; Zitnick, LC; Ulissi, Z


We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials, infilling of partial structures, and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.
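As a rough illustration of what "text-encoded atomistic data" could look like, the sketch below serializes a crystal to a plain string and parses it back. The layout (lattice lengths, then lattice angles, then alternating element and fractional-coordinate lines), the `Crystal` class, and the `encode`, `decode`, and `is_plausible` helpers are all assumptions made for this example; the abstract does not specify the authors' format or validity checks.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Crystal:
    """Hypothetical container for a periodic structure (not the authors' type)."""
    lengths: Vec3            # lattice lengths a, b, c in angstroms
    angles: Vec3             # lattice angles alpha, beta, gamma in degrees
    species: List[str]       # element symbol per atom
    frac_coords: List[Vec3]  # fractional coordinates per atom


def encode(c: Crystal) -> str:
    """Serialize a crystal to text using an assumed line-based layout."""
    lines = [
        " ".join(f"{x:.1f}" for x in c.lengths),
        " ".join(f"{a:.0f}" for a in c.angles),
    ]
    for element, xyz in zip(c.species, c.frac_coords):
        lines.append(element)
        lines.append(" ".join(f"{v:.2f}" for v in xyz))
    return "\n".join(lines)


def decode(text: str) -> Crystal:
    """Parse the assumed layout back into a Crystal."""
    rows = text.strip().split("\n")
    lengths = tuple(float(x) for x in rows[0].split())
    angles = tuple(float(x) for x in rows[1].split())
    species = rows[2::2]  # element lines
    frac = [tuple(float(v) for v in r.split()) for r in rows[3::2]]
    return Crystal(lengths, angles, species, frac)


def is_plausible(c: Crystal) -> bool:
    """Very loose sanity check (an assumption, not the paper's validity test)."""
    return (
        all(x > 0 for x in c.lengths)
        and all(0 < a < 180 for a in c.angles)
        and len(c.species) == len(c.frac_coords)
        and all(0 <= v < 1 for xyz in c.frac_coords for v in xyz)
    )


# Toy two-atom cell with made-up numbers, used only to exercise the helpers.
toy = Crystal((5.6, 5.6, 5.6), (90, 90, 90), ["Na", "Cl"],
              [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)])
toy_text = encode(toy)
```

A string such as `toy_text` is the kind of sequence a language model could be fine-tuned on or asked to complete; sampled strings would be parsed with something like `decode` and screened before any stability calculation.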

Award ID(s): 2118310

PAR ID: 10625625

Author(s) / Creator(s): Gruver, N; Sriram, A; Madotto, A; Wilson, AG; Zitnick, LC; Ulissi, Z

Publisher / Repository: International Conference on Learning Representations

Date Published: 2024-05-07

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
