Title: Material transformers: deep learning language models for generative materials design
Abstract Transformer language models (LMs) pre-trained on large unlabeled corpora have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for the generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Project databases. Six different datasets, with or without non-charge-neutral or electronegativity-balanced (EB) samples, are used to benchmark the generative design performance and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal LMs can generate chemically valid material compositions, of which up to 97.61% are charge neutral and 91.22% are electronegativity balanced, an enrichment more than six times higher than that of the baseline pseudo-random sampling algorithm. Our LMs also demonstrate high generation novelty, and their potential for new materials discovery is demonstrated by their ability to recover held-out materials. We also find that the properties of the generated compositions can be tailored by training the models on selected training sets, such as high-bandgap samples. Our experiments also show that each model has its own preference in terms of the properties of the generated samples, and that running time complexity varies widely across models. We have applied our materials transformers to discover a set of new materials, validated using density functional theory calculations. All our trained materials transformer models and code can be accessed freely at http://www.github.com/usccolumbia/MTransformer .
Award ID(s):
1905775 1940099 2110033
NSF-PAR ID:
10391724
Journal Name:
Machine Learning: Science and Technology
Volume:
4
Issue:
1
ISSN:
2632-2153
Page Range / eLocation ID:
015001
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Transformers trained on huge text corpora exhibit a remarkable set of capabilities. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. In this work, we train Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities and show that: (1) Transformers generalize to exponentially or even combinatorially many functions not seen in the training data; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions; (3) the training data has a significant impact on the model’s ability to compose functions; and (4) attention layers in the latter half of the model seem critical to compositionality.
  2. Two‐dimensional (2D) materials offer great potential in various fields like superconductivity, quantum systems, and topological materials. However, designing them systematically remains challenging due to the limited pool of fewer than 100 experimentally synthesized 2D materials. Recent advancements in deep learning, data mining, and density functional theory (DFT) calculations have paved the way for exploring new 2D material candidates. Herein, a generative material design pipeline known as the material transformer generator (MTG) is proposed. MTG leverages two distinct 2D material composition generators, both trained using self‐learning neural language models rooted in transformers, with and without transfer learning. These models generate numerous potential 2D compositions, which are plugged into established templates for known 2D materials to predict their crystal structures. To ensure stability, DFT computations assess their thermodynamic stability based on energy‐above‐hull and formation energy metrics. MTG has found four new DFT‐validated stable 2D materials: NiCl4, IrSBr, CuBr3, and CoBrCl, all with zero energy‐above‐hull values that indicate thermodynamic stability. Additionally, GaBrO and NbBrCl3 are found with energy‐above‐hull values below 0.05 eV. CuBr3 and GaBrO exhibit dynamic stability, confirmed by phonon dispersion analysis. In summary, the MTG pipeline shows significant potential for discovering new 2D and functional materials.
  3. Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. In this work, we propose a Process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41-137× and 631-1074× speedup and 123-383× and 320-602× energy efficiency over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
  4. Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from the training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions, compared to generating no intermediate outputs; (3) the training data has a significant impact on the model’s ability to compose unseen combinations of functions; and (4) the attention layers in the latter half of the model are critical to compositionality.
  5. This work presents a linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to ‘memorize’ sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing.