- NSF-PAR ID: 10391724
- Date Published:
- Journal Name: Machine Learning: Science and Technology
- Volume: 4
- Issue: 1
- ISSN: 2632-2153
- Page Range / eLocation ID: 015001
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Transformers trained on huge text corpora exhibit a remarkable set of capabilities. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. In this work, we train Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities and show that: (1) Transformers generalize to exponentially or even combinatorially many functions not seen in the training data; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions; (3) the training data has a significant impact on the model’s ability to compose functions; and (4) attention layers in the latter half of the model seem critical to compositionality.
- Two‐dimensional (2D) materials offer great potential in various fields like superconductivity, quantum systems, and topological materials. However, designing them systematically remains challenging due to the limited pool of fewer than 100 experimentally synthesized 2D materials. Recent advancements in deep learning, data mining, and density functional theory (DFT) calculations have paved the way for exploring new 2D material candidates. Herein, a generative material design pipeline known as the material transformer generator (MTG) is proposed. MTG leverages two distinct 2D material composition generators, both trained using self‐learning neural language models rooted in transformers, with and without transfer learning. These models generate numerous potential 2D compositions, which are plugged into established templates for known 2D materials to predict their crystal structures. To ensure stability, DFT computations assess their thermodynamic stability based on energy‐above‐hull and formation energy metrics. MTG has found four new DFT‐validated stable 2D materials: NiCl4, IrSBr, CuBr3, and CoBrCl, all with zero energy‐above‐hull values that indicate thermodynamic stability. Additionally, GaBrO and NbBrCl3 are found with energy‐above‐hull values below 0.05 eV. CuBr3 and GaBrO exhibit dynamic stability, confirmed by phonon dispersion analysis. In summary, the MTG pipeline shows significant potential for discovering new 2D and functional materials.
- Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. In this work, we propose a process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41–137× and 631–1074× speedup, and 123–383× and 320–602× energy efficiency, over GPU and CPU baselines respectively, on 8 GPT models with up to 1.4 billion parameters.
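  To illustrate the low compute-to-memory ratio mentioned above, here is a back-of-the-envelope sketch (not from the paper) of the arithmetic intensity of a matrix-vector product, the dominant operation in single-token autoregressive decoding; the hidden size is a hypothetical example.

  ```python
  # Back-of-the-envelope estimate of why single-token GPT decoding is memory-bound.
  # Numbers are illustrative and not taken from the paper.
  def gemv_arithmetic_intensity(rows: int, cols: int, bytes_per_weight: int = 2) -> float:
      """FLOPs per byte moved for a matrix-vector product (one decoding step)."""
      flops = 2 * rows * cols                       # one multiply + one add per weight
      bytes_moved = rows * cols * bytes_per_weight  # weight traffic dominates for GEMV
      return flops / bytes_moved

  d_model = 2048  # hypothetical hidden size for a ~1B-parameter-class model
  print(f"arithmetic intensity: {gemv_arithmetic_intensity(d_model, 4 * d_model):.2f} FLOPs/byte")
  # With fp16 weights this is ~1 FLOP/byte, far below the compute-to-bandwidth ratio of
  # GPUs and CPUs, which is why keeping MAC operations next to the DRAM arrays helps.
  ```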
- Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper “how capable can a transformer become?”. Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from the training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions, compared to generating no intermediate outputs; (3) the training data has a significant impact on the model’s ability to compose unseen combinations of functions; and (4) the attention layers in the latter half of the model are critical to compositionality.
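  As an illustration of why compositions of a few monolithic capabilities explode combinatorially, the sketch below composes a handful of hypothetical string functions and keeps intermediate outputs; these stand-in functions and the composition depth are assumptions for illustration, not the paper's actual data-generating process.

  ```python
  # A toy data-generating process: compose a handful of "monolithic" string capabilities.
  # With k base functions and compositions of depth d there are k**d distinct pipelines,
  # so a model trained on a small subset of them must generalize to the rest.
  from itertools import product

  # Hypothetical capabilities for illustration; the paper defines its own set of functions.
  CAPABILITIES = {
      "rev": lambda s: s[::-1],
      "upper": lambda s: s.upper(),
      "dedup": lambda s: "".join(dict.fromkeys(s)),
      "rotate": lambda s: s[1:] + s[:1],
  }

  def compose(names, x):
      """Apply the named capabilities left to right, keeping intermediate outputs."""
      steps = [x]
      for name in names:
          steps.append(CAPABILITIES[name](steps[-1]))
      return steps  # intermediate outputs mirror the step-by-step setting in point (2)

  depth = 3
  pipelines = list(product(CAPABILITIES, repeat=depth))
  print(f"{len(CAPABILITIES)} base functions -> {len(pipelines)} depth-{depth} compositions")
  print(compose(("rev", "upper", "rotate"), "banana"))
  ```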
- This work presents a linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to ‘memorize’ sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing.
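  A minimal sketch of the kind of surprisal-to-reading-time regression described above, using placeholder surprisal and reading-time arrays rather than outputs from GPT-Neo or OPT variants; the fit metric here is plain R² from a single-predictor least-squares fit, a simplification of the regression analyses in the paper.

  ```python
  # Placeholder regression of per-word reading times on per-word surprisal. In the actual
  # study, surprisal comes from pre-trained language models and fit is assessed with
  # regression models of reading times; here random arrays and plain R^2 stand in.
  import numpy as np

  rng = np.random.default_rng(0)
  n_words = 500
  reading_times = 250 + 20 * rng.standard_normal(n_words)  # ms, placeholder
  surprisal_a = 8 + 2 * rng.standard_normal(n_words)        # bits, placeholder (model A)
  surprisal_b = 6 + 2 * rng.standard_normal(n_words)        # bits, placeholder (model B)

  def r_squared(x, y):
      """Variance in y explained by an intercept-plus-slope least-squares fit on x."""
      X = np.column_stack([np.ones_like(x), x])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      return 1 - np.var(y - X @ beta) / np.var(y)

  print("fit of model A surprisal:", r_squared(surprisal_a, reading_times))
  print("fit of model B surprisal:", r_squared(surprisal_b, reading_times))
  ```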