PIM-GPT: a hybrid process-in-memory accelerator for autoregressive transformers

Wu, Yuting; Wang, Ziyu; Lu, Wei D.

doi:10.1038/s44335-024-00004-2

Citation Details

Abstract: Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is limited by a low compute-to-memory ratio and heavy memory access. In this work, we propose a process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41–137× and 631–1074× speedup and 123–383× and 320–602× energy efficiency over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.

Award ID(s): 1900675

PAR ID: 10526626

Author(s) / Creator(s): Wu, Yuting; Wang, Ziyu; Lu, Wei D.

Publisher / Repository: Nature Publishing Group

Date Published: 2024-07-25

Journal Name: npj Unconventional Computing

Volume: 1

Issue: 1

ISSN: 3004-8672

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1038/s44335-024-00004-2