Large language models (LLMs) have achieved high accuracy across diverse NLP and computer vision tasks thanks to self-attention mechanisms built on GEMM and GEMV operations. However, scaling LLMs poses significant computational and energy challenges, particularly for traditional von Neumann architectures (CPUs/GPUs), which incur high latency and energy consumption from frequent data movement. These issues are even more pronounced in energy-constrained edge environments. While DRAM-based near-memory architectures offer improved energy efficiency and throughput, their processing elements are limited by strict area, power, and timing constraints. This work introduces CIDAN-3D, a novel Processing-in-Memory (PIM) architecture tailored for LLMs. It features an ultra-low-power Neuron Processing Element (NPE) with high compute density (#Operations/Area), enabling efficient in-situ execution of LLM operations by exploiting the high parallelism available within DRAM. CIDAN-3D reduces data movement, improves locality, and achieves substantial gains in performance and energy efficiency: up to 1.3X higher throughput and 21.9X better energy efficiency for smaller models, and 3X throughput and 7X energy improvement for large decoder-only models compared to prior near-memory designs. As a result, CIDAN-3D offers a scalable, energy-efficient platform for LLM-driven Gen-AI applications.
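To make the GEMM/GEMV framing concrete, here is a minimal NumPy sketch (illustrative only, not the CIDAN-3D NPE datapath) showing how scaled dot-product attention reduces to two matrix products; during autoregressive decoding the query shrinks to a single row, turning each product into a GEMV:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention expressed as two matrix products.

    With a batch of queries both products are GEMMs; during
    autoregressive decoding Q is a single row, so each becomes a GEMV.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # first GEMM/GEMV
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # second GEMM/GEMV

# Toy shapes: 4 queries attending over 8 cached keys/values of width 16.
Q, K, V = np.random.randn(4, 16), np.random.randn(8, 16), np.random.randn(8, 16)
print(attention(Q, K, V).shape)  # -> (4, 16)
```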
A DRAM-based Near-Memory Architecture for Accelerated and Energy-Efficient Execution of Transformers
Transformer-based language models have achieved remarkable accuracy in various NLP tasks, employing self-attention mechanisms based primarily on matrix multiplication. However, their significant size leads to data movement issues, causing latency and energy efficiency challenges in conventional von Neumann systems. To mitigate these issues, several in-memory and near-memory architectures have been proposed. This paper introduces PACT-3D, a near-memory architecture featuring novel computing units integrated with DRAM banks. PACT-3D reduces latency by 1.7× and improves energy efficiency by 18.7× compared to state-of-the-art near-memory architectures.
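As a conceptual sketch of what bank-integrated compute buys (a simplified model under assumed parameters, not PACT-3D's actual computing units), a matrix-vector product can be row-partitioned across DRAM banks so that each bank-side unit touches only locally stored weights:

```python
import numpy as np

def banked_gemv(W, x, n_banks=8):
    """Row-partition W across banks; each bank-side unit computes the
    partial GEMV on its local rows, avoiding off-chip weight movement.
    (n_banks and the row partitioning scheme are illustrative assumptions.)
    """
    rows_per_bank = np.array_split(np.arange(W.shape[0]), n_banks)
    partials = [W[rows] @ x for rows in rows_per_bank]  # per-bank compute
    return np.concatenate(partials)

W, x = np.random.randn(64, 32), np.random.randn(32)
assert np.allclose(banked_gemv(W, x), W @ x)
print("banked result matches monolithic GEMV")
```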
- PAR ID: 10519580
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400706059
- Page Range / eLocation ID: 57 to 62
- Subject(s) / Keyword(s): In/Near-memory Processing, LLMs, Transformers, DRAM, Memory Wall, Energy Efficiency
- Format(s): Medium: X
- Location: Clearwater FL USA
- Sponsoring Org: National Science Foundation
More Like this
-
Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real-world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip by graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms its traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns of a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near-Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory-intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption. We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework, which integrates 3D memory (like Micron's HMC) with a massive number of GPU cores, achieves 29.5% performance improvement and 30.03% less energy consumption on average compared to a conventional planar mesh-based design with external DRAM.
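As a rough illustration of the power-law link placement behind a small-world NoC (a sketch with an assumed exponent, not the exact SWNoC construction from the paper), link destinations can be sampled so that the probability of a link of length d decays as d^(-alpha), yielding mostly short links plus a few long-range shortcuts:

```python
import math
import random

def sample_link(src, routers, alpha=2.2):
    """Pick the far endpoint of one small-world link.

    Probability of linking src -> r decays as distance**(-alpha),
    so most links are short with occasional long-range shortcuts.
    (alpha = 2.2 is an assumed exponent, not taken from the paper.)
    """
    weights = [0.0 if r == src else math.dist(src, r) ** -alpha for r in routers]
    return random.choices(routers, weights=weights, k=1)[0]

# 4x4x2 grid of router coordinates where SMs and MCs are placed.
routers = [(x, y, z) for x in range(4) for y in range(4) for z in range(2)]
print(sample_link((0, 0, 0), routers))
```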
-
Today’s Deep Neural Network (DNN) inference systems contain hundreds of billions of parameters, resulting in significant latency and energy overheads during inference due to frequent data transfers between compute and memory units. Processing-in-Memory (PiM) has emerged as a viable solution to tackle this problem by avoiding the expensive data movement. PiM approaches based on electrical devices suffer from throughput and energy efficiency issues. In contrast, Optically-addressed Phase Change Memory (OPCM) operates with light and achieves much higher throughput and energy efficiency compared to its electrical counterparts. This paper introduces a system-level design that takes the OPCM programming overhead into consideration, and identifies that the programming cost dominates DNN inference on OPCM-based PiM architectures. We explore the design space of this system and identify the most energy-efficient OPCM array size and batch size. We propose a novel thresholding and reordering technique on the weight blocks to further reduce the programming overhead. Combining these optimizations, our approach achieves up to 65.2x higher throughput than existing photonic accelerators for practical DNN workloads.
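A toy sketch of the thresholding-and-reordering idea follows; the greedy similarity ordering, the L2 difference criterion, and the threshold value are all assumptions for illustration, not the paper's exact algorithm. The intent is to order weight blocks so consecutive blocks are similar, then skip reprogramming the OPCM array whenever the next block differs from the current one by less than the threshold:

```python
import numpy as np

def schedule_blocks(blocks, threshold=2.0):
    """Greedily order weight blocks by similarity, then count how many
    OPCM programming steps survive the thresholding rule.

    Greedy nearest-neighbor ordering, the L2 difference metric, and the
    threshold value are illustrative assumptions.
    """
    order, remaining = [0], set(range(1, len(blocks)))
    while remaining:
        last = blocks[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(blocks[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    programs = 1  # the first block is always programmed
    for prev, cur in zip(order, order[1:]):
        if np.linalg.norm(blocks[cur] - blocks[prev]) > threshold:
            programs += 1  # block differs too much: reprogram the array
    return order, programs

blocks = [np.random.rand(8, 8) for _ in range(16)]
order, programs = schedule_blocks(blocks)
print(f"programmed {programs}/{len(blocks)} blocks after reordering")
```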
-
Graph application workloads are dominated by random memory accesses with poor locality. To tackle the irregular and sparse nature of the computation, ReRAM-based Processing-in-Memory (PIM) architectures have been proposed recently. Most of these ReRAM architecture designs have focused on mapping graph computations onto a set of multiply-and-accumulate (MAC) operations. ReRAMs also offer a key advantage in reducing memory latency between cores and memory by allowing for processing-in-memory (PIM). However, when implemented on a ReRAM-based manycore architecture, graph applications still pose two key challenges: significant storage requirements (particularly due to wasted zero-cell storage) and a significant amount of on-chip traffic. To tackle these two challenges, in this paper we propose the design of a 3D NoC-enabled ReRAM-based manycore architecture. First, our proposed architecture incorporates a novel crossbar-aware node reordering to reduce ReRAM storage requirements. Second, its 3D NoC-enabled design reduces on-chip communication latency. Our architecture outperforms the state-of-the-art in ReRAM-based graph acceleration by up to 5x in performance while consuming up to 10.3x less energy for a range of graph inputs and workloads.
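A minimal sketch of why node reordering saves crossbar storage (degree sorting here is a simple stand-in for the paper's crossbar-aware heuristic): clustering the nonzeros of the adjacency matrix into fewer crossbar tiles means fewer tiles, and fewer wasted zero cells, must be mapped to ReRAM:

```python
import numpy as np

def nonzero_tiles(adj, tile=4):
    """Count crossbar tiles holding at least one nonzero;
    all-zero tiles need no ReRAM storage."""
    n, count = adj.shape[0], 0
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            if adj[i:i + tile, j:j + tile].any():
                count += 1
    return count

def degree_reorder(adj):
    """Sort nodes by descending degree so nonzeros cluster into fewer
    tiles. (A simple stand-in for the crossbar-aware heuristic.)"""
    order = np.argsort(-adj.sum(axis=1))
    return adj[np.ix_(order, order)]

adj = (np.random.rand(16, 16) < 0.15).astype(int)
print(nonzero_tiles(adj), "tiles before,",
      nonzero_tiles(degree_reorder(adj)), "tiles after reordering")
```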
-
Spiking Neural Networks (SNNs) are energy-efficient artificial neural network models that can carry out data-intensive applications. Energy consumption, latency, and memory bottlenecks are some of the major issues that arise in machine learning applications due to their data-demanding nature. Memristor-enabled Computing-In-Memory (CIM) architectures have been able to tackle the memory wall issue, eliminating the energy- and time-consuming movement of data. In this work, we develop a scalable CIM-based SNN architecture with our fabricated two-layer memristor crossbar array. In addition to having an enhanced heat dissipation capability, our memristor exhibits substantial improvements of 10% to 66% in design area, power, and latency compared to state-of-the-art memristors. This design incorporates an inter-spike interval (ISI) encoding scheme, chosen for its high information density, to convert the incoming input signals into spikes. Furthermore, we include a time-to-first-spike (TTFS) based output processing stage, chosen for its energy efficiency, to carry out the final classification. With the combination of ISI, CIM, and TTFS, this network has a competitive inference speed of 2μs/image and can successfully classify handwritten digits with 2.9mW of power and 2.51pJ of energy per spike. The proposed architecture with the ISI encoding scheme achieves ∼10% higher accuracy than other encoding schemes on the MNIST dataset.
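A small sketch of the two encoding stages (the timing constants and the linear intensity-to-interval map are assumptions, not the fabricated design's parameters): ISI encoding maps larger input values to shorter gaps between spikes, and a TTFS readout picks the output neuron that fires first:

```python
import numpy as np

def isi_encode(values, t_min=1.0, t_max=10.0):
    """Inter-spike-interval encoding: each input value maps to the gap
    between the pair of spikes that encodes it, with larger values
    giving shorter gaps (t_min/t_max are assumed timing constants)."""
    v = np.clip(values, 0.0, 1.0)
    return t_max - (t_max - t_min) * v  # interval per input

def ttfs_classify(first_spike_times):
    """Time-to-first-spike readout: the earliest-firing output neuron wins."""
    return int(np.argmin(first_spike_times))

print(isi_encode(np.array([0.0, 0.5, 1.0])))               # -> [10.   5.5  1. ]
print("class:", ttfs_classify(np.array([4.2, 1.7, 3.9])))  # -> class: 1
```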