A Scalable and Energy-Efficient Processing-in-Memory Architecture for Gen-AI

Singh, Gian; Vrudhula, Sarma

doi:10.1109/JETCAS.2025.3566929

This content will become publicly available on June 1, 2026

Large language models (LLMs) have achieved high accuracy in diverse NLP and computer vision tasks thanks to self-attention mechanisms that rely on GEMM and GEMV operations. However, scaling LLMs poses significant computational and energy challenges, particularly for traditional von Neumann architectures (CPUs/GPUs), which incur high latency and energy consumption from frequent data movement. These issues are even more pronounced in energy-constrained edge environments. While DRAM-based near-memory architectures offer improved energy efficiency and throughput, their processing elements are limited by strict area, power, and timing constraints. This work introduces CIDAN-3D, a novel Processing-in-Memory (PIM) architecture tailored for LLMs. It features an ultra-low-power Neuron Processing Element (NPE) with high compute density (operations per unit area), enabling efficient in-situ execution of LLM operations by exploiting the high internal parallelism of DRAM. CIDAN-3D reduces data movement, improves locality, and achieves substantial gains in performance and energy efficiency compared to prior near-memory designs: up to 1.3X higher throughput and 21.9X better energy efficiency for smaller models, and 3X higher throughput and 7X better energy efficiency for large decoder-only models. As a result, CIDAN-3D offers a scalable, energy-efficient platform for LLM-driven Gen-AI applications.

Award ID(s): 2425535, 2324945

PAR ID: 10616484

Author(s) / Creator(s): Singh, Gian; Vrudhula, Sarma

Publisher / Repository: IEEE

Date Published: 2025-06-01

Journal Name: IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Volume: 15

Issue: 2

ISSN: 2156-3357

Page Range / eLocation ID: 285 to 298

Subject(s) / Keyword(s): LLMs, transformers, in/near-memory processing, DRAM, memory wall, energy efficiency

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1109/JETCAS.2025.3566929