Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available June 29, 2026
-
Large language models (LLMs) have achieved high accuracy in diverse NLP and computer vision tasks due to self- attention mechanisms relying on GEMM and GEMV operations. However, scaling LLMs poses significant computational and energy challenges, particularly for traditional Von-Neumann architectures (CPUs/GPUs), which incur high latency and energy consumption from frequent data movement. These issues are even more pronounced in energy-constrained edge environments. While DRAM-based near-memory architectures offer improved energy efficiency and throughput, their processing elements are limited by strict area, power, and timing constraints. This work introduces CIDAN-3D, a novel Processing-in-Memory (PIM) architecture tailored for LLMs. It features an ultra-low-power Neuron Processing Element (NPE) with high compute density (#Operations/Area), enabling ecient in-situ execution of LLM operations by leveraging high parallelism within DRAM. CIDAN- 3D reduces data movement, improves locality, and achieves substantial gains in performance and energy efficiency—showing up to 1.3X higher throughput and 21.9X better energy efficiency for smaller models, and 3X throughput and 7X energy improvement for large decoder-only models compared to prior near-memory designs. As a result, CIDAN-3D offers a scalable, energy-efficient platform for LLM-driven Gen-AI applications.more » « lessFree, publicly-accessible full text available June 1, 2026
-
Free, publicly-accessible full text available November 2, 2025
An official website of the United States government

Full Text Available