ReApprox-PIM: Reconfigurable Approximate Lookup-Table (LUT)-Based Processing-in-Memory (PIM) Machine Learning Accelerator

Bavikadi, Sathwika; Sutradhar, Purab Ranjan; Indovina, Mark A; Ganguly, Amlan; Pudukotai_Dinakarrao, Sai_Manoj

doi:10.1109/TCAD.2024.3367822

Graph application workloads are dominated by random memory accesses with poor locality. To tackle the irregular and sparse nature of computation, ReRAM-based Processing-in-Memory (PIM) architectures have been proposed recently. Most of these ReRAM architecture designs have focused on mapping graph computations into a set of multiply-and-accumulate (MAC) operations. ReRAMs also offer a key advantage in reducing memory latency between cores and memory by allowing for processing-in-memory (PIM). However, when implemented on a ReRAM-based manycore architecture, graph applications still pose two key challenges – significant storage requirements (particularly due to wasted zero cell storage), and significant amount of on-chip traffic. To tackle these two challenges, in this paper we propose the design of a 3D NoC-enabled ReRAM-based manycore architecture. Our proposed architecture incorporates a novel crossbar-aware node reordering to reduce ReRAM storage requirements. Secondly, its 3D NoC-enabled design reduces on-chip communication latency. Our architecture outperforms the state-of-the-art in ReRAM-based graph acceleration by up to 5x in performance while consuming up to 10.3x less energy for a range of graph inputs and workloads.

More Like this