NSF PAR Search | NSF Public Access Repository

Odin: Learning to Optimize Operation Unit Configuration for Energy-efficient DNN Inferencing

https://doi.org/10.23919/DATE64628.2025.10993275

Narang, Gaurav; Doppa, Janardhan Rao; Pande, Partha Pratim (March 2025, IEEE)

Free, publicly-accessible full text available March 31, 2026
HpT: Hybrid Acceleration of Spatio-Temporal Attention Model Training on Heterogeneous Manycore Architectures

https://doi.org/10.1109/TPDS.2024.3522781

Dahal, Saiman; Dhingra, Pratyush; Thapa, Krishu Kumar; Pande, Partha Pratim; Kalyanaraman, Ananth (March 2025, IEEE Transactions on Parallel and Distributed Systems)

Free, publicly-accessible full text available March 1, 2026
HuNT: Exploiting Heterogeneous PIM Devices to Design a 3-D Manycore Architecture for DNN Training

https://doi.org/10.1109/TCAD.2024.3444708

Ogbogu, Chukwufumnanya; Narang, Gaurav; Joardar, Biresh Kumar; Doppa, Janardhan Rao; Chakrabarty, Krishnendu; Pande, Partha Pratim (November 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Heterogeneous Manycore In-Memory Computing Architectures

https://doi.org/10.1145/3676536.3697138

Ogbogu, Chukwufumnanya; Narang, Gaurav; Joardar, Biresh Kumar; Doppa, Janardhan Rao; Pande, Partha Pratim (October 2024, ACM)

Full Text Available
Data Pruning-enabled High Performance and Reliable Graph Neural Network Training on ReRAM-based Processing-in-Memory Accelerators

https://doi.org/10.1145/3656171

Ogbogu, Chukwufumnanya; Joardar, Biresh; Chakrabarty, Krishnendu; Doppa, Jana; Pande, Partha Pratim (September 2024, ACM Transactions on Design Automation of Electronic Systems)

Graph Neural Networks (GNNs) have achieved remarkable accuracy in cognitive tasks such as predictive analytics on graph-structured data. Hence, they have become very popular in diverse real-world applications. However, GNN training with large real-world graph datasets in edge-computing scenarios is both memory- and compute-intensive. Traditional computing platforms such as CPUs and GPUs do not provide the energy efficiency and low latency required in edge intelligence applications due to their limited memory bandwidth. Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have been proposed as suitable candidates for accelerating AI applications at the edge, including GNN training. However, ReRAM-based PIM architectures suffer from low reliability due to their limited endurance, and low performance when they are used for GNN training in real-world scenarios with large graphs. In this work, we propose a learning-for-data-pruning framework, which leverages a trained Binary Graph Classifier (BGC) to reduce the size of the input data graph by pruning subgraphs early in the training process to accelerate the GNN training process on ReRAM-based architectures. The proposed lightweight BGC model reduces the amount of redundant information in input graph(s) to speed up the overall training process, improves the reliability of the ReRAM-based PIM accelerator, and reduces the overall training cost. This enables fast, energy-efficient, and reliable GNN training on ReRAM-based architectures. Our experimental results demonstrate that using this learning-for-data-pruning framework, we can accelerate GNN training and improve the reliability of ReRAM-based PIM architectures by up to 1.6×, and reduce the overall training cost by 100× compared to state-of-the-art data pruning techniques.
Full Text Available
TEFLON: Thermally Efficient Dataflow-aware 3D NoC for Accelerating CNN Inferencing on Manycore PIM Architectures

https://doi.org/10.1145/3665279

Narang, Gaurav; Ogbogu, Chukwufumnanya; Doppa, Janardhan Rao; Pande, Partha Pratim (August 2024, ACM Transactions on Embedded Computing Systems)

Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures are used extensively to accelerate inferencing/training with convolutional neural networks (CNNs). Three-dimensional (3D) integration is an enabling technology to integrate many PIM cores on a single chip. In this work, we propose the design of a thermally efficient dataflow-aware monolithic 3D (M3D) NoC architecture referred to as TEFLON to accelerate CNN inferencing without creating any thermal bottlenecks. TEFLON reduces the Energy-Delay-Product (EDP) by 42%, 46%, and 45% on average compared to a conventional 3D mesh NoC for systems with 36, 64, and 100 PIM cores, respectively. TEFLON reduces the peak chip temperature by 25 K and improves the inference accuracy by up to 11% compared to a solely performance-optimized SFC-based counterpart for inferencing with diverse deep CNN models using CIFAR-10/100 datasets on a 3D system with 100 PIM cores.
HeTraX: Energy Efficient 3D Heterogeneous Manycore Architecture for Transformer Acceleration

https://doi.org/10.1145/3665314.3670814

Dhingra, Pratyush; Doppa, Jana; Pande, Partha Pratim (August 2024, ACM)

Full Text Available
Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads

Sharma, H; Narang, G; Doppa, J; Ogras, U; Pande, P (June 2024, Proceedings)

Full Text Available
FARe: Fault-Aware GNN Training on ReRAM-Based PIM Accelerators

Dhingra, P; Ogbogu, C; Joardar, B; Doppa, J; Kalyanaraman, A; Pande, P (June 2024, IEEE)

Full Text Available
Pareto Front-Diverse Batch Multi-Objective Bayesian Optimization

https://doi.org/10.1609/aaai.v38i10.28951

Ahmadianshalchi, Alaleh; Belakaria, Syrine; Doppa, Janardhan Rao (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

We consider the problem of multi-objective optimization (MOO) of expensive black-box functions with the goal of discovering high-quality and diverse Pareto fronts where we are allowed to evaluate a batch of inputs. This problem arises in many real-world applications including penicillin production where diversity of solutions is critical. We solve this problem in the framework of Bayesian optimization (BO) and propose a novel approach referred to as Pareto front-Diverse Batch Multi-Objective BO (PDBO). PDBO tackles two important challenges: 1) How to automatically select the best acquisition function in each BO iteration, and 2) How to select a diverse batch of inputs by considering multiple objectives. We propose principled solutions to address these two challenges. First, PDBO employs a multi-armed bandit approach to select one acquisition function from a given library. We solve a cheap MOO problem by assigning the selected acquisition function for each expensive objective function to obtain a candidate set of inputs for evaluation. Second, it utilizes Determinantal Point Processes (DPPs) to choose a Pareto-front-diverse batch of inputs for evaluation from the candidate set obtained from the first step. The key parameters for the methods behind these two steps are updated after each round of function evaluations. Experiments on multiple MOO benchmarks demonstrate that PDBO outperforms prior methods in terms of both the quality and diversity of Pareto solutions.
Full Text Available
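The batch-selection step described in the PDBO abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an RBF similarity kernel over candidate objective vectors and uses a simple greedy log-determinant heuristic in place of actual DPP sampling. The names `rbf_kernel`, `greedy_dpp_batch`, `length_scale`, and `jitter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(points, length_scale=1.0):
    """Similarity matrix: near-identical points get values close to 1."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))


def greedy_dpp_batch(points, batch_size, length_scale=1.0, jitter=1e-6):
    """Greedily pick indices whose kernel submatrix has the largest log-determinant.

    A larger determinant means the chosen points are less similar to each
    other, which is the diversity notion a DPP rewards. This greedy loop is
    an illustrative stand-in (assumption), not the method from the paper.
    """
    n = len(points)
    # Jitter on the diagonal keeps submatrices numerically positive definite.
    kernel = rbf_kernel(points, length_scale) + jitter * np.eye(n)
    selected = []
    for _ in range(min(batch_size, n)):
        best_idx, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    # Hypothetical candidate set: objective vectors forming three clusters.
    candidates = np.array(
        [[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [5.01, 5.0], [10.0, 0.0]]
    )
    print(greedy_dpp_batch(candidates, batch_size=3))
```

With candidates that form tight clusters, the selection tends to take one point per cluster, because adding a near-duplicate of an already chosen point shrinks the determinant sharply.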
